My ScanSnap Setup And Workflow – Post Scan Processing

Update: This post is now slightly out of date, as I now use the ScanSnap S1300. You may want to sign up for my free 7-part e-Course, which will take you through the steps to go paperless more comprehensively.
This is Part 2 of the My ScanSnap Setup And Workflow series. Make sure to check out Part 1: ScanSnap Settings.

Now that we have set up ScanSnap Manager with my four profiles, here is what I do with the files.

At first I started using DevonThink Pro Office, but I found that it was a little overkill for my needs. If I had a huge number of documents that I needed regular access to, it would be perfect, but for my home needs I wanted to go with something a little more lightweight.

One main drawback (maybe the only one) is that my ScanSnap S300M did not come with any OCR software in the box. I could download a form to have ReadIris Pro 11 mailed to me, but that didn’t help me at first.

Luckily, I had Adobe Acrobat Professional already, so I decided to use that for my ScanSnap workflow. If you have the ScanSnap S1500 or S1500M, your scanner will come with Acrobat already.

Really Boring, Really Fast

Excited to start OCR’ing up a storm, I set ScanSnap Manager to output to Acrobat and away I went.

It worked quite well. I would hit the button, it would open the resulting file in Acrobat, and then I would go to Document | OCR Text Recognition | Recognize Text Using OCR and follow the resulting menus.

I think this would be OK normally, but since I had a ton of things to scan from my file cabinet, it got really boring, really fast to sit there and manually OCR every document over and over again. I knew there had to be a better way.

AppleScript To The Rescue

I am a complete AppleScript newbie, but I found this great post from Macworld where the author made an AppleScript Folder Action that would watch a certain folder, and when a document got put in it, it would kick off Acrobat (or ReadIris Pro) and OCR it automatically.

I set ScanSnap Manager to save to a folder called ToProcess and gave that folder a Folder Action to run the Macworld script.

This worked quite well, and would possibly work OK on an ongoing basis, but again I ran into problems when doing my massive scan-a-thon: if I dropped a document into the folder while the previous Acrobat session was still OCR’ing, it would give error messages.
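
The errors come from a second OCR job starting while the first is still running. As a rough illustration of the general fix (this is not the Macworld script, and not what I ran), here is a minimal Python sketch that queues incoming files and hands them to a single worker, so only one job is ever active. The `SerialOcrQueue` class and the `ocr_one` callable are hypothetical names; `ocr_one` stands in for whatever actually drives your OCR program.

```python
import queue
import threading


class SerialOcrQueue:
    """Accept files at any time, but OCR them strictly one at a time."""

    def __init__(self, ocr_one):
        # ocr_one is a hypothetical callable that blocks until one
        # document has been fully OCR'ed (for example, by driving Acrobat).
        self._ocr_one = ocr_one
        self._jobs = queue.Queue()
        self.done = []      # paths that were processed successfully
        self.failed = []    # (path, exception) pairs for jobs that raised
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, path):
        """Called whenever a new scan lands; returns immediately."""
        self._jobs.put(path)

    def _run(self):
        # Single worker: the next job is not fetched until the
        # current one has finished, so jobs never overlap.
        while True:
            path = self._jobs.get()
            try:
                self._ocr_one(path)
                self.done.append(path)
            except Exception as exc:
                self.failed.append((path, exc))
            finally:
                self._jobs.task_done()

    def wait(self):
        """Block until every submitted file has been handled."""
        self._jobs.join()
```

With something like this, a scan that arrives mid-OCR just waits its turn instead of colliding with the job in progress.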

Droplets Are Fun

The solution I came up with was to change the script so that it became a droplet. To do this I ripped off part of the script referenced in this thread.

A droplet is just an Application that you save somewhere (I have it on my Dock). You run it by dragging a file onto its icon.

Here is the script that I cobbled together. Feel free to download and use as you please.

OCRit – Droplet to kick off batch OCR of PDFs using Adobe Acrobat

Final Workflow

So now, I have the following workflow:

  • Scan the document using the ScanSnap; ScanSnap Manager saves the file in the ToProcess folder
  • When I am done with my batch, I drag the PDF files onto the OCRit icon, which kicks off Adobe Acrobat Professional and tells it to recognize the text in the document
  • When that is done, process/move the files as needed
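
For anyone who would rather script the last two steps, the batch idea can be sketched in Python as below. This is only an illustration under assumptions, not the OCRit droplet itself: `process_batch` and `ocr_one` are hypothetical names, `ocr_one` is a placeholder for whatever performs the OCR on one file, and the destination folder is something you would choose yourself.

```python
from pathlib import Path
import shutil


def process_batch(inbox, outbox, ocr_one):
    """OCR every PDF in inbox, one at a time, then move it to outbox.

    ocr_one is a hypothetical callable that takes a Path and returns
    once that document has been OCR'ed in place.
    """
    inbox, outbox = Path(inbox), Path(outbox)
    outbox.mkdir(parents=True, exist_ok=True)
    moved = []
    for item in sorted(inbox.iterdir()):
        # Skip anything that is not a PDF (match .pdf and .PDF alike).
        if not item.is_file() or item.suffix.lower() != ".pdf":
            continue
        ocr_one(item)                       # blocks until this file is done
        target = outbox / item.name
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

A call might look like `process_batch("ToProcess", "Processed", my_ocr)`, where `Processed` and `my_ocr` are placeholders for your own folder and OCR step.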

It is working quite well for me, but I guess if I wanted to avoid all this I could have just stuck with DevonThink, as it has built-in OCR. What is your workflow? How do you handle the Optical Character Recognition part?

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

8 comments

Lee - August 17, 2009

My thoughts on solving this problem:

FineReader for ScanSnap isn't nearly as suitable as Acrobat. 1. If your document has writing in two different directions, FineReader will get confused and do an extremely poor OCR job. 2. If your document is skewed, again the OCR isn't as accurate. Neither of these is as much of a problem with Acrobat.

If you select FineReader as the application to run on the scanned document, and you feed a 2nd document while the first is still being OCR'ed, you will get an error on the 2nd document and no OCR. This is true even if you open a new copy of FineReader for each new document.

Neither Acrobat nor FineReader can be configured to do well at scanning a file cabinet of documents and automatically performing OCR without having to manually batch the jobs after you're done scanning. I believe this is intentional on the part of both OCR vendors. Both vendors sell a higher end corporate version that is heavy duty. Users have been clamoring for command line control of Acrobat for years, but I think Adobe is afraid of cannibalizing their more expensive products. If Adobe gave us the ability to OCR a pdf from the command line, our problems would be solved.

My goal is to fully automate OCRing of documents as they are scanned, regardless of how many or how fast they are scanned. The closest I've come is a set of bash scripts and AppleScripts. The bash scripts maintain a queue of documents that need to be scanned. They load one document at a time into Acrobat and call an AppleScript that "tells" Acrobat to OCR. The conversation between the AppleScript and Acrobat occasionally fails.

I think there are two directions I'll pursue now. Omnipage isn't crippled in this regard, and I believe they support monitoring a folder for new PDFs that are saved by your scanner. So they've done the work. I also believe they have one of the best OCR engines. The other direction is a plugin for Acrobat that gives you command line access to batching. This costs as much as Omnipage Pro, so I'm not favoring that solution now.

If someone else has managed to fully automate this so you can scan multiple documents just by pushing the ScanSnap button and end up with an OCR'ed PDF without needing to interact with the computer, please get in touch with me. I'm lee@at@salk-dot-edu

Juan - June 25, 2009

Or you can use Evernote 🙂

Brooks Duncan - June 3, 2009

Hi Cheng, I've been trying to track down the ScanSnap Manager, but unfortunately haven't been able to. I've found this: http://www.fujitsu.com/global/support/computing/p… but it only seems to be for V2 and not the V3.0. I'll keep trying though!

cheng - May 30, 2009

Hi
I had my S1500 when I studied in Japan, and it worked without problems on a Japanese Windows system.
But after I came back to Taiwan, I can't install the Japanese driver on my Traditional Chinese Windows system. Do you know if I can get an English driver, or any way I can solve this problem? Thanks a lot!

Brooks Duncan - April 10, 2009

Hi Mike,

You're not the first person to ask that question recently, so I am going to have a post done on that topic very soon. My initial impressions are no, if the built-in OCR works fine for you then don't worry about it. Keep your eyes open for the post though.

Mike - April 10, 2009

I just purchased the S1500M, which came with Acrobat 8 Pro and ABBYY for OCRing, and the ScanSnap Manager software has an option to OCR when scanning. Is Acrobat that much better than ABBYY, or is the advantage just doing the OCRing in a batch after scanning?

Thanks for all the tips, they're great!

Brooks Duncan - December 17, 2008

Hi Sandy,

Can you check a few things? Go to System Preferences | Universal Access, and see if “Enable Access for Assistive Devices” is checked. If not, check it.

This is sort of similar, but run /Applications/AppleScript/AppleScript Utility and make sure GUI Scripting is checked.

Give those a try and let me know how it goes.

Sandy Pope - December 17, 2008

I’m looking forward to getting the same kind of flow going, but I can’t get any scripts or droplets to work. When I use your droplet, I get the error message: “Sorry, an error occured: NSReceiverEvaluationScriptError: 4(1)” and a “Never mind” button.
I’m using a ScanSnap S510M with Adobe Professional 8 on a PowerPC iMac 10.4.11.
Any suggestions?
ps–the Macworld article’s script doesn’t work for me, either.
