Doing OCR Batch Processing Using The ScanSnap And ABBYY FineReader

Sometimes, when you have to scan a large number of documents at once, the step of doing OCR (making the PDF searchable) after each document can really slow things down. It may be preferable to scan them all in and then OCR them all in one big shot.

I have previously posted about how to do batch OCR using Adobe Acrobat, along with an Acrobat AppleScript.

Over at the Optimality! blog, Tobi has posted a walkthrough of using ABBYY FineReader, which comes with the ScanSnap S1500M (and the S1500, for that matter), to do batch OCR.

The problem is that in the default setup, each scan is OCRed right after it is scanned, and depending on the age of your machine (my G5 is getting a little long in the tooth) it can take quite a while. When you’re scanning many hundreds of pages of paper documents, you don’t want to wait for the computer to do its OCR recognition; you’d rather feed it all the documents and let it do the OCR while you’re doing something else.

Fortunately, this is possible. Reading all the way through the handbook as well as the ABBYY online help, I found out that you can scan to PDF only and then convert the PDFs with ABBYY FineReader afterwards.

Check out the post here. Do you have any other tricks for doing batch OCR?

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

16 comments

Gerald - June 9, 2011

Does anyone here know of a person or company looking for a home-based worker to do OCR processing?

Telegard - February 25, 2011

There is a solution to this problem that works well for me…

In ScanSnap Manager, deselect the "Quick Menu" option and create a new profile (myProfile or something).
Under Application select "Scan to Searchable PDF"
Under File Option, deselect "Convert to Searchable PDF"
Set your other options as you would like, and select "Apply"

In Spotlight on your Mac, enter "FineReader", then select the "FineReader for ScanSnap Preferences" application icon in the results.
On the general tab of the preferences panel, deselect "Open file after recognition"
Then select "Delete scanned images after recognition" on the same tab.
Close the preferences panel.

Now when you scan (using the profile you created earlier), the document will be sent to FineReader for OCR, which lets the ScanSnap keep scanning. Documents queue as they are scanned and are processed in order by FineReader.

Hope this helps,
Cheers!

    Brooks Duncan - February 25, 2011

    This is awesome, thanks Telegard. Great tip!

    jluros - March 14, 2011

    The "Convert to Searchable PDF" option is not present under the File Option tab in version 2.2 L14 for my 510M.

      Brooks Duncan - March 15, 2011

      Someone correct me if I am wrong, but I believe on the S510M you need to use Adobe Acrobat to OCR the document. I don't think ABBYY FineReader is built into ScanSnap Manager the way it is with later ScanSnaps.

sarah - April 14, 2010

I have run about 100 documents through FineReader, but I am not sure what I should be looking for. The PDFs look the same except for the file name addition "processed by…"
How do I know if it did a good job? Also, can I get rid of the old PDFs (the ones without "processed by…" on the end)?

I would like to have all of these PDFs OCRed before I put them into DEVONthink Pro Office, so that if I ever decide to use a different app, Spotlight will still be able to search them. Thanks
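For the question about the old PDFs, here is a minimal Python sketch that lists which original PDFs in a folder already have a FineReader-processed copy beside them, so you can review them before deleting anything. It assumes the output naming pattern another commenter on this post describes ("Original_File_Name 1 processed by FineReader.pdf"); your version may name files differently, so adjust the pattern. It only prints the pairs and deletes nothing.

```python
import re
import sys
from pathlib import Path

# Assumed FineReader output name: "<original stem> <n> processed by FineReader.pdf".
# This pattern is taken from a commenter's description and may differ on your system.
PROCESSED_RE = re.compile(r"^(?P<stem>.+?) \d+ processed by FineReader$", re.IGNORECASE)


def find_pairs(folder: Path) -> dict[Path, Path]:
    """Map each original PDF to a processed copy found in the same folder."""
    pdfs = {p.stem: p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"}
    pairs: dict[Path, Path] = {}
    for stem, path in pdfs.items():
        match = PROCESSED_RE.match(stem)
        # Only report a pair when the original file is actually present.
        if match and match.group("stem") in pdfs:
            pairs[pdfs[match.group("stem")]] = path
    return pairs


def main() -> None:
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for original, processed in sorted(find_pairs(folder).items()):
        print(f"{original.name}  ->  {processed.name}")


if __name__ == "__main__":
    main()
```

Originals with no processed partner are simply not listed, which makes it easy to spot any files FineReader skipped.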

sarah - April 13, 2010

When I scanned everything with my ScanSnap (my whole file cabinet), I created regular PDFs, because scanning with ScanSnap Manager wouldn't queue the OCR. It took forever to scan each document, since I had to wait for the first document to finish OCRing before I could scan the second one.

    Brooks Duncan - April 13, 2010

    Ahh, I see what you mean. If I were you, I'd try the ABBYY instructions in the linked article, since FineReader is (or should be) installed automatically with the ScanSnap software, and then you don't need to mess around with AppleScript etc.

    Maybe you could try, say, 5 PDFs using the method in the linked article, and then 5 PDFs using Acrobat with these instructions: http://www.documentsnap.com/use-acrobat-batch-pro… and see which way gives you smaller files and better quality?
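For the file-size half of that comparison, a small Python sketch like this one can summarize the PDFs in two folders, one per method. The default folder names are hypothetical; put your FineReader output in one and your Acrobat output in the other, or pass the folders on the command line. Quality still has to be judged by eye, for example by searching each file for a few words you know are on the page.

```python
import sys
from pathlib import Path


def pdf_sizes(folder: Path) -> list[int]:
    """Return the size in bytes of every PDF directly inside `folder`."""
    return [p.stat().st_size for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"]


def summarize(folder: Path) -> str:
    """Describe how many PDFs a folder holds and their total and average size."""
    if not folder.is_dir():
        return f"{folder}: not a folder"
    sizes = pdf_sizes(folder)
    if not sizes:
        return f"{folder}: no PDFs found"
    total_kb = sum(sizes) / 1024
    return f"{folder}: {len(sizes)} PDFs, {total_kb:.0f} KB total, {total_kb / len(sizes):.0f} KB average"


if __name__ == "__main__":
    # Hypothetical defaults: one folder of FineReader output, one of Acrobat output.
    folders = sys.argv[1:] or ["finereader_test", "acrobat_test"]
    for name in folders:
        print(summarize(Path(name)))
```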

Sarah - April 13, 2010

S1500M

    Brooks Duncan - April 13, 2010

    Hmm, normally you wouldn't need to drag to anything because the ScanSnap software should OCR it for you. Are you wanting to drag things to Acrobat or FineReader because OCRing every document takes too long so you want to do it all in one shot after scanning?

sarah - April 13, 2010

Which is better: dragging the PDFs to FineReader or to Acrobat? My ScanSnap came with Acrobat 8. Also, do I need to keep both versions of the PDF, the original and the one processed by FineReader? If not, what's a simple way to delete the originals?

Leo - February 9, 2010

Does FineReader come with the ScanSnap? I have the S510M and don't see FineReader on my Mac. Thanks!

Michael F - January 6, 2010

The easiest thing is just to scan everything to plain PDF, then run FineReader and drag a bunch of PDFs to its Dock icon. As long as they were created in ScanSnap, it should OCR them one after another and save them as something like “Original_File_Name 1 processed by FineReader.pdf”. There is some limit to the number of files you can do at once, but it’s a fairly high one.
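Dropping files on an app's Dock icon is roughly what the macOS `open -a` command does from Terminal, so the same batch can be scripted. This is a minimal Python sketch under several assumptions: the application name is a guess (check the exact name in your Applications folder), the batch size of 50 is hypothetical because the real limit isn't known, and it has not been confirmed that FineReader for ScanSnap treats files opened this way exactly like a Dock drop. Since `open` returns immediately, the script pauses between batches until you press Return.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the app's name as it appears in /Applications; check yours.
APP_NAME = "ABBYY FineReader for ScanSnap"
# Hypothetical batch size; the real per-drop limit is not documented here.
BATCH_SIZE = 50


def batches(files: list[Path], size: int) -> list[list[Path]]:
    """Split the file list into consecutive chunks of at most `size` files."""
    return [files[i:i + size] for i in range(0, len(files), size)]


def open_command(app: str, files: list[Path]) -> list[str]:
    """Build a macOS `open -a` command that opens the files with the given app."""
    return ["open", "-a", app, *(str(f) for f in files)]


def main() -> None:
    folder = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    # Skip files whose names already look like FineReader output.
    pdfs = sorted(p for p in folder.glob("*.pdf") if "processed by FineReader" not in p.name)
    groups = batches(pdfs, BATCH_SIZE)
    for n, group in enumerate(groups, start=1):
        subprocess.run(open_command(APP_NAME, group), check=True)
        if n < len(groups):
            input(f"Sent batch {n} of {len(groups)}. Press Return when FineReader is done...")


if __name__ == "__main__":
    main()
```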

    Michael F - January 7, 2010

    Whoops, now that I read the original post I see that is exactly what the linked article says. Duh.

Ron C - January 5, 2010

We (that’s really my wife – I’m just IT support) use a ScanSnap along with DEVONthink Pro and can do concurrent scanning and OCR-ing.

DTP uses ABBYY FineReader to do its work, but the entire OCR process is under the control of DTP and not the Fujitsu driver.

I can’t remember the exact configuration BUT it was pretty much straight out of the DTP playbook.

Ron
