Doing OCR Batch Processing Using The ScanSnap And ABBYY FineReader
January 5, 2010
Sometimes, when you have to scan a large number of documents at once, the step of doing OCR (making the PDF searchable) after each document can really slow things down. It may be preferable to scan them all in and then OCR them all in one big shot.
In the past I have posted about how to do batch OCR using Adobe Acrobat, and have posted an Acrobat AppleScript for it.
Over at the Optimality! blog, Tobi has posted a walkthrough of using ABBYY FineReader, which comes with the ScanSnap S1500M (and the S1500, for that matter), to do batch OCR.
The problem is that in the default setup, each scan is OCRed right after the scan, and depending on the age of your machine (my G5 is getting a little long in the tooth) it can take quite a while. When you’re in the process of scanning many hundreds of pages of paper documents, you don’t want to have to wait for the computer to do its OCR; you’d rather feed it all the documents and let it do OCR while you’re doing something else.
Fortunately, this is possible. Reading all the way through the handbook as well as the ABBYY online help, I found out that you can scan to PDF only and then convert the PDFs with ABBYY FineReader afterwards.
Check out the post here. Do you have any other tricks for doing batch OCR?
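If you would rather not pick the files by hand, the "scan first, OCR later" step could be scripted. The following is only a rough sketch in Python, not anything from Fujitsu or ABBYY. It rests on several assumptions: that the app is installed under the name "FineReader for ScanSnap", that FineReader's output files have "processed by FineReader" in their names, and that handing PDFs to the app with the macOS open command behaves like dropping them on its dock icon. The function names are made up for illustration.

```python
from pathlib import Path
import subprocess

# Assumption: the application name as installed; adjust to match your Mac.
APP_NAME = "FineReader for ScanSnap"
# Assumption: text that appears in the names of files FineReader has already written.
PROCESSED_MARKER = "processed by FineReader"


def pending_pdfs(folder):
    """Return the PDFs in `folder` that do not look like FineReader output."""
    pdfs = [
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and PROCESSED_MARKER not in p.stem
    ]
    # Sort by name so the files are handed over in a predictable order.
    return sorted(pdfs, key=lambda p: p.name.lower())


def open_command(app, files):
    """Build the macOS `open -a <app> <files...>` command as an argument list."""
    return ["open", "-a", app] + [str(f) for f in files]


def send_to_finereader(folder):
    """Hand every pending PDF in `folder` to the app in one go (macOS only)."""
    files = pending_pdfs(folder)
    if files:
        # `open` returns as soon as the files are handed over;
        # the OCR itself continues inside the application.
        subprocess.run(open_command(APP_NAME, files), check=True)
    return files
```

Calling send_to_finereader("/path/to/scans") from a Python prompt would pass along everything in that folder that has not been processed yet. It only skips files that carry the marker in their name, so an original whose processed copy already exists would be sent again.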
Comments
4 Responses to “Doing OCR Batch Processing Using The ScanSnap And ABBYY FineReader”
We (that’s really my wife – I’m just IT support) use a ScanSnap along with DevonThink Pro and can do concurrent scanning and OCR-ing.
DTP uses ABBYY FineReader to do its work, but the entire OCR process is under the control of DTP and not the Fujitsu driver.
I can’t remember the exact configuration BUT it was pretty much straight out of the DTP playbook.
Ron
The easiest thing is just to scan everything to plain PDF, then run FineReader and drag a bunch of PDFs to its dock icon. As long as they were created in ScanSnap, it should OCR them one after another, and save them as something like “Original_File_Name 1 processed by FineReader.pdf”. There is some limit to the number of files you can do at once, but it’s a fairly high one.
Whoops, now that I read the original post I see that is exactly what the linked article says. Duh.
Does FineReader come with the ScanSnap? I have the S510M and don't see FineReader on my Mac. Thanks!