Sometimes, when you have to scan a large number of documents at once, the step of doing OCR (making the PDF searchable) after each document can really slow things down. It may be preferable to scan them all in and then OCR them all in one big shot.
In the past I have posted about how to do batch OCR using Adobe Acrobat, along with an Acrobat AppleScript.
Over at the Optimality! blog, Tobi has posted a walkthrough of using ABBYY FineReader, which comes with the ScanSnap S1500M (and S1500, for that matter), to do batch OCR.
The problem is that in the default setup, each scan is OCRed right after the scan, and depending on the age of your machine (my G5 is getting a little long in the tooth) it can take quite a while. When you’re in the process of scanning many hundreds of pages of paper documents, you don’t want to have to wait for the computer to do its OCR; you’d rather feed it all the documents and let it do OCR while you’re doing something else.
Fortunately, this is possible. Reading all the way through the handbook as well as through the ABBYY online help I found out that you can scan to PDF only, and then afterwards convert the PDFs with ABBYY FineReader.
Check out the post here. Do you have any other tricks for doing batch OCR?
Related posts:
- Use Acrobat Batch Processing To OCR Your PDFs Easily
- Abbyy Finereader and Adobe Acrobat – Why Does Fujitsu Include Both?
- ABBYY Finereader And Snow Leopard – File Not Created With ScanSnap
- ABBYY FineReader For ScanSnap Update For Snow Leopard OSX 10.6 Now Available
- Found: ScanSnap Applescript to remove ABBYY FineReader file

We (that’s really my wife – I’m just IT support) use a ScanSnap along with DevonThink Pro and can do concurrent scanning and OCR-ing.
DTP uses ABBYY FineReader to do its work, but the entire OCR process is under the control of DTP and not the Fujitsu driver.
I can’t remember the exact configuration BUT it was pretty much straight out of the DTP playbook.
Ron
The easiest thing is just to scan everything to plain PDF, then run FineReader and drag a bunch of PDFs to its dock icon. As long as they were created with the ScanSnap, it should OCR them one after another and save them as something like “Original_File_Name 1 processed by FineReader.pdf”. There is some limit to the number of files you can do at once, but it’s a fairly high one.
Whoops, now that I read the original post I see that is exactly what the linked article says. Duh.
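If you would rather script the drag-and-drop step than do it by hand, here is a minimal Python sketch. It assumes the macOS `open -a` command, and it assumes the app is called "FineReader for ScanSnap" (a guess; check the name in your Applications folder). The batch size of 20 is arbitrary, since the actual limit on files per drop isn't stated here.

```python
import subprocess
from pathlib import Path

# Assumption: the app's name as it appears in /Applications; adjust to match yours.
APP_NAME = "FineReader for ScanSnap"


def chunked(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_commands(folder, batch_size=20, app_name=APP_NAME):
    """Return one `open -a` command per batch of not-yet-processed PDFs."""
    pdfs = sorted(
        str(p) for p in Path(folder).iterdir()
        if p.suffix.lower() == ".pdf" and "processed by FineReader" not in p.name
    )
    return [["open", "-a", app_name, *batch] for batch in chunked(pdfs, batch_size)]


def run(folder, dry_run=True):
    """Print each command; only hand files to the app when dry_run is False."""
    for cmd in build_commands(folder):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
            # `open` returns right away, so pause until the batch is done.
            input("Press Enter once FineReader has finished this batch...")
```

Calling `run("scans")` (a hypothetical folder name) only prints what would be sent. Pass `dry_run=False` once the printed commands look right.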
Does FineReader come with the ScanSnap? I have the S510M and don't see FineReader on my Mac. Thanks!
Which is better: dragging the PDFs to FineReader or to Acrobat? My ScanSnap came with Acrobat 8. Also, do I need to keep both versions of the PDF, the original and the one that was processed by FineReader? If not, what's a simple way to delete the original?
Hi Sarah, which ScanSnap do you have?
S1500M
Hmm, normally you wouldn't need to drag to anything because the ScanSnap software should OCR it for you. Are you wanting to drag things to Acrobat or FineReader because OCRing every document takes too long so you want to do it all in one shot after scanning?
When I scanned everything with my ScanSnap (my whole file cabinet), I created regular PDFs because ScanSnap Manager wouldn't queue the OCR. It took forever to scan one document, since I had to wait for the first document to finish OCRing before I could scan the second.
Ahh I see what you mean. I think if I were you I'd try the ABBYY instructions in the linked article, since ABBYY was installed automatically with the ScanSnap software (or should have been) and then you don't need to mess around with AppleScript etc.
Maybe what you can do is try, say, 5 PDFs using the methods in the linked article, and then try, say, 5 PDFs using Acrobat with these instructions http://www.documentsnap.com/use-acrobat-batch-pro… and see which way gives you smaller files and better quality?
I have run about 100 documents through FineReader, but I am not sure what I should be looking for. The PDFs look the same except for the file name addition "processed by…"
How do I know if it did a good job? Also, can I get rid of the old PDFs (the ones without the "processed by…" on the end)?
I would like to have all of these PDFs OCRed before I put them into DevonThink Pro Office so that if I ever decide to use a different app, Spotlight will still be able to search them. Thanks
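On the question of clearing out the old PDFs, here is a rough Python sketch that lists originals which already have a processed copy next to them. It assumes the output name ends in "processed by FineReader.pdf", possibly with an extra counter before it, as described in the comments above. The exact naming may differ on your machine, so treat the result as a list to review. The sketch deletes nothing.

```python
import re
from pathlib import Path

# Assumption: FineReader appends this marker to the output file name.
MARKER = "processed by FineReader"


def candidate_originals(processed_name):
    """Guess the possible original file names for one processed PDF name."""
    stem = Path(processed_name).stem
    if not stem.endswith(MARKER):
        return []
    base = stem[: -len(MARKER)].rstrip()
    names = [base + ".pdf"]
    # The output may carry an extra counter ("Name 1 processed by ..."),
    # so also try the name with a trailing number removed.
    stripped = re.sub(r"\s+\d+$", "", base)
    if stripped != base:
        names.append(stripped + ".pdf")
    return names


def find_redundant_originals(folder):
    """Return originals in `folder` that have a processed counterpart."""
    folder = Path(folder)
    present = {p.name for p in folder.iterdir() if p.is_file()}
    redundant = set()
    for name in present:
        for original in candidate_originals(name):
            if original in present:
                redundant.add(folder / original)
                break  # prefer the closest name match
    return sorted(redundant)
```

Print the result of `find_redundant_originals("scans")` (hypothetical folder name) and move files to the Trash by hand only after spot-checking a few processed PDFs. Searching a processed PDF for a word you can see on the page is a quick way to confirm the text layer is there.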
There is a solution to this problem that works well for me…
In ScanSnap Manager, deselect the "Quick Menu" option, and create a new profile (myProfile or something).
Under Application select "Scan to Searchable PDF"
Under File Option, deselect "Convert to Searchable PDF"
Set your other options as you would like, and select "Apply"
In Spotlight on your Mac, search for "FineReader", then select the "FineReader for ScanSnap Preferences" application icon in the results
On the General tab of the preferences panel, deselect "Open file after recognition"
Then select "Delete scanned images after recognition" on the same tab
Close the preferences panel.
Now when you scan (using the profile you created earlier), the document will be sent off to FineReader for OCR which will allow the ScanSnap to continue scanning. The documents will queue as they are scanned and be processed in order by FineReader.
Hope this helps,
Cheers!
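To see why this setup keeps the scanner from waiting, here is a toy Python model of the queueing described above. It is only an illustration of the idea and says nothing about how FineReader is actually built. The function name and the `ocr` callable are hypothetical stand-ins.

```python
import queue
import threading


def scan_and_ocr(documents, ocr):
    """Queue each scanned document and let a worker OCR them in order."""
    pending = queue.Queue()
    finished = []

    def worker():
        # Stand-in for FineReader: take documents in the order they were scanned.
        while True:
            doc = pending.get()
            if doc is None:  # sentinel: no more scans are coming
                break
            finished.append(ocr(doc))

    ocr_thread = threading.Thread(target=worker)
    ocr_thread.start()

    for doc in documents:
        # Stand-in for the scanner: hand the document off and move straight on.
        pending.put(doc)

    pending.put(None)
    ocr_thread.join()
    return finished
```

The scanning loop never blocks on OCR. It only drops work into the queue, and the worker drains it in first-in, first-out order.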
This is awesome, thanks Telegard. Great tip!
The option, "Convert to Searchable PDF" is not present under the File Option tab in 2.2 L14 for my 510M.
Someone correct me if I am wrong, but I believe on the S510M you need to use Adobe Acrobat to OCR the document. I don't think ABBYY FineReader is built into ScanSnap Manager the way it is with later ScanSnaps.
Does anyone here know a person or company looking for a home-based worker to do OCR processing?