Doing OCR Batch Processing Using The ScanSnap And ABBYY FineReader

Sometimes, when you have to scan a large number of documents at once, the step of doing OCR (making the PDF searchable) after each document can really slow things down. It may be preferable to scan them all in and then OCR them all in one big shot.

In the past I have posted about how to do batch OCR using Adobe Acrobat and have shared an Acrobat AppleScript.

Over at the Optimality! blog, Tobi has posted a walkthrough of using ABBYY FineReader, which comes with the ScanSnap S1500M (and the S1500, for that matter), to do batch OCR.

The problem is that in the default setup, each scan is OCRed right after the scan, and depending on the age of your machine (my G5 is getting a little long in the tooth) it can take quite a while. When you’re in the process of scanning many hundreds of pages of paper documents, you don’t want to have to wait for the computer to do its OCR; you’d rather feed it all the documents and let it do the OCR while you’re doing something else.

Fortunately, this is possible. After reading all the way through the handbook and the ABBYY online help, I found out that you can scan to PDF only and then convert the PDFs with ABBYY FineReader afterwards.

Check out the post here. Do you have any other tricks for doing batch OCR?
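
If you would rather not drag files by hand, the "scan to plain PDF first, OCR later" workflow can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the application name in `FINEREADER_APP` is a guess and should be changed to whatever the app is called on your Mac, the "processed by FineReader" marker is the file naming a reader reported, and the batch size of 20 is arbitrary. It relies on the macOS `open -a` command, which hands files to an application much like dropping them on its Dock icon.

```python
import subprocess
import sys
from pathlib import Path

# Assumption: the app name as it appears in /Applications; adjust to your install.
FINEREADER_APP = "ABBYY FineReader for ScanSnap"
# Assumption: FineReader output files carry this marker in their name.
PROCESSED_MARKER = "processed by FineReader"


def unprocessed_pdfs(folder):
    """Return the PDFs in `folder` that are not FineReader output, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() == ".pdf" and PROCESSED_MARKER not in p.stem
    )


def batches(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def open_commands(folder, batch_size=20):
    """Build one `open -a` command per batch of PDFs."""
    return [
        ["open", "-a", FINEREADER_APP, *map(str, chunk)]
        for chunk in batches(unprocessed_pdfs(folder), batch_size)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    for command in open_commands(sys.argv[1]):
        print(" ".join(command))          # show what will be sent
        if "--run" in sys.argv:           # only launch when asked to
            subprocess.run(command, check=True)
```

Run it with a folder path to preview the batches, and add `--run` once the list looks right. Sending files in small batches is a cautious choice, since there seems to be some limit on how many files FineReader will accept at once.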






11 Responses to “Doing OCR Batch Processing Using The ScanSnap And ABBYY FineReader”

  1. Ron C 05. Jan, 2010 at 3:15 pm #

    We (that’s really my wife – I’m just IT support) use a ScanSnap along with DevonThink Pro and can do concurrent scanning and OCR-ing.

    DTP uses ABBYY FineReader to do its work, but the entire OCR process is under the control of DTP and not the Fujitsu driver.

    I can’t remember the exact configuration BUT it was pretty much straight out of the DTP playbook.

    Ron

  2. Michael F 06. Jan, 2010 at 7:53 pm #

    The easiest thing is just to scan everything to plain PDF, then run FineReader and drag a bunch of PDFs to its dock icon. As long as they were created in ScanSnap, it should OCR them one after another, and save them as something like “Original_File_Name 1 processed by FineReader.pdf”. There is some limit to the number of files you can do at once, but it’s a fairly high one.

    • Michael F 07. Jan, 2010 at 3:54 am #

      Whoops, now that I read the original post I see that is exactly what the linked article says. Duh.

  3. Leo 09. Feb, 2010 at 1:55 pm #

    Does FineReader come with the ScanSnap? I have the S510M and don't see FineReader on my Mac. Thanks!

  4. sarah 13. Apr, 2010 at 10:26 pm #

    Which is better: dragging the PDFs to FineReader or to Acrobat? My ScanSnap came with Acrobat 8. Also, do I need to keep both versions of the PDF, the original and the one that was processed by FineReader? If not, what's a simple way to delete the original?

    • BrooksD 13. Apr, 2010 at 10:35 pm #

      Hi Sarah, which ScanSnap do you have?

  5. Sarah 13. Apr, 2010 at 2:38 pm #

    S1500m

    • BrooksD 13. Apr, 2010 at 10:58 pm #

      Hmm, normally you wouldn't need to drag to anything because the ScanSnap software should OCR it for you. Are you wanting to drag things to Acrobat or FineReader because OCRing every document takes too long so you want to do it all in one shot after scanning?

  6. sarah 13. Apr, 2010 at 11:09 pm #

    When I scanned everything with my ScanSnap (my whole file cabinet), I created regular PDFs because scanning with the ScanSnap Manager wouldn't queue the OCR. It would take forever to scan one document, since I had to wait for the first document to finish OCRing before I could scan the second document.

    • BrooksD 13. Apr, 2010 at 11:31 pm #

      Ahh I see what you mean. I think if I were you I'd try the ABBYY instructions in the linked article since you have it installed automatically with ScanSnap already (or should) and then you don't need to mess around with AppleScript etc.

      Maybe what you can do is try, say, 5 PDFs using the methods in the linked article, and then try, say, 5 PDFs using Acrobat with these instructions http://www.documentsnap.com/use-acrobat-batch-pro... and see which way gives you smaller files and better quality?

  7. sarah 14. Apr, 2010 at 10:29 pm #

    I have run about 100 documents through FineReader, but I am not sure what I should be looking for. The PDFs look the same except for the file name addition "processed by…"
    How do I know if it did a good job? Also, can I get rid of the old PDFs (the ones without the "processed by…" on the end)?

    I would like to have all of these PDFs OCRed before I put them into DevonThink Pro Office so that if I ever decide to use a different app, Spotlight will still be able to search them. Thanks
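
On the question of getting rid of the old PDFs: one cautious approach is to first list which originals actually have a processed copy sitting next to them, and only then decide what to move to the Trash. The Python sketch below does just the listing and deletes nothing. It assumes the naming reported in the comments above, something like "Original_File_Name 1 processed by FineReader.pdf", so the pattern (including the optional number before "processed") is an assumption that may need adjusting for your files.

```python
import re
from pathlib import Path

# Assumption: output names end in " processed by FineReader", optionally
# preceded by a counter such as " 1", as described in the comments.
PROCESSED_RE = re.compile(r"^(?P<base>.+?) processed by FineReader$")


def originals_with_processed_copy(folder):
    """Return (original, processed) pairs where both PDFs exist in `folder`."""
    pdfs = {
        p.stem: p
        for p in Path(folder).iterdir()
        if p.suffix.lower() == ".pdf"
    }
    pairs = []
    for stem, processed in sorted(pdfs.items()):
        match = PROCESSED_RE.match(stem)
        if not match or processed.stat().st_size == 0:
            continue  # not an output file, or an empty (failed) one
        base = match.group("base")
        # Try the name as-is first, then with a trailing counter removed.
        for candidate in (base, re.sub(r" \d+$", "", base)):
            original = pdfs.get(candidate)
            if original is not None and original != processed:
                pairs.append((original, processed))
                break
    return pairs


def report(folder):
    """Print each original that has a processed counterpart."""
    for original, processed in originals_with_processed_copy(folder):
        print(f"{original.name}  ->  {processed.name}")
```

Calling `report("/path/to/scans")` prints the pairs. Before removing any original, it is worth opening a few of the processed files and searching for a word you can see on the page, since the script only checks that the processed file exists and is not empty, not how good the OCR is.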
