Acrobat Applescript For ScanSnap OCR

This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.

I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.

I came across this article by Macworld, which had a great AppleScript Folder Action that would kick off Acrobat’s OCR whenever a PDF was dropped into the folder.

It worked well, but I found that I had to sit there and watch the OCR run after each document, and it seemed to have problems if I scanned another file while the OCR was still going.

I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.

Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I just scan a bunch of PDFs to a folder, drag the files onto the droplet, and go to bed. Acrobat OCRs them one by one.

Here is the script. You can download it for free, but make sure you go to the Macworld article because it is 90% their work.

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Update: User nodis in the comments pointed out a great optimization that produces significantly smaller PDFs. Thanks!

Here is the source code:

property mytitle : "ocrIt-Acrobat" -- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html

-- I am called when the user opens the script with a double click
on run
    tell me
        activate
        display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
    end tell
end run

-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
    with timeout of 36000 seconds
        try
            repeat with droppeditem in droppeditems
                set the item_info to info for droppeditem
                -- Open the dropped PDF in Acrobat
                tell application "Adobe Acrobat Professional"
                    activate
                    open droppeditem
                end tell
                -- Drive Acrobat's OCR menu and dialog through GUI scripting
                tell application "System Events"
                    tell application process "Acrobat"
                        click the menu item "Recognize Text Using OCR…" of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
                        try
                            click radio button "All pages" of group 1 of group 2 of group 1 of window "Recognize Text"
                        end try
                        click button "OK" of window "Recognize Text"
                    end tell
                end tell
                -- Save (linearized, which gives much smaller files) and close
                tell application "Adobe Acrobat Professional"
                    save the front document with linearize
                    close the front document
                end tell
            end repeat
            -- catching unexpected errors
        on error errmsg number errnum
            my dsperrmsg(errmsg, errnum)
        end try
    end timeout
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
    tell me
        activate
        display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
    end tell
end dsperrmsg
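
Because the droplet clicks Acrobat's menus through System Events, it only works when "Enable access for assistive devices" is checked in System Preferences > Universal Access; otherwise it stops with error -1719. Here is a minimal sketch of a check you could run first. It assumes the `UI elements enabled` property of System Events, and the handler name `guiScriptingEnabled` is made up for illustration and is not part of the original droplet.

```applescript
-- Hypothetical helper, not part of the original droplet:
-- returns true when GUI scripting (assistive access) is switched on.
on guiScriptingEnabled()
    tell application "System Events"
        return UI elements enabled
    end tell
end guiScriptingEnabled

-- Example use: warn and stop before trying to click Acrobat's menus.
if not guiScriptingEnabled() then
    display dialog "Please turn on \"Enable access for assistive devices\" in System Preferences > Universal Access, then try again." buttons {"OK"} default button 1 with icon stop
    return
end if
```

In the droplet itself, the same check could go at the top of `on open`, before the `repeat` loop, so the warning appears once instead of after the first PDF has already been opened.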

Update 2: If you use Acrobat X, please see this post about OCR AppleScript for Acrobat X.



46 Responses to “Acrobat Applescript For ScanSnap OCR”

  1. Dave November 28, 2008 at 6:58 pm #

    Man, I love you!!!!!!! I can’t tell you how long I’ve been looking for something like this. It’s just perfect!! THANK YOU! THANK YOU! THANK YOU!!!!!!!!!

  2. Brooks November 28, 2008 at 10:34 pm #

    Ha, no worries at all Dave. I mostly just built on what other people had done, but glad I could help!

  3. Barry March 26, 2009 at 2:48 am #

    I get an error message when I drag a file into the Droplet. It says "Sorry an error occurred. Access for assistive devices is disabled (-1719)" How do I get this droplet to automatically OCR my files? Thanks for your help!

  4. Barry March 26, 2009 at 3:21 am #

    Ok…I figured it out. I had to adjust the settings on Universal Access in my Mac. Now, how do I do multiple files at once???

  5. BrooksD March 26, 2009 at 3:43 am #

    Hi Barry, you got it- universal access. Good job!

    For multiple files, assuming you've saved the droplet somewhere, all you need to do is drag a bunch of files onto the icon. It'll then OCR them one by one.

    Let me know how it goes!

  6. barbara June 7, 2009 at 4:05 pm #

    you rock the house. thank you so much for this. This saved my life.

  7. BrooksD June 7, 2009 at 4:07 pm #

    Hah thanks! Glad to hear it helped out.

  8. kenny July 9, 2009 at 10:57 pm #

    hi. anybody else know why the size of the document increases by as many as 6x when you run a PDF through this script. it works great – but the size increase is substantial. unless i am wrong, only text should be added plus some location information, and therefore size should not go up by as much.

    is it possible to have a version of this script where the size of the document does not super expand!?

    thanks.

  9. antony July 16, 2009 at 8:39 am #

    Hi,
    Thanks for the nice script.
    I have a problem with the saving command.
    How can I save this file as a txt?
    thanks in advance.
    regards
    Antony

    • BrooksD July 16, 2009 at 1:06 pm #

      @antony
      You can just highlight the code in the post above, Copy, and then Paste it into your text editor of choice. Then just File Save As to a .txt extension if that is what you want.

  10. pendolino July 22, 2009 at 10:20 am #

    this looks great but it does not work for acrobat 7.0 standard. i tried to modify the script but it kept throwing up errors. can you please attach the script file as text? i suspect the copy paste from the site may be messing up the formatting although i cant be sure.

  11. Rodger August 8, 2009 at 2:12 am #

    Thanks for writing this applet and making it available, I am looking forward to using it.
    I am having trouble using Adobe 7.0 professional. Dragging a pdf file on the droplet opens the file in acrobat but an error window quickly appears saying

    "System Events got an error: Can't get menu item "OCR Text Recognition" of menu "Document" of menu bar 1 of application process "Acrobat". (-1728)"

    Could you please tell me what this means and perhaps offer some advice on how I might fix it? Would using Acrobat 9 solve my problem? I have never written/edited an AppleScript but I'd be glad to have a go if necessary.
    Thanks

  12. Phil Boardman September 14, 2009 at 12:25 pm #

    Hacked script to work with Acrobat 7.
    http://phil.boardman.id.au/journal/489/

    • BrooksD September 14, 2009 at 1:08 pm #

      Nice, thank you so much Phil!

      • Justine September 8, 2010 at 8:29 pm #

        The link to the hacked script to work with Acrobat 7 no longer works….help!

  13. nodis February 12, 2010 at 5:19 pm #

    Great tip, but the script needs one modification. The line in the AppleScript where the document is saved should be changed to:

    save the front document with linearize

    This is the scriptese equivalent of saving as "optimized" (or a "Save as…"), which results in a lot of cruft being thrown away and the saved PDF optimized for progressive Web download. When I run the script on test OCRd PDFs, both with and without this change to the AppleScript, I get ~95% smaller PDFs with this change (note that this is also with Acrobat's new ClearScan OCR method selected — the new default).

    • msim February 12, 2010 at 5:44 pm #

      hi @nodis: can you point to exactly which line should be changed.
      i have not been using this script essentially because the PDFs become bloated.
      and i am unable to understand why addition of scanned text should lead to any bloating! perhaps you have found the answer!

      • nodis February 12, 2010 at 8:25 pm #

        Msim,

        The line that needs to be changed is towards the end:

        save the front document

        One simply adds "with linearize" at the end. As you will see below, our intrepid host already plans to make this change.

        On Adobe's Web site, linearizing a PDF is described as follows:

        "A linearized PDF document is organized to enable incremental access in a network environment. For example, a linearized PDF document can be displayed in a web browser before the entire PDF document is downloaded."

        Old versions of Acrobat (like version 3) described this same thing as "Optimizing."

        For whatever reason, "linearizing" the save from within the AppleScript causes a bunch of redundant data in the OCR-d PDF to be thrown away, resulting in a much smaller file.

        • Michael March 21, 2010 at 10:00 pm #

          I am trying this with Acrobat Pro 7. Getting that error message mentioned by Rodger. "System Events got an error: Can't get menu item "OCR Text Recognition" of menu "Document" of menu bar 1 of application process "Acrobat". (-1728)"
          Also in the batch processing of my software it is changing the document to RTF file. Don't see option to keep it as .pdf. Please let me know what you know about either of these issues. Thank you.

        • Michael March 28, 2010 at 2:40 am #

          Are these files all supposed to be getting smaller than original in Acrobat 9 with ClearScan? If so, what settings? Even with ClearScan and 72 dpi downscale? I am still getting larger (50% more than original). Although not the bloated sizes before that were 4x. What am I missing here?

  14. nodis February 12, 2010 at 8:01 pm #

    Not a big deal. I must say that the difference in size and quality between OCR-ing PDFs using the latest ABBYY FineReader for ScanSnap and Acro 9/ClearScan OCR downsampling at 600 dpi/and your droplet is amazing.

    I just scanned a 25 page B&W Word document. The original image-only PDF is 6.2 MB. ABBYY produces an OCR-ed PDF that weighs in at 10.5 MB choosing "High" quality; it is 1.8 MB at Medium quality (and looks like crap visually at Medium quality, I should add — lots of compression artifacts).

    By comparison, using your Applescript droplet, with the edit I suggested, along with Acrobat 9 and the "ClearScan" and 600 dpi OCR options, my OCR-d PDF comes out at 356 KB. This version looks superb, and the OCR quality seems fine.

    I don't in any way mean to knock ABBYY — but the size/quality ratio of Acro 9+ClearScan 600dpi+your AppleScript is just absurdly good. I'm frankly surprised not to see Adobe market this feature more — or Fujitsu, their OEM scanner partner.

    • Chris November 5, 2010 at 6:42 am #

      I've got the script working with my ScanSnap 1300M. However, my OCR'd PDFs are much larger than yours – even though I'm starting with a lower resolution scan (at least I think I am; I have "Auto" selected in the file resolution drop-down menu in ScanSnap Manager). For example, a 2-page (front and back) document ended up being 2.5 MB. Any ideas?

    • Chris November 5, 2010 at 11:11 am #

      As a matter of fact, I just noticed that the output after OCR in Acrobat 9 is actually 3 TIMES LARGER than it was before the OCR operation. What the heck?

      • BrooksD November 8, 2010 at 4:32 pm #

        Hi Chris, do you have ClearScan selected in Acrobat? (Sorry I can't send you a screenshot- I don't have Acrobat 9).

  15. BrooksD February 16, 2010 at 3:43 pm #

    OK everyone, I have updated the post and posted the updated version at http://www.documentsnap.com/files/OCRIt-Acrobat-1… with nodis' changes. Thanks again!!!

  16. sims February 17, 2010 at 9:41 am #

    The linked app requires me to install Rosetta. Since I do not need it for anything else, I have avoided it. Is there a way to save the script in AppleScript editor and use it without Rosetta?

    I tried to copy and paste it into AppleScript Editor. Upon saving the script it asked me to help it locate Acrobat, which I did by pointing it to "Adobe Acrobat Pro.app". But then it failed and pointed me to this line in the script – tell application process “Acrobat”.

    The cursor is at the “ before the Acrobat in the line above and I receive a Syntax Error – Expected expression, property or key form etc but found unknown token.

    Any ideas? Thanks!

    • nodis February 19, 2010 at 12:45 pm #

      Hmmm. You're right. The revised script is saved as a PPC app, as opposed to Universal. You can fix that easily though.

      In Snow Leopard, launch AppleScript Editor (you'll find it in your /Applications/Utilities folder). Open the revised droplet from within AppleScript Editor. Choose "Save As…" to re-save and, voila, a Universal version of the app.

      • BrooksD February 22, 2010 at 12:33 am #

        Hi guys, I've replaced the linked file with one that should be Universal. Sorry about that!

        • sims February 25, 2010 at 10:52 am #

          thank you guys.
          this one works and works beautifully!

  17. Nick S. July 23, 2010 at 6:06 pm #

    Hi, I am using snow leopard and adobe professional 8

    Getting the following error message:
    "Sorry, an error occured: System Events got an error: Can't get window 'Recognize Text' of application process 'Acrobat' (-1728)"

    • BrooksD July 23, 2010 at 6:11 pm #

      Hi Nick, can you go to System Preferences > Universal Access and check to see if "Enable access for assistive devices" is checked? If so, are you using an English version of Acrobat or some other language?

    • Karl August 10, 2010 at 12:08 pm #

      Hi Nick,
      I have the German Acrobat 9 for Mac, and for the script to run I needed to translate the corresponding menu commands, as you can see in the following script excerpt:

      tell application process "Acrobat"

      click the menu item "Text mit OCR erkennen…" of menu 1 of menu item "OCR-Texterkennung" of the menu "Dokument" of menu bar 1
      try
      click radio button "Alle Seiten" of group 1 of group 2 of group 1 of window "Text erkennen"
      end try
      click button "OK" of window "Text erkennen"

      end tell

  18. Karl August 10, 2010 at 12:31 pm #

    Hi,
    does anyone know how to change the script so that it saves the text-recognized file into a specific folder like the DevonThink Inbox?

    • Karl August 13, 2010 at 2:08 am #

      I got it:

      tell application "Adobe Acrobat Pro"
      set theName to name of front document
      set file_path to "Platon:Users:name:Library:Application Support:Devonthink Pro 2:inbox:" & theName

      save the front document to file file_path with linearize
      close the front document
      end tell
      tell application "Finder"
      delete droppeditem
      end tell

      • BrooksD August 13, 2010 at 7:02 am #

        Nice work Karl thanks for posting this!

  19. @hackeron December 19, 2010 at 12:40 pm #

    Doesn't work on Acrobat X :(

  20. Orin January 21, 2011 at 10:57 am #

    As the last commenter mentioned, I just upgraded to Acrobat X (10) and the script no longer works. If someone with some scripting know-how could make a new/updated OCR script droplet it would be MUCH appreciated. I really miss it!
    Thanks!

  21. Etienne April 15, 2011 at 5:46 pm #

    Hi,

    Did you get the chance to make it work with Acrobat X pro?

    Thanks.

  22. j-lon April 19, 2011 at 2:39 pm #

    Anyone have a copy of the Acrobat 7 script? The link above appears to be dead.

  23. Brooks August 5, 2011 at 8:04 am #

    For those asking about Acrobat X, please see this post: http://www.documentsnap.com/ocr-applescript-for-a

  24. Johannes April 10, 2012 at 1:36 pm #

    Is there something similar for the Windows version of Adobe Acrobat Professional?

    • BrooksD April 10, 2012 at 4:53 pm #

      Not that I'm aware of.
