Acrobat Applescript For ScanSnap OCR

September 5, 2008


This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.

I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.

I came across this Macworld article, which had a great AppleScript Folder Action that kicks off Acrobat's OCR whenever a PDF is dropped into the folder.

It worked well, but I still had to sit there and watch the OCR run after each document, and it seemed to have problems if I scanned another file while the OCR was still going.

I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.

Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I scan a batch of PDFs to a folder, drag the files onto the droplet, and go to bed. Acrobat OCRs them one by one.

Here is the script. You can download it for free, but make sure you visit the Macworld article, because 90% of it is the author's work.

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Update: User nodis in the comments pointed out a great optimization that makes significantly smaller PDFs. Thanks!

Here is the source code:

property mytitle : "ocrIt-Acrobat" -- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html

-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
	end tell
end run

-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
	with timeout of 36000 seconds
		try
			repeat with droppeditem in droppeditems
				tell application "Adobe Acrobat Professional"
					activate
					open droppeditem
				end tell
				-- Drive Acrobat's OCR dialog through GUI scripting;
				-- the menu and window titles must match your Acrobat version exactly
				tell application "System Events"
					tell application process "Acrobat"
						click the menu item "Recognize Text Using OCR…" of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
						try
							click radio button "All pages" of group 1 of group 2 of group 1 of window "Recognize Text"
						end try
						click button "OK" of window "Recognize Text"
					end tell
				end tell
				-- "with linearize" saves an optimized PDF, which comes out much smaller
				tell application "Adobe Acrobat Professional"
					save the front document with linearize
					close the front document
				end tell
			end repeat
			-- catching unexpected errors
		on error errmsg number errnum
			my dsperrmsg(errmsg, errnum)
		end try
	end timeout
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg
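Most of the problems reported in the comments come down to two things: GUI scripting being switched off (error -1719, "Access for assistive devices is disabled") and menu titles that don't match your version of Acrobat (error -1728, "Can't get menu item"). Below is a rough diagnostic sketch, not part of the original droplet. It assumes Acrobat is already running and that its process is named "Acrobat", as in the script above. It checks System Events' UI elements enabled property (the Universal Access setting on the Mac OS X versions of that era) and, if GUI scripting is on, returns the item titles of Acrobat's Document menu so you can compare them with the strings in the droplet.

```applescript
-- Hypothetical diagnostic helper, not part of the original droplet.
-- Assumes Acrobat is running and its process is named "Acrobat".
tell application "System Events"
	-- Error -1719 ("Access for assistive devices is disabled") means GUI scripting is off
	if not (UI elements enabled) then
		display dialog "GUI scripting is off. Turn on \"Enable access for assistive devices\" in System Preferences > Universal Access, then try the droplet again." buttons {"OK"} default button 1 with icon caution
		return
	end if
	-- Error -1728 usually means a menu title in the droplet doesn't exist in your Acrobat.
	-- Collect the Document menu's item titles (separators come back as missing value).
	tell application process "Acrobat"
		set docItems to name of every menu item of menu 1 of menu bar item "Document" of menu bar 1
	end tell
end tell
return docItems
```

Run it from Script Editor (AppleScript Editor on Snow Leopard) and look at the Result pane. If "OCR Text Recognition" isn't in the list, the droplet's menu strings need to be edited to match what your version shows.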

Comments

25 Responses to “Acrobat Applescript For ScanSnap OCR”

  1. Dave on November 28th, 2008 6:58 pm

    Man, I love you!!!!!!! I can’t tell you how long I’ve been looking for something like this. It’s just perfect!! THANK YOU! THANK YOU! THANK YOU!!!!!!!!!

  2. Brooks on November 28th, 2008 10:34 pm

    Ha, no worries at all Dave. I mostly just built on what other people had done, but glad I could help!

  3. Barry on March 26th, 2009 2:48 am

    I get an error message when I drag a file into the Droplet. It says "Sorry an error occurred. Access for assistive devices is disabled (-1719)" How do I get this droplet to automatically OCR my files? Thanks for your help!

  4. Barry on March 26th, 2009 3:21 am

    Ok…I figured it out. I had to adjust the settings on Universal Access in my Mac. Now, how do I do multiple files at once???

  5. BrooksD on March 26th, 2009 3:43 am

    Hi Barry, you got it- universal access. Good job!

    For multiple files, assuming you've saved the droplet somewhere, all you need to do is drag a bunch of files onto the icon. It'll then OCR them one by one.

    Let me know how it goes!

  6. barbara on June 7th, 2009 4:05 pm

    you rock the house. thank you so much for this. This saved my life.

  7. BrooksD on June 7th, 2009 4:07 pm

    Hah thanks! Glad to hear it helped out.

  8. kenny on July 9th, 2009 10:57 pm

    Hi. Does anybody else know why the size of the document increases by as much as 6x when you run a PDF through this script? It works great, but the size increase is substantial. Unless I am wrong, only text plus some location information should be added, so the size should not go up by that much.

    Is it possible to have a version of this script where the size of the document does not super expand!?

    Thanks.

  9. antony on July 16th, 2009 8:39 am

    Hi,
    Thanks for the nice script.
    I have a problem with the saving command.
    How can I save this file as a .txt?
    thanks in advance.
    regards
    Antony

  10. BrooksD on July 16th, 2009 1:06 pm

    @antony
    You can just highlight the code in the post above, Copy, and then Paste it into your text editor of choice. Then just File Save As to a .txt extension if that is what you want.

  11. pendolino on July 22nd, 2009 10:20 am

    This looks great, but it does not work with Acrobat 7.0 Standard. I tried to modify the script, but it kept throwing up errors. Can you please attach the script file as text? I suspect copying and pasting from the site may be messing up the formatting, although I can't be sure.

  12. BrooksD on July 22nd, 2009 12:54 pm

    @pendolino I haven't tried with Acrobat 7 so there may be some tweaking required. Here is the script in txt format: http://www.documentsnap.com/files/OCRIt-Acrobat.t...

  13. Rodger on August 8th, 2009 2:12 am

    Thanks for writing this applet and making it available, I am looking forward to using it.
    I am having trouble using Adobe 7.0 professional. Dragging a pdf file on the droplet opens the file in acrobat but an error window quickly appears saying

    "System Events got an error:Can't get menu item "OCR Text Recognition" of menu "Document" of menu bar 1 of application process "Acrobat". (-1728)

    Could you please tell me what this means and perhaps offer some advice on how I might fix it? Would using Acrobat 9 solve my problem? I have never written or edited an AppleScript, but I'd be glad to have a go if necessary.
    Thanks

  14. BrooksD on August 8th, 2009 3:46 am

    Hi Rodger,
    I don't have Acrobat 7 so I can't look into it, but I am guessing that it must be something to do with differences in the menu between 7 and 8.

    Basically if you look at the script, the code needs to match the menu titles exactly.

    If you can send me screenshots of each step of the Acrobat 7 OCR process, I can probably make you a 7.0 version (assuming Acrobat 7 can be scripted).

    You can send them in an email to brooks@documentsnap.com.

  15. Phil Boardman on September 14th, 2009 12:25 pm

    Hacked script to work with Acrobat 7.
    http://phil.boardman.id.au/journal/489/

  16. BrooksD on September 14th, 2009 1:08 pm

    Nice, thank you so much Phil!

  17. nodis on February 12th, 2010 5:19 pm

    Great tip, but the script needs one modification. The line in the AppleScript where the document is saved should be changed to:

    save the front document with linearize

    This is the equivalent in scriptese of saving as "optimized" (or doing a "Save As…"), which throws away a lot of cruft and optimizes the saved PDF for progressive Web download. When I run the script on test OCR'd PDFs, both with and without this change, I get ~95% smaller PDFs with the change. (Note that this is also with Acrobat's new ClearScan OCR method selected, which is the new default.)

  18. msim on February 12th, 2010 5:44 pm

    Hi @nodis: can you point to exactly which line should be changed?
    I have not been using this script, essentially because the PDFs become bloated,
    and I am unable to understand why adding the recognized text should lead to any bloating! Perhaps you have found the answer!

  19. nodis on February 12th, 2010 8:01 pm

    Not a big deal. I must say that the difference in size and quality between OCR-ing PDFs with the latest ABBYY FineReader for ScanSnap and with Acrobat 9 (ClearScan OCR, downsampling at 600 dpi) plus your droplet is amazing.

    I just scanned a 25 page B&W Word document. The original image-only PDF is 6.2 MB. ABBYY produces an OCR-ed PDF that weighs in at 10.5 MB choosing "High" quality; it is 1.8 MB at Medium quality (and looks like crap visually at Medium quality, I should add — lots of compression artifacts).

    By comparison, using your Applescript droplet, with the edit I suggested, along with Acrobat 9 and the "ClearScan" and 600 dpi OCR options, my OCR-d PDF comes out at 356 KB. This version looks superb, and the OCR quality seems fine.

    I don't in any way mean to knock ABBYY, but the size/quality ratio of Acrobat 9 + ClearScan 600 dpi + your AppleScript is just absurdly good. I'm frankly surprised not to see Adobe market this feature more, or Fujitsu, their OEM scanner partner.

  20. nodis on February 12th, 2010 8:25 pm

    Msim,

    The line that needs to be changed is towards the end:

    save the front document

    One simply adds "with linearize" at the end. As you will see below, our intrepid host already plans to make this change.

    On Adobe's Web site, linearizing a PDF is described as follows:

    "A linearized PDF document is organized to enable incremental access in a network environment. For example, a linearized PDF document can be displayed in a web browser before the entire PDF document is downloaded."

    Old versions of Acrobat (like version 3) described this same thing as "Optimizing."

    For whatever reason, "linearizing" the save from within the AppleScript causes a bunch of redundant data in the OCR-d PDF to be thrown away, resulting in a much smaller file.

  21. BrooksD on February 16th, 2010 3:43 pm

    OK everyone, I have updated the post and posted the updated version at http://www.documentsnap.com/files/OCRIt-Acrobat-1... with nodis' changes. Thanks again!!!

  22. sims on February 17th, 2010 9:41 am

    The linked app requires me to install Rosetta. Since I do not need it for anything else, I have avoided it. Is there a way to save the script in AppleScript editor and use it without Rosetta?

    I tried to copy and paste it into AppleScript Editor. Upon saving the script, it asked me to help it locate Acrobat, which I did by pointing it to "Adobe Acrobat Pro.app". But then it failed and pointed me to this line in the script: tell application process “Acrobat”.

    The cursor is at the " before Acrobat in the line above, and I receive a Syntax Error: Expected expression, property or key form, etc. but found unknown token.

    Any ideas? Thanks!

  23. nodis on February 19th, 2010 12:45 pm

    Hmmm. You're right. The revised script is saved as a PPC app, as opposed to Universal. You can fix that easily though.

    In Snow Leopard, launch AppleScript Editor (you'll find it in your /Applications/Utilities folder). Open the revised droplet from within AppleScript Editor. Choose "Save As…" to re-save it and, voilà, you have a Universal version of the app.

  24. BrooksD on February 22nd, 2010 12:33 am

    Hi guys, I've replaced the linked file with one that should be Universal. Sorry about that!

  25. sims on February 25th, 2010 10:52 am

    thank you guys.
    this one works and works beautifully!
