Tag Archives: applescript

Found: ScanSnap AppleScript to remove ABBYY FineReader file

I haven’t tried this myself since I don’t use FineReader, but if you do, check out this AppleScript from UnderDoug.ca that I came across today.

Remove Unprocessed By FineReader.scpt is a FolderAction AppleScript to remove files produced by ScanSnap like “/2008_12_28_17_32_30.pdf” once FineReader produces one like “2008_12_28_17_32_30 processed by FineReader.pdf”.

You can get the script here.
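
For readers curious how such a Folder Action might work, here is a minimal sketch. It is not the UnderDoug.ca script (which I have not reproduced here); the handler name originalNameFor and the property processedSuffix are made up for illustration, and it assumes FineReader saves its output in the same folder as the original scan.

```applescript
-- Hypothetical sketch, NOT the UnderDoug.ca script.
-- Assumption: FineReader output sits in the same folder as the original scan.
property processedSuffix : " processed by FineReader.pdf"

-- Return the name of the original scan for a FineReader output file,
-- or missing value if the name does not carry the FineReader suffix.
on originalNameFor(fileName)
	if fileName does not end with processedSuffix then return missing value
	set keepCount to (count of fileName) - (count of processedSuffix)
	if keepCount < 1 then return missing value
	return (text 1 thru keepCount of fileName) & ".pdf"
end originalNameFor

-- Folder Actions call this handler when items are added to the watched folder.
on adding folder items to thisFolder after receiving addedItems
	repeat with addedItem in addedItems
		set addedName to name of (info for addedItem)
		set originalName to my originalNameFor(addedName)
		if originalName is not missing value then
			tell application "Finder"
				if exists file originalName of thisFolder then
					-- Finder's delete moves the file to the Trash
					delete file originalName of thisFolder
				end if
			end tell
		end if
	end repeat
end adding folder items to
```

Because the Finder only moves the file to the Trash, an original removed by mistake can still be recovered.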


Acrobat AppleScript For ScanSnap OCR

This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.

I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.

I came across this Macworld article, which had a great AppleScript Folder Action that kicks off Acrobat’s OCR whenever a PDF is dropped into a watched folder.

It worked well, but I still had to sit and watch the OCR run after each document, and it seemed to have problems if I scanned another file while the OCR was still running.

I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.

Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I just scan a bunch of PDFs to a folder, drag the files onto the droplet, and go to bed. Acrobat OCRs them one by one.

Here is the script. You can download it for free, but make sure you go to the Macworld article because 90% of it is that author's work.

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Update: User nodis in the comments pointed out a great optimization that makes significantly smaller PDFs. Thanks!

Here is the source code:

property mytitle : "ocrIt-Acrobat" -- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html

-- I am called when the user opens the script with a double click
on run
    tell me
        activate
        display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
    end tell
end run

-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
    with timeout of 36000 seconds
        try
            repeat with droppeditem in droppeditems
                set the item_info to info for droppeditem
                -- Open the dropped file in Acrobat
                tell application "Adobe Acrobat Professional"
                    activate
                    open droppeditem
                end tell
                -- Drive the OCR menu and dialog through GUI scripting
                tell application "System Events"
                    tell application process "Acrobat"
                        click the menu item "Recognize Text Using OCR…" of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
                        try
                            click radio button "All pages" of group 1 of group 2 of group 1 of window "Recognize Text"
                        end try
                        click button "OK" of window "Recognize Text"
                    end tell
                end tell
                -- Save the OCRed document and close it before moving on
                tell application "Adobe Acrobat Professional"
                    save the front document with linearize
                    close the front document
                end tell
            end repeat
            -- catching unexpected errors
        on error errmsg number errnum
            my dsperrmsg(errmsg, errnum)
        end try
    end timeout
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
    tell me
        activate
        display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
    end tell
end dsperrmsg
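
One possible refinement: the droplet fetches item_info for every dropped item but never uses it, so anything dropped on the icon gets sent to Acrobat. The sketch below shows one way to skip non-PDFs. The handler names isPDFName and pdfItemsOnly are made up for illustration and are not part of the original script or the Macworld one.

```applescript
-- Hypothetical helpers, NOT part of the original droplet.

-- True when the file name has a .pdf extension.
-- AppleScript string comparisons ignore case by default, so ".PDF" also matches.
on isPDFName(fileName)
	return fileName ends with ".pdf"
end isPDFName

-- Return only the dropped items whose names end in .pdf.
on pdfItemsOnly(droppedItems)
	set pdfItems to {}
	repeat with droppedItem in droppedItems
		if my isPDFName(name of (info for droppedItem)) then
			set end of pdfItems to contents of droppedItem
		end if
	end repeat
	return pdfItems
end pdfItemsOnly
```

To try it, the repeat line in the open handler could become: repeat with droppeditem in my pdfItemsOnly(droppeditems)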

Update 2: If you use Acrobat X, please see this post about OCR AppleScript for Acrobat X.


My ScanSnap Setup And Workflow – Post Scan Processing

Update: This post is now slightly out of date as I now use the ScanSnap S1300. You may want to sign up for my free 7-part e-Course, which will take you through the steps to go paperless more comprehensively.
This is Part 2 of the My ScanSnap Setup And Workflow series. Make sure to check out Part 1: ScanSnap Settings.

Now that we have set up ScanSnap Manager with my four profiles, here is what I do with the files.

At first I used DevonThink Pro Office, but I found that it was overkill for my needs. If I had a huge number of documents that I needed regular access to, it would be perfect, but for home use I wanted something more lightweight.

One main drawback (maybe the only one) is that my ScanSnap S300M did not come with any OCR software in the box. I could download a form to have ReadIris Pro 11 mailed to me, but that didn’t help me at first.

Luckily, I had Adobe Acrobat Professional already, so I decided to use that for my ScanSnap workflow. If you have the ScanSnap S1500 or S1500M, your scanner will come with Acrobat already.

Really Boring, Really Fast

Excited to start OCR’ing up a storm, I set ScanSnap Manager to output to Acrobat and away I went.

It worked quite well. I would hit the button, it would open the resulting file in Acrobat, and then I would go Document | OCR Text Recognition | Recognize Text Using OCR and follow the resulting menus.

I think this would be OK normally, but since I had a ton of things to scan from my file cabinet, manually OCRing every document over and over got really boring, really fast. I knew there had to be a better way.

AppleScript To The Rescue

I am a complete AppleScript newbie, but I found this great post from Macworld where the author made an AppleScript Folder Action that would watch a certain folder, and when a document got put in it, it would kick off Acrobat (or ReadIris Pro) and OCR it automatically.

I set ScanSnap Manager to save to a folder called ToProcess and gave that folder a Folder Action to run the Macworld script.

This worked quite well, and would possibly work OK on an ongoing basis, but again I ran into problems during my massive scan-a-thon: if I dropped a document into the folder while Acrobat was still OCRing the previous one, it would give error messages.

Droplets Are Fun

The solution I came up with was to change the script so that it became a droplet. To do this I ripped off part of the script referenced in this thread.

A droplet is just an Application that you save somewhere (I have it on my Dock). You run it by dragging a file onto its icon.

Here is the script that I cobbled together. Feel free to download and use it as you please.

OCRit – Droplet to batch OCR PDFs using Adobe Acrobat

Final Workflow

So now, I have the following workflow:

  • Scan document using the ScanSnap, ScanSnap Manager saves the file in the ToProcess folder
  • When I am done with my batch, I drag the PDF files onto the OCRIt icon, which kicks off Adobe Acrobat Professional and tells it to recognize the text in each document
  • When that is done, process/move the files as needed

It is working quite well for me, but I guess if I wanted to avoid all this I could have just stuck with DevonThink, as it has built-in OCR. What is your workflow? How do you handle the Optical Character Recognition part?
