
I don’t know if it is because I have been glued to a computer since I was six years old, but my handwriting and printing are terrible. Really terrible. I think my 5-year-old son and I have about the same handwriting skills.
Normally this is not a problem, except when I have to fill out a form. It’s a little embarrassing filling out an official form in my chicken scratch, which is one of the many reasons I love PDFpen. Among many other things, it lets you fill out and edit any PDF document on your computer and then print it.
However, that ability is not what this post is about. PDFpen will also OCR PDFs to make them searchable, and I wanted a way to OCR a batch of documents automatically with an AppleScript, similar to what has been done with Adobe Acrobat and with ABBYY FineReader.
I found two scripts out there: one from David Sparks at MacSparky, which some users reported has problems with newer PDFpen versions, and one from Michael Tsai at C-Command Software, which OCRs a document with PDFpen and sends it to EagleFiler.
Since both of these scripts were almost what I wanted, I decided to stand on the shoulders of giants and merge them into this AppleScript.
Here is the script:
-- Downloaded From: http://www.documentsnap.com
-- Last Modified: 2010-09-28
-- Includes code from MacSparky http://www.macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html
-- Includes code from C-Command Software http://c-command.com/scripts/eaglefiler/ocr-with-pdfpen

-- Folder Action handler: runs whenever files are added to the folder this script is attached to
on adding folder items to this_folder after receiving added_items
    try
        -- OCR each newly added file in turn
        repeat with added_item in added_items
            my ocr(added_item)
        end repeat
    on error errText
        display dialog "Error: " & errText
    end try
end adding folder items to

-- Open one file in PDFpen, OCR it, wait for OCR to finish, then save and close it
on ocr(added_item)
    tell application "PDFpen"
        open added_item as alias
        tell document 1
            ocr
            -- poll once per second until PDFpen reports that OCR is done
            repeat while performing ocr
                delay 1
            end repeat
            -- short extra pause before saving
            delay 1
            close with saving
        end tell
    end tell
end ocr
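If PDFpen ever stalls, the `repeat while performing ocr` loop in the script will wait forever. Below is a minimal sketch of a bounded wait, assuming only the `ocr`, `performing ocr`, and `close with saving` terms the script already uses. The handler name `ocrWithTimeout` and the `maxSeconds` limit are hypothetical choices of mine, not part of PDFpen, and on timeout this sketch leaves the document open rather than guessing how to recover.

```applescript
-- Hypothetical variant of the ocr handler: gives up after maxSeconds
-- instead of polling forever. Uses only the PDFpen terms from the script above.
on ocrWithTimeout(added_item, maxSeconds)
    tell application "PDFpen"
        open added_item as alias
        tell document 1
            ocr
            set waited to 0
            -- poll once per second until OCR finishes or the limit is reached
            repeat while performing ocr
                if waited >= maxSeconds then
                    -- raise an error; the caller's try block shows it in a dialog
                    error "OCR did not finish within " & maxSeconds & " seconds"
                end if
                delay 1
                set waited to waited + 1
            end repeat
            close with saving
        end tell
    end tell
end ocrWithTimeout
```

To try it, replace `my ocr(added_item)` in the folder action handler with something like `my ocrWithTimeout(added_item, 300)`; the 300-second limit is an arbitrary example, not a recommended value.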
To implement, follow MacSparky’s excellent instructions.
I hope this is of use to someone, and thanks to David and Michael for their excellent AppleScripts.

Awesome, thanks! Just a thought, though: I'd love to use Hazel instead of Folder Actions if possible.
Would it be possible to modify the script a bit so that Hazel monitors the folder for new files and then calls a simpler AppleScript that tells PDFpen to OCR each one?
I wish I knew AppleScript; I would contribute!
Hey Josh, that should be doable. Just give me a few days and I should have something for you. Good idea.
Hey there Josh, give this a try: http://www.documentsnap.com/hazel-rule-to-ocr-doc…
Hope it helps!
We definitely need more smart people like you around.
I am trying to use this script, and it asks which language to OCR the document in. Is there a way of making that part of the script?
Hm, strange. I'll take a look and let you know.
Something is also causing the script to force PDFpen to close. I can email you the error off-list if you would like to take a peek at it.
Hi Brooks,
I tried to save the script in the Folder Action Scripts folder, as D. Sparks recommends, but I get the following error: The document “Untitled” could not be saved as “Untitled.scpt”.
I have tried saving scripts in all sorts of places, but every attempt fails.
I am running Lion. Does anyone else have this problem, or know of a solution?
Don't worry, I have solved it: folder permissions!
Great Niv, and great to hear it works on Lion.
Brooks:
I'm having two problems with this script that I'm hoping you can help with.
1. It asks me what language I want to use for OCR. This adds an unnecessary step. Can the script be modified to specify "English"?
2. PDFPen quits unexpectedly at the end of the script.
Thanks for any help you can provide.
Also: any chance of turning this into a droplet? Thanks!
I was inspired by this to create my own workflow, which consists of the following steps:
Scan with Fujitsu ScanSnap > OCR with PDFpenPro > Export to Yojimbo
My goal was to automate all of this when the Scan button is pushed on the scanner. The following script accomplishes just that.
If you save it as an application, it also works as a droplet. I'm sure it can easily be modified to export to other applications if you're not a Yojimbo user.
Note: you will also need to disable the preference to automatically OCR scanned documents in PDFpenPro, or else you'll get that annoying dialog asking for a language.
-- SCRIPT --
-- Droplet / scanner target: OCR the dropped PDF in PDFpenPro, then import it into Yojimbo
on open ScannedDocument
    tell application "PDFpenPro"
        activate
        open ScannedDocument
        ocr document 1
        -- wait for OCR to finish; the delay keeps the loop from spinning the CPU
        repeat
            if performing ocr of document 1 is false then
                exit repeat
            end if
            delay 1
        end repeat
        save document 1
        set documentPath to path of document 1
        tell application "Yojimbo"
            import documentPath
        end tell
        close document 1 -- delete this if you want the doc to stay open
        quit -- delete this if you want PDFpen to stay open
    end tell
end open
Ah, one important step I forgot: you also need to set the saved application as the target in the scan software preferences.