PDFpen OCR AppleScript To Automatically Make PDFs Searchable

I don’t know if it is because I have been glued to a computer since I was six years old, but my handwriting and printing are terrible. Really terrible. I think my 5-year-old son and I have pretty similar handwriting skills.

Normally this is not a problem, except when I have to fill out a form. It’s a little embarrassing filling out some official form with my chicken scratch, which is one of the many reasons why I love PDFpen. Among many other things, it lets you fill out and edit any PDF document on your computer and then print it out.

However, that ability is not what this post is about. PDFpen will also OCR PDFs to make them searchable, and I wanted a way to OCR a bunch of documents automatically with an AppleScript, similar to what has been done with Adobe Acrobat and with ABBYY FineReader.

I found two scripts out there: one from David Sparks at MacSparky, which some users reported problems with in newer PDFpen versions, and one from Michael Tsai at C-Command Software, which will OCR a document with PDFpen and send it to EagleFiler.

Since both of these scripts were almost what I wanted, I decided to stand on the shoulders of giants and merge them together into this AppleScript.

Here is the script:
-- Downloaded From: http://www.documentsnap.com
-- Last Modified: 2010-09-28
-- Includes code from MacSparky http://www.macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html
-- Includes code from C-Command Software http://c-command.com/scripts/eaglefiler/ocr-with-pdfpen

on adding folder items to this_folder after receiving added_items
	try
		repeat with added_item in added_items
			my ocr(added_item)
		end repeat
	on error errText
		display dialog "Error: " & errText
	end try
end adding folder items to


on ocr(added_item)
	tell application "PDFpen"
		open added_item as alias
		tell document 1
			ocr
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end ocr

PDFpen Users: Download The Text Script Here (Right-click and Save-As)
PDFpen Pro Users: Download The Text Script Here (Right-click and Save-As)
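The script above waits for as long as PDFpen reports that OCR is in progress, and it sends every item added to the folder to PDFpen, whatever its type. The variant below is a minimal sketch of two optional guards: it skips anything that is not a PDF, and it gives up after a fixed number of seconds instead of waiting forever. The names maxWaitSeconds and isPDF are hypothetical additions for illustration, and the 300-second limit is an arbitrary assumption; only the ocr command and the performing ocr property come from the original script.

```applescript
-- Sketch only: maxWaitSeconds and isPDF are hypothetical names added for illustration.
property maxWaitSeconds : 300 -- assumed limit; adjust for large scans

on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		-- Handle errors per item so one bad file does not stop the rest
		try
			if my isPDF(added_item) then my ocr(added_item)
		on error errText
			display dialog "Error: " & errText
		end try
	end repeat
end adding folder items to

-- Returns true when the item's file extension is "pdf" (the comparison ignores case)
on isPDF(added_item)
	set itemPath to POSIX path of (added_item as alias)
	tell application "System Events"
		return (name extension of disk item itemPath) is "pdf"
	end tell
end isPDF

on ocr(added_item)
	tell application "PDFpen"
		open added_item as alias
		tell document 1
			ocr
			set waited to 0
			repeat while performing ocr
				delay 1
				set waited to waited + 1
				-- Stop waiting once the assumed limit is exceeded
				if waited > maxWaitSeconds then error "OCR did not finish within " & maxWaitSeconds & " seconds."
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end ocr
```

If the limit is hit, the error is reported in the dialog and the document is left open in PDFpen, so nothing is saved over the original. This only bounds how long the script waits; it does not change how PDFpen itself performs OCR.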

To implement, follow MacSparky’s excellent instructions.
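For anyone who would rather drag files onto an icon than attach a folder action, the same ocr handler can in principle be wrapped in a droplet. The following is a minimal sketch under that assumption: the on open handler is standard AppleScript, the variable names dropped_items and dropped_item are made up for this example, and the ocr handler is copied unchanged from the script above.

```applescript
-- Sketch only: a droplet wrapper around the ocr handler from the script above.
-- dropped_items and dropped_item are hypothetical names chosen for this example.
on open dropped_items
	repeat with dropped_item in dropped_items
		try
			my ocr(dropped_item)
		on error errText
			display dialog "Error: " & errText
		end try
	end repeat
end open

on ocr(added_item)
	tell application "PDFpen"
		open added_item as alias
		tell document 1
			ocr
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end ocr
```

To try it, save the script as an application from the script editor and drop one or more PDFs onto the resulting icon. PDFpen Pro users would presumably need to change the application name in the tell line.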

I hope this is of use to someone, and thanks to David and Michael for their excellent AppleScripts.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

18 comments

Wander - February 23, 2015

I have tried this script but once it opens the document, it says “PDFpen is performing OCR (optical character recognition) to translate the document into text that can be selected.” and it hangs. I have to force quit PDFpen for it to close.
The script is as follows:

tell application "PDFpen"
	open theFile as alias
	tell document 1
		ocr
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	close document 1
	quit
end tell

My Doxie Go Wireless Automated Workflow | Tips To Learn How To Go Paperless | DocumentSnap Paperless Blog - July 24, 2012

[…] This rule watches toPDF for any PDF files, and if it finds any, it runs an AppleScript which calls PDFPen to do the OCR. I mainly did it this way because I already had a script that performs OCR with PDFPen. […]

Patch - January 28, 2012

Ah, one important step that I forgot, you also need to set the saved application as the target in the scan software preferences.

Patch - January 28, 2012

I was inspired by this to create my own workflow which consists of the following steps:

Scan with Fujitsu ScanSnap > OCR with PDFpenPro > Export to Yojimbo

My goal was to automate all of this when the Scan button is pushed on the scanner. The following script accomplishes just that.

If you save it as an application it also functions as a droplet. I'm sure it can be easily modified to export to other applications if you're not a Yojimbo user.

Note: you will also need to DISable the preference to automatically OCR scanned documents in PDFpenPro, else you'll get that annoying dialog about language preference.

— SCRIPT —
on open ScannedDocument
	tell application "PDFpenPro"
		activate
		open ScannedDocument
		ocr document 1
		repeat
			if performing ocr of document 1 is false then
				exit repeat
			end if
		end repeat

		save document 1
		set documentPath to path of document 1

		tell application "Yojimbo"
			import documentPath
		end tell

		close document 1 -- delete this if you want the doc to stay open
		quit -- delete this if you want PDFpen to stay open
	end tell
end open

Chris - November 4, 2011

Also: any chance of turning this into a droplet? Thanks!

Chris - November 4, 2011

Brooks:

I'm having 2 problems with this script I'm hoping you can help with.

1. It asks me what language I want to use for OCR. This adds an unnecessary step. Can the script be modified to specify "English"?

2. PDFPen quits unexpectedly at the end of the script.

Thanks for any help you can provide.

Niv - September 4, 2011

Hi Brooks,

I tried to save the script in Folder Action Scripts like D.Sparks recommends but I get the following error: The document “Untitled” could not be saved as “Untitled.scpt”.
I have tried all sorts to save any script anywhere, but all instances fail.
I am running Lion, does anyone else have this problem, or know of a solution?

    Niv - September 4, 2011

    Don't worry – I have solved it. Folder permissions!!!

      Brooks Duncan - September 4, 2011

      Great Niv, and great to hear it works on Lion.

ToddPeperkorn - December 15, 2010

Something is also causing the Script to force PDFPen to close. I will email you the error offlist if you would like to peek at it.

ToddPeperkorn - December 15, 2010

I am trying to use this script and it wants to ask what language to scan the document in. Is there a way of making that a part of the script?

    Brooks Duncan - December 15, 2010

    Hm strange, I'll take a look and let you know.

    charlie - November 15, 2022

    Hi, did you ever figure out the language issue? Was just trying this and running into same issue…having a hard time finding documentation on the applescript implemented by PDFPenPro especially now that it has been sold by Smile.

Josh - September 29, 2010

Awesome, thanks! Just thinking though….I'd love to use Hazel instead of Folder Actions if possible.

Would it be possible to modify the script a bit so that Hazel can monitor for new files in the folder, then call a simpler AppleScript to tell PDFpen to OCR it?

I wish I knew AppleScript…I would contribute!

    Brooks Duncan - September 29, 2010

    Hey Josh, that should be doable. Just give me a few days and I should have something for you. Good idea.

    Brooks Duncan - October 1, 2010

    Hey there Josh, give this a try: http://www.documentsnap.com/hazel-rule-to-ocr-doc

    Hope it helps!

    Thena - December 22, 2011

    We definitely need more smart people like you around.
