Updated: Acrobat Applescript for ScanSnap OCR

February 16, 2010

As many of you know, in 2008 I posted an Applescript that will use Adobe Acrobat to make PDFs searchable using Acrobat’s OCR capabilities.

In the comments to that post, user nodis pointed out that adding 2 words to one of the lines can make the PDFs quite a bit smaller.

In my testing, I ran a 1.3 MB PDF through the script. Before nodis’ change, the resulting PDF was 1.7 MB. After the change, it was 424K!
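
Judging from the source listing in the original post (further down this page), the two words appear to be `with linearize` on the save line. Here is a before-and-after sketch of just that line. The "before" version is my assumption of what the line looked like, and nothing else in the script is assumed to have changed:

```applescript
-- Before (assumed original line): a plain save
-- save the front document

-- After nodis' change: the same save, asking Acrobat to linearize the file
tell application "Adobe Acrobat Professional"
	save the front document with linearize
end tell
```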

Here is the updated script:

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Let me know how it works out for you and if you see similar reductions in file size.

Cool Paperless Setup Video

February 4, 2010

As much of a paperless geek as I am, I normally wouldn’t sit and watch a video of someone scanning and shredding paper.

However, I just wanted to point you to this YouTube video by user allenday. He’s got a really cool setup: a ScanSnap S300M, Adobe Acrobat, a Mac Mini, a wall-mounted Sharp Aquos, and the Royal PX1000MX to shred. He then uploads everything to Evernote.

To do the OCRing, he uses the Acrobat OCR Applescript Droplet that I hacked/posted about earlier.


Very cool setup, thanks for sharing allenday! Do any of you have a cool paperless setup? Feel free to share pics or videos in the comments.

Applescript: Easily convert PDF documents to JPG or PNG

November 3, 2009

There are, of course, a million ways to convert PDF documents to JPG or PNG files. However, sometimes you just want something quick and easy.

A while ago, reader AS pointed me to an Applescript droplet written by Martin Michel over at MacScripter.

AS mentioned that it would be nice to have a version that converts to PNG as well. Being nothing if not nice, I used my almost non-existent Applescript/Python skills to convert Martin’s script to output PNG. All credit for this goes to Martin. I just made some modifications.

Here’s how to do it:

PDF To JPG

PDF to PNG

  • Download PDF2PNG from here
  • Drag a PDF or multiple PDFs onto the icon
  • Select the resolution (or accept the default)
  • A PNG file will be created for each page in the PDF

Hope this helps some of you. I know they will come in handy for me. Thanks AS and thanks Martin Michel! Let me know how it works out for you.

OCR Your ScanSnap PDF Before Sending It To Evernote

July 14, 2009

Update: Of course, a few days after I posted this, Evernote announced that they would make PDFs searchable for Premium users. So if you are not a Premium user, this will help. Otherwise, just upload away.

One of the most popular posts on this site is on how to use the Fujitsu ScanSnap with Evernote. It describes how to set up a profile in ScanSnap Manager to send the resulting PDF to Evernote.

There is one problem with doing it this way – Evernote does not OCR PDFs. I assume they’ll be fixing this someday, but for now, if you want your document searchable within Evernote, you need to OCR it before sending it into Evernote.

How you do this depends on which ScanSnap model you have, and whether you are on Windows or a Mac.

ScanSnap For Windows

If you have the ScanSnap S300, S510, or S1500, your solution is pretty simple.

What we’re going to do is set Evernote to watch a folder so that it automatically imports anything it finds in there. Then we’ll set up ScanSnap to save files to that folder.

  • In Evernote, go to File -> Import -> File Import Wizard
  • Hit Next and select the Source folder that you want Evernote to watch and set your notebook
  • Choose “Watch folder for changes and import files automatically”

Now set up ScanSnap normally to scan to that folder you just selected, and whatever files you save into that folder will be grabbed by Evernote.

ScanSnap S510M or S1500M For Mac

For whatever reason, Evernote for the Mac does not have the Watch Folder functionality that the Windows client does (why not, Evernote?!). However, thanks to the magic of Applescript, we can do the same thing.

This will work for the ScanSnap S510M or S1500M.

  • Download this file – AddToEvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
  • Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
  • Right-click on the folder again and select More and then Attach a Folder Action. Select the AddToEvernote script that you just saved

Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the Applescript will go through that folder and add the files into Evernote. Handy!
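
For the curious, a folder action like this can be quite short. The following is a minimal sketch, not the contents of AddToEvernote.scpt. It assumes that Evernote’s AppleScript dictionary offers a `create note from file` command, and it only handles files with a pdf extension:

```applescript
-- Hypothetical sketch of a folder action; not the actual AddToEvernote.scpt
on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		set pdf_file to added_item as alias
		set item_info to info for pdf_file
		-- Only hand PDFs to Evernote; skip anything else dropped in the folder
		if name extension of item_info is "pdf" then
			tell application "Evernote"
				-- Assumption: Evernote's dictionary provides "create note from file"
				create note from file pdf_file
			end tell
		end if
	end repeat
end adding folder items to
```

The handler name `adding folder items to` is what Folder Actions calls when new files land in the attached folder, so no polling loop is needed.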

ScanSnap S300M For Mac

For whatever reason (I say that a lot), the ScanSnap S300M does not come with OCR software (why not, Fujitsu?!).

However, we’re in luck. Awesome DocumentSnap reader Sebastian Poll wrote this Applescript that will use Adobe Acrobat to automatically OCR the PDF and then kick it straight into Evernote.

Obviously, it requires Acrobat. If you don’t have Acrobat, you can use whatever method you currently use to OCR and then use the AddToEvernote script above to import the result.

Note that Sebastian’s version was actually written with some of the code in German. I changed it to English, so if there are problems, it is probably my fault and not his.

  • Download this file – OCREvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
  • Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
  • Right-click on the folder again and select More and then Attach a Folder Action. Select the OCREvernote script that you just saved

Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the Applescript will go through that folder and OCR with Acrobat and then add the files into Evernote.

Do you use the ScanSnap with Evernote? Do you have any other methods of making PDFs searchable? Or do you not bother? Leave a message in the comments.

Abbyy Finereader and Adobe Acrobat – Why Does Fujitsu Include Both?

April 20, 2009

I have received a number of questions recently about the software that is included with the Fujitsu ScanSnap. For example, why does the ScanSnap come with both Abbyy FineReader and Adobe Acrobat? Aren’t they both for doing OCR?

I suspect part of the reason this question comes up is my posts about my ScanSnap workflow and my Adobe Acrobat OCR Applescript. Is all that necessary?

Let me start by saying that I personally have the ScanSnap S300M. The S300M comes with neither Abbyy FineReader nor Adobe Acrobat. If you have the S1500 or S1500M, your scanner will come with both, and doing OCR is much more integrated than with the S300M, so my post-scan processing fun may not be necessary.

So What’s The Difference?

The ScanSnap comes with a special version of Abbyy FineReader called FineReader for ScanSnap. They’ve integrated that with ScanSnap Organizer, so if you are using the built-in automatic OCR’ing, that is what it is using.

If all you care about is having your PDFs searchable and don’t mind performing the OCR right after scanning, then the supplied FineReader is probably all you need.

To my mind, there are basically two main reasons why you will want to use Adobe Acrobat:

  • You want to do PDF editing after the fact
  • You want to batch your OCR after the fact

PDF Editing

So you have your scanned PDF. Now what? If you want to remove/rearrange pages and do a whole ton of other editing functions, Acrobat is a great tool. It is most definitely not just for making a PDF searchable.

You can see a bunch more information for Adobe Acrobat 9 (included with the ScanSnap S1500) and Acrobat 8 (included with the ScanSnap S1500M). You can see from the price that it’s a pretty good deal that this software is included with the ScanSnap.

Batch OCR

If you have a whole bunch of documents to scan in, it may be annoying to scan, sit there and wait for it to OCR, scan, OCR, scan, OCR, and so on. Some people prefer to scan all their documents to PDF in one shot, and then OCR them all at once. You can use Acrobat to do that instead of the included FineReader.

So there you have it, some of the differences between the two. What are some of the reasons you use one over the other?

Found: ScanSnap Applescript to remove Abbyy FineReader file

December 30, 2008

I haven’t tried this myself as I don’t personally use FineReader, but if you do and you are interested in this Applescript that I came across today from UnderDoug.ca, check it out.

Remove Unprocessed By FineReader.scpt is a FolderAction AppleScript to remove files produced by ScanSnap like “/2008_12_28_17_32_30.pdf” once FineReader produces one like “2008_12_28_17_32_30 processed by FineReader.pdf”.

You can get the script here.
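
I haven’t looked inside that script, but the idea it describes could be sketched roughly as follows. This is a hypothetical illustration, not the UnderDoug.ca code. The suffix is taken from the file names quoted above, and the handler assumes both files live in the folder the action is attached to:

```applescript
-- Hypothetical sketch; not the UnderDoug.ca script itself
property processed_suffix : " processed by FineReader.pdf"

on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		set item_name to name of (info for (added_item as alias))
		-- React only when FineReader's output file shows up
		if item_name ends with processed_suffix and (length of item_name) > (length of processed_suffix) then
			-- Strip the suffix to recover the name of the unprocessed scan
			set base_name to text 1 thru ((length of item_name) - (length of processed_suffix)) of item_name
			set original_name to base_name & ".pdf"
			tell application "Finder"
				if exists file original_name of folder this_folder then
					-- Finder's delete moves the file to the Trash
					delete file original_name of folder this_folder
				end if
			end tell
		end if
	end repeat
end adding folder items to
```

Because Finder’s `delete` moves files to the Trash rather than erasing them, a wrongly matched scan can still be recovered.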

Acrobat Applescript For ScanSnap OCR

September 5, 2008

This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.

I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.

I came across this article by Macworld which had a great Applescript Folder Action that would kick off Acrobat’s OCR whenever a PDF was dropped into the folder.

It worked well but I found that then I had to sit there and watch the OCR go after each document, and it seemed to have problems if I scanned another file while the OCR was still going.

I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.

Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I just go through and scan a bunch of PDFs to a folder, then drag the files onto the droplet, and go to bed. Acrobat OCRs them one by one.

Here is the script. You can download it for free, but make sure you visit the Macworld article, because the script is 90% that author’s work.

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Update: User nodis in the comments pointed out a great optimization that makes significantly smaller PDFs. Thanks!

Here is the source code:

property mytitle : "ocrIt-Acrobat" -- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html

-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
	end tell
end run

-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
	with timeout of 36000 seconds
		try
			repeat with droppeditem in droppeditems
				set the item_info to info for droppeditem
				-- Open the dropped PDF in Acrobat
				tell application "Adobe Acrobat Professional"
					activate
					open droppeditem
				end tell
				-- Drive Acrobat's OCR menu through GUI scripting
				tell application "System Events"
					tell application process "Acrobat"
						click the menu item "Recognize Text Using OCR…" of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
						try
							click radio button "All pages" of group 1 of group 2 of group 1 of window "Recognize Text"
						end try
						click button "OK" of window "Recognize Text"
					end tell
				end tell
				-- Save with linearize (this is what makes the PDFs smaller), then close
				tell application "Adobe Acrobat Professional"
					save the front document with linearize
					close the front document
				end tell
			end repeat
			-- catching unexpected errors
		on error errmsg number errnum
			my dsperrmsg(errmsg, errnum)
		end try
	end timeout
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg

My ScanSnap Setup And Workflow – Post Scan Processing

August 11, 2008

This is Part 2 of the My ScanSnap Setup And Workflow series. Make sure to check out Part 1: ScanSnap Settings.

Now that we have set up ScanSnap Manager with my four profiles, here is what I do with the files.

At first I started using DevonThink Pro Office, but I found that it was a little overkill for my needs. If I had a huge number of documents that I needed regular access to, it would be perfect, but for my home needs I wanted to go with something a little more lightweight.

One main drawback (maybe the only one) is that my ScanSnap S300M did not come with any OCR software in the box. I could download a form to have ReadIris Pro 11 mailed to me, but that didn’t help me at first.

Luckily, I had Adobe Acrobat Professional already, so I decided to use that for my ScanSnap workflow. If you have the ScanSnap S1500 or S1500M, your scanner will come with Acrobat already.

Really Boring, Really Fast

Excited to start OCR’ing up a storm, I set ScanSnap Manager to output to Acrobat and away I went.

It worked quite well. I would hit the button, it would open the resulting file in Acrobat, and then I would go Document | OCR Text Recognition | Recognize Text Using OCR and follow the resulting menus.

I think this would be OK normally, but since I had a ton of things to scan from my file cabinet, this got really boring, really fast to have to sit there and manually OCR every document over and over again. I knew there had to be a better way.

Applescript To The Rescue

I am a complete AppleScript newbie, but I found this great post from Macworld where the author made an AppleScript Folder Action that would watch a certain folder, and when a document got put in it, it would kick off Acrobat (or ReadIris Pro) and OCR it automatically.

I set ScanSnap Manager to save to a folder called ToProcess and gave that folder a Folder Action to run the Macworld script.

This worked quite well, and would possibly work OK on an ongoing basis, but again I ran into problems when doing my massive scan-a-thon: if I dropped a document into the folder while the other Acrobat session was still OCR’ing, it would give error messages.

Droplets Are Fun

The solution I came up with was to change the script so that it became a droplet. To do this I ripped off part of the script referenced in this thread.

A droplet is just an Application that you save somewhere (I have it on my Dock). You run it by dragging a file onto its icon.
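
To make that concrete, here is a minimal droplet skeleton. It is only an illustrative sketch, not the OCR script itself: the dialog is a placeholder for whatever you want to happen to each dropped file.

```applescript
-- Minimal droplet skeleton; save it as an Application in Script Editor
on open dropped_items
	repeat with dropped_item in dropped_items
		-- Placeholder: replace this dialog with whatever should happen to each file
		display dialog "Got: " & POSIX path of (dropped_item as alias)
	end repeat
end open
```

The `on open` handler is what turns an ordinary script application into a droplet: it receives the list of files dragged onto the icon.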

Here is the script that I cobbled together. Feel free to download and use as you please.

OCRit – Droplet to batch OCR PDFs using Adobe Acrobat

Final Workflow

So now, I have the following workflow:

  • Scan document using the ScanSnap, ScanSnap Manager saves the file in the ToProcess folder
  • When I am done with my batch, I drag the PDF files onto the OCRIt icon, which kicks off Adobe Acrobat Professional and tells it to recognize the text in each document
  • When that is done, process/move the files as needed

It is working quite well for me, but I guess if I wanted to avoid all this I could have just stuck with DevonThink as it has built-in OCR. What is your workflow? How do you handle the Optical Character Recognition part?