How To Create Searchable PDFs With The ScanSnap S300M
February 2, 2010
So you read all this great stuff about how the Fujitsu ScanSnap is awesome and creates searchable PDFs, and you’re on a Mac and want a portable scanner, so you drop the cash on a ScanSnap S300M.
Then you get it home and find out – wait a minute – the S300M doesn’t come with OCR software! If you’ve been there (and I have), hopefully this post will help you out, as I get a lot of questions about this.
Mail-In Rebate
Your local Fujitsu website may provide a mail-in rebate for OCR software if you purchase the S300M. At the time of this writing, the US Fujitsu website has a mail-in rebate for a free copy of ReadIris OCR software.
The rebate is at http://www.fujitsu.com/us/services/computing/peripherals/scanners/rebates.html . Check if your country has something similar.
Acrobat
While the S300M doesn’t come with Adobe Acrobat, if you have a copy of it lying around, or have access to it, you can use the ScanSnap with it. Here is an example of how I use the S300M with Acrobat 8.
Evernote
Evernote Premium allows users to upload PDFs and they will be automatically OCR’ed and made searchable.
DevonThink
If you use a program like DevonThink Pro Office to manage your documents, it will OCR your scanned PDFs and make them searchable.
NeatWorks
NeatWorks is software that comes bundled with the NeatDesk scanner, but it can also be purchased on its own. See this post for how to use NeatWorks with the Fujitsu ScanSnap.
These are some ideas for how to make searchable PDFs with the ScanSnap S300M. Do you have any others? Leave a message in the comments.
Doing OCR Batch Processing Using The ScanSnap And ABBYY FineReader
January 5, 2010
Sometimes, when you have to scan a large number of documents at once, the step of doing OCR (making the PDF searchable) after each document can really slow things down. It may be preferable to scan them all in and then OCR them all in one big shot.
In the past I have posted about how to do batch OCR using Adobe Acrobat and have posted an Acrobat Applescript.
Over at the Optimality! blog, Tobi has posted a walkthrough of using ABBYY FineReader, which comes with the ScanSnap S1500M (and the S1500, for that matter), to do batch OCR.
The problem is that in the default setup, each scan is OCRed right after the scan and, depending on the age of your machine (my G5 is getting a little long in the tooth), it can take quite a while. When you’re in the process of scanning many hundreds of pages of paper documents, you don’t want to have to wait for the computer to do its OCR; you’d rather feed it all the documents and let it do OCR while you’re doing something else.
Fortunately, this is possible. Reading all the way through the handbook as well as through the ABBYY online help I found out that you can scan to PDF only, and then afterwards convert the PDFs with ABBYY FineReader.
Check out the post here. Do you have any other tricks for doing batch OCR?
ABBYY Finereader And Snow Leopard – File Not Created With ScanSnap
August 31, 2009
One issue with the Fujitsu ScanSnap and OS X 10.6 Snow Leopard that I forgot to mention the other day involves the ABBYY FineReader software that comes bundled with the scanner.
When scanning with the version of FineReader that ships with the ScanSnap S510M and S1500M, you may get an error message like “File not created with ScanSnap”.
This is a known issue and according to this bulletin from Fujitsu Support, it will be fixed “within 2009”.
Fujitsu has assured me that they’re working on it, so hopefully we’re not talking December 31 here!
I personally do not use FineReader. Does anyone have a workaround for the Snow Leopard issue? Leave a note in the comments.
Update: Thanks to reader Spike in the comments for the tip: ABBYY has released a version of FineReader Express Edition that supports Snow Leopard. More info here.
Update #2 Nov 19/09: The ABBYY FineReader for ScanSnap Snow Leopard Update is now available.
Evernote Premium Now Makes PDFs Searchable
July 28, 2009

Well, that didn’t take long. Just 13 days after my post about making PDFs searchable before uploading to Evernote, they went ahead and added that feature for Premium users yesterday.
Starting now, if you are a premium user and you upload a PDF, Evernote will OCR it on the backend and make it searchable.
If you have non-searchable PDFs already uploaded (and are premium), they are in the process of going through and OCRing them too.
Before you ask, apparently if you upload a PDF that is already searchable, they won’t touch it.
Here’s a quick video about the new feature:
I have heard the lack of this feature mentioned as a drawback of Evernote for ages, so adding it is a great move on their part.
Unfortunately I am not a Premium user so I can’t try this out (I probably should be though). Any Evernote premium users out there want to give it a try and let us know how it works?
Making Acrobat OCR’ed PDFs Smaller With Formatted Text & Graphics
July 23, 2009
One complaint that people have with the PDFs that Acrobat kicks out when doing OCR, either by doing it manually or via an Acrobat OCR Applescript, is that the files can get really big.
There are a few solutions to this, but one of them is to change the PDF Output Style.
The default that Acrobat uses is called Searchable Image. What that does is place all the OCR’ed text etc. “behind” the image, so that when you view the PDF you are looking at the original image, but you can copy and search on the text.
However, there’s another setting. If you choose the PDF Output Style of Formatted Text & Graphics, Acrobat will actually convert the image of the text into real text, formatted to match the original styling as closely as it can.
I did a simple test this morning and here is what I found:
- Scanned Document before OCR: 312K
- OCR with Acrobat Searchable Image: 940K
- OCR with Acrobat Formatted Text & Graphics: 60K (!)
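To put those numbers in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the three sizes from my test above; the names `sizes_kb` and `relative_size` are made up for illustration, and results on other documents will vary.

```python
# File sizes from the simple test above, in kilobytes.
sizes_kb = {
    "Scanned document before OCR": 312,
    "Searchable Image": 940,
    "Formatted Text & Graphics": 60,
}

baseline = sizes_kb["Scanned document before OCR"]


def relative_size(size_kb, baseline_kb=baseline):
    """Return a size as a whole-number percentage of the un-OCR'ed scan."""
    return round(size_kb / baseline_kb * 100)


for label, size in sizes_kb.items():
    print(f"{label}: {size}K ({relative_size(size)}% of the original scan)")
```

In this one test, Searchable Image roughly tripled the file (about 301% of the original), while Formatted Text & Graphics cut it to about a fifth (19%).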
To change Acrobat to FT&G, here is what you do:
- Go to Document -> OCR Text Recognition -> Recognize Text Using OCR…
- Click the Edit button

- In PDF Output Style, change to Formatted Text & Graphics
- Hit OK
Acrobat will now use Formatted Text & Graphics, and should keep that setting for your future scans too.
What’s The Catch?
As with anything, there is a downside. Acrobat does its best to make the text look like what was there before, but it is not perfect. Also, anything that is mis-OCR’ed will actually show up in the document.
It depends on what your objectives are. If you want to have the exact replica of what you are scanning, you’ll probably want to use Searchable Image.
However, if size is your main concern and you just want to have a fairly-faithful representation, Formatted Text & Graphics may be the way to go.
Do you have any other tricks for making PDFs smaller?
OCR Your ScanSnap PDF Before Sending It To Evernote
July 14, 2009
Update: Of course, a few days after I posted this, Evernote announced that they would make PDFs searchable for Premium users. So if you are not a Premium user, this will help. Otherwise, just upload away.
One of the most popular posts on this site is on how to use the Fujitsu ScanSnap with Evernote. It describes how to set up a profile in ScanSnap Manager to send the resulting PDF to Evernote.
There is one problem with doing it this way – Evernote does not OCR PDFs. I assume they’ll be fixing this someday, but for now, if you want your document searchable within Evernote, you need to OCR it before sending it into Evernote.
How you do this depends on which model of the ScanSnap that you have, and whether you have Windows or a Mac.
ScanSnap For Windows
If you have the ScanSnap S300, S510, or S1500, your solution is pretty simple.
What we’re going to do is set Evernote to watch a folder so that anything it finds in there it will automatically import. Then set up ScanSnap to save files to that folder.
- In Evernote, go to File -> Import -> File Import Wizard
- Hit Next and select the Source folder that you want Evernote to watch and set your notebook
- Choose “Watch folder for changes and import files automatically”
Now set up ScanSnap normally to scan to that folder you just selected, and whatever files you save into that folder will be grabbed by Evernote.
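If you are curious what a "watch folder" boils down to, the idea can be sketched in a few lines of Python. This is not how Evernote implements it; it is a minimal polling sketch, and the function name `find_new_pdfs` and the file names are assumptions made up for the example.

```python
import tempfile
from pathlib import Path


def find_new_pdfs(folder, already_seen):
    """Return PDFs in `folder` that have not been reported yet, oldest first.

    `already_seen` is a set of file names that is updated in place, so a
    later call only reports files added since the previous call.
    """
    candidates = [
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and p.name not in already_seen
    ]
    # Oldest first; the file name breaks ties between identical timestamps.
    candidates.sort(key=lambda p: (p.stat().st_mtime, p.name))
    already_seen.update(p.name for p in candidates)
    return candidates


# Small demo in a throwaway folder, standing in for the scan folder.
with tempfile.TemporaryDirectory() as tmp:
    seen = set()
    (Path(tmp) / "receipt.pdf").write_bytes(b"%PDF-1.4")
    (Path(tmp) / "notes.txt").write_text("not a scan")
    print([p.name for p in find_new_pdfs(tmp, seen)])  # ['receipt.pdf']
    print([p.name for p in find_new_pdfs(tmp, seen)])  # []
```

A real watcher would call something like this every few seconds and hand each new file to its importer; the second call returning nothing is what keeps a file from being imported twice.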
ScanSnap S510M or S1500M For Mac
For whatever reason, Evernote for the Mac does not have the Watch Folder functionality that the Windows client does (why not, Evernote?!). However, thanks to the magic of AppleScript, we can do the same thing.
This will work for the ScanSnap S510M or S1500M.
- Download this file – AddToEvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
- Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
- Right-click on the folder again and select More and then Attach a Folder Action. Select the AddToEvernote script that you just saved
Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the Applescript will go through that folder and add the files into Evernote. Handy!
ScanSnap S300M For Mac
For whatever reason (I say that a lot), the ScanSnap S300M does not come with OCR software (why not, Fujitsu!?).
However, we’re in luck. Awesome DocumentSnap reader Sebastian Poll wrote this Applescript that will use Adobe Acrobat to automatically OCR the PDF and then kick it straight into Evernote.
Obviously, it requires Acrobat. If you don’t have Acrobat, you can use whatever method you currently use to OCR and then use the AddToEvernote script above to import the result.
Note that Sebastian’s version was actually written with some of the code in German. I changed it to English, so if there are problems, it is probably my fault and not his.
- Download this file – OCREvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
- Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
- Right-click on the folder again and select More and then Attach a Folder Action. Select the OCREvernote script that you just saved
Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the Applescript will go through that folder and OCR with Acrobat and then add the files into Evernote.
Do you use the ScanSnap with Evernote? Do you have any other methods of making PDFs searchable? Or do you not bother? Leave a message in the comments.
Abbyy Finereader and Adobe Acrobat – Why Does Fujitsu Include Both?
April 20, 2009

I have received a number of questions recently about the software that is included with the Fujitsu ScanSnap. For example, why does the ScanSnap come with both Abbyy FineReader and Adobe Acrobat? Aren’t they both for doing OCR?
I suspect part of the reason that this question comes up is because of my posts about my ScanSnap workflow and my Adobe Acrobat OCR Applescript. Is all that necessary?
Let me start by saying that I personally have the ScanSnap S300M. The S300M comes with neither Abbyy FineReader nor Adobe Acrobat. If you have the S1500 or S1500M, your scanner will come with both, and doing OCR is much more integrated than with the S300M, so my post-scan processing fun may not be necessary.
So What’s The Difference?
The ScanSnap comes with a special version of Abbyy FineReader called FineReader for ScanSnap. They’ve integrated that with ScanSnap Organizer, so if you are using the built-in automatic OCR’ing, that is what it is using.
If all you care about is having your PDFs searchable and don’t mind performing the OCR right after scanning, then the supplied FineReader is probably all you need.
To my mind, there are basically two main reasons why you will want to use Adobe Acrobat:
- You want to do PDF editing after the fact
- You want to batch your OCR after the fact
PDF Editing
So you have your scanned PDF. Now what? If you want to remove/rearrange pages and do a whole ton of other editing functions, Acrobat is a great tool. It is most definitely not just for making a PDF searchable.
You can see a bunch more information for Adobe Acrobat 9 (included with the ScanSnap S1500) and Acrobat 8 (included with the ScanSnap S1500M). Judging from the price, it’s a pretty good deal that this software is included with the ScanSnap.
Batch OCR
If you have a whole bunch of documents to scan in, it may be annoying to scan, sit there and wait for it to OCR, scan, OCR, scan, OCR, and so on. Some people prefer to scan all their documents to PDF in one shot, and then OCR them all at once. You can use Acrobat to do that instead of the included FineReader.
So there you have it, some of the differences between the two. What are some of the reasons you use one over the other?
Acrobat Applescript For ScanSnap OCR
September 5, 2008
This was referenced in my ScanSnap workflow series, but I thought I would provide it in its own article as well.
I have a ScanSnap S300M and Adobe Acrobat, and was getting pretty tired of sitting there OCRing the PDFs manually in Acrobat.
I came across this article by Macworld which had a great AppleScript Folder Action that would kick off Acrobat’s OCR whenever a PDF was dropped into the folder.
It worked well but I found that then I had to sit there and watch the OCR go after each document, and it seemed to have problems if I scanned another file while the OCR was still going.
I wanted a solution where I could just throw a bunch of PDFs at Acrobat and walk away.
Thanks to this thread on MacScripter, I turned the Macworld script into a droplet. Now I just go through and scan a bunch of PDFs to a folder, then drag the files onto the droplet, and go to bed. Acrobat OCRs them one by one.
Here is the script. You can download it for free, but make sure you go to the Macworld article because it is 90% his work.
To use it:
- Download and uncompress the file and save it to your Desktop, Dock or wherever
- Drag one or more PDFs onto the icon
- Enjoy
Update: User nodis in the comments pointed out a great optimization that makes significantly smaller PDFs. Thanks!
Here is the source code:
property mytitle : "ocrIt-Acrobat"
-- Modified from a script created by Macworld http://www.macworld.com/article/60229/2007/10/nov07geekfactor.html

-- I am called when the user opens the script with a double click
on run
    tell me
        activate
        display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of PDF files onto my icon to batch OCR them." buttons {"OK"} default button 1 with title mytitle with icon note
    end tell
end run

-- I am called when the user drops Finder items onto the script icon
-- Timeout of 36000 seconds to allow for OCRing really big documents
on open droppeditems
    with timeout of 36000 seconds
        try
            repeat with droppeditem in droppeditems
                set the item_info to info for droppeditem
                tell application "Adobe Acrobat Professional"
                    activate
                    open droppeditem
                end tell
                -- Drive Acrobat's menus through GUI scripting
                -- (needs "Enable access for assistive devices" turned on in System Preferences)
                tell application "System Events"
                    tell application process "Acrobat"
                        click the menu item "Recognize Text Using OCR…" of menu 1 of menu item "OCR Text Recognition" of the menu "Document" of menu bar 1
                        try
                            click radio button "All pages" of group 1 of group 2 of group 1 of window "Recognize Text"
                        end try
                        click button "OK" of window "Recognize Text"
                    end tell
                end tell
                tell application "Adobe Acrobat Professional"
                    save the front document with linearize
                    close the front document
                end tell
            end repeat
            -- catching unexpected errors
        on error errmsg number errnum
            my dsperrmsg(errmsg, errnum)
        end try
    end timeout
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
    tell me
        activate
        display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
    end tell
end dsperrmsg
My ScanSnap Setup And Workflow – Post Scan Processing
August 11, 2008
This is Part 2 of the My ScanSnap Setup And Workflow series. Make sure to check out Part 1 – ScanSnap Settings.
Now that we have set up ScanSnap Manager with my four profiles, here is what I do with the files.
At first I started using DevonThink Pro Office, but I found that it was a little overkill for my needs. If I had a huge number of documents that I needed regular access to, it would be perfect, but for my home needs I wanted to go with something a little more lightweight.
One main drawback (maybe the only one) is that my ScanSnap S300M did not come with any OCR software in the box. I could download a form to have ReadIris Pro 11 mailed to me, but that didn’t help me at first.
Luckily, I had Adobe Acrobat Professional already, so I decided to use that for my ScanSnap workflow. If you have the ScanSnap S1500 or S1500M, your scanner will come with Acrobat already.
Really Boring, Really Fast
Excited to start OCR’ing up a storm, I set ScanSnap Manager to output to Acrobat and away I went.
It worked quite well. I would hit the button, it would open the resulting file in Acrobat, and then I would go Document | OCR Text Recognition | Recognize Text Using OCR and follow the resulting menus.
I think this would be OK normally, but since I had a ton of things to scan from my file cabinet, this got really boring, really fast to have to sit there and manually OCR every document over and over again. I knew there had to be a better way.
Applescript To The Rescue
I am a complete AppleScript newbie, but I found this great post from Macworld where the author made an AppleScript Folder Action that would watch a certain folder, and when a document got put in it, it would kick off Acrobat (or ReadIris Pro) and OCR it automatically.
I set ScanSnap Manager to save to a folder called ToProcess and gave that folder a Folder Action to run the Macworld script.
This worked quite well, and would possibly work OK on an ongoing basis, but again I ran into problems when doing my massive scan-a-thon – if I dropped a document into the folder while the other Acrobat session was still OCR’ing, it would give error messages.
Droplets Are Fun
The solution I came up with was to change the script so that it became a droplet. To do this I ripped off part of the script referenced in this thread.
A droplet is just an Application that you save somewhere (I have it on my Dock). You run it by dragging a file onto its icon.
Here is the script that I cobbled together. Feel free to download and use as you please.
Final Workflow
So now, I have the following workflow:
- Scan document using the ScanSnap, ScanSnap Manager saves the file in the ToProcess folder
- When I am done with my batch, I drag the PDF files onto the OCRIt icon, which kicks off Adobe Acrobat Professional and tells it to recognize the text in each document
- When that is done, process/move the files as needed
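The reason this workflow behaves better than the Folder Action is that the files are handled strictly one at a time. Here is that idea as a rough Python sketch. It is not my droplet: `process_batch` and `fake_ocr` are hypothetical names, and `ocr_one` is a stand-in for whatever actually does the OCR. Unlike my droplet, which stops and shows a dialog on the first error, this sketch records the failure and carries on with the rest of the batch.

```python
from pathlib import Path


def process_batch(pdf_paths, ocr_one):
    """Run `ocr_one` on each PDF in turn, never two at once.

    Returns (done, failed): `done` lists the paths that finished and
    `failed` lists (path, error message) pairs.
    """
    done, failed = [], []
    for path in pdf_paths:
        path = Path(path)
        if path.suffix.lower() != ".pdf":
            continue  # ignore anything that is not a PDF
        try:
            ocr_one(path)  # blocks until this document is finished
            done.append(path)
        except Exception as err:  # note the failure, keep going
            failed.append((path, str(err)))
    return done, failed


def fake_ocr(path):
    """Placeholder for the real OCR step; fails on purpose for one file."""
    if "bad" in path.name:
        raise ValueError("could not read file")


done, failed = process_batch(["a.pdf", "bad.pdf", "notes.txt", "b.pdf"], fake_ocr)
print([p.name for p in done])                # ['a.pdf', 'b.pdf']
print([(p.name, msg) for p, msg in failed])  # [('bad.pdf', 'could not read file')]
```

Because each call to `ocr_one` has to return before the next file is touched, nothing new is handed over while a previous OCR is still running, which is exactly the situation that tripped up the Folder Action.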
It is working quite well for me, but I guess if I wanted to avoid all this I could have just stuck with DevonThink, as it has built-in OCR. What is your workflow? How do you handle the Optical Character Recognition part?

