My ScanSnap Setup And Workflow – Post Scan Processing
August 11, 2008
This is Part 2 of the My ScanSnap Setup And Workflow series. Make sure to check out Part 1 – ScanSnap Settings.
Now that ScanSnap Manager is set up with my four profiles, here is what I do with the files.
At first I used DevonThink Pro Office, but I found that it was a bit of overkill for my needs. If I had a huge number of documents that I needed to access regularly, it would be perfect, but for my home needs I wanted something a little more lightweight.
The one main drawback of my ScanSnap S300M (maybe the only one) is that it did not come with any OCR software in the box. I could download a form to have ReadIris Pro 11 mailed to me, but that didn't help me right away.
Luckily, I had Adobe Acrobat Professional already, so I decided to use that for my ScanSnap workflow. If you have the ScanSnap S1500 or S1500M, your scanner will come with Acrobat already.
Really Boring, Really Fast
Excited to start OCR’ing up a storm, I set ScanSnap Manager to output to Acrobat and away I went.
It worked quite well. I would hit the button, it would open the resulting file in Acrobat, and then I would go to Document | OCR Text Recognition | Recognize Text Using OCR and follow the prompts.
This would normally be OK, but since I had a ton of things to scan from my file cabinet, sitting there and manually OCR'ing every document over and over got really boring, really fast. I knew there had to be a better way.
AppleScript To The Rescue
I am a complete AppleScript newbie, but I found this great post from Macworld where the author created an AppleScript Folder Action that watches a folder and, whenever a document is put in it, launches Acrobat (or ReadIris Pro) and OCRs it automatically.
I set ScanSnap Manager to save to a folder called ToProcess and attached a Folder Action to that folder to run the Macworld script.
This worked quite well, and might work OK on an ongoing basis, but again I ran into problems during my massive scan-a-thon: if I dropped a document into the folder while Acrobat was still OCR'ing the previous one, I would get error messages.
Droplets Are Fun
The solution I came up with was to change the script so that it became a droplet. To do this, I ripped off part of the script referenced in this thread.
A droplet is just an application that you save somewhere (I have it on my Dock). You run it by dragging one or more files onto its icon.
Here is the script that I cobbled together. Feel free to download and use it as you please.
Final Workflow
So now, I have the following workflow:
- Scan documents with the ScanSnap; ScanSnap Manager saves each file in the ToProcess folder
- When I am done with a batch, I drag the PDF files onto the OCRIt icon, which kicks off Adobe Acrobat Professional and tells it to recognize the text in each document
- When that is done, I process/move the files as needed
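The important property of the droplet is that it handles a whole batch in one run, OCR'ing one file at a time instead of letting several Acrobat jobs overlap the way the Folder Action did. The real droplet is AppleScript; the sketch below only illustrates that sequential-batch idea in Python. `process_batch` and `run_ocr` are hypothetical names, `run_ocr` is a placeholder that does not actually drive Acrobat, and skipping non-PDF files is an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def process_batch(paths: Iterable[Path], run_ocr: Callable[[Path], None]) -> List[Path]:
    """OCR each PDF in the batch strictly one after another.

    run_ocr is a hypothetical stand-in for the real OCR step; it is assumed
    to block until that file is finished, so two jobs never overlap.
    """
    processed: List[Path] = []
    # Sort so the batch runs in a predictable (alphabetical) order.
    for path in sorted(paths):
        # Skip anything that is not a PDF (e.g. a stray file in ToProcess).
        if path.suffix.lower() != ".pdf":
            continue
        run_ocr(path)  # wait for this file before starting the next one
        processed.append(path)
    return processed


if __name__ == "__main__":
    # Placeholder OCR step: only reports which file would be recognized.
    def fake_ocr(path: Path) -> None:
        print(f"OCR: {path.name}")

    batch = [Path("ToProcess/b.pdf"), Path("ToProcess/a.PDF"), Path("ToProcess/notes.txt")]
    done = process_batch(batch, fake_ocr)
    print(f"{len(done)} file(s) processed")
```

Because each call finishes before the next one starts, dropping twenty files at once produces twenty back-to-back jobs rather than twenty competing ones.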
It is working quite well for me, but I guess if I wanted to avoid all this I could have just stuck with DevonThink, as it has built-in OCR. What is your workflow? How do you handle the Optical Character Recognition part?
Document Storage: The Yahoo or Google Philosophy?
July 31, 2008

Once your documents are scanned, you of course need to put the PDFs somewhere.
There are basically two schools of thought for document storage: storing in a folder structure (the Old Yahoo model), or dumping it all in one place and letting search take over (the Google model).
Old Yahoo Model – Folders
If you have been around the Internet for a long time, you will remember that Yahoo started out as strictly a directory, where websites would get placed in a hierarchical category structure.
The folder model of document storage is sort of like that. You set up an elaborate folder structure, and when it is time to file away a document, you figure out which folder it should go in.
The advantages of this are that you don't need any third-party application, and the folder concept is something that we have used for years and everyone understands.
The downside is that you have to decide which folder each file goes into, and when you are looking for a document, you have to remember where you saved it.
It also takes regular processing to go through and move the files to the right place in your structure.
The Google Model – Search
Google's advantage (among many) is that it didn't have to rely on people putting websites in certain categories, and it didn't rely on searchers knowing which category to look in. Users could just type in a keyword, and as long as Google had indexed the site, it would show up in the results.
With the search model of document storage, PDFs are dumped in one or just a few folders, and then when you want to find something, you just do a keyword search to bring back documents containing that keyword.
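A much simplified picture of what a desktop search tool does: build an index from each word to the documents that contain it, then answer a query by looking the words up. The sketch below assumes the OCR text of each PDF has already been extracted into a string (the file names and text are made-up examples), and it requires every query word to match, which is one way to narrow the long result lists that a single generic keyword tends to return.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of document names that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the documents containing every query word, sorted by name."""
    words = tokenize(query)
    if not words:
        return []
    # Start from the first word's matches (a copy) and intersect with the rest.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical OCR text from three scanned PDFs.
    docs = {
        "2008-03-hydro-bill.pdf": "Hydro bill for March 2008, amount due 84.10",
        "2008-04-hydro-bill.pdf": "Hydro bill for April 2008, amount due 79.55",
        "car-insurance.pdf": "Car insurance renewal, policy amount due April 30",
    }
    index = build_index(docs)
    print(search(index, "amount due"))   # generic query: matches all three
    print(search(index, "hydro april"))  # more specific query: one match
```

Real desktop search tools typically do far more (ranking, partial-word matching and so on); this only shows the basic keyword lookup, and it only works if the PDFs were OCR'ed in the first place.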
This can be a very effective model as long as PDFs are consistently OCR’ed so that they are searchable, and you know what you are looking for.
Once you have a collection of searchable PDF files, you can use Windows Desktop Search, Google Desktop, or Spotlight on the Mac to search through the documents and find the right one.
You can also take it to the next level and use software like Yep, Evernote, DevonThink, or OneNote to collect and store your documents and do the searching inside it.
The downside of using the search model is, as I said, you have to know what you are searching for before you search. It may be hard to remember certain keywords from the document.
Also, if you are searching for fairly generic keywords, your search may bring back a ton of results, making it a pain to wade through them.
Which Model Do You Use?
Personally, I use a hybrid.
I do have a folder structure, but I try to keep it high level without too many subfolders. I then make sure that documents are searchable by OCR'ing them once my Fujitsu ScanSnap has done its job.
When I am looking for a document, I generally use the search method because that is how I am used to finding information. It’s just nice to know that the folder structure is there as a backup.
What setup do you have for saving/finding your scanned PDFs?

