Tag Archives: searchable PDF

Evernote Premium Now Makes PDFs Searchable

Well, that didn’t take long. Just 13 days after my post about making PDFs searchable before uploading to Evernote, they went ahead and added that feature for Premium users yesterday.

Starting now, if you are a Premium user and you upload a PDF, Evernote will OCR it on the backend and make it searchable.

If you have non-searchable PDFs already uploaded (and are a Premium user), Evernote is in the process of going through and OCRing them too.

Before you ask, apparently if you upload a PDF that is already searchable, they won’t touch it.

I have heard the lack of this feature mentioned as a drawback of Evernote for ages, so adding it is a great move on their part.

Unfortunately, I am not a Premium user, so I can’t try this out (though I probably should be). Any Evernote Premium users out there want to give it a try and let us know how it works?

OCR Your ScanSnap PDF Before Sending It To Evernote

Update: Of course, a few days after I posted this, Evernote announced that they would make PDFs searchable for Premium users. So if you are not a Premium user, this will help. Otherwise, just upload away.

One of the most popular posts on this site is on how to use the Fujitsu ScanSnap with Evernote. It describes how to set up a profile in ScanSnap Manager to send the resulting PDF to Evernote.

There is one problem with doing it this way – Evernote does not OCR PDFs. I assume they’ll be fixing this someday, but for now, if you want your document searchable within Evernote, you need to OCR it before sending it into Evernote.

How you do this depends on which ScanSnap model you have and whether you use Windows or a Mac.

ScanSnap For Windows

If you have the ScanSnap S300, S510, or S1500, your solution is pretty simple.

We’re going to set Evernote to watch a folder so that it automatically imports anything it finds there, and then set up ScanSnap to save files to that folder.

  • In Evernote, go to File -> Import -> File Import Wizard
  • Hit Next and select the Source folder that you want Evernote to watch and set your notebook
  • Choose “Watch folder for changes and import files automatically”

Now set up ScanSnap normally to scan to that folder you just selected, and whatever files you save into that folder will be grabbed by Evernote.
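If you are curious what a watch folder does under the hood, here is a minimal Python sketch of the idea. This is not how Evernote implements the feature; the function name `find_new_pdfs` and the simple "remember what we have already seen" approach are assumptions for illustration only.

```python
from pathlib import Path


def find_new_pdfs(folder, seen):
    """Return PDFs in `folder` that have not been reported before.

    `seen` is a set of file names that is updated in place, so calling
    this function repeatedly reports each file only once.
    """
    new_files = []
    for path in sorted(Path(folder).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files
```

A real watcher would call something like this every few seconds and hand each new file to the importer.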

ScanSnap S510M or S1500M For Mac

For whatever reason, Evernote for the Mac does not have the Watch Folder functionality that the Windows client does (why not, Evernote?!). However, thanks to the magic of AppleScript, we can do the same thing.

This will work for the ScanSnap S510M or S1500M.

  • Download this file – AddToEvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
  • Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
  • Right-click on the folder again and select More and then Attach a Folder Action. Select the AddToEvernote script that you just saved

Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the AppleScript will go through that folder and add the files into Evernote. Handy!

ScanSnap S300M For Mac

For whatever reason (I say that a lot), the ScanSnap S300M does not come with OCR software (why not, Fujitsu?!).

However, we’re in luck. Awesome DocumentSnap reader Sebastian Poll wrote this AppleScript that will use Adobe Acrobat to automatically OCR the PDF and then kick it straight into Evernote.

Obviously, it requires Acrobat. If you don’t have Acrobat, you can use whatever method you currently use to OCR and then use the AddToEvernote above to import it in.

Note that Sebastian’s version was actually written with some of the code in German. I changed it to English, so if there are problems, it is probably my fault and not his.

  • Download this file – OCREvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
  • Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
  • Right-click on the folder again and select More and then Attach a Folder Action. Select the OCREvernote script that you just saved

Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the AppleScript will go through that folder, OCR each file with Acrobat, and then add the files into Evernote.

Do you use the ScanSnap with Evernote? Do you have any other methods of making PDFs searchable? Or do you not bother? Leave a message in the comments.

Abbyy Finereader and Adobe Acrobat – Why Does Fujitsu Include Both?

I have received a number of questions recently about the software that is included with the Fujitsu ScanSnap. For example, why does the ScanSnap come with both Abbyy FineReader and Adobe Acrobat? Aren’t they both for doing OCR?

I suspect part of the reason this question comes up is my posts about my ScanSnap workflow and my Adobe Acrobat OCR AppleScript. Is all that necessary?

Let me start by saying that I personally have the ScanSnap S300M. The S300M comes with neither Abbyy FineReader nor Adobe Acrobat. If you have the S1500 or S1500M, your scanner will come with both, and doing OCR is much more integrated than with the S300M, so my post-scan processing fun may not be necessary.

So What’s The Difference?

The ScanSnap comes with a special version of Abbyy FineReader called FineReader for ScanSnap. They’ve integrated that with ScanSnap Organizer, so if you are using the built-in automatic OCR’ing, that is what it is using.

If all you care about is having your PDFs searchable and don’t mind performing the OCR right after scanning, then the supplied FineReader is probably all you need.

To my mind, there are basically two main reasons why you will want to use Adobe Acrobat:

  • You want to do PDF editing after the fact
  • You want to batch your OCR after the fact

PDF Editing

So you have your scanned PDF. Now what? If you want to remove/rearrange pages and do a whole ton of other editing functions, Acrobat is a great tool. It is most definitely not just for making a PDF searchable.

You can find more information on Adobe Acrobat 9 (included with the ScanSnap S1500) and Acrobat 8 (included with the ScanSnap S1500M). Judging by the price, it’s a pretty good deal that this software is included with the ScanSnap.

Batch OCR

If you have a whole bunch of documents to scan in, it may be annoying to scan, sit there and wait for it to OCR, scan, OCR, scan, OCR, and so on. Some people prefer to scan all their documents to PDF in one shot, and then OCR them all at once. You can use Acrobat to do that instead of the included FineReader.

So there you have it, some of the differences between the two. What are some of the reasons you use one over the other?

My ScanSnap Setup And Workflow – Post Scan Processing

Update: This post is now slightly out of date as I now use the ScanSnap S1300. You may want to sign up for my free 7 part e-Course, which will take you through the steps to go paperless more comprehensively.

This is Part 2 of the My ScanSnap Setup And Workflow series. Make sure to check out Part 1: ScanSnap Settings.

Now that we have set up ScanSnap Manager with my four profiles, here is what I do with the files.

At first I started using DevonThink Pro Office, but I found that it was a little overkill for my needs. If I had a huge number of documents that I needed regular access to, it would be perfect, but for my home needs I wanted to go with something a little more lightweight.

One main drawback (maybe the only one) is that my ScanSnap S300M did not come with any OCR software in the box. I could download a form to have ReadIris Pro 11 mailed to me, but that didn’t help me at first.

Luckily, I had Adobe Acrobat Professional already, so I decided to use that for my ScanSnap workflow. If you have the ScanSnap S1500 or S1500M, your scanner will come with Acrobat already.

Really Boring, Really Fast

Excited to start OCR’ing up a storm, I set ScanSnap Manager to output to Acrobat and away I went.

It worked quite well. I would hit the button, it would open the resulting file in Acrobat, and then I would go Document | OCR Text Recognition | Recognize Text Using OCR and follow the resulting menus.

I think this would be OK normally, but since I had a ton of things to scan from my file cabinet, this got really boring, really fast to have to sit there and manually OCR every document over and over again. I knew there had to be a better way.

AppleScript To The Rescue

I am a complete AppleScript newbie, but I found this great post from Macworld where the author made an AppleScript Folder Action that would watch a certain folder, and when a document got put in it, it would kick off Acrobat (or ReadIris Pro) and OCR it automatically.

I set ScanSnap Manager to save to a folder called ToProcess and gave that folder a Folder Action to run the Macworld script.

This worked quite well, and would possibly work OK on an ongoing basis, but again I ran into problems when doing my massive scan-a-thon – if I dropped a document into the folder while the other Acrobat session was still OCR’ing, it would give error messages.

Droplets Are Fun

The solution I came up with was to change the script so that it became a droplet. To do this I ripped off part of the script referenced in this thread.

A droplet is just an Application that you save somewhere (I have it on my Dock). You run it by dragging a file onto its icon.

Here is the script that I cobbled together. Feel free to download and use as you please.

OCRit – Droplet to kick off batch OCR of PDFs using Adobe Acrobat

Final Workflow

So now, I have the following workflow:

  • Scan document using the ScanSnap, ScanSnap Manager saves the file in the ToProcess folder
  • When I am done with my batch, I drag the PDF files onto the OCRit icon, which kicks off Adobe Acrobat Professional and tells it to recognize the text in the document
  • When that is done, process/move the files as needed
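The point of the batch step above is that files get OCR’ed strictly one after another, so two OCR jobs never collide. Here is a minimal Python sketch of that idea. It is not the AppleScript droplet itself: `ocr_one` is a hypothetical callable standing in for whatever does the real OCR (Acrobat, in my case), and `process_batch` and `done_folder` are names I made up for illustration.

```python
import shutil
from pathlib import Path


def process_batch(pdf_paths, ocr_one, done_folder):
    """OCR each PDF in turn, then move it into `done_folder`.

    `ocr_one` is a hypothetical stand-in for the real OCR step. Files are
    handled one at a time, so OCR jobs never overlap.
    Returns (succeeded, failed) lists of file names.
    """
    done = Path(done_folder)
    done.mkdir(parents=True, exist_ok=True)
    succeeded, failed = [], []
    for path in map(Path, pdf_paths):
        try:
            ocr_one(path)
        except Exception:
            # Leave the file where it is so it can be retried later.
            failed.append(path.name)
            continue
        shutil.move(str(path), str(done / path.name))
        succeeded.append(path.name)
    return succeeded, failed
```

Anything that fails stays in the original folder, so you can simply drop it on the batch again later.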

It is working quite well for me, but I guess if I wanted to avoid all this I could have just stuck with DevonThink, as it has built-in OCR. What is your workflow? How do you handle the Optical Character Recognition part?

Document Storage: The Yahoo or Google Philosophy?

Once you have your documents scanned you then of course need to put the PDFs somewhere.

There are basically two schools of thought for document storage: storing in a folder structure (the Old Yahoo model), or dumping it all in one place and letting search take over (the Google model).

Old Yahoo Model – Folders

If you have been around the Internet for a long time, you will remember that Yahoo started out as strictly a directory, where websites would get placed in a hierarchical category structure.

The folder model of document structure is sort of like that. Set up an elaborate folder structure, and when it is time to file away a document, you figure out which folder it should go in.

The advantages of this are that you don’t need any third party application, and the folder concept is something that we have used for years and everyone understands.

The downside is that you then have to figure out which folder the file goes into, and when you are looking for a document, you have to go through and figure out where you saved it.

It also takes regular processing to go through and move the files to the right place in your structure.
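To make the filing step concrete, here is a minimal Python sketch that picks a folder based on keywords in a file name. The rules, folder names, and the `choose_folder` function are all made up for illustration; your own structure will look different.

```python
# Hypothetical rules: keyword found in the file name -> destination folder.
FILING_RULES = [
    ("invoice", "Finances/Invoices"),
    ("statement", "Finances/Bank"),
    ("manual", "Household/Manuals"),
]


def choose_folder(filename, rules=FILING_RULES, default="Unsorted"):
    """Return the folder for `filename` using the first matching rule."""
    lowered = filename.lower()
    for keyword, folder in rules:
        if keyword in lowered:
            return folder
    return default
```

Anything that matches no rule lands in a catch-all folder, which is exactly the pile you still have to sort through by hand.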

The Google Model – Search

Google’s advantage (among many) is that it didn’t have to rely on people putting websites in certain categories, and it didn’t rely on searchers knowing which category a site was filed under. Users could just type in a keyword and as long as Google indexed the site, it would show the result.

With the search model of document storage, PDFs are dumped in one or just a few folders, and then when you want to find something, you just do a keyword search to bring back documents containing that keyword.

This can be a very effective model as long as PDFs are consistently OCR’ed so that they are searchable, and you know what you are looking for.

Once you have a collection of searchable PDF files, you can use Windows Desktop Search, Google Desktop, or Spotlight on the Mac to search through the documents and find the right one.

You can also take it to the next level and use software like Yep, Evernote, DevonThink, or OneNote to collect and store your documents and do the searching inside it.

The downside of using the search model is, as I said, you have to know what you are searching for before you search. It may be hard to remember certain keywords from the document.

Also, if you are searching for fairly generic keywords, your search may bring back a ton of results, making it a pain to wade through them.
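To show the idea behind the search model, here is a small Python sketch of a keyword index. It assumes the text has already been pulled out of each OCR’ed PDF (the sketch does no PDF parsing), and the function names are my own. It is not how Spotlight or Google Desktop work internally.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it.

    `documents` is a dict of {name: extracted_text}.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)
```

A generic keyword matches lots of documents, while adding a second keyword narrows the list, which is why it helps to remember something specific from the document.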

Which Model Do You Use?

Personally, I use a hybrid.

I do have a folder structure but I try to keep things high level without too many subfolders. I then make sure that documents are searchable by OCRing them once my Fujitsu ScanSnap has done its job.

When I am looking for a document, I generally use the search method because that is how I am used to finding information. It’s just nice to know that the folder structure is there as a backup.

What setup do you have for saving/finding your scanned PDFs?

10 Tips For Achieving Paper Zen

Photo by unimatrixZxero

Many people dream of the mythical “paperless office”. While these tips aren’t going to take you all the way there (and I don’t think anything truly will), they will take you a long way towards making friends with paper again.

1. Switch to paper-free options when possible. Wherever you can, stop the paper from coming in in the first place. Many banks and vendors will let you switch to online statements and bills, and you can pay your bills online via your bank’s website instead of writing a check.

2. Get a scanner with automatic document feed and duplexing. If you try to go paperless (or even just less paper) with a flatbed scanner, chances are you are eventually going to find it a pain. A scanner (like a Fujitsu ScanSnap) that lets you put in a stack of paper and automatically scans both sides in with a push of a button will make life much easier.

3. Scan/process/shred right away. If you let things pile up too much, it becomes a chore and you won’t want to do it. Try to throw your document in the scanner/shredder right when you get it.

4. Have everything close at hand. Stolen from GTD, you are more likely to process everything right away and correctly if all your equipment, file folders, and other processing materials are right there at arm’s length. If you have to walk to do something, you probably won’t.

5. Get buy-in from family/colleagues. Nothing is worse than coming up with a great system to reduce paper use, but your spouse or co-worker keeps on with their hoarding and filing ways. Try to involve them in designing and implementing the new process so they have buy-in right from the start and it is “theirs” too.

6. Choose a folder/filename system that makes sense to you. Sure, you know what the receipt for your new USB turntable is now, but if you see a23422add.pdf in My Documents next year, will you know what it is without opening it up? Come up with a folder and naming system and stick to it.

7. Make your PDFs searchable. Similar to #6, don’t just scan things to a PDF image. Use your scanning software to make the PDF searchable. That way you can find it later just by doing a Spotlight or Google Desktop search.

8. Be careful what you scan & shred. Like this guy says, don’t get too carried away with what you scan and shred. Other people (like your girlfriend) might not be quite as impressed with your mad paperless skillz.

9. Combine the process with something else. If you don’t have the discipline to do #3 right away, try to combine your processing with something else. If you have a laptop and a portable scanner, do your scanning in a batch while watching the football game or something.

10. Automate backups. Nothing about a system like this will cause you more stress than knowing you are one hard drive failure away from disaster. Put yourself in paper zen mode by knowing that all your data is safe and secure. Use a backup system and make it automated so that you don’t even need to think about it.
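For tip #6, here is one way a naming system could look, as a minimal Python sketch. The date_source_description pattern, the `make_filename` function, and the example store are all assumptions for illustration; pick whatever pattern makes sense to you.

```python
import re
from datetime import date


def make_filename(doc_date, source, description):
    """Build a name in the form '<date>_<source>_<description>.pdf'."""
    def slug(text):
        # Lowercase, and collapse anything that is not a letter or digit
        # into single hyphens.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    return f"{doc_date.isoformat()}_{slug(source)}_{slug(description)}.pdf"
```

For example, `make_filename(date(2009, 3, 14), "Example Store", "USB Turntable Receipt")` gives `2009-03-14_example-store_usb-turntable-receipt.pdf`, which is a lot easier to recognize next year than a23422add.pdf. Putting the date first also means the files sort chronologically.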

Do you have any other tips for achieving “paper zen”? Share in the comments.
