A few months ago, Ernest Svenson from PDF For Lawyers wrote a very interesting piece: When should you OCR documents? A quick primer.
His premise is that you should be very selective about the documents that you make searchable by applying Optical Character Recognition.
> Many people know that OCR stands for ‘optical character recognition,’ or if they don’t know that then they know that OCR is what you do to a scanned document to make it text-searchable. When you buy a new scanner like the Fujitsu ScanSnap it’ll come with OCR software, and most people get the idea that they should OCR all documents that they scan. I don’t recommend this, and don’t know many “paperless experts” who recommend it.
He makes an interesting case: performing OCR takes too long and takes up more space, and most of the time we don’t need to search the stuff that we scan anyway, so why waste the time?
I have to admit, I take a different approach. I do tend to OCR almost everything. I have a few reasons why:
- I tend to “pre-organize” my documents so that I can scan like types together. This way, I don’t usually have the problem of having to wait for the OCR to finish before scanning the next document. I just put the stack in and hit go.
- I don’t personally find that it takes all that long to OCR.
- Storage is getting cheaper and cheaper, and if my PDFs are a little larger, that is not something I personally worry about too much.
- I find that you never know what you will need to find until you need to find it. I prefer to err on the side of making documents more findable and not less.
- The biggest reason is: I don’t like having to make decisions about this sort of stuff. Every decision you have to make while scanning is one more opportunity for things to fall off the rails. I prefer to “set it and forget it”.
As Ernest points out, some of this can be mitigated by doing batch OCR in ABBYY FineReader or in Acrobat.
I am not a lawyer, so it could be that the paper volume and time associated with a legal office makes selective OCR more important. Having OCR mostly on works well for me, but each situation is of course different.
How about you? Do you pick and choose what you OCR?
(Photo by orangeacid)