Sometimes you just want a way to get text out of an image, whether it is a PDF you’ve downloaded, a document you want to scan, or a photo from your digital camera or smartphone.
There are a number of ways to do this, but one free way on Windows is to use TopOCR.
As their site says, TopOCR was developed by a team of university mathematicians and it was primarily designed to handle documents captured by a digital camera or smartphone.
Input
For input, it will take image files in JPEG, TIFF, GIF, or BMP format.
If you have a scanner that supports TWAIN, it will interface with your scanner and scan right into TopOCR.
Since TopOCR will only take image files, if you have an existing PDF that you want to OCR, you have to do some backflips.
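If you are not sure whether a file is one of the image types listed above or something else (like a PDF), here is a minimal Python sketch that peeks at the first few bytes. This is not part of TopOCR; the function names `sniff_image_format` and `sniff_file` are made up for illustration, and it is only a rough check based on the well-known file signatures for these formats.

```python
from typing import Optional

# Leading "magic" bytes for the four formats mentioned in the article.
# The short "BM" signature makes the BMP check the least reliable one.
_SIGNATURES = [
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),  # little-endian TIFF
    (b"MM\x00*", "TIFF"),  # big-endian TIFF
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"BM", "BMP"),
]


def sniff_image_format(header: bytes) -> Optional[str]:
    """Return the format name if the header matches a known signature, else None."""
    for signature, name in _SIGNATURES:
        if header.startswith(signature):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Hypothetical helper: read the first bytes of a file and sniff its format."""
    with open(path, "rb") as handle:
        return sniff_image_format(handle.read(8))
```

A PDF starts with `%PDF`, so `sniff_image_format` returns `None` for it, which is your cue that you need the workaround described next.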
Performing OCR On An Existing PDF In TopOCR
You’re going to have to do some copy & paste action.
First, in Acrobat Reader, go to Edit > Preferences, click on General, and set “Use fixed resolution for Snapshot tool images” to 300 pixels/inch. Hit OK.
Now load up the PDF you want to OCR, and go to Tools > Select & Zoom > Snapshot Tool.
Highlight the part of the PDF you want to OCR, and the Snapshot Tool will copy it to the clipboard.
Now go to TopOCR and paste. In the Image Window, you’ll see your PDF snapshot. In the Text Window, it will automatically (and quickly!) OCR the text.
All this c&p stuff is only needed if you have an existing PDF that you want to OCR. If you have an image or a document in your TWAIN scanner, it’s not necessary.
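To get a feel for what the 300 pixels/inch snapshot setting means in practice, here is a small back-of-the-envelope sketch in Python. It assumes the snapshot covers a full US Letter page (8.5 x 11 inches) and that the image is held as uncompressed 24-bit color; both are my assumptions for illustration, and the function names are made up.

```python
from typing import Tuple


def snapshot_pixels(width_in: float, height_in: float, ppi: int = 300) -> Tuple[int, int]:
    """Pixel dimensions of a region of the given size in inches at a fixed resolution."""
    return round(width_in * ppi), round(height_in * ppi)


def uncompressed_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Rough in-memory size, assuming 3 bytes (24-bit color) per pixel."""
    return width_px * height_px * bytes_per_pixel


# Assumed example: a full US Letter page at 300 pixels/inch.
width_px, height_px = snapshot_pixels(8.5, 11)
print(width_px, height_px)                       # 2550 3300
print(uncompressed_bytes(width_px, height_px))   # 25245000
```

So a full-page snapshot is roughly 2550 x 3300 pixels, or about 25 MB uncompressed. Selecting only the part of the page you actually need keeps the pasted image smaller.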
Exporting The Text
Once you have the text the way you like it, you can save it as TXT or as a searchable PDF.
There is also an unusual feature: you can convert the text to speech and save it as an MP3.
There is a lot more to TopOCR, especially around the settings to clean up images from a camera. If you have a need for something like that, it is worth checking out, especially for the price (free!).
Do you have another free OCR solution that you use? Leave a note in the comments.