Using Microsoft Office Document Imaging To OCR For Free

If you are a Windows user and already have any version of Microsoft Office from XP through 2007, chances are you already have the ability to OCR documents and get the text out of them.

It’s called Microsoft Office Document Imaging (MODI). I’m not going to lie: what I am about to show you is not exactly the best way to OCR documents. If you have software that came with your scanner, I’d stick to that.

However, if you don’t already have OCR software and all you want to do is get some text out of an image, the software you already have is better than nothing at all.

Finding Microsoft Office Document Imaging

First, you want to check to see if you already have it installed. In Office 2007, go to Start > Programs > Microsoft Office > Microsoft Office Tools, and you should see Microsoft Office Document Imaging.

If you don’t see it there, never fear. It’s an optional part of the Office install. In Control Panel, go to Add/Remove Programs, select Microsoft Office, click Change, and then select Add or Remove Features. You will find MODI under Microsoft Office Tools. Install it and you should be good to go.
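
If you would rather check from a script than dig through the Start menu, here is a minimal Python sketch. It assumes that MODI registers the COM ProgID "MODI.Document" when it is installed; that ProgID and the function name modi_is_registered are my assumptions, not something the Office installer documents in this article. On anything other than Windows it simply reports False.

```python
import sys

# Assumption: MODI registers this ProgID for COM automation when installed.
MODI_PROGID = "MODI.Document"


def modi_is_registered(progid: str = MODI_PROGID) -> bool:
    """Return True if the given COM ProgID exists under HKEY_CLASSES_ROOT."""
    if not sys.platform.startswith("win"):
        return False  # MODI only exists on Windows
    import winreg  # standard library, available on Windows only

    try:
        # Opening the key succeeds only if the ProgID is registered.
        with winreg.OpenKey(winreg.HKEY_CLASSES_ROOT, progid):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("MODI installed:", modi_is_registered())
```

If this prints False on a machine that has Office, the Add or Remove Features route described above is the place to look.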

Ah Microsoft, I Love You

It probably won’t surprise you to learn that Microsoft Office Document Imaging will not import PDFs (why would they support an Adobe product?!). It will only import TIFFs and Microsoft’s own Microsoft Document Imaging format (.MDI).
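
If you have a folder full of mixed files, a tiny helper can sort out which ones MODI will open directly. This is just a sketch based on the formats named above (TIFF and MDI); the function name modi_can_import is made up for illustration, and it only looks at the file extension, not the file contents.

```python
from pathlib import Path

# Formats this article says MODI can import: TIFF and Microsoft's own MDI.
MODI_IMPORTABLE = {".tif", ".tiff", ".mdi"}


def modi_can_import(filename: str) -> bool:
    """Return True if the file extension is one MODI can open directly."""
    return Path(filename).suffix.lower() in MODI_IMPORTABLE


if __name__ == "__main__":
    for name in ["scan.TIF", "report.pdf", "page.mdi"]:
        print(name, "->", modi_can_import(name))
```

Anything that comes back False (a PDF, for instance) needs the copy and paste workaround.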

In this example, I’m going to assume that we want to get the text out of a PDF that has not been OCR’ed already. Sure, you could use MODI to scan a document in, but I figure if you have the hardcopy document and a scanner, you’d probably just use the scanner’s software anyway.

Copying A PDF In

Since we can’t actually import a PDF, we’re going to do some copy & paste magic.

Open up your PDF in Acrobat Reader or whatever PDF reader you are using and either Select All or select just the portion you want to OCR. Then hit Copy.

Select Info In PDF

(By the way, that’s my picture of a Fung Wah bus that made it into New York Magazine. Aren’t you proud of me?).

Then switch to MODI, and you would think you would go to Edit > Paste, right? Of course not! This is Microsoft!

Instead, go to Page and then Paste Page. Voila, the image you just copied is now in Microsoft Office Document Imaging.

Saving The Text

So now that you have the image in MODI, what do you do with it? To OCR the text, go to Tools and then Recognize Text Using OCR.

You can then save it as a TIFF (though I understand that only MODI can read that TIFF) or as an MDI. Since that is more than a little useless, I’m going to cover sending the text to Word.

Send Text To Word

To send the text (and graphics, if you’d like), go up to Tools and then Send Text to Word. The OCR’ed text will then appear in a Word document, with all the images at the bottom if you checked the “Maintain Pictures in Output” box.
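
If you need to do this for more than a handful of files, the OCR step can in principle be scripted through MODI's COM interface instead of clicking through the menus. The sketch below is a rough outline, not a tested recipe: it assumes Windows, an Office install with MODI, and the third-party pywin32 package, and it assumes the MODI object model exposes Create, OCR, Images, Layout.Text and Close as used here, with 9 as the English language constant. The function name ocr_with_modi and the dispatch parameter are my own; dispatch exists so the function can be exercised with a stand-in object on a machine without MODI.

```python
# Assumption: 9 is MODI's language constant for English (MiLANG_ENGLISH).
MI_LANG_ENGLISH = 9


def ocr_with_modi(image_path, dispatch=None, lang_id=MI_LANG_ENGLISH):
    """OCR a TIFF or MDI file through MODI's COM interface.

    Returns a list with the recognized text of each page.
    """
    if dispatch is None:
        # Needs Windows, Office with MODI installed, and the pywin32 package.
        from win32com.client import Dispatch as dispatch

    doc = dispatch("MODI.Document")
    doc.Create(image_path)  # load the TIFF or MDI file
    try:
        # Arguments: language, auto-orient the image, straighten the image.
        doc.OCR(lang_id, True, True)
        images = doc.Images
        return [images.Item(i).Layout.Text for i in range(images.Count)]
    finally:
        doc.Close(False)  # close without saving changes
```

On a machine with MODI, calling ocr_with_modi with the full path to a TIFF should give you back the text that Send Text to Word would otherwise put into a document. It will not open a PDF any more than the MODI window will.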

So, again, this is not the greatest OCR process in the whole world, but hey. If you’re a Windows user, you probably already have Office, so it’s good to know what is available if you ever need it.

Photo: Naufragio

8 Responses to “Using Microsoft Office Document Imaging To OCR For Free”

  1. zamir January 24, 2012 at 11:36 am #

    Good article. To see how to implement MS-Office programmatically check: http://zamirsblog.blogspot.com/2010/12/ocr-using-

    • BrooksD January 24, 2012 at 12:05 pm #

      Very nice zamir. Thanks!

  2. Sergio May 21, 2012 at 7:58 am #

    Thanks for this article. Great tip (To use Page>Page Paste instead of Edit>Paste).

    • BrooksD May 21, 2012 at 8:12 am #

      Great Sergio, glad it helped.

  3. Bigg Frank August 3, 2012 at 11:13 am #

    Great article, it's so nice to come to a site that explains things simply and fully. Thanks.

    • BrooksD August 3, 2012 at 11:20 am #

      Thanks Frank!

  4. shre October 18, 2012 at 3:33 am #

    thanks a lot!
