Google Drive OCR For Searchable PDFs

You have almost certainly heard that Google has finally released its long-rumored file storage service, Google Drive.

The service does a lot of Dropbox-y things, but I thought I would focus on one of the advertised features that is relevant to going paperless: the ability to OCR uploaded documents.

Note that this is not a comprehensive runthrough of Google Drive and I am not going to talk about privacy issues. As with any cloud service, you should be familiar with the company’s policies and decide for yourself whether you are comfortable with them having your data on their servers.

Setting Up Sync

Once you have enabled your Google Drive account, you can download the Mac or Windows client. During installation, it creates a “Special Folder” on your computer.

Google Drive Special Folder

If you are an existing Google Docs user, you may want to click the Advanced setup button on the next screen. There you can configure which of your existing Google Docs documents will get synchronized to your computer automatically.

Google Drive Advanced Setup

Accessing Your Drive

Once the Google Drive application is installed and running, you will then have a special Google Drive folder. Some files may have been automatically copied to it.

If you already have “non-Google Docs” files such as PDFs saved in Google Docs, they will have been downloaded. Depending on your settings, the client may also have downloaded some .gdoc files. When you double-click one of those, you’ll be taken to your Google Drive on the web.

Uploading Documents To Google Drive

Uploading documents to Google Drive is as easy as copying them to the Google Drive folder on your computer. They’ll automatically be uploaded.
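
Since uploading is just a file copy, it is easy to script. Here is a minimal Python sketch, not an official Google tool, that copies every PDF from a scan folder into the local Google Drive folder. The function name `copy_pdfs_to_drive` and the folder paths in the usage comment are assumptions for illustration; point them at wherever your scanner saves files and wherever the Drive client created its folder.

```python
import shutil
from pathlib import Path


def copy_pdfs_to_drive(source_dir, drive_dir):
    """Copy every PDF in source_dir into the local Google Drive folder.

    Returns the list of destination paths that were written.
    """
    source = Path(source_dir).expanduser()
    drive = Path(drive_dir).expanduser()

    # Refuse to guess: if the sync folder is missing, the Drive client
    # is probably not installed or uses a different location.
    if not drive.is_dir():
        raise FileNotFoundError(f"Google Drive folder not found: {drive}")

    copied = []
    for pdf in sorted(source.iterdir()):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            target = drive / pdf.name
            shutil.copy2(pdf, target)  # copy2 also preserves timestamps
            copied.append(target)
    return copied


# Example usage (both paths are assumptions; adjust to your setup):
# for path in copy_pdfs_to_drive("~/Scans", "~/Google Drive"):
#     print("Queued for upload:", path)
```

The script only drops files into the folder. The Google Drive client still does the actual uploading, so the files will show the sync arrows until it finishes.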

While they are uploading, the files show blue arrows on their icons. For some reason, when I dragged in two files at once, they were stacked on top of each other and I couldn’t separate them.

Google Drive Stacked Docs

Once they were uploaded, they had green checkmarks and I was good to go.

Google Drive Synced

Google Drive OCR

The PDFs that I copied to Google Drive were not searchable, but when I logged in to the web interface and typed a phrase from the PDF, it found it right away.

Google Drive Found It

When I clicked on the document, I could see that it was now searchable, and I could find and highlight text.

Google Drive Document Searchable

Unfortunately, when I downloaded the PDF, it was the original non-searchable PDF. The text was not embedded. Oh well, can’t win them all.

Google sets limitations on what type of PDFs can be OCRed. You can read about them here.

OCR Quality

To test OCR quality, I ran the same test as in my OCR Smackdown post. Here are the results, with the OCR errors left as-is:

The spreadsheet has become the virtual “slide rule” for CMAS. It’s used for everything from preliminary strategic plans to ñnancial statements. As with any familiar method, it finds its way into numerous situations where better alternatives are available, rnost significantly in its widespread use as a de facto reporting tool.
The appeal of the spreadsheet as the quickest Way to get a report out is not hard to appreciate. “Excel is probably the most comfortable environment for a lot of iìnancial professionals,” Alok Ajmera, vice­president, professional services with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.
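
If you want to put a rough number on results like these, you can compare the OCR output against a hand-corrected version. The sketch below is not how the test above was scored; it uses Python's standard `difflib` module on one sentence from the result. The `reference_text` is an assumption (what the sentence presumably said in the original scan), and `similarity` and `misread_words` are made-up helper names.

```python
import difflib

# One sentence from the OCR result above, exactly as Google Drive returned it.
ocr_text = (
    "It’s used for everything from preliminary strategic plans "
    "to ñnancial statements."
)

# Hand-corrected version of the same sentence. This is an assumption
# about what the scanned page actually said.
reference_text = (
    "It’s used for everything from preliminary strategic plans "
    "to financial statements."
)


def similarity(ocr, reference):
    """Character-level similarity between 0.0 and 1.0 (1.0 = identical)."""
    return difflib.SequenceMatcher(None, ocr, reference).ratio()


def misread_words(ocr, reference):
    """Return (ocr_words, reference_words) pairs where the two texts differ."""
    ocr_words = ocr.split()
    ref_words = reference.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, ref_words)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            pairs.append((" ".join(ocr_words[i1:i2]), " ".join(ref_words[j1:j2])))
    return pairs


print(f"Similarity: {similarity(ocr_text, reference_text):.3f}")
print("Misread words:", misread_words(ocr_text, reference_text))
```

A character-level ratio is forgiving: a single misread word in a long sentence barely moves it. The list of misread words is usually the more useful output, since a search for “financial” would miss this document.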

Already Searchable Documents

This is a weird one. You would think that if you uploaded a document that was already searchable, Google Drive’s search would use that text. Unfortunately, no.

I am not sure if this is a bug or by design, but people are complaining that Google Drive can’t find text in documents that were OCRed prior to uploading. I was able to replicate the behavior.

I really hope they fix this, because it makes Google Drive’s search much less useful.

Wrap-Up

Google Drive’s OCR capabilities work more or less as advertised. Given some of the limitations, I am not sure that I would want to rely on the service for all my OCR or synchronization needs, but if you are already a heavy user of Google’s services and don’t already have a scanner with OCR capabilities, it is certainly better than nothing for the price.

Any Google Drive users out there? How do you like it for document storage?
