Document Storage: The Yahoo or Google Philosophy?

Once your documents are scanned, you of course need to put the PDFs somewhere.

There are basically two schools of thought for document storage: filing documents in a folder structure (the Old Yahoo model), or dumping them all in one place and letting search take over (the Google model).

Old Yahoo Model – Folders

If you have been around the Internet for a long time, you will remember that Yahoo started out as strictly a directory, where websites would get placed in a hierarchical category structure.

The folder model of document structure is sort of like that. Set up an elaborate folder structure, and when it is time to file away a document, you figure out which folder it should go in.

The advantages of this are that you don’t need any third-party application, and the folder concept is something that we have used for years and everyone understands.

The downside is that you then have to figure out which folder the file goes into, and when you are looking for a document, you have to go through and figure out where you saved it.

It also takes regular processing to go through and move the files to the right place in your structure.
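That regular processing step can be partly automated. Here is a minimal sketch in Python, assuming your scans land in an inbox folder and that a keyword in the file name is enough to pick the destination. The rules, folder names, and function names are all made up for illustration.

```python
from pathlib import Path
import shutil

# Hypothetical rules: keyword in the file name -> destination subfolder.
RULES = {
    "invoice": "Finances/Invoices",
    "receipt": "Finances/Receipts",
    "manual": "Household/Manuals",
}
DEFAULT_FOLDER = "Unsorted"  # anything that matches no rule lands here


def pick_folder(filename: str) -> str:
    """Return the folder of the first rule whose keyword appears in the name."""
    lowered = filename.lower()
    for keyword, folder in RULES.items():
        if keyword in lowered:
            return folder
    return DEFAULT_FOLDER


def file_away(inbox: Path, archive: Path) -> list[Path]:
    """Move every PDF in `inbox` into its folder under `archive`."""
    moved = []
    # sorted() takes a snapshot of the listing before any file is moved.
    for pdf in sorted(inbox.glob("*.pdf")):
        target_dir = archive / pick_folder(pdf.name)
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / pdf.name
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

Calling `file_away(Path("Inbox"), Path("Documents"))` would sort whatever is waiting in the inbox. Anything that doesn't match a rule ends up in the "Unsorted" folder, so you still have to decide on those yourself.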

The Google Model – Search

Google’s advantage (among many) is that it didn’t have to rely on people putting websites in certain categories, and it didn’t rely on searchers knowing which category a site was filed under. Users could just type in a keyword and as long as Google indexed the site, it would show the result.

With the search model of document storage, PDFs are dumped in one or just a few folders, and then when you want to find something, you just do a keyword search to bring back documents containing that keyword.

This can be a very effective model as long as PDFs are consistently OCR’ed so that they are searchable, and you know what you are looking for.
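To show why the OCR step matters, here is a rough sketch of what a search tool does behind the scenes: it builds an index from each word to the files that contain it. It assumes the OCR text has already been pulled out of each PDF into a plain string. The file names and sample text are invented, and real desktop search tools are far more sophisticated.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(documents: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of file names whose text contains it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return dict(index)


def search(index: dict[str, set[str]], keyword: str) -> list[str]:
    """Return the file names containing `keyword`, in alphabetical order."""
    return sorted(index.get(keyword.lower(), set()))


# Invented sample text standing in for the OCR layer of three scans.
documents = {
    "scan001.pdf": "Invoice from Acme Plumbing for water heater repair",
    "scan002.pdf": "Warranty card for the water heater",
    "scan003.pdf": "Vet receipt, annual checkup",
}
index = build_index(documents)
print(search(index, "heater"))   # ['scan001.pdf', 'scan002.pdf']
print(search(index, "invoice"))  # ['scan001.pdf']
```

A PDF that was never OCR'ed has no text to feed into the index, so no keyword will ever find it.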

Once you have a collection of searchable PDF files, you can use Windows Desktop Search, Google Desktop, or Spotlight on the Mac to search through the documents and find the right one.

You can also take it to the next level and use software like Yep, Evernote, DEVONthink, or OneNote to collect and store your documents and do the searching inside it.

The downside of using the search model is, as I said, you have to know what you are searching for before you search. It may be hard to remember certain keywords from the document.

Also, if you are searching for fairly generic keywords, your search may bring back a ton of results, making it a pain to wade through them.
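Two common ways to tame a generic search are to add a second keyword and to sort the results so the documents that mention your words most often come first. Here is a small sketch of both ideas, again assuming the OCR text is already available as plain strings. The sample documents and the scoring rule (a simple word count) are illustrative assumptions, not how any particular search tool ranks results.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def ranked_search(documents: dict[str, str], query: str) -> list[tuple[str, int]]:
    """Return (file name, score) for documents containing every query word.

    The score is the total number of times the query words appear, and
    results are sorted with the highest score first.
    """
    terms = re.findall(r"[a-z0-9]+", query.lower())
    results = []
    for name, text in documents.items():
        counts = word_counts(text)
        if terms and all(counts[term] > 0 for term in terms):
            results.append((name, sum(counts[term] for term in terms)))
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Invented sample text standing in for the OCR layer of three scans.
documents = {
    "scan001.pdf": "Statement: account statement for March, statement total due",
    "scan002.pdf": "Bank statement for the savings account",
    "scan003.pdf": "Insurance statement",
}
print(ranked_search(documents, "statement"))
# [('scan001.pdf', 3), ('scan002.pdf', 1), ('scan003.pdf', 1)]
print(ranked_search(documents, "statement account"))
# [('scan001.pdf', 4), ('scan002.pdf', 2)]
```

The one-word search brings back everything, while adding "account" drops the insurance document and keeps the most relevant scan on top.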

Which Model Do You Use?

Personally, I use a hybrid.

I do have a folder structure, but I try to keep things high level without too many subfolders. I then make sure that documents are searchable by OCRing them once my Fujitsu ScanSnap has done its job.

When I am looking for a document, I generally use the search method because that is how I am used to finding information. It’s just nice to know that the folder structure is there as a backup.

What setup do you have for saving/finding your scanned PDFs?

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

Comments

Dallee - April 15, 2009

There is a great post on tagging files in Windows — the comments also give a Mac equivalent — here:

This is the direct link for the Windows program:

Tagging would go along with the Google approach …

I'm starting with using a file structure just for the sake of convenience and not having to learn anything new in addition to mastering scanning. And I've really appreciated this website for making getting underway much easier!

    Brooks Duncan - April 16, 2009

    Hey thanks a lot for that tip Dallee, I will play around with that program. Good pointer!

Brooks Duncan - December 1, 2008

Hey Brian,

I personally use a copy of Acrobat that I already had. I have ReadIris Pro 11 too, but I don’t use that because you can’t use AppleScript with it (that I know of).

There is more about how I actually do things here:

Brian Dusablon - December 1, 2008

What do you use for OCR?
