Mozy Asks... How Much Is A Petabyte?
July 30, 2009

Did you know that datacenters worldwide use up as much energy as Sweden?
Neither did I, but it is one of the many things I learned from this awesome post that Mozy published a few weeks ago, which pulls together a bunch of stats to show how much data a petabyte actually is.
How Much Is A Petabyte? – Mozy.com
Evernote Premium Now Makes PDFs Searchable
July 28, 2009

Well, that didn’t take long. Just 13 days after my post about making PDFs searchable before uploading to Evernote, they went ahead and added that feature for Premium users yesterday.
Starting now, if you are a Premium user and you upload a PDF, Evernote will OCR it on the backend and make it searchable.
If you are a Premium user with non-searchable PDFs already uploaded, Evernote is in the process of going through and OCRing those too.
Before you ask, apparently if you upload a PDF that is already searchable, they won’t touch it.
Here’s a quick video about the new feature:
I have heard the lack of this feature mentioned as a drawback of Evernote for ages, so adding it is a great move on their part.
Unfortunately I am not a Premium user, so I can’t try this out (I probably should be though). Any Evernote Premium users out there want to give it a try and let us know how it works?
Making Acrobat OCR’ed PDFs Smaller With Formatted Text & Graphics
July 23, 2009
One complaint people have about the PDFs that Acrobat kicks out after OCR, whether you run it manually or via an Acrobat OCR AppleScript, is that the files can get really big.
There are a few solutions to this, but one of them is to change the PDF Output Style.
The default that Acrobat uses is called Searchable Image. What that does is place all the OCR’ed text etc. “behind” the image, so that when you view the PDF you are looking at the original image, but you can copy and search on the text.
However, there’s another setting. If you choose the PDF Output Style of Formatted Text & Graphics, Acrobat actually converts the image of the text into real text, formatted to match whatever style was there before.
I did a simple test this morning and here is what I found:
- Scanned Document before OCR: 312K
- OCR with Acrobat Searchable Image: 940K
- OCR with Acrobat Formatted Text & Graphics: 60K (!)
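To put those numbers side by side, here is a quick Python sketch that works out each file size relative to the original scan. The sizes are the ones from my test above; the dictionary keys and the function name are just labels I made up for illustration.

```python
# File sizes in KB, taken from the simple test described above.
# The key names are illustrative labels, not Acrobat terminology.
sizes_kb = {
    "scan_before_ocr": 312,
    "searchable_image": 940,
    "formatted_text_graphics": 60,
}


def ratio_to_baseline(sizes, baseline_key):
    """Return each size divided by the size stored under baseline_key."""
    baseline = sizes[baseline_key]
    return {name: size / baseline for name, size in sizes.items()}


ratios = ratio_to_baseline(sizes_kb, "scan_before_ocr")

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the original scan")
```

In this one test, Searchable Image came out at roughly three times the size of the original scan, while Formatted Text & Graphics was about a fifth of it. Your results will vary with the document.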
To change Acrobat to FT&G, here is what you do:
- Go to Document -> OCR Text Recognition -> Recognize Text Using OCR…
- Click the Edit button

- In PDF Output Style, change to Formatted Text & Graphics
- Hit OK
Acrobat will now use Formatted Text & Graphics, and should keep that setting for your future scans too.
What’s The Catch?
As with anything, there is a downside. Acrobat does its best to make the text look like what was there before, but it is not perfect. Also, because the recognized text replaces the image, anything that is mis-OCR’ed will actually show up in the document.
It depends on what your objectives are. If you want to have the exact replica of what you are scanning, you’ll probably want to use Searchable Image.
However, if size is your main concern and you just want to have a fairly-faithful representation, Formatted Text & Graphics may be the way to go.
Do you have any other tricks for making PDFs smaller?
Admin: Trying out Tweetboard – Let’s Chat!
July 21, 2009
You might notice over on the left of the site there is a new feature of DocumentSnap. I’m trying out a service called Tweetboard.
I’ve always wanted to introduce a way for DocSnap readers to talk to one another and ask each other questions (other than comments of course), but a whole forum seemed like a bit of overkill.
So, I’m introducing Tweetboard which uses Twitter to create a threaded message board.
To use it, just click on the box over on the left side of the screen:

When that opens up, you can see the message board, and you can post new questions/messages or reply to existing ones.

Of course, you can follow documentsnap on Twitter (which I am going to start using more), and there is my personal feed as well.
Feel free to give it a try and let me know what you think.
ScanSnap Evernote Giveaway
July 15, 2009

Yes, I do realize that this is the second ScanSnap Evernote post in a row, but I had somehow not blogged about the contest that Evernote is running in July, in which they are giving away 4 Fujitsu ScanSnap S300 scanners.
Here is how to enter from their blog post:
* Follow @evernote and @ScanSnapIt on Twitter
* Send a public Twitter message to @evernote containing the hashtag #evernote_scansnap
Tweet once per person per week to be entered into that week’s drawing.
The first draw was already held on July 10, and you can view the awesomely high-tech draw below. There is still time to win the other scanners on the 17th, 24th, and 31st.
Good luck!
OCR Your ScanSnap PDF Before Sending It To Evernote
July 14, 2009
Update: Of course, a few days after I posted this, Evernote announced that they would make PDFs searchable for Premium users. So if you are not a Premium user, this will help. Otherwise, just upload away.
One of the most popular posts on this site is on how to use the Fujitsu ScanSnap with Evernote. It describes how to set up a profile in ScanSnap Manager to send the resulting PDF to Evernote.
There is one problem with doing it this way – Evernote does not OCR PDFs. I assume they’ll be fixing this someday, but for now, if you want your document searchable within Evernote, you need to OCR it before sending it into Evernote.
How you do this depends on which ScanSnap model you have, and whether you are on Windows or a Mac.
ScanSnap For Windows
If you have the ScanSnap S300, S510, or S1500, your solution is pretty simple.
What we’re going to do is set Evernote to watch a folder so that it automatically imports anything it finds there, and then set up ScanSnap to save files to that folder.
- In Evernote, go to File -> Import -> File Import Wizard
- Hit Next and select the Source folder that you want Evernote to watch and set your notebook
- Choose “Watch folder for changes and import files automatically”
Now set up ScanSnap normally to scan to that folder you just selected, and whatever files you save into that folder will be grabbed by Evernote.
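If you are curious what a "watch folder" boils down to, here is a rough Python sketch of the idea: check a folder every few seconds and hand any PDF you have not seen before to some import step. This is only an illustration of the concept, not how Evernote actually implements it. The function names, the `handle` callback, and the polling interval are all assumptions of mine.

```python
import time
from pathlib import Path


def find_new_pdfs(watch_dir, seen):
    """Return PDFs in watch_dir whose names are not in `seen`, and record them."""
    new_files = []
    for path in sorted(Path(watch_dir).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(watch_dir, handle, interval_seconds=5.0, max_polls=None):
    """Poll watch_dir and call handle(path) once for each new PDF.

    `handle` is a hypothetical stand-in for whatever does the import.
    With max_polls=None the loop runs until the program is stopped.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for pdf in find_new_pdfs(watch_dir, seen):
            handle(pdf)
        polls += 1
        time.sleep(interval_seconds)
```

Calling something like `watch("scans", print)` would just print the path of each new PDF as it shows up in a folder named `scans`. Polling is the simplest approach to sketch; a real client may well use the operating system's file change notifications instead.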
ScanSnap S510M or S1500M For Mac
For whatever reason, Evernote for the Mac does not have the Watch Folder functionality that the Windows client does (why not, Evernote?!). However, thanks to the magic of AppleScript, we can do the same thing.
This will work for the ScanSnap S510M or S1500M.
- Download this file – AddToEvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
- Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
- Right-click on the folder again and select More and then Attach a Folder Action. Select the AddToEvernote script that you just saved
Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the AppleScript will go through that folder and add the files into Evernote. Handy!
ScanSnap S300M For Mac
For whatever reason (I say that a lot), the ScanSnap S300M does not come with OCR software (why not, Fujitsu?!).
However, we’re in luck. Awesome DocumentSnap reader Sebastian Poll wrote this AppleScript that will use Adobe Acrobat to automatically OCR the PDF and then kick it straight into Evernote.
Obviously, it requires Acrobat. If you don’t have Acrobat, you can use whatever method you currently use to OCR and then use the AddToEvernote script above to import the result.
Note that Sebastian’s version was actually written with some of the code in German. I changed it to English, so if there are problems, it is probably my fault and not his.
- Download this file – OCREvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
- Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
- Right-click on the folder again and select More and then Attach a Folder Action. Select the OCREvernote script that you just saved
Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the AppleScript will go through that folder, OCR the files with Acrobat, and then add them into Evernote.
Do you use the ScanSnap with Evernote? Do you have any other methods of making PDFs searchable? Or do you not bother? Leave a message in the comments.

