
Easily Split PDF Pages Using OS X Preview

Split PDF Axe

Back in 2010, I wrote a post on how to split a PDF on Mac OS X. I suggested two ways: dragging pages out of the PDF in Preview, or using Automator.

Eddie over at Practically Efficient has an even easier way that I can’t believe I didn’t think of.

First, you want to make sure that the Sidebar is exposed so that you can see all of the pages of your PDF.

Preview Sidebar

Then in the Sidebar, select the pages that you want to extract.

Preview Pages Selected

Hit ⌘-C or go to Edit > Copy to copy them to the clipboard.

Finally, hit ⌘-N or go to File > New From Clipboard.

Voilà, you’ll have a new PDF document with just the pages that you had selected. Great tip!

(Photo by Adam Baker)


GTD And Going Paperless – Filing Documents

GTD

This is a guest post from Alex Satrapa, a software engineer and Paperless Document Organization Guide interviewee.

Through DocumentSnap you’ve been introduced to the Paperless Workflow, which relates to organising documents. You may also have been introduced to the Getting Things Done system, which relates to taking action. The two can be used very effectively together: the Paperless step “Move To Correct Folder” is where the entire GTD workflow happens, with the various folders external to the GTD workflow mapping into the Document Storage block of the paperless workflow chart.

Let’s work through this process, the way it works in real life for one Paperless practitioner. This practitioner (“you”) has set up an automated workflow using Yep and Hazel. More about the details later.

Today, you receive a paper invoice from ACME Corp for your new rocket jetpack (because catching the roadrunner requires the proper tools). You work through the Paperless “Scan with Scanner” step: place the document in the ScanSnap’s feeder, press the button, and feed the document through the shredder when it comes out. The next couple of steps happen automatically: “Scan To Inbox Folder” and “Apply OCR”.

Now that we’re at the Paperless “Rename File” step, we also apply the GTD “What Is It?” step. On the computer you open up Yep, which is configured to show you all the documents in the Inbox folder. In Yep, you rename the invoice to “ACME – Invoice – 2011-04-01” because it’s from ACME, it’s an invoice, and the invoice is due by April 1st. You then tag it as “invoice”, “acme”, and “roadrunner” (because it’s related to the “roadrunner” project), add “Invoice for rocket jetpack” to the Spotlight comments field, and Hazel moves the document to the Pending folder (which is the “Move To Correct Folder” step).

You process the rest of your mail in a similar manner. All invoices have been deemed actionable, but deferred.

Aside: You defer invoices simply because you’ve made the decision to pay invoices in batches, since (for example) access to the payments interface of the Internet banking service requires an SMS authentication, which is only valid for ten minutes.

Having processed all your mail, the Inbox is now empty. With an empty Inbox, you can take a short break and marvel at the nice empty space you have on your desk.

Now it’s time to get to the “Pre-defined Work” part of the GTD flow. So you open up the Pending folder in Yep. There are a bunch of invoices that have been moved here by Hazel – filter them by selecting the “invoice” tag in Yep. Next, you open your Internet banking system and work through the process of paying each invoice. Let’s go through the process with the rocket jetpack invoice.

You open the rocket jetpack invoice. You’re using a Mac, so the document is opened in Preview.

You pay the invoice using the bank’s “BPay” system, and the bank gives you back a confirmation including a receipt number of 12345. Now you annotate the PDF with (for example) “Paid BPay 2011-03-20 #12345”. The important part here is to include the date, the payment method and the receipt number in case you have to look that information up later.

Now you save the changes to that PDF and close it. Since we’re using Yep, you can just add the “paid” tag, and Hazel will file that document away in the “Financial Year 2011” folder.

Here’s what we have done, using the GTD and Paperless workflow terms:

  • GTD: What is it? It’s an invoice from ACME Corp for a rocket jetpack, with 30 day terms, due April 1 2011.
  • Paperless: Rename File “ACME – Invoice – 2011-04-01”
  • GTD: Is it actionable? Yes, we need to pay it Soon™
  • GTD: What is the next action? Pay the invoice in the next batch of bill payments.
  • GTD: Defer It: we batch invoice payments together since the banking system requires authentication using an SMS code that is valid for a few minutes, or is otherwise a hassle to open every time we need to pay an invoice
  • Paperless: Move to the correct folder, “Pending”, which is part of our “Document Storage” box on the workflow
  • GTD: Processing is complete, work through “Pre-defined Work” from left to right
  • GTD: Pay the invoice and refile under the Reference system, which is still part of the Paperless “Document Storage”.

Yep and Hazel

Yep is a neat little app from Ironic Software. It’s basically a nifty tool intended to replace the Finder, for the purpose of finding things. Yep supports the OpenMeta tagging standard, which allows you to “tag” a document without playing silly tricks with the Spotlight comments field.

Hazel comes from Noodlesoft. It’s a very basic rules-based robot that monitors specified folders, and can perform actions on files when certain conditions are met. Hazel supports the OpenMeta standard.

The rules that were in place for this workflow are:

  • “Inbox” folder
      • “Unpaid Invoices”: if a file is tagged with “invoice”, move it to the “Pending” folder
  • “Pending” folder
      • “Paid Invoices”: if a file is tagged with “invoice” and “paid”, move it to the “FY 2011” folder
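
For readers who think in code, the two rules above can be sketched roughly as follows. This is only an illustration of the tag-based routing logic, not how Hazel works internally; the `RULES` table and the `destination` function are names made up for this sketch.

```python
from typing import Optional

# Each rule: (watched folder, rule name, tags that must all be present, destination).
# The folder and tag names come from the workflow above; the structure is an assumption.
RULES = [
    ("Inbox", "Unpaid Invoices", {"invoice"}, "Pending"),
    ("Pending", "Paid Invoices", {"invoice", "paid"}, "FY 2011"),
]


def destination(folder: str, tags: set) -> Optional[str]:
    """Return the folder a file should move to, or None if no rule matches."""
    for watched, _rule_name, required, target in RULES:
        # A rule fires only in its own folder and only if every required tag is set.
        if folder == watched and required <= tags:
            return target
    return None


print(destination("Inbox", {"invoice", "acme", "roadrunner"}))    # prints Pending
print(destination("Pending", {"invoice", "acme", "roadrunner"}))  # prints None
print(destination("Pending", {"invoice", "acme", "paid"}))        # prints FY 2011
```

The middle call shows why an unpaid invoice stays put in “Pending”: nothing moves until the “paid” tag is added.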

Other Resources

Some other helpful resources:

  • Free GTD Chart From DavidCo
  • GTD Cheatsheet From Livedev

Alex Satrapa is a software engineer who missed all the classes at elementary school about personal organisation and is desperately trying to catch up. He has a blog on an eclectic range of issues on LiveJournal, ranging from iPad fanboyism through Perl programming tips to commentary on IT commentators.

Papers Of The US War Department

Back in 1799, George Washington wrote a letter complaining that he was short-changed on his expense claim. In fact, “the expenses he incurred during his recent trip to Philadelphia were more than the pay and emoluments that he received from the government and even exceed another month’s pay and emoluments.”

How do I know this? Because of an interesting scanning project (and you know how I like interesting scanning projects) called the Papers of the War Department 1784-1800.

Back in November 1800, a fire ripped through the War Office and it was thought that all the papers in it were destroyed. It turns out that thankfully they were saved, and now there is a massive project to restore them:

Papers of the War Department 1784-1800 will present this collection of more than 55,000 documents in a free, online format with extensive and searchable metadata linked to digitized images of each document, thereby insuring free access for a wide range of users.

What makes this project different is that it is crowdsourced. PWD is using an open source tool called Scripto to open up the transcription to pretty much anyone (even you!). You can learn more about becoming a transcription associate here, and you can then browse the documents or search the archive and transcribe away.

The editors of the PWD project have also selected a list of transcription candidates if you want an idea where to get started.

So if you are interested in helping out with a great project, or want to transcribe a letter about Canadian refugees, they can use your help.

(Photo by Phil Roeder)


ScanDrop Updates Cloud Scanning For Windows And Mac

A year ago, I wrote a post about how to scan to Google Docs with ScanDrop, a utility released by the folks at OfficeDrop.

There have been a number of changes in that year, so I thought it would be a good time to check out ScanDrop again.

At the time, my primary gripes were:

  1. It was Windows only
  2. You had to scan to JPG (not really a gripe, but I found it odd)

Both of them have been somewhat addressed, so let’s take a closer look.

What Is ScanDrop?

ScanDrop is an application that you install that lets you scan to the cloud. What the heck does that mean exactly?

You scan a document with your scanner, and this application will then take it and upload it to Google Docs, OfficeDrop, Evernote, or you can save it to your hard drive or Dropbox.

Even though services like Google Docs and Evernote are (sort of) competitors to OfficeDrop, it is clever of the company to release a tool that interacts with all of them. As a marketing exercise, it is better to get the program into as many people’s hands as possible, and hopefully some of them will convert to OfficeDrop users.

ScanDrop for Windows

ScanDrop started out as a Windows application. You can download it here for free. When you start it up, you are presented with a choice of services to upload your document to.

ScanDrop Windows Select

In the Windows version of ScanDrop you can scan to Evernote, Google Docs, OfficeDrop, or your local disk.

If you are using a TWAIN-compliant scanner (i.e., not a ScanSnap), you can hit the scan button in the application, choose your scanner, and then it will scan in your document. Here you can see a scan I have just done in preparation for uploading to Google Docs:

ScanDrop Windows

If you have a ScanSnap, they provide an application that should create the appropriate ScanSnap settings for you. It didn’t work for me, but it may be because I am running a ScanSnap S1100 which is not one of the ScanSnaps they list support for.

Not a problem, I just created a ScanSnap Manager profile manually and pointed it to the ScanDrop application on the Applications tab.

Once you have scanned in your document, you can manipulate the pages, crop them, rotate them, and if you have changed your mind, change the cloud storage you want to upload to.

When you’re ready, hit the Upload button and it will be sent. Here is an example of it being uploaded to Evernote. I set the notebook and tags from within ScanDrop.

ScanDrop Windows Evernote

ScanDrop for Mac

The folks at OfficeDrop have released a version of ScanDrop for the Mac. Unlike its Windows cousin, it is not free. You can get it via the Mac App Store for $1.99 at the time of this writing. Not a bad price if uploading to the cloud is something you will be doing a lot and your scanner’s software does not support it.

Like the Windows version, when you first start up ScanDrop you are asked to choose a cloud location to upload to.

ScanDrop Mac Selection

In addition to Evernote, Google Docs, OfficeDrop, and local disk, the Mac version adds an option to scan to Dropbox. From what I can tell, this is exactly like the scan to local disk option but just pre-selects your Dropbox folder to scan to.

If you have a TWAIN scanner and hit the scan button, it will bring up an integrated version of Image Capture (as far as I can tell) that Mac users may be familiar with.

ScanDrop Mac Image Capture

From there you can select your scanner, set your scan options, and then scan.

If you have a ScanSnap, you set up a profile in ScanSnap Manager to scan to the ScanDrop application. You can see instructions to do that here. One note about these instructions. In Step 8 they say you need to choose the JPG option on the File Option tab. I was able to select PDF and it worked for me.

Once you’re done, the scan is brought into ScanDrop, where you can manipulate the image and upload it to your cloud provider. Here you can see that I am preparing to upload to Evernote; I have set my notebook and can also set tags.

ScanDrop Mac Ready To Upload

NOTE: At the time of writing this, ScanDrop for Mac version 1.03 has a bug which prevents uploading to Evernote from working. They do list a workaround, but it didn’t work for me. They say they are working on a fix so as soon as that gets through the Mac App Store hopefully uploading to Evernote will work again.

Once you’ve uploaded, the document will appear as a PDF in the cloud service. Here’s a screenshot of it in Google Docs:

ScanDrop Google Doc scan

Here is a sample of a ScanDrop-scanned PDF in Google Docs:

ScanDrop Google Doc PDF

What About OCR?

If you want your PDF to be searchable, it is worth noting that ScanDrop itself does not do OCR. It relies on the cloud service to do it. The only exception to this is when you use a ScanSnap to scan to PDF before uploading it. In that case, you can use the ScanSnap’s integrated OCR to OCR the PDF first.

In all other cases, if you are an Evernote Premium or OfficeDrop subscriber, you can upload it to the cloud and they will OCR the PDF for you. If you use Google Docs, you can use their OCR, but it isn’t a very good solution.

If you are using ScanDrop to scan to your local disk, you will need to use some OCR tool afterwards to make the PDF searchable.

Do you scan documents to the cloud? How do you do it? Leave a note in the comments and let us know.


Hazel For Downloaded Files Part 2: Applying OpenMeta Tags

A month ago, I posted about how to use Hazel to process downloaded PDF files. It was getting pretty technical, so I decided to end it before taking it one step further: how to use Hazel to apply OpenMeta tags to the files.

Both in the comments and in my email, there have been a lot of requests to cover the tagging piece, so that is what I am going to do by popular demand. Keep in mind that this is Mac only and is a bit on the technical side, so if you aren’t comfortable using the command line (or have no desire to use Hazel to tag your files), you’ll want to sit this one out.

First, Noodlesoft calls Hazel’s support of OpenMeta tags unofficial and experimental, so keep that in mind. If you still want to go ahead:

Install the OpenMeta Command Line Tool

  • Go to the OpenMeta Project Downloads page and download the latest version of the openmeta_commandline zip to your hard drive.
  • Double-click on the zip file and it will extract a file just called openmeta. Move that file to a location that you remember. In my example, I am going to put it in /Applications

Tell Hazel To Enable Super-Secret OpenMeta Support

  • Go into Terminal. The easiest way to do that is to open Spotlight by pressing Command-Spacebar and typing Terminal
  • Type the following:

    defaults write com.noodlesoft.Hazel OMToolPath /Applications/openmeta

Obviously, you will want to replace the ending path with wherever you stored your openmeta program that you just unzipped.
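
If you stored openmeta somewhere with a space in the path, quoting starts to matter. Here is a small Python sketch, under the assumption that you would rather build the command programmatically than type it; `hazel_openmeta_command` is a name invented for this example, and the second path is made up.

```python
import shlex

HAZEL_DOMAIN = "com.noodlesoft.Hazel"  # preference domain from the command above
TOOL_KEY = "OMToolPath"                # preference key from the command above


def hazel_openmeta_command(tool_path: str) -> list:
    """Build the argument list for the `defaults write` command shown above."""
    return ["defaults", "write", HAZEL_DOMAIN, TOOL_KEY, tool_path]


# Same command as in the post.
print(shlex.join(hazel_openmeta_command("/Applications/openmeta")))

# A hypothetical location containing a space; shlex.join adds the quoting for you.
print(shlex.join(hazel_openmeta_command("/Users/me/My Tools/openmeta")))
```

On a Mac, passing the list to `subprocess.run(..., check=True)` would run it without any shell quoting worries at all.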

Here is a screenshot of me having run the command:

Terminal window

Apply Your Tags

Now, let’s revisit the moving-PDF rule from the last Hazel PDF post:

  • In the “Do the following:” section, click the + sign and then the drop-down list. You will now see a new option that wasn’t there before: “Add OpenMeta Tags”

Hazel Add OpenMeta Tags

  • Choose that, and you can now add your comma-separated list of tags. If you want, you can have it replace all the existing tags with these new ones. Here is my final rule:

Final Hazel Rule

  • Now when I look at the file with TagIt, you can see that it has the two tags that I assigned with Hazel:

Tags

Again, a bit on the technical side, but if you are someone that uses tags in your workflow, this may help automate things.


FileCenter Paperless Office Software For Windows

As I have mentioned earlier, the question of “what do I do with my files once I have scanned them?” comes up quite a bit. If you use Windows, it can be hard to decide which paperless office software package to use (or if you even need one).

In a strange role reversal, a client of mine came across and recommended FileCenter by Lucion to me. Since playing around with it, I have passed on that recommendation to a number of Windows users that have written me (and they all love it), but somehow I have never actually written about FileCenter on the blog. So, here we go.

FileCenter is a software package that sits firmly in the middle of the options out there. It is easy enough to use for home use, but it has enough power to be used by businesses. It is not a heavy-duty enterprise Document Management System, but it doesn’t cost thousands of dollars either.

How Are Files Represented?

When you use FileCenter, you have a choice between showing your files as a normal Windows-like folder structure, or having them represented as “Cabinets”, “Drawers”, and “Folders”.

Personally I recommend going with the cabinets view but do whatever works for you.

File Storage

One of the best features of FileCenter, in my opinion, is that it uses the normal Windows file structure to store its documents. It does not move them into some proprietary database.

For example, here is a screenshot from FileCenter showing a cabinet, drawer, and some files:

FileCenter cabinets

Now here is the screenshot from Windows:

FileCenter windows folders

Cabinets, drawers, and folders are just represented as normal Windows folders, so it is very easy to get at your documents if you ever need to.

(By the way, they don’t all have to be in the same folder. A cabinet can be on a network drive or some other location).

File Naming Rules

If you have a bunch of regularly recurring documents that have a standard name (for example, bills), you can create file naming rules that will automatically name the document when you file it.

As an example, I have a drawer called “Terasen Gas” to represent a gas bill. I set up a naming rule so that when I file something in that folder, it automatically names it to today’s date with the name of the drawer.

Here is the rule:

FileCenter naming rule

Now when I want to drag a file from my Inbox to a folder, I choose my rule as the “drop name” and then drag it:

FileCenter drop name

Now you can see the file is automatically renamed.

FileCenter renamed

Folder Templates

If you work with clients, projects, or have some situation where you often have a folder structure with a set of subfolders, you can set up folder templates to automatically create these for you.

For example, on this client folder, I will choose Apply Folder Templates and choose one I set up called Client.

FileCenter folder template

Now it has automatically created the client folder structure under ABC Corp:

FileCenter new folders

You can probably see how useful this could be when you have a bunch of date-based folders.
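
To make the idea concrete outside of FileCenter, here is a rough Python sketch of what a folder template does. It is not FileCenter’s implementation; the subfolder names in `CLIENT_TEMPLATE` are hypothetical (the post does not list what is in the “Client” template), and `apply_template` is a name made up for the example.

```python
import tempfile
from pathlib import Path

# Hypothetical subfolders; substitute whatever your own "Client" template contains.
CLIENT_TEMPLATE = ["Contracts", "Invoices", "Correspondence"]


def apply_template(parent: Path, template: list) -> list:
    """Create each template subfolder under parent and return the created paths."""
    created = []
    for name in template:
        folder = parent / name
        # exist_ok means re-applying the template to the same client is harmless.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# Demonstrate in a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as root:
    for folder in apply_template(Path(root) / "ABC Corp", CLIENT_TEMPLATE):
        print(folder.relative_to(root))
```

Since these are ordinary folders on disk, this is also roughly what you would see from the Windows side after applying a template.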

“But Wait, There’s More!”

These are only a few of the features that FileCenter has, but I think you get the idea. It does OCR, lets you split and edit documents, converts PDFs, encrypts and securely deletes documents, and a bunch of other stuff.

I recommend checking out the features page for a list of all of them. They have little videos for each feature.

Versions

FileCenter comes in three versions: Standard, Pro, and Pro PLUS. You can compare the versions here.

It is not the cheapest software out there, but it is not the most expensive either. If it were me, I would probably go for the Pro version as it has the drop renaming and other features that Standard doesn’t have.

If you have a ScanSnap, you can probably get away without Pro PLUS as the scanner’s software can take care of most of the extra features (page rotation, advanced OCR).

Any FileCenter users out there? Leave a comment and let us know what you like and don’t like about it.


Use Hazel To Magically Process Downloaded PDFs

You know how sometimes you come across a tip that is so clever you are both happy to have learned it and annoyed that you didn’t think of it yourself at the same time?

That’s how I felt sitting in Katie Floyd’s Going Paperless presentation at Macworld 2011 (more on my Macworld roundup here).

It is well known that I love a little Mac app called Hazel that allows you to create rules to perform almost any function on your computer. One great use of Hazel is to move all the PDFs that you are downloading from the web now that you are going paperless.

Katie’s tip is to use the Source URL of the downloaded PDF as a basis for doing the processing. Brilliant! Have no idea what on earth I just said? No worries, I’m about to go through it.

Where Was That PDF Downloaded From?

You may not know this, but when you download a PDF from the web, your Mac records the address that it was downloaded from as metadata attached to the file.

The easiest way to see this is to highlight the file in Finder and hit Command-I. The Info window will come up; look under the “More Info” section.

You’ll see a bunch of junk in there including this:

Where from

Obviously the URL that you downloaded it from will be there, not mine. Since we have the URL embedded, we can use it to do stuff later.
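
If you are curious what that “Where from” value looks like under the hood, here is a hedged Python sketch. As far as I know, macOS keeps it in an extended attribute named `com.apple.metadata:kMDItemWhereFroms`, stored as a property list holding a list of URL strings. The sample URLs below are made up, and the host matching is my own stand-in, not necessarily how Hazel’s Source URL condition decides.

```python
import plistlib
from urllib.parse import urlparse

# Attribute name as I understand it on macOS; treat it as an assumption.
WHERE_FROMS_XATTR = "com.apple.metadata:kMDItemWhereFroms"


def decode_where_froms(raw: bytes) -> list:
    """Decode the raw attribute value (a property list containing URL strings)."""
    value = plistlib.loads(raw)
    return [item for item in value if isinstance(item, str)]


def downloaded_from(urls: list, domain: str) -> bool:
    """True if any recorded URL's host is the domain or one of its subdomains."""
    for url in urls:
        host = urlparse(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return True
    return False


# Stand-in bytes built here so the example runs anywhere; the URLs are hypothetical.
sample = plistlib.dumps(
    ["https://www.scotiaitrade.com/statements/rrsp.pdf", "https://www.scotiaitrade.com/"],
    fmt=plistlib.FMT_BINARY,
)
urls = decode_where_froms(sample)
print(urls)
print(downloaded_from(urls, "scotiaitrade.com"))
```

In practice you never need to do this by hand, since Hazel reads the value for you; the sketch is just to show there is no magic involved.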

Create A New Rule In Hazel

Fire up Hazel, and if you haven’t already, add the folder that you download PDFs to in the folder list on the left. Then we want to create a new rule, so hit the Plus button in the right column.

New Hazel Rule

Create The Conditions

Now let’s give the rule a name, and create the conditions that Hazel will use to figure out which file to act on. The conditions we’ll use in this case are:

  • File type is PDF
  • Source URL is scotiaitrade.com

Again, substitute your own domain. Here’s the screenshot of the conditions:

Hazel conditions

Figure Out What To Do With The File

In all likelihood, the filename that your bank or vendor uses is totally useless. Sure you could type it in yourself when you save it, but we are trying to be as automated as possible here.

What I am going to do is:

  • Rename the file using the convention yyyy_mm_dd-name-type.pdf
  • Move the file to my Statements folder

Obviously use your own naming convention and move it to wherever is appropriate for you.

Before the screenshot, I know some of you are going to ask “How do I move it to Evernote?” It is your lucky day, I already wrote a post about how to send PDFs to Evernote using Hazel so you can combine the two.

First in the Do The Following section, I will change it to Rename File, and the default pattern it gives you is “name” and “extension”. We’ve already established that the name is no good, so delete it.

Drag “date created” up to the pattern section, and it will look like this:

Hazel date created

Take a look at the example at the bottom. That is not the date format that we want, so click on date created and choose Edit Date Pattern. The yyyy mm dd looks good, but click into the box and replace the dashes with underscores.

Hazel date pattern

Hit Done and now the date format is looking good. We just need to give the file a name. In this case, I know that any PDF that I download from scotiaitrade.com is going to be my RRSP statement. So, for this rule, I want the file to be called yyyy_mm_dd-ScotiaiTrade-RRSP.pdf.

Click into the pattern box and just type in “-ScotiaiTrade-RRSP” between “date created” and “extension”.

Hazel finished rename
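
The pattern we just built in Hazel is equivalent to something like the following Python sketch. It only illustrates the yyyy_mm_dd-name-type.pdf convention; `statement_filename` is a made-up helper, and the date is an example.

```python
from datetime import date


def statement_filename(created: date, name: str, doc_type: str, extension: str = "pdf") -> str:
    """Build a filename in the yyyy_mm_dd-name-type.pdf convention."""
    # %Y_%m_%d gives a zero-padded date with underscores, e.g. 2011_02_15.
    return f"{created:%Y_%m_%d}-{name}-{doc_type}.{extension}"


print(statement_filename(date(2011, 2, 15), "ScotiaiTrade", "RRSP"))
# prints 2011_02_15-ScotiaiTrade-RRSP.pdf
```

Putting the year first means the files sort chronologically by name, which is the main reason I like this convention.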

Next we want to move it. Click the plus sign beside our rename rule, and create a Move rule that moves it to your folder.

Here’s the final rule:

Hazel rule

Click OK, and if you want to see if this will work, click on the little gear icon on the bottom and choose Preview Rule Matches. Looks like it should work!

Hazel Rule Preview

Depending on the other Hazel rules going on, it may take a while to kick off. If you’re impatient, you can run it manually by clicking on the Hazel icon up in your menu bar.

And look, there it is in my Statements folder:

Hazel file renamed

Let’s Get Geeky(-er)

For 90% of normal people, the above will work fine. There is only one problem that I know will drive me crazy.

The statement that I am downloading today (in February) is actually January’s statement, and having that February date in the filename will just not do. Unfortunately, I don’t believe that Hazel has the functionality to do any sort of date math, so for me TextExpander will have to come to the rescue.

If you are not familiar with TextExpander, it is a Mac application like Hazel: you don’t really think you need it until you have it, and then once you have it you can’t imagine life without it. Basically, it allows you to type in small snippets of text and have them expand (get it?) to longer ones.

In this case, what I am going to do is take the renaming step out of the Hazel rule and then set up a TextExpander snippet to do the filenaming for me. I am not going to go through the ins and outs of TextExpander in this post but here is a screenshot of my rule:

TextExpander

That looks super-ugly (don’t let it scare you off TextExpander, most rules wouldn’t be like this), but basically %@-1M%Y_%m_%d-ScotiaiTrade-RRSP is saying “subtract one month from today’s date and format it as yyyy_mm_dd”. Now when I type ,its, TextExpander automatically turns it into 2011_01_15-ScotiaiTrade-RRSP.
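
If the snippet syntax is hard to read, here is the same date math as a Python sketch. It is not what TextExpander runs; the function names are invented, and clamping to the last day of a shorter month is my own choice (TextExpander may handle month ends differently).

```python
import calendar
from datetime import date


def one_month_earlier(day: date) -> date:
    """Return the same day in the previous month, clamped to that month's last day."""
    if day.month == 1:
        year, month = day.year - 1, 12
    else:
        year, month = day.year, day.month - 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def snippet_text(today: date) -> str:
    """Mimic the snippet: last month's date as yyyy_mm_dd, then the fixed name."""
    return f"{one_month_earlier(today):%Y_%m_%d}-ScotiaiTrade-RRSP"


print(snippet_text(date(2011, 2, 15)))  # prints 2011_01_15-ScotiaiTrade-RRSP
```

The clamp matters on days like March 31, where “one month ago” would otherwise be a February date that does not exist.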

So if you are still following along, my updated workflow is:

Download PDF > When the Save As box pops up, type ,its which automatically sets the filename with a date one month ago > Save it to Downloads > Hazel sees it, looks at the Source URL, and automatically moves it to Statements.

What About Tags?

I personally use tags for my documents, so the next logical step would be to have Hazel assign OpenMeta tags to the documents. However, that is getting a little geeky even for this post. If it is something you’d like me to post about, drop a note in the comments and I will put something together.


Free Online OCR With RICOH Innovations

Most scanners these days come with an Optical Character Recognition, or OCR, program of some sort to make PDFs searchable. However, what if you don’t have an OCR program or you just want to do a quick and dirty file conversion without messing around with an application?

A few DocumentSnap commenters have pointed out that RICOH Innovations has created a number of Beta applications, one of which is an online Document Conversion tool.

As the site says:

The document conversion widget provides free OCR to convert your images into editable and searchable pdf, MsWord, HTML and text documents, providing capabilities such as pdf to doc conversion.

I thought I would put the tool to the test using the same parameters as in my ABBYY Finereader vs. Adobe Acrobat OCR comparison.

  • Speed: Once I uploaded the file and hit Convert, it took 27.5 seconds to complete the PDF conversion
  • File Size: The original was 1.5 MB, the converted copy was 160 KB
  • Accuracy: Here is a screenshot from the original:

Article text

Here is the OCR’ed version:

The spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary strategic plans to financial statements. As with any familiar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of die spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably the most comfortable environment for a lot of financial professionals,” Alok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Pretty good, I’d say.

  • Quality: Here is where it gets dicey. Unlike the other tools reviewed, RICOH’s online tool doesn’t put a text layer behind the image, it actually converts the image to text. The results are pretty good actually, but it is not for you if you want your original PDF’s exact look. Here is a screenshot:

Converted PDF

Given that this is a free tool in beta, it is not surprising that there are some limits. The maximum file size is 20 MB and you can only request 20 conversions per hour.

You can choose to download the files immediately or have them emailed to you when they are ready.

Also, you may or may not want to use this tool to OCR sensitive documents. As they say, “In short, we will use submitted data to improve the service.”

Privacy issues aside, this looks to be a tool that can come in handy when you really need it and is worth playing around with.

If you know of other good online OCR products, leave a note in the comments.


How To Scan To Gmail With The Fujitsu ScanSnap

I am sure most computer-savvy readers can relate to this: family gatherings involve good food, good times, and providing computer help.

This Christmas was no exception; in between bites of lefse (the only Norwegian thing my family does), my dad, who uses a Fujitsu ScanSnap S300 (the precursor of the ScanSnap S1300), was having trouble emailing his genealogy stuff.

He asked me how he could use his ScanSnap to email using Gmail rather than Outlook. I thought it might make a good post, so here we are.

In Windows, the ScanSnap will use your default email client. So the task becomes, how do you set Gmail as your default mail client?

As with most things, there are a bunch of different solutions. The most common one is to use the Gmail Notifier. When you install that, you can check the box to make Gmail your default client. I don’t know if it is because I am on 64-bit Windows 7 or because Larry & Sergey hate me, but I just could not get it to work on my computer. Whatever I did, it still used Windows Live Mail.

I ended up using a free little app called Affixa. Affixa has a number of features but the main ones I care about here are that it allows you to use Gmail, Google Apps, or Yahoo Mail as your default Windows mail client.

Set Up Affixa

When you first run it, you can add your accounts and then set a few options for each account.

Affixa Options

For me, I set it to launch Gmail after creating a draft message and use HTTPS for added security.

Set Up ScanSnap Manager

On the ScanSnap side, I used the built-in Scan To E-Mail profile. On the Application tab it has Scan to E-mail, and then when you click the Application Settings button, there are a few settings that you might want to take a look at.

Scan Email Settings

I personally like to have “Show preview” popped up so that I can set a filename, and if you want to keep a PDF copy of the document, check “Save scanned images to file”.

After all that, set up the other Profile options on the other tabs as desired.

Sending The Email

Once you have your setup done, just put some documents in the ScanSnap and hit the scan button. Affixa will show that it is logging in, and then if you checked “Launch Gmail” then it will take you right to your message. If not, go to Gmail, go to the Drafts folder, and there it will be.

New Gmail Message

As I said, Affixa has way more functionality than what I have described here, so check out the site if you want to know more.

What About The Mac?

I’m afraid I don’t have good news here. Google has an improved Google Notifier for Mac that I thought would do the trick, but no luck. When I tried scanning, ScanSnap Manager popped up a message specifically saying that it needed to use Mail.app or Entourage.app. Why, I have no idea.

If you have come up with a Mac solution or have another Windows method that works for you, drop a note in the comments and let us know.


OCR Smackdown: ABBYY FineReader vs. Adobe Acrobat

A very common request that I get here at DocumentSnap is to compare the Optical Character Recognition (OCR) capabilities of ABBYY FineReader with Adobe Acrobat. Why? Well, for starters, both of them come included with models of the Fujitsu ScanSnap as well as other scanners.

I decided to do a quick test comparing the OCR of the two packages using the following criteria:

  • OCR Speed
  • Resulting File Size
  • Accuracy

The Hardware

For a scanner I used my ScanSnap S1300.

I used two computers for the test:

  • Windows: A new cheap Acer laptop with a Core i3 2.40 GHz processor and 4 GB RAM running Windows 7
  • Mac: An old 2.5 GHz Intel Core 2 Duo MacBook Pro with 4 GB RAM running Mac OS X Snow Leopard

The Software

Here are the packages I used:

  • Windows: ABBYY FineReader For ScanSnap 4.1 (called from ScanSnap Manager) vs. Adobe Acrobat 9 Pro
  • Mac: ABBYY FineReader For ScanSnap 4.1 (run standalone) vs. Adobe Acrobat 8 Pro

Yes, I realize that Adobe Acrobat X is out, but since I am not aware of any scanners that come bundled with it yet, I decided to stick with the versions that ship with the ScanSnap. I’ll cover Acrobat X in a later post.

The Document

I scanned a magazine article for this test. It probably would have been better to do this with a bunch of different documents to compare, but hey.

In all cases except one, I scanned without OCR so that I could run it standalone later. Here’s some info on the document that I used:

  • Pages: 2
  • Scan Quality: 300dpi, Color
  • Resulting File Size: 1.5 MB
  • Columns: 2, with some images

Maybe I am blind, but I couldn’t figure out a way to run ABBYY FineReader for ScanSnap on Windows standalone. If you know how, please leave a message in the comments. In that test, I re-scanned with “Create Searchable PDF” checked in the ScanSnap Manager settings.

The Settings

I tried not to do too many fancy settings to keep things as “real-life” as possible. There were essentially three configurations:

ABBYY FineReader

ABBYY FineReader OCR Settings

I set Save Mode to “Text under page image” and Quality to High. These were the settings for ABBYY on the Mac, and I believe they are what ScanSnap Manager on Windows uses as well.

Adobe Acrobat (Normal)

Adobe Acrobat OCR Settings

I set the output style to “Searchable Image (Exact)” because leaving it just as Searchable Image in my experience has caused some weird things to happen with the resulting PDF. I used these settings on both Windows and Mac.

Adobe Acrobat (With ClearScan)

Adobe Acrobat ClearScan

In Acrobat 9 there is a setting called ClearScan. I used that as an additional test to see what the difference is.

Speed

Windows

  • ABBYY Windows: 20.5 seconds
  • Acrobat 9: 13.9 seconds
  • Acrobat 9 With ClearScan: 17.6 seconds

Mac

  • ABBYY Mac: 44.7 seconds
  • Acrobat 8: 20.2 seconds

Winner: Acrobat!

Since they are different machines, you can’t directly compare the Windows and Mac times, but clearly in both cases Acrobat is faster.
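To put a number on “faster”, here is a quick sketch in Python that divides the ABBYY time by each Acrobat time on the same machine. The timings are the ones listed above; the dictionary layout and the function name `relative_to_abbyy` are just my own choices for illustration.

```python
# Timings in seconds, copied from the lists above.
timings = {
    "Windows": {"ABBYY": 20.5, "Acrobat 9": 13.9, "Acrobat 9 with ClearScan": 17.6},
    "Mac": {"ABBYY": 44.7, "Acrobat 8": 20.2},
}


def relative_to_abbyy(platform_timings):
    """Return how many times faster each package is than ABBYY on the same machine."""
    baseline = platform_timings["ABBYY"]
    return {
        name: round(baseline / seconds, 2)
        for name, seconds in platform_timings.items()
        if name != "ABBYY"
    }


# Only compare within a platform, since the two machines are different.
for platform, results in timings.items():
    print(platform, relative_to_abbyy(results))
```

A value above 1 means that package finished sooner than ABBYY on that machine. Keep in mind this is a single two-page document, so treat the ratios as a rough indication rather than a benchmark.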

File Size

The non-OCR’ed PDF was 1.5 MB.

Windows

  • ABBYY Windows: 1.7 MB (+0.2 MB)
  • Acrobat 9: 1.5 MB (same)
  • Acrobat 9 With ClearScan: 315 KB (-1.16 MB)

Mac

  • ABBYY Mac: 1.4 MB (-0.1 MB)
  • Acrobat 8: 1.5 MB (same)

Winner: Acrobat 9 with ClearScan!

With an astonishing 1.16 MB reduction in file size after OCR, Acrobat 9 with ClearScan is the winner. Wow.
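If you prefer percentages, the same figures can be run through a few lines of Python. This is only a sketch working from the rounded sizes listed above, so the results can differ slightly from numbers calculated on the exact file sizes. I am assuming 1 MB = 1000 KB when converting the 315 KB figure, and the name `percent_change` is made up for this example.

```python
# Sizes in MB, copied from the lists above (rounded as published).
# Assumption: 315 KB is treated as 0.315 MB (1 MB = 1000 KB).
ORIGINAL_MB = 1.5

sizes_mb = {
    "ABBYY Windows": 1.7,
    "Acrobat 9": 1.5,
    "Acrobat 9 with ClearScan": 0.315,
    "ABBYY Mac": 1.4,
    "Acrobat 8": 1.5,
}


def percent_change(size_mb, original_mb=ORIGINAL_MB):
    """Percentage change relative to the non-OCR'ed file (negative means smaller)."""
    return round((size_mb - original_mb) / original_mb * 100, 1)


for name, size in sizes_mb.items():
    print(f"{name}: {percent_change(size):+.1f}%")
```

On these rounded inputs, ClearScan works out to a file roughly 79% smaller than the original scan, while the other packages stay within about 15% of it either way.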

Accuracy

Here is a passage from the article:

Article Text Before OCR

Let’s see how each of the packages did:

ABBYY Windows

The spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary strategic plans to financial statements. As with any familiar method, it finds its way into numerous situations where better alternatives are available, mostsignificantly in itswidespread use as a de facto reporting tool.
The appeal of the spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably the most comfortable environment for a lot of financial professionals,” Alok Ajmera, vice-president, professional services withMississauga, Ont.-basedProphixSoftware, says. “There’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Acrobat 9 Windows

T he spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary su·ategic plans to financial statements. As with any farniliar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of tlle spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably tlle most comfortable environment for a lot of financial professionals,” AJok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want witll tlle data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Acrobat 9 With ClearScan

The spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary su·ategic plans to financial statements. As with any farniliar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of tlle spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably tlle most comfortable environment for a lot of financial professionals,” AJok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want witll tlle data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

ABBYY Mac

The spreadsheet has become the virtual “slide rule” for CiMAs. It’s used for everything from preliminary strategic plans to financial statements. As with any familiar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of die spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably the most comfortable environment for a lot of financial professionals,” Alok Ajmera, vice-president, professional sendees with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Acrobat 8 Mac

T he spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything frorn preliminary strategic plans to financial statements. Aswith any familiar method, it finds its way into numerous situations where better alterna tives are available, most significantly in its widespread use as a de facto reporting tool.
T he appeal of the spreadsheet as the quickest
way to get a report out is not hard to appreciate.
“Excel is probably the most comfortable
environment for a lot of financial professionals,” avaJlaun:.:,JIIU:::’l;)It;IIIULauuy1111l::>WIUC::>PU:C1U uocd::>
a de facto reporting tool. T he appeal of the spreadsheet as the quickest
way to get a report out is not hard to appreciate. “Excel is probably me most comfortable environment for a lot of financial professionals,” AJok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “T here’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organiza tions.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Winner: ABBYY FineReader for Mac looks the best to me. Acrobat 8 on the Mac is pretty terrible (in this example anyways).
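I picked the winner by eye, but if you want something a bit more repeatable, here is a rough sketch using Python's standard `difflib` module. It counts how many words of a reference sentence have no match in each OCR result. The reference sentence is my own assumption (typed by hand from reading the article), the samples are one sentence copied from each result above, and the helper name `unmatched_words` is invented for this example. It is not a formal word error rate, just a quick sanity check.

```python
from difflib import SequenceMatcher

# Assumed ground truth for one sentence, typed by hand from the article.
reference = (
    "The appeal of the spreadsheet as the quickest way to get a report out "
    "is not hard to appreciate."
)

# The same sentence as it appears in each OCR result above.
samples = {
    "ABBYY Windows": "The appeal of the spreadsheet as the quickest way to get a report out is not hard to appreciate.",
    "Acrobat 9 Windows": "The appeal of tlle spreadsheet as the quickest way to get a report out is not hard to appreciate.",
    "ABBYY Mac": "The appeal of die spreadsheet as the quickest way to get a report out is not hard to appreciate.",
    "Acrobat 8 Mac": "T he appeal of the spreadsheet as the quickest way to get a report out is not hard to appreciate.",
}


def unmatched_words(expected, actual):
    """Count words in `expected` that difflib could not line up with `actual`."""
    expected_words = expected.split()
    actual_words = actual.split()
    matcher = SequenceMatcher(None, expected_words, actual_words)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return len(expected_words) - matched


for name, text in samples.items():
    print(f"{name}: {unmatched_words(reference, text)} unmatched word(s)")
```

One sentence is far too small a sample to rank the packages, and it does not capture the scrambled run in the middle of the Acrobat 8 result, so you would want to run something like this over the whole passage before drawing conclusions.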

Conclusion

Is there a “best” choice? It seems that in this example anyways, Adobe Acrobat 9 with ClearScan turned on gives fast results with good OCR while dramatically reducing the file size.

If you don’t really care about speed so much, FineReader produces good OCR results and, for ScanSnap users, has the additional benefit of being integrated with ScanSnap Manager.

As with most things, the best software is the one that works the best for you. Have you found similar results? Any other tests of your own to share? Leave a note in the comments.

(Photo by Polina Sergeeva)
