Tag Archives: mac

OCR Smackdown: ABBYY FineReader vs. Adobe Acrobat

A very common request that I get here at DocumentSnap is to compare the Optical Character Recognition (OCR) capabilities of ABBYY FineReader with Adobe Acrobat. Why? Well, for starters, both of them come included with models of the Fujitsu ScanSnap as well as other scanners.

I decided to do a quick test comparing the OCR of the two packages using the following criteria:

  • OCR Speed
  • Resulting File Size
  • Accuracy

The Hardware

For a scanner I used my ScanSnap S1300.

I used two computers for the test:

  • Windows: A new cheap Acer laptop with a Core i3 2.40 GHz processor and 4 GB RAM running Windows 7
  • Mac: An old 2.5 GHz Intel Core 2 Duo MacBook Pro with 4 GB RAM running Mac OS X Snow Leopard

The Software

Here are the packages I used:

  • Windows: ABBYY FineReader For ScanSnap 4.1 (called from ScanSnap Manager) vs. Adobe Acrobat 9 Pro
  • Mac: ABBYY FineReader For ScanSnap 4.1 (run standalone) vs. Adobe Acrobat 8 Pro

Yes, I realize that Adobe Acrobat X is out, but since I am not aware of any scanners that come bundled with it yet, I decided to stick with the versions that ship with the ScanSnap. I’ll cover Acrobat X in a later post.

The Document

I scanned a magazine article for this test. It probably would have been better to do this with a bunch of different documents to compare, but hey.

In all cases except one, I scanned without OCR so that I could run it standalone later. Here’s some info on the document that I used:

  • Pages: 2
  • Scan Quality: 300dpi, Color
  • Resulting File Size: 1.5 MB
  • Columns: 2, with some images

Maybe I am blind, but I couldn’t figure out a way to run ABBYY FineReader for ScanSnap on Windows standalone. If you know how, please leave a message in the comments. In that test, I re-scanned with “Create Searchable PDF” checked in the ScanSnap Manager settings.

The Settings

I tried not to use too many fancy settings, to keep things as “real-life” as possible. There were essentially three configurations:

ABBYY FineReader

ABBYY FineReader OCR Settings

I set Save Mode to “Text under page image” and Quality to High. These were the settings for ABBYY on the Mac, and I believe they are what ScanSnap Manager on Windows uses as well.

Adobe Acrobat (Normal)

Adobe Acrobat OCR Settings

I set the output style to “Searchable Image (Exact)” because leaving it just as Searchable Image in my experience has caused some weird things to happen with the resulting PDF. I used these settings on both Windows and Mac.

Adobe Acrobat (With ClearScan)

Adobe Acrobat ClearScan

In Acrobat 9 there is a setting called ClearScan. I used that as an additional test to see what the difference is.

Speed

Windows

  • ABBYY Windows: 20.5 seconds
  • Acrobat 9: 13.9 seconds
  • Acrobat 9 With ClearScan: 17.6 seconds

Mac

  • ABBYY Mac: 44.7 seconds
  • Acrobat 8: 20.2 seconds

Winner: Acrobat!

Since they are different machines, you can’t directly compare the Windows and Mac times, but clearly in both cases Acrobat is faster.
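To put a rough number on “faster”, here is a small Python sketch that divides each time by the fastest time on the same machine. The timings are the ones listed above; the dictionary layout and the function name are just for illustration.

```python
# OCR times in seconds, copied from the results above.
timings = {
    "Windows": {"ABBYY": 20.5, "Acrobat 9": 13.9, "Acrobat 9 with ClearScan": 17.6},
    "Mac": {"ABBYY": 44.7, "Acrobat 8": 20.2},
}

def slowdown_vs_fastest(results):
    """Return how many times slower each package is than the fastest one."""
    fastest = min(results.values())
    return {name: round(seconds / fastest, 2) for name, seconds in results.items()}

for machine, results in timings.items():
    # Only compare within one machine; the two computers are not comparable.
    print(machine, slowdown_vs_fastest(results))
```

On these numbers ABBYY took roughly 1.5 times as long as Acrobat 9 on Windows, and a bit over twice as long as Acrobat 8 on the Mac.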

File Size

The non-OCR’ed PDF was 1.5 MB.

Windows

  • ABBYY Windows: 1.7 MB (+.2 MB)
  • Acrobat 9: 1.5 MB (same)
  • Acrobat 9 With ClearScan: 315 KB (-1.16 MB)

Mac

  • ABBYY Mac: 1.4 MB (-.1 MB)
  • Acrobat 8: 1.5 MB (same)

Winner: Acrobat 9 with ClearScan!

With an astonishing 1.16 MB reduction in file size after OCR, Acrobat 9 with ClearScan is the winner. Wow.
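If you prefer percentages, this Python sketch works out the change relative to the 1.5 MB original. It uses the rounded sizes listed above and assumes 1 MB = 1000 KB, so treat the output as approximate.

```python
ORIGINAL_KB = 1500  # the 1.5 MB non-OCR'ed PDF, assuming 1 MB = 1000 KB

# Resulting sizes in KB, taken from the rounded figures above.
sizes_kb = {
    "ABBYY Windows": 1700,
    "Acrobat 9": 1500,
    "Acrobat 9 with ClearScan": 315,
    "ABBYY Mac": 1400,
    "Acrobat 8": 1500,
}

def percent_change(new_kb, original_kb=ORIGINAL_KB):
    """Percentage change relative to the original; negative means smaller."""
    return round((new_kb - original_kb) / original_kb * 100, 1)

for name, size in sizes_kb.items():
    print(f"{name}: {percent_change(size):+.1f}%")
```

With these inputs the ClearScan file comes out at roughly a fifth of the original size.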

Accuracy

Here is a passage from the article:

Article Text Before OCR

Let’s see how each of the packages did:

ABBYY Windows

The spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary strategic plans to financial statements. As with any familiar method, it finds its way into numerous situations where better alternatives are available, mostsignificantly in itswidespread use as a de facto reporting tool.
The appeal of the spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably the most comfortable environment for a lot of financial professionals,” Alok Ajmera, vice-president, professional services withMississauga, Ont.-basedProphixSoftware, says. “There’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Acrobat 9 Windows

T he spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary su·ategic plans to financial statements. As with any farniliar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of tlle spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably tlle most comfortable environment for a lot of financial professionals,” AJok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want witll tlle data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Acrobat 9 With ClearScan

The spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything from preliminary su·ategic plans to financial statements. As with any farniliar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of tlle spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably tlle most comfortable environment for a lot of financial professionals,” AJok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want witll tlle data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

ABBYY Mac

The spreadsheet has become the virtual “slide rule” for CiMAs. It’s used for everything from preliminary strategic plans to financial statements. As with any familiar method, it finds its way into numerous situations where better alternatives are available, most significantly in its widespread use as a de facto reporting tool.
The appeal of die spreadsheet as the quickest way to get a report out is not hard to appreciate. “Excel is probably the most comfortable environment for a lot of financial professionals,” Alok Ajmera, vice-president, professional sendees with Mississauga, Ont.-based Prophix Software, says. “There’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organizations.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Acrobat 8 Mac

T he spreadsheet has become the virtual “slide rule” for CMAs. It’s used for everything frorn preliminary strategic plans to financial statements. Aswith any familiar method, it finds its way into numerous situations where better alterna tives are available, most significantly in its widespread use as a de facto reporting tool.
T he appeal of the spreadsheet as the quickest
way to get a report out is not hard to appreciate.
“Excel is probably the most comfortable
environment for a lot of financial professionals,” avaJlaun:.:,JIIU:::’l;)It;IIIULauuy1111l::>WIUC::>PU:C1U uocd::>
a de facto reporting tool. T he appeal of the spreadsheet as the quickest
way to get a report out is not hard to appreciate. “Excel is probably me most comfortable environment for a lot of financial professionals,” AJok Ajmera, vice-president, professional services with Mississauga, Ont.-based Prophix Software, says. “T here’s a very little learning curve, you can effectively do whatever you want with the data, and it works fairly well in smaller organiza tions.”
Periodic and complex reporting in processes like revenue management or cost management, however, is where the spreadsheet model really starts to break down.

Winner: ABBYY FineReader for Mac looks the best to me. Acrobat 8 on the Mac is pretty terrible (in this example anyways).
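Eyeballing works for a short passage, but the comparison can also be counted. Here is a minimal Python sketch that counts the words that differ between a reference sentence and each package’s output, using the standard library’s difflib. The reference sentence is an assumption (it is how a human would read the line; the original scan is not reproduced here), and the function name is made up for this example.

```python
from difflib import SequenceMatcher

def word_errors(reference, ocr_output):
    """Count words that differ between a reference sentence and OCR output."""
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    errors = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # A replaced, inserted, or deleted run counts once per word involved.
            errors += max(i2 - i1, j2 - j1)
    return errors

# Assumed ground truth: the sentence as a human would read it.
reference = ("The appeal of the spreadsheet as the quickest way to get "
             "a report out is not hard to appreciate.")

# The same sentence as each package produced it in the samples above.
samples = {
    "ABBYY Windows": reference,
    "Acrobat 9": reference.replace("of the", "of tlle"),
    "ABBYY Mac": reference.replace("of the", "of die"),
}

for name, text in samples.items():
    print(name, word_errors(reference, text))
```

One sentence is far too small a sample to rank the packages. Running the same count over the whole passage would be closer to the comparison above.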

Conclusion

Is there a “best” choice? It seems that in this example anyways, Adobe Acrobat 9 with ClearScan turned on gives fast results with good OCR while dramatically reducing the file size.

If you don’t really care about speed so much, FineReader produces good OCR results and for ScanSnap users, has the additional benefit of being integrated with ScanSnap Manager.

As with most things, the best software is the one that works the best for you. Have you found similar results? Any other tests of your own to share? Leave a note in the comments.

(Photo by Polina Sergeeva)


How To Split PDF Documents Into Single Pages Using Mac OSX

Today a consulting client made a mistake that we’ve all made: he scanned a stack of paper intending to make one PDF per sheet, but instead it all went into one big PDF.

Since he didn’t want to re-scan, I broke down a few options for how to split a PDF using the built-in tools of Mac OSX. You can think of this as a companion piece to How To Combine PDFs Using Mac OSX Automator.

Option 1: Use Preview To Split Pages

Preview.app (the application you use to view PDFs and images) has some document management tools under the hood.

To split a file into pages using Preview:

  • Open the file in Preview
  • If you don’t see a list of pages on the right-hand side, click the Sidebar button near the search bar to open it
  • Click and drag each page to your desktop or to a Finder window. It will then copy that page to its own PDF

Option 2: Use Automator To Split Pages

Much like combining PDF files to make one big one, you can split a PDF into separate pages using Automator.

There are a number of ways to do this of course, but in this example I will be making a Droplet. If you want to skip all this setup, I have attached my Droplet to the end of this post. It will hopefully work for you.

Ready? Here we go.

Start Automator

  • In Finder, go to Applications and then start Automator, the cool little robot icon

Choose Application

  • In the window that pops up, highlight Application and then hit Choose


Set Up The PDF Action

  • In the Library section on the left, you’ll see a line for PDF. Choose that
  • In the next column over, there is an option for PDF To Images. Click that and drag it into the section on the right

PDF To Images

  • Choose where you want the PDF to be saved to by default
  • Choose if you want the new PDFs to have the same name as the original, or if you want to change it.
  • Choose if you want it to replace the original file
  • I want the ability to change it on the fly if needed, so I hit Options and then check Show this action when the workflow runs.

Here is what this step looks like:

Split To PDF Options

Nice work! Now you have your Automator action created. Go to File > Save As and save it either to your Desktop, your Applications folder, or anywhere else you desire.

Using The Droplet

You have just created a Droplet. This means that if you drag a PDF onto the icon, it will automatically run those actions you just created.

When you do this, if everything works well, a popup will come up asking where you want to save the PDFs and if you want to change the name. Choose and hit Continue.
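If you would rather script the split than build a Droplet, the same idea can be sketched in Python. This is not what Automator does under the hood, just an alternative. It assumes the third-party pypdf package is installed, and the output naming scheme (“-page1”, “-page2”, and so on) is made up for this example.

```python
from pathlib import Path

def page_output_paths(pdf_path, page_count, out_dir):
    """Build one output path per page, e.g. scan-page1.pdf, scan-page2.pdf."""
    stem = Path(pdf_path).stem
    return [Path(out_dir) / f"{stem}-page{n}.pdf" for n in range(1, page_count + 1)]

def split_pdf(pdf_path, out_dir):
    """Write each page of pdf_path to its own PDF in out_dir."""
    # Assumption: pypdf is installed (pip install pypdf). It is not part of
    # Mac OSX or of the Automator workflow described above.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    targets = page_output_paths(pdf_path, len(reader.pages), out_dir)
    for page, target in zip(reader.pages, targets):
        writer = PdfWriter()
        writer.add_page(page)
        with open(target, "wb") as handle:
            writer.write(handle)
    return targets
```

Calling split_pdf with a PDF path and an output folder would write one file per page and return the list of paths it created.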

Downloading The Droplet

As mentioned, if you don’t want to go through the hassle of setting this all up, you are welcome to use mine.

To use it, download the file to your computer, double-click it to Unzip it, and move the resulting “DSSplitPDF” file somewhere.

Once you’ve done that, follow the “Using The Droplet” instructions above.

Click Here To Download DSSplitPDF-1.1.zip

I am sure you all have more tips to split PDFs, either on Mac or Windows. School us in the comments!

(Photo by recursion_see_recursion)


Create A Save As PDF Keyboard Shortcut In Mac OSX

File this one under both “oldie but a goodie” and “insanely useful”.

As you probably know, Mac OSX has “Save As PDF” built into the operating system, so you can print to a PDF from pretty well anywhere. Convenient, but what if you could make it even more convenient?

David Sparks over at MacSparky put together a tutorial outlining how to create a keyboard shortcut for that “Save As PDF” functionality, so that all you need to do is hit Command-P twice.

The tutorial is easy to follow, but there are two updates that are needed for Snow Leopard.

Instead of going to Keyboard & Mouse in System Preferences, go to Keyboard:

Keyboard System Preference

And then the location of All Applications has changed a bit:

All Applications

After that, follow the rest of David’s tutorial and you should be good to go. Just make sure you type “Save As PDF…” exactly as he shows it.

(via shaunblanc).


Summarize Text Using Mac OSX Summarize Or Microsoft Word AutoSummarize

Whether you want to create an executive summary for a document or just want to get the gist of a long document before diving in, wouldn’t it be helpful if your computer could do the skimming for you?

The Mac OSX operating system and Microsoft Word on Windows have little-known summarizing tools that can do a pretty decent job of giving you the key points of a document or block of text.

In the examples below, I will use a PDF copy of the free 4 Ways To Tame Your Documents e-Book that you can get by signing up for the free Paper Sanity e-Course or my weekly Paper Cuts newsletter.

Mac OSX Summarize Service

Sometimes there are hidden features in the nooks and crannies of the OSX operating system, and Summarize Text is one of them. However, before I show you how to use it, we have to check if it is set up first.

Do You Have Summarize Enabled?

Open up a searchable PDF in Preview, a text file in a text editor, or a website in Safari.

Highlight some text and go to the Services menu. In this example in Preview I will go to Preview and then Services. Do you see Summarize in the list like this screenshot? If not, you’re going to have to enable it.

Is Summarize There?

Enabling Summarize

From that same Preview (or whichever app you are in) > Services menu, click Services Preferences.

In the right pane, scroll way down to the bottom of the Text section and you should see Summarize there. Check it to enable.

Enable Summarize Service

Summarize It!

Back in whatever application you were just using, highlight the text you want to summarize or Select All if you want to do the whole document.

Right click on the text, and you should see Summarize (it may be buried in a Services submenu). Click it and it will open up the Summary application.

Right click choose Summarize

Your text will now be summarized, but it doesn’t stop there. By default it shrinks it by about 50%.

You can move the slider to make it bigger or smaller, so you can go way down to 1-5% and get a super short summary.

But you will probably get the best results at around the 25-30% mark.

Once you have things how you like them, you can either read the text there in the Summary application, copy & paste the text out, or save it as an RTF file.
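To get a feel for what that slider is doing, here is a deliberately naive Python sketch of a “keep N% of the sentences” summarizer. It is not how Apple’s Summarize service works internally (I don’t know what it uses). It simply scores each sentence by how common its words are in the whole text and keeps the top fraction, in their original order. The function and parameter names are made up for this example.

```python
import re
from collections import Counter

def summarize(text, keep_fraction=0.25):
    """Keep the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if not sentences:
        return ""
    # Word frequencies across the whole text act as a crude importance score.
    frequency = Counter(re.findall(r"[a-z']+", text.lower()))

    def score(sentence):
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(frequency[t] for t in tokens) / max(len(tokens), 1)

    # keep_fraction plays the role of the slider: 0.25 keeps about a quarter.
    keep = max(1, round(len(sentences) * keep_fraction))
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])
    return " ".join(sentences[i] for i in chosen)
```

Lower fractions drop the sentences whose words appear least often elsewhere in the text, which is one plausible reason very short summaries can feel choppy.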

AutoSummarize In Microsoft Word for Windows

While Windows users don’t have this functionality built into the operating system (as far as I know), a similar function does come included in Microsoft Word (at least in 2003 and 2007).

I was going to write a tutorial for you, but this AutoSummarize video by Microsoft pretty much tells you everything you need to know.

I can see myself using text summarization when I have a long document to read and I want to get the key points before digging in. How about you? Leave a note in the comments if you think this feature would (or wouldn’t) be useful and how you’d use it.

(Photo: visual.dichotomy)


How To Combine PDF Files in Mac OSX Using Automator To Make A Service

A close friend of mine is currently doing a year-long round-the-world trip. Being the sucker, er, nice guy that I am, I have agreed to receive her mail and keep her important documents for her.

Sometimes I need to scan stuff for her and send it electronically. Usually my trusty ScanSnap S1300 does the job, but there is the occasional situation where I have to use my flatbed.

However, what happens when I have multiple PDFs that really should go in one document? I needed a way to combine the PDFs together.

There are a million ways to do this, including some I have talked about before like using Preview.app to drag and drop pages, and there are lots of applications that one can use to combine PDFs, but I wanted to do something that would be:

  • Already built into the OS and not require any additional software
  • Easy to use
  • Repeatable so I only have to set it up once and can use it again and again

I came across this great tutorial by George Harito that runs through how to use Automator to create a Service in Snow Leopard to combine PDFs for you. If you don’t know what the heck that means, don’t worry about it. I’ll take you through step by step.

(In case you’re wondering why I don’t just point to George’s tutorial and be done with it, it’s because there are some extra unneeded steps in there that might confuse some people, so I decided to recreate it here. The inspiration comes from George though and he deserves all the credit).

Create The Service

We’re going to be using an application called Automator to set this up. It looks a bit scary but don’t worry, what we’ll be doing is very easy.

Start Automator

  • In Finder, go to Applications and then start Automator, the cool little robot icon

Choose Service

  • In the window that pops up, highlight Service and then hit Choose


Set Up The PDF Action

  • At the top in the middle, you will see a line that says Service receives selected and then a dropdown. Choose PDF files

  • In the Library section on the left, you’ll see a line for PDF. Choose that
  • In the next column over, there is an option for Combine PDF Pages. Click that and drag it into the section on the right

Now we want to be able to give our new PDF a name.

  • Go to the Library section and choose Files & Folders and drag Rename Finder Items to the canvas under your Combine PDF Pages
  • It is going to ask you this: “This action will change the names of the Finder items passed into it. Would you like to add a Copy Finder Items action so that the copies are changed and your originals are preserved?” You almost certainly want to hit Don’t Add because you want it to rename the file, not make a copy.
  • We want to totally rename the file, so choose Name Single Item
  • At the bottom, click on Options and then check Show this action when the workflow runs. It will then prompt you for a name for your new PDF when you run it.

So now we’ve told it to combine the PDFs and give it a name, but now we need to tell it where to put the new file.

  • In the Library, still in Files & Folders, drag Move Finder Items to the canvas under your previous action
  • You can leave the default location if you want, but click on Options and then check Show this action when the workflow runs. That way, it will ask you where you want to save your new PDF.

Here is the whole rule.

Awesome! You’re done. Now go to File > Save As and give it a name like Combine PDF Files.

Using the Service

So why did we go through all this trouble to set it up as a Service?

Now any time I ever want to combine a bunch of PDFs, I just need to go to Finder, right click them, and check out the new option that I have:

When I choose Combine PDFs, I get a popup where I give it a name.

Then, another popup appears where I choose where to save it.

Et voila, I have my combined PDF with the 3 files that I had selected in the Finder.

From now on it will be super easy to combine any PDFs.
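For the script-minded, here is a rough Python equivalent of the combine step. It is a sketch rather than a replacement for the Service: it assumes the third-party pypdf package is installed, and sorting the inputs by file name is my own choice for a predictable page order (the Service above simply takes whatever you selected in Finder).

```python
from pathlib import Path

def ordered_pdf_inputs(paths):
    """Keep only .pdf files and sort them by name so page order is predictable."""
    pdfs = [Path(p) for p in paths if Path(p).suffix.lower() == ".pdf"]
    return sorted(pdfs, key=lambda p: p.name.lower())

def combine_pdfs(input_paths, output_path):
    """Append every page of every input PDF to one new PDF."""
    # Assumption: pypdf is installed (pip install pypdf). It is not built
    # into Mac OSX, unlike the Automator actions used above.
    from pypdf import PdfReader, PdfWriter

    writer = PdfWriter()
    for path in ordered_pdf_inputs(input_paths):
        for page in PdfReader(path).pages:
            writer.add_page(page)
    with open(output_path, "wb") as handle:
        writer.write(handle)
    return Path(output_path)
```

Unlike the Service, this sketch leaves the original files where they are and writes the combined PDF to the path you give it.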

How about you?

What tricks/processes do you have for combining PDFs on the Mac? Leave a note in the comments or on the Facebook page.


Behind These Paperless Evernote Hazel Eyes

Using the Kelly Clarkson quote was just way too easy, I know.  This post is not in fact about American Idol winners, but is about Hazel, a Mac-only rules-based file management application.  It does a ton of stuff, but today I am going to talk about how you can use it in a paperless workflow.

To be honest, DocumentSnap readers have been mentioning Hazel to me for quite some time, but for whatever reason I have never gotten around to looking at it until now.  As usual, you guys are way smarter than I am.  Why on earth did I wait?

Basically, you can think of Hazel as something that brings iTunes Smart Playlist-like rules to the files on your Mac.

How can this help in a paperless workflow?  Well, for example, you could have Hazel watch a folder, and then anything that you drop into it could be tagged, Spotlight comments added, OCR’ed, and then sent to a specific folder.

David Sparks from MacSparky has a great runthrough on how he does this.  I definitely recommend checking it out.  He has a bunch of Hazel rules that get triggered when he names a file something, like “gas bill”.  As soon as he names a file “gas bill.pdf”, the Hazel rules kick in and it gets renamed with the appropriate date added, then it gets sent to a nested folder structure based on type and date.  Very cool stuff.

He also describes this workflow in episodes #3 and #25 of the Mac Power Users Podcast.

Hazel And Evernote

As I said, there are a bunch of different ways you can use Hazel in a paperless workflow.  One that pops to mind is to create a rule that sends something to Evernote.  Let’s say we scan or receive PDFs and want to send certain ones to Evernote.

In my example, I’ll create a folder under Documents called “ToEvernote”.

Then I will create a Hazel Rule called “Evernote Import” that watches that folder, and acts on any PDFs that I save there.

First I will create a condition that acts on any files with Extension PDF:

Next, I want to run an AppleScript, so I will choose “Run AppleScript”.  I will leave it as “embedded script” and then hit “Edit Script”.

Then I will paste in the following code to that box:

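-- theFile is the file that matched the rule (Hazel supplies it to embedded scripts)
tell application "Evernote"
	activate
	-- Import the matched PDF as a new note
	create note from file theFile
end tell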

Then I will hit the Plus sign to add a new action.  Once a file has been added to Evernote, I don’t want to keep it around, so I trash it.  I choose Move File and then select the Trash folder.

Here is what my final rule looks like:

Now, as soon as I drag a PDF into that ToEvernote folder, Evernote pops up with the new note and the PDF is trashed. Coolio!
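If you don’t have Hazel, the logic of the rule (find the PDFs in a folder, hand each one to Evernote) can be sketched in Python. This is only an illustration of what the rule does, not a Hazel replacement: nothing here watches the folder, and it assumes Evernote’s “create note from file” command accepts a POSIX file the same way it accepts Hazel’s theFile. The function names are made up.

```python
import subprocess
from pathlib import Path

def pdfs_to_import(folder):
    """Return the PDFs in folder, roughly matching the rule's extension condition."""
    return sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".pdf")

def evernote_command(pdf_path):
    """Build an osascript call that mirrors the embedded AppleScript above."""
    # Assumption: 'create note from file' accepts a POSIX file reference.
    # File names containing double quotes would need escaping first.
    script = (
        'tell application "Evernote" to create note from file '
        f'(POSIX file "{pdf_path}")'
    )
    return ["osascript", "-e", script]

def import_folder(folder):
    """Send each PDF to Evernote; only works on a Mac with Evernote installed."""
    for pdf in pdfs_to_import(folder):
        subprocess.run(evernote_command(pdf), check=True)
        # The Hazel rule then moves the file to the Trash. That step is left
        # out here so the sketch never deletes anything.
```

You would run import_folder on the ToEvernote folder by hand or on a schedule, which is exactly the chore Hazel takes off your plate.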

Of course, you can get extremely fancy here, but between this post and David Sparks’, you should be well on your way to paperless fun with Hazel.

I’m rocking the 14 day free trial now, but I think I will be paying the $22 to buy the full version.  Great stuff.

Do you use Hazel? Have any tricks? Leave a note in the comments.


FineReader Mac Update For ScanSnap Works For Older ScanSnaps Too

File this post under the “I will probably regret posting this” category.

I had a tip from DocumentSnap reader Hamad that was too helpful not to share.

Remember back in November 2009, when Fujitsu had to release an update for ABBYY FineReader because it was having problems with Mac OSX Snow Leopard?

According to the website, it is an update for the ScanSnap S1500M and S510M.

However, it turns out that this version of FineReader works for old-school ScanSnaps too. For example, I’ve had reports of it working with a ScanSnap fi-5110EOXM from 2005!

It seems that, as long as the PDF is created by a ScanSnap (any Mac ScanSnap), it will work.

So, if you are running Snow Leopard and have an older ScanSnap, give it a try. It will probably work for you too. If you’re using something other than an S1500M or S510M and it works, drop us a comment and let us know.


Updated: Acrobat Applescript for ScanSnap OCR

As many of you know, in 2008 I posted an AppleScript that uses Adobe Acrobat’s OCR capabilities to make PDFs searchable.

In the comments to that post, user nodis pointed out that adding 2 words to one of the lines can make the PDFs quite a bit smaller.

In my testing, I ran a 1.3 MB PDF through the script. Before nodis’ change, the resulting PDF was 1.7 MB. After the change, it was 424K!

Here is the updated script:

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Let me know how it works out for you and if you see similar reductions in file size.

Update: If you use Acrobat X, please see this post about OCR AppleScript for Acrobat X.


Cool Paperless Setup Video

As much of a paperless geek as I am, I normally wouldn’t sit and watch a video of someone scanning and shredding paper.

However, I just wanted to point you to this YouTube video by user allenday. He’s got a really cool setup: a ScanSnap S300M, Adobe Acrobat, a Mac Mini, a wall-mounted Sharp Aquos, and a Royal PX1000MX to shred, and he uploads everything to Evernote.

To do the OCRing, he uses the Acrobat OCR AppleScript Droplet that I hacked/posted about earlier.


Very cool setup, thanks for sharing allenday! Do any of you have a cool paperless setup? Feel free to share pics or videos in the comments.


How To Create Searchable PDFs With The ScanSnap S300M

So you read all this great stuff about how the Fujitsu ScanSnap is awesome and creates searchable PDFs, and you’re on a Mac and want a portable scanner, so you drop the cash on a ScanSnap S300M.

Then you get it home and find out – wait a minute – the S300M doesn’t come with OCR software! If you’ve been there (and I have), hopefully this post will help you out, as I get a lot of questions about this.

Mail-In Rebate

Your local Fujitsu website may provide a mail-in rebate for OCR software if you purchase the S300M. At the time of this writing, the US Fujitsu website has a mail-in rebate for a free copy of ReadIris OCR software.

The rebate is at http://www.fujitsu.com/us/services/computing/peripherals/scanners/rebates.html . Check if your country has something similar.

Acrobat

While the S300M doesn’t come with Adobe Acrobat, if you have a copy of it lying around, or have access to it, you can use the ScanSnap with it. Here is an example of how I use the S300M with Acrobat 8.

Evernote

Evernote Premium allows users to upload PDFs and they will be automatically OCR’ed and made searchable.

DevonThink

If you use a program like Devonthink Pro Office to manage your documents, they will be made searchable.

NeatWorks

NeatWorks is software that is bundled with the NeatDesk scanner, but it can be purchased on its own. See this post for how to use NeatWorks with the Fujitsu ScanSnap.

These are some ideas for how to make searchable PDFs with the ScanSnap S300M. Do you have any others? Leave a message in the comments.
