Tag Archives: applescript

Reader Story: Automating Document Processing

This post is part of the paperless stories feature at DocumentSnap. Some stories are from readers who have successfully gone paperless; others are still going through it. Would you like to share your story too?

Today’s featured DocumentSnap reader is Tom Cook from Michigan. You can find him at http://tomcook.net/.

What problems were you trying to solve by going paperless?

The largest problem I had was the buildup of papers that needed to be filed. Our filing cabinet was on another floor, so things would just pile up and get shoved into a cabinet. I also hate having to keep all that paperwork that will likely never need to be seen again.

What were the biggest stumbling blocks?

The cost of a scanner. ScanSnaps are expensive. I tried doing it with the flatbed we had, but it was too slow to process new things on a daily basis.

The other hassle has been the backlog of old stuff. Going through the file cabinet and the piles and scanning it all is a big job.

Tell us about your paperless workflow

I use Yep as an organizer. Documents get into it through a few workflows.

Real Paper – Wherever it comes from (mail, school, etc.), it gets processed ASAP. If it’s junk, it goes to recycling. If it is sensitive junk, it gets shredded. If it is worth keeping, it goes into the ScanSnap S1300 and I press the scan button. It is scanned to a folder Yep watches, and then opened in the AppleScript droplet (from you), which handles OCR, alignment, etc. Scanning to the droplet is the default action most of the time. When I have a large amount, such as when purging the file cabinet, I have the ScanSnap just save all the docs to a folder and then I manually drop them all on the droplet and let it run.

Electronic documents generated on the computer – These are usually saved as PDFs directly into the Web Receipts folder on my Mac, or added to Yep via the print dialogue box.

Electronic docs from other computers – Sometimes I need to file something to Yep, but I am at work. I set up a process where I email a PDF to myself with the subject of “File This”. This triggers a mail rule that moves it to a local mail folder and launches an Applescript that runs Automator actions to save the PDF attachment(s) to a folder watched by Yep. I have only just started with this and I am still working on refining the process. I haven’t been able to get it to delete the message after processing.
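
For readers who want to try something similar, here is a minimal sketch of a Mail rule script that saves PDF attachments to a watched folder and then deletes the message. It is not Tom's script and has not been tested against any particular version of Mail. The "ToYep" folder name and its location in Documents are assumptions made for illustration, so substitute whichever folder Yep watches on your Mac.

```applescript
-- Sketch only: not Tom's actual script. Attach it to a Mail rule that
-- matches the subject "File This", using the "Run AppleScript" action.
using terms from application "Mail"
	on perform mail action with messages theMessages for rule theRule
		-- Assumption: a folder named "ToYep" already exists in Documents
		-- and is watched by Yep. Change this to suit your own setup.
		set destFolder to (path to documents folder as text) & "ToYep:"
		tell application "Mail"
			repeat with theMessage in theMessages
				repeat with theAttachment in mail attachments of theMessage
					set attName to name of theAttachment
					-- Only save PDF attachments
					if attName ends with ".pdf" then
						save theAttachment in file (destFolder & attName)
					end if
				end repeat
				-- Move the message to Mail's Trash once its attachments are saved
				delete theMessage
			end repeat
		end tell
	end perform mail action with messages
end using terms from
```

Doing the save and the delete in the same script, rather than handing off to Automator, keeps the message reference available at the point where it needs to be deleted.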

Electronic Statements – Most banks and utilities only allow access to PDF statements/bills going back a year or two. I have set up quarterly reminders to go to all the locations to download statements and bills. Doing it monthly would be too much hassle logging in, and doing it annually would be too much hassle downloading 12 statements at once. For most of the places I get them from, it is a few clicks to get to each statement. I use a Google Spreadsheet as a checklist to make sure I get all of them.

Backups – I use an external hard drive and Time Machine for routine onsite backup. I also keep an external HD at my parents’ house (about 30 min away), and a few times a year (or more if I have added a lot of docs) I do a clone of the whole computer. The cost of a couple of hard drives is nothing compared to the thought of losing all those pictures and documents to drive failure/theft/damage.

Thanks Tom, that’s a great use of automation. I especially like the mail rule and the reminder to go in and download your statements on a regular basis.

If you have questions for Tom, leave a comment and I will try to get them answered.

(Photo by IK’s World Trip)


How To Scan To The OmniFocus Inbox

Back in January, DocumentSnap reader Jos van de Voort van de Kleij wrote in with a question: How could he use his ScanSnap S1300 to scan directly to his OmniFocus inbox?

Now, despite the writing and podcasting of David Sparks, Ben Brooks, and Merlin Mann, I have not yet taken the plunge and started using OmniFocus. However, I was at Macworld at the time, so I went to the Omni booth to ask for suggestions, and they pointed me to the Omni Forums.

After some great help from the Omni folks, Jos has worked out a workflow that lets him scan to his OmniFocus Inbox. Now, why would he want to do this? Here are his words:

Remember: this workflow allows you to create paperless “ticklers” in OmniFocus therefore eliminating the need for the 43 folders as described in the book by David Allen. As an example you could scan a bill, give it a “pay by” date and it will disappear from your radar until the date comes to pay the bill. You will have the scan of the bill to support preparing the payment. When done you click the action completed and it goes away automatically. Yet another step in going paperless!

Sounds good to me. Here are the steps to get it done.

Set Up The Folder Action

  • Download the OmniFocus Applescript to somewhere on your computer
  • Unzip it
  • Drop the script into this folder: Macintosh HD>Library>Scripts>Folder Action Scripts
  • Find or create a folder that you are going to want to scan/save to
  • Right-click on this folder, and select Services>Folder Actions Setup (screenshot: Folder Actions Setup)
  • Select the “Add files as OmniFocus actions” script (screenshot: Attach script)
  • Click Attach

Now you have a folder where any time a file gets saved there, it automatically gets imported to OmniFocus.
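
If you are curious what a folder action like this looks like inside, here is a rough sketch. It is not the downloadable "Add files as OmniFocus actions" script: it only creates an Inbox action named after the file and puts the file's path in the note, and it does not attach the file or open the Quick Entry window. Treat it as an illustration of the folder action pattern rather than a replacement.

```applescript
-- Sketch only: not the downloadable OmniFocus script.
on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		-- Look up the file's name and path for the new action
		tell application "Finder" to set fileName to name of (added_item as alias)
		set filePath to POSIX path of (added_item as alias)
		tell application "OmniFocus"
			tell default document
				-- Create an Inbox action named after the file, with its path in the note
				make new inbox task with properties {name:fileName, note:filePath}
			end tell
		end tell
	end repeat
end adding folder items to
```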

Set Up Your Scanner

Set up your scanner to scan to this folder (in my case I called it “To OmniFocus”).

If you are using a ScanSnap, you can set up a profile for this.

On the Application tab, choose Scan To File (or, if you want to be able to rename it first, Scan To Folder)

(Screenshot: Application Tab)

On the Save tab, browse to your OmniFocus scan folder that you created earlier.

(Screenshot: Save Tab)

For the rest of the tabs, set whichever options you prefer.

Scan To OmniFocus!

Once this is all set up, you are good to go. Select your OmniFocus profile, hit Scan, and the Quick Entry window should pop up.

(Screenshot: OmniFocus Quick Entry)

Do whatever you OmniFocus people do there, and hit Save. The task will then be created with your scanned document as an attachment in your OmniFocus Inbox.

(Screenshot: OmniFocus Inbox)

Since we’re talking OmniFocus, do any other OF devotees out there have tips or tricks for how they reference their documents from it? Let us know in the comments.


Hazel Rule To OCR Documents Using PDFPen

The other day I posted an Applescript to OCR documents using PDFPen.

In the comments, awesome DocumentSnap reader Josh requested that it be done as a Hazel rule instead. Given that my love for Hazel is well documented, I am happy to oblige.

I created a folder and then created the following Hazel rule to run against it:

  • Extension is PDF
  • Date Last Modified is after Date Last Matched (to stop it from trying to re-OCR documents)

Then I asked it to run the following Applescript:

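tell application "PDFpen"
    -- theFile is supplied by Hazel: the file that matched the rule
    open theFile as alias
    tell document 1
        ocr
        -- Poll until PDFpen reports that OCR has finished
        repeat while performing ocr
            delay 1
        end repeat
        delay 1
        close with saving
    end tell
end tell

Of course, if you are using PDFpenPro, replace “PDFpen” in the first line with “PDFpenPro”.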

Here’s a screenshot (unfortunately the bottom of the script is cut off):

(Screenshot: Hazel PDFPen Rule)

Hope this helps out you Hazel and PDFPen fans out there. Enjoy.


PDFPen OCR Applescript To Automatically Make PDFs Searchable

I don’t know if it is because I have been glued to a computer since I was six years old, but my handwriting and printing are terrible. Really terrible. I think my 5-year-old son and I have pretty similar handwriting skills.

Normally this is not a problem, except when I have to fill out a form. It’s a little embarrassing filling out some official form with my chicken scratch, which is one of the many reasons why I love PDFPen. Among many other things, it lets you fill out and edit any PDF document on your computer and then print it out.

However, that ability is not what this post is about. PDFPen will also OCR PDFs to make them searchable, and I wanted a way to OCR a bunch of documents automatically with an Applescript, similar to what has been done with Adobe Acrobat and with ABBYY FineReader.

I found two scripts out there. One from David Sparks at MacSparky, which some users reported problems with in newer PDFPen versions, and one from Michael Tsai at C-Command Software which will OCR a document with PDFPen and send it to EagleFiler.

Since both of these scripts were almost what I wanted, I decided to stand on the shoulders of giants and merge them together into this Applescript.

Here is the script:
-- Downloaded From: http://www.documentsnap.com
-- Last Modified: 2010-09-28
-- Includes code from MacSparky http://www.macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html
-- Includes code from C-Command Software http://c-command.com/scripts/eaglefiler/ocr-with-pdfpen

-- Folder action handler: runs whenever items are added to the attached folder
on adding folder items to this_folder after receiving added_items
    try
        repeat with added_item in added_items
            my ocr(added_item)
        end repeat
    on error errText
        display dialog "Error: " & errText
    end try
end adding folder items to

-- OCR a single file in PDFpen, wait for it to finish, then save and close
on ocr(added_item)
    tell application "PDFpen"
        open added_item as alias
        tell document 1
            ocr
            -- Poll until PDFpen reports that OCR has finished
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
            close with saving
        end tell
    end tell
end ocr

PDFpen Users: Download The Text Script Here (Right-click and Save-As)
PDFpen Pro Users: Download The Text Script Here (Right-click and Save-As)

To implement, follow MacSparky’s excellent instructions.
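
One thing to be aware of: the script wraps the whole loop in a single try block, so one failing file stops the rest of the batch, and anything dropped in the folder gets handed to PDFpen whether or not it is a PDF. Here is a sketch of a possible variant that checks the extension and handles errors per file. The isPDF helper is something I made up for illustration, and I have not tested this variant across PDFpen versions.

```applescript
-- Sketch only: a variant of the folder action above, not a tested replacement.
on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		-- A separate try per file, so one failure does not stop the rest
		try
			if my isPDF(added_item) then my ocr(added_item)
		on error errText
			display dialog "Could not OCR " & (added_item as text) & ": " & errText
		end try
	end repeat
end adding folder items to

-- Hypothetical helper: true when the Finder reports a "pdf" extension
on isPDF(theItem)
	tell application "Finder" to set ext to name extension of (theItem as alias)
	return ext is "pdf"
end isPDF

-- Same OCR handler as in the script above
on ocr(added_item)
	tell application "PDFpen"
		open added_item as alias
		tell document 1
			ocr
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end ocr
```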

I hope this is of use to someone, and thanks to David and Michael for their excellent Applescripts.


Behind These Paperless Evernote Hazel Eyes

Using the Kelly Clarkson quote was just way too easy, I know. This post is not in fact about American Idol winners, but is about Hazel, a Mac-only rules-based file management application. It does a ton of stuff, but today I am going to talk about how you can use it in a paperless workflow.

To be honest, DocumentSnap readers have been mentioning Hazel to me for quite some time, but for whatever reason I have never gotten around to looking at it until now.  As usual, you guys are way smarter than I am.  Why on earth did I wait?

Basically, you can think of Hazel as something that brings iTunes Smart Playlist-like rules to the files on your Mac.

How can this help in a paperless workflow?  Well, for example, you could have Hazel watch a folder, and then anything that you drop into it could be tagged, Spotlight comments added, OCR’ed, and then sent to a specific folder.

David Sparks from MacSparky has a great runthrough on how he does this.  I definitely recommend checking it out.  He has a bunch of Hazel rules that get triggered when he names a file something, like “gas bill”.  As soon as he names a file “gas bill.pdf”, the Hazel rules kick in and it gets renamed with the appropriate date added, then it gets sent to a nested folder structure based on type and date.  Very cool stuff.

He also describes this workflow in episodes #3 and #25 of the Mac Power Users Podcast.

Hazel And Evernote

As I said, there are a bunch of different ways you can use Hazel in a paperless workflow. One that pops to mind is to create a rule that sends something to Evernote. Let’s say we scan or receive PDFs and want to send certain ones to Evernote.

In my example, I’ll create a folder under Documents called “ToEvernote”.

Then I will create a Hazel Rule called “Evernote Import” that watches that folder, and acts on any PDFs that I save there.

First I will create a condition that acts on any files with Extension PDF:

Then I will run an Applescript, so I will choose “Run Applescript”. I will leave it as “embedded script” and then hit “Edit Script”.

Then I will paste in the following code to that box:

tell application "Evernote"
    activate
    -- theFile is supplied by Hazel: the PDF that matched the rule
    create note from file theFile
end tell

Then I will hit the Plus sign to add a new action.  Once a file has been added to Evernote, I don’t want to keep it around, so I trash it.  I choose Move File and then select the Trash folder.

Here is what my final rule looks like:

Now, as soon as I drag a PDF into that ToEvernote folder, Evernote pops up with the new note and the PDF is trashed. Coolio!

Of course, you can get extremely fancy here, but between this post and David Sparks’, you should be well on your way to paperless fun with Hazel.

I’m rocking the 14 day free trial now, but I think I will be paying the $22 to buy the full version.  Great stuff.

Do you use Hazel? Have any tricks? Leave a note in the comments.


Updated: Acrobat Applescript for ScanSnap OCR

As many of you know, in 2008 I posted an Applescript that will use Adobe Acrobat to make PDFs searchable using Acrobat’s OCR capabilities.

In the comments to that post, user nodis pointed out that adding 2 words to one of the lines can make the PDFs quite a bit smaller.

In my testing, I ran a 1.3 MB PDF through the script. Before nodis’ change, the resulting PDF was 1.7 MB. After the change, it was 424K!

Here is the updated script:

OCRIt-Acrobat – Droplet to batch OCR PDFs in Adobe Acrobat

To use it:

  • Download and uncompress the file and save it to your Desktop, Dock or wherever
  • Drag one or more PDFs onto the icon
  • Enjoy

Let me know how it works out for you and if you see similar reductions in file size.

Update: If you use Acrobat X, please see this post about OCR AppleScript for Acrobat X.


Cool Paperless Setup Video

As much of a paperless geek as I am, I normally wouldn’t sit and watch a video of someone scanning and shredding paper.

However, I just wanted to point you to this YouTube video by user allenday. He’s got a really cool setup of a ScanSnap S300M, Adobe Acrobat, a Mac Mini, a wall-mounted Sharp Aquos, and a Royal PX1000MX for shredding, and he uploads everything to Evernote.

To do the OCRing, he uses the Acrobat OCR Applescript Droplet that I hacked/posted about earlier.


Very cool setup, thanks for sharing allenday! Do any of you have a cool paperless setup? Feel free to share pics or videos in the comments.


Applescript: Easily convert PDF documents to JPG or PNG

There are, of course, a million ways to convert PDF documents to JPG or PNG files. However, sometimes you just want something quick and easy.

A while ago, reader AS pointed me to an Applescript droplet written by Martin Michel over at MacScripter.

AS mentioned that it would be nice to have a version that converts to PNG as well. Being nothing if not nice, I used my almost non-existent Applescript/Python skills to convert Martin’s script to output PNG. All credit for this goes to Martin; I just made some modifications.

Here’s how to do it:

PDF To JPG

PDF To PNG

  • Download PDF2PNG from here
  • Drag a PDF or multiple PDFs onto the icon
  • Select the resolution (or accept the default)
  • A PNG file will be created for each page in the PDF

Hope this helps some of you. I know they will come in handy for me. Thanks AS and thanks Martin Michel! Let me know how it works out for you.


OCR Your ScanSnap PDF Before Sending It To Evernote

Update: Of course, a few days after I posted this, Evernote announced that they would make PDFs searchable for Premium users. So if you are not a Premium user, this will help. Otherwise, just upload away.

One of the most popular posts on this site is on how to use the Fujitsu ScanSnap with Evernote. It describes how to set up a profile in ScanSnap Manager to send the resulting PDF to Evernote.

There is one problem with doing it this way – Evernote does not OCR PDFs. I assume they’ll be fixing this someday, but for now, if you want your document searchable within Evernote, you need to OCR it before sending it into Evernote.

How you do this depends on which model of the ScanSnap that you have, and whether you have Windows or a Mac.

ScanSnap For Windows

If you have the ScanSnap S300, S510, or S1500, your solution is pretty simple.

What we’re going to do is set Evernote to watch a folder so that anything it finds in there it will automatically import. Then set up ScanSnap to save files to that folder.

  • In Evernote, go to File -> Import -> File Import Wizard
  • Hit Next and select the Source folder that you want Evernote to watch and set your notebook
  • Choose “Watch folder for changes and import files automatically”

Now set up ScanSnap normally to scan to that folder you just selected, and whatever files you save into that folder will be grabbed by Evernote.

ScanSnap S510M or S1500M For Mac

For whatever reason, Evernote for the Mac does not have the Watch Folder functionality that the Windows client does (why not Evernote?!). However, thanks to the magic of Applescript, we can do the same thing.

This will work for the ScanSnap S510M or S1500M.

  • Download this file – AddToEvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
  • Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
  • Right click on the folder again and select More and then Attach a Folder Action. Select the AddToEvernote script that you just saved

Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the Applescript will go through that folder and add the files into Evernote. Handy!
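
For the curious, a folder action along these lines can be quite short. The following is a sketch and not the contents of AddToEvernote.scpt. It handles only the items that were just added (rather than the whole folder), skips anything that is not a PDF, and uses Evernote's create note command to import each file.

```applescript
-- Sketch only: not the contents of AddToEvernote.scpt.
on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		set theFile to added_item as alias
		-- Ask the Finder for the extension so non-PDF files are skipped
		tell application "Finder" to set ext to name extension of theFile
		if ext is "pdf" then
			tell application "Evernote"
				create note from file theFile
			end tell
		end if
	end repeat
end adding folder items to
```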

ScanSnap S300M For Mac

For whatever reason (I say that a lot), the ScanSnap S300M does not come with OCR software (why not Fujitsu!?).

However, we’re in luck. Awesome DocumentSnap reader Sebastian Poll wrote this Applescript that will use Adobe Acrobat to automatically OCR the PDF and then kick it straight into Evernote.

Obviously, it requires Acrobat. If you don’t have Acrobat, you can use whatever method you currently use to OCR and then use the AddToEvernote script above to import the result.

Note that Sebastian’s version was actually written with some of the code in German. I changed it to English, so if there are problems, it is probably my fault and not his.

  • Download this file – OCREvernote.scpt and save it to /Library/Scripts/Folder Action Scripts
  • Create or select a folder that you want scanned PDFs to go into. Right-click on it and select More and then Enable Folder Actions
  • Right click on the folder again and select More and then Attach a Folder Action. Select the OCREvernote script that you just saved

Now set up ScanSnap normally to scan to the folder that you just configured. When you add a PDF to it, the Applescript will go through that folder and OCR with Acrobat and then add the files into Evernote.

Do you use the ScanSnap with Evernote? Do you have any other methods of making PDFs searchable? Or do you not bother? Leave a message in the comments.


Abbyy Finereader and Adobe Acrobat – Why Does Fujitsu Include Both?


I have received a number of questions recently about the software that is included with the Fujitsu ScanSnap. For example, why does the ScanSnap come with both Abbyy FineReader and Adobe Acrobat? Aren’t they both for doing OCR?

I suspect part of the reason that this question comes up is because of my posts about my ScanSnap workflow and my Adobe Acrobat OCR Applescript. Is all that necessary?

Let me start by saying that I personally have the ScanSnap S300M. The S300M comes with neither Abbyy FineReader nor Adobe Acrobat. If you have the S1500 or S1500M, your scanner will come with both and doing OCR is much more integrated than with the S300M, so my post-scan processing fun may not be necessary.

So What’s The Difference?

The ScanSnap comes with a special version of Abbyy FineReader called FineReader for ScanSnap. They’ve integrated that with ScanSnap Organizer, so if you are using the built-in automatic OCR’ing, that is what it is using.

If all you care about is having your PDFs searchable and don’t mind performing the OCR right after scanning, then the supplied FineReader is probably all you need.

To my mind, there are basically two main reasons why you will want to use Adobe Acrobat:

  • You want to do PDF editing after the fact
  • You want to batch your OCR after the fact

PDF Editing

So you have your scanned PDF. Now what? If you want to remove/rearrange pages and do a whole ton of other editing functions, Acrobat is a great tool. It is most definitely not just for making a PDF searchable.

You can see a bunch more information for Adobe Acrobat 9 (included with the ScanSnap S1500) and Acrobat 8 (included with the ScanSnap S1500M). You can see from the price that it’s a pretty good deal that this software is included with the ScanSnap.

Batch OCR

If you have a whole bunch of documents to scan in, it may be annoying to scan, sit there and wait for it to OCR, scan, OCR, scan, OCR, and so on. Some people prefer to scan all their documents to PDF in one shot, and then OCR them all at once. You can use Acrobat to do that instead of the included FineReader.
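
To give a feel for the scan-everything-then-OCR approach, here is a rough sketch of a batch script. Note that it uses PDFpen rather than Acrobat, simply because the PDFpen commands are the ones shown in the PDFpen OCR post on this site. It is an illustration of the batch pattern under that assumption, not an Acrobat script.

```applescript
-- Sketch only: batch OCR with PDFpen (swapped in for Acrobat for illustration).
set theFolder to choose folder with prompt "Choose the folder of scanned PDFs to OCR:"

-- Collect every PDF in the chosen folder
tell application "Finder"
	set pdfFiles to every file of folder theFolder whose name extension is "pdf"
end tell

-- OCR them one after another
repeat with aFile in pdfFiles
	my ocr(aFile as alias)
end repeat

-- OCR a single file in PDFpen, wait for it to finish, then save and close
on ocr(theFile)
	tell application "PDFpen"
		open theFile
		tell document 1
			ocr
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end ocr
```

Run it from Script Editor after a scanning session and it will work through the folder while you do something else.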

So there you have it, some of the differences between the two. What are some of the reasons you use one over the other?
