ScanSnap and Hazel Is A Match Made In Paperless Heaven

There are a lot of tricks out there for keeping your documents organized based on their location or filename, but the holy grail is to be able to keep them organized based on the actual contents of the documents themselves.

I have written before about how the Fujitsu ScanSnap S1500, the S1500M and the S1300 allow you to use a highlighter pen to automatically assign keywords to a PDF.

However, once you have those keywords assigned, how does that help you?

If you’re on Windows, you can use the “Distribute By Keyword” feature of the included ScanSnap Organizer to move the files to a cabinet, but Mac users are out of luck there.

I humbly submit that using a highlighter, OCR, and the awesomeness that is Hazel, Mac users can one-up even the mighty ScanSnap Organizer.

What Is Hazel?

As my clients have been learning lately, I have been engaged in a torrid love affair with a Mac application known as Hazel from Noodlesoft. At a very high level, it lets you create rules to automatically keep your files organized.

I have written about how you can use Hazel with Evernote, and David Sparks at Macsparky has a great guide for moving PDFs based on filename.

I wanted to do something that would marry the searchable goodness of the ScanSnap with the ninja skills of Hazel.

Set Up The ScanSnap For Keyword Highlighting

The first thing you’ll need to do is set up a ScanSnap Manager profile to read highlighted text and make keywords out of it.

First, on the Scanning tab, I have had the best luck setting the Image quality to “Best” (300dpi). At anything lower, the ScanSnap wasn’t picking up the keywords consistently.

Image quality

Then on the File Option tab, make sure that “Set the marked text as a keyword for the PDF file” is checked. That will tell it to look for any highlighted text and turn it into a keyword in the PDF.

Set marked

You will, of course, want to choose a folder to save the PDF to. Make a note of this folder because we will need it when we switch to Hazel. In my case it is called ToMove.

Get Out Your Highlighter

Is it Hi-liter or Highlighter? I never know. Either way, take your pen and highlight the word or phrase that should decide where the file ends up.

Essentially what we will be doing is saying “if the PDF contains this keyword, do something with it”.

All I have handy are grocery receipts, so you can see I highlighted “EXTRA FOODS”.

Grocery receipt highlighted

Scan And Check Keywords

Now scan your document using your shiny new ScanSnap Manager profile. When it is done, open up your new PDF in Preview, go to Tools > Inspector (or hit Cmd-I), and click on the magnifying glass. If everything worked properly, you should see the text that you highlighted.

PDF with keywords
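
If you would rather check from a script than from Preview, Spotlight exposes PDF keywords through the kMDItemKeywords attribute, which the macOS mdls command can print. Here is a rough Python sketch of that check. The function names are my own, and the parser assumes mdls prints the keywords as a parenthesized list with one item per line (quoted when the keyword has spaces in it), so treat it as a starting point and compare it against what mdls prints on your Mac.

```python
import subprocess


def parse_mdls_keywords(mdls_output):
    """Extract keywords from text shaped like (assumed format):

        kMDItemKeywords = (
            "EXTRA FOODS",
            Kale
        )

    Returns an empty list when the attribute is missing, i.e. "(null)".
    """
    if "(null)" in mdls_output:
        return []
    start = mdls_output.find("(")
    end = mdls_output.rfind(")")
    if start == -1 or end <= start:
        return []
    keywords = []
    for line in mdls_output[start + 1:end].splitlines():
        # Drop indentation and the trailing comma between items.
        item = line.strip().rstrip(",").strip()
        # Drop the surrounding quotes, if any.
        if len(item) >= 2 and item[0] == '"' and item[-1] == '"':
            item = item[1:-1]
        if item:
            keywords.append(item)
    return keywords


def read_pdf_keywords(pdf_path):
    """Ask Spotlight for one file's keywords. This only works on a Mac."""
    result = subprocess.run(
        ["mdls", "-name", "kMDItemKeywords", str(pdf_path)],
        capture_output=True, text=True, check=True,
    )
    return parse_mdls_keywords(result.stdout)
```

On a Mac you would call read_pdf_keywords on a file in your ToMove folder. An empty list means the highlighter was not picked up.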

Set Hazel To Move Based On Keyword

Let’s say we want to move any PDF with the keyword “EXTRA FOODS” to a folder called Filed Documents (we’d probably want to move it to a grocery-specific folder, but let’s just pretend).

Open up Hazel and on the left side, click the Plus to add a new folder. Add your ToMove folder that you used as a scan destination in ScanSnap Manager.

Hazel To Move

Now in the right pane, click the plus to add a new rule. Give it a name.

You can set a number of criteria and rules here, but to keep it simple we will leave it as “all conditions”, then set:

  • Kind is PDF
  • Keywords contain EXTRA FOODS

Next, set the action to Move the file to the folder Filed Documents.

Hazel move based on keyword

Hit OK to save it. If you want to see what your rule will catch, you can click on the little Gear icon near the bottom and choose “Preview Rule Matches”. If everything is set up properly, your newly-scanned document should show there.

If it doesn’t show, check the PDF to make sure that it really has keywords and re-check your rule setup.

If your document shows in the preview, either wait for Hazel to do its thing, or click on the Hazel icon in the Menu bar, choose Run Rules, and choose the rule that you just created.
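
If it helps to see the logic of that rule spelled out, here is a small Python sketch of what we just asked Hazel to do. This is not how Hazel works internally. The function names are made up for illustration, the keywords are passed in as a plain list, and the case-insensitive "contain" match is my assumption about what you would want.

```python
import shutil
from pathlib import Path


def matches_move_rule(pdf_path, keywords, wanted="EXTRA FOODS"):
    """Mirror the two conditions: Kind is PDF, Keywords contain `wanted`."""
    is_pdf = Path(pdf_path).suffix.lower() == ".pdf"
    # Assumption: "contain" means a case-insensitive substring match.
    has_keyword = any(wanted.lower() in k.lower() for k in keywords)
    return is_pdf and has_keyword


def move_if_matched(pdf_path, keywords, destination, wanted="EXTRA FOODS"):
    """Move the file into `destination` only when both conditions hold."""
    if not matches_move_rule(pdf_path, keywords, wanted):
        return None  # leave the file where it is
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    target = destination / Path(pdf_path).name
    shutil.move(str(pdf_path), str(target))
    return target
```

Both conditions have to be true, which is what leaving the rule on “all conditions” means. A PDF without the keyword, or a non-PDF with it, stays in ToMove.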

Set Hazel To Rename Based On Keyword

Let’s say that instead of moving a file based on a certain keyword, we want to give our files a name based on the highlighted text. Is this possible? Why yes, yes it is. Let’s use our new Hazel Ninja powers and do it.

Create a new Hazel rule as we did before, but this time for the criteria, set this:

  • Kind is PDF
  • Keywords is not blank

Next, in the “Do the following” section, choose “Move file” to folder “Filed Documents” (if you like), and then set up the following:

  • Choose Rename file
  • In the with pattern section it will say “name” and then “extension”. Click on “name” and hit the delete key. We want to get rid of that.
  • Let’s give the filename a date. Drag “date created” up before extension. If you prefer, click the little down arrow in “date created” and choose Edit Date Pattern and change to whatever pattern you choose.
  • Drag “other” up between “date created” and “extension”. It will ask you to select a Spotlight Attribute. Scroll down to find Keywords and hit Select.
  • If you prefer, click on the little down arrow in “keywords” and change which keywords are selected and how they are formatted.
  • You might want to click between “date created” and “keywords” and put a dash, but that is up to you.

Your final rule should look something like this:

Hazel move rename keywords

Now when we scan that same Extra Foods receipt, our Hazel rule will move the file to Filed Documents and rename it like this.

Renamed PDF
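
For the script-minded, the rename pattern above boils down to "date created, a dash, the keywords, then the extension". Here is a minimal Python sketch of that pattern. The function names, the default date format, and joining multiple keywords with a space are all my own assumptions, not something Hazel dictates, and the date in the usage note is made up.

```python
from datetime import date
from pathlib import Path


def build_new_name(created, keywords, extension=".pdf", date_pattern="%Y-%m-%d"):
    """Join the creation date, a dash, and the keywords, then the extension."""
    keyword_part = " ".join(keywords)  # assumption: keywords separated by spaces
    return f"{created.strftime(date_pattern)}-{keyword_part}{extension}"


def rename_by_keywords(pdf_path, created, keywords):
    """Rename the file in place. Skip it when there are no keywords."""
    if not keywords:  # the "Keywords is not blank" condition
        return None
    source = Path(pdf_path)
    target = source.with_name(build_new_name(created, keywords, source.suffix))
    source.rename(target)
    return target
```

With a hypothetical scan date of September 20, 2010, build_new_name(date(2010, 9, 20), ["EXTRA FOODS"]) gives "2010-09-20-EXTRA FOODS.pdf".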

Forget Keywords, Use Hazel To Move Based On Searchable Text

Let’s say you want to forget about this whole highlighter/keyword thing. You already have scanned and searchable PDFs. Can’t you just move based on the OCR’ed text in the documents? Let’s find out.

So you really, really like the vegetable kale and you want to move any scanned receipt that has the word Kale in it (can you tell all I had around for this demo is grocery receipts?).

First, here is our receipt:

Kale receipt

Next, we obviously need to be using a ScanSnap Manager profile that has “Convert to searchable PDF” checked on the File Option tab. Again, you will have better results if you use 300dpi for Image quality.

Now we set up another Hazel rule, this time using the following criteria:

  • Kind is PDF
  • Contents contain Kale

Then do something with it such as move it to Filed Documents.

Hazel OCR Rule

Now when you scan a document that has the word “Kale” in it, Hazel will move it.
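
If you are curious what a "Contents contain" check looks like outside of Hazel, here is a tiny Python sketch. It is only an illustration with made-up function names, and it assumes you already have the OCR'ed text as a string. I have it ignore case and expand characters like the "fi" ligature, because OCR text does not always come out exactly as it looks on the page. Whether Hazel does the same is not something I can promise.

```python
import unicodedata


def normalize(text):
    """Fold case and expand compatibility characters such as the 'fi' ligature."""
    return unicodedata.normalize("NFKC", text).casefold()


def contents_contain(ocr_text, phrase):
    """True when `phrase` appears anywhere in the OCR'ed text."""
    return normalize(phrase) in normalize(ocr_text)
```

So contents_contain("ORGANIC KALE 2.99", "Kale") is true, while a receipt with no kale on it would be left alone.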

These were a few examples of things you can do in Hazel to be a document management ninja. Hopefully they will give you some ideas.



29 Responses to “ScanSnap and Hazel Is A Match Made In Paperless Heaven”

  1. Julie K September 21, 2010 at 7:40 pm #

    I don't have those options with S300M V2.2 L12. Is this in new 1500 and 1300 hardware only?

    • BrooksD September 22, 2010 at 7:02 am #

      Hi, the options were new with the S1500, S1500M, and S1300. Unfortunately they aren't (as far as I know) in the software that came with the S300M.

    • BrooksD September 22, 2010 at 7:03 am #

      I should mention that you can still give PDFs keywords using Preview, it's just that the ScanSnap won't do it automagically with a highlighter.

  2. Ron C September 22, 2010 at 6:38 am #

    And what version of ScanSnap Manager are you using?

    • BrooksD September 22, 2010 at 7:02 am #

      Hi Ron, I'm using 3.1 L11 for Mac, but any version of the software that comes with the S1500, S1500M, or S1300 should have keyword highlighting functionality built in.

  3. Mark September 24, 2010 at 1:36 am #

    Hi. I'm trying to get this to work and having no luck populating the keywords when scanning. I'm using the ScanSnap S1500M, a yellow highlighter, and the ScanSnap settings you suggest, and it's not picking up the highlighted text. I wondered if you had any ideas where I'm tripping up.
    Many thanks

    • BrooksD September 24, 2010 at 7:10 am #

      Hi Mark, this might sound strange but have you tried a different highlighter? I have heard that green ones work best. If not, the ScanSnap Manager help has a pretty comprehensive list of things to try. I copied the relevant parts to a PDF if you prefer: http://cache.documentsnap.com/docs/keywordhelp.pd….

      • Mark September 28, 2010 at 7:16 am #

        Appreciate your reply BrooksD, but I've tried a couple of highlighters, including a green one that was recommended. Still no luck. It's as if the option to recognize highlighted sections isn't ticked, even though it is. Running out of ideas as to why.

        • mark September 28, 2010 at 7:18 am #

          I do get a pop up message after ticking the keyword box in the scansnap settings that says "confirm the keyword for the pdf after scanning" which I have to click ok to.

  4. Finis P September 24, 2010 at 9:01 am #

    Running Version 3.0 L20 and the "Set the marked text as a keyword for the PDF file" is greyed out. I have been using Profiles for quite some time now and do not have the Quick Menu turned on. Interestingly, the ONLY way I can get the "Set the marked text" option to not be greyed out is to turn Quick Menu on. I have the profile set to black and white and still no joy. Any ideas?

    • BrooksD September 24, 2010 at 9:06 am #

      Hi Finis, I think it might have to do with your profile color setting. Here's what the help says:

      This checkbox can be selected only under the following conditions:
      [PDF(*.pdf)] is selected in the [File format] pop-up menu.
      [Auto Color Detection] or [Color] is specified for [Color mode] in the [Scanning] tab.
      ScanSnap S1500 / S1500M / S1300 / S510M is connected.

      So, I think you need to be using Auto or Color for your Color mode. Give that a try?

  5. Mark October 6, 2010 at 5:29 pm #

    I have an S1300 with OS X 10.6.4 and the most recent ScanSnap software. Often, when using the "Set the marked text as a keyword" option, the keywords appear when the PDF is viewed with Acrobat, but are not shown with Preview. Any idea why this is happening?

    • BrooksD October 8, 2010 at 11:44 am #

      That's weird Mark, I haven't experienced that myself. I am traveling at the moment and can't play around with it, but when I return I'll see if I can reproduce it. Very strange.

  6. Art July 29, 2011 at 5:23 pm #

    In the "kale" example you describe above, in which you are using Hazel's ability to find the presence of a text string in the file's contents, the Hazel rule's action is to move the file to a "Filed Documents" folder. Is there a way for Hazel to pass the value of the "contents" variable entered manually in Hazel, to the Hazel command Rename File? Or alternatively pass it to "Sort file into subfolder"? The idea is to use that content text string to rename either the file or the subfolder, respectively. After repeated experimentation, as best I can determine, Hazel seems unable to do this. I have also tried using the value manually inputted with the "Text Content" variable (selected from Spotlight using Hazel's "Other" variable) in both the "conditions" and the "action" parts of the Hazel rule. Still no soap. Any ideas? Is this the hard reality of how these things (ie., Spotlight metadata and Hazel) "play" with each other? If so, then as an old TV comedian used to say, "Whadda ravoltin' duhvelupmunt!"

    • Justin August 4, 2011 at 2:40 am #

      You can use keywords that you selected via a highlighter in the file name: under "other", the attribute is called Keywords. But I don't believe what you describe can be done in the way you want it to. It most likely can be done with AppleScript or Automator, though, if you think outside of the box.

      I got hazel to tag my PDF on import to Evernote using said highlighter method with both AppleScript and automator. I have a post on the hazel forums about it right now, it might give you some ideas.

  7. Andrew March 12, 2012 at 5:30 pm #

    Very inspiring. I went a little further and used multiple keywords…

    I highlight the month, year and account name before scanning and have ScanSnap create keywords. Then using Hazel, Automator and some AppleScript I parse these keywords and rename the pdf file based on them — e.g., 2012-03 BofA Checking.pdf. After renaming, I have another Hazel rule move the file into the subdirectory corresponding to the account name. Works really well!

  8. @undefined October 18, 2012 at 11:00 pm #

    If I'm LOVING Evernote ScanSnap integration, how would Hazel fit into the picture? I actually use my old (1 year old) HP windows laptop that's now my dedicated ScanSnap machine. My family workflow is:
    1. Get document to scan (letter, receipt, instruction manual, etc.)
    2. Flip open scanner (windows laptop is always on)
    3. Load document and hit blue button
    4. Throw document in trash.
    Check on my Mac or phone later and see that document got scanned OK. If not, dig the document out of the trash or more likely find out that the document scanned but not to Evernote.

    To find some document, just try to search Evernote for some relatively unique word. The web interface is a bit better as you don't have to wait for synchronization on the desktop app.

    • BrooksD October 19, 2012 at 9:46 am #

      Great, thanks undefined!

  9. Steve November 19, 2012 at 4:06 pm #

    Brook, I, like other commenters above, have a ScanSnap S300M, which lacks this functionality. Are you, or any of your other readers, familiar with any other OCR software for the Mac that can extract keywords based on highlighted text, as you have described above?

    Extensive googling has turned up nothing for me.

    • BrooksD November 19, 2012 at 4:11 pm #

      Sorry, I am not aware of any other software that will do the highlighter part. Any OCR software (like PDFPen) will let you OCR the document which Hazel can then act on, and you can manually give PDFs keywords using Preview which Hazel can act on, but the recognized highlighter part I am not aware of, sorry.

  10. Steve November 19, 2012 at 4:36 pm #

    Thanks for the quick reply.

    Unfortunately, my investigations turned up pretty much what you said – lots of OCR options, but none that can extract single pieces of marked text.

    With something like a bank statement that has many dates of different transactions, what I want to be able to do is to extract just the date of the statement itself, which will be difficult without something like the process that you described in this article.

  11. Kera18 December 29, 2013 at 7:31 pm #

    I loved this write up. I'm a little behind as obviously some of the last posts were about a year ago. I just got the s1300i. I have a Mac. Is Hazel still the best option for helping to manage my filed papers?

  12. Cliff January 15, 2014 at 10:33 am #

    Hello,

    I have a profile set up to automatically add my scan to Evernote. Evernote uses the file name as the name of the new note. I want to combine this profile with Hazel and a highlighter to change my file name before it gets to Evernote so my notes’ names make more sense. Is this possible?

    Thanks for the excellent write-up,

    Cliff

    • Brooks Duncan January 15, 2014 at 10:39 am #

      Hi Cliff,

      Sure that shouldn’t be a problem. First follow the instructions above to have it rename the file based on the keywords, and then check out this post to add on the Evernote part: http://www.documentsnap.com/behind-these-paperless-evernote-hazel-eyes/.

      If you’re on Mavericks, you may need to change it slightly. Check the comments on that post for that.

      • Cliff January 15, 2014 at 10:49 am #

        Thanks Brooks and excellent workflow episode of MPU you were on.

  13. Chip April 9, 2014 at 8:14 pm #

    Brooks,
    Heard you on MPU and have been dying to get myself into the Paperless fast lane ever since. I have a Mac and now have Hazel and an iX500 but I am having some issues with Hazel executing my rules. I’ve gotten over the Hazel rules learning curve (thanks to Noodlesoft’s awesome support) but I am having a problem with OCR that is causing my rules to execute inconsistently and am wondering if you have any experience that might help me out.

    I am using Hazel to look for key phrases in scanned PDFs to use to file the scanned materials, and the result of the OCR does not match the written words. For example, a page that contains:
    “Transaction Confirmation” when read instead has
    “Transact i on Con\Ufb01rmat i on” when I look at the file using mdimport -d2 (from the terminal)

    When I open the PDF in PDFPen Pro and search for Transaction Confirmation the phrase is found correctly.

    FYI, I let the ScanSnap iX500 perform the OCR, although I am considering changing the workflow to let PDFPen Pro perform the OCR if the results will be more consistent. I could also simply let PDFPen Pro perform the search instead of Hazel, although I am concerned it will slow down the execution of the workflow.

    Any thoughts you have would be appreciated!

    • BrooksD April 9, 2014 at 8:19 pm #

      Hmm interesting Chip. If you select all the text in the document and then copy and paste it into, say, TextEdit, does it show up as “Transaction Confirmation”, or the messed up version? That’d be the first step.

      Also make sure you are scanning at a decent resolution. I do 300dpi. Anything lower than that and you might not get good OCR results.

  14. Emily May 29, 2014 at 3:07 pm #

    Is there any way to do something similar in Windows? There must be a more efficient way of renaming the scanned images besides opening them, noting the contents, closing them, finding them again, and manually renaming them.
