ScanSnap and Hazel Is A Match Made In Paperless Heaven

What is the holy grail of going paperless?

There are a lot of tricks out there for keeping your documents organized based on their location or filename, but the holy grail is to be able to keep them organized based on the actual contents of the documents themselves.

In other words, our computer does the work for us.

I have written before about how the Fujitsu ScanSnap allows you to use a highlighter pen to automatically assign keywords to a PDF.

However, once you have those keywords assigned, how does that help you?

If you’re on Windows, you can use the Distribute By Keyword feature of the included ScanSnap Organizer to move the files to a cabinet, but Mac users are out of luck there.

I humbly submit that by using a highlighter, OCR, and the awesomeness that is Hazel, Mac users can one-up even the mighty ScanSnap Organizer.

What Is Hazel?

For years now, I have been engaged in a torrid love affair with a Mac application known as Hazel from Noodlesoft. At a very high level, it lets you create rules to automatically keep your files organized.

I have written about how you can use Hazel with Evernote, and David Sparks at MacSparky has a great guide for moving PDFs based on filename.

I wanted to do something that would marry the searchable goodness of the ScanSnap with the ninja skills of Hazel.

Set Up The ScanSnap For Keyword Highlighting

The first thing you’ll need to do is set up a ScanSnap Manager profile to read highlighted text and make keywords out of it.

First, on the Scanning tab, I have had the best luck setting Image quality to “Best” (300dpi). At anything lower, the ScanSnap wasn’t picking up the keywords consistently.

[Screenshot: Image quality setting]

Then on the File Option tab, make sure that “Set the marked text as a keyword for the PDF file” is checked. That will tell it to look for any highlighted text and turn it into a keyword in the PDF.

[Screenshot: “Set the marked text as a keyword for the PDF file” option]

You will, of course, want to choose a folder to save the PDF to. Make a note of this folder because we will need it when we switch to Hazel. In my case it is called ToMove.

Get Out Your Highlighter

Is it Hi-liter or Highlighter? I never know. Anyway, take your pen and highlight the word or phrase that should decide where the file goes.

Essentially what we will be doing is saying “if the PDF contains this keyword, do something with it”.

All I have handy are grocery receipts, so you can see I highlighted “EXTRA FOODS”.

[Screenshot: grocery receipt with “EXTRA FOODS” highlighted]

Scan And Check Keywords

Now scan your document using your shiny new ScanSnap Manager profile. When it is done, open up your new PDF in Preview, go to Tools > Inspector (or hit Cmd-I), and click on the magnifying glass. If everything worked properly, you should see the text that you highlighted.

[Screenshot: PDF with keywords in Preview’s Inspector]
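If you would rather check from Terminal than from Preview’s Inspector, macOS includes an `mdls` command that prints a file’s Spotlight metadata, so something like `mdls -raw -name kMDItemKeywords receipt.pdf` should list the same keywords. The Python sketch below wraps that call. It assumes Hazel’s Keywords condition reads this same Spotlight attribute, and the parser assumes `mdls -raw` prints either `(null)` or a parenthesised list with one keyword per line; both are assumptions rather than documented behaviour, and `pdf_keywords` / `parse_mdls_keywords` are hypothetical helper names.

```python
import subprocess
from typing import List


def parse_mdls_keywords(raw: str) -> List[str]:
    """Turn the text printed by `mdls -raw -name kMDItemKeywords` into a list.

    Assumed output shapes: "(null)" when the file has no keywords, otherwise
        (
            "EXTRA FOODS",
            kale
        )
    with one keyword per line, quoted when it contains spaces.
    """
    raw = raw.strip()
    if raw == "(null)" or not raw.startswith("("):
        return []
    keywords = []
    for line in raw.strip("()").splitlines():
        item = line.strip().rstrip(",")
        # Remove surrounding double quotes, if present.
        if len(item) >= 2 and item.startswith('"') and item.endswith('"'):
            item = item[1:-1]
        if item:
            keywords.append(item)
    return keywords


def pdf_keywords(path: str) -> List[str]:
    """Ask Spotlight for a file's keywords (macOS only)."""
    result = subprocess.run(
        ["mdls", "-raw", "-name", "kMDItemKeywords", path],
        capture_output=True, text=True, check=True,
    )
    return parse_mdls_keywords(result.stdout)
```

On a Mac, `pdf_keywords("ToMove/receipt.pdf")` should return something like `["EXTRA FOODS"]` if the highlighter worked; an empty list means Spotlight sees no keywords, which is also a likely reason a Hazel rule would not match.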

Set Hazel To Move Based On Keyword

Let’s say we want to move any PDF with the keyword “EXTRA FOODS” to a folder called Filed Documents (we’d probably want to move it to a grocery-specific folder, but let’s just pretend).

Open up Hazel and, on the left side, click the plus button to add a new folder. Add the ToMove folder that you used as a scan destination in ScanSnap Manager.

[Screenshot: ToMove folder added in Hazel]

Now in the right pane, click the plus to add a new rule. Give it a name.

You can set a number of conditions here, but to keep it simple we will leave the rule matching “all” of the following conditions, then set:

  • Kind is PDF
  • Keywords contain EXTRA FOODS

Next, under “Do the following”, set it to Move file to folder Filed Documents.

[Screenshot: Hazel rule that moves files based on a keyword]
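To make the “all conditions” logic concrete, here is a small Python model of the rule above. It is not how Hazel works internally. It assumes “Keywords contain” means a case-insensitive substring match against any single keyword, and `ScannedFile`, `keywords_contain`, `rule_matches`, and `run_rule` are hypothetical names used only for illustration.

```python
import shutil
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional


@dataclass
class ScannedFile:
    path: Path
    kind: str                                   # e.g. "PDF"
    keywords: List[str] = field(default_factory=list)


def keywords_contain(f: ScannedFile, text: str) -> bool:
    # Assumption: "contain" is a case-insensitive substring test
    # against each keyword individually.
    return any(text.casefold() in kw.casefold() for kw in f.keywords)


def rule_matches(f: ScannedFile) -> bool:
    # "all" of the conditions must be true for the rule to fire.
    return all([
        f.kind == "PDF",                        # Kind is PDF
        keywords_contain(f, "EXTRA FOODS"),     # Keywords contain EXTRA FOODS
    ])


def run_rule(f: ScannedFile, filed_documents: Path) -> Optional[Path]:
    """Move a matching file into the Filed Documents folder."""
    if not rule_matches(f):
        return None
    filed_documents.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(f.path), str(filed_documents / f.path.name)))
```

With a real file you would call `run_rule(ScannedFile(Path("ToMove/receipt.pdf"), "PDF", ["EXTRA FOODS"]), Path("Filed Documents"))`, which returns the new path, or `None` when either condition fails. Switching the rule to “any” would simply mean replacing `all` with `any`.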

Hit OK to save it. If you want to see what your rule will catch, you can click on the little Gear icon near the bottom and choose “Preview Rule Matches”. If everything is set up properly, your newly-scanned document should show there.

If it doesn’t show, check the PDF to make sure that it really has keywords and re-check your rule setup.

If your document shows in the preview, either wait for Hazel to do its thing, or click on the Hazel icon in the Menu bar, choose Run Rules, and choose the rule that you just created.

Set Hazel To Rename Based On Keyword

Let’s say that instead of moving a file based on a certain keyword, we want to give our files a name based on the highlighted text. Is this possible? Why yes, yes it is. Let’s use our new Hazel Ninja powers and do it.

Create a new Hazel rule as we did before, but this time for the criteria, set this:

  • Kind is PDF
  • Keywords is not blank

Next, in the “Do the following” section, choose “Move file” to folder “Filed Documents” (if you like), and then set up the following:

  • Choose Rename file
  • In the with pattern section it will say “name” and then “extension”. Click on “name” and hit the delete key. We want to get rid of that.
  • Let’s give the filename a date. Drag “date created” up before extension. If you prefer, click the little down arrow in “date created” and choose Edit Date Pattern and change to whatever pattern you choose.
  • Drag “other” up between “date created” and “extension”. It will ask you to select a Spotlight Attribute. Scroll down to find Keywords and hit Select.
  • If you prefer, click on the little down arrow in “keywords” and change which keywords are selected and how they are formatted.
  • You might want to click between “date created” and “keywords” and put a dash, but that is up to you.

Your final rule should look something like this:

[Screenshot: Hazel rule that moves and renames files using keywords]
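Under the hood, the rename pattern is just string assembly: date created, a dash, the keywords, then the extension. This Python sketch assumes an ISO-style `YYYY-MM-DD` date pattern and keywords joined with a space; the name Hazel actually produces depends on the date pattern and keyword formatting you choose in the rule, and `hazel_style_name` is a hypothetical helper.

```python
from datetime import date
from pathlib import Path
from typing import List


def hazel_style_name(original: Path, created: date, keywords: List[str],
                     separator: str = "-", keyword_joiner: str = " ") -> str:
    """Build "<date created><separator><keywords><extension>".

    Mirrors the rule above: the original name token is deleted, the
    creation date comes first, then the dash, then the keywords.
    """
    # Assumed date pattern: ISO 8601, e.g. 2010-09-20.
    date_part = created.isoformat()
    # A "/" in a keyword would look like a folder separator, so swap it out.
    keyword_part = keyword_joiner.join(kw.replace("/", "-") for kw in keywords)
    return f"{date_part}{separator}{keyword_part}{original.suffix}"
```

For example, `hazel_style_name(Path("ToMove/scan.pdf"), date(2010, 9, 20), ["EXTRA FOODS"])` gives `2010-09-20-EXTRA FOODS.pdf`. Putting the date first means the files sort chronologically in Finder.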

Now when we scan that same Extra Foods receipt, our Hazel rule will move the file to Filed Documents and rename it like this.

[Screenshot: renamed PDF in Filed Documents]

Forget Keywords, Use Hazel To Move Based On Searchable Text

Let’s say you want to forget about this whole highlighter/keyword thing. You already have scanned and searchable PDFs. Can’t you just move based on the OCR’ed text in the documents? Let’s find out.

So you really, really like the vegetable kale and you want to move any scanned receipt that has the word Kale in it (can you tell all I had around for this demo is grocery receipts?).

First, here is our receipt:

[Screenshot: grocery receipt containing “Kale”]

Next, we obviously need to be using a ScanSnap Manager profile that has “Convert to searchable PDF” checked on the File Option tab. Again, you will have better results if you use 300dpi for Image quality.

Now we set up another Hazel rule, this time using the following criteria:

  • Kind is PDF
  • Contents contain Kale

Then do something with it such as move it to Filed Documents.

[Screenshot: Hazel rule matching on PDF contents]

Now when you scan a document that has the word “Kale” in it, Hazel will move it.

Bonus: You can even have Hazel read the dates from the text of the PDF and use them in your filename. Here is how to do that.

(By the way, if you’re a Windows user, there is a similar tool called File Juggler.)

There Is A Lot You Can Do With Hazel

These were a few examples of things you can do in Hazel to be a document management ninja. Hopefully it will give you some ideas.

Remember that OCR is never 100% perfect, and the effectiveness of these rules will depend on the quality of the scan and the OCR.
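One way OCR imperfections bite is that the recognized text can differ from what you see on the page, for example a “fi” ligature character (U+FB01) or stray spaces inside words, as in “Transact i on Conﬁrmat i on”. If you script your own matching outside Hazel, one hedged approach is to normalize both sides before comparing. This sketch applies Unicode NFKC normalization, drops whitespace, and case-folds; `contents_contain` is a hypothetical name, and this is not what Hazel itself does.

```python
import re
import unicodedata


def normalize_ocr_text(text: str) -> str:
    # NFKC folds compatibility characters such as the "fi" ligature
    # (U+FB01) back into ordinary letters.
    text = unicodedata.normalize("NFKC", text)
    # Drop all whitespace so stray OCR spaces ("Transact i on") don't matter,
    # then case-fold so "KALE" and "Kale" compare equal.
    return re.sub(r"\s+", "", text).casefold()


def contents_contain(ocr_text: str, phrase: str) -> bool:
    """Tolerant version of a "Contents contain <phrase>" check."""
    return normalize_ocr_text(phrase) in normalize_ocr_text(ocr_text)
```

Ignoring whitespace trades some precision for tolerance: words that happen to run together across a space can produce a false match, so keep search phrases reasonably specific.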

Do you have other Hazel-eriffic document tricks? Drop a comment and let us know.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

38 comments

Annamarie - April 9, 2018

Hi,

I love Hazel and I love Scansnap. I have an action folder and lots of rules. BUT EVERY SINGLE TIME a window comes up that asks me where I want to save the file. I made a screenshot: https://www.dropbox.com/s/lle8saxgtp0vop3/Screen%20Shot%202018-04-09%20at%209.46.39%20AM.png?dl=0

I’ve looked and looked – no one else seems to have this issue. IT DRIVES ME NUTS. Where do I turn it off? Thanks.

Julie Shoeman - May 27, 2017

I tried finding Hazel to no avail. It’s no longer available on the App Store. What are the other options at this point? This article is old, so can you update it? I appreciate your website.

    Geoff - June 7, 2017

    The Hazel website link is in the article he linked to in other comments…. here it is https://www.noodlesoft.com

Kmac - January 1, 2015

Well, I have tried this for almost 6 months. I can’t get this damn thing to find any contents or keywords. The keyword is there because I can see it in the Inspector, but Hazel refuses to look inside the contents and find anything. Is there a special setting to make Hazel look into a PDF and read the contents? This is so infuriating and frustrating. I have tried everything on this page and none of it works.

    Brooks Duncan - January 5, 2015

    Hm. I know you can see the keywords, but are you sure the PDF itself is searchable? In Preview, go to Edit > Select All, then Edit > Copy, and then paste the text into a text editor or word processor.

    Is the text you see there what you’re expecting?

    If the text looks right, then something weird must be going on. I’d go to Noodlesoft support or the Noodlesoft forum (http://www.noodlesoft.com/forums/), and the author should be able to figure out what is going on.

      Kmac - January 11, 2015

      Thanks.. will do.. if hazel was my wife i would be arrested for spousal abuse LOLOL : )

    Trickyt57 - August 12, 2018

    I had the same problem, and I solved it. I think you may be writing more than one rule, and the second rule does not execute. Finish your first rule with “Continue matching rules”.

    Explanation:

    I wanted to put a pile of unsorted documents on my scanner, scan them to PDF, have Hazel rename them intelligently, and then move them to the correct folder.

    My default format is

    000000-doc_name-yyyy.mm.dd-yyyy.mm.dd.pdf

    E.g. “0000098-citibank statement-2018.07.31-2018.08.15.pdf”

    The first number is a sequential document number.
    The first date is the date inside the document. (I have this set to “automatic” rather than “pattern”.)
    The second date is the scan date.

    Firstly, make sure your scanner defaults are set so that documents are scanned as “READABLE PDF”s and not just PDF. Depending on your equipment, the scanner defaults may be found on the scanner control panel or on your Mac. For my HP PageWide Pro 477dw Multifunction Printer, the settings are in the HP software on the Mac, under Scanner.

    After the renaming rule, Hazel did not proceed to the next rule to move the file.

    So do this:

    Secondly (and this is where the OP may be going wrong), finish your previous rule with the following action:

    Continue matching rules

    This action tells Hazel to continue matching against subsequent rules instead of stopping. Normally Hazel stops once it finds a match, but this action indicates that rule evaluation should continue. Note that you cannot continue evaluation if the file or folder is moved out of the monitored folder. Therefore, you can’t use this action in conjunction with the “Move” or “Sort into subfolder” actions.

Emily - May 29, 2014

Is there any way to do something similar in Windows? There must be a more efficient way of renaming the scanned images besides opening them, noting the contents, closing them, finding them again, and manually renaming them.

Chip - April 9, 2014

Brooks,
Heard you on MPU and have been dying to get myself into the Paperless fast lane ever since. I have a Mac and now have Hazel and an iX500 but I am having some issues with Hazel executing my rules. I’ve gotten over the Hazel rules learning curve (thanks to Noodlesoft’s awesome support) but I am having a problem with OCR that is causing my rules to execute inconsistently and am wondering if you have any experience that might help me out.

I am using Hazel to look for key phrases in scanned PDFs to file the scanned materials, and the result of the OCR does not match the written words. For example, a page that contains:
“Transaction Confirmation” when read instead has
“Transact i on Con\Ufb01rmat i on” when I look at the file using mdimport -d2 (from the terminal)

When I open the PDF in PDFPen Pro and search for Transaction Confirmation the phrase is found correctly.

FYI, I let the ScanSnap iX500 perform the OCR, although I am considering changing the workflow to let PDFPen Pro perform the OCR if the results will be more consistent. I could also simply let PDFPen Pro perform the search instead of Hazel, although I am concerned it will slow down the execution of the workflow.

Any thoughts you have would be appreciated!

    Brooks Duncan - April 9, 2014

    Hmm interesting Chip. If you select all the text in the document and then copy and paste it into, say, TextEdit, does it show up as “Transaction Confirmation”, or the messed up version? That’d be the first step.

    Also make sure you are scanning at a decent resolution. I do 300dpi. Anything lower than that and you might not get good OCR results.

Cliff - January 15, 2014

Hello,

I have a profile setup to automatically add my scan to Evernote. Evernote uses the file name as the name of the new note. I want to combine this profile with Hazel and a highlighter to change my file name before it gets to Evernote so my note’s names make more sense. Is this possible?

Thanks for the excellent write-up,

Cliff

    Brooks Duncan - January 15, 2014

    Hi Cliff,

    Sure that shouldn’t be a problem. First follow the instructions above to have it rename the file based on the keywords, and then check out this post to add on the Evernote part: http://www.documentsnap.com/behind-these-paperless-evernote-hazel-eyes/.

    If you’re on Mavericks, you may need to change it slightly. Check the comments on that post for that.

      Cliff - January 15, 2014

      Thanks Brooks and excellent workflow episode of MPU you were on.

Kera18 - December 29, 2013

I loved this write up. I'm a little behind as obviously some of the last posts were about a year ago. I just got the s1300i. I have a Mac. Is Hazel still the best option for helping to manage my filed papers?

Steve - November 19, 2012

Thanks for the quick reply.

Unfortunately, my investigations turned up pretty much what you said – lots of OCR options, but none that can extract single pieces of marked text.

With something like a bank statement that has many dates of different transactions, what I want to be able to do is to extract just the date of the statement itself, which will be difficult without something like the process that you described in this article.

Steve - November 19, 2012

Brooks, I, like other commenters above, have a ScanSnap S300M, which lacks this functionality. Are you, or any of your other readers, familiar with any other OCR software for the Mac that can extract keywords based on highlighted text, as you have described above?

Extensive googling has turned up nothing for me.

    Brooks Duncan - November 19, 2012

    Sorry, I am not aware of any other software that will do the highlighter part. Any OCR software (like PDFPen) will let you OCR the document which Hazel can then act on, and you can manually give PDFs keywords using Preview which Hazel can act on, but the recognized highlighter part I am not aware of, sorry.

@undefined - October 18, 2012

If I'm LOVING Evernote ScanSnap integration, how would Hazel fit into the picture? I actually use my old (1 year old) HP Windows laptop, which is now my dedicated ScanSnap machine. My family workflow is:
1. Get document to scan (letter, receipt, instruction manual, etc.)
2. Flip open scanner (Windows laptop is always on)
3. Load document and hit blue button
4. Throw document in trash.
Check on my Mac or phone later and see that document got scanned OK. If not, dig the document out of the trash or more likely find out that the document scanned but not to Evernote.

To find some document, just try to search Evernote for some relatively unique word. The web interface is a bit better as you don't have to wait for synchronization on the desktop app.

    Brooks Duncan - October 19, 2012

    Great, thanks undefined!

Andrew - March 12, 2012

Very inspiring. I went a little further and used multiple keywords…

I highlight the month, year and account name before scanning and have ScanSnap create keywords. Then using Hazel, Automator and some AppleScript I parse these keywords and rename the pdf file based on them — e.g., 2012-03 BofA Checking.pdf. After renaming, I have another Hazel rule move the file into the subdirectory corresponding to the account name. Works really well!

    Don - September 30, 2015

    I would love to hear more detail on how you are accomplishing this. What you describe is EXACTLY what I am trying to do!

Art - July 29, 2011

In the "kale" example you describe above, in which you are using Hazel's ability to find the presence of a text string in the file's contents, the Hazel rule's action is to move the file to a "Filed Documents" folder. Is there a way for Hazel to pass the value of the "contents" variable entered manually in Hazel, to the Hazel command Rename File? Or alternatively pass it to "Sort file into subfolder"? The idea is to use that content text string to rename either the file or the subfolder, respectively. After repeated experimentation, as best I can determine, Hazel seems unable to do this. I have also tried using the value manually inputted with the "Text Content" variable (selected from Spotlight using Hazel's "Other" variable) in both the "conditions" and the "action" parts of the Hazel rule. Still no soap. Any ideas? Is this the hard reality of how these things (ie., Spotlight metadata and Hazel) "play" with each other? If so, then as an old TV comedian used to say, "Whadda ravoltin' duhvelupmunt!"

    Justin - August 4, 2011

    You can use keywords that you selected via a highlighter in the file name with "other"; its name is Keywords. But I don't believe what you describe can be done the way you want it to. It most likely can be done if you use AppleScript or Automator and think outside of the box.

    I got Hazel to tag my PDF on import to Evernote using said highlighter method with both AppleScript and Automator. I have a post on the Hazel forums about it right now; it might give you some ideas.

Mark - October 6, 2010

I have an S1300 with OS X 10.6.4 and the most recent ScanSnap software. Often, when using "Set the marked text as a keyword", the keywords appear when the PDF is viewed with Acrobat but are not shown in Preview. Any idea why this is happening?

    Brooks Duncan - October 8, 2010

    That's weird, Mark; I haven't experienced that myself. I am traveling at the moment and can't play around with it, but when I return I'll see if I can reproduce it. Very strange.

Finis P - September 24, 2010

Running Version 3.0 L20 and the "Set the marked text as a keyword for the PDF file" is greyed out. I have been using Profiles for quite some time now and do not have the Quick Menu turned on. Interestingly, the ONLY way I can get the "Set the marked text" option to not be greyed out is to turn Quick Menu on. I have the profile set to black and white and still no joy. Any ideas?

    Brooks Duncan - September 24, 2010

    Hi Finis, I think it might have to do with your profile color setting. Here's what the help says:

    This checkbox can be selected only under the following conditions:
    [PDF(*.pdf)] is selected in the [File format] pop-up menu.
    [Auto Color Detection] or [Color] is specified for [Color mode] in the [Scanning] tab.
    ScanSnap S1500 / S1500M / S1300 / S510M is connected.

    So, I think you need to be using Auto or Color for your Color mode. Give that a try?

Mark - September 24, 2010

Hi. I'm trying to get this to work and having no luck populating the keywords when scanning. I'm using the ScanSnap S1500M, a yellow highlighter, and the ScanSnap settings you suggest, and it's not picking up the highlighted text. I wondered if you had any ideas where I'm tripping up.
Many thanks

    Brooks Duncan - September 24, 2010

    Hi Mark, this might sound strange but have you tried a different highlighter? I have heard that green ones work best. If not, the ScanSnap Manager help has a pretty comprehensive list of things to try. I copied the relevant parts to a PDF if you prefer: http://cache.documentsnap.com/docs/keywordhelp.pd….

      Mark - September 28, 2010

      Appreciate your reply, BrooksD, but I've tried a couple of highlighters, including a green one that was recommended. Still no luck. It's as if it just doesn't have the recognize-highlighted-section option ticked, even though it does. Running out of ideas as to why.

        mark - September 28, 2010

        I do get a pop up message after ticking the keyword box in the scansnap settings that says "confirm the keyword for the pdf after scanning" which I have to click ok to.

Ron C - September 22, 2010

And what version of ScanSnap Manager are you using?

    Brooks Duncan - September 22, 2010

    Hi Ron, I'm using 3.1 L11 for Mac, but any version of the software that comes with the S1500, S1500M, or S1300 should have keyword highlighting functionality built in.

Julie K - September 21, 2010

I don't have those options with S300M V2.2 L12. Is this in new 1500 and 1300 hardware only?

    Brooks Duncan - September 22, 2010

    Hi, the options were new with the S1500, S1500M, and S1300. Unfortunately they aren't (as far as I know) in the software that came with the S300M.

    Brooks Duncan - September 22, 2010

    I should mention that you can still give PDFs keywords using Preview, it's just that the ScanSnap won't do it automagically with a highlighter.
