Use Hazel To Magically Process Downloaded PDFs

You know how sometimes you come across a tip so clever that you are happy to have learned it and, at the same time, annoyed that you didn’t think of it yourself?

That’s how I felt sitting in Katie Floyd’s Going Paperless presentation at Macworld 2011 (more on my Macworld roundup here).

It is well known that I love a little Mac app called Hazel that allows you to create rules to perform almost any function on your computer. One great use of Hazel is to move all the PDFs that you are downloading from the web now that you are going paperless.

Katie’s tip is to use the Source URL of the downloaded PDF as a basis for doing the processing. Brilliant! Have no idea what on earth I just said? No worries, I’m about to go through it.

Where Was That PDF Downloaded From?

You may not know this, but when you download a PDF from the web, it has the address that it was downloaded from embedded in the file.

The easiest way to see this is to highlight the file in Finder and hit Command-I. That brings up the Info window; look under the “More Info” section.

You’ll see a bunch of junk in there including this:

Where from

Obviously the URL that you downloaded it from will be there, not mine. Since we have the URL embedded, we can use it to do stuff later.
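If you would rather check this from a script than from the Info window, here is a minimal Python sketch. It assumes the "Where from" value is stored in the com.apple.metadata:kMDItemWhereFroms extended attribute as a binary property list, and that the macOS xattr command-line tool is available. The function names are mine, for illustration only; none of this is part of Hazel.

```python
import plistlib
import subprocess

# Extended attribute that, on macOS, holds the "Where from" URLs.
WHERE_FROM_ATTR = "com.apple.metadata:kMDItemWhereFroms"


def decode_where_froms(raw: bytes) -> list[str]:
    """Decode the attribute value: a property list containing an array of strings."""
    return [str(item) for item in plistlib.loads(raw)]


def read_where_froms(path: str) -> list[str]:
    """Return the recorded source URLs for a file, or [] if the attribute can't be read."""
    result = subprocess.run(
        ["xattr", "-px", WHERE_FROM_ATTR, path],  # -p prints the value, -x prints it as hex
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:  # attribute missing or unreadable
        return []
    # bytes.fromhex skips the spaces and newlines in xattr's hex dump.
    return decode_where_froms(bytes.fromhex(result.stdout))
```

Calling read_where_froms on a downloaded PDF should give back a list of URLs, typically the file's own address and sometimes the page that linked to it.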

Create A New Rule In Hazel

Fire up Hazel and, if you haven’t already, add the folder that you download PDFs to in the folder list on the left. Then we want to create a new rule, so hit the Plus button in the right column.

New Hazel Rule

Create The Conditions

Now let’s give the rule a name, and create the conditions that Hazel will use to figure out which file to act on. The conditions we’ll use in this case are:

  • File type is PDF
  • Source URL is scotiaitrade.com

Again, substitute your own domain. Here’s the screenshot of the conditions:

Hazel conditions
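To make the logic of those two conditions concrete, here is a rough Python equivalent. It is only a sketch of the idea, not how Hazel works internally: I am assuming "is a PDF" can be judged by the file extension and that a source URL matches when its host is the domain or a subdomain of it. The function name matches_rule is made up for this example.

```python
from pathlib import PurePath
from urllib.parse import urlparse


def matches_rule(filename: str, source_urls: list[str], domain: str) -> bool:
    """True when the file looks like a PDF and any source URL comes from `domain`."""
    # Condition 1: file type is PDF (judged by extension here, which is an assumption).
    if PurePath(filename).suffix.lower() != ".pdf":
        return False
    # Condition 2: a source URL's host is the domain itself or one of its subdomains.
    for url in source_urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False
```

Matching on the host rather than on the whole URL string avoids false hits when the domain name merely appears somewhere in another site's address.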

Figure Out What To Do With The File

In all likelihood, the filename that your bank or vendor uses is totally useless. Sure you could type it in yourself when you save it, but we are trying to be as automated as possible here.

What I am going to do is:

  • Rename the file using the convention yyyy_mm_dd-name-type.pdf
  • Move the file to my Statements folder

Obviously use your own naming convention and move it to wherever is appropriate for you.

Before the screenshot, I know some of you are going to ask “How do I move it to Evernote?” It is your lucky day, I already wrote a post about how to send PDFs to Evernote using Hazel so you can combine the two.

First, in the Do The Following section, I will change the action to Rename File. The default pattern it gives you is “name” and “extension”. We’ve already established that the name is no good, so delete it.

Drag “date created” up to the pattern section, and it will look like this:

Hazel date created

Take a look at the example at the bottom. That is not the date format that we want, so click on date created and choose Edit Date Pattern. The yyyy mm dd looks good, but click into the box and replace the dashes with underscores.

Hazel date pattern

Hit Done and now the date format is looking good. We just need to give the file a name. In this case, I know that any PDF that I download from scotiaitrade.com is going to be my RRSP statement. So, for this rule, I want the file to be called yyyy_mm_dd-ScotiaiTrade-RRSP.pdf.

Click into the pattern box and just type in “-ScotiaiTrade-RRSP” between “date created” and “extension”.

Hazel finished rename
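For anyone who thinks better in code, the rename pattern boils down to something like this Python sketch. The function name statement_filename and its parameters are hypothetical; Hazel builds the name from the pattern tokens, not from code like this.

```python
from datetime import date


def statement_filename(created: date, label: str, extension: str = "pdf") -> str:
    """Build a name in the yyyy_mm_dd-label.extension convention."""
    # The date is formatted with underscores, then the fixed label and extension follow.
    return f"{created:%Y_%m_%d}-{label}.{extension}"


print(statement_filename(date(2011, 2, 15), "ScotiaiTrade-RRSP"))
# prints 2011_02_15-ScotiaiTrade-RRSP.pdf
```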

Next we want to move it. Click the plus sign beside our rename rule, and create a Move rule that moves it to your folder.

Here’s the final rule:

Hazel rule
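If you ever wanted to do the rename-and-move step outside of Hazel, a minimal Python sketch might look like the following. The function name move_to_folder is invented for illustration, and refusing to overwrite an existing file is my own assumption about what you would want, not a description of what Hazel does.

```python
import shutil
from pathlib import Path


def move_to_folder(source: Path, destination_dir: Path, new_name: str) -> Path:
    """Move `source` into `destination_dir` under `new_name` and return the new path."""
    destination_dir.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    target = destination_dir / new_name
    if target.exists():
        # Assumption: never silently overwrite an existing statement.
        raise FileExistsError(f"{target} already exists")
    shutil.move(str(source), str(target))
    return target
```

You would call it with the downloaded file, your Statements folder, and the new filename.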

Click OK, and if you want to see if this will work, click on the little gear icon on the bottom and choose Preview Rule Matches. Looks like it should work!

Hazel Rule Preview

Depending on the other Hazel rules going on, it may take a while to kick off. If you’re impatient, you can run it manually by clicking on the Hazel icon up in your menu bar.

And look, there it is in my Statements folder:

Hazel file renamed

Let’s Get Geeky(-er)

For 90% of normal people, the above will work fine. There is only one problem that I know will drive me crazy.

The statement that I am downloading today (in February) is actually January’s statement, and having that February date in the filename will just not do. Unfortunately, I don’t believe that Hazel has the functionality to do any sort of date math, so for me TextExpander will have to come to the rescue.

If you are not familiar with TextExpander, it is a Mac application like Hazel: you don’t really think you need it until you have it, and then once you have it you can’t imagine life without it. Basically, it allows you to type in small snippets of text and have them expand (get it?) to longer ones.

In this case, what I am going to do is take the renaming step out of the Hazel rule and then set up a TextExpander snippet to do the filenaming for me. I am not going to go through the ins and outs of TextExpander in this post but here is a screenshot of my rule:

TextExpander

That looks super-ugly (don’t let it scare you off TextExpander, most rules wouldn’t be like this), but basically %@-1M%Y_%m_%d-ScotiaiTrade-RRSP is saying “subtract one month from today’s date, format it as yyyy_mm_dd, and add -ScotiaiTrade-RRSP to the end”. Now when I type ,its, TextExpander automatically turns it into 2011_01_15-ScotiaiTrade-RRSP.
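The date math in that snippet can be sketched in a few lines of Python. This is only an illustration of the idea, with hypothetical function names. In particular, clamping the day when the previous month is shorter (March 31 becoming February 28) is my assumption; I have not checked how TextExpander handles that case.

```python
import calendar
from datetime import date


def one_month_earlier(day: date) -> date:
    """Return the same day of the previous month, clamped to that month's last day."""
    if day.month == 1:
        year, month = day.year - 1, 12  # January rolls back to December of last year
    else:
        year, month = day.year, day.month - 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def snippet_text(today: date) -> str:
    """Mimic the snippet: last month's date as yyyy_mm_dd plus the fixed suffix."""
    return f"{one_month_earlier(today):%Y_%m_%d}-ScotiaiTrade-RRSP"


print(snippet_text(date(2011, 2, 15)))
# prints 2011_01_15-ScotiaiTrade-RRSP
```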

So if you are still following along, my updated workflow is:

Download PDF > When the Save As box pops up, type ,its which automatically sets the filename with a date one month ago > Save it to Downloads > Hazel sees it, looks at the Source URL, and automatically moves it to Statements.

What About Tags?

I personally use tags for my documents, so the next logical step would be to have Hazel assign OpenMeta tags to the documents. However, that is getting a little geeky even for this post. If it is something you’d like me to post about, drop a note in the comments and I will put something together.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

16 comments

How Do You Track Payable Paperless Bills? - September 25, 2014 Reply

[…] might remember that I have put together this great workflow for processing downloaded PDFs using Hazel and a bit of TextExpander. I’ve even enhanced it by having Hazel apply OpenMeta […]

DocumentSnap Time Machine | Tips To Learn How To Go Paperless | DocumentSnap Paperless Blog - February 19, 2012 Reply

[…] Use Hazel To Magically Process Downloaded PDFs Love me some Hazel. A great way to automate processing downloaded PDFs. […]

Jack Forbush - June 19, 2011 Reply

Brilliant workflow although I could use some assistance. I receive multiple "faxes" a day through a fax-to-email service. They arrive to me as a PDF attachment from the said fax-to-email service. In a utopian world, I'd like to create a workflow that not only performs an OCR on these documents, but also generates tags for improved search and retrieve attempts. I have the OCR workflow created with PDFPenPro and the documents are saved to a particular "documents for review" folder via an Automator script, but I can't figure out how to have tags automatically generated. For example,

1. I receive the PDF from "newfax@ufax.net"
2. via an automator script, the attachment is collected and placed in my "documents for review" folder with the addition of a prefix in the automator script, ie "uFax_xxxxx.PDF"
3. via a folder action (AppleScript on PDFPenPro), these documents are OCR'd
4. what I'd love to do is have tags automatically generated based on what the OCR picks up….say if it's from company ABC (which I can't determine unless I open the document and read it myself; however, the cover page of the "fax" indicates if it's company ABC or company XYZ), I'd like to have this PDF tagged with "company ABC"

Any suggestions?

    Brooks Duncan - June 19, 2011 Reply

    Assuming the OCR is reliable, you could probably do this with a combination of the "Forget Keywords" section of this post: http://www.documentsnap.com/scansnap-and-hazel-is… and this post: http://www.documentsnap.com/hazel-for-downloaded-….

    So for the Hazel condition, you could have something like "contents contains ABC", and for the "Do the following", you could have it apply the ABC Company OpenMeta tag.

    Should be doable, let me know if this doesn't make sense.

Aleh Cherp - March 19, 2011 Reply

I am also interested in learning to use Openmeta tags with Hazel

Scott - February 28, 2011 Reply

What, exactly, does the tagging of the downloaded PDFs? I was quite psyched when I read this, as the tedious task of renaming and tagging PDF statements from various institutions is, well, tedious. Imagine my surprise when the first ones I tried (from USAA bank, for what it's worth) didn't have ANY info in the 'More Info' section whatsoever. Nothing for 'Where From', or any other attributes.

I tried with both Chrome and Safari, so it doesn't seem like a browser-specific issue.

If it's a Mac OS X feature, could there be some default somewhere that controls it?

p.s. Also quite interested in your use of OpenMeta tags – yes please!

    Brooks Duncan - February 28, 2011 Reply

    Hmm, that is strange, never encountered the info not being there before. I'll look into what controls that. In the meantime, what you could do is, when saving the PDF, name it “usaa.PDF” or something and then adjust the Hazel rule to act off that name.

      Scott - March 1, 2011 Reply

      Thanks for the quick response!

      Given your conviction, it occurred to me that perhaps I was somehow missing something. I tried again on a different MacBook and – lo and behold! – there it was, plain as day.

      Digging further, I used the command-line utility 'xattr' to list the extended attributes on both PDFs (on each macbook) — sure enough, the 'WhereFrom' attribute existed on BOTH.

      So the only issue left is — why doesn't it show up in the Get Info window in the Finder on that Mac? Ah, I may never know. At least the mystery is solved.

      Thanks again 🙂

        Brooks Duncan - March 2, 2011 Reply

        Great Scott, glad it is working. I was starting to doubt myself!

Michelle - February 16, 2011 Reply

Ooooo. More Hazel! More geekiness! Yes, please!

Hozman - February 16, 2011 Reply

Thanks for the post. Count me three for tags. I love Hazel, but so far I have been using Devonthink pro Office for PDF downloads, then do the filing. I download a lot of medical journal PDFs by virtue of my profession, so I need a way to sort them. Thanks a bunch.

Renaud - February 15, 2011 Reply

I forgot : thanks for your great tips and tutorials !

Renaud - February 15, 2011 Reply

Count me too, as an OpenMeta/Tags/HoudahSpot user, and future Hazel user.

Bill - February 15, 2011 Reply

Count me as someone who would love to know how you use hazel to manage your OpenMeta tags. (And then pretend that I have about a hundred friends who would like the same thing but didn't bother to post a comment). If 100 isn't a big enough number, then pretend it is 1000. These are imaginary friends, after all 🙂

    Brooks Duncan - February 15, 2011 Reply

    Ha OK Bill, twist my arm. Obviously I was just looking for an excuse/validation to do it anyways. 🙂