
How To Split PDF Documents Into Single Pages Using Mac OSX

Today a consulting client ran into a problem we’ve all had: he scanned a stack of paper intending to make one PDF per sheet, but instead it all went into one big PDF.

Since he didn’t want to re-scan, I broke down a few options for how to split a PDF using the built-in tools of Mac OSX. You can think of this as a companion piece to How To Combine PDFs Using Mac OSX Automator.

Option 1: Use Preview To Split Pages

Preview.app (the application you use to view PDFs and images) has some document management tools under the hood.

To split a file into pages using Preview:

  • Open the file in Preview
  • If you don’t see a list of pages on the right-hand side, click the Sidebar button near the search bar to open it
  • Click and drag each page to your desktop or to a Finder window. It will then copy that page to its own PDF

Option 2: Use Automator To Split Pages

Much like combining PDF files to make one big one, you can split a PDF into separate pages using Automator.

There are a number of ways to do this, of course, but in this example I will make a Droplet. If you want to skip all this setup, I have attached my Droplet to the end of this post; hopefully it will work for you.

Ready? Here we go.

Start Automator

  • In Finder, go to Applications and then start Automator, the cool little robot icon

Choose Application

  • In the window that pops up, highlight Application and then hit Choose

[Screenshot: Choose Application]

Set Up The PDF Action

  • In the Library section on the left, you’ll see a line for PDF. Choose that
  • In the next column over, there is an option for PDF To Images. Click that and drag it into the section on the right

[Screenshot: PDF To Images]

  • Choose where you want the new PDFs to be saved by default
  • Choose if you want the new PDFs to have the same name as the original, or if you want to change it.
  • Choose if you want it to replace the original file
  • I want the ability to change it on the fly if needed, so I hit Options and then check Show this action when the workflow runs.

Here is what this step looks like:

[Screenshot: Split To PDF Options]

Nice work! Now your Automator application is created. Go to File > Save As and save it to your Desktop, your Applications folder, or anywhere else you like.

Using The Droplet

You have just created a Droplet. This means that if you drag a PDF onto the icon, it will automatically run those actions you just created.

When you do this, if everything works well, a popup will appear asking where you want to save the PDFs and whether you want to change the name. Make your choices and hit Continue.

Downloading The Droplet

As mentioned, if you don’t want to go through the hassle of setting this all up, you are welcome to use mine.

To use it, download the file to your computer, double-click it to unzip it, and move the resulting “DSSplitPDF” file somewhere convenient.

Once you’ve done that, follow the “Using The Droplet” instructions above.

Click Here To Download DSSplitPDF-1.1.zip

I am sure you all have more tips to split PDFs, either on Mac or Windows. School us in the comments!

(Photo by recursion_see_recursion)


Shoeboxed Review – Receipts, Business Cards, And Documents In The Cloud

I’ve mentioned Shoeboxed quite a few times here on DocumentSnap as a good way to handle receipts and business cards if you don’t have the time or the equipment to do the scanning yourself.

Basically, it works like this: you take your receipts, business cards, and other documents and put them into a pre-paid (usually; more on that later) envelope. You drop them in the mail, and then Shoeboxed does the scanning and processing for you.

I hadn’t used the service too much myself in the past because a) I don’t get too many business cards and b) for some reason, I had it in my head that the service was only available in the United States and since I am in Canada, I thought I was out of luck.

It turns out that I was (as is so often the case) wrong: the only thing you need to be in the US for is the pre-paid envelopes. You are more than welcome to use Shoeboxed from another country; you just have to pay the postage yourself. Seems fair enough.

So with that new revelation, I decided to check the service out in the interests of science.

New Setup

Signing up is pretty simple. They have four plans, including a free one. The free plan lets you upload and manage your documents, but of course they won’t actually scan anything for you. The paid plans range from $9.95/month to $49.95/month depending on how much volume you have and what extras you need. They have a 30-day free trial too, and you can actually send stuff in during the trial.

To sign up, you just give your name, email, and address, and you are good to go.

Once you get into the interface for the first time, there is a side menu where you can upload receipts, add the stores you shop at and your credit cards ahead of time, and set up your categories.

[Screenshot: Shoeboxed Menu]

The slightly weird thing is that you can manually upload receipts, but I couldn’t see how you can manually upload business cards or documents. You have to actually send those in.

Speaking of categories, I like how it gives you a set of default categories to work from instead of making you come up with your own. You can, of course, change the defaults as you see fit.

[Screenshot: Shoeboxed default categories]

As far as setup goes, I decided not to add any stores or credit cards in advance because I wanted to see how Shoeboxed would handle things right out of the gate.

What I Sent

On 9/23/2010 I put the following documents in an envelope and dropped it in the mail.

What I sent:

  • White 8.5×11 typed document
  • White 8.5×11 document typed in Comic Sans (yeahhhh)
  • White 8.5×14 document with lots of graphics
  • Green 8.5×14 typed document, double-sided
  • Blue 8.5×11 document, single-sided
  • Horizontal business card
  • Vertical business card
  • White 8.5×11 document with handwriting
  • 3 receipts, one with a survey at the top

So for those counting along at home, that is 6 documents, 3 receipts, and 2 business cards.

I thought about leaving some staples in some of the documents, but decided that wouldn’t be nice so I pulled all the staples before sending it in.

Notification of Receipt

On 9/30/2010 I received an email from Shoeboxed saying that the envelope had arrived. That’s five business days, which is pretty impressive considering I mailed it from Canada. I would expect a pre-paid envelope mailed within the United States to arrive much faster.

They said they would email me when the envelope was processed. You can also check at any time: when you log in, there is an Envelope Status section that tells you where things are.

[Screenshot: Envelope Status]

Documents Processed

I received an email that my envelope was processed on 10/4/2010, so that is 4 elapsed days or 2 business days. Whether that is too long is up to you; personally, I would be fine with it. The higher your account tier, the faster their turnaround time, so I assume they prioritize processing for Business and Classic accounts.
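If you want to check the turnaround math yourself, here is a small Python sketch. It assumes “business days” simply means Monday through Friday (no holidays are skipped) and counts from the day after each event; the function name business_days_between is made up for this example.

```python
from datetime import date, timedelta


def business_days_between(start: date, end: date) -> int:
    """Count weekdays after `start`, up to and including `end`.

    Holidays are ignored; only Saturdays and Sundays are skipped.
    """
    days = 0
    current = start
    while current < end:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            days += 1
    return days


mailed = date(2010, 9, 23)     # envelope dropped in the mail
received = date(2010, 9, 30)   # "envelope received" email
processed = date(2010, 10, 4)  # "envelope processed" email

# Elapsed days, then business days, for each leg
print((received - mailed).days, business_days_between(mailed, received))        # 7 5
print((processed - received).days, business_days_between(received, processed))  # 4 2
```

Both results match the five and two business days quoted above.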

Scanning Results

When you first log in to Shoeboxed, it shows your last five receipts. Since I only sent in three, it obviously will only show three here.

[Screenshot: New Receipts]

First off, I was impressed that they had the vendors and amounts right even though I did no setup at all.

[Screenshot: Receipts]

The scanned receipts come out clear and you can click on each receipt to see the details associated with it. For me, it pre-populated the Vendor, Date, and Total amount. You can click to edit the tax, shipping/handling, currency, and itemize the receipt if you want.

[Screenshot: Receipt]

Once you have your receipts the way you want them, you can send them to Quicken, Excel, QuickBooks, MYOB, Outright, or Evernote. You can also create an invoice from them with FreshBooks, which is handy if you bill your expenses to clients.

Business Cards

Again, the scanning was very clear on both the front and back of the card. The name, title, company, email, and phone numbers were pre-populated. For one of the cards, it populated the city, country, and postal code but not the actual street address.

At first I thought this was a Shoeboxed error, but then I realized that there is a setting for business cards to “Collect street address from my business cards” which I did not check. My bad!

There’s an option to export the cards to Evernote, which is handy.

Documents

As you might have seen in an earlier screenshot, the Documents menu has a “Beta” tag, so it is a bit of a work in progress.

Anything that you send them that is not a receipt or a business card will go into the Documents section. All the documents that I sent them appeared with titles and, as a nice touch, the date was the date in the document, not the date that they scanned it (even for the handwritten document).

[Screenshot: Documents]

All of the documents were scanned well, although they were greyscale. The blue and green paper showed up as grey. Not a big deal but just something to be aware of.

For each document you can edit the title and date and add notes.

[Screenshot: Document]

One document that I sent in was multi-sheet, and it appeared in Shoeboxed as two separate documents. I am not sure if they would have put them together if I had paper clipped them. Fortunately, they thought of this and have an easy “Merge Documents” feature to put things back together.

Unfortunately you can’t yet export documents to Evernote. Hopefully that will be coming soon.

Shopping Inbox

If you’d like, you can have stores such as Amazon send your receipts and ads to your Shoeboxed email address, and they will appear in your “Shopping Inbox” section. If any of those emails are receipts, you can mark them and Shoeboxed will extract the receipt data; for some, it does so automatically. As I was typing this I forwarded an Apple Store email to my Shoeboxed address, and before I had finished the post I had a new receipt in the system, all properly tagged. Pretty cool.

Getting Your Stuff Back

If you want to have your paper receipts and documents mailed back to you, you’ll need to subscribe to either the Classic or Business plan. In all cases, you have the option to tell them to shred your documents for you.

Getting Data Out

Those of you who have read DocumentSnap for a while know that I am big on being able to get your data out if you are using a web service.

For receipts, you can export each individual receipt to PDF or Evernote. You can also group export a bunch of receipts to a number of different formats (that I outlined earlier), and choose to export all receipts or only certain categories.

For business cards, you can export your cards to Evernote, Constant Contact, Jigsaw, or to a CSV. I didn’t see a way to export a business card to a PDF, but maybe I am blind.

For documents, you can export each document as PDF. There isn’t yet a way (that I could see) to export all of your documents at once, but since Documents is still in Beta, I am sure that is coming.

All in all, I really like Shoeboxed. If you are someone who works with a lot of receipts or business cards, it is definitely something you want to check out. Give their free trial a whirl and see how you like it.

Do you have experience with Shoeboxed? Let us know in the comments how you like it.

Update: Shoeboxed has released a free iPhone app that acts as a business card scanner. It’s pretty slick.


Use Adobe Acrobat To Add Pages To An Existing Document

Sometimes rather than creating new PDF files every time, you want to scan to an existing document.

Over on the ScanSnap Community, they’ve posted a helpful video showing how to use Adobe Acrobat (which comes with the ScanSnap S1500) to scan to Acrobat and then add the pages to an existing document.

The video is below, but head on over to the Community for some restrictions and things to keep in mind.

If nothing else, watch the video for the swingin’ music.


Hazel Rule To OCR Documents Using PDFPen

The other day I posted an Applescript to OCR documents using PDFPen.

In the comments, awesome DocumentSnap reader Josh requested that it be done as a Hazel rule instead. Given that my love for Hazel is well documented, I am happy to oblige.

I created a folder and then created the following Hazel rule to run against it:

  • Extension is PDF
  • Date Last Modified is after Date Last Matched (to stop it from trying to re-OCR documents)
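To make the two conditions concrete, here is a rough Python sketch of the decision the rule makes for each file in the folder. It is only a model of the rule, not how Hazel works internally: should_ocr is a hypothetical helper, and both the case-insensitive extension check and treating a never-matched file as new are assumptions.

```python
from pathlib import PurePath
from typing import Optional


def should_ocr(filename: str, modified: float, last_matched: Optional[float]) -> bool:
    """Return True if a file passes both rule conditions.

    modified and last_matched are Unix timestamps (e.g. from os.path.getmtime);
    last_matched is None if the file has never matched the rule before.
    """
    # Condition 1: Extension is PDF (case-insensitive here, an assumption)
    if PurePath(filename).suffix.lower() != ".pdf":
        return False
    # A file that has never matched is treated as new (also an assumption)
    if last_matched is None:
        return True
    # Condition 2: Date Last Modified is after Date Last Matched
    return modified > last_matched
```

In a model like this, whatever records last_matched has to do so after the OCR’d file is saved; otherwise the save itself bumps the modification date and the file would match again.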

Then I asked it to run the following Applescript:

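tell application "PDFpen"
	-- theFile is the matched file that Hazel hands to the script
	open theFile as alias
	tell document 1
		ocr
		-- Wait until PDFpen finishes the OCR pass
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		-- Save the now-searchable PDF and close it
		close with saving
	end tell
end tell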

Of course, if you are using PDFpenPro, change “PDFpen” in the first line to “PDFpenPro”.

Here’s a screenshot (unfortunately the bottom of the script is cut off):

[Screenshot: Hazel PDFPen Rule]

Hope this helps out you Hazel and PDFPen fans out there. Enjoy.


PDFPen OCR Applescript To Automatically Make PDFs Searchable

I don’t know if it is because I have been glued to a computer since I was six years old, but my handwriting and printing are terrible. Really terrible. I think my 5-year-old son and I have pretty similar handwriting skills.

Normally this is not a problem, except when I have to fill out a form. It’s a little embarrassing filling out some official form with my chicken scratch, which is one of the many reasons why I love PDFPen. Among many other things, it lets you fill out and edit any PDF document on your computer and then print it out.

However, that ability is not what this post is about. PDFPen will also OCR PDFs to make them searchable, and I wanted a way to OCR a bunch of documents automatically with an Applescript, similar to what has been done with Adobe Acrobat and with ABBYY FineReader.

I found two scripts out there. One from David Sparks at MacSparky, which some users reported problems with in newer PDFPen versions, and one from Michael Tsai at C-Command Software which will OCR a document with PDFPen and send it to EagleFiler.

Since both of these scripts were almost what I wanted, I decided to stand on the shoulders of giants and merge them into this Applescript.

Here is the script:
-- Downloaded From: http://www.documentsnap.com
-- Last Modified: 2010-09-28
-- Includes code from MacSparky http://www.macsparky.com/blog/2009/5/24/pdfpen-ocr-folder-action-script.html
-- Includes code from C-Command Software http://c-command.com/scripts/eaglefiler/ocr-with-pdfpen

-- Folder Action handler: runs whenever files are added to the attached folder
on adding folder items to this_folder after receiving added_items
	repeat with added_item in added_items
		-- Catch errors per file so one bad file doesn't skip the rest
		try
			my ocr(added_item)
		on error errText
			display dialog "Error: " & errText
		end try
	end repeat
end adding folder items to

-- OCR one file with PDFpen, wait for it to finish, then save and close
on ocr(added_item)
	tell application "PDFpen"
		open added_item as alias
		tell document 1
			ocr
			-- Poll once a second until PDFpen reports the OCR is done
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end tell
	end tell
end ocr

PDFpen Users: Download The Text Script Here (Right-click and Save-As)
PDFpen Pro Users: Download The Text Script Here (Right-click and Save-As)

To implement, follow MacSparky’s excellent instructions.

I hope this is of use to someone, and thanks to David and Michael for their excellent Applescripts.


How To Leave OfficeDrop (Not That You’d Want To)

When using an online service for going paperless, the ability to get your stuff out of the system is as important as what you can do in the system.

You don’t want a situation where you upload all your documents somewhere and then if you decide to leave the service, you can’t.

Recently I wrote about how to export your data out of Evernote, and now OfficeDrop, an online scanning and digital filing system, has written a blog post outlining how to get your data out should you choose to.

“Whether you are cancelling your account or just need to get your documents out of our cloud and onto your desktop, there are 3 easy ways to move your files from OfficeDrop to your computer. As always, no matter how you download your documents, they will always retain their text-searchable PDF format.”

You can read the blog post for more information, but the three options they currently provide are:

  1. Download the documents
  2. Request a DVD copy of your account
  3. Request a link to a zip file of your account contents that you can download (this happens automatically if you cancel).

I really like companies being upfront about how to cancel and get your stuff out rather than hiding it. Nice work, OfficeDrop.


Lifehacker OCR Call For Votes

The folks over at Lifehacker are running one of their famous High Five calls for submissions, this time about readers’ favorite OCR tools.

“OCR tools have been around for decades, but only recently have they been affordable (in many instances free) and accessible to people outside of government and corporate offices. This week we want to hear about your favorite OCR tool and what features make it so good at converting hard-copy print into machine-readable and editable text.”

So, if you have a favorite program (or want to see what others are suggesting), head on over and have your say.

(Photo by: Laineys Repetoire)


How To Combine PDF Files On Microsoft Windows For Free

Recently I wrote about how to combine PDF files using Mac OSX, so today I’d like to give Windows the same treatment.

Of course, there are a huge number of applications out there that will do this, and if I had any brains I’d point you to some expensive one that would get me a referral fee.

However, I never claimed to be smart, so as with the Mac tutorial, I am going to point you to some free and open source ways to do this.

In the Mac tutorial, I limited myself to exclusively using functionality that was part of the operating system. I couldn’t find the equivalent in Windows, so if I have missed something obvious please let me know in the comments.

With all that said, let’s check out two free and open source ways to combine PDFs in Microsoft Windows: PDF Split and Merge and pdftk.

PDF Split and Merge

PDF Split and Merge (aka PDFsam) is a Java application that comes in two versions: basic (free) and enhanced (requires forum membership and a donation of at least 1 Euro).

As the name implies, the application can do a number of PDF manipulation functions including splitting and (wait for it) merging. We’re going to focus on the merging part.

I don’t know what it is with open source software, but 9 times out of 10 the terminology/workflow that they use isn’t intuitive at all to a civilian.

To start with, you need to choose a plugin. We’re going to choose the Merge/Extract plugin.

[Screenshot: Merge and Extract PDF]

Then we’ll click the Add button and add the files that we want to combine. You then have a chance to change the order if you’d like.

[Screenshot: Add PDFs]

Once you have your files how you want them, go down to the bottom of the screen and either type or hit Browse to select your destination file/folder.

When you are ready to roll, hit the Run button. When it finishes, you will have the combined PDF file.

pdftk

If working in the command line is more your style, pdftk is definitely for you. I love their description of it:

If PDF is electronic paper, then pdftk is an electronic staple-remover, hole-punch, binder, secret-decoder-ring, and X-Ray-glasses. Pdftk is a simple tool for doing everyday things with PDF documents.

pdftk can handle a whole assortment of PDF tasks, but again we’ll be focusing on merging documents.

It’s pretty simple. Download pdftk, unzip it somewhere to get pdftk.exe, go to the directory where your documents are, and issue a command similar to this (pdftk.exe needs to be either in that directory or somewhere on your PATH):

pdftk Scan.pdf Scan1.pdf cat output Combinedpdftk.pdf

Et voila: pdftk writes the combined file, Combinedpdftk.pdf, into that same directory.

I could have also just thrown everything together into one PDF:

pdftk *.pdf cat output Combinedpdftk.pdf
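One thing to watch with the wildcard version: if you run it a second time in the same folder, *.pdf also matches Combinedpdftk.pdf from the first run, so the old result gets merged back in. Here is a small Python sketch that builds the same “cat output” command but leaves the output file out of the inputs and sorts them so the page order is predictable. The helper names pdftk_merge_args and merge_all_pdfs are hypothetical, and it assumes pdftk is on your PATH.

```python
import glob
import os
import subprocess
from typing import List


def pdftk_merge_args(inputs: List[str], output: str) -> List[str]:
    """Build the argument list for: pdftk <inputs> cat output <output>."""
    # Drop the output file from the inputs so a re-run doesn't merge
    # the previous result back into itself.
    out = os.path.normcase(os.path.abspath(output))
    files = [f for f in inputs if os.path.normcase(os.path.abspath(f)) != out]
    if not files:
        raise ValueError("no input PDFs to merge")
    return ["pdftk", *files, "cat", "output", output]


def merge_all_pdfs(folder: str, output: str = "Combinedpdftk.pdf") -> None:
    """Merge every PDF in `folder`, in sorted filename order, with pdftk."""
    inputs = sorted(glob.glob(os.path.join(folder, "*.pdf")))
    target = os.path.join(folder, output)
    # check=True raises CalledProcessError if pdftk exits with an error
    subprocess.run(pdftk_merge_args(inputs, target), check=True)
```

Sorting matters because glob.glob makes no promise about order; sorted filenames put Scan.pdf before Scan1.pdf, the same order as the first command.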

As you can see, pdftk is a pretty powerful little application that can do a lot more than what I have described here. I’ll be writing more on it in the future.

So, these have been two free ways to combine PDFs on Windows. Do you have another method that you like? Let us know in the comments.


How To Combine PDF Files in Mac OSX Using Automator To Make A Service

A close friend of mine is currently doing a year-long round-the-world trip. Being the nice guy (read: sucker) that I am, I have agreed to receive her mail and keep her important documents for her.

Sometimes I have situations where I need to scan stuff for her and send them electronically. Usually my trusty ScanSnap S1300 does the job, but there is the occasional situation where I have to use my flatbed.

However, what happens when I have multiple PDFs that really should go in one document? I needed a way to combine the PDFs together.

There are a million ways to do this, including some I have talked about before like using Preview.app to drag and drop pages, and there are lots of applications that one can use to combine PDFs, but I wanted to do something that would be:

  • Already built into the OS and not require any additional software
  • Easy to use
  • Repeatable so I only have to set it up once and can use it again and again

I came across this great tutorial by George Harito that runs through how to use Automator to create a Service in Snow Leopard to combine PDFs for you. If you don’t know what the heck that means, don’t worry about it. I’ll take you through step by step.

(In case you’re wondering why I don’t just point to George’s tutorial and be done with it, it’s because there are some extra unneeded steps in there that might confuse some people, so I decided to recreate it here. The inspiration comes from George though and he deserves all the credit).

Create The Service

We’re going to be using an application called Automator to set this up. It looks a bit scary but don’t worry, what we’ll be doing is very easy.

Start Automator

  • In Finder, go to Applications and then start Automator, the cool little robot icon

Choose Service

  • In the window that pops up, highlight Service and then hit Choose

[Screenshot: Choose Service]

Set Up The PDF Action

  • At the top in the middle, you will see a line that says Service receives selected and then a dropdown. Choose PDF files

  • In the Library section on the left, you’ll see a line for PDF. Choose that
  • In the next column over, there is an option for Combine PDF Pages. Click that and drag it into the section on the right

Now we want to be able to give our new PDF a name.

  • Go to the Library section and choose Files & Folders and drag Rename Finder Items to the canvas under your Combine PDF Pages
  • Automator will ask: “This action will change the names of the Finder items passed into it. Would you like to add a Copy Finder Items action so that the copies are changed and your originals are preserved?” You almost certainly want to hit Don’t Add. The file being renamed is the new combined PDF, not one of your originals, so there is nothing to preserve by making a copy.
  • We want to totally rename the file, so choose Name Single Item
  • At the bottom, click on Options and then check Show this action when the workflow runs. It will then prompt you for a name for your new PDF when you run it.

So now we’ve told it to combine the PDFs and give it a name, but now we need to tell it where to put the new file.

  • In the Library, still in Files & Folders, drag Move Finder Items to the canvas under your previous action
  • You can leave the default location if you want, but click on Options and then check Show this action when the workflow runs. That way, it will ask you where you want to save your new PDF.
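If you are curious what those last two actions amount to, here is a rough Python equivalent using pathlib and shutil. It only mirrors the idea (rename the combined PDF in place, then move it); it is not how Automator implements Rename Finder Items or Move Finder Items, rename_and_move is a hypothetical helper, and refusing to overwrite an existing file is a choice made for this sketch.

```python
import shutil
from pathlib import Path


def rename_and_move(combined_pdf: Path, new_name: str, destination: Path) -> Path:
    """Rename a PDF in place, then move it into `destination`.

    Loosely mirrors Rename Finder Items (Name Single Item) followed by
    Move Finder Items.
    """
    # Add the extension if only a base name was typed in the prompt
    if not new_name.lower().endswith(".pdf"):
        new_name += ".pdf"

    # Rename the file itself rather than a copy (the "Don't Add" choice)
    renamed = combined_pdf.with_name(new_name)
    combined_pdf.rename(renamed)

    destination.mkdir(parents=True, exist_ok=True)
    target = destination / new_name
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    # shutil.move also handles moving across volumes
    return Path(shutil.move(str(renamed), str(target)))
```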

Here is the whole workflow: Combine PDF Pages, then Rename Finder Items, then Move Finder Items.

Awesome! You’re done. Now go to File > Save As and give it a name like Combine PDF Files.

Using the Service

So why did we go through all this trouble to set it up as a Service?

Now any time I want to combine a bunch of PDFs, I just select them in Finder, right-click, and there is my new option in the menu.

When I choose it, I get a popup where I give the new PDF a name.

Then another popup appears where I choose where to save it.

Et voila, I have my combined PDF with the 3 files that I had selected in the Finder.

From now on it will be super easy to combine any PDFs.

How about you?

What tricks/processes do you have for combining PDFs on the Mac? Leave a note in the comments or on the Facebook page.


How To Use The Fujitsu ScanSnap With Microsoft OneNote 2010

Almost two years ago (yikes), I wrote a post about helping a DocumentSnap reader use his Fujitsu ScanSnap to scan into Microsoft OneNote 2007.

I know that OneNote is a popular program, but I had never actually used it myself other than for a few minutes at Gnomedex 2005 when Robert Scoble was extolling its virtues on his tablet.

I decided to rectify that, so I downloaded a trial of Microsoft OneNote 2010 to set it up with my ScanSnap S1300 and see how it works.

Creating A Notebook

OneNote uses the concept of Notebooks to store notes in. When you first install it there is a Personal notebook, but I decided to create a notebook called Documents.

This might not make sense for you; you might want your scanned documents spread across other notebooks. For simplicity, though, I’ll put them all in one.

OneNote 2010 allows you to create a Web notebook that you can share. I won’t be doing that here but look into it if it is something interesting to you.

Setting Up Sections

Inside each Notebook, you can set up different sections for categorization purposes. In this example, I’m going to set up sections for Home, Tax, Kids, and Office. Set up whatever makes sense to you, of course.

Putting Stuff Into OneNote

When you click on the Insert menu item, you can see that there are a whole bunch of ways to get information into OneNote, whether by attaching files, recording audio or video, using a screen clipper, or by using a File Printout.

You can also drag a file into the OneNote application itself.

Don’t Hit That Scanner Button

If you are using a ScanSnap, you might be tempted by the siren song of that Scanner Printout button.

Since the ScanSnap doesn’t support TWAIN, it’s not going to work in your case. What we need to do is set it up from the ScanSnap Manager side.

Setting Up ScanSnap Manager

When you install OneNote, it creates a special Printer Driver called Send To OneNote 2010 (this will vary based on your version of course). What we are going to do is create a ScanSnap Manager Profile to scan to that Printer.

To set this up:

  • Right-click on the ScanSnap icon in your tray and choose Scan Button Settings
  • Click on the Profile box and choose Add Profile. Call it something like Scan To OneNote
  • On the Application tab, choose Scan To Print
  • If you don’t want to keep a copy of the PDF in the directory specified on the Save tab, hit the Application Settings button
  • Set the rest of your quality, duplexing, etc. options as desired.

Scan Away

Now when you hit the scan button using your ScanSnap, it will pop up the printer dialog box. Choose your Send To OneNote 2010 printer, and it will scan it right into OneNote.

You can then move the document into whichever Notebook/Section you want.

Once you do that, your note will be created.

[Screenshot: OneNote Note Created]

What About OCR?

Good question. Luckily, OneNote will do the OCR for you.

I couldn’t possibly do a better job outlining this than this article at How-To Geek so head over there and check out all the ways that you can use OCR in Microsoft OneNote.

From my initial assessments, Microsoft OneNote 2010 seems like a pretty cool tool for managing both your documents and your information in general.

Having said that, I am not sure what it is like once you get into hardcore use. If you have thoughts or experiences one way or the other on OneNote, drop a comment below and let us know.
