One complaint people have about the PDFs that Acrobat produces when doing OCR, whether manually or via an Acrobat OCR AppleScript, is that the files can get really big.
There are a few solutions to this, but one of them is to change the PDF Output Style.
The default that Acrobat uses is called Searchable Image. It places the OCR’ed text “behind” the image, so that when you view the PDF you are looking at the original image, but you can still copy and search the text.
However, there’s another setting. If you choose the PDF Output Style of Formatted Text & Graphics, Acrobat actually converts the image of the text into real text, formatted to match the original style as closely as it can.
I did a simple test this morning and here is what I found:
- Scanned Document before OCR: 312K
- OCR with Acrobat Searchable Image: 940K
- OCR with Acrobat Formatted Text & Graphics: 60K (!)
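To put those numbers in perspective, here is a quick Python sketch that expresses each result as a multiple of the original scan. The sizes are the ones from my test above; the function name `relative_sizes` is just something made up for illustration, not anything that ships with Acrobat.

```python
def relative_sizes(baseline_kb, results_kb):
    """Return each result's size as a multiple of the baseline size."""
    return {name: size / baseline_kb for name, size in results_kb.items()}


# File sizes from the test above, in kilobytes.
scan_kb = 312
ocr_kb = {
    "Searchable Image": 940,
    "Formatted Text & Graphics": 60,
}

ratios = relative_sizes(scan_kb, ocr_kb)
for style, ratio in ratios.items():
    print(f"{style}: {ratio:.2f}x the original scan")
```

For this one document, Searchable Image comes out at roughly 3x the size of the original scan, while Formatted Text & Graphics is roughly a fifth of it. One test file is not a benchmark, so expect your own results to vary.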
To switch Acrobat to Formatted Text & Graphics, here is what you do:
- Go to Document -> OCR Text Recognition -> Recognize Text Using OCR…
- Click the Edit button
- In PDF Output Style, change to Formatted Text & Graphics
- Hit OK
Acrobat will now use Formatted Text & Graphics, and should keep that setting for your future scans too.
What’s The Catch?
As with anything, there is a downside. Acrobat does its best to make the text look like what was there before, but it is not perfect. Also, because the recognized text replaces the image of the text, anything that is mis-OCR’ed will actually show up in the document.
It depends on what your objectives are. If you want to have the exact replica of what you are scanning, you’ll probably want to use Searchable Image.
However, if size is your main concern and you just want a fairly faithful representation, Formatted Text & Graphics may be the way to go.
Do you have any other tricks for making PDFs smaller?