One complaint people have about the PDFs that Acrobat kicks out after OCR, whether you run it manually or via an Acrobat OCR AppleScript, is that the files can get really big.
There are a few solutions to this, but one of them is to change the PDF Output Style.
The default that Acrobat uses is called Searchable Image. It places the OCR’ed text “behind” the image, so when you view the PDF you are looking at the original scan, but you can still copy and search the text.
However, there’s another setting. If you choose the PDF Output Style of Formatted Text & Graphics, Acrobat converts the image of the text into actual text, formatted to match the original styling as closely as it can.
I did a simple test this morning and here is what I found:
- Scanned Document before OCR: 312K
- OCR with Acrobat Searchable Image: 940K
- OCR with Acrobat Formatted Text & Graphics: 60K (!)
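To put those numbers in perspective, here is a minimal Python sketch that works out the percentage change against the original scan. The sizes are the ones from my test above; the `percent_change` helper and the `sizes_kb` name are just made up for this illustration.

```python
# Sizes in KB, copied from the test results above.
sizes_kb = {
    "Scanned document before OCR": 312,
    "Searchable Image": 940,
    "Formatted Text & Graphics": 60,
}


def percent_change(before, after):
    """Return the change from `before` to `after` as a percentage."""
    return (after - before) / before * 100


baseline = sizes_kb["Scanned document before OCR"]
for label, size in sizes_kb.items():
    change = percent_change(baseline, size)
    print(f"{label}: {size}K ({change:+.0f}% vs. the original scan)")
```

In this one test, Searchable Image came out at roughly three times the size of the original scan, while Formatted Text & Graphics came out at about a fifth of it. Your results will vary with the document.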
To change Acrobat to FT&G, here is what you do:
- Go to Document -> OCR Text Recognition -> Recognize Text Using OCR…
- Click the Edit button
- In PDF Output Style, change to Formatted Text & Graphics
- Hit OK
Acrobat will now use Formatted Text & Graphics, and should keep that setting for your future scans too.
What’s The Catch?
As with anything, there is a downside. Acrobat does its best to make the text look like what was there before, but it is not perfect. Also, anything that is mis-OCR’ed will actually show up in the visible document, because the recognized text replaces the scanned image instead of sitting behind it.
It depends on what your objectives are. If you want to have the exact replica of what you are scanning, you’ll probably want to use Searchable Image.
However, if size is your main concern and you just want a fairly faithful representation, Formatted Text & Graphics may be the way to go.
Do you have any other tricks for making PDFs smaller?
@brooks – thanks for this tip. i just used it on a one page typed letter i received and the difference was going from 740 to 40 KB. that pretty much is in line with what you got. but i have an interesting mysterious problem with adobe standard 7.0 after OCRing, where it just crashes a few seconds after the process is complete. i have to save the doc fast or else the OCR won't be saved. has anyone seen this before?
is there a way to set this behavior in Adobe as default for OCR scans? it's a bit annoying having to dig down into the options every time.
@pendolino – In the version of Acrobat that I have, 8.0 Mac, it remembers which OCR setting you used last time, so you don't have to go in and set it every time. So if 7.0 doesn't do that, then you may need to wait for 8.0?
@brooksd – i just discovered the same with 7.0 moments before getting your response. thanks. looks like it sticks.
I have a zillion scanned PDFs from my ScanSnap but I haven't run OCR on them yet. I have been having DEVONthink do that, but now I am thinking that having all of my PDFs OCR'd would be helpful. ABBYY FineReader seems to make a mess of the PDFs, taking forever and making a super long file name out of them (in addition to keeping the original PDF, which I no longer want). How do you all handle this? Help?
@Sarah Do you mean you have PDFs that are currently searchable in Devonthink, but you want to take them out of Devonthink but still have them searchable?
@brooksd
I have unsearchable PDFs from the scansnap AND the searchable ones in devonthink bc I had devonthink do the OCR. I now want to run OCR on the unsearchable ones not in dt. Does that make sense?
Use one of the Applescripts available here and do the OCR with Acrobat or PDF Pen. I think it was MacSparky that recently had scripts for PDF Pen.
Thanks for that Rob. I have been meaning to point to that PDFPen script for a while but haven't had a chance. This reminded me.
This is a great tip, but somewhat out of date. Acrobat 9 has a new technology for OCR'd PDFs called ClearScan that results in dramatically smaller file sizes and crisper PDFs. More about it here http://blogs.adobe.com/acrolaw/2009/05/better_pdf…
(In case that link isn't visible, just Google 'acrobat clearscan ocr' and see the post in the Acrobat for Legal Professionals blog.)
Thanks for the tip, nodis!