If you have updated to macOS Sierra 10.12.2 and use Preview to manipulate scanned PDFs, watch out. There seems to be a bug that makes the OCR text layer disappear when you edit and save a PDF in Preview. I’ve replicated this issue on documents scanned with the Fujitsu ScanSnap and the Doxie Q so far.
In the comments to my blog post about ScanSnap on Sierra, awesome DocumentSnap reader Alex writes this:
Since updating to macOS 10.12.2 I have found that Preview destroys the OCR layer of PDFs scanned and OCR’d with the latest ScanSnap Manager software if you make any sort of edit with Preview (e.g. deleting or reordering pages). After editing and saving with Preview, the PDF is no longer searchable and text is not selectable. Managed to replicate the problem on another Mac running 10.12.2. Doesn’t seem to affect PDFs scanned and OCR’d with other scanners or applications. Just wanted to warn everyone to perhaps wait before updating, and check that they haven’t unwittingly destroyed their OCR if they have already updated.
This was confirmed in the comments by reader Jakub.
Since I hadn’t yet upgraded to 10.12.2, I decided to test with scans before and after upgrading, and since I had a Doxie Q sitting on my desk, I tested with that as well to see if it was a ScanSnap thing. I also tested Preview on a machine with 10.12.1 and a machine with El Capitan.
For the test, I scanned documents on Sierra 10.12.1 and 10.12.2, checked that the PDF was OCRed properly, then deleted a page, saved, and re-opened and checked the text again. Here are the results:
- Scanned with ScanSnap on 10.12.1 & edited Sierra 10.12.1: OK
- Scanned with ScanSnap on 10.12.1 & edited Sierra 10.12.2: GONE
- Scanned with ScanSnap on 10.12.1 & edited El Capitan: OK
- Scanned with ScanSnap on 10.12.2 & edited Sierra 10.12.1: OK
- Scanned with ScanSnap on 10.12.2 & edited Sierra 10.12.2: GONE
- Scanned with ScanSnap on 10.12.2 & edited El Capitan: OK
- Scanned with Doxie Q on 10.12.1 & edited Sierra 10.12.1: OK
- Scanned with Doxie Q on 10.12.2 & edited Sierra 10.12.2: GONE
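As you can see, every PDF edited in Preview on 10.12.2 lost its text layer, no matter which version of macOS it was scanned on, while the same scans edited on 10.12.1 or El Capitan were fine. So it seems to be something to do with Preview on macOS Sierra 10.12.2. Alex said that he didn’t see the issue with other scanners, but I ran into it with both ScanSnap and Doxie. Both of those scanners use ABBYY for OCR, so that may be relevant.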
If you’ve upgraded to 10.12.2 (or see this issue on another platform!), please let us know in the comments whether you see the same thing. I’ll update this post if a fix or workaround appears.
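If you want to check whether PDFs you have already edited still have their text layer, here is a rough sketch in Python. It is not something I used for the tests above, which were done by hand. It assumes the third-party pypdf library is installed (`pip install pypdf`), and the function names `pages_without_text` and `extract_page_texts` are made up for this example. It simply reports pages where no text can be extracted, so a genuinely blank page will be flagged too.

```python
from typing import Iterable, List


def pages_without_text(page_texts: Iterable[str]) -> List[int]:
    """Return 1-based page numbers whose extracted text is empty or only whitespace."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if not (text or "").strip()
    ]


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text of each page with pypdf (assumption: pypdf is installed)."""
    # Imported here so pages_without_text() can be used without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    import sys

    # Usage: python check_ocr.py scan1.pdf scan2.pdf ...
    for path in sys.argv[1:]:
        missing = pages_without_text(extract_page_texts(path))
        if missing:
            print(f"{path}: no text found on page(s) {missing}")
        else:
            print(f"{path}: every page has extractable text")
```

A PDF that reports every page as missing text after an edit in Preview, when the original scan was searchable, is a likely victim of this issue. Keeping an untouched copy of the original scan before editing is the safest way to compare.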