macOS Sierra 10.12.2 - OCR Text Removed with Preview And Scanned PDFs?

Update 1/23/2017: The Sierra 10.12.3 update seems to have fixed the issue. If you haven’t updated already, I highly recommend that you do.

If you have updated to macOS Sierra 10.12.2 and use Preview to manipulate scanned PDFs, watch out. There seems to be a bug and the OCR text layer can disappear. I’ve replicated this issue on documents scanned with the Fujitsu ScanSnap and the Doxie Q so far.

In the comments to my blog post about ScanSnap on Sierra, awesome DocumentSnap reader Alex writes this:

Since updating to macOS 10.12.2 I have found that Preview destroys the OCR layer of PDFs scanned and OCR’d with the latest ScanSnap Manager software if you make any sort of edit with Preview (e.g. deleting or reordering pages). After editing and saving with Preview, the PDF is no longer searchable and text is not selectable. Managed to replicate the problem on another Mac running 10.12.2. Doesn’t seem to affect PDFs scanned and OCR’d with other scanners or applications. Just wanted to warn everyone to perhaps wait before updating, and check that they haven’t unwittingly destroyed their OCR if they have already updated.

This was confirmed in the comments by reader Jakub.

Since I hadn’t yet upgraded to 10.12.2, I decided to test with scans before and after upgrading, and since I had a Doxie Q sitting on my desk, I tested with that as well to see if it was a ScanSnap thing. I also tested Preview on a machine with 10.12.1 and a machine with El Capitan.

All ScanSnap scans were done with a ScanSnap iX500 using ScanSnap Manager 6.3 L60. All Doxie scans were done with a Doxie Q exported using Doxie software 2.9.1 (1864).

For the test, I scanned documents on Sierra 10.12.1 and 10.12.2, checked that the PDF was OCRed properly, then deleted a page, saved, and re-opened and checked the text again. Here are the results:

  • Scanned with ScanSnap on 10.12.1 & edited Sierra 10.12.1: OK
  • Scanned with ScanSnap on 10.12.1 & edited Sierra 10.12.2: GONE
  • Scanned with ScanSnap on 10.12.1 & edited El Capitan: OK
  • Scanned with ScanSnap on 10.12.2 & edited Sierra 10.12.1: OK
  • Scanned with ScanSnap on 10.12.2 & edited Sierra 10.12.2: GONE
  • Scanned with ScanSnap on 10.12.2 & edited El Capitan: OK
  • Scanned with Doxie Q on 10.12.1 & edited Sierra 10.12.1: OK
  • Scanned with Doxie Q on 10.12.2 & edited Sierra 10.12.2: GONE
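One way to read the list above is to group the outcomes by each factor and see which factor lines up with the result. Here is a minimal Python sketch of that. The tuple layout and the function names are my own illustration, and the data is just the eight results above typed in by hand.

```python
# Results transcribed by hand from the list above:
# (scanner, macOS version used to scan, system used to edit in Preview, outcome)
RESULTS = [
    ("ScanSnap", "10.12.1", "Sierra 10.12.1", "OK"),
    ("ScanSnap", "10.12.1", "Sierra 10.12.2", "GONE"),
    ("ScanSnap", "10.12.1", "El Capitan", "OK"),
    ("ScanSnap", "10.12.2", "Sierra 10.12.1", "OK"),
    ("ScanSnap", "10.12.2", "Sierra 10.12.2", "GONE"),
    ("ScanSnap", "10.12.2", "El Capitan", "OK"),
    ("Doxie Q", "10.12.1", "Sierra 10.12.1", "OK"),
    ("Doxie Q", "10.12.2", "Sierra 10.12.2", "GONE"),
]

SCANNER, SCANNED_ON, EDITED_ON, OUTCOME = range(4)


def outcomes_by(field):
    """Map each value of one factor to the set of outcomes seen for it."""
    groups = {}
    for row in RESULTS:
        groups.setdefault(row[field], set()).add(row[OUTCOME])
    return groups


def predicts_outcome(field):
    """True if every value of this factor always gave the same outcome."""
    return all(len(seen) == 1 for seen in outcomes_by(field).values())


if __name__ == "__main__":
    for name, field in [("scanner", SCANNER),
                        ("scanned on", SCANNED_ON),
                        ("edited on", EDITED_ON)]:
        print(name, "->", predicts_outcome(field), outcomes_by(field))
```

In this small sample, only the system used for editing gives a consistent outcome for each of its values. The scanner and the macOS version used for scanning both show a mix of OK and GONE.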

As you can see, it seems to be something to do with Preview on macOS Sierra 10.12.2. Alex said that he didn’t see the issue with other scanners, but I ran into it with both ScanSnap and Doxie. Both of those scanners use ABBYY for OCR, so that may be relevant.

If you’ve upgraded to 10.12.2 (or see this issue on another platform!), please let us know in the comments if you see the same thing. I’ll update if a fix/workaround appears.
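If you would rather check a batch of files than open each one and try to select text, something like the following Python sketch might help. It is not what I used for the tests above. It assumes the `pdftotext` command-line tool from Poppler is installed (that tool is my assumption, not something mentioned in this post), and the 20-character threshold and the function names are arbitrary choices for illustration.

```python
import shutil
import subprocess
import sys


def extract_text(pdf_path):
    """Return the text layer of a PDF using Poppler's pdftotext.

    Assumes pdftotext is installed and on the PATH; "-" sends the text to stdout.
    """
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found; install Poppler first")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def looks_searchable(text, min_chars=20):
    """Rough heuristic: a scanned PDF with an OCR layer yields some real text.

    min_chars is an arbitrary cutoff so that stray whitespace or page-break
    characters do not count as a text layer.
    """
    visible = sum(1 for ch in text if not ch.isspace())
    return visible >= min_chars


if __name__ == "__main__":
    # Usage: python check_ocr.py scan1.pdf scan2.pdf ...
    for path in sys.argv[1:]:
        status = "has text" if looks_searchable(extract_text(path)) else "NO TEXT LAYER"
        print(f"{path}: {status}")
```

A file reported as having no text layer is worth opening by hand to confirm, since a mostly blank page would also fall under the threshold.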

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

40 comments

GaryM - April 28, 2022 Reply

I am running macOS Monterey 12.3.1, with ScanSnap running ABBYY, and I just noticed this same problem. I can use Preview to Highlight or Markup PDFs and save them, and the PDF remains searchable. But if I delete pages in Preview and then Save, the resulting PDF can NO LONGER be searched. Does anybody know of fixes for this? Is Apple even aware of or working on this?

    Jakub - April 29, 2022 Reply

    Hi Gary.

    Since there is still no solution to the OCR-layer problem, I am not using ABBYY for OCR anymore, but the ScanSnap integrated solution. Although it is not my preferred solution, it works fine and resolved my issues. But I still need to rerun OCR on older documents if I edit them, in order to regain the OCR layer.

      Gary Macek - April 29, 2022 Reply

      Jakub, Thanks for the super-fast, very helpful comment. But I am afraid I don’t know what you mean by “ScanSnap integrated solution” nor how to implement it? ABBYY profiles of course have the ‘Send to’ to direct to another application. But I only see ABBYY, Acrobat, PowerPDF, and PDF Converter. Does your solution still create searchable PDFs, while avoiding Apple Preview app removing the OCR-layer?

Jakub - July 10, 2020 Reply

Does anyone have a solution for that issue on macOS 10.14.6, other than redoing the OCR on the edited document?

Jakub - October 14, 2019 Reply

Hey guys.

Coming back to that topic. I am facing a similar problem with Mojave. Currently I am running macOS 10.14.6, and if I delete one page in a scanned document that has OCR content, the OCR content is deleted upon saving the new file.
Basically the same behaviour as seen before. Can anybody confirm that?

    Eleuteruiz - October 23, 2019 Reply

    I am having this problem for the first time… with Catalina! Did anyone else experience this with 10.15? Any solution?

    Charles - November 4, 2019 Reply

    I’m having this problem with Catalina, which is incredibly frustrating!
    Preview has had a number of long-standing bugs (e.g. the vertically squashed thumbnails in ‘Contact Sheet’ view, which warp regular rectangular pages, such as A4, into squares).
    This article gives some insight into why Apple’s software is so buggy: https://tidbits.com/2019/10/21/six-reasons-why-ios-13-and-catalina-are-so-buggy/

David - October 7, 2017 Reply

This problem still exists in 10.12.6. I tried it both on my MacBook Air and on my Mac Mini. Same problem on both.

Does anybody have any experiences with High Sierra?

Mark B - April 20, 2017 Reply

I have posted comments about this OCR/PDF corruption bug on the Zotero forums:

https://forums.zotero.org/discussion/comment/274497

I showed that Preview will overwrite certain types of OCR information in PDF files. My experience is limited to PDF files with OCR information created using ClearScan in Adobe Acrobat. What I have found is that the OCR information is corrupted when the PDF is opened, modified (e.g. by adding an annotation), and then saved in Preview. This problem first surfaced in ~2009 (long before El Capitan), and is still active relative to current versions of Mac OS X (Yosemite, 10.10.5), Preview 9.0 (909.17), and Adobe Acrobat (XI, 11.0.20). I reported the problem as a bug to Apple but there has been no fix yet, to the best of my knowledge.

    Mark - April 20, 2017 Reply

    Note: I made an error reporting about my operating system, which is MacOS Sierra (10.12.4).

Martijn - February 16, 2017 Reply

Thanks for sharing this, I was having the same problem until I upgraded to the latest version of Sierra.

Ron K - February 14, 2017 Reply

Those with older ScanSnaps may want to update their software to the latest version, even if it says unsupported. I love that they release the latest (6.3 L61) with drivers for the old scanners; I’m still using my S510M from 2008 (on macOS Sierra). I’ll be upgrading to the next model they release because they don’t leave users out in the cold!!!

Alex - January 27, 2017 Reply

Good to hear that the issue seems to be fixed.

Does the fix apply to all PDFs, or just new files scanned with 10.12.3? Has anyone who has updated to 10.12.3 tried editing OCR’d ScanSnap documents scanned with 10.12.1 or 10.12.2? I have read that these become “corrupted” with 10.12.3, so am still a bit wary of updating. If anyone who has already updated could test this out, I would be grateful to know what happens.

    Anton - January 27, 2017 Reply

    Don’t worry Alex, the problem is solved. (Didn’t have anything to do with the ScanSnap anyway.)

    Cheers.

Trev - January 23, 2017 Reply

Have just installed macOS 10.12.3 and it has fixed the issue for me

    Anton - January 24, 2017 Reply

    Great!

    Apple says that the update 10.12.3 that came out yesterday:

    1 Improves automatic graphics switching on MacBook Pro
    2 Resolves graphics issues while encoding Adobe Premiere Pro projects on MacBook Pro with Touch Bar
    3 Fixes an issue that prevented the searching of scanned PDF documents in Preview.
    4 Resolves a compatibility issue with PDF documents that are exported with encryption enabled.
    5 Fixes an issue that prevented some third-party applications from correctly importing images from digital cameras.

    I suppose this bug was related to the third issue.

    Michael - January 26, 2017 Reply

    Brilliant, thanks for the confirmation — it’s the green light I’ve been waiting for to install macOS Sierra.

Anton - January 21, 2017 Reply

Upgraded today…

…and after saving my PDF with Preview, the OCR layer was gone.

Scanner: Brother MFC-J6920DW
OCR: DEVONthink Pro Office, which uses ABBYY FineReader as its OCR engine

I think the problem lies with Preview and ABBYY. Not so much to do with the ScanSnap.

Will be using another pdf-reader until this problem is solved. Thanks for your article!

Thierry - January 10, 2017 Reply

Do we have some good news with the latest 10.12.3 beta?

    Alex - January 12, 2017 Reply

    I have just done a Twitter search to see if there have been any developments, and this tweet sounds encouraging:

    Peter Vendlegård ‏@veke71 Jan 10
    @9to5mac The OCR-layer / ScanSnap bug is fixed in the new MacOS beta. I can add stuff and save them with P’veiw and retain the OCR again.

Michael - January 10, 2017 Reply

Brooks, can you recover the removed OCR text by going back to an earlier version, using Apple’s “Revert to” feature (effectively a file-level Undo)? I don’t currently have Sierra to try it on.

While this will undo your changes, at least you can (if it works) quickly retrieve lost text.

    Anton - January 21, 2017 Reply

    You can go back to an earlier version using Time Machine. That should recover your file.

Ronald - January 6, 2017 Reply

Here is some more in-depth info about the subject:
http://tidbits.com/article/16966

Mike - January 5, 2017 Reply

The web site 9 to 5 Mac had a story about this today. I’m not sure if I can post the link, but I’ll try:

https://9to5mac.com/2017/01/03/sierra-preview-pdf-errors-scansnap/

“PDF-handling problems in Sierra that broke ScanSnap are back in 10.12.2, many apps affected”

It seems to be an Apple issue that is affecting more apps than just the Fujitsu ones.

Bob Levy - January 5, 2017 Reply

I may have to revert back to El Capitan. macOS Sierra messed up my Neat 4.3.0 (can’t crop). Downloading and installing Neat 4.5.0.99 brought crashes and instability.

I don’t care to use scan to cloud and Neat’s new subscription model is very pricey so I’m trying Paperless 2.3.8 and get a huge number of crashes and OCR is very intermittent.

Not sure what to do. In the short term, going back to 10.11 may be a solution, but ultimately, I’ve got to stay relatively current with MacOS.

Jim - January 4, 2017 Reply

Fujitsu’s “fix” for the OS Sierra “update” was disappointing to me. I had thought they’d deploy a new suite of programs that had been patched to fix this glitch. Instead, they presented a confusing matrix of instructions to download and install this and that, and, in trying to follow their instructions, I managed to completely destroy the ScanSnap software I’d had installed. It took me a long time to revert to what I had before, let alone trying to “upgrade” to Sierra. I did get it running, including Scan To Cloud which was a bit difficult for me to do.

I am holding out, using El Capitan, and using the things I’ve paid for.

    CertifiedMacintoshConsultant - January 20, 2017 Reply

    Jim, agreed. Fujitsu’s installer is confusing; however, let it do its thing. When you are prompted to enter the admin password, it’s because it’s trying to install the base elements that are being patched. Just enter your admin password and it will do the full and complete update of all the modules without any further action from you. Past installers did not tell you what steps were being performed, so you were left to wonder “Did this install anything? Or do I need to manually run it?”

Mike - January 4, 2017 Reply

I wonder if this is related to the problem that Fujitsu had earlier with scans made on Sierra. They finally released fixed software for that issue, and now this happens.

Anil Agrawal - January 4, 2017 Reply

Currently, I am using my Windows 10 Laptop to do all my ScanSnap iX500 scans and OCR. These get saved on my Synology DS415+ NAS.

I do editing on my Mac running 10.11 El Capitan, which I have not upgraded to 10.12 Sierra. However, I don’t use Preview anymore due to an unrelated issue. The issue was/is that Preview’s autosave doesn’t work when the opened file is on a network drive! I found out the hard way! So, I use PDFpen Pro now.

Elwood - January 2, 2017 Reply

Since update to 10.12.2 all pdf files I create are missing some text and have black blocks where logos were. But only if viewed with Preview or PDF Expert, when viewed with Adobe or a browser they are fine and when emailed to others they are fine too. I assume the others are using windows products and Adobe. Bad update I guess. After several chats and calls to Apple, no solution.

Ronald - December 31, 2016 Reply

Did they give up ABBYY support for the FUJITSU Image Scanner ScanSnap iX500 under Sierra? Under Compatible Operating Systems, macOS Sierra v10.12, the entry for “ABBYY FineReader for ScanSnap™ 5.0” says: “Check with each vendor for compatibility information.” So they don’t even say anymore whether it’s supported or not, like they did before! See http://www.fujitsu.com/uk/products/computing/peripheral/scanners/scansnap/ix500/ and then Specifications.

Lina - December 29, 2016 Reply

After saving each PDF, my OCR layer is gone as well.

Has anyone found a solution? I have to read tons of PDFs for my thesis and this problem has basically screwed me up.

    Lina - December 29, 2016 Reply

    I’m now using Acrobat Reader and there is no problem if I save the changes. So I suppose I’ll just use it as my default PDF reader for now… too bad, since I liked Preview.

Peter N Lewis - December 22, 2016 Reply

Yes, editing in Preview in 10.12.2 destroys my ScanSnap OCR layer as well. Sigh.

BTW, every reference to 12.12.2 in your article should be 10.12.2…

    Brooks Duncan - December 22, 2016 Reply

    Oh my goodness, that is really embarrassing. That’s what happens when I rush a blog post. 🙂
    Thanks for the heads up. Love Keyboard Maestro btw!

Jon - December 21, 2016 Reply

Since updating to 10.12.2 Preview has been hanging up every time I try to edit, add pages, etc. Any advice would be appreciated. Thanks

Alex - December 21, 2016 Reply

Thanks for spreading the word. Like Jakub, I only noticed this bug by chance, a few days after updating to 10.12.2, when checking if a very small typeface had been OCRed accurately.

I just hope this gets fixed so I won’t be forced to look for a cumbersome or expensive third-party alternative to what used to be reliable and free. I have used my ScanSnap without trouble for years, so after the recent problems, and now this, it’s a bit disconcerting to realise that our scans can be so “fragile”.

lamike - December 20, 2016 Reply

Running OS 10.12.2. Have made PDFpen Pro (not Preview) the default app for all PDFs on my Mac. So far, the Sierra/Preview problem doesn’t appear to be present. My guess is it has something to do with Apple’s insistence on sandboxing. My PDFpen Pro was not purchased via the Apple Store. That may be important.

Jakub - December 20, 2016 Reply

Good (or not so good…) to know that you can confirm that behaviour and that it’s connected to the latest Sierra update. It is quite surprising that no one has noticed this before. Actually, I only found out by accident, since I was merging two PDFs in my ScanSnap folder, where I had used the search bar to filter all scans by content. The new document never showed up in the list. I think most people are not aware of this, since they will not double-check whether the OCR layer still exists.
By the way, you can open the file again with ABBYY FineReader and the merged PDF will get OCRed again, but only if you add a PDF to an existing file. If you export to a new one, ABBYY will not recognize it as a ScanSnap document.
