Reader Story: Zero Knowledge Backup (Part 2)

This is Part 2 of a three-part series by awesome DocumentSnap reader Mike from California. In Part 1, he talked about how he jump-started his paperless transition. Today is about how he came up with his backup system and what he ended up implementing.

Take it away Mike.

After my initial push to go paperless, I ended up with two moving boxes full of paper ready to go to the shredding center. But I wasn’t willing to take them yet, because I wasn’t happy with my backup situation. What if I shredded all those papers, and then something happened to my paperless documents?

Backup Starting Point

I already had a two-tier backup strategy. Tier 1 is Time Machine: The laptop that has all of my scanned documents does automatic Time Machine backups to a server computer on my home network. The backups are stored on a RAID, so they have some protection from disk drive failure.

The laptop’s internal drive is encrypted, so the scanned documents are secure in case the laptop is stolen. The backups on the server are encrypted, so they’re also secure in case of a burglary. But, if my house were burgled (or damaged by fire/flood/disaster), I could lose both the laptop and the backups! Not good. That’s why I also have Tier 2.

Tier 2 is an offsite backup that I manage manually. Every couple of months, I use a program called SuperDuper to do a full, encrypted clone of my laptop’s hard drive onto a portable drive that I store in the safe deposit box at my bank.

I have two such portable drives, so when I take one to the bank, I bring the other one home to use for the next cycle. Thus, there’s always one drive in the safe deposit box. This is reasonably secure. Even if my house burned down, I would lose at most a couple of months’ data, depending upon how recently I had taken a drive to the bank.

Realistically, most of what I would lose could be re-downloaded without too much trouble. Still, I wasn’t comfortable with backups that depended so heavily on me to manually perform them. There have definitely been times when I’ve gotten busy or distracted, and realized that it had been six months since my last trip to the bank. I wanted something more reliable than my manual Tier 2.

Evaluating Cloud Services

Of course, now there are cloud backup services that specialize in this sort of thing. I looked at all of them, and found that there’s quite an array. Here are just a few considerations:

  • Some services don’t keep past versions of files. They essentially maintain a snapshot of your system. That’s fine if your system crashes or is stolen or destroyed. But what if you accidentally overwrite an important document and the bogus version gets backed up?
  • Some services keep past versions, but once a file is deleted, they prune it from the backups after a while. What if you discover that you accidentally deleted a whole folder full of important documents when you “cleaned house” on your computer last month?
  • Some services have iffy security.

Zero Knowledge Backup

Let me expand on that last point at some length. I really wanted a so-called “Zero Knowledge” backup service — one where the service itself had absolutely no access to any of the backed-up data.

That way, there was no chance that a security breach of the backup service could result in all of my paperless documents (including financial information and so forth) being accessible to the hackers.

Some of the services allow you to set up their backup software so that your local computer encrypts all of the data that it sends to the cloud, before it ever leaves your machine. Sounds good, until you realize that those services upload your encryption key to the service, so that they can do things like reset your key for you when you lose it, or offer a web portal to restore documents using a browser.

If that stored key is compromised, your data is at risk. Haven’t we all heard enough horror stories lately to be a bit squeamish about this sort of thing?

Some of the services allow you to specify your own encryption key that you never send to them. There’s a bit of risk here — if you lose your key, all that backed up data is lost forever. But, as long as you don’t lose your key, your data should be secure, right? Amazingly, at least one of those services has no way for you to restore your data without supplying them with your encryption key! So, your data is secure only until you actually need to get it back!
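The zero-knowledge property described above comes down to one thing: the encryption key is derived on your own machine and is never uploaded anywhere. Here is a minimal sketch of the key-derivation half of that idea, using Python’s standard library. (This is an illustration of the concept, not a real encryption scheme, and the passphrase is an arbitrary example.)

```python
import hashlib
import os

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 32-byte encryption key locally from a passphrase.

    PBKDF2 with many iterations makes brute-forcing the passphrase
    expensive. The passphrase and the derived key never leave the
    machine; only the salt and the ciphertext would be uploaded.
    """
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, 200_000)

salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)

# Same passphrase + same salt always yields the same key, so you can
# decrypt later; a wrong passphrase yields a different key, so the
# cloud provider (or an attacker) learns nothing from the stored data.
assert derive_key("correct horse battery staple", salt) == key
assert derive_key("wrong guess", salt) != key
```

A service that stores this derived key server-side (to offer password resets or a web restore portal) breaks the zero-knowledge guarantee, which is exactly the trade-off discussed above.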

Only a couple of the backup services never touch your key for any purpose. The best one appeared to be CrashPlan. They keep past file versions, they don’t prune deleted files, and you can specify your own key that is never sent to them for any reason. You can even have them send you a hard drive with all of your backed-up data on it (for an extra $160 fee) so that you can restore files quickly even if your Internet connection is slow.

Unfortunately, the software that runs on your computer is written in Java, and there are many reports of performance problems like excessive memory use or excessive CPU use.

Mike’s Backup Solution

I ended up choosing an up-and-comer with a novel approach: Arq.

Rather than being a service that supplies both software and cloud storage, Arq is just an app you buy. Once you’ve purchased the app, you get to tell it where you want to store your data, on any of a number of supported cloud storage providers. Arq encrypts all of the data locally before sending it to your chosen cloud provider, and when you restore, it decrypts locally as well. In short, Arq passes the Zero Knowledge test, and the app is native and doesn’t have the performance issues that are reported with CrashPlan.

Originally, Arq was mostly used with Amazon’s cloud storage, especially a service called “Glacier”, which is intended for exactly this application — long-term, inexpensive storage of files that don’t need to be accessed often. The problem with Glacier is that the pricing structure is extremely complicated, and depends greatly on how much data you access in a period of time. If you had a calamity and had to restore all of your data from Glacier in a hurry, it could be quite expensive.

Just as I was making my backup decision, Google entered public beta with a service called Google Nearline. Long story short, storage pricing is competitive with Glacier, but the charges for fetching data are flat fees, with no dependence on how quickly you read the data back. I figured that it might cost me $100 or so to restore all of the data I can envision storing there. Since I would only ever do that if some huge catastrophe took out my computer, my backup computer, and my safe deposit box, that seemed pretty reasonable.

Nearline storage costs $0.01 per gigabyte per month. Put another way, 1TB of storage is $10 per month. CrashPlan and some of the other backup services offer flat-rate prices like $6/month for unlimited data, so Nearline could be more expensive if you have huge amounts of data. But, for the amount of data I will actually back up, Nearline is cheaper.
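The arithmetic above is easy to sanity-check. A few lines, using the 2015 prices quoted in this post (the flat rate is the $6/month example; prices are kept in integer cents to avoid floating-point surprises):

```python
# Prices as quoted in the post (2015 figures).
NEARLINE_CENTS_PER_GB_MONTH = 1   # $0.01 per GB per month
FLAT_RATE_CENTS = 600             # $6.00/month for unlimited data

def nearline_monthly_dollars(gb: int) -> float:
    """Monthly Nearline storage cost in dollars for `gb` gigabytes."""
    return gb * NEARLINE_CENTS_PER_GB_MONTH / 100

# 1 TB (1000 GB) at Nearline rates:
print(nearline_monthly_dollars(1000))   # 10.0 -> $10/month, as stated

# Break-even point against the flat-rate plan:
break_even_gb = FLAT_RATE_CENTS // NEARLINE_CENTS_PER_GB_MONTH
print(break_even_gb)                    # 600 -> store less than 600 GB
                                        # and Nearline comes out cheaper
```

This ignores Nearline’s separate retrieval fees, which only apply during a restore, so for pure storage the break-even really is about 600 GB.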

So, that’s where I am today. Most of my paperless documents are automatically fetched by FileThis and dropped into my Paperless Inbox folder on my computer, along with anything I scan manually. Hazel inspects the new arrivals, renames them and files them. Shortly thereafter, Arq backs them up to Google Nearline, and Time Machine backs them up to my local server.

I took those moving boxes to the shredding center, and watched with glee as they turned to dust.

Thanks Mike! Google Nearline is definitely a service to watch. The next part of Mike’s story will be about paperless bills and stamps. Look out for it soon.

About the Author

Brooks Duncan helps individuals and small businesses go paperless. He's been an accountant, a software developer, a manager in a very large corporation, and has run DocumentSnap since 2008. You can find Brooks on Twitter at @documentsnap or @brooksduncan. Thanks for stopping by.

8 Comments

Jim - September 3, 2015 Reply

I checked out Google Nearline but I’m a little confused. 1TB costs $10/month. Plus there are associated upload and download fees (e.g., another $10/TB to download your data). But “regular”* Google storage is the same price, and there are no associated retrieval fees. So what would be the advantage of using Nearline?

*By “regular” I mean the data packages you can buy for use with Gmail/Picasa/Google Photos, etc. They give you 15GB free but you can upgrade to 1TB for $10/month.

Am I missing something?

    Mike - September 7, 2015 Reply

    The difference between the two is that Nearline is priced by how much you store. If you store 1TB, they may work out to be the same. But if you store, say, 100GB, you’re paying for only what you use. Of course, you can also store more than 1TB.

    I don’t anticipate storing that much data, so Nearline should be cheaper for me. Arq supports backing up to both Google Drive and Google Nearline (and many others).

      Jim - September 8, 2015 Reply

      Ahh, thanks for the clarification, Mike.

Jon - September 2, 2015 Reply

I know that there is no such thing as a dumb question, but … in addition to having everything on my computer & on my external hard drive via Time Machine, I copied all my biz & personal files to DropBox . You don’t mention D-Box – is it considered secure cloud storage?

    Mike - September 7, 2015 Reply

    Hi Jon,

    The issue with DropBox is that it’s not a backup solution. It’s a file synchronization solution. Let’s say that you have a bunch of important documents in your DropBox folder on your Mac. You accidentally delete one, or (more insidiously) over-write it with something else.

    That change is immediately synchronized up to DropBox, and back down to any other devices that sync with DropBox. So, the file you just deleted by accident is immediately deleted from DropBox, giving you no opportunity to get it back again.

    I use DropBox for synchronization. I do not use it for backups.

Leah - September 2, 2015 Reply

Great articles! Thanks Mike and Brooks.

Thierry - August 26, 2015 Reply

Thanks Mike for the 2 stories.

Do you have a big RAID server at home? Is it fast enough to use with Time Machine?
I have a small one-drive Synology, initially for centralized Time Machine backups of 2 Macs, but it is too slow to use over WiFi n, or even over Gigabit Ethernet.
I always wondered what my server was missing. Is the CPU too slow?

How long did it take to upload your TB of data?
Is it faster on Google Nearline, compared to CrashPlan?

    Mike - September 2, 2015 Reply

    Hi Thierry,

    It’s not a huge RAID. It’s a Synology 1815+, which has space for five hard drives. I put three 4TB drives in it, so it has 8TB of storage with the third drive used for the RAID protection. I used the Synology Hybrid RAID format, which means it will be easy to expand the storage later if it fills up. It seems very fast.

    However, I have found that Synology’s Time Machine solution is unreliable, as are most non-Apple fileserver/NAS products. It will appear to work fine for some period of time, then one day the laptop reports that the backup is damaged and has to be started over. I suspect (but don’t know for sure) that the network connection gets interrupted, maybe by taking the laptop to work while it happened to be backing up, and this leaves the backups in a bad state.

    What I ended up doing was setting up an iSCSI volume on the Synology and mounting it on my Mac Mini, which runs Mac OS X Server. This makes the Mini see the Synology storage as a local hard disk, which I then set as the target volume for OS X Server’s Time Machine service.

    This way, all of the Time Machine-related communication between the laptop and the backup destination is based on fully supported Apple software. Since setting up this way, I’ve had no problems. (Knock wood!)

    This required purchasing an iSCSI Initiator software package for the Mini. I bought the globalSAN product from Studio Network Solutions, and it seems to work just fine.
