This is Part 2 of a three-part series by awesome DocumentSnap reader Mike from California. In Part 1, he talked about how he jump-started his paperless transition. Today is about how he came up with his backup system and what he ended up implementing.
Take it away Mike.
After my initial push to go paperless, I ended up with two moving boxes full of paper ready to go to the shredding center. But I wasn’t willing to take them yet, because I wasn’t happy with my backup situation. What if I shredded all those papers, and then something happened to my paperless documents?
Backup Starting Point
I already had a two-tier backup strategy. Tier 1 is Time Machine: The laptop that has all of my scanned documents does automatic Time Machine backups to a server computer on my home network. The backups are stored on a RAID, so they have some protection from disk drive failure.
The laptop’s internal drive is encrypted, so the scanned documents are secure in case the laptop is stolen. The backups on the server are encrypted, so they’re also secure in case of a burglary. But, if my house were burgled (or damaged by fire/flood/disaster), I could lose both the laptop and the backups! Not good. That’s why I also have Tier 2.
Tier 2 is an offsite backup that I manage manually. Every couple of months, I use a program called SuperDuper to do a full, encrypted clone of my laptop’s hard drive onto a portable drive, which I store in the safe deposit box at my bank.
I have two such portable drives, so when I take one to the bank, I bring the other one home to use for the next cycle. Thus, there’s always one drive in the safe deposit box. This is reasonably secure. Even if my house burned down, I would lose at most a couple of months’ data, depending upon how recently I had taken a drive to the bank.
Realistically, most of what I would lose could be re-downloaded without too much trouble. Still, I wasn’t comfortable with backups that depended so heavily on me to manually perform them. There have definitely been times when I’ve gotten busy or distracted, and realized that it had been six months since my last trip to the bank. I wanted something more reliable than my manual Tier 2.
Evaluating Cloud Services
Of course, now there are cloud backup services that specialize in this sort of thing. I looked at all of them, and found that there’s quite an array. Here are just a few of the considerations:
- Some services don’t keep past versions of files. They essentially maintain a snapshot of your system. That’s fine if your system crashes or is stolen or destroyed. But what if you accidentally overwrite an important document and the bogus version gets backed up?
- Some services keep past versions, but once a file is deleted, they prune it from the backups after a while. What if you discover that you accidentally deleted a whole folder full of important documents when you “cleaned house” on your computer last month?
- Some services have iffy security.
Zero Knowledge Backup
Let me expand on that last point at some length. I really wanted a so-called “Zero Knowledge” backup service — one where the service itself had absolutely no access to any of the backed-up data.
That way, there was no chance that a security breach of the backup service could result in all of my paperless documents (including financial information and so forth) being accessible to the hackers.
Some of the services allow you to set up their backup software so that your local computer encrypts all of the data that it sends to the cloud, before it ever leaves your machine. Sounds good, until you realize that those services upload your encryption key to the service, so that they can do things like reset your key for you when you lose it, or offer a web portal to restore documents using a browser.
If that stored key is compromised, your data is at risk. Haven’t we all heard enough horror stories lately to be a bit squeamish about this sort of thing?
Some of the services allow you to specify your own encryption key that you never send to them. There’s a bit of risk here — if you lose your key, all that backed up data is lost forever. But, as long as you don’t lose your key, your data should be secure, right? Amazingly, at least one of those services has no way for you to restore your data without supplying them with your encryption key! So, your data is secure only until you actually need to get it back!
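To make the idea concrete: true zero-knowledge backup just means the key is derived and used entirely on your own machine, and only ciphertext ever goes over the wire. Here is a minimal sketch of that pattern in Python (using the third-party cryptography package). It is not how any particular backup service is implemented, just an illustration of where the key lives.

```python
# Minimal sketch of client-side ("zero knowledge") encryption before upload.
# Illustrative only -- not the implementation of any backup service named above.
import base64
import os

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC


def key_from_passphrase(passphrase: str, salt: bytes) -> bytes:
    """Derive a symmetric key locally; the passphrase never leaves this machine."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=480_000)
    return base64.urlsafe_b64encode(kdf.derive(passphrase.encode()))


def encrypt_for_upload(path: str, passphrase: str) -> bytes:
    """Return salt + ciphertext; this blob is all the cloud provider ever sees."""
    salt = os.urandom(16)  # random per-backup salt; safe to store alongside the data
    token = Fernet(key_from_passphrase(passphrase, salt)).encrypt(open(path, "rb").read())
    return salt + token
```

The point is simply that the provider ends up holding the salt and the ciphertext but never the passphrase, so a breach on their end exposes nothing readable.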
Only a couple of the backup services never touch your key for any purpose. The best one appeared to be CrashPlan. They keep past file versions, they don’t prune deleted files, and you can specify your own key that is never sent to them for any reason. You can even have them send you a hard drive with all of your backed up data on it (for an extra $160 fee) so that you can restore files quickly even if your Internet connection is slow.
Unfortunately, the CrashPlan software that runs on your computer is written in Java, and there are many reports of performance problems, like excessive memory or CPU use.
Mike’s Backup Solution
I ended up choosing an up-and-comer with a novel approach: Arq.
Rather than being a service that supplies both software and cloud storage, Arq is just an app you buy. Once you’ve purchased the app, you get to tell it where you want to store your data, on any of a number of supported cloud storage providers. Arq encrypts all of the data locally before sending it to your chosen cloud provider, and when you restore, it decrypts locally as well. In short, Arq passes the Zero Knowledge test, and the app is native and doesn’t have the performance issues that are reported with CrashPlan.
Originally, Arq was mostly used with Amazon’s cloud storage, especially a service called “Glacier”, which is intended for exactly this application: long-term, inexpensive storage of files that don’t need to be accessed often. The problem with Glacier is that the pricing structure is extremely complicated, and depends greatly on how much data you access in a given period of time. If you had a calamity and had to restore all of your data from Glacier in a hurry, it could be quite expensive.
Just as I was making my backup decision, Google entered public beta with a service called Google Nearline. Long story short, storage pricing is competitive with Glacier, but the charges for fetching the data are flat fees, with no dependency on how quickly you read from the storage. I figured that it might cost me $100 or so to restore all of the data I can envision storing there. Since I would only ever do that if some huge catastrophe took out my computer, my backup computer, and my safe deposit box, that seemed pretty reasonable.
Nearline storage costs $0.01 per gigabyte per month; put another way, 1TB of storage is $10 per month. CrashPlan and some of the other backup services offer flat-rate prices like $6/month for unlimited data, so Nearline could be more expensive if you have huge amounts of data. But for the amount of data I will actually back up, Nearline is cheaper.
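For anyone who wants to check my arithmetic, here is a rough back-of-the-envelope comparison in Python, using the $0.01/GB/month Nearline figure and the $6/month flat-rate plan quoted above (the exact flat rate varies by service):

```python
# Rough monthly storage cost comparison, using the prices quoted above.
NEARLINE_PER_GB_MONTH = 0.01   # Google Nearline: $0.01 per GB per month
FLAT_RATE_PER_MONTH = 6.00     # a flat-rate "unlimited" plan, e.g. CrashPlan

break_even_gb = FLAT_RATE_PER_MONTH / NEARLINE_PER_GB_MONTH
print(f"Break-even point: {break_even_gb:.0f} GB")  # 600 GB

for gb in (100, 600, 1000):
    nearline = gb * NEARLINE_PER_GB_MONTH
    print(f"{gb:5d} GB: Nearline ${nearline:6.2f}/mo  vs  flat-rate ${FLAT_RATE_PER_MONTH:.2f}/mo")
```

So below roughly 600GB, paying per gigabyte works out cheaper than the flat rate, which is why Nearline wins for the amount of data I actually keep.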
So, that’s where I am today. Most of my paperless documents are automatically fetched by FileThis and dropped into my Paperless Inbox folder on my computer, along with anything I scan manually. Hazel inspects the new arrivals, renames them and files them. Shortly thereafter, Arq backs them up to Google Nearline, and Time Machine backs them up to my local server.
I took those moving boxes to the shredding center, and watched with glee as they turned to dust.
Thanks Mike! Google Nearline is definitely a service to watch. The next part of Mike’s story will be about paperless bills and stamps. Watch out for that soon.