Backblaze Zfs



Nerds love discussing their backup strategies, so I thought I'd give it a try.

When you subtract out the amount of storage that my ancient backups, ZFS snapshots, and other cruft take up, there’s right around 2TB of data that I’d label as critical. Using Backblaze’s B2 pricing calculator, it’d cost me about $10 a month (2000GB x $0.005) to store that critical data. Backblaze also offers a $6 per month unlimited personal backup service, but it only works on Windows and Mac – no love for Linux. Even if they did extend that service to Linux, it would likely ship as a desktop client, consistent with their offerings on other operating systems, and that would require a full desktop environment.
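The estimate above is just gigabytes stored times the per-gigabyte monthly rate. As a quick sketch, a small shell helper makes it easy to rerun with a different size or price. The name `b2_monthly_cost` is made up for illustration, and the $0.005 default is simply the rate quoted above, which may not match current pricing.

```shell
#!/bin/sh
# Rough monthly storage cost: gigabytes stored x price per GB per month.
# The 0.005 default is the rate quoted in the text, not a live price.
b2_monthly_cost() {
    gb="$1"
    price_per_gb="${2:-0.005}"
    awk -v gb="$gb" -v p="$price_per_gb" 'BEGIN { printf "%.2f\n", gb * p }'
}

b2_monthly_cost 2000   # the 2TB of critical data
```

Passing a second argument, as in `b2_monthly_cost 2000 0.006`, reruns the estimate at a different rate.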

Goals

At a high level, I have a few goals for my backup strategy.

  1. Avoid permanent data loss. Permanent data loss is the absolute worst case scenario and it must be avoided at all costs. I fully intend to never lose a single picture in my photo library, a note in my note archive, or any other file.
  2. Avoid bit-rot. As happy as I am with the Apple ecosystem in general, I'm disappointed that APFS chose not to implement data checksumming, and that I can't buy a Macbook with ECC RAM. Thus, to avoid bit-rot, long-term data storage should be done on systems which do offer protection against corruption of data over time.
  3. Maintain High-Availability. As an independent software contractor, I can't just run to the IT department and request a new workstation when the drive in my laptop dies; I am the IT department. So, in order to uphold the expectations of my clients, there should never be a day where I unexpectedly can't work due to a hardware failure.
  4. Maintain Security. I may choose to trust a vendor with storing data, but I don't want to have to trust them to not read it. All data should be encrypted with keys I manage before it leaves my control. Additionally, I'd rather not trust the vendor's bespoke backup client to perform this encryption or key management for me. This rules out most cloud backup providers.

Overview

Which brings me to my actual backup strategy:

  1. Galactica[1], my main workstation, does a daily SuperDuper clone to Atlantia, an external USB SSD.
  2. Galactica and Prometheus (my wife's laptop) both automatically run network Time Machine backups to RAIDZ datasets on Gemini, a FreeNAS server.
  3. Galactica uses Vorta to perform periodic Borg backups to another dataset on Gemini.
  4. Gemini periodically performs Borg backups of Pegasus (general NAS volume) to a dataset on another ZFS pool.
  5. After a Borg backup finishes, Gemini runs a cloud sync task to sync the Borg repository to a bucket on Backblaze B2.
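Steps 4 and 5 boil down to "back up, then mirror the repository offsite". On FreeNAS the second half is a cloud sync task, so the following is only a rough shell equivalent, assuming `borg` and `rclone` are installed and an rclone remote for B2 already exists. The function name, paths, and remote name are all hypothetical.

```shell
#!/bin/sh
# Sketch only: every path, repository, and remote name used with this
# function is hypothetical. BORG and RCLONE can be overridden for dry runs.
BORG="${BORG:-borg}"
RCLONE="${RCLONE:-rclone}"

backup_then_sync() {
    source_dir="$1"   # data to protect, e.g. the Pegasus volume
    repo="$2"         # Borg repository on the second pool
    remote="$3"       # rclone remote and path, e.g. b2remote:my-bucket/borg

    # If the repository was initialized with encryption, what lands on
    # disk (and later on B2) is already ciphertext.
    "$BORG" create --stats "${repo}::{hostname}-{now}" "$source_dir" || return 1

    # Only mirror the repository offsite if the backup itself succeeded.
    "$RCLONE" sync "$repo" "$remote"
}
```

A call such as `backup_then_sync /mnt/pegasus /mnt/data2/borg b2remote:my-bucket/borg` would run both steps in order, and skips the sync when the backup fails so a half-written repository is not pushed offsite.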

Threat Model

All of this aims to fill the needs I described at the outset. Specifically, here are the failure modes I've thought through.

Scenario | Action
-------- | ------
Macbook SSD dies on a work day. | Boot from SuperDuper clone and continue working.
Macbook hardware failure (other than SSD) | Get old Macbook from closet (was primary before current workstation) and boot from SuperDuper clone.
Need to provision a new Macbook to replace the dead one. | Either (1) restore from SuperDuper clone or (2) restore from Time Machine.
Discover corrupted files on Macbook's drive. | Either (1) restore from Time Machine or (2) restore from Borg repository.
Gemini's RAIDZ2 (data) loses 1 or 2 drives. | Replace drives and resilver. No data loss.
Gemini's RAIDZ2 (data) loses 3 or more drives. | Replace drives and restore from Borg backup.
Gemini's RAIDZ (data2) loses 1 drive. | Replace drive and resilver. No data loss.
Gemini's RAIDZ (data2) loses 2 or more drives. | Replace drives and restore from B2's version of the Borg backup.
Ransomware attack encrypts Galactica. | Restore from SuperDuper clone.
Ransomware attack encrypts Galactica and Atlantia. | Restore from Time Machine.
Ransomware attack encrypts Galactica, Atlantia, and all Gemini volumes. | Restore Gemini's ZFS pools from the last good snapshot, then restore Galactica.
House burns down and takes Galactica, Atlantia, and Gemini with it. | Buy new house and Macbook, restore from B2's version of the Borg backup.
Nuclear attack takes out my house as well as the B2 datacenter. | Data loss occurs, but I'm most likely dead also and, thus, don't care.
Backblaze becomes untrustworthy and starts reading my data. | They don't have the keys because it was encrypted by Borg.
Borg becomes untrustworthy and encrypts data in a flawed way. | Borg doesn't have access to my data since it's stored on B2.
Someone working for Backblaze purposefully alters Borg's (open-source) code to break their encryption, whilst also having access to my stored data. | They gain access to my data. But, this seems exceedingly unlikely.

This strategy upholds my goals.

  1. Security is upheld because no one gets both the data and the keys.
  2. Bit-rot is mitigated by ZFS data checksumming.
  3. Ransomware attacks are mitigated by multiple backups and ZFS snapshots.
  4. Availability is upheld by bootable SuperDuper clones and by keeping around a last-gen laptop.
  5. Data loss is mitigated by drive redundancy in Gemini; by using multiple types of backup software, at least one of which is open-source; and by moving some backups offsite.

So, barring nuclear war and/or some sort of nation-state attack on the Borg project, I don't plan on losing any data anytime soon.

[1] Yes, all of my computers and hard drives are named after ships in the Battlestar Galactica flotilla.

Context

  • I have a FreeNAS setup
  • I want a cheap offsite backup solution
  • Backblaze only offers an S3-compatible storage solution
  • I have virtual machines on zvols to back up (not just files)
  • ZFS send/receive is very convenient if you have another ZFS system in
    another location

I chose zfsbackup-go. This tool uses zfs send/receive to generate archives on S3 storage. The software is not ready for highly critical use: it is still in «beta» and is missing some features, such as deleting backups at the remote location from the command line.
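To make the idea concrete, here is roughly what a snapshot-based incremental backup looks like when done by hand with plain `zfs send` and `zfs receive`. This illustrates the underlying mechanism only; it is not zfsbackup-go's actual implementation, which also takes care of encryption and upload. The function names and the `ZFS` override variable are assumptions made for this sketch.

```shell
#!/bin/sh
# Sketch only: illustrates the mechanism, not zfsbackup-go's own code.
# ZFS can be overridden for dry runs (e.g. ZFS=echo).
ZFS="${ZFS:-zfs}"

# Write the changes between two snapshots of one dataset or zvol to a gzip file.
# Note: the pipeline's exit status is gzip's, so a failed send can go unnoticed.
send_incremental() {
    prev_snap="$1"   # e.g. Tank/Dataset1@monday
    cur_snap="$2"    # e.g. Tank/Dataset1@tuesday
    out_file="$3"
    "$ZFS" send -i "$prev_snap" "$cur_snap" | gzip -c > "$out_file"
}

# Restoring means replaying the stream into a pool; the file is not browsable.
restore_stream() {
    in_file="$1"
    target="$2"      # e.g. Tank/Restored
    gzip -dc "$in_file" | "$ZFS" receive "$target"
}
```

With a real pool, `send_incremental Tank/Dataset1@monday Tank/Dataset1@tuesday /tmp/delta.gz` would capture only what changed between the two snapshots, and the resulting file is only usable again through `zfs receive`.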

Advantages

  • Can back up zvols as well as datasets
  • Differential backups based on snapshots
  • Encrypted backups
  • Compressed backups
  • Compatible with all S3 solutions

Drawbacks

  • You can’t use the backups without re-importing them into a ZFS pool
    (so this is a real backup solution, not an archive solution)
  • Still in beta


How to

  1. (Optional) Cross-compile zfsbackup-go for FreeBSD from a Linux machine
    1. You need Go installed
    2. git clone git@github.com:someone1/zfsbackup-go.git
    3. GOOS=freebsd GOARCH=amd64 go build -o zfsbackup main.go
  2. Generate a GPG key: gpg --full-generate-key
  3. Export the private/public GPG keys

    1. gpg --list-keys
    2. gpg --export-secret-keys -a keyid
    3. gpg --export -a keyid
  4. Create the S3 credentials you need
    I use Backblaze because it’s the cheapest option, but you can use any S3 storage service or implementation you want.

  5. Configure periodic snapshots in FreeNAS

  6. Configure a zfsbackup cron task

    • Make sure it is scheduled to run after the snapshot task
    • I wrote a very small script to keep the cron task simple, because zfsbackup only supports one dataset at a time
    • Command: DATASETS='Tank/Dataset1 Tank/Group/Data1' /root/zfsbackup.sh
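The wrapper script itself is not shown here, so the following is only a guess at what such a script could look like: loop over `DATASETS` and call `zfsbackup send` once per dataset. Everything except `DATASETS` is an assumption: the `ZFSBACKUP_BIN`, `BACKUP_TARGET`, `GPG_USER`, `PUBLIC_KEYRING`, and `SECRET_KEYRING` variables are invented for this sketch, and the flags are written as I recall them from the zfsbackup-go README, so check them against `zfsbackup send --help` before relying on them. S3 credentials are assumed to already be in the environment.

```shell
#!/bin/sh
# Hypothetical stand-in for /root/zfsbackup.sh; the real script is not shown.
# Assumed environment: DATASETS (space-separated) plus the illustrative
# BACKUP_TARGET, GPG_USER, PUBLIC_KEYRING and SECRET_KEYRING variables.
ZFSBACKUP_BIN="${ZFSBACKUP_BIN:-/root/zfsbackup}"

backup_datasets() {
    status=0
    for dataset in $DATASETS; do
        # One invocation per dataset, since the tool handles a single one.
        # Flag names are from memory of the README; verify with --help.
        "$ZFSBACKUP_BIN" send \
            --encryptTo "$GPG_USER" --signFrom "$GPG_USER" \
            --publicKeyRingPath "$PUBLIC_KEYRING" \
            --secretKeyRingPath "$SECRET_KEYRING" \
            --increment \
            "$dataset" "$BACKUP_TARGET" || status=1
    done
    # Non-zero if any dataset failed, so cron can report it.
    return "$status"
}

# Run only when DATASETS is set, so the file can also be sourced safely.
if [ -n "${DATASETS:-}" ]; then
    backup_datasets
fi
```

A crontab entry along the lines of `30 3 * * * DATASETS='Tank/Dataset1 Tank/Group/Data1' /root/zfsbackup.sh`, timed after the snapshot task, would then drive it. A failure on one dataset does not stop the others, but the script still exits non-zero.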


Conclusion

  • This solution offers very good performance: about 15MB/s during backup operations with only 10% CPU usage on a Xeon D-1521. My Orange ISP uplink is the bottleneck here.
  • With incremental snapshots, Backblaze is a very cost-effective solution: less than 60€ a year for a backed-up VM that takes 100GB on disk.
  • DO BACKUPS
  • DO BACKUPS
  • DO BACKUPS