Backup of S3 Objects Using rsnapshot

I’ve been using rsnapshot to take backups of around 10 servers and laptops for well over 15 years, and it is a remarkably reliable tool that has proven itself many times. Rsnapshot uses rsync over SSH and maintains a temporal hard-link file pool. Once rsnapshot is configured and running, on the backup server, you get a hardlink farm with directories like this for the remote server:

/backup/serverA.domain/.sync/foo
/backup/serverA.domain/daily.0/foo
/backup/serverA.domain/daily.1/foo
/backup/serverA.domain/daily.2/foo
...
/backup/serverA.domain/daily.6/foo
/backup/serverA.domain/weekly.0/foo
/backup/serverA.domain/weekly.1/foo
...
/backup/serverA.domain/monthly.0/foo
/backup/serverA.domain/monthly.1/foo
...
/backup/serverA.domain/yearly.0/foo

I can browse and rescue files easily, going back in time when needed.

The rsnapshot project README explains more, there is a long rsnapshot HOWTO although I usually find the rsnapshot man page the easiest to digest.

I have stored multi-TB Git-LFS data on GitLab.com for some time. The yearly renewal is coming up, and the price for Git-LFS storage on GitLab.com is now excessive (~$10.000/year). I have reworked my work-flow and finally migrated debdistget to only store Git-LFS stubs on GitLab.com and push the real files to S3 object storage. The cost for this is barely measurable, I have yet to run into the €25/month warning threshold.

But how do you backup stuff stored in S3?

For some time, my S3 backup solution has been to run the minio-client mirror command to download all S3 objects to my laptop, and rely on rsnapshot to keep backups of this. While 4TB NVME’s are relatively cheap, I’ve felt that this disk and network churn on my laptop is unsatisfactory for quite some time.

What is a better approach?

I find S3 hosting sites fairly unreliable by design. Only a couple of clicks in your web browser and you have dropped 100TB of data. Or by someone else who steal your plaintext-equivalent cookie. Thus, I haven’t really felt comfortable using any S3-based backup option. I prefer to self-host, although continously running a mirror job is not sufficient: if I accidentally drop the entire S3 object store, my mirror run will remove all files locally too.

The rsnapshot approach that allows going back in time and having data on self-managed servers feels superior to me.

What if we could use rsnapshot with a S3 client instead of rsync?

Someone else asked about this several years ago, and the suggestion was to use the fuse-based s3fs which sounded unreliable to me. After some experimentation, working around some hard-coded assumption in the rsnapshot implementation, I came up with a small configuration pattern and a wrapper tool to implement what I desired.

Here is my configuration snippet:

cmd_rsync    /backup/s3/s3rsync
rsync_short_args    -Q
rsync_long_args    --json --remove
lockfile    /backup/s3/rsnapshot.pid
snapshot_root    /backup/s3
backup    s3:://hetzner/debdistget-gnuinos    ./debdistget-gnuinos
backup    s3:://hetzner/debdistget-tacos  ./debdistget-tacos
backup    s3:://hetzner/debdistget-diffos ./debdistget-diffos
backup    s3:://hetzner/debdistget-pureos ./debdistget-pureos
backup    s3:://hetzner/debdistget-kali   ./debdistget-kali
backup    s3:://hetzner/debdistget-devuan ./debdistget-devuan
backup    s3:://hetzner/debdistget-trisquel   ./debdistget-trisquel
backup    s3:://hetzner/debdistget-debian ./debdistget-debian

The idea is to save a backup of a couple of S3 buckets under /backup/s3/.

I have some scripts that take a complete rsnapshot.conf file and append my per-directory configuration so that this becomes a complete configuration. If you are curious how I roll this, backup-all invokes backup-one appending my rsnapshot.conf template with the snippet above.

The s3rsync wrapper script is the essential hack to convert rsnapshot’s rsync parameters into something that talks S3 and the script is as follows:

#!/bin/sh

set -eu

S3ARG=
for ARG in "$@"; do
    case $ARG in
    s3:://*) S3ARG="$S3ARG "$(echo $ARG | sed -e 's,s3:://,,');;
    -Q*) ;;
    *) S3ARG="$S3ARG $ARG";;
    esac
done

echo /backup/s3/mc mirror $S3ARG

exec /backup/s3/mc mirror $S3ARG

It uses the minio-client tool. I first tried s3cmd but its sync command read all files to compute MD5 checksums every time you invoke it, which is very slow. The mc mirror command is blazingly fast since it only compare mtime’s, just like rsync or git.

If I invoke a sync job for a fully synced up directory the output looks like this:

root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V sync
Setting locale to POSIX "C"
echo 1443 > /backup/s3/rsnapshot.pid 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-gnuinos \
    /backup/s3/.sync//debdistget-gnuinos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-gnuinos /backup/s3/.sync//debdistget-gnuinos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-tacos \
    /backup/s3/.sync//debdistget-tacos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-tacos /backup/s3/.sync//debdistget-tacos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-diffos \
    /backup/s3/.sync//debdistget-diffos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-diffos /backup/s3/.sync//debdistget-diffos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-pureos \
    /backup/s3/.sync//debdistget-pureos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-pureos /backup/s3/.sync//debdistget-pureos
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-kali \
    /backup/s3/.sync//debdistget-kali 
/backup/s3/mc mirror --json --remove hetzner/debdistget-kali /backup/s3/.sync//debdistget-kali
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-devuan \
    /backup/s3/.sync//debdistget-devuan 
/backup/s3/mc mirror --json --remove hetzner/debdistget-devuan /backup/s3/.sync//debdistget-devuan
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-trisquel \
    /backup/s3/.sync//debdistget-trisquel 
/backup/s3/mc mirror --json --remove hetzner/debdistget-trisquel /backup/s3/.sync//debdistget-trisquel
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-debian \
    /backup/s3/.sync//debdistget-debian 
/backup/s3/mc mirror --json --remove hetzner/debdistget-debian /backup/s3/.sync//debdistget-debian
{"status":"success","total":0,"transferred":0,"duration":0,"speed":0}
touch /backup/s3/.sync/ 
rm -f /backup/s3/rsnapshot.pid 
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1443] \
    /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
    -V sync: completed successfully 
root@hamster /backup# 

You can tell from the paths that this machine runs Guix. This was the first production use of the Guix System for me, and the machine has been running since 2015 (with the occasional new hard drive). Before, I used rsnapshot on Debian, but some stable release of Debian dropped the rsnapshot package, paving the way for me to test Guix in production on a non-Internet exposed machine. Unfortunately, mc is not packaged in Guix, so you will have to install it from the MinIO Client GitHub page manually.

Running the daily rotation looks like this:

root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V daily
Setting locale to POSIX "C"
echo 1549 > /backup/s3/rsnapshot.pid 
mv /backup/s3/daily.5/ /backup/s3/daily.6/ 
mv /backup/s3/daily.4/ /backup/s3/daily.5/ 
mv /backup/s3/daily.3/ /backup/s3/daily.4/ 
mv /backup/s3/daily.2/ /backup/s3/daily.3/ 
mv /backup/s3/daily.1/ /backup/s3/daily.2/ 
mv /backup/s3/daily.0/ /backup/s3/daily.1/ 
/run/current-system/profile/bin/cp -al /backup/s3/.sync /backup/s3/daily.0 
rm -f /backup/s3/rsnapshot.pid 
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1549] \
    /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
    -V daily: completed successfully 
root@hamster /backup# 

Hopefully you will feel inspired to take backups of your S3 buckets now!

Automatic Replicant Backup over USB using rsync

I have been using Replicant on the Samsung SIII I9300 for over two years. I have written before on taking a backup of the phone using rsync but recently I automated my setup as described below. This work was prompted by a screen accident with my phone that caused it to die, and I noticed that I hadn’t taken regular backups. I did not lose any data this time, since typically all content I create on the device is immediately synchronized to my clouds. Photos are uploaded by the ownCloud app, SMS Backup+ saves SMS and call logs to my IMAP server, and I use DAVDroid for synchronizing contacts, calendar and task lists with my instance of ownCloud. Still, I strongly believe in regular backups of everything, so it was time to automate this.

For my use-case, taking backups of the phone whenever I connect it to one of my laptops is sufficient. I typically connect it to my laptops for charging at least every other day. My laptops are all running Debian, but this should be applicable to most modern GNU/Linux system. This is not Replicant-specific, although you need a rooted phone. I thought that automating this would be simple, but I got to learn the ins and outs of systemd and udev in the process and this ended up taking the better part of an evening.

I started out adding an udev rule and a small script, thinking I could invoke the backup process from the udev rule. However rsync would magically die after running a few seconds. After an embarrassing long debugging session, finally I found someone with a similar problem which led me to a nice writeup on the topic of running long-running services on udev events. I created a file /etc/udev/rules.d/99-android-backup.rules with the following content:

ACTION=="add", SUBSYSTEMS=="usb", ENV{ID_SERIAL_SHORT}=="323048a5ae82918b", TAG+="systemd", ENV{SYSTEMD_WANTS}+="android-backup@$env{ID_SERIAL_SHORT}.service"
ACTION=="add", SUBSYSTEMS=="usb", ENV{ID_SERIAL_SHORT}=="4df9e09c25e75f63", TAG+="systemd", ENV{SYSTEMD_WANTS}+="android-backup@$env{ID_SERIAL_SHORT}.service"

The serial numbers correspond to the device serial numbers of the two devices I wish to backup. The adb devices command will print them for you, and you need to replace my values with the values from your phones. Next I created a systemd service to describe a oneshot service. The file /etc/systemd/system/android-backup@.service have the following content:

[Service]
Type=oneshot
ExecStart=/usr/local/sbin/android-backup %I

The at-sign (“@”) in the service filename signal that this is a service that takes a parameter. I’m not enough of an udev/systemd person to explain these two files using the proper terminology, but at least you can pattern-match and follow the basic idea of them: the udev rule matches the devices that I’m interested in (I don’t want this to happen to all random Android devices I attach, hence matching against known serial numbers), and it causes a systemd service with a parameter to be started. The systemd service file describe the script to run, and passes on the parameter.

Now for the juicy part, the script. I have /usr/local/sbin/android-backup with the following content.

#!/bin/bash

DIRBASE=/var/backups/android
export ANDROID_SERIAL="$1"

exec 2>&1 | logger

if ! test -d "$DIRBASE-$ANDROID_SERIAL"; then
    echo "could not find directory: $DIRBASE-$ANDROID_SERIAL"
    exit 1
fi

set -x

adb wait-for-device
adb root
adb wait-for-device
adb shell printf "address 127.0.0.1\nuid = root\ngid = root\n[root]\n\tpath = /\n" \> /mnt/secure/rsyncd.conf
adb shell rsync --daemon --no-detach --config=/mnt/secure/rsyncd.conf &
adb forward tcp:6010 tcp:873
sleep 2
rsync -av --delete --exclude /dev --exclude /acct --exclude /sys --exclude /proc rsync://localhost:6010/root/ $DIRBASE-$ANDROID_SERIAL/
: rc $?
adb forward --remove tcp:6010
adb shell rm -f /mnt/secure/rsyncd.conf

This script warrant more detailed explanation. Backups are placed under, e.g., /var/backups/android-323048a5ae82918b/ for later off-site backup (you do backup your laptop, right?). You have to manually create this directory, as a safety catch to not wildly rsync data into non-existing directories. The script logs everything using syslog, so run a tail -F /var/log/syslog& when setting this up. You may want to reduce verbosity of rsync if you prefer (replace rsync -av with rsync -a). The script runs adb wait-for-device which you rightly guessed will wait for the device to settle. Next adb root is invoked to get root on the device (reading all files from the system naturally requires root). It takes some time to switch, so another wait-for-device call is needed. Next a small rsyncd configuration file is created in /mnt/secure/rsyncd.conf on the phone. The file tells rsync do listen on localhost, run as root, and use / as the path. By default, rsyncd is read-only so the host will not be able to upload any data over rsync, just read data out. Next rsync is started on the phone. The adb forward command forwards port 6010 on the laptop to port 873 on the phone (873 is the default rsyncd port). Unfortunately, setting up the TCP forward appears to take some time, and adb wait-for-device will not wait for that to complete, hence an ugly sleep 2 at this point. Next is the rsync invocation itself, which just pulls in everything from the phone to the laptop, excluding some usual suspects. The somewhat cryptic : rc $? merely logs the exit code of the rsync process into syslog. Finally we clean up the TCP forward and remove the rsyncd.conf file that was temporarily created.

This setup appears stable to me. I can plug in a phone and a backup will be taken. I can even plug in both my devices at the same time, and they will run at the same time. If I unplug a device, the script or rsync will error out and systemd cleans up.

If anyone has ideas on how to avoid the ugly temporary rsyncd.conf file or the ugly sleep 2, I’m interested. It would also be nice to not have to do the ‘adb root’ dance, and instead have the phone start the rsync daemon when connecting to my laptop somehow. TCP forwarding might be troublesome on a multi-user system, but my laptops aren’t. Killing rsync on the phone is probably a good idea too. If you have ideas on how to fix any of this, other feedback, or questions, please let me know!