Independently Reproducible Git Bundles

The gnulib project publishes a git bundle as a stable archival copy of the gnulib git repository once in a while.

Why? We don’t know exactly what this may be useful for, but I’m promoting it to see if we can establish some good use-cases.

A git bundle may help to establish provenance in case of an attack on the Savannah hosting platform that compromises the gnulib git repository.

Another use is in the Debian gnulib package: that gnulib bundle is cloned when building some Debian packages, to get exactly the gnulib commit used by each upstream project – see my talk on gnulib at DebConf24 – and this approach reduces the amount of vendored code that is part of Debian’s source code, which is relevant for mitigating XZ-style attacks.

The first time we published the bundle, I wanted it to be possible to re-create it bit-by-bit identically by others.

At the time I discovered a well-written blog post by Paul Beacher on reproducible git bundles and thought he had solved the problem for me. Essentially it boils down to disabling threading during compression when producing the bundle, and his final example shows that this results in predictable, bit-by-bit identical output:

$ for i in $(seq 1 100); do \
> git -c 'pack.threads=1' bundle create -q /tmp/bundle-$i --all; \
> done
$ md5sum /tmp/bundle-* | cut -f 1 -d ' ' | uniq -c
    100 4898971d4d3b8ddd59022d28c467ffca

So what remains to be said about this? It seems reproducibility goes deeper than that. One desirable property is that someone else should be able to reproduce the same git bundle, not only that a single individual is able to reproduce things on one machine.

It surprised me to see that when I ran the same set of commands on a different machine (starting from a fresh git clone), I got a different checksum. The checksums differed even when nothing had been committed on the server side between the two runs.

I thought the reason had to do with other sources of unpredictable data, and I explored several ways to work around this but eventually gave up. I settled on the following sequence of commands:

REV=ac9dd0041307b1d3a68d26bf73567aa61222df54 # master branch commit to package
git clone https://git.savannah.gnu.org/git/gnulib.git
cd gnulib
git fsck # attempt to validate input
# inspect that the new tree matches a trusted copy
git checkout -B master $REV # put $REV at master
for b in $(git branch -r | grep origin/stable- | sort --version-sort); do git checkout ${b#origin/}; done
git remote remove origin # drop some unrelated branches
git gc --prune=now # drop any commits after $REV
git -c 'pack.threads=1' bundle create gnulib.bundle --all
V=$(env TZ=UTC0 git show -s --date=format:%Y%m%d --pretty=%cd master)
mv gnulib.bundle gnulib-$V.bundle
build-aux/gnupload --to ftp.gnu.org:gnulib gnulib-$V.bundle

At the time it felt more important to publish something than to reach for perfection, so we did so using the above snippet. Afterwards I reached out to the git community about this and there was a good discussion about my challenge.

At the end of that thread you can see that I was finally able to reproduce bit-by-bit identical bundles from two different clones, by using an intermediate git -c pack.threads=1 repack -adF step. I now assume that the unpredictable data I got earlier was introduced during the ‘git clone’ step, compressing the pack differently each time due to threaded compression. The outcome could also depend on what content the server provides, so if someone ran git gc or git repack on the server side, things would change for the user, even if the user forced threading to 1 during cloning. More experiments on what kind of server-side alterations result in client-side differences would be good research.
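
To make that concrete, here is a minimal sketch of the kind of two-clone experiment I mean (the scratch directory names are made up, and it assumes nothing changes on the server between the two clones):

for d in clone-a clone-b; do
  git clone -q https://git.savannah.gnu.org/git/gnulib.git $d
  (cd $d &&
   git -c pack.threads=1 repack -adF &&
   git -c pack.threads=1 bundle create ../$d.bundle --all)
done
sha256sum clone-a.bundle clone-b.bundle   # the two checksums should be identical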

A couple of months have passed and it is now time to publish another gnulib bundle, somewhat paired with the twice-yearly stable gnulib branches, so let’s walk through the commands and explain what they do. First clone the repository:

REV=225973a89f50c2b494ad947399425182dd42618c   # master branch commit to package
S1REV=475dd38289d33270d0080085084bf687ad77c74d # stable-202501 branch commit
S2REV=e8cc0791e6bb0814cf4e88395c06d5e06655d8b5 # stable-202507 branch commit
git clone https://git.savannah.gnu.org/git/gnulib.git
cd gnulib
git fsck # attempt to validate input

I believe the git fsck will validate that the chain of SHA1 commits is linked together, preventing someone from smuggling in unrelated commits earlier in the history without having to produce a SHA1 collision. SHA1 collisions are economically feasible today, so this isn’t much of a guarantee of anything, though.

git checkout -B master $REV # put $REV at master
# Add all stable-* branches locally:
for b in $(git branch -r | grep origin/stable- | sort --version-sort); do git checkout ${b#origin/}; done
git checkout -B stable-202501 $S1REV
git checkout -B stable-202507 $S2REV
git remote remove origin # drop some unrelated branches
git gc --prune=now # drop any unrelated commits, not clear this helps

This establishes a set of branches pinned to particular commits. The older stable-* branches are no longer updated, so they shouldn’t be moving targets. In case they are modified in the future, the particular commits we used can be found in the official git bundle.
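
A quick sanity check at this point, as a sketch, is to list the local branch tips and confirm that they match the pinned commits:

git for-each-ref --format='%(refname:short) %(objectname)' refs/heads/
test "$(git rev-parse master)" = "$REV" || echo "master mismatch"
test "$(git rev-parse stable-202501)" = "$S1REV" || echo "stable-202501 mismatch"
test "$(git rev-parse stable-202507)" = "$S2REV" || echo "stable-202507 mismatch"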

time git -c pack.threads=1 repack -adF

That’s the new magic command to repack and recompress things in a hopefully more predictable way. This leads to a 72MB git pack under .git/objects/pack/ and a 62MB git bundle. The runtime on my laptop is around 5 minutes.

I experimented with -c pack.compression=1 and -c pack.compression=9 but the sizes were roughly the same: a 76MB pack and a 66MB bundle for level 1, and a 72MB pack and a 62MB bundle for level 9. Runtime was still around 5 minutes.

Git uses zlib by default, which isn’t the most optimal compression around. I tried -c pack.compression=0 and got a 163MB git pack and a 153MB git bundle. The runtime is still around 5 minutes, indicating that compression is not the bottleneck for the git repack command.

That 153MB uncompressed git bundle compresses to 48MB with gzip default settings and 46MB with gzip -9; to 39MB with zstd defaults and 34MB with zstd -9; and to 28MB with xz defaults and a small 26MB with xz -9.
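
For reference, the -9 figures above can be reproduced with commands along these lines (a sketch; -k keeps the input bundle around):

gzip -9 -k gnulib.bundle    # produces gnulib.bundle.gz
zstd -9 -k gnulib.bundle    # produces gnulib.bundle.zst
xz -9 -k gnulib.bundle      # produces gnulib.bundle.xz
ls -lh gnulib.bundle*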

Still, the inconvenience of having to uncompress a 30-40MB archive into the much larger 153MB bundle is probably not worth the savings compared to shipping and using the (still relatively modest) 62MB git bundle.

Now finally prepare the bundle and ship it:

git -c 'pack.threads=1' bundle create gnulib.bundle --all
V=$(env TZ=UTC0 git show -s --date=format:%Y%m%d --pretty=%cd master)
mv gnulib.bundle gnulib-$V.bundle
build-aux/gnupload --to ftp.gnu.org:gnulib gnulib-$V.bundle

Yay! Another gnulib git bundle snapshot is available from https://ftp.gnu.org/gnu/gnulib/.
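
For anyone who wants to use or check the published bundle, consuming it is straightforward; a sketch, with $V being the date stamp in the published filename:

wget https://ftp.gnu.org/gnu/gnulib/gnulib-$V.bundle
sha256sum gnulib-$V.bundle         # compare with an independently reproduced bundle
git clone gnulib-$V.bundle gnulib
cd gnulib
git fsck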

The essential part of the git repack command is the -F parameter. In the thread -f was suggested, which translates into the git pack-objects --no-reuse-delta parameter:

--no-reuse-delta

When creating a packed archive in a repository that has existing packs, the command reuses existing deltas. This sometimes results in a slightly suboptimal pack. This flag tells the command not to reuse existing deltas but compute them from scratch.

When reading the man page, I thought that using -F, which translates into --no-reuse-object, would be slightly stronger:

--no-reuse-object

This flag tells the command not to reuse existing object data at all, including non deltified object, forcing recompression of everything. This implies --no-reuse-delta. Useful only in the obscure case where wholesale enforcement of a different compression level on the packed data is desired.

On the surface, without --no-reuse-object, some amount of earlier compression could taint the final result. Still, I was able to get bit-by-bit identical bundles by using -f, so possibly reaching for -F is not necessary.

All the commands were run using git 2.51.0 as packaged by Guix. I fear the result may be different with other git versions and/or zlib libraries. I was able to reproduce the same bundle on a Trisquel 12 aramo (derived from Ubuntu 22.04) machine, which uses git 2.34.1. This suggests there is some chance of this being reproducible in 20 years’ time. Time will tell.

I also fear these commands may be insufficient if something changes on the server side of the gnulib git repository (even something as simple as a new commit). I made some experiments with this, but let’s aim for incremental progress here. At least I have now been able to reproduce the same bundle on different machines, which wasn’t the case last time.

Happy Reproducible Git Bundle Hacking!

Building Debian in a GitLab Pipeline

After thinking about multi-stage Debian rebuilds I wanted to implement the idea. Recall my illustration of staged rebuilds from that post.

Earlier I rebuilt all packages that make up the difference between Ubuntu and Trisquel. It turned out that 42% were bit-by-bit identical. To check the generality of my approach, I rebuilt the difference between Debian and Devuan too. That was the debdistreproduce project. It “only” had to orchestrate building up to around 500 packages for each distribution and architecture.

Differential reproducible rebuilds don’t give you the full picture: they ignore the packages shared between the distributions, which make up over 90% of the packages. So I felt a desire to do full archive rebuilds. The motivation is that in order to trust Trisquel binary packages, I need to trust Ubuntu binary packages (because those make up 90% of the Trisquel packages), and many of those Ubuntu binaries are derived from Debian source packages. How to approach all of this? Last year I created the debdistrebuild project, and did top-50 popcon package rebuilds of Debian bullseye, bookworm, trixie, and Ubuntu noble and jammy, on a mix of amd64 and arm64. The amount of reproducibility was lower; primarily the differences were caused by using different build inputs.

Last year I spent (too much) time creating a mirror of snapshot.debian.org, to be able to have older packages available for use as build inputs. I have two copies hosted at different datacentres for reliability and archival safety. At the time, snapshot.d.o had serious rate limiting, making it pretty unusable for massive rebuild usage or even basic downloads. Watching the multi-month download complete last year had a meditative effect. The completion of my snapshot download coincided with me realizing something about the nature of rebuilding packages. Let me give a recap below of the idempotent rebuilds idea, because it motivates my work to build all of Debian from a GitLab pipeline.

One purpose of my effort is to be able to trust the binaries that I use on my laptop. I believe that without building binaries from source code, there is no practically feasible way to trust binaries. To trust any binary you receive, you can disassemble the bits and audit the assembler instructions for the CPU you will execute it on. Doing that at an OS-wide level is impractical. A more practical approach is to audit the source code, and then confirm that the binary is 100% bit-by-bit identical to one that you can build yourself (from the same source) with your own trusted toolchain. This is similar to a reproducible build.

My initial goal with debdistrebuild was to get to 100% bit-by-bit identical rebuilds, and then I would have trustworthy binaries. Or so I thought. This also appears to be the goal of reproduce.debian.net. They want to reproduce the official Debian binaries. That is a worthy and important goal. They achieve this by building packages using the build inputs that were used to build the binaries. The build inputs are earlier versions of Debian packages (not necessarily from any public Debian release), archived at snapshot.debian.org.

I realized that these rebuilds would not be sufficient for me: they don’t solve the problem of how to trust the toolchain. Let’s assume the reproduce.debian.net effort succeeds and is able to 100% bit-by-bit identically reproduce the official Debian binaries, which appears to be within reach. To have trusted binaries we would “only” have to audit the source code for the latest version of the packages AND audit the toolchain used. There is no escaping from auditing all the source code; that’s what I think we all would prefer to focus on, to be able to improve upstream source code.

The trouble is auditing the toolchain. With the reproduce.debian.net approach, that is a recursive problem going back to really ancient Debian packages, some of which may no longer build or work, or even be legally distributable. Auditing all those old packages is a LARGER effort than auditing all current packages! Auditing old packages is also of less use for making contributions: those releases are old, and chances are any improvements have already been implemented and released, or are no longer applicable because the projects have evolved since the earlier version.

See where this is going now? I reached the conclusion that reproducing official binaries using the same build inputs is not what I’m interested in. I want to be able to build the binaries that I use from source, using a toolchain that I can also build from source, and preferably all of this using the latest version of all packages, so that I can contribute and send patches for them, to improve matters.

The toolchain that reproduce.debian.net is using is not trustworthy unless all those ancient packages are audited or rebuilt bit-by-bit identically, and I don’t see any practical way forward to achieve that goal, nor have I seen anyone working on that problem. It is possible to do, but I think there are simpler ways to achieve the same goal.

My approach to reach trusted binaries on my laptop appears to be a three-step effort:

  • Encourage an idempotently rebuildable Debian archive, i.e., a Debian archive that can be 100% bit-by-bit identically rebuilt using Debian itself.
  • Construct a smaller number of binary *.deb packages based on Guix binaries that, when used as build inputs (potentially iteratively), lead to 100% bit-by-bit identical packages as in step 1.
  • Encourage a freedom-respecting distribution, similar to Trisquel, from this idempotently rebuildable Debian.

How to go about achieving this? Today’s Debian build architecture is something that lacks transparency and end-user control. The build environment and signing keys are managed by, or influenced by, unidentified people following undocumented (or at least not public) security procedures, under unknown legal jurisdictions. I always wondered why none of the Debian derivatives have adopted a modern GitDevOps-style approach as a method to improve binary build transparency; maybe I missed some project?

If you want to contribute to some GitHub or GitLab project, you click the ‘Fork’ button and get a CI/CD pipeline running which rebuilds artifacts for the project. This makes it easy for people to contribute, and you get good QA control because the entire chain up until the artifact release is produced and tested. At least in theory. Many projects are behind on this, but it seems like a useful goal for all projects. This is also liberating: all users are able to reproduce artifacts. There is no longer any magic involved in preparing release artifacts. As we’ve seen with many software supply-chain security incidents over the past years, wherever the “magic” happens is a good place to introduce malicious code.

To allow me to continue with my experiment, I thought the simplest way forward was to set up a GitDevOps-centric and user-controllable way to build the entire Debian archive. Let me introduce the debdistbuild project.

Debdistbuild is a re-usable GitLab CI/CD pipeline, similar to the Salsa CI pipeline. It provides one “build” job definition and one “deploy” job definition. The pipeline can run on GitLab.com Shared Runners or you can set up your own runners, like my GitLab riscv64 runner setup. I have concerns about relying on GitLab (both as software and as a service), but my ideas are easy to transfer to some other GitDevSecOps setup such as Codeberg.org. Self-hosting GitLab, including self-hosted runners, is common today, and Debian relies increasingly on Salsa for this. All of the build infrastructure could be hosted on Salsa eventually.

The build job is simple. From within an official Debian container image, it builds packages using dpkg-buildpackage, essentially by invoking the following commands:

sed -i 's/ deb$/ deb deb-src/' /etc/apt/sources.list.d/*.sources
apt-get -o Acquire::Check-Valid-Until=false update
apt-get dist-upgrade -q -y
apt-get install -q -y --no-install-recommends build-essential fakeroot
env DEBIAN_FRONTEND=noninteractive \
    apt-get build-dep -y --only-source $PACKAGE=$VERSION
useradd -m build
DDB_BUILDDIR=/build/reproducible-path
mkdir -p $DDB_BUILDDIR
chgrp build $DDB_BUILDDIR
chmod g+w $DDB_BUILDDIR
cd $DDB_BUILDDIR
# fetch and unpack the source as the unprivileged build user, logging the output
su build -c "apt-get source --only-source $PACKAGE=$VERSION" > ../${PACKAGE}_${VERSION}.build
# build from within the unpacked source tree
cd $DDB_BUILDDIR/$PACKAGE-*/
su build -c "dpkg-buildpackage"
cd ..
mkdir out
mv -v $(find $DDB_BUILDDIR -maxdepth 1 -type f) out/
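
In the pipeline, $PACKAGE and $VERSION are presumably supplied as CI/CD variables; to experiment with the same steps locally, something along these lines should work (the image, package name and version are just examples):

podman run --rm -it -e PACKAGE=hello -e VERSION=2.10-3 \
    docker.io/library/debian:trixie bash
# ...then run the build commands above inside the container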

The deploy job is also simple. It commits artifacts to a Git project, using Git-LFS to handle large objects, essentially something like this:

if ! grep -q '^pool/\*\*' .gitattributes; then
    git lfs track 'pool/**'
    git add .gitattributes
    git commit -m"Track pool/* with Git-LFS." .gitattributes
fi
# Debian pool layout: lib* packages use a four-character prefix, others the first letter
POOLDIR=$(if test "$(echo "$PACKAGE" | cut -c1-3)" = "lib"; then C=4; else C=1; fi; echo "$PACKAGE" | cut -c1-$C)
mkdir -pv pool/main/$POOLDIR/
rm -rfv pool/main/$POOLDIR/$PACKAGE
mv -v out pool/main/$POOLDIR/$PACKAGE
git add pool
git commit -m"Add $PACKAGE." -m "$CI_JOB_URL" -m "$VERSION" -a
if test "${DDB_GIT_TOKEN:-}" = ""; then
    echo "SKIP: Skipping git push due to missing DDB_GIT_TOKEN (see README)."
else
    git push -o ci.skip
fi

That’s it! The actual implementation is a bit longer, but the major difference is log and error handling.

You may review the source code of the base Debdistbuild pipeline definition, the base Debdistbuild script and the rc.d/-style scripts implementing the build.d/ process and the deploy.d/ commands.

There was one complication related to artifact size. GitLab.com job artifacts are limited to 1GB. Several packages in Debian produce artifacts larger than this. What to do? GitLab supports up to 5GB for files stored in its package registry, but this limit is too close for my comfort, having seen some multi-GB artifacts already. I made the build job optionally upload artifacts to an S3 bucket using a SHA256-hashed file hierarchy. I’m using Hetzner Object Storage but there are many S3 providers around, including self-hosting options. This hierarchy is compatible with the Git-LFS .git/lfs/objects/ hierarchy, and it is easy to set up a separate Git-LFS object URL to allow Git-LFS object downloads from the S3 bucket. In this mode, only Git-LFS stubs are pushed to the git repository. It should have no trouble handling the large number of files, since I have earlier experience with Apt mirrors in Git-LFS.
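
To illustrate that layout, here is a sketch of what such an upload boils down to: the object ends up under a path made from the first two, then the next two, hex characters of its SHA256 object id, mirroring .git/lfs/objects/. The bucket name, file name and use of the aws CLI are just for illustration:

oid=$(sha256sum out/hello_2.10-3_amd64.deb | cut -d' ' -f1)
d1=$(echo "$oid" | cut -c1-2)
d2=$(echo "$oid" | cut -c3-4)
aws s3 cp out/hello_2.10-3_amd64.deb "s3://my-lfs-bucket/$d1/$d2/$oid"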

To speed up job execution, and to guarantee a stable build environment, instead of installing build-essential packages on every build job execution, I prepare some build container images. The project responsible for this is tentatively called stage-N-containers. Right now it creates containers suitable for rolling builds of trixie on amd64, arm64, and riscv64, and a container intended for use as the stage-0 environment, based on the 20250407 docker images of bookworm on amd64 and arm64 using the snapshot.d.o 20250407 archive. Or actually, I’m using snapshot-cloudflare.d.o because of download speed and reliability. I would have preferred to use my own snapshot mirror with Hetzner bandwidth, alas the Debian snapshot team has concerns about me publishing the list of (SHA1 hash) filenames publicly, and I haven’t bothered to set up non-public access.

Debdistbuild has built around 2,500 packages for bookworm on amd64 and bookworm on arm64. To confirm the generality of my approach, it also builds trixie on amd64, trixie on arm64 and trixie on riscv64. The riscv64 builds all run on my own hosted runners. For amd64 and arm64 my own runners are only used for large packages where the GitLab.com shared runners run into the 3 hour time limit.

What’s next in this venture? Some ideas include:

  • Optimize the stage-N build process by identifying the transitive closure of build dependencies from some initial set of packages.
  • Create a build orchestrator that launches pipelines based on the previous list of packages, as necessary to fill the archive with the needed packages. Currently I’m using a basic /bin/sh for loop around curl to trigger GitLab CI/CD pipelines with names derived from https://popcon.debian.org/; see the sketch after this list.
  • Create and publish a dists/ sub-directory, so that it is possible to use the newly built packages in the stage-1 build phase.
  • Produce diffoscope-style differences of built packages, both stage0 against official binaries and between stage0 and stage1.
  • Create the stage-1 build containers and stage-1 archive.
  • Review build failures. On amd64 and arm64 the list is small (below 10 out of ~5000 builds), but on riscv64 there is an icache-related problem affecting the Java JVM that triggers build failures.
  • Provide GitLab pipeline based builds of the Debian docker container images, cloud-images, debian-live CD and debian-installer ISOs.
  • Provide integration with Sigstore and Sigsum for signing of Debian binaries with transparency-safe properties.
  • Implement a simple replacement for dpkg and apt using /bin/sh for use during bootstrapping when neither packaging tool is available.
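
The for loop around curl mentioned in the list above is really just the GitLab pipeline trigger API; a sketch, where the project id, trigger token and package list are placeholders:

for p in hello coreutils gzip; do   # package list placeholder
  curl --fail --request POST \
       --form "token=$TRIGGER_TOKEN" \
       --form "ref=main" \
       --form "variables[PACKAGE]=$p" \
       "https://gitlab.com/api/v4/projects/$PROJECT_ID/trigger/pipeline"
done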

What do you think?

GitLab Runner with Rootless Privilege-less Capability-less Podman on riscv64

I host my own GitLab CI/CD runners, and find that having coverage of the riscv64 CPU architecture is useful for testing things. The HiFive Premier P550 seems to be a common hardware choice. The P550 is possible to purchase online. You also need a (mini-)ATX chassis, a power supply (~500W is more than sufficient), a PCIe-to-M.2 converter and an NVMe storage device. Total cost per machine was around $8k/€8k for me. Assembly was simple: bolt everything together, connect ATX power, and connect cables for the front panel, USB and audio. Be sure to toggle the physical power switch on the P550 before you close the box. The front-panel power button will start your machine. There is a P550 user manual available.

Below I will guide you through installing the GitLab Runner on the pre-installed Ubuntu 24.04 that ships with the P550, and configuring it to use Podman in rootless mode and without the --privileged flag, without any additional capabilities like SYS_ADMIN. Presumably you want to migrate to some other OS instead; hey Trisquel 13 riscv64, I’m waiting for you! I wouldn’t recommend using this machine for anything sensitive: there is an awful lot of non-free and/or vendor-specific software installed, and the hardware itself is young. I am not aware of any riscv64 hardware that can run a libre OS; all of them appear to require non-free blobs and usually a non-mainline kernel.

  • Log in on the console using username ‘ubuntu’ and password ‘ubuntu’. You will be asked to change the password, so do that.
  • Start a terminal, gain root with sudo -i and change the hostname:
    echo jas-p550-01 > /etc/hostname
  • Connect ethernet and run: apt-get update && apt-get dist-upgrade -u.
  • If your system doesn’t have a valid MAC address (it shows as 8c:00:00:00:00:00 if you run ‘ip a’), you can fix this to avoid collisions if you install multiple P550’s on the same network. Connect the Debug USB-C connector on the back to one of the host’s USB-A slots. Use minicom (use Ctrl-A X to exit) to talk to it.
apt-get install minicom
minicom -o -D /dev/ttyUSB3
#cmd: ifconfig
inet 192.168.0.2 netmask: 255.255.240.0
gatway 192.168.0.1
SOM_Mac0: 8c:00:00:00:00:00
SOM_Mac1: 8c:00:00:00:00:00
MCU_Mac: 8c:00:00:00:00:00
#cmd: setmac 0 CA:FE:42:17:23:00
The MAC setting will be valid after rebooting the carrier board!!!
MAC[0] addr set to CA:FE:42:17:23:00(ca:fe:42:17:23:0)
#cmd: setmac 1 CA:FE:42:17:23:01
The MAC setting will be valid after rebooting the carrier board!!!
MAC[1] addr set to CA:FE:42:17:23:01(ca:fe:42:17:23:1)
#cmd: setmac 2 CA:FE:42:17:23:02
The MAC setting will be valid after rebooting the carrier board!!!
MAC[2] addr set to CA:FE:42:17:23:02(ca:fe:42:17:23:2)
#cmd:
  • For reference, if you wish to interact with the MCU you may do that via OpenOCD and telnet, like the following (as root on the P550). You need to have the Debug USB-C connected to a USB-A host port.
apt-get install openocd
wget https://raw.githubusercontent.com/sifiveinc/hifive-premier-p550-tools/refs/heads/master/mcu-firmware/stm32_openocd.cfg
echo 'acc115d283ff8533d6ae5226565478d0128923c8a479a768d806487378c5f6c3 stm32_openocd.cfg' | sha256sum -c
openocd -f stm32_openocd.cfg &
telnet localhost 4444
...
  • Reboot the machine and log in remotely from your laptop. Gain root, set up SSH public-key authentication and disable SSH password logins.
echo 'ssh-ed25519 AAA...' > ~/.ssh/authorized_keys
sed -i 's;^#PasswordAuthentication.*;PasswordAuthentication no;' /etc/ssh/sshd_config
service ssh restart
  • With an NVMe device in the PCIe slot, create an LVM partition where the GitLab runner will live:
parted /dev/nvme0n1 print
blkdiscard /dev/nvme0n1
parted /dev/nvme0n1 mklabel gpt
parted /dev/nvme0n1 mkpart jas-p550-nvm-02 ext2 1MiB 100% align-check optimal 1
parted /dev/nvme0n1 set 1 lvm on
partprobe /dev/nvme0n1
pvcreate /dev/nvme0n1p1
vgcreate vg0 /dev/nvme0n1p1
lvcreate -L 400G -n glr vg0
mkfs.ext4 -L glr /dev/mapper/vg0-glr

Now with a reasonable setup ready, let’s install the GitLab Runner. The following is adapted from gitlab-runner’s official installation documentation. The normal installation flow doesn’t work because they don’t publish riscv64 apt repositories, so you will have to perform upgrades manually.

# wget https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/deb/gitlab-runner_riscv64.deb
# wget https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/deb/gitlab-runner-helper-images.deb
wget https://gitlab-runner-downloads.s3.amazonaws.com/v17.11.0/deb/gitlab-runner_riscv64.deb
wget https://gitlab-runner-downloads.s3.amazonaws.com/v17.11.0/deb/gitlab-runner-helper-images.deb
echo '68a4c2a4b5988a5a5bae019c8b82b6e340376c1b2190228df657164c534bc3c3 gitlab-runner-helper-images.deb' | sha256sum -c
echo 'ee37dc76d3c5b52e4ba35cf8703813f54f536f75cfc208387f5aa1686add7a8c gitlab-runner_riscv64.deb' | sha256sum -c
dpkg -i gitlab-runner-helper-images.deb gitlab-runner_riscv64.deb

Remember the NVMe device? Let’s not forget to use it, to avoid wear and tear on the internal MMC root disk. Do this now, before any files appear in /home/gitlab-runner, or you will have to move them manually.

gitlab-runner stop
echo 'LABEL=glr /home/gitlab-runner ext4 defaults,noatime 0 1' >> /etc/fstab
systemctl daemon-reload
mount /home/gitlab-runner

Next, register gitlab-runner and configure it. Replace the token glrt-REPLACEME below with the registration token you get from your GitLab project’s Settings -> CI/CD -> Runners -> New project runner. I used the tag ‘riscv64’ and a runner description of the hostname.

gitlab-runner register --non-interactive --url https://gitlab.com --token glrt-REPLACEME --name $(hostname) --executor docker --docker-image debian:stable

Next we install Podman and configure gitlab-runner to use it as a non-root user.

apt-get install podman
gitlab-runner stop
usermod --add-subuids 100000-165535 --add-subgids 100000-165535 gitlab-runner

You need to run some commands as the gitlab-runner user, but unfortunately some interaction between sudo/su and pam_systemd makes this harder than it should be. So you have to set up SSH for the user and log in via SSH to run the commands. Does anyone know of a better way to do this?

# on the p550:
cp -a /root/.ssh/ /home/gitlab-runner/
chown -R gitlab-runner:gitlab-runner /home/gitlab-runner/.ssh/
# on your laptop:
ssh gitlab-runner@jas-p550-01
systemctl --user --now enable podman.socket
systemctl --user start podman.socket
loginctl enable-linger gitlab-runner
systemctl status --user podman.socket

We modify /etc/gitlab-runner/config.toml as follows; replace 997 with the user id shown by the systemctl status command above. See the feature flags documentation for more details.

...
[[runners]]
environment = ["FF_NETWORK_PER_BUILD=1", "FF_USE_FASTZIP=1"]
...
[runners.docker]
host = "unix:///run/user/997/podman/podman.sock"

Note that unlike the documentation I do not add the ‘privileged = true‘ parameter here. I will come back to this later.

Restart the system and confirm that pushing a .gitlab-ci.yml with a job that uses the riscv64 tag, like the following, works properly.

dump-env-details-riscv64:
  stage: build
  image: riscv64/debian:testing
  tags: [ riscv64 ]
  script:
  - set

Your gitlab-runner should now be receiving jobs and running them in rootless podman. You may view the log using journalctl as follows:

journalctl --follow _SYSTEMD_UNIT=gitlab-runner.service

To stop the graphical environment and disable some unnecessary services, you can use:

systemctl set-default multi-user.target
systemctl disable openvpn cups cups-browsed sssd colord

At this point, things were working fine and I was running many successful builds. Now starts the fun part with operational aspects!

I had a problem when running buildah to build a new container from within a job, and noticed that aardvark-dns was crashing. You can use the Debian ‘aardvark-dns‘ binary instead.

wget http://ftp.de.debian.org/debian/pool/main/a/aardvark-dns/aardvark-dns_1.14.0-3_riscv64.deb
echo 'df33117b6069ac84d3e97dba2c59ba53775207dbaa1b123c3f87b3f312d2f87a aardvark-dns_1.14.0-3_riscv64.deb' | sha256sum -c
mkdir t
cd t
dpkg -x ../aardvark-dns_1.14.0-3_riscv64.deb .
mv /usr/lib/podman/aardvark-dns /usr/lib/podman/aardvark-dns.ubuntu
mv usr/lib/podman/aardvark-dns /usr/lib/podman/aardvark-dns.debian
ln -sf aardvark-dns.debian /usr/lib/podman/aardvark-dns # make podman use the Debian binary

My setup uses podman in rootless mode without passing the --privileged parameter or any --cap-add parameters to add non-default capabilities. This is sufficient for most builds. However if you try to create a container using buildah from within a job, you may see errors like this:

Writing manifest to image destination
Error: mounting new container: mounting build container "8bf1ec03d967eae87095906d8544f51309363ddf28c60462d16d73a0a7279ce1": creating overlay mount to /var/lib/containers/storage/overlay/23785e20a8bac468dbf028bf524274c91fbd70dae195a6cdb10241c345346e6f/merged, mount_data="lowerdir=/var/lib/containers/storage/overlay/l/I3TWYVYTRZ4KVYCT6FJKHR3WHW,upperdir=/var/lib/containers/storage/overlay/23785e20a8bac468dbf028bf524274c91fbd70dae195a6cdb10241c345346e6f/diff,workdir=/var/lib/containers/storage/overlay/23785e20a8bac468dbf028bf524274c91fbd70dae195a6cdb10241c345346e6f/work,volatile": using mount program /usr/bin/fuse-overlayfs: unknown argument ignored: lazytime
fuse: device not found, try 'modprobe fuse' first
fuse-overlayfs: cannot mount: No such file or directory
: exit status 1

According to the GitLab runner security considerations, you should not enable the ‘privileged = true’ parameter, and the alternative appears to be to run Podman as root with privileged=false. Indeed, setting privileged=true as in the following example solves the problem, and I suppose running podman as root would too.

[[runners]]
[runners.docker]
privileged = true

Can we do better? After some experimentation, and reading open issues with suggested capabilities and configuration snippets, I ended up with the following configuration. It runs podman in rootless mode (as the gitlab-runner user) without --privileged, but adds the CAP_SYS_ADMIN capability and exposes the /dev/fuse device. Still, this runs as a non-root user on the machine, so I think it is an improvement compared to using --privileged and also compared to running podman as root.

[[runners]]
[runners.docker]
privileged = false
cap_add = ["SYS_ADMIN"]
devices = ["/dev/fuse"]

Still, I worry about the security properties of such a setup, so I only enable these settings for a separately configured runner instance that I use when I need this docker-in-docker (oh, I mean buildah-in-podman) functionality. I found one article discussing rootless Podman without the privileged flag that suggests --isolation=chroot, but I have yet to make this work. Suggestions for improvement are welcome.

Happy Riscv64 Building!

Update 2025-05-05: I was able to make it work without the SYS_ADMIN capability too, with a GitLab /etc/gitlab-runner/config.toml like the following:

[[runners]]
  [runners.docker]
    privileged = false
    devices = ["/dev/fuse"]

And passing --isolation chroot to Buildah like this:

buildah build --isolation chroot -t $CI_REGISTRY_IMAGE:name image/

I’ve updated the blog title to add the word “capability-less” as well. I’ve confirmed that the same recipe works with podman on a ppc64el platform too. Remaining loopholes are escaping from the chroot into the non-root gitlab-runner user, and escalating that privilege to root. The /dev/fuse device and the sub-uid/gid ranges may be privilege escalation vectors here; otherwise I believe you’ve found a serious software security issue rather than a configuration mistake.

Verified Reproducible Tarballs

Remember the XZ Utils backdoor? One factor that enabled the attack was poor auditing of the release tarballs for differences compared to the Git version controlled source code. This proved to be a useful place to distribute malicious data.

The differences between release tarballs and upstream Git sources are typically vendored and generated files. Lots of them. Auditing all source tarballs in a distribution for similar issues is hard and boring work for humans. Wouldn’t it be better if that human auditing time could be spent auditing the actual source code stored in upstream version control instead? That’s where auditing time would help the most.

Are there better ways to address the concern about differences between version control sources and tarball artifacts? Let’s consider some approaches:

  • Stop publishing (or at least stop building from) source tarballs that differ from version control sources.
  • Create recipes for how to derive the published source tarballs from version control sources. Verify that independently from upstream.

While I like the properties of the first solution, and have made an effort to support that approach, I don’t think normal source tarballs are going away any time soon. I am concerned that it may not even be a desirable complete solution to this problem. We may need tarballs with pre-generated content in them for various reasons that aren’t entirely clear to us today.

So let’s consider the second approach. It could help while waiting for more experience with the first approach, to see if there are any fundamental problems with it.

How do you know that the XZ release tarballs were actually derived from their version control sources? The same for Gzip? Coreutils? Tar? Sed? Bash? GCC? We don’t know this! I am not aware of any automated or collaborative effort to perform this independent confirmation, nor am I aware of anyone attempting to do this on a regular basis. We would want to be able to do this in the year 2042 too. I think the best way to reach that is to do the verification continuously in a pipeline, fixing bugs as time passes. The current state of the art seems to be that people audit the differences manually and hope to find something. I suspect many package maintainers ignore the problem, take the release source tarballs and trust upstream about this.

We can do better.

I have launched a project to set up a GitLab pipeline that invokes per-release scripts to rebuild that release artifact from git sources. Currently it only contains recipes for projects that I released myself, releases which were done in a controlled way with considerable care to make reproducing the tarballs possible. The project homepage is here:

https://gitlab.com/debdistutils/verify-reproducible-releases

The project is able to reproduce the release tarballs for Libtasn1 v4.20.0, InetUtils v2.6, Libidn2 v2.3.8, Libidn v1.43, and GNU SASL v2.2.2. You can see this in a recent successful pipeline. All of those releases were prepared using Guix, and I’m hoping the Guix time-machine will make it possible to keep re-generating these tarballs for many years to come.
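
A per-release recipe essentially boils down to checking out the release tag and re-running the release process in a pinned environment; a simplified sketch, where the variables are placeholders and every project has its own bootstrap quirks:

git clone "$UPSTREAM_GIT" src
cd src
git checkout "$RELEASE_TAG"
./bootstrap       # or autoreconf -fi, whatever the project's release process uses
./configure
make dist
sha256sum *.tar.gz   # compare against the checksum of the published release tarball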

I spent some time trying to reproduce the current XZ release tarball for version 5.8.1. That would have been a nice example, wouldn’t it? First I had to somehow mimic upstream’s build environment. The XZ release tarball contains GNU Libtool files that are identified with version 2.5.4.1-baa1-dirty. I initially assumed this was because the maintainer had installed libtool from git locally (after making some modifications) and made the XZ release using it. Later I learned that it may actually come from ArchLinux, which ships with this particular libtool version. It seems weird for a distribution to use libtool built from a non-release tag, and furthermore to apply patches to it, but things are what they are. I made some effort to set up an ArchLinux build environment, however the now-current Gettext version in ArchLinux seems to be more recent than the one that was used to prepare the XZ release. I don’t know ArchLinux well enough to set up an environment corresponding to an earlier version of ArchLinux, which would be required to finish this. I gave up; maybe the XZ release wasn’t prepared on ArchLinux after all. Actually XZ became a good example for this writeup anyway: while you would think this should be trivial, the fact is that it isn’t! (There is another aspect here: fingerprinting the versions used to prepare release tarballs allows you to infer what kind of OS maintainers are using to make releases on, which is interesting on its own.)

I made some small attempts to reproduce the tarball for GNU Shepherd version 1.0.4 too, but I still haven’t managed to complete it.

Do you want a supply-chain challenge for the Easter weekend? Pick some well-known software and try to re-create the official release tarballs from the corresponding Git checkout. Is anyone able to reproduce anything these days? Bonus points for wrapping it up as a merge request to my project.

Happy Supply-Chain Security Hacking!

On Binary Distribution Rebuilds

I rebuilt (the top-50 popcon) Debian and Ubuntu packages, on amd64 and arm64, and compared the results a couple of months ago. Since then the Reproduce.Debian.net effort has been launched. Unlike my small experiment, that effort is a full-scale rebuild with more architectures. Their goal is to reproduce what is published in the Debian archive.

One difference between these two approaches is the build inputs: the Reproduce Debian effort uses the same build inputs that were used to build the published packages, while I’m using the latest version of published packages for the rebuild.

What does that difference imply? I believe reproduce.debian.net will be able to reproduce more of the packages in the archive. If you build a C program using one version of GCC you will get some binary output; and if you use a later GCC version you are likely to end up with a different binary output. This is a good thing: we want GCC to evolve and produce better output over time. However it means in order to reproduce the binaries we publish and use, we need to rebuild them using whatever build dependencies were used to prepare those binaries. The conclusion is that we need to use the old GCC to rebuild the program, and this appears to be the Reproduce.Debian.Net approach.

It would be a huge success if the Reproduce.Debian.net effort were to reach 100% reproducibility, and this seems to be within reach.

However I argue that we need to go further than that. Being able to rebuild the packages reproducibly using older binary packages only raises the question: can we rebuild those older packages? I fear attempting to do so ultimately leads to a need to rebuild 20+ year old packages, with a non-negligible amount of them being illegal to distribute or unable to build anymore due to bit-rot. We won’t solve the Trusting Trust concern if our rebuild effort assumes some initial binary blob that we can no longer build from source code.

I’ve made an illustration of the effort I’m thinking of, to reach something that is stronger than reproducible rebuilds. I am calling this concept an Idempotent Rebuild, an old idea that I believe is the same as what John Gilmore described many years ago.

The illustration shows how the Debian main archive is used as input to rebuild another “stage #0” archive. This stage #0 archive can be compared with diffoscope to the main archive, and all differences are things that would be nice to resolve. The packages in the stage #0 archive are used to prepare a new container image with build tools, and the stage #0 archive is used as input to rebuild another version of itself, called the “stage #1” archive. The differences between stage #0 and stage #1 are also useful to analyse and resolve. This process can be repeated many times. I believe it would be a useful property if this process terminated at some point, where the stage #N archive was identical to the stage #N-1 archive. If that happens, I label the output archive an Idempotent Rebuild of the distribution.
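
Comparing two stages for a single package is then just a diffoscope invocation over the corresponding artifacts; a sketch with hypothetical paths:

diffoscope --html hello.html \
    stage0/pool/main/h/hello/hello_2.10-3_amd64.deb \
    stage1/pool/main/h/hello/hello_2.10-3_amd64.deb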

How big is N today? The simplest assumption is that it is infinity. Any build timestamp embedded into binary packages will change on every iteration. This will cause the process to never terminate. Fixing embedded timestamps is something that the Reproduce.Debian.Net effort will also run into, and will have to resolve.
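
The usual mitigation from the Reproducible Builds project is to derive embedded timestamps from the source instead of from the build time, via SOURCE_DATE_EPOCH; for Debian packages it is typically taken from the changelog, roughly like this (a sketch; dpkg-buildpackage already does this for you):

# inside an unpacked Debian source tree
SOURCE_DATE_EPOCH=$(dpkg-parsechangelog --show-field Timestamp)
export SOURCE_DATE_EPOCH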

What other causes for differences could there be? It is easy to see that generally if some output is not deterministic, such as the sort order of assembler object code in binaries, then the output will be different. Trivial instances of this problem will be caught by the reproduce.debian.net effort as well.

Could there be higher-order chains that lead to infinite N? It is easy to imagine the existence of these, but I don’t know what they would look like in practice.

An ideal would be if we could get down to N=1. Is that technically possible? Compare with building GCC: it performs an initial stage 0 build using the system compiler to produce a stage 1 intermediate, which is used to build itself again to stage 2. Stages 1 and 2 are compared, and on success (identical binaries) the compilation succeeds. Here N=2. But this is performed using some unknown system compiler that is normally different from the GCC version being built. When rebuilding a binary distribution, you start with the same source versions. So it seems N=1 could be possible.

I’m unhappy to not be able to report any further technical progress now. The next step in this effort is to publish the stage #0 build artifacts in a repository, so they can be used to build stage #1. I already showed that stage #0 was around ~30% reproducible compared to the official binaries, but I didn’t save the artifacts in a reusable repository. Since the official binaries were not built using the latest versions, it is to be expected that the reproducibility number is low. But what happens at stage #1? The percentage should go up: we now compare the rebuilds with an earlier rebuild, using the same build inputs. I’m eager to see this materialize, and hope to eventually make progress on this. However, to build stage #1 I believe I need to rebuild a much larger number of packages in stage #0; it could be roughly similar to the “build-essentials-depends” package set.

I believe the ultimate end goal of Idempotent Rebuilds is to be able to re-bootstrap a binary distribution like Debian from some other bootstrappable environment like Guix. In parallel with working on achieving the 100% Idempotent Rebuild of Debian, we can set up a Guix environment that builds Debian packages using Guix binaries. These builds ought to eventually converge to the same Debian binary packages, or there is something deeply problematic happening. This approach to re-bootstrapping a binary distribution like Debian seems simpler than rebuilding all binaries going back to the beginning of time for that distribution.

What do you think?

PS. I fear that Debian main may already have gone into a state where it is not able to rebuild itself at all anymore: the presence and assumption of non-free firmware and non-Debian signed binaries may have already corrupted the ability of Debian main to rebuild itself. To be able to complete the idempotent and bootstrapped rebuild of Debian, this needs to be worked out.