Building Debian in a GitLab Pipeline

After thinking about multi-stage Debian rebuilds I wanted to implement the idea. Recall my illustration:

Earlier I rebuilt all packages that make up the difference between Ubuntu and Trisquel. It turned out to be a 42% bit-by-bit identical similarity. To check the generality of my approach, I rebuilt the difference between Debian and Devuan too. That was the debdistreproduce project. It “only” had to orchestrate building up to around 500 packages for each distribution and per architecture.

Differential reproducible rebuilds doesn’t give you the full picture: it ignore the shared package between the distribution, which make up over 90% of the packages. So I felt a desire to do full archive rebuilds. The motivation is that in order to trust Trisquel binary packages, I need to trust Ubuntu binary packages (because that make up 90% of the Trisquel packages), and many of those Ubuntu binaries are derived from Debian source packages. How to approach all of this? Last year I created the debdistrebuild project, and did top-50 popcon package rebuilds of Debian bullseye, bookworm, trixie, and Ubuntu noble and jammy, on a mix of amd64 and arm64. The amount of reproducibility was lower. Primarily the differences were caused by using different build inputs.

Last year I spent (too much) time creating a mirror of snapshot.debian.org, to be able to have older packages available for use as build inputs. I have two copies hosted at different datacentres for reliability and archival safety. At the time, snapshot.d.o had serious rate-limiting making it pretty unusable for massive rebuild usage or even basic downloads. Watching the multi-month download complete last year had a meditating effect. The completion of my snapshot download co-incided with me realizing something about the nature of rebuilding packages. Let me below give a recap of the idempotent rebuilds idea, because it motivate my work to build all of Debian from a GitLab pipeline.

One purpose for my effort is to be able to trust the binaries that I use on my laptop. I believe that without building binaries from source code, there is no practically feasible way to trust binaries. To trust any binary you receive, you can de-assemble the bits and audit the assembler instructions for the CPU you will execute it on. Doing that on a OS-wide level this is unpractical. A more practical approach is to audit the source code, and then confirm that the binary is 100% bit-by-bit identical to one that you can build yourself (from the same source) on your own trusted toolchain. This is similar to a reproducible build.

My initial goal with debdistrebuild was to get to 100% bit-by-bit identical rebuilds, and then I would have trustworthy binaries. Or so I thought. This also appears to be the goal of reproduce.debian.net. They want to reproduce the official Debian binaries. That is a worthy and important goal. They achieve this by building packages using the build inputs that were used to build the binaries. The build inputs are earlier versions of Debian packages (not necessarily from any public Debian release), archived at snapshot.debian.org.

I realized that these rebuilds would be not be sufficient for me: it doesn’t solve the problem of how to trust the toolchain. Let’s assume the reproduce.debian.net effort succeeds and is able to 100% bit-by-bit identically reproduce the official Debian binaries. Which appears to be within reach. To have trusted binaries we would “only” have to audit the source code for the latest version of the packages AND audit the tool chain used. There is no escaping from auditing all the source code — that’s what I think we all would prefer to focus on, to be able to improve upstream source code.

The trouble is about auditing the tool chain. With the Reproduce.debian.net approach, that is a recursive problem back to really ancient Debian packages, some of them which may no longer build or work, or even be legally distributable. Auditing all those old packages is a LARGER effort than auditing all current packages! Doing auditing of old packages is of less use to making contributions: those releases are old, and chances are any improvements have already been implemented and released. Or that improvements are no longer applicable because the projects evolved since the earlier version.

See where this is going now? I reached the conclusion that reproducing official binaries using the same build inputs is not what I’m interested in. I want to be able to build the binaries that I use from source using a toolchain that I can also build from source. And preferably that all of this is using latest version of all packages, so that I can contribute and send patches for them, to improve matters.

The toolchain that Reproduce.Debian.Net is using is not trustworthy unless all those ancient packages are audited or rebuilt bit-by-bit identically, and I don’t see any practical way forward to achieve that goal. Nor have I seen anyone working on that problem. It is possible to do, though, but I think there are simpler ways to achieve the same goal.

My approach to reach trusted binaries on my laptop appears to be a three-step effort:

  • Encourage an idempotently rebuildable Debian archive, i.e., a Debian archive that can be 100% bit-by-bit identically rebuilt using Debian itself.
  • Construct a smaller number of binary *.deb packages based on Guix binaries that when used as build inputs (potentially iteratively) leads to 100% bit-by-bit identical packages as in step 1.
  • Encourage a freedom respecting distribution, similar to Trisquel, from this idempotently rebuildable Debian.

How to go about achieving this? Today’s Debian build architecture is something that lack transparency and end-user control. The build environment and signing keys are managed by, or influenced by, unidentified people following undocumented (or at least not public) security procedures, under unknown legal jurisdictions. I always wondered why none of the Debian-derivates have adopted a modern GitDevOps-style approach as a method to improve binary build transparency, maybe I missed some project?

If you want to contribute to some GitHub or GitLab project, you click the ‘Fork’ button and get a CI/CD pipeline running which rebuild artifacts for the project. This makes it easy for people to contribute, and you get good QA control because the entire chain up until its artifact release are produced and tested. At least in theory. Many projects are behind on this, but it seems like this is a useful goal for all projects. This is also liberating: all users are able to reproduce artifacts. There is no longer any magic involved in preparing release artifacts. As we’ve seen with many software supply-chain security incidents for the past years, where the “magic” is involved is a good place to introduce malicious code.

To allow me to continue with my experiment, I thought the simplest way forward was to setup a GitDevOps-centric and user-controllable way to build the entire Debian archive. Let me introduce the debdistbuild project.

Debdistbuild is a re-usable GitLab CI/CD pipeline, similar to the Salsa CI pipeline. It provide one “build” job definition and one “deploy” job definition. The pipeline can run on GitLab.org Shared Runners or you can set up your own runners, like my GitLab riscv64 runner setup. I have concerns about relying on GitLab (both as software and as a service), but my ideas are easy to transfer to some other GitDevSecOps setup such as Codeberg.org. Self-hosting GitLab, including self-hosted runners, is common today, and Debian rely increasingly on Salsa for this. All of the build infrastructure could be hosted on Salsa eventually.

The build job is simple. From within an official Debian container image build packages using dpkg-buildpackage essentially by invoking the following commands.

sed -i 's/ deb$/ deb deb-src/' /etc/apt/sources.list.d/*.sources
apt-get -o Acquire::Check-Valid-Until=false update
apt-get dist-upgrade -q -y
apt-get install -q -y --no-install-recommends build-essential fakeroot
env DEBIAN_FRONTEND=noninteractive \
    apt-get build-dep -y --only-source $PACKAGE=$VERSION
useradd -m build
DDB_BUILDDIR=/build/reproducible-path
chgrp build $DDB_BUILDDIR
chmod g+w $DDB_BUILDDIR
su build -c "apt-get source --only-source $PACKAGE=$VERSION" > ../$PACKAGE_$VERSION.build
cd $DDB_BUILDDIR
su build -c "dpkg-buildpackage"
cd ..
mkdir out
mv -v $(find $DDB_BUILDDIR -maxdepth 1 -type f) out/

The deploy job is also simple. It commit artifacts to a Git project using Git-LFS to handle large objects, essentially something like this:

if ! grep -q '^pool/**' .gitattributes; then
    git lfs track 'pool/**'
    git add .gitattributes
    git commit -m"Track pool/* with Git-LFS." .gitattributes
fi
POOLDIR=$(if test "$(echo "$PACKAGE" | cut -c1-3)" = "lib"; then C=4; else C=1; fi; echo "$DDB_PACKAGE" | cut -c1-$C)
mkdir -pv pool/main/$POOLDIR/
rm -rfv pool/main/$POOLDIR/$PACKAGE
mv -v out pool/main/$POOLDIR/$PACKAGE
git add pool
git commit -m"Add $PACKAGE." -m "$CI_JOB_URL" -m "$VERSION" -a
if test "${DDB_GIT_TOKEN:-}" = ""; then
    echo "SKIP: Skipping git push due to missing DDB_GIT_TOKEN (see README)."
else
    git push -o ci.skip
fi

That’s it! The actual implementation is a bit longer, but the major difference is for log and error handling.

You may review the source code of the base Debdistbuild pipeline definition, the base Debdistbuild script and the rc.d/-style scripts implementing the build.d/ process and the deploy.d/ commands.

There was one complication related to artifact size. GitLab.org job artifacts are limited to 1GB. Several packages in Debian produce artifacts larger than this. What to do? GitLab supports up to 5GB for files stored in its package registry, but this limit is too close for my comfort, having seen some multi-GB artifacts already. I made the build job optionally upload artifacts to a S3 bucket using SHA256 hashed file hierarchy. I’m using Hetzner Object Storage but there are many S3 providers around, including self-hosting options. This hierarchy is compatible with the Git-LFS .git/lfs/object/ hierarchy, and it is easy to setup a separate Git-LFS object URL to allow Git-LFS object downloads from the S3 bucket. In this mode, only Git-LFS stubs are pushed to the git repository. It should have no trouble handling the large number of files, since I have earlier experience with Apt mirrors in Git-LFS.

To speed up job execution, and to guarantee a stable build environment, instead of installing build-essential packages on every build job execution, I prepare some build container images. The project responsible for this is tentatively called stage-N-containers. Right now it create containers suitable for rolling builds of trixie on amd64, arm64, and riscv64, and a container intended for as use the stage-0 based on the 20250407 docker images of bookworm on amd64 and arm64 using the snapshot.d.o 20250407 archive. Or actually, I’m using snapshot-cloudflare.d.o because of download speed and reliability. I would have prefered to use my own snapshot mirror with Hetzner bandwidth, alas the Debian snapshot team have concerns about me publishing the list of (SHA1 hash) filenames publicly and I haven’t been bothered to set up non-public access.

Debdistbuild has built around 2.500 packages for bookworm on amd64 and bookworm on arm64. To confirm the generality of my approach, it also build trixie on amd64, trixie on arm64 and trixie on riscv64. The riscv64 builds are all on my own hosted runners. For amd64 and arm64 my own runners are only used for large packages where the GitLab.com shared runners run into the 3 hour time limit.

What’s next in this venture? Some ideas include:

  • Optimize the stage-N build process by identifying the transitive closure of build dependencies from some initial set of packages.
  • Create a build orchestrator that launches pipelines based on the previous list of packages, as necessary to fill the archive with necessary packages. Currently I’m using a basic /bin/sh for loop around curl to trigger GitLab CI/CD pipelines with names derived from https://popcon.debian.org/.
  • Create and publish a dists/ sub-directory, so that it is possible to use the newly built packages in the stage-1 build phase.
  • Produce diffoscope-style differences of built packages, both stage0 against official binaries and between stage0 and stage1.
  • Create the stage-1 build containers and stage-1 archive.
  • Review build failures. On amd64 and arm64 the list is small (below 10 out of ~5000 builds), but on riscv64 there is some icache-related problem that affects Java JVM that triggers build failures.
  • Provide GitLab pipeline based builds of the Debian docker container images, cloud-images, debian-live CD and debian-installer ISO’s.
  • Provide integration with Sigstore and Sigsum for signing of Debian binaries with transparency-safe properties.
  • Implement a simple replacement for dpkg and apt using /bin/sh for use during bootstrapping when neither packaging tools are available.

What do you think?

Verified Reproducible Tarballs

Remember the XZ Utils backdoor? One factor that enabled the attack was poor auditing of the release tarballs for differences compared to the Git version controlled source code. This proved to be a useful place to distribute malicious data.

The differences between release tarballs and upstream Git sources is typically vendored and generated files. Lots of them. Auditing all source tarballs in a distribution for similar issues is hard and boring work for humans. Wouldn’t it be better if that human auditing time could be spent auditing the actual source code stored in upstream version control instead? That’s where auditing time would help the most.

Are there better ways to address the concern about differences between version control sources and tarball artifacts? Let’s consider some approaches:

  • Stop publishing (or at least stop building from) source tarballs that differ from version control sources.
  • Create recipes for how to derive the published source tarballs from version control sources. Verify that independently from upstream.

While I like the properties of the first solution, and have made effort to support that approach, I don’t think normal source tarballs are going away any time soon. I am concerned that it may not even be a desirable complete solution to this problem. We may need tarballs with pre-generated content in them for various reasons that aren’t entirely clear to us today.

So let’s consider the second approach. It could help while waiting for more experience with the first approach, to see if there are any fundamental problems with it.

How do you know that the XZ release tarballs was actually derived from its version control sources? The same for Gzip? Coreutils? Tar? Sed? Bash? GCC? We don’t know this! I am not aware of any automated or collaborative effort to perform this independent confirmation. Nor am I aware of anyone attempting to do this on a regular basis. We would want to be able to do this in the year 2042 too. I think the best way to reach that is to do the verification continuously in a pipeline, fixing bugs as time passes. The current state of the art seems to be that people audit the differences manually and hope to find something. I suspect many package maintainers ignore the problem and take the release source tarballs and trust upstream about this.

We can do better.

I have launched a project to setup a GitLab pipeline that invokes per-release scripts to rebuild that release artifact from git sources. Currently it only contain recipes for projects that I released myself. Releases which where done in a controlled way with considerable care to make reproducing the tarballs possible. The project homepage is here:

https://gitlab.com/debdistutils/verify-reproducible-releases

The project is able to reproduce the release tarballs for Libtasn1 v4.20.0, InetUtils v2.6, Libidn2 v2.3.8, Libidn v1.43, and GNU SASL v2.2.2. You can see this in a recent successful pipeline. All of those releases were prepared using Guix, and I’m hoping the Guix time-machine will make it possible to keep re-generating these tarballs for many years to come.

I spent some time trying to reproduce the current XZ release tarball for version 5.8.1. That would have been a nice example, wouldn’t it? First I had to somehow mimic upstream’s build environment. The XZ release tarball contains GNU Libtool files that are identified with version 2.5.4.1-baa1-dirty. I initially assumed this was due to the maintainer having installed libtool from git locally (after making some modifications) and made the XZ release using it. Later I learned that it may actually be coming from ArchLinux which ship with this particular libtool version. It seems weird for a distribution to use libtool built from a non-release tag, and furthermore applying patches to it, but things are what they are. I made some effort to setup an ArchLinux build environment, however the now-current Gettext version in ArchLinux seems to be more recent than the one that were used to prepare the XZ release. I don’t know enough ArchLinux to setup an environment corresponding to an earlier version of ArchLinux, which would be required to finish this. I gave up, maybe the XZ release wasn’t prepared on ArchLinux after all. Actually XZ became a good example for this writeup anyway: while you would think this should be trivial, the fact is that it isn’t! (There is another aspect here: fingerprinting the versions used to prepare release tarballs allows you to infer what kind of OS maintainers are using to make releases on, which is interesting on its own.)

I made some small attempts to reproduce the tarball for GNU Shepherd version 1.0.4 too, but I still haven’t managed to complete it.

Do you want a supply-chain challenge for the Easter weekend? Pick some well-known software and try to re-create the official release tarballs from the corresponding Git checkout. Is anyone able to reproduce anything these days? Bonus points for wrapping it up as a merge request to my project.

Happy Supply-Chain Security Hacking!

Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn’t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures.

What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader…

This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work.

Let’s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:

git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz

The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980’s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened.

The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs.

The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs — although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like.

Publishing minimal source-only tarballs enable easier auditing of a project’s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover.

With all that discussion and rationale out of the way, let’s return to the release process. I have added another step here:

make srcdist
gpg -b libntlm-1.8-src.tar.gz

Now the release is ready. I publish these four files in the Libntlm’s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:

91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz

So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:

podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz

You should see the exact same SHA256 checksum values. Hooray!

This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed.

Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container – replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:

podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz

The source-only minimal tarball can be regenerated on Debian 11:

podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz 

As the Magnus Opus or chef-d’œuvre, let’s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 – replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.

podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz

Yay! You should now have great confidence in that the release artifacts correspond to what’s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts.

I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot.

Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I’ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab’s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the ‘v’ in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah’s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah’s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I’ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion.

Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs – even though the content would.

Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm’s ChangeLog file died on the surgery table here.

So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian’s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm’s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team.

Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work — see my notes in gnulib’s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately.

One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don’t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We’ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn’t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don’t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn’t require the git tool to extract the necessary files, but none that I found practical — ideas welcome!

Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib’s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf.

The srcdist make rule is simple:

git archive --prefix=libntlm-v1.8/ -o libntlm-1.8-src.tar.gz HEAD

Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn’t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I’m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented.

Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property.

Doing continous testing of all this is critical to make sure things don’t regress. Libntlm’s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa.

If this way of working plays out well, I hope to implement it in other projects too.

What do you think? Happy Hacking!

Towards reproducible minimal source code tarballs? On *-src.tar.gz

While the work to analyze the xz backdoor is in progress, several ideas have been suggested to improve the software supply chain ecosystem. Some of those ideas are good, some of the ideas are at best irrelevant and harmless, and some suggestions are plain bad. I’d like to attempt to formalize two ideas, which have been discussed before, but the context in which they can be appreciated have not been as clear as it is today.

  1. Reproducible tarballs. The idea is that published source tarballs should be possible to reproduce independently somehow, and that this should be continuously tested and verified — preferrably as part of the upstream project continuous integration system (e.g., GitHub action or GitLab pipeline). While nominally this looks easy to achieve, there are some complex matters in this, for example: what timestamps to use for files in the tarball? I’ve brought up this aspect before.
  2. Minimal source tarballs without generated vendor files. Most GNU Autoconf/Automake-based tarballs pre-generated files which are important for bootstrapping on exotic systems that does not have the required dependencies. For the bootstrapping story to succeed, this approach is important to support. However it has become clear that this practice raise significant costs and risks. Most modern GNU/Linux distributions have all the required dependencies and actually prefers to re-build everything from source code. These pre-generated extra files introduce uncertainty to that process.

My strawman proposal to improve things is to define new tarball format *-src.tar.gz with at least the following properties:

  1. The tarball should allow users to build the project, which is the entire purpose of all this. This means that at least all source code for the project has to be included.
  2. The tarballs should be signed, for example with PGP or minisign.
  3. The tarball should be possible to reproduce bit-by-bit by a third party using upstream’s version controlled sources and a pointer to which revision was used (e.g., git tag or git commit).
  4. The tarball should not require an Internet connection to download things.
    • Corollary: every external dependency either has to be explicitly documented as such (e.g., gcc and GnuTLS), or included in the tarball.
    • Observation: This means including all *.po gettext translations which are normally downloaded when building from version controlled sources.
  5. The tarball should contain everything required to build the project from source using as much externally released versioned tooling as possible. This is the “minimal” property lacking today.
    • Corollary: This means including a vendored copy of OpenSSL or libz is not acceptable: link to them as external projects.
    • Open question: How about non-released external tooling such as gnulib or autoconf archive macros? This is a bit more delicate: most distributions either just package one current version of gnulib or autoconf archive, not previous versions. While this could change, and distributions could package the gnulib git repository (up to some current version) and the autoconf archive git repository — and packages were set up to extract the version they need (gnulib’s ./bootstrap already supports this via the –gnulib-refdir parameter), this is not normally in place.
    • Suggested Corollary: The tarball should contain content from git submodule’s such as gnulib and the necessary Autoconf archive M4 macros required by the project.
  6. Similar to how the GNU project specify the ./configure interface we need a documented interface for how to bootstrap the project. I suggest to use the already well established idiom of running ./bootstrap to set up the package to later be able to be built via ./configure. Of course, some projects are not using the autotool ./configure interface and will not follow this aspect either, but like most build systems that compete with autotools have instructions on how to build the project, they should document similar interfaces for bootstrapping the source tarball to allow building.

If tarballs that achieve the above goals were available from popular upstream projects, distributions could more easily use them instead of current tarballs that include pre-generated content. The advantage would be that the build process is not tainted by “unnecessary” files. We need to develop tools for maintainers to create these tarballs, similar to make dist that generate today’s foo-1.2.3.tar.gz files.

I think one common argument against this approach will be: Why bother with all that, and just use git-archive outputs? Or avoid the entire tarball approach and move directly towards version controlled check outs and referring to upstream releases as git URL and commit tag or id. One problem with this is that SHA-1 is broken, so placing trust in a SHA-1 identifier is simply not secure. Another counter-argument is that this optimize for packagers’ benefits at the cost of upstream maintainers: most upstream maintainers do not want to store gettext *.po translations in their source code repository. A compromise between the needs of maintainers and packagers is useful, so this *-src.tar.gz tarball approach is the indirection we need to solve that. Update: In my experiment with source-only tarballs for Libntlm I actually did use git-archive output.

What do you think?

A Security Device Threat Model: The Substitution Attack

I’d like to describe and discuss a threat model for computational devices. This is generic but we will narrow it down to security-related devices. For example, portable hardware dongles used for OpenPGP/OpenSSH keys, FIDO/U2F, OATH HOTP/TOTP, PIV, payment cards, wallets etc and more permanently attached devices like a Hardware Security Module (HSM), a TPM-chip, or the hybrid variant of a mostly permanently-inserted but removable hardware security dongles.

Our context is cryptographic hardware engineering, and the purpose of the threat model is to serve as as a thought experiment for how to build and design security devices that offer better protection. The threat model is related to the Evil maid attack.

Our focus is to improve security for the end-user rather than the traditional focus to improve security for the organization that provides the token to the end-user, or to improve security for the site that the end-user is authenticating to. This is a critical but often under-appreciated distinction, and leads to surprising recommendations related to onboard key generation, randomness etc below.

The Substitution Attack

An attacker is able to substitute any component of the device (hardware or software) at any time for any period of time.

Your takeaway should be that devices should be designed to mitigate harmful consequences if any component of the device (hardware or software) is substituted for a malicious component for some period of time, at any time, during the lifespan of that component. Some designs protect better against this attack than other designs, and the threat model can be used to understand which designs are really bad, and which are less so.

Terminology

The threat model involves at least one device that is well-behaving and one that is not, and we call these Good Device and Bad Device respectively. The bad device may be the same physical device as the good key, but with some minor software modification or a minor component replaced, but could also be a completely separate physical device. We don’t care about that distinction, we just care if a particular device has a malicious component in it or not. I’ll use terms like “security device”, “device”, “hardware key”, “security co-processor” etc interchangeably.

From an engineering point of view, “malicious” here includes “unintentional behavior” such as software or hardware bugs. It is not possible to differentiate an intentionally malicious device from a well-designed device with a critical bug.

Don’t attribute to malice what can be adequately explained by stupidity, but don’t naïvely attribute to stupidity what may be deniable malicious.

What is “some period of time”?

“Some period of time” can be any length of time: seconds, minutes, days, weeks, etc.

It may also occur at any time: During manufacturing, during transportation to the user, after first usage by the user, or after a couple of months usage by the user. Note that we intentionally consider time-of-manufacturing as a vulnerable phase.

Even further, the substitution may occur multiple times. So the Good Key may be replaced with a Bad Key by the attacker for one day, then returned, and later this repeats a month later.

What is “harmful consequences”?

Since a security key has a fairly well-confined scope and purpose, we can get a fairly good exhaustive list of things that could go wrong. Harmful consequences include:

  • Attacker learns any secret keys stored on a Good Key.
  • Attacker causes user to trust a public generated by a Bad Key.
  • Attacker is able to sign something using a Good Key.
  • Attacker learns the PIN code used to unlock a Good Key.
  • Attacker learns data that is decrypted by a Good Key.

Thin vs Deep solutions

One approach to mitigate many issues arising from device substitution is to have the host (or remote site) require that the device prove that it is the intended unique device before it continues to talk to it. This require an authentication/authorization protocol, which usually involves unique device identity and out-of-band trust anchors. Such trust anchors is often problematic, since a common use-case for security device is to connect it to a host that has never seen the device before.

A weaker approach is to have the device prove that it merely belongs to a class of genuine devices from a trusted manufacturer, usually by providing a signature generated by a device-specific private key signed by the device manufacturer. This is weaker since then the user cannot differentiate two different good devices.

In both cases, the host (or remote site) would stop talking to the device if it cannot prove that it is the intended key, or at least belongs to a class of known trusted genuine devices.

Upon scrutiny, this “solution” is still vulnerable to a substitution attack, just earlier in the manufacturing chain: how can the process that injects the per-device or per-class identities/secrets know that it is putting them into a good key rather than a malicious device? Consider also the consequences if the cryptographic keys that guarantee that a device is genuine leaks.

The model of the “thin solution” is similar to the old approach to network firewalls: have a filtering firewall that only lets through “intended” traffic, and then run completely insecure protocols internally such as telnet.

The networking world has evolved, and now we have defense in depth: even within strongly firewall’ed networks, it is prudent to run for example SSH with publickey-based user authentication even on locally physical trusted networks. This approach requires more thought and adds complexity, since each level has to provide some security checking.

I’m arguing we need similar defense-in-depth for security devices. Security key designs cannot simply dodge this problem by assuming it is working in a friendly environment where component substitution never occur.

Example: Device authentication using PIN codes

To see how this threat model can be applied to reason about security key designs, let’s consider a common design.

Many security keys uses PIN codes to unlock private key operations, for example on OpenPGP cards that lacks built-in PIN-entry functionality. The software on the computer just sends a PIN code to the device, and the device allows private-key operations if the PIN code was correct.

Let’s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious device that saves a copy of the PIN code presented to it, and then gives out error messages. Once the user has entered the PIN code and gotten an error message, presumably temporarily giving up and doing other things, the attacker replaces the device back again. The attacker has learnt the PIN code, and can later use this to perform private-key operations on the good device.

This means a good design involves not sending PIN codes in clear, but use a stronger authentication protocol that allows the card to know that the PIN was correct without learning the PIN. This is implemented optionally for many OpenPGP cards today as the key-derivation-function extension. That should be mandatory, and users should not use setups that sends device authentication in the clear, and ultimately security devices should not even include support for that. Compare how I build Gnuk on my PGP card with the kdf_do=required option.

Example: Onboard non-predictable key-generation

Many devices offer both onboard key-generation, for example OpenPGP cards that generate a Ed25519 key internally on the devices, or externally where the device imports an externally generated cryptographic key.

Let’s apply the subsitution threat model to this design: the user wishes to generate a key and trust the public key that came out of that process. The attacker substitutes the device for a malicious device during key-generation, imports the private key into a good device and gives that back to the user. Most of the time except during key generation the user uses a good device but still the attacker succeeded in having the user trust a public key which the attacker knows the private key for. The substitution may be a software modification, and the method to leak the private key to the attacker may be out-of-band signalling.

This means a good design never generates key on-board, but imports them from a user-controllable environment. That approach should be mandatory, and users should not use setups that generates private keys on-board, and ultimately security devices should not even include support for that.

Example: Non-predictable randomness-generation

Many devices claims to generate random data, often with elaborate design documents explaining how good the randomness is.

Let’s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious design that generates (for the attacker) predictable randomness. The user will never be able to detect the difference since the random output is, well, random, and typically not distinguishable from weak randomness. The user cannot know if any cryptographic keys generated by a generator was faulty or not.

This means a good design never generates non-predictable randomness on the device. That approach should be mandatory, and users should not use setups that generates non-predictable randomness on the device, and ideally devices should not have this functionality.

Case-Study: Tillitis

I have warmed up a bit for this. Tillitis is a new security device with interesting properties, and core to its operation is the Compound Device Identifier (CDI), essentially your Ed25519 private key (used for SSH etc) is derived from the CDI that is computed like this:

cdi = blake2s(UDS, blake2s(device_app), USS)

Let’s apply the substitution threat model to this design: Consider someone replacing the Tillitis key with a malicious key during postal delivery of the key to the user, and the replacement device is identical with the real Tillitis key but implements the following key derivation function:

cdi = weakprng(UDS’, weakprng(device_app), USS)

Where weakprng is a compromised algorithm that is predictable for the attacker, but still appears random. Everything will work correctly, but the attacker will be able to learn the secrets used by the user, and the user will typically not be able to tell the difference since the CDI is secret and the Ed25519 public key is not self-verifiable.

Conclusion

Remember that it is impossible to fully protect against this attack, that’s why it is merely a thought experiment, intended to be used during design of these devices. Consider an attacker that never gives you access to a good key and as a user you only ever use a malicious device. There is no way to have good security in this situation. This is not hypothetical, many well-funded organizations do what they can to derive people from having access to trustworthy security devices. Philosophically it does not seem possible to tell if these organizations have succeeded 100% already and there are only bad security devices around where further resistance is futile, but to end on an optimistic note let’s assume that there is a non-negligible chance that they haven’t succeeded. In these situations, this threat model becomes useful to improve the situation by identifying less good designs, and that’s why the design mantra of “mitigate harmful consequences” is crucial as a takeaway from this. Let’s improve the design of security devices that further the security of its users!