Trisquel is 42% Reproducible!

The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.

tl;dr: go to reproduce-trisquel.

When I set about to understand how Trisquel worked, I identified a number of things that would improve my confidence in it. The lowest hanging fruit for me was to manually audit the package archive, and I wrote a tool called debdistdiff to automate this for me. That led me to think about apt archive transparency more generally. I have done some further work in that area (hint: apt-verify) that deserves its own blog post eventually. Most of apt archive transparency is futile if we don’t trust the intended packages that are in the archive. One way to measurably increase trust in the packages is to provide reproducible builds of them, which should by now be an established best practice. Code review is still important, but since it will never provide positive guarantees we need other processes that can identify sub-optimal situations automatically. The way reproducible builds easily identify negative results is what I believe has driven much of their success: the results are tangible and measurable. The field of software engineering is in need of more such practices.

The design of my setup to build Trisquel reproducibly is as follows.

  • The project debdistget is responsible for downloading Release/Packages files (which are the most relevant files from dists/) from apt archives, and works by committing them into GitLab-hosted git repositories. I maintain several such repositories for popular apt archives, including for Trisquel and its upstream Ubuntu. GitLab invokes a scheduled pipeline to do the downloading, and there are some race conditions here.
  • The project debdistdiff is used to produce the list of added and modified packages, which is the input that determines which packages to reproduce. It publishes a human-readable summary of differences for several distributions, including Trisquel vs Ubuntu. Early on I decided that rebuilding all of the upstream Ubuntu packages is out of scope for me: my personal trust in the official Debian/Ubuntu apt archives is greater than my trust in the added/modified packages in Trisquel.
  • The final project reproduce-trisquel puts the pieces together briefly as follows, everything being driven from its .gitlab-ci.yml file.
    • There is a (manually triggered) job generate-build-image to create a build image to speed up CI/CD runs, using a simple Dockerfile.
    • There is a (manually triggered) job generate-package-lists that uses debdistdiff to generate and store package lists and puts its output in lists/. The reason this is manually triggered right now is due to a race condition.
    • There is a (scheduled) job that does two things. First, from the package lists, the script generate-ci-packages.sh builds a GitLab CI/CD instruction file ci-packages.yml that describes jobs for each package to build. Second, generate-readme.sh regenerates the project’s README.md based on the build logs and diffoscope outputs that are stored in the git repository.
    • Through the ci-packages.yml file, there is a large number of jobs that are dynamically defined, which currently are manually triggered so as not to overload the build servers. The script build-package.sh is invoked and attempts to rebuild a package, and stores the build log and diffoscope output in the git project itself.
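
To make the diffing step concrete, here is a minimal Python sketch of how a list of added and modified packages could be derived from two Packages files. This is not the actual debdistdiff code: the function names, the sample data and the version strings are made up for illustration, and it assumes each package appears only once per file.

```python
# Hypothetical sketch, not the real debdistdiff implementation.

def parse_packages(text):
    """Map package name -> version from the text of an apt Packages file."""
    versions = {}
    # Stanzas in a Packages file are separated by blank lines.
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them here.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        if "Package" in fields:
            versions[fields["Package"]] = fields.get("Version", "")
    return versions


def added_and_modified(derived, upstream):
    """Return (added, modified) package names of derived relative to upstream."""
    added = sorted(p for p in derived if p not in upstream)
    modified = sorted(
        p for p in derived if p in upstream and derived[p] != upstream[p]
    )
    return added, modified


# Made-up sample data, only to exercise the functions above.
UPSTREAM = """\
Package: hello
Version: 1.0-1
Description: example package
 with a continuation line

Package: unchanged
Version: 2.0-1
"""

DERIVED = """\
Package: hello
Version: 1.0-1+derived1

Package: unchanged
Version: 2.0-1

Package: abrowser
Version: 3.0-1
"""

added, modified = added_and_modified(
    parse_packages(DERIVED), parse_packages(UPSTREAM)
)
print("added:", added)
print("modified:", modified)
```

Only the packages in these two lists would need to be rebuilt under this approach; everything that is identical to upstream is left alone.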

I did not expect to be able to use the GitLab shared runners to do the building; however, they turned out to work quite well and I postponed setting up my own runner. There is a manually curated lists/disabled-aramo.txt with some packages that either required too much disk space or took over two hours to build. Today I finally took the time to set up a GitLab runner using podman running Trisquel aramo, and I expect to complete builds of the remaining packages soon: one of my Dell R630 servers with 256GB RAM and dual 2680v4 CPUs should deliver sufficient performance.

Current limitations and ideas on further work (most are filed as project issues) include:

  • We don’t support *.buildinfo files. As far as I am aware, Trisquel does not publish them for their builds. Improving this would be a first step forward, anyone able to help? Compare buildinfo.debian.net. For example, many packages differ only in their NT_GNU_BUILD_ID note inside the ELF binary, see example diffoscope output for libgpg-error. By poking around in jenkins.trisquel.org I managed to discover that Trisquel built initramfs-tools in the randomized path /build/initramfs-tools-bzRLUp and hard-coding that path allowed me to reproduce that package. I expect the same to hold for many other packages. Unfortunately, turning this failure into a success moved the needle from 42% to 43% reproducibility; however, I didn’t let that stand in the way of a good headline.
  • The mechanism to download the Release/Packages files from dists/ is not fool-proof: we may not capture every such file ever published. While this is less of a concern for reproducibility, it is more of a concern for apt transparency. Still, having Trisquel provide a service similar to snapshot.debian.org would help.
  • Having at least one other CPU architecture would be nice.
  • Due to lack of time and mental focus, handling incremental updates of new versions of packages is not yet working. This means we only ever build one version of a package, and never discover any newly published versions of the same package. Now that Trisquel aramo is released, the expected rate of new versions should be low, but new versions still appear due to security updates or backports.
  • Porting this to test supposedly FSDG-compliant distributions such as PureOS and Gnuinos should be relatively easy. I’m also looking at Devuan because of Gnuinos.
  • The elephant in the room is how reproducible Ubuntu is in the first place.
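
For readers wondering how a percentage like the one in the headline can be computed, here is a small Python sketch. It assumes a package counts as reproducible when the rebuilt .deb has the same SHA256 checksum as the published one, and that packages which were never rebuilt count against the total. Both are assumptions made for illustration; the function names and sample data are hypothetical and not taken from the project.

```python
# Hypothetical sketch of a reproducibility metric; not the project's own code.
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA256 digest of some bytes (e.g. a .deb file's content)."""
    return hashlib.sha256(data).hexdigest()


def reproducibility(published, rebuilt):
    """Compare published checksums against locally rebuilt artifacts.

    published: package name -> SHA256 hex digest from the archive
    rebuilt:   package name -> bytes of the locally rebuilt package
    Returns (sorted list of reproducible names, percentage of all published).
    """
    reproducible = sorted(
        name
        for name, digest in published.items()
        if name in rebuilt and sha256_hex(rebuilt[name]) == digest
    )
    percent = 100.0 * len(reproducible) / len(published) if published else 0.0
    return reproducible, percent


# Made-up data: one identical rebuild, one differing rebuild, one not built.
published = {
    "pkg-a": sha256_hex(b"identical bytes"),
    "pkg-b": sha256_hex(b"published bytes"),
    "pkg-c": sha256_hex(b"never rebuilt"),
}
rebuilt = {
    "pkg-a": b"identical bytes",
    "pkg-b": b"rebuilt with a different build path",
}

names, percent = reproducibility(published, rebuilt)
print(names, f"{percent:.0f}%")
```

Whether unbuilt packages belong in the denominator is a design choice; excluding them would give a higher number for the same set of results.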

Happy Easter Hacking!

Update 2023-04-17: The original project “reproduce-trisquel” that was announced here has been archived and replaced with two projects, one generic “debdistreproduce” and one with results for Trisquel: “reproduce/trisquel”.

7 Replies to “Trisquel is 42% Reproducible!”

  1. If *Debian* is working hard on reproducible builds – and is the upstream for Ubuntu – just rebase Trisquel on to Debian and you get the buildinfo and reproducibility largely for free since you don’t modify much.

    [Disclosure: I am, of course, a Debian developer]

    • Thanks for feedback — I’m not a Trisquel developer so I can’t speak for them or know if that is going to happen.

      PureOS is derived from Debian, so anyone wanting a 100% free software GNU/Linux based on Debian already has one to work on.

      I think it is useful to have a 100% free variant of Ubuntu around in the eco-system, so if Trisquel rebased on Debian I wouldn’t really know what would set it apart from PureOS except branding, which seems like an overall set-back.

      Trisquel uses abrowser instead of firefox, and linux-libre instead of linux, and I especially enjoy netplan over /etc/network/interfaces but other than that I find all the *.deb-based distributions quite similar. I find the similarity even stronger after having studied the differences between them lately.

      It would be interesting to diff Ubuntu compared to Debian though… I expect this would be a much larger diff than the other derivatives. It seems hard to match them though, since their release schedules aren’t aligned in the same way Trisquel/Ubuntu, PureOS/Debian and Devuan/Gnuinos are.

      /Simon

  2. You never will align them: Ubuntu ships every six months with an LTS every two years regularly. At that point, they may pull from Debian stable/testing or unstable to produce something they will support for two years.

    Debian’s schedule is roughly every two years – “when it’s ready”

    Purism will be behind Debian – and Devuan, which is 99% Debian less systemd will be behind that and Gnuinos??

    There really aren’t enough developers to support so many forks of Debian – heck, there aren’t enough developers to support Debian.
    A free fork of Ubuntu really isn’t going to help matters and basing Trisquel on Ubuntu is taking a “less free” Debian derivative with a commercial pressure and trying to make it freer than Debian … doesn’t really work.

    Still, whatever folks want to try: but ONE free Debian derivative would make much more sense. Maybe merge Trisquel/Gnuinos and all the other FSF approved derivatives if they’re close enough?

    • Yeah aligning and diffing Debian and Ubuntu seems a bit challenging. However maybe just a package list diff without version numbers would be useful? I suppose one has to combine main+contrib+nonfree and compare that to what’s in main+universe+multiverse+restricted? They share the same */*-updates/*-security/*-backports layout, but I wonder if it is useful to do a comparison of bullseye-updates with jammy-updates.

      We’ll see what happens in the eco-system, forking is a classic theme in free software when there is tension, and I reckon this is going to increase rather than decrease, as most of the involved projects seem to (in one way or another) take an “it is my way or the highway” approach to some technical (systemd) or philosophical decision (100% free software). I think each particular decision is understandable, and illustrates that people have different goals with their computing.

      My idea with the “debdistutils” project is to provide one space to do collaborative work between dpkg/apt-based distributions, for apt transparency, dist diffing, external reproducibility and whatever else may be relevant.

      /Simon

  3. For the curious, here is how Ubuntu jammy differs from Debian bullseye:

    https://debdistutils.gitlab.io/debdistdiff/ubuntu-vs-debian/

    It is apparent that there are too many differences between how Debian and Ubuntu use the main part vs the *-updates part for this to be useful. I’ll see if I can make debdistdiff combine everything and just compare it overall, and not break it down per main/updates/security/backports.

  4. Pingback: More on Differential Reproducible Builds: Devuan is 46% reproducible! – Simon Josefsson's blog

  5. Pingback: How To Trust A Machine – Simon Josefsson's blog
