Coping with non-free software in Debian

A personal reflection on how I moved from my Debian home to find two new homes with Trisquel and Guix for my own ethical computing, and while doing so settled my dilemma about further Debian contributions.

Debian‘s contributions to the free software community has been tremendous. Debian was one of the early distributions in the 1990’s that combined the GNU tools (compiler, linker, shell, editor, and a set of Unix tools) with the Linux kernel and published a free software operating system. Back then there were little guidance on how to publish free software binaries, let alone entire operating systems. There was a lack of established community processes and conflict resolution mechanisms, and lack of guiding principles to motivate the work. The community building efforts that came about in parallel with the technical work has resulted in a steady flow of releases over the years.

From the work of Richard Stallman and the Free Software Foundation (FSF) during the 1980’s and early 1990’s, there was at the time already an established definition of free software. Inspired by free software definition, and a belief that a social contract helps to build a community and resolve conflicts, Debian’s social contract (DSC) with the free software community was published in 1997. The DSC included the Debian Free Software Guidelines (DFSG), which directly led to the Open Source Definition.

Slackware 3.5" disks
One of my earlier Slackware install disk sets, kept for nostalgic reasons.

I was introduced to GNU/Linux through Slackware in the early 1990’s (oh boy those nights calculating XFree86 modeline’s and debugging sendmail.cf) and primarily used RedHat Linux during ca 1995-2003. I switched to Debian during the Woody release cycles, when the original RedHat Linux was abandoned and Fedora launched. It was Debian’s explicit community processes and infrastructure that attracted me. The slow nature of community processes also kept me using RedHat for so long: centralized and dogmatic decision processes often produce quick and effective outcomes, and in my opinion RedHat Linux was technically better than Debian ca 1995-2003. However the RedHat model was not sustainable, and resulted in the RedHat vs Fedora split. Debian catched up, and reached technical stability once its community processes had been grounded. I started participating in the Debian community around late 2006.

My interpretation of Debian’s social contract is that Debian should be a distribution of works licensed 100% under a free license. The Debian community has always been inclusive towards non-free software, creating the contrib/non-free section and permitting use of the bug tracker to help resolve issues with non-free works. This is all explained in the social contract. There has always been a clear boundary between free and non-free work, and there has been a commitment that the Debian system itself would be 100% free.

The concern that RedHat Linux was not 100% free software was not critical to me at the time: I primarily (and happily) ran GNU tools on Solaris, IRIX, AIX, OS/2, Windows etc. Running GNU tools on RedHat Linux was an improvement, and I hadn’t realized it was possible to get rid of all non-free software on my own primary machine. Debian realized that goal for me. I’ve been a believer in that model ever since. I can use Solaris, macOS, Android etc knowing that I have the option of using a 100% free Debian.

While the inclusive approach towards non-free software invite and deserve criticism (some argue that being inclusive to non-inclusive behavior is a bad idea), I believe that Debian’s approach was a successful survival technique: by being inclusive to – and a compromise between – free and non-free communities, Debian has been able to stay relevant and contribute to both environments. If Debian had not served and contributed to the free community, I believe free software people would have stopped contributing. If Debian had rejected non-free works completely, I don’t think the successful Ubuntu distribution would have been based on Debian.

I wrote the majority of the text above back in September 2022, intending to post it as a way to argue for my proposal to maintain the status quo within Debian. I didn’t post it because I felt I was saying the obvious, and that the obvious do not need to be repeated, and the rest of the post was just me going down memory lane.

The Debian project has been a sustainable producer of a 100% free OS up until Debian 11 bullseye. In the resolution on non-free firmware the community decided to leave the model that had resulted in a 100% free Debian for so long. The goal of Debian is no longer to publish a 100% free operating system, instead this was added: “The Debian official media may include firmware”. Indeed the Debian 12 bookworm release has confirmed that this would not only be an optional possibility. The Debian community could have published a 100% free Debian, in parallel with the non-free Debian, and still be consistent with their newly adopted policy, but chose not to. The result is that Debian’s policies are not consistent with their actions. It doesn’t make sense to claim that Debian is 100% free when the Debian installer contains non-free software. Actions speaks louder than words, so I’m left reading the policies as well-intended prose that is no longer used for guidance, but for the peace of mind for people living in ivory towers. And to attract funding, I suppose.

So how to deal with this, on a personal level? I did not have an answer to that back in October 2022 after the vote. It wasn’t clear to me that I would ever want to contribute to Debian under the new social contract that promoted non-free software. I went on vacation from any Debian work. Meanwhile Debian 12 bookworm was released, confirming my fears. I kept coming back to this text, and my only take-away was that it would be unethical for me to use Debian on my machines. Letting actions speak for themselves, I switched to PureOS on my main laptop during October, barely noticing any difference since it is based on Debian 11 bullseye. Back in December, I bought a new laptop and tried Trisquel and Guix on it, as they promise a migration path towards ppc64el that PureOS do not.

While I pondered how to approach my modest Debian contributions, I set out to learn Trisquel and gained trust in it. I migrated one Debian machine after another to Trisquel, and started to use Guix on others. Migration was easy because Trisquel is based on Ubuntu which is based on Debian. Using Guix has its challenges, but I enjoy its coherant documented environment. All of my essential self-hosted servers (VM hosts, DNS, e-mail, WWW, Nextcloud, CI/CD builders, backup etc) uses Trisquel or Guix now. I’ve migrated many GitLab CI/CD rules to use Trisquel instead of Debian, to have a more ethical computing base for software development and deployment. I wish there were official Guix docker images around.

Time has passed, and when I now think about any Debian contributions, I’m a little less muddled by my disappointment of the exclusion of a 100% free Debian. I realize that today I can use Debian in the same way that I use macOS, Android, RHEL or Ubuntu. And what prevents me from contributing to free software on those platforms? So I will make the occasional Debian contribution again, knowing that it will also indirectly improve Trisquel. To avoid having to install Debian, I need a development environment in Trisquel that allows me to build Debian packages. I have found a recipe for doing this:

# System commands:
sudo apt-get install debhelper git-buildpackage debian-archive-keyring
sudo wget -O /usr/share/debootstrap/scripts/debian-common https://sources.debian.org/data/main/d/debootstrap/1.0.128%2Bnmu2/scripts/debian-common
sudo wget -O /usr/share/debootstrap/scripts/sid https://sources.debian.org/data/main/d/debootstrap/1.0.128%2Bnmu2/scripts/sid
# Run once to create build image:
DIST=sid git-pbuilder create --mirror http://deb.debian.org/debian/ --debootstrapopts "--exclude=usr-is-merged" --basepath /var/cache/pbuilder/base-sid.cow
# Run in a directory with debian/ to build a package:
gbp buildpackage --git-pbuilder --git-dist=sid

How to sustainably deliver a 100% free software binary distributions seems like an open question, and the challenges are not all that different compared to the 1990’s or early 2000’s. I’m hoping Debian will come back to provide a 100% free platform, but my fear is that Debian will compromise even further on the free software ideals rather than the opposite. With similar arguments that were used to add the non-free firmware, Debian could compromise the free software spirit of the Linux boot process (e.g., non-free boot images signed by Debian) and media handling (e.g., web browsers and DRM), as Debian have already done with appstore-like functionality for non-free software (Python pip). To learn about other freedom issues in Debian packaging, browsing Trisquel’s helper scripts may enlight you.

Debian’s setback and the recent setback for RHEL-derived distributions are sad, and it will be a challenge for these communities to find internally consistent coherency going forward. I wish them the best of luck, as Debian and RHEL are important for the wider free software eco-system. Let’s see how the community around Trisquel, Guix and the other FSDG-distributions evolve in the future.

The situation for free software today appears better than it was years ago regardless of Debian and RHEL’s setbacks though, which is important to remember! I don’t recall being able install a 100% free OS on a modern laptop and modern server as easily as I am able to do today.

Happy Hacking!

Addendum 22 July 2023: The original title of this post was Coping with non-free Debian, and there was a thread about it that included feedback on the title. I do agree that my initial title was confrontational, and I’ve changed it to the more specific Coping with non-free software in Debian. I do appreciate all the fine free software that goes into Debian, and hope that this will continue and improve, although I have doubts given the opinions expressed by the majority of developers. For the philosophically inclined, it is interesting to think about what it means to say that a compilation of software is freely licensed. At what point does a compilation of software deserve the labels free vs non-free? Windows probably contains some software that is published as free software, let’s say Windows is 1% free. Apple authors a lot of free software (as a tangent, Apple probably produce more free software than what Debian as an organization produces), and let’s say macOS contains 20% free software. Solaris (or some still maintained derivative like OpenIndiana) is mostly freely licensed these days, isn’t it? Let’s say it is 80% free. Ubuntu and RHEL pushes that closer to let’s say 95% free software. Debian used to be 100% but is now slightly less at maybe 99%. Trisquel and Guix are at 100%. At what point is it reasonable to call a compilation free? Does Debian deserve to be called freely licensed? Does macOS? Is it even possible to use these labels for compilations in any meaningful way? All numbers just taken from thin air. It isn’t even clear how this can be measured (binary bytes? lines of code? CPU cycles? etc). The caveat about license review mistakes applies. I ignore Debian’s own claims that Debian is 100% free software, which I believe is inconsistent and no longer true under any reasonable objective analysis. It was not true before the firmware vote since Debian ships with non-free blobs in the Linux kernel for example.

Sigstore for Apt Archives: apt-cosign

As suggested in my initial announcement of apt-sigstore my plan was to look into stronger uses of Sigstore than rekor, and I’m now happy to announce that the apt-cosign plugin has been added to apt-sigstore and the operational project debdistcanary is publishing cosign-statements about the InRelease file published by the following distributions: Trisquel GNU/Linux, PureOS, Gnuinos, Ubuntu, Debian and Devuan.

Summarizing the commands that you need to run as root to experience the great new world:

# run everything as root: su / sudo -i / doas -s
apt-get install -y apt gpg bsdutils wget
wget -nv -O/usr/local/bin/apt-verify-gpgv https://gitlab.com/debdistutils/apt-verify/-/raw/main/apt-verify-gpgv
chmod +x /usr/local/bin/apt-verify-gpgv
mkdir -p /etc/apt/verify.d
ln -s /usr/bin/gpgv /etc/apt/verify.d
echo 'APT::Key::gpgvcommand "apt-verify-gpgv";' > /etc/apt/apt.conf.d/75verify
wget -O/usr/local/bin/cosign https://github.com/sigstore/cosign/releases/download/v2.0.1/cosign-linux-amd64
echo 924754b2e62f25683e3e74f90aa5e166944a0f0cf75b4196ee76cb2f487dd980  /usr/local/bin/cosign | sha256sum -c
chmod +x /usr/local/bin/cosign
wget -nv -O/etc/apt/verify.d/apt-cosign https://gitlab.com/debdistutils/apt-sigstore/-/raw/main/apt-cosign
chmod +x /etc/apt/verify.d/apt-cosign
mkdir -p /etc/apt/trusted.cosign.d
dist=$(lsb_release --short --id | tr A-Z a-z)
wget -O/etc/apt/trusted.cosign.d/cosign-public-key-$dist.txt "https://gitlab.com/debdistutils/debdistcanary/-/raw/main/cosign/cosign-public-key-$dist.txt"
echo "Cosign::Base-URL \"https://gitlab.com/debdistutils/canary/$dist/-/raw/main/cosign\";" > /etc/apt/apt.conf.d/77cosign

Then run your usual apt-get update and look in the syslog to debug things.

This is the kind of work that gets done while waiting for the build machines to attempt to reproducibly build PureOS. Unfortunately, the results is that a meager 16% of the 765 added/modifed packages are reproducible by me. There is some infrastructure work to be done to improve things: we should use sbuild for example. The build infrastructure should produce signed statements for each package it builds: One statement saying that it attempted to reproducible build a particular binary package (thus generated some build logs and diffoscope-output for auditing), and one statements saying that it actually was able to reproduce a package. Verifying such claims during apt-get install or possibly dpkg -i is a logical next step.

There is some code cleanups and release work to be done now. Which distribution will be the first apt-based distribution that includes native support for Sigstore? Let’s see.

Sigstore is not the only relevant transparency log around, and I’ve been trying to learn a bit about Sigsum to be able to support it as well. The more improved confidence about system security, the merrier!

More on Differential Reproducible Builds: Devuan is 46% reproducible!

Building on my work to rebuild Trisquel GNU/Linux 11.0 aramo, it felt simple to generalize the tooling to any two apt-repository pairs and I’ve created debdistreproduce as a template-project for doing this through the infrastructure of GitLab CI/CD and meanwhile even set up my own gitlab-runner on spare hardware. I’ve brought over reproduce/trisquel to using debdistreproduce as well, and archived the old reproduce-trisquel project.

After fixing some quirks, building Devuan GNU+Linux 4.0 Chimaera was fairly quick since they do not modify that many packages, and I’m now able to reproduce 46% of the packages that Devuan Chimaera add/modify on amd64. I have more work in progress here (hint: reproduce/pureos), but PureOS is considerably larger than both Trisquel and Devuan together. I’m not sure how interested Devuan or PureOS are in reproducible builds though.

Reflecting on this work made me realize that while the natural thing to do here was to differentiate two different apt-based distributions, I have realized the same way I did for debdistdiff that it would also be interesting to compare, say, Debian bookworm from Debian unstable, especially now that they should be fairly close together. My tooling should support that too. However, to really provide any benefit from the more complete existing reproducible testing of Debian, some further benefit from doing that would be useful and I can’t articulate one right now.

One ultimate goal with my effort is to improve trust in apt-repositories, and combining transparency-style protection a’la apt-sigstore with third-party validated reproducible builds may indeed be one such use-case that would benefit the wider community of apt-repositories. Imagine having your system not install any package unless it can verify it against a third-party reproducible build organization that commits their results in a tamper-proof transparency ledger. But I’m now on repeat here, so will stop.

Sigstore protects Apt archives: apt-verify & apt-sigstore

Do you want your apt-get update to only ever use files whose hash checksum have been recorded in the globally immutable tamper-resistance ledger rekor provided by the Sigstore project? Well I thought you’d never ask, but now you can, thanks to my new projects apt-verify and apt-sigstore. I have not done proper stable releases yet, so this is work in progress. To try it out, adapt to the modern era of running random stuff from the Internet as root, and run the following commands. Use a container or virtual machine if you have trust issues.

apt-get install -y apt gpg bsdutils wget
wget -nv -O/usr/local/bin/rekor-cli 'https://github.com/sigstore/rekor/releases/download/v1.1.0/rekor-cli-linux-amd64'
echo afde22f01d9b6f091a7829a6f5d759d185dc0a8f3fd21de22c6ae9463352cf7d  /usr/local/bin/rekor-cli | sha256sum -c
chmod +x /usr/local/bin/rekor-cli
wget -nv -O/usr/local/bin/apt-verify-gpgv https://gitlab.com/debdistutils/apt-verify/-/raw/main/apt-verify-gpgv
chmod +x /usr/local/bin/apt-verify-gpgv
mkdir -p /etc/apt/verify.d
ln -s /usr/bin/gpgv /etc/apt/verify.d
echo 'APT::Key::gpgvcommand "apt-verify-gpgv";' > /etc/apt/apt.conf.d/75verify
wget -nv -O/etc/apt/verify.d/apt-rekor https://gitlab.com/debdistutils/apt-sigstore/-/raw/main/apt-rekor
chmod +x /etc/apt/verify.d/apt-rekor
apt-get update
less /var/log/syslog

If the stars are aligned (and the puppet projects’ of debdistget and debdistcanary have ran their GitLab CI/CD pipeline recently enough) you will see a successful output from apt-get update and your syslog will contain debug logs showing the entries from the rekor log for the release index files that you downloaded. See sample outputs in the README.

If you get tired of it, disabling is easy:

chmod -x /etc/apt/verify.d/apt-rekor

Our project currently supports Trisquel GNU/Linux 10 (nabia) & 11 (aramo), PureOS 10 (byzantium), Gnuinos chimaera, Ubuntu 20.04 (focal) & 22.04 (jammy), Debian 10 (buster) & 11 (bullseye), and Devuan GNU+Linux 4.0 (chimaera). Others can be supported to, please open an issue about it, although my focus is on FSDG-compliant distributions and their upstreams.

This is a continuation of my previous work on apt-canary. I have realized that it was better to separate out the generic part of apt-canary into my new project apt-verify that offers a plugin-based method, and then rewrote apt-canary to be one such plugin. Then apt-sigstore‘s apt-rekor was my second plugin for apt-verify.

Due to the design of things, and some current limitations, Ubuntu is the least stable since they push out new signed InRelease files frequently (mostly due to their use of Phased-Update-Percentage) and debdistget and debdistcanary CI/CD runs have a hard time keeping up. If you have insight on how to improve this, please comment me in the issue tracking the race condition.

There are limitations of what additional safety a rekor-based solution actually provides, but I expect that to improve as I get a cosign-based approach up and running. Currently apt-rekor mostly make targeted attacks less deniable. With a cosign-based approach, we could design things such that your machine only downloads updates when they have been publicly archived in an immutable fashion, or submitted for validation by a third-party such as my reproducible build setup for Trisquel GNU/Linux aramo.

What do you think? Happy Hacking!

Trisquel is 42% Reproducible!

The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.

tl;dr: go to reproduce-trisquel.

When I set about to understand how Trisquel worked, I identified a number of things that would improve my confidence in it. The lowest hanging fruit for me was to manually audit the package archive, and I wrote a tool called debdistdiff to automate this for me. That led me to think about apt archive transparency more in general. I have made some further work in that area (hint: apt-verify) that deserve its own blog post eventually. Most of apt archive transparency is futile if we don’t trust the intended packages that are in the archive. One way to measurable increase trust in the package are to provide reproducible builds of the packages, which should by now be an established best practice. Code review is still important, but since it will never provide positive guarantees we need other processes that can identify sub-optimal situations automatically. The way reproducible builds easily identify negative results is what I believe has driven much of its success: its results are tangible and measurable. The field of software engineering is in need of more such practices.

The design of my setup to build Trisquel reproducible are as follows.

  • The project debdistget is responsible for downloading Release/Packages files (which are the most relevant files from dists/) from apt archives, and works by commiting them into GitLab-hosted git-repositories. I maintain several such repositories for popular apt-archives, including for Trisquel and its upstream Ubuntu. GitLab invokes a schedule pipeline to do the downloading, and there is some race conditions here.
  • The project debdistdiff is used to produce the list of added and modified packages, which are the input to actually being able to know what packages to reproduce. It publishes human readable summary of difference for several distributions, including Trisquel vs Ubuntu. Early on I decided that rebuilding all of the upstream Ubuntu packages is out of scope for me: my personal trust in the official Debian/Ubuntu apt archives are greater than my trust of the added/modified packages in Trisquel.
  • The final project reproduce-trisquel puts the pieces together briefly as follows, everything being driven from its .gitlab-ci.yml file.
    • There is a (manually triggered) job generate-build-image to create a build image to speed up CI/CD runs, using a simple Dockerfile.
    • There is a (manually triggered) job generate-package-lists that uses debdistdiff to generate and store package lists and puts its output in lists/. The reason this is manually triggered right now is due to a race condition.
    • There is a (scheduled) job that does two things: from the package lists, the script generate-ci-packages.sh builds a GitLab CI/CD instruction file ci-packages.yml that describes jobs for each package to build. The second part is generate-readme.sh that re-generate the project’s README.md based on the build logs and diffoscope outputs that stored in the git repository.
    • Through the ci-packages.yml file, there is a large number of jobs that are dynamically defined, which currently are manually triggered to not overload the build servers. The script build-package.sh is invoked and attempts to rebuild a package, and stores build log and diffoscope output in the git project itself.

I did not expect to be able to use the GitLab shared runners to do the building, however they turned out to work quite well and I postponed setting up my own runner. There is a manually curated lists/disabled-aramo.txt with some packages that all required too much disk space or took over two hours to build. Today I finally took the time to setup a GitLab runner using podman running Trisquel aramo, and I expect to complete builds of the remaining packages soon — one of my Dell R630 server with 256GB RAM and dual 2680v4 CPUs should deliver sufficient performance.

Current limitations and ideas on further work (most are filed as project issues) include:

  • We don’t support *.buildinfo files. As far as I am aware, Trisquel does not publish them for their builds. Improving this would be a first step forward, anyone able to help? Compare buildinfo.debian.net. For example, many packages differ only in their NT_GNU_BUILD_ID symbol inside the ELF binary, see example diffoscope output for libgpg-error. By poking around in jenkins.trisquel.org I managed to discover that Trisquel built initramfs-utils in the randomized path /build/initramfs-tools-bzRLUp and hard-coding that path allowed me to reproduce that package. I expect the same to hold for many other packages. Unfortunately, this failure turned into success with that package moved the needle from 42% reproducibility to 43% however I didn’t let that stand in the way of a good headline.
  • The mechanism to download the Release/Package-files from dists/ is not fool-proof: we may not capture all ever published such files. While this is less of a concern for reproducibility, it is more of a concern for apt transparency. Still, having Trisquel provide a service similar to snapshot.debian.org would help.
  • Having at least one other CPU architecture would be nice.
  • Due to lack of time and mental focus, handling incremental updates of new versions of packages is not yet working. This means we only ever build one version of a package, and never discover any newly published versions of the same package. Now that Trisquel aramo is released, the expected rate of new versions should be low, but still happens due to security or backports.
  • Porting this to test supposedly FSDG-compliant distributions such as PureOS and Gnuinos should be relatively easy. I’m also looking at Devuan because of Gnuinos.
  • The elephant in the room is how reproducible Ubuntu is in the first place.

Happy Easter Hacking!

Update 2023-04-17: The original project “reproduce-trisquel” that was announced here has been archived and replaced with two projects, one generic “debdistreproduce” and one with results for Trisquel: “reproduce/trisquel“.