linux – Simon Josefsson's blog

GitLab Runner with Rootless Privilege-less Capability-less Podman on riscv64

Posted on 2025-04-25 by simon — 3 Comments ↓

I host my own GitLab CI/CD runners, and find that having coverage on the riscv64 CPU architecture is useful for testing things. The HiFive Premier P550 seems to be a common hardware choice. The P550 is possible to purchase online. You also need a (mini-)ATX chassi, power supply (~500W is more than sufficient), PCI-to-M2 converter and a NVMe storage device. Total cost per machine was around $8k/€8k for me. Assembly was simple: bolt everything, connect ATX power, connect cables for the front-panel, USB and and Audio. Be sure to toggle the physical power switch on the P550 before you close the box. Front-panel power button will start your machine. There is a P550 user manual available.

Below I will guide you to install the GitLab Runner on the pre-installed Ubuntu 24.04 that ships with the P550, and configure it to use Podman in root-less mode and without the --privileged flag, without any additional capabilities like SYS_ADMIN. Presumably you want to migrate to some other OS instead; hey Trisquel 13 riscv64 I’m waiting for you! I wouldn’t recommend using this machine for anything sensitive, there is an awful lot of non-free and/or vendor-specific software installed, and the hardware itself is young. I am not aware of any riscv64 hardware that can run a libre OS, all of them appear to require non-free blobs and usually a non-mainline kernel.

Login on console using username ‘ubuntu‘ and password ‘ubuntu‘. You will be asked to change the password, so do that.
Start a terminal, gain root with sudo -i and change the hostname:
echo jas-p550-01 > /etc/hostname
Connect ethernet and run: apt-get update && apt-get dist-upgrade -u.
If your system doesn’t have valid MAC address (they show as MAC ‘8c:00:00:00:00:00 if you run ‘ip a’), you can fix this to avoid collisions if you install multiple P550’s on the same network. Connect the Debug USB-C connector on the back to one of the hosts USB-A slots. Use minicom (use Ctrl-A X to exit) to talk to it.

apt-get install minicom
minicom -o -D /dev/ttyUSB3
#cmd: ifconfig
inet 192.168.0.2  netmask: 255.255.240.0
gatway 192.168.0.1
SOM_Mac0: 8c:00:00:00:00:00
SOM_Mac1: 8c:00:00:00:00:00
MCU_Mac:  8c:00:00:00:00:00
#cmd: setmac 0 CA:FE:42:17:23:00
The MAC setting will be valid after rebooting the carrier board!!!
MAC[0] addr set to CA:FE:42:17:23:00(ca:fe:42:17:23:0)
#cmd: setmac 1 CA:FE:42:17:23:01
The MAC setting will be valid after rebooting the carrier board!!!
MAC[1] addr set to CA:FE:42:17:23:01(ca:fe:42:17:23:1)
#cmd: setmac 2 CA:FE:42:17:23:02
The MAC setting will be valid after rebooting the carrier board!!!
MAC[2] addr set to CA:FE:42:17:23:02(ca:fe:42:17:23:2)
#cmd:

For reference, if you wish to interact with the MCU you may do that via OpenOCD and telnet, like the following (as root on the P550). You need to have the Debug USB-C connected to a USB-A host port.

apt-get install openocd
wget https://raw.githubusercontent.com/sifiveinc/hifive-premier-p550-tools/refs/heads/master/mcu-firmware/stm32_openocd.cfg
echo 'acc115d283ff8533d6ae5226565478d0128923c8a479a768d806487378c5f6c3  stm32_openocd.cfg' | sha256sum -c
openocd -f stm32_openocd.cfg &
telnet localhost 4444
...

Reboot the machine and login remotely from your laptop. Gain root and set up SSH public-key authentication and disable SSH password logins.

echo 'ssh-ed25519 AAA...' > ~/.ssh/authorized_keys
sed -i 's;^#PasswordAuthentication.*;PasswordAuthentication no;' /etc/ssh/sshd_config
service ssh restart

With a NVME device in the PCIe slot, create a LVM partition where the GitLab runner will live:

parted /dev/nvme0n1 print
blkdiscard /dev/nvme0n1
parted /dev/nvme0n1 mklabel gpt
parted /dev/nvme0n1 mkpart jas-p550-nvm-02 ext2 1MiB 100% align-check optimal 1
parted /dev/nvme0n1 set 1 lvm on
partprobe /dev/nvme0n1
pvcreate /dev/nvme0n1p1
vgcreate vg0 /dev/nvme0n1p1
lvcreate -L 400G -n glr vg0
mkfs.ext4 -L glr /dev/mapper/vg0-glr

Now with a reasonable setup ready, let’s install the GitLab Runner. The following is adapted from gitlab-runner’s official installation instructions documentation. The normal installation flow doesn’t work because they don’t publish riscv64 apt repositories, so you will have to perform upgrades manually.

# wget https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/deb/gitlab-runner_riscv64.deb
# wget https://s3.dualstack.us-east-1.amazonaws.com/gitlab-runner-downloads/latest/deb/gitlab-runner-helper-images.deb
wget https://gitlab-runner-downloads.s3.amazonaws.com/v17.11.0/deb/gitlab-runner_riscv64.deb
wget https://gitlab-runner-downloads.s3.amazonaws.com/v17.11.0/deb/gitlab-runner-helper-images.deb
echo '68a4c2a4b5988a5a5bae019c8b82b6e340376c1b2190228df657164c534bc3c3  gitlab-runner-helper-images.deb' | sha256sum -c
echo 'ee37dc76d3c5b52e4ba35cf8703813f54f536f75cfc208387f5aa1686add7a8c  gitlab-runner_riscv64.deb' | sha256sum -c
dpkg -i gitlab-runner-helper-images.deb gitlab-runner_riscv64.deb

Remember the NVMe device? Let’s not forget to use it, to avoid wear and tear of the internal MMC root disk. Do this now before any files in /home/gitlab-runner appears, or you have to move them manually.

gitlab-runner stop
echo 'LABEL=glr /home/gitlab-runner ext4 defaults,noatime 0 1' >> /etc/fstab
systemctl daemon-reload
mount /home/gitlab-runner

Next install gitlab-runner and configure it. Replace token glrt-REPLACEME below with the registration token you get from your GitLab project’s Settings -> CI/CD -> Runners -> New project runner. I used the tags ‘riscv64‘ and a runner description of the hostname.

gitlab-runner register --non-interactive --url https://gitlab.com --token glrt-REPLACEME --name $(hostname) --executor docker --docker-image debian:stable

We install and configure gitlab-runner to use podman, and to use non-root user.

apt-get install podman
gitlab-runner stop
usermod --add-subuids 100000-165535 --add-subgids 100000-165535 gitlab-runner

You need to run some commands as the gitlab-runner user, but unfortunately some interaction between sudo/su and pam_systemd makes this harder than it should be. So you have to setup SSH for the user and login via SSH to run the commands. Does anyone know of a better way to do this?

# on the p550:
cp -a /root/.ssh/ /home/gitlab-runner/
chown -R gitlab-runner:gitlab-runner /home/gitlab-runner/.ssh/
# on your laptop:
ssh gitlab-runner@jas-p550-01
systemctl --user --now enable podman.socket
systemctl --user --now start podman.socket
loginctl enable-linger gitlab-runner gitlab-runner
systemctl status --user podman.socket

We modify /etc/gitlab-runner/config.toml as follows, replace 997 with the user id shown by systemctl status above. See feature flags documentation for more documentation.

...
[[runners]]
  environment = ["FF_NETWORK_PER_BUILD=1", "FF_USE_FASTZIP=1"]
...
  [runners.docker]
        host = "unix:///run/user/997/podman/podman.sock"

Note that unlike the documentation I do not add the ‘privileged = true‘ parameter here. I will come back to this later.

Restart the system to confirm that pushing a .gitlab-ci.yml with a job that uses the riscv64 tag like the following works properly.

dump-env-details-riscv64:
  stage: build
  image: riscv64/debian:testing
  tags: [ riscv64 ]
  script:
    - set

Your gitlab-runner should now be receiving jobs and running them in rootless podman. You may view the log using journalctl as follows:

journalctl --follow _SYSTEMD_UNIT=gitlab-runner.service

To stop the graphical environment and disable some unnecessary services, you can use:

systemctl set-default multi-user.target
systemctl disable openvpn cups cups-browsed sssd colord

At this point, things were working fine and I was running many successful builds. Now starts the fun part with operational aspects!

I had a problem when running buildah to build a new container from within a job, and noticed that aardvark-dns was crashing. You can use the Debian ‘aardvark-dns‘ binary instead.

wget http://ftp.de.debian.org/debian/pool/main/a/aardvark-dns/aardvark-dns_1.14.0-3_riscv64.deb
echo 'df33117b6069ac84d3e97dba2c59ba53775207dbaa1b123c3f87b3f312d2f87a  aardvark-dns_1.14.0-3_riscv64.deb' | sha256sum -c
mkdir t
cd t
dpkg -x ../aardvark-dns_1.14.0-3_riscv64.deb .
mv /usr/lib/podman/aardvark-dns /usr/lib/podman/aardvark-dns.ubuntu
mv usr/lib/podman/aardvark-dns /usr/lib/podman/aardvark-dns.debian

My setup uses podman in rootless mode without passing the –privileged parameter or any –add-cap parameters to add non-default capabilities. This is sufficient for most builds. However if you try to create container using buildah from within a job, you may see errors like this:

Writing manifest to image destination
Error: mounting new container: mounting build container "8bf1ec03d967eae87095906d8544f51309363ddf28c60462d16d73a0a7279ce1": creating overlay mount to /var/lib/containers/storage/overlay/23785e20a8bac468dbf028bf524274c91fbd70dae195a6cdb10241c345346e6f/merged, mount_data="lowerdir=/var/lib/containers/storage/overlay/l/I3TWYVYTRZ4KVYCT6FJKHR3WHW,upperdir=/var/lib/containers/storage/overlay/23785e20a8bac468dbf028bf524274c91fbd70dae195a6cdb10241c345346e6f/diff,workdir=/var/lib/containers/storage/overlay/23785e20a8bac468dbf028bf524274c91fbd70dae195a6cdb10241c345346e6f/work,volatile": using mount program /usr/bin/fuse-overlayfs: unknown argument ignored: lazytime
fuse: device not found, try 'modprobe fuse' first
fuse-overlayfs: cannot mount: No such file or directory
: exit status 1

According to GitLab runner security considerations, you should not enable the ‘privileged = true’ parameter, and the alternative appears to run Podman as root with privileged=false. Indeed setting privileged=true as in the following example solves the problem, and I suppose running podman as root would too.

[[runners]]
    [runners.docker]
        privileged = true

Can we do better? After some experimentation, and reading open issues with suggested capabilities and configuration snippets, I ended up with the following configuration. It runs podman in rootless mode (as the gitlab-runner user) without --privileged, but add the CAP_SYS_ADMIN capability and exposes the /dev/fuse device. Still, this is running as non-root user on the machine, so I think it is an improvement compared to using --privileged and also compared to running podman as root.

[[runners]]
  [runners.docker]
    privileged = false
    cap_add = ["SYS_ADMIN"]
    devices = ["/dev/fuse"]

Still I worry about the security properties of such a setup, so I only enable these settings for a separately configured runner instance that I use when I need this docker-in-docker (oh, I meant buildah-in-podman) functionality. I found one article discussing Rootless Podman without the privileged flag that suggest –isolation=chroot but I have yet to make this work. Suggestions for improvement are welcome.

Happy Riscv64 Building!

Update 2025-05-05: I was able to make it work without the SYS_ADMIN capability too, with a GitLab /etc/gitlab-runner/config.toml like the following:

[[runners]]
  [runners.docker]
    privileged = false
    devices = ["/dev/fuse"]

And passing --isolation chroot to Buildah like this:

buildah build --isolation chroot -t $CI_REGISTRY_IMAGE:name image/

I’ve updated the blog title to add the word “capability-less” as well. I’ve confirmed that the same recipe works on podman on a ppc64el platform too. Remaining loop-holes are escaping from the chroot into the non-root gitlab-runner user, and escalating that privilege to root. The /dev/fuse and sub-uid/gid may be privilege escalation vectors here, otherwise I believe you’ve found a serious software security issue rather than a configuration mistake.

Trisquel is 42% Reproducible!

Posted on 2023-04-10 by simon

The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.

tl;dr: go to reproduce-trisquel.

When I set about to understand how Trisquel worked, I identified a number of things that would improve my confidence in it. The lowest hanging fruit for me was to manually audit the package archive, and I wrote a tool called debdistdiff to automate this for me. That led me to think about apt archive transparency more in general. I have made some further work in that area (hint: apt-verify) that deserve its own blog post eventually. Most of apt archive transparency is futile if we don’t trust the intended packages that are in the archive. One way to measurable increase trust in the package are to provide reproducible builds of the packages, which should by now be an established best practice. Code review is still important, but since it will never provide positive guarantees we need other processes that can identify sub-optimal situations automatically. The way reproducible builds easily identify negative results is what I believe has driven much of its success: its results are tangible and measurable. The field of software engineering is in need of more such practices.

The design of my setup to build Trisquel reproducible are as follows.

The project debdistget is responsible for downloading Release/Packages files (which are the most relevant files from dists/) from apt archives, and works by commiting them into GitLab-hosted git-repositories. I maintain several such repositories for popular apt-archives, including for Trisquel and its upstream Ubuntu. GitLab invokes a schedule pipeline to do the downloading, and there is some race conditions here.
The project debdistdiff is used to produce the list of added and modified packages, which are the input to actually being able to know what packages to reproduce. It publishes human readable summary of difference for several distributions, including Trisquel vs Ubuntu. Early on I decided that rebuilding all of the upstream Ubuntu packages is out of scope for me: my personal trust in the official Debian/Ubuntu apt archives are greater than my trust of the added/modified packages in Trisquel.
The final project reproduce-trisquel puts the pieces together briefly as follows, everything being driven from its .gitlab-ci.yml file.
- There is a (manually triggered) job generate-build-image to create a build image to speed up CI/CD runs, using a simple Dockerfile.
- There is a (manually triggered) job generate-package-lists that uses debdistdiff to generate and store package lists and puts its output in lists/. The reason this is manually triggered right now is due to a race condition.
- There is a (scheduled) job that does two things: from the package lists, the script generate-ci-packages.sh builds a GitLab CI/CD instruction file ci-packages.yml that describes jobs for each package to build. The second part is generate-readme.sh that re-generate the project’s README.md based on the build logs and diffoscope outputs that stored in the git repository.
- Through the ci-packages.yml file, there is a large number of jobs that are dynamically defined, which currently are manually triggered to not overload the build servers. The script build-package.sh is invoked and attempts to rebuild a package, and stores build log and diffoscope output in the git project itself.

I did not expect to be able to use the GitLab shared runners to do the building, however they turned out to work quite well and I postponed setting up my own runner. There is a manually curated lists/disabled-aramo.txt with some packages that all required too much disk space or took over two hours to build. Today I finally took the time to setup a GitLab runner using podman running Trisquel aramo, and I expect to complete builds of the remaining packages soon — one of my Dell R630 server with 256GB RAM and dual 2680v4 CPUs should deliver sufficient performance.

Current limitations and ideas on further work (most are filed as project issues) include:

We don’t support *.buildinfo files. As far as I am aware, Trisquel does not publish them for their builds. Improving this would be a first step forward, anyone able to help? Compare buildinfo.debian.net. For example, many packages differ only in their NT_GNU_BUILD_ID symbol inside the ELF binary, see example diffoscope output for libgpg-error. By poking around in jenkins.trisquel.org I managed to discover that Trisquel built initramfs-utils in the randomized path /build/initramfs-tools-bzRLUp and hard-coding that path allowed me to reproduce that package. I expect the same to hold for many other packages. Unfortunately, this failure turned into success with that package moved the needle from 42% reproducibility to 43% however I didn’t let that stand in the way of a good headline.
The mechanism to download the Release/Package-files from dists/ is not fool-proof: we may not capture all ever published such files. While this is less of a concern for reproducibility, it is more of a concern for apt transparency. Still, having Trisquel provide a service similar to snapshot.debian.org would help.
Having at least one other CPU architecture would be nice.
Due to lack of time and mental focus, handling incremental updates of new versions of packages is not yet working. This means we only ever build one version of a package, and never discover any newly published versions of the same package. Now that Trisquel aramo is released, the expected rate of new versions should be low, but still happens due to security or backports.
Porting this to test supposedly FSDG-compliant distributions such as PureOS and Gnuinos should be relatively easy. I’m also looking at Devuan because of Gnuinos.
The elephant in the room is how reproducible Ubuntu is in the first place.

Happy Easter Hacking!

Update 2023-04-17: The original project “reproduce-trisquel” that was announced here has been archived and replaced with two projects, one generic “debdistreproduce” and one with results for Trisquel: “reproduce/trisquel“.

Understanding Trisquel

Posted on 2023-01-22 by simon

Ever wondered how Trisquel and Ubuntu differs and what’s behind the curtain from a developer perspective? I have. Sharing what I’ve learnt will allow you to increase knowledge and trust in Trisquel too.

Trisquel GNU/Linux logo

The scripts to convert an Ubuntu archive into a Trisquel archive are available in the ubuntu-purge repository. The easy to read purge-focal script lists the packages to remove from Ubuntu 20.04 Focal when it is imported into Trisquel 10.0 Nabia. The purge-jammy script provides the same for Ubuntu 22.04 Jammy and (the not yet released) Trisquel 11.0 Aramo. The list of packages is interesting, and by researching the reasons for each exclusion you can learn a lot about different attitudes towards free software and understand the desire to improve matters. I wish there were a wiki-page that for each removed package summarized relevant links to earlier discussions. At the end of the script there is a bunch of packages that are removed for branding purposes that are less interesting to review.

Trisquel adds a couple of Trisquel-specific packages. The source code for these packages are in the trisquel-packages repository, with sub-directories for each release: see 10.0/ for Nabia and 11.0/ for Aramo. These packages appears to be mostly for branding purposes.

Trisquel modify a set of packages, and here is starts to get interesting. Probably the most important package to modify is to use GNU Linux-libre instead of Linux as the kernel. The scripts to modify packages are in the package-helpers repository. The relevant scripts are in the helpers/ sub-directory. There is a branch for each Trisquel release, see helpers/ for Nabia and helpers/ for Aramo. To see how Linux is replaced with Linux-libre you can read the make-linux script.

This covers the basic of approaching Trisquel from a developers perspective. As a user, I have identified some areas that need more work to improve trust in Trisquel:

Auditing the Trisquel archive to confirm that the intended changes covered above are the only changes that are published.
Rebuild all packages that were added or modified by Trisquel and publish diffoscope output comparing them to what’s in the Trisquel archive. The goal would be to have reproducible builds of all Trisquel-related packages.
Publish an audit log of the Trisquel archive to allow auditing of what packages are published. This boils down to trust of the OpenPGP key used to sign the Trisquel archive.
Trisquel archive mirror auditing to confirm that they are publishing only what comes from the official archive, and that they do so timely.

I hope to publish more about my work into these areas. Hopefully this will inspire similar efforts in related distributions like PureOS and the upstream distributions Ubuntu and Debian.

Happy hacking!

Trisquel 11 on NV41PZ: First impressions

Posted on 2022-12-10 by simon

My NovaCustom NV41PZ laptop arrived a couple of days ago, and today I had some time to install it. You may want to read about my purchasing decision process first. I expected a rough ride to get it to work, given the number of people claiming that modern laptops can’t run fully free operating systems. I first tried the Trisquel 10 live DVD and it booted fine including network, but the mouse trackpad did not work. Before investigating it, I noticed a forum thread about Trisquel 11 beta3 images, and being based on Ubuntu 22.04 LTS and has Linux-libre 5.15 it seemed better to start with more modern software. After installing through the live DVD successfully, I realized I didn’t like MATE but wanted to keep using GNOME. I reverted back to installing a minimal environment through the netinst image, and manually installed GNOME (apt-get install gnome) since I prefer that over MATE, together with a bunch of other packages. I’ve been running it for a couple of hours now, and here is a brief summary of the hardware components that works.

CPU	Alder Lake Intel i7-1260P
Memory	2x32GB Kingston DDR4 SODIMM 3200MHz
Storage	Samsung 980 Pro 2TB NVME
BIOS	Dasharo Coreboot
Graphics	Intel Xe
Screen (internal)	14″ 1920×1080
Screen (HDMI)	Dell 27″ 2560×1440 and Ben-Q PD3220U 3840×1260 works fine
Screen (USB-C)	Via Wavlink USB-C/HDMI port extender: Dell 27″ 2560×1440 and Ben-Q PD3220U 3840×1260
Webcam	Builtin 1MP Camera
Microphone	Intel Alder Lake
Keyboard	ISO layout, all function keys working
Mouse	Trackpad, tap clicking and gestures
Ethernet RJ45	Realtek RTL8111/8168/8411 with r8169 driver
Memory card	O2 Micro comes up as /dev/mmcblk0
Docking station	Wavlink 4xUSB, 2xHDMI, DP, RJ45, …
Connectivity	USB-A, USB-C
Audio	Intel Alder Lake

Hardware components and status

So what’s not working? Unfortunately, NovaCustom does not offer any WiFi or Bluetooth module that is compatible with Trisquel, so the AX211 (1675x) Wifi/Bluetooth card in it is just dead weight. I imagine it would be possible to get the card to work if non-free firmware is loaded. I don’t need Bluetooth right now, and use the Technoetic N-150 USB WiFi dongle when I’m not connected to wired network.

Compared against my X201, the following factors have improved.

Faster – CPU benchmark suggests it is 8 times faster than my old i7-620M. While it feels snappier it is not a huge difference. While NVMe should improve SSD performance, benchmark wise the NVMe 980Pro only seems around 2-3 faster than the SATA-based 860 Evo. Going from 6GB to 64GB is 10 times more memory, which is useful for disk caching.
BIOS is free software.
EC firmware is free.
Operating system follows the FSDG.

I’m still unhappy about the following properties with both the NV41PZ and the X201.

CPU microcode is not available under free license.
Intel Mangement Engine is still present in the CPU.
No builtin WiFi/Bluetooth that works with free software.
Some other secondary processors (e.g., disk or screen) may be running non-free software but at least none requires non-free firmware.

Hopefully my next laptop will have improved on this further. I hope to be able to resolve the WiFi part by replacing the WiFi module, there appears to be options available but I have not tested them on this laptop yet. Does anyone know of a combined WiFi and Bluetooth M.2 module that would work on Trisquel?

While I haven’t put the laptop to heavy testing yet, everything that I would expect a laptop to be able to do seems to work fine. Including writing this blog post!

Towards pluggable GSS-API modules

Posted on 2022-07-14 by simon

GSS-API is a standardized framework that is used by applications to, primarily, support Kerberos V5 authentication. GSS-API is standardized by IETF and supported by protocols like SSH, SMTP, IMAP and HTTP, and implemented by software projects such as OpenSSH, Exim, Dovecot and Apache httpd (via mod_auth_gssapi). The implementations of Kerberos V5 and GSS-API that are packaged for common GNU/Linux distributions, such as Debian, include MIT Kerberos, Heimdal and (less popular) GNU Shishi/GSS.

When an application or library is packaged for a GNU/Linux distribution, a choice is made which GSS-API library to link with. I believe this leads to two problematic consequences: 1) it is difficult for end-users to chose between Kerberos implementation, and 2) dependency bloat for non-Kerberos users. Let’s discuss these separately.

No system admin or end-user choice over the GSS-API/Kerberos implementation used

There are differences in the bug/feature set of MIT Kerberos and that of Heimdal’s, and definitely that of GNU Shishi. This can lead to a situation where an application (say, Curl) is linked to MIT Kerberos, and someone discovers a Kerberos related problem that would have been working if Heimdal was used, or vice versa. Sometimes it is possible to locally rebuild a package using another set of dependencies. However doing so has a high maintenance cost to track security fixes in future releases. It is an unsatisfying solution for the distribution to flip flop between which library to link to, depending on which users complain the most. To resolve this, a package could be built in two variants: one for MIT Kerberos and one for Heimdal. Both can be shipped. This can help solve the problem, but the question of which variant to install by default leads to similar concerns, and will also eventually leads to dependency conflicts. Consider an application linked to libraries (possible in several steps) where one library only supports MIT Kerberos and one library only supports Heimdal.

The fact remains that there will continue to be multiple Kerberos implementations. Distributions will continue to support them, and will be faced with the dilemma of which one to link to by default. Distributions and the people who package software will have little guidance on which implementation to chose from their upstream, since most upstream support both implementations. The result is that system administrators and end-users are not given a simple way to have flexibility about which implementation to use.
Dependency bloat for non-Kerberos use-cases.

Compared to the number of users of GNU/Linux systems out there, the number of Kerberos users on GNU/Linux systems is smaller. Here distributions face another dilemma. Should they enable GSS-API for all applications, to satisfy the Kerberos community, or should they be conservative with adding dependencies to reduce attacker surface for the non-Kerberos users? This is a dilemma with no clear answer, and one approach has been to ship two versions of a package: one with Kerberos support and one without. Another option here is for upstream to support loadable modules, for example Dovecot implement this and Debian ship with a separate ‘dovecot-gssapi’ package that extend the core Dovecot seamlessly. Few except some larger projects appear to be willing to carry that maintenance cost upstream, so most only support build-time linking of the GSS-API library.

There are a number of real-world situations to consider, but perhaps the easiest one to understand for most GNU/Linux users is OpenSSH. The SSH protocol supports Kerberos via GSS-API, and OpenSSH implement this feature, and most GNU/Linux distributions ship a SSH client and SSH server linked to a GSS-API library. Someone made the choice of linking it to a GSS-API library, for the arguable smaller set of people interested in it, and also the choice which library to link to. Rebuilding OpenSSH locally without Kerberos support comes with a high maintenance cost. Many people will not need or use the Kerberos features of the SSH client or SSH server, and having it enabled by default comes with a security cost. Having a vulnerability in OpenSSH is critical for many systems, and therefor its dependencies are a reasonable concern. Wouldn’t it be nice if OpenSSH was built in a way that didn’t force you to install MIT Kerberos or Heimdal? While still making it easy for Kerberos users to use it, of course.

Hopefully I have made the problem statement clear above, and that I managed to convince you that the state of affairs is in need of improving. I learned of the problems from my personal experience with maintaining GNU SASL in Debian, and for many years I ignored this problem.

Let me introduce Libgssglue!

Matryoshka Dolls – photo CC-4.0-BY-NC by PngAll

Libgssglue is a library written by Kevin W. Coffman based on historical GSS-API code, the initial release was in 2004 (using the name libgssapi) and the last release was in 2012. Libgssglue provides a minimal GSS-API library and header file, so that any application can link to it instead of directly to MIT Kerberos or Heimdal (or GNU GSS). The administrator or end-user can select during run-time which GSS-API library to use, through a global /etc/gssapi_mech.conf file or even a local GSSAPI_MECH_CONF environment variable. Libgssglue is written in C, has no external dependencies, and is BSD-style licensed. It was developed for the CITI NFSv4 project but libgssglue ended up not being used.

I have added support to build GNU SASL with libgssglue — the changes required were only ./configure.ac-related since GSS-API is a standardized framework. I have written a fairly involved CI/CD check that builds GNU SASL with MIT Kerberos, Heimdal, libgssglue and GNU GSS, sets ups a local Kerberos KDC and verify successful GSS-API and GS2-KRB5 authentications. The ‘gsasl’ command line tool connects to a local example SMTP server, also based on GNU SASL (linked to all variants of GSS-API libraries), and to a system-installed Dovecot IMAP server that use the MIT Kerberos GSS-API library. This is on Debian but I expect it to be easily adaptable to other GNU/Linux distributions. The check triggered some (expected) Shishi/GSS-related missing features, and triggered one problem related to authorization identities that may be a bug in GNU SASL. However, testing shows that it is possible to link GNU SASL with libgssglue and have it be operational with any choice of GSS-API library that is shipped with Debian. See GitLab CI/CD code and its CI/CD output.

This experiment worked so well that I contacted Kevin to learn that he didn’t have any future plans for the project. I have adopted libgssglue and put up a Libgssglue GitLab project page, and pushed out a libgssglue 0.5 release fixing only some minor build-related issues. There are still some missing newly introduced GSS-API interfaces that could be added, but I haven’t been able to find any critical issues with it. Amazing that an untouched 10 year old project works so well!

My current next steps are:

Release GNU SASL with support for Libgssglue and encourage its use in documentation.
Make GNU SASL link to Libgssglue in Debian, to avoid a hard dependency on MIT Kerberos, but still allowing a default out-of-the-box Kerberos experience with GNU SASL.
Maintain libgssglue upstream and implement self-checks, CI/CD testing, new GSS-API interfaces that have been defined, and generally fix bugs and improve the project. Help appreciated!
Maintain the libgssglue package in Debian.
Look into if there are applications in Debian that link to a GSS-API library that could instead be linked to libgssglue to allow flexibility for the end-user and reduce dependency bloat.

What do you think? Happy Hacking!