Debian Libre Live 13.3.0 is released!

Following up on my initial announcement about Debian Libre Live, I am happy to report continued progress and the release of Debian Libre Live version 13.3.0.

Since both this and the previous 13.2.0 release are based on the stable Debian trixie release, there aren’t a lot of major changes, but rather incremental minor progress on the installation process. Repeated installations have a tendency to reveal bugs, and we have resolved the apt sources list confusion for Calamares-based installations along with a couple of other nits. This release is more polished, and we are not aware of any remaining issues (unlike earlier versions, which were released with known problems), although we conservatively regard the project as still in beta. A Debian Libre Live logo is needed before marking this as stable; any graphically talented takers? (Please base it on the upstream Debian SVG logo.)

We provide GNOME, KDE, and XFCE desktop images, as well as a text-only “standard” image, matching the regular Debian Live images that ship non-free software. We also provide a “slim” variant which is merely 750MB compared to the 1.9GB “standard” image. The slim image can still start the Debian installer and still boot into a minimal text-based live system.

The GNOME, KDE and XFCE desktop images feature the Calamares installer, and we have tested them on a variety of machines. The standard and slim images do not offer an installer from the running live system, but all images have a boot menu entry to start the installer.

With this release we also extend our arm64 support to two tested platforms. The current list of successfully installed and supported systems now includes the following hardware:

This is a very limited set of machines, but the diversity in CPUs and architectures should hopefully translate well to a wide variety of commonly available machines. Several of these machines are crippled (usually GPU or WiFi) without adding non-free software; complain to your hardware vendor and adapt your use cases and future purchases accordingly.

The images are as follows, with SHA256 checksums and GnuPG signatures on the 13.3.0 release page.

Curious how the images were made? Fear not: the Debian Libre Live project README has documentation, the run.sh script is short, and the .gitlab-ci.yml CI/CD pipeline definition is brief.

Happy Libre OS hacking!

Debian Taco – Towards a GitSecDevOps Debian

One of my holiday projects was to understand and gain more trust in how Debian binaries are built, and as the holidays are coming to an end, I’d like to introduce a new research project called Debian Taco. I apparently need more holidays, because there is still more work to be done here, so at the end I’ll summarize some of the pending work.

Debian Taco, or TacOS, is a GitSecDevOps rebuild of Debian GNU/Linux.

The Debian Taco project publishes rebuilt binary packages, package repository metadata (InRelease, Packages, etc.), container images, cloud images and live images.

All packages are built from pristine source packages in the Debian archive. Debian Taco does not modify any Debian source code nor add or remove any packages found in Debian.

No servers are involved! Everything is built in GitLab pipelines, and results are published through modern GitDevOps mechanisms like GitLab Pages and S3 object storage. You can fork the individual projects below on GitLab.com and you will have your own Debian-derived OS available for tweaking. (Of course, at some level, servers are always involved, so this claim is a bit of hyperbole.)

Goals

The goal of TacOS is to be bit-by-bit identical with official Debian GNU/Linux and, until that has been achieved, to publish diffoscope output showing the differences.

The idea is to further classify every artifact difference into one of the following categories:

1) An obvious bug in Debian. For example, if a package does not build reproducibly.

2) An obvious bug in TacOS. For example, if our build environment does not manage to build a package.

3) Something else. This would be input for further research and consideration. This category also includes things where it isn’t obvious whether the bug is in Debian or in TacOS. Known examples:

3A) Packages in TacOS are rebuilt from the latest available source code, not the (potentially) older sources that were used to build the official Debian packages. This could lead to differences in the resulting packages. These differences may be useful to analyze in order to identify supply-chain attacks. See some discussion about idempotent rebuilds.

Our packages are all built from source code, unless we have not yet managed to build something. In the latter situation, Debian Taco falls back to using the official Debian artifact. This allows incremental publication of Debian Taco that is still 100% complete, without requiring that everything be rebuilt instantly. The goal is that everything should be rebuilt and, until that has been achieved, to publish a list of artifacts that we use verbatim from Debian.

Debian Taco Archive

The Debian Taco Archive project generates and publishes the package archive (dists/tacos-trixie/InRelease, dists/tacos-trixie/main/binary-amd64/Packages.gz, pool/*, etc.), similar to what is published at https://deb.debian.org/debian/.

The output of the Debian Taco Archive is available from https://debdistutils.gitlab.io/tacos/archive/.
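
If you want to experiment with it, pointing apt at the archive could look something like the sketch below; the keyring path is an assumption, so fetch and verify the actual archive signing key from the project first.

# Sketch only: the keyring path is hypothetical, obtain and verify the real
# archive signing key from the Debian Taco Archive project first.
echo 'deb [signed-by=/usr/share/keyrings/tacos-archive-keyring.gpg] https://debdistutils.gitlab.io/tacos/archive/ tacos-trixie main' \
  > /etc/apt/sources.list.d/tacos.list
apt-get update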

Debian Taco Container Images

The Debian Taco Container Images project provides container images of Debian Taco for trixie, forky and sid on the amd64, arm64, ppc64el and riscv64 architectures.

These images allow quick and simple interactive use of Debian Taco, and also make it easy to deploy it in container orchestration frameworks.
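
As a sketch of interactive use, something like the following should work; the registry path and tag below are my guesses, so check the Debian Taco Container Images project for the actual names.

# Hypothetical registry path and tag; see the project page for the real ones.
podman run --rm -it registry.gitlab.com/debdistutils/tacos/container:trixie /bin/bash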

Debian Taco Cloud Images

The Debian Taco Cloud Images project provides cloud images of Debian Taco for trixie, forky and sid on the amd64, arm64, ppc64el and riscv64 architectures.

Launch and install Debian Taco in your cloud environment!

Debian Taco Live Images

The Debian Taco Live Images project provides live images of Debian Taco for trixie, forky and sid on the amd64 and arm64 architectures.

These images allow running Debian Taco on physical hardware (or virtual machines), and even installing it for permanent use.
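
Writing a live image to a USB stick follows the usual pattern; a sketch with a hypothetical image filename, and double-check the target device before running it.

# The image filename is hypothetical; /dev/sdX must be your USB stick, not a system disk.
dd if=debian-taco-live-trixie-amd64.iso of=/dev/sdX bs=4M status=progress oflag=sync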

Debian Taco Build Images and Packages

Packages are built using debdistbuild, which was introduced in an earlier blog post, Build Debian in a GitLab Pipeline.

The first step is to prepare build images, which is done by the Debian Taco Build Images project. They are similar to the Debian Taco containers but have build-essential and debdistbuild installed on them.

Debdistbuild is launched in a per-architecture, per-suite CI/CD project. Currently only trixie-amd64 is available. That project has built some essential early packages like base-files, debian-archive-keyring and hostname. They are stored in Git LFS backed by S3 object storage. These packages were all built reproducibly, which means Debian Taco is still 100% bit-by-bit identical to Debian, except for the renaming.

I have held off on a more massive, wide-scale package rebuild until some outstanding issues have been resolved. I earlier rebuilt around 7000 packages from trixie on amd64, so I know that the method scales easily.

Remaining work

Where are the diffoscope outputs and the list of package differences? That’s for another holiday! Clearly this is an important remaining work item.

Another important outstanding issue is how to orchestrate launching the build of all packages. Clearly a list of packages is needed, and some trigger mechanism to understand when new packages are added to Debian.

One goal was to build packages from the tag2upload browse.dgit.debian.org archive, before checking the Debian Archive. This ought to be really simple to implement, but other matters came first.

GitLab or Codeberg?

Everything is written using basic POSIX /bin/sh shell scripts. Debian Taco uses the GitLab CI/CD pipeline mechanism together with Hetzner S3 object storage to serve packages. The scripts have only a weak reliance on GitLab-specific features and were designed with the intention of supporting other platforms. I believe reliance on a particular CI/CD platform is a limitation, so I’d like to explore shipping Debian Taco through a Forgejo-based architecture, possibly via Codeberg, as soon as I manage to deploy reliable Forgejo runners.

The important requirements are:

1) Pipelines that can build and publish web sites, similar to GitLab Pages. Codeberg has a pipeline mechanism, and I’ve successfully used Codeberg Pages to publish the OATH Toolkit homepage. Gluing this together seems feasible.

2) Container Registry. It seems Forgejo supports a Container Registry but I’ve not worked with it at Codeberg to understand if there are any limitations.

3) Package Registry. The Debian Taco live images are uploaded into a package registry, because they are too big to be served through GitLab Pages. This may be converted to a Pages mechanism, or possibly to Release Artifacts, if multi-GB artifacts are supported on other platforms.

I hope to continue this work and explain more details in a series of posts; stay tuned!

Reproducible Guix Container Images

Around a year ago I wrote about Guix Container Images for GitLab CI/CD, and these images have served the community well. Besides continuous use in CI/CD, these Guix container images are used to confirm reproducibility of the source tarball artifacts in the releases of Libtasn1 v4.20, InetUtils v2.6, Libidn2 v2.3.8, Libidn v1.43, SASL v2.2.2, Guile-GnuTLS v5.0.1, and OATH Toolkit v2.6.13. See how all those release announcements mention a Guix commit? That’s the essential supply-chain information about the Guix build environment that allows the artifacts to be re-created. To make sure this is repeatable, the release tarball artifacts are re-created from source code every week in the verify-reproducible-artifacts project, which I wrote about earlier. Guix’s time-travelling feature makes this sustainable to maintain, and hopefully we will continue to be able to reproduce the exact same tarball artifacts for years to come.

During the last year, Guix was unfortunately removed from Debian stable. My Guix container images were created from Debian with that Guix package. My setup continued to work since the old stage0 Debian+Guix containers were still available. Such a setup is not sustainable, as there will be bit-rot, and we don’t want to rely forever on old containers which (after the removal of Guix from Debian) could not be reproduced any more. Let this be a reminder of how user-empowering a feature like Guix time-travelling is! I have reworked my Guix container image setup, and this post is an update on the current status of this effort.

The first step was to re-engineer the Debian container images with Guix, and I realized these were useful on their own and warrant a separate project. A more narrowly scoped project will hopefully make it easier to keep them working. Now, instead of apt-get install guix, they use the official Guix guix-install.sh approach. Read more about that effort in the announcement of Debian with Guix.

The second step was to reconsider my approach to generating the Guix images. The earlier design had several stages. First, Debian+Guix containers were created. Then, from those containers, a pure Guix container was created. Finally, using the pure Guix container, another pure Guix container was created. The idea behind that GCC-like bootstrap approach was to get reproducible images created from an image that had no Debian left on it. However, I never managed to finish this, partially because I hadn’t realized that every time you build a Guix container image from Guix, you effectively go back in time. When using Guix version X to build a container with Guix on it, it will not put Guix version X into the container but whatever version of Guix is available in its package archive, which will be an earlier version, such as version X-N. I had hoped to overcome this somehow (running a guix pull in newly generated images may work), but never finished this before Guix was removed from Debian.

So what could a better design look like?

For efficiency, I had already started experimenting with generating the final images directly from the Debian+Guix images, and after some reproducibility bugs were fixed I was able to get reproducible images. However, I was still concerned that the Debian container could taint the process somehow, and was also concerned about the implied dependency on non-free software in Debian.

I’ve been using comparative rebuilds on “similar” distributions to confirm artifact reproducibility for my software projects, comparing builds on Trisquel 11 with Ubuntu 22.04, and AlmaLinux 9 with Rocky Linux 9, for example. This works surprisingly well. Including one freedom-respecting distribution like Trisquel will detect whether any non-free software has a bearing on the artifacts. Using different architectures, such as amd64 vs arm64, also helps with deeper supply-chain concerns.
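
The comparison itself is nothing fancy; a sketch with placeholder artifact paths, assuming two CI jobs built the same release tarball on the two distributions:

# Placeholder paths: the same tarball as built by a Trisquel 11 job and an Ubuntu 22.04 job.
sha256sum trisquel11/libfoo-1.0.tar.gz ubuntu2204/libfoo-1.0.tar.gz | cut -f 1 -d ' ' | uniq -c
# A single output line with count 2 means the tarballs are identical;
# two lines means the build is not reproducible across the distributions.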

My conclusion was that I wanted containers with the same Guix commit for both Trisquel and Ubuntu. Given the similarity with Debian, adapting and launching the Guix on Trisquel/Debian project was straightforward. So we now have Trisquel 11/12 and Ubuntu 22.04/24.04 images with the same Guix on them.

Do you see where the debian-with-guix and guix-on-dpkg projects are leading?

We are now ready to look at the modernized Guix Container Images project. The tags are the same as before:

registry.gitlab.com/debdistutils/guix/container:latest
registry.gitlab.com/debdistutils/guix/container:slim
registry.gitlab.com/debdistutils/guix/container:extra
registry.gitlab.com/debdistutils/guix/container:gash

The method to create them is different. Now there is a “build” job that uses the earlier Guix+Trisquel container (for amd64) or Guix+Debian (for arm64, pending Trisquel arm64 containers). The build job creates the final containers directly. Next, an Ubuntu “reproduce” job is launched that runs the same commands, failing if it cannot generate a bit-by-bit identical container. The single-arch images are then tested (installing/building GNU hello and building libksba) and pushed to the GitLab registry, adding multi-arch images in the process. Finally, the multi-arch containers are tested by building Guile-GnuTLS and, on success, uploaded to Docker Hub.
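
As a rough sketch of the reproducibility gate (placeholder paths, not the actual pipeline code), the reproduce job re-runs the build commands and the pipeline fails unless the result is identical:

# Sketch only: compare the container archive from the build job with the one
# rebuilt by the reproduce job; any bit difference fails the pipeline.
cmp build-job/container.tar reproduce-job/container.tar && echo identical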

How would you use them? A small way to start the container is like this:

jas@kaka:~$ podman run -it --privileged --entrypoint=/bin/sh registry.gitlab.com/debdistutils/guix/container:latest
sh-5.2# env HOME=/ guix describe # https://issues.guix.gnu.org/74949
  guix 21ce6b3
    repository URL: https://git.guix.gnu.org/guix.git
    branch: master
    commit: 21ce6b392ace4c4d22543abc41bd7c22596cd6d2
sh-5.2# 

The need for --entrypoint=/bin/sh is because Guix’s pack command sets up the entry point differently than most other containers. This could probably be fixed if people want that, and there may be open bug reports about this.

The need for --privileged is more problematic, but is discussed upstream. The above example works fine without it, but running anything more elaborate with guix-daemon installing packages will trigger a fatal error. Speaking of that, here is a snippet of commands that allow you to install Guix packages in the container.

cp -rL /gnu/store/*profile/etc/* /etc/
echo 'root:x:0:0:root:/:/bin/sh' > /etc/passwd
echo 'root:x:0:' > /etc/group
groupadd --system guixbuild
for i in $(seq -w 1 10); do useradd -g guixbuild -G guixbuild -d /var/empty -s $(command -v nologin) -c "Guix build user $i" --system guixbuilder$i; done
env LANG=C.UTF-8 guix-daemon --build-users-group=guixbuild &
guix archive --authorize < /share/guix/ci.guix.gnu.org.pub
guix archive --authorize < /share/guix/bordeaux.guix.gnu.org.pub
guix install hello
GUIX_PROFILE="/var/guix/profiles/per-user/root/guix-profile"
. "$GUIX_PROFILE/etc/profile"
hello

This could be simplified, but we chose not to hard-code it in our containers because some of these are things that probably shouldn’t be papered over but rather fixed properly somehow. In some execution environments, you may need to pass --disable-chroot to guix-daemon.

To use the containers to build something in a GitLab pipeline, here is an example snippet:

test-amd64-latest-wget-configure-make-libksba:
  image: registry.gitlab.com/debdistutils/guix/container:latest
  before_script:
  - cp -rL /gnu/store/*profile/etc/* /etc/
  - echo 'root:x:0:0:root:/:/bin/sh' > /etc/passwd
  - echo 'root:x:0:' > /etc/group
  - groupadd --system guixbuild
  - for i in $(seq -w 1 10); do useradd -g guixbuild -G guixbuild -d /var/empty -s $(command -v nologin) -c "Guix build user $i" --system guixbuilder$i; done
  - export HOME=/
  - env LANG=C.UTF-8 guix-daemon --build-users-group=guixbuild &
  - guix archive --authorize < /share/guix/ci.guix.gnu.org.pub
  - guix archive --authorize < /share/guix/bordeaux.guix.gnu.org.pub
  - guix describe
  - guix install libgpg-error
  - GUIX_PROFILE="//.guix-profile"
  - . "$GUIX_PROFILE/etc/profile"
  script:
  - wget https://www.gnupg.org/ftp/gcrypt/libksba/libksba-1.6.7.tar.bz2
  - tar xfa libksba-1.6.7.tar.bz2
  - cd libksba-1.6.7
  - ./configure
  - make V=1
  - make check VERBOSE=t V=1

More help is available on the project page for the Guix Container Images.

That’s it for tonight folks, and remember, Happy Hacking!

Container Images for Debian with Guix

The debian-with-guix-container project builds and publishes container images of Debian GNU/Linux stable with GNU Guix installed.

The images are like normal Debian stable containers but have the guix tool and a reasonably fresh guix pull.

Supported architectures include amd64 and arm64. The multi-arch container is called:

registry.gitlab.com/debdistutils/guix/debian-with-guix-container:stable

It may also be accessed via debian-with-guix at Docker Hub as:

docker.io/jas4711/debian-with-guix:stable

The container images may be used like this:

$ podman run --privileged -it --hostname guix --rm registry.gitlab.com/debdistutils/guix/debian-with-guix-container:stable
root@guix:/# hello
bash: hello: command not found
root@guix:/# guix describe
  guix c9eb69d
    repository URL: https://gitlab.com/debdistutils/guix/mirror.git
    branch: master
    commit: c9eb69ddbf05e77300b59f49f4bb5aa50cae0892
root@guix:/# LC_ALL=C.UTF-8 /root/.config/guix/current/bin/guix-daemon --build-users-group=guixbuild &
[1] 21
root@guix:/# GUIX_PROFILE=/root/.config/guix/current; . "$GUIX_PROFILE/etc/profile"
root@guix:/# guix describe
Generation 2    Nov 28 2025 10:14:11    (current)
  guix c9eb69d
    repository URL: https://gitlab.com/debdistutils/guix/mirror.git
    branch: master
    commit: c9eb69ddbf05e77300b59f49f4bb5aa50cae0892
root@guix:/# guix install --verbosity=0 hello
accepted connection from pid 55, user root
The following package will be installed:
   hello 2.12.2

hint: Consider setting the necessary environment variables by running:

     GUIX_PROFILE="/root/.guix-profile"
     . "$GUIX_PROFILE/etc/profile"

Alternately, see `guix package --search-paths -p "/root/.guix-profile"'.

root@guix:/# GUIX_PROFILE="/root/.guix-profile"
root@guix:/# . "$GUIX_PROFILE/etc/profile"
root@guix:/# hello
Hello, world!
root@guix:/# 

Below is an example GitLab pipeline job that demonstrates how to run guix install to install additional dependencies, and then download and build a package that picks up the installed package from the system.

test-wget-configure-make-libksba-amd64:
  image: registry.gitlab.com/debdistutils/guix/debian-with-guix-container:stable
  before_script:
  - env LC_ALL=C.UTF-8 /root/.config/guix/current/bin/guix-daemon --build-users-group=guixbuild $GUIX_DAEMON_ARG &
  - GUIX_PROFILE=/root/.config/guix/current; . "$GUIX_PROFILE/etc/profile"
  - guix describe
  - guix install libgpg-error
  - GUIX_PROFILE="/root/.guix-profile"; . "$GUIX_PROFILE/etc/profile"
  - apt-get install --update -y --no-install-recommends build-essential wget ca-certificates bzip2
  script:
  - wget https://www.gnupg.org/ftp/gcrypt/libksba/libksba-1.6.7.tar.bz2
  - tar xfa libksba-1.6.7.tar.bz2
  - cd libksba-1.6.7
  - ./configure
  - make V=1
  - make check VERBOSE=t V=1

The images were initially created for use in GitLab CI/CD Pipelines but should work for any use.

The images are built in a GitLab CI/CD pipeline, see .gitlab-ci.yml.

The containers are derived from official Debian stable images with Guix installed and a successful run of guix pull. They are built using buildah, invoked from build.sh with image/Containerfile, which runs image/setup.sh.

The pipeline also pushes images to the GitLab container registry, and then to Docker Hub.

Guix binaries are downloaded from the Guix binary tarballs project because of upstream download site availability and bandwidth concerns.

Enjoy these images! Hopefully they can help you overcome the loss of Guix in Debian, which used to make Guix a mere apt-get install guix away.

There are several things that may be improved further. An alternative to using podman --privileged is to use --security-opt seccomp=unconfined --cap-add=CAP_SYS_ADMIN,CAP_NET_ADMIN, which may be slightly more fine-grained.
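
Concretely, using the same image as in the example above, that would be something like:

podman run --security-opt seccomp=unconfined --cap-add=CAP_SYS_ADMIN,CAP_NET_ADMIN \
  -it --hostname guix --rm \
  registry.gitlab.com/debdistutils/guix/debian-with-guix-container:stable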

For ppc64el support I ran into an error message that I wasn’t able to resolve:

guix pull: error: while setting up the build environment: cannot set host name: Operation not permitted

For riscv64, I can’t even find a Guix riscv64 binary tarball for download; is there one anywhere?

For arm64 containers, it seems that you need to start guix-daemon with --disable-chroot to get things to work, at least on GitLab.com’s shared runners; otherwise you will get this error message:

guix install: error: clone: Invalid argument
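
In practice this means starting the daemon as in the interactive example above, with the extra flag; a sketch:

# On arm64 shared runners: the same daemon invocation as above, plus --disable-chroot.
LC_ALL=C.UTF-8 /root/.config/guix/current/bin/guix-daemon \
  --build-users-group=guixbuild --disable-chroot &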

Building the images themselves also requires disabling some security functionality: I was not able to build images with buildah without providing --cap-add=CAP_SYS_ADMIN,CAP_NET_ADMIN, otherwise there were errors like this:

guix pull: error: cloning builder process: Operation not permitted
guix pull: error: clone: Operation not permitted
guix pull: error: while setting up the build environment: cannot set loopback interface flags: Operation not permitted

Finally, on amd64 it seems --security-opt seccomp=unconfined is necessary, otherwise there is an error message like this, even if you use --disable-chroot:

guix pull: error: while setting up the child process: in phase setPersonality: cannot set personality: Function not implemented

This particular error is discussed upstream, but generally I think these errors suggest that guix-daemon could make more optional use of features: if some particular feature is not available, gracefully fall back to another mode of operation instead of exiting with an error. Of course, it should never fall back to an insecure mode of operation unless the user requests that.
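
Putting the build-time workarounds together, a buildah invocation might look something like this (a sketch; the tag name is a placeholder and the actual flags live in build.sh):

# Sketch only: flags as described above; the tag is a placeholder.
buildah build --cap-add=CAP_SYS_ADMIN,CAP_NET_ADMIN \
  --security-opt seccomp=unconfined \
  -f image/Containerfile -t debian-with-guix:stable .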

Happy Hacking!

Independently Reproducible Git Bundles

The gnulib project publishes a git bundle as a stable archival copy of the gnulib git repository once in a while.

Why? We don’t know exactly what this may be useful for, but I’m promoting it to see if we can establish some good use cases.

A git bundle may help to establish provenance in case of an attack on the Savannah hosting platform that compromises the gnulib git repository.

Another use is in the Debian gnulib package: that gnulib bundle is git cloned when building some Debian packages, to get exactly the gnulib commit used by each upstream project – see my talk on gnulib at DebConf24 – and this approach reduces the amount of vendored code that is part of Debian’s source code, which is relevant for mitigating XZ-style attacks.
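
For reference, consuming such a bundle is just a normal clone; a sketch with a placeholder filename:

# The bundle filename is a placeholder; use the actual file from https://ftp.gnu.org/gnu/gnulib/.
git clone -b master gnulib-20250101.bundle gnulib
cd gnulib
git log -1   # verify the tip commit matches the published one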

The first time we published the bundle, I wanted it to be possible for others to re-create it bit-by-bit identically.

At the time I discovered a well-written blog post by Paul Beacher on reproducible git bundles and thought he had solved the problem for me. Essentially it boils down to disabling threading during compression when producing the bundle, and his final example shows this results in predictable bit-by-bit identical output:

$ for i in $(seq 1 100); do \
> git -c 'pack.threads=1' bundle create -q /tmp/bundle-$i --all; \
> done
$ md5sum /tmp/bundle-* | cut -f 1 -d ' ' | uniq -c
    100 4898971d4d3b8ddd59022d28c467ffca

So what remains to be said about this? It seems reproducibility goes deeper than that. One desirable property is that someone else should be able to reproduce the same git bundle, not merely that a single individual is able to reproduce things on one machine.

It surprised me to see that when I ran the same set of commands on a different machine (starting from a fresh git clone), I got a different checksum. The different checksums occurred even when nothing had been committed on the server side between the two runs.

I thought the reason had to do with other sources of unpredictable data, and I explored several ways to work around this but eventually gave up. I settled on the following sequence of commands:

REV=ac9dd0041307b1d3a68d26bf73567aa61222df54 # master branch commit to package
git clone https://git.savannah.gnu.org/git/gnulib.git
cd gnulib
git fsck # attempt to validate input
# inspect that the new tree matches a trusted copy
git checkout -B master $REV # put $REV at master
for b in $(git branch -r | grep origin/stable- | sort --version-sort); do git checkout ${b#origin/}; done
git remote remove origin # drop some unrelated branches
git gc --prune=now # drop any commits after $REV
git -c 'pack.threads=1' bundle create gnulib.bundle --all
V=$(env TZ=UTC0 git show -s --date=format:%Y%m%d --pretty=%cd master)
mv gnulib.bundle gnulib-$V.bundle
build-aux/gnupload --to ftp.gnu.org:gnulib gnulib-$V.bundle

At the time it felt more important to publish something than to reach for perfection, so we did so using the above snippet. Afterwards I reached out to the git community about this and there was a good discussion about my challenge.

At the end of that thread you can see that I was finally able to reproduce bit-by-bit identical bundles from two different clones, by using an intermediate git -c pack.threads=1 repack -adF step. I now assume that the unpredictable data I got earlier was introduced during the ‘git clone’ steps, compressing the pack differently each time due to threaded compression. The outcome could also depend on what content the server provided, so if someone ran git gc or git repack on the server side, things would change for the user, even if the user forced threading to 1 during cloning. More experiments on what kinds of server-side alterations result in client-side differences would be good research.

A couple of months have passed and it is now time to publish another gnulib bundle – somewhat paired with the twice-yearly stable gnulib branches – so let’s walk through the commands and explain what they do. First clone the repository:

REV=225973a89f50c2b494ad947399425182dd42618c   # master branch commit to package
S1REV=475dd38289d33270d0080085084bf687ad77c74d # stable-202501 branch commit
S2REV=e8cc0791e6bb0814cf4e88395c06d5e06655d8b5 # stable-202507 branch commit
git clone https://git.savannah.gnu.org/git/gnulib.git
cd gnulib
git fsck # attempt to validate input

I believe the git fsck will validate that the chain of SHA1 commits is linked together, preventing someone from smuggling unrelated commits into earlier history without having to produce a SHA1 collision. SHA1 collisions are economically feasible today, though, so this isn’t much of a guarantee of anything.

git checkout -B master $REV # put $REV at master
# Add all stable-* branches locally:
for b in $(git branch -r | grep origin/stable- | sort --version-sort); do git checkout ${b#origin/}; done
git checkout -B stable-202501 $S1REV
git checkout -B stable-202507 $S2REV
git remote remove origin # drop some unrelated branches
git gc --prune=now # drop any unrelated commits, not clear this helps

This establishes a set of branches pinned to particular commits. The older stable-* branches are no longer updated, so they shouldn’t be moving targets. In case they are modified in the future, the particular commits we used will be preserved in the official git bundle.

time git -c pack.threads=1 repack -adF

That’s the new magic command to repack and recompress things in a hopefully more predictable way. This leads to a 72MB git pack under .git/objects/pack/ and a 62MB git bundle. The runtime on my laptop is around 5 minutes.

I experimented with -c pack.compression=1 and -c pack.compression=9 but the size was roughly the same: 76MB pack and 66MB bundle for level 1, and 72MB and 62MB for level 9. Runtime was still around 5 minutes.

Git uses zlib by default, which isn’t the best compression around. I tried -c pack.compression=0 and got a 163MB git pack and a 153MB git bundle. The runtime is still around 5 minutes, indicating that compression is not the bottleneck for the git repack command.

That 153MB uncompressed git bundle compresses to 48MB with gzip default settings and 46MB with gzip -9; to 39MB with zstd defaults and 34MB with zstd -9; and to 28MB using xz defaults and a small 26MB using xz -9.
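
Such a comparison can be made along these lines (a sketch; the -k flag keeps the original bundle around):

# Compare external compressors on the uncompressed 153MB bundle.
gzip -9 -k gnulib.bundle      # produces gnulib.bundle.gz
xz -9 -k gnulib.bundle        # produces gnulib.bundle.xz
zstd -9 gnulib.bundle         # produces gnulib.bundle.zst (keeps the input by default)
ls -lh gnulib.bundle gnulib.bundle.gz gnulib.bundle.xz gnulib.bundle.zst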

Still, the inconvenience of having to uncompress a 30-40MB archive into the much larger 153MB bundle is probably not worth the savings compared to shipping and using the (still relatively modest) 62MB git bundle.

Now finally prepare the bundle and ship it:

git -c 'pack.threads=1' bundle create gnulib.bundle --all
V=$(env TZ=UTC0 git show -s --date=format:%Y%m%d --pretty=%cd master)
mv gnulib.bundle gnulib-$V.bundle
build-aux/gnupload --to ftp.gnu.org:gnulib gnulib-$V.bundle

Yay! Another gnulib git bundle snapshot is available from https://ftp.gnu.org/gnu/gnulib/.

The essential part of the git repack command is the -F parameter. In the thread -f was suggested, which translates into the git pack-objects --no-reuse-delta parameter:

--no-reuse-delta

When creating a packed archive in a repository that has existing packs, the command reuses existing deltas. This sometimes results in a slightly suboptimal pack. This flag tells the command not to reuse existing deltas but compute them from scratch.

When reading the man page, I thought that using -F, which translates into --no-reuse-object, would be slightly stronger:

--no-reuse-object

This flag tells the command not to reuse existing object data at all, including non deltified object, forcing recompression of everything. This implies --no-reuse-delta. Useful only in the obscure case where wholesale enforcement of a different compression level on the packed data is desired.

On the surface, without --no-reuse-object, some amount of earlier compression could taint the final result. Still, I was able to get bit-by-bit identical bundles by using -f, so possibly reaching for -F is not necessary.

All the commands were run using git 2.51.0 as packaged by Guix. I fear the result may be different with other git versions and/or zlib libraries. I was able to reproduce the same bundle on a Trisquel 11 aramo machine (derived from Ubuntu 22.04), which uses git 2.34.1. This suggests there is some chance of this being reproducible in 20 years’ time. Time will tell.

I also fear these commands may be insufficient if something is changing on the server side of the gnulib git repository (even something as simple as a new commit). I made some experiments with this, but let’s aim for incremental progress here. At least I have now been able to reproduce the same bundle on different machines, which wasn’t the case last time.

Happy Reproducible Git Bundle Hacking!