Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn’t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures.

What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader…

This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work.

Let’s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:

git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz

The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980’s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened.

The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs.

The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs — although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like.

Publishing minimal source-only tarballs enable easier auditing of a project’s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover.

With all that discussion and rationale out of the way, let’s return to the release process. I have added another step here:

make srcdist
gpg -b libntlm-1.8-src.tar.gz

Now the release is ready. I publish these four files in the Libntlm’s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:

91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz

So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:

podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz

You should see the exact same SHA256 checksum values. Hooray!

This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed.

Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container – replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:

podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz

The source-only minimal tarball can be regenerated on Debian 11:

podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz 

As the Magnus Opus or chef-d’œuvre, let’s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 – replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.

podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz

Yay! You should now have great confidence in that the release artifacts correspond to what’s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts.

I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot.

Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I’ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab’s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the ‘v’ in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah’s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah’s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I’ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion.

Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs – even though the content would.

Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm’s ChangeLog file died on the surgery table here.

So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian’s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm’s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team.

Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work — see my notes in gnulib’s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately.

One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don’t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We’ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn’t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don’t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn’t require the git tool to extract the necessary files, but none that I found practical — ideas welcome!

Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib’s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf.

The srcdist make rule is simple:

git archive --prefix=libntlm-v1.8/ -o libntlm-v1.8.tar.gz HEAD

Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn’t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I’m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented.

Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property.

Doing continous testing of all this is critical to make sure things don’t regress. Libntlm’s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa.

If this way of working plays out well, I hope to implement it in other projects too.

What do you think? Happy Hacking!

Towards reproducible minimal source code tarballs? On *-src.tar.gz

While the work to analyze the xz backdoor is in progress, several ideas have been suggested to improve the software supply chain ecosystem. Some of those ideas are good, some of the ideas are at best irrelevant and harmless, and some suggestions are plain bad. I’d like to attempt to formalize two ideas, which have been discussed before, but the context in which they can be appreciated have not been as clear as it is today.

  1. Reproducible tarballs. The idea is that published source tarballs should be possible to reproduce independently somehow, and that this should be continuously tested and verified — preferrably as part of the upstream project continuous integration system (e.g., GitHub action or GitLab pipeline). While nominally this looks easy to achieve, there are some complex matters in this, for example: what timestamps to use for files in the tarball? I’ve brought up this aspect before.
  2. Minimal source tarballs without generated vendor files. Most GNU Autoconf/Automake-based tarballs pre-generated files which are important for bootstrapping on exotic systems that does not have the required dependencies. For the bootstrapping story to succeed, this approach is important to support. However it has become clear that this practice raise significant costs and risks. Most modern GNU/Linux distributions have all the required dependencies and actually prefers to re-build everything from source code. These pre-generated extra files introduce uncertainty to that process.

My strawman proposal to improve things is to define new tarball format *-src.tar.gz with at least the following properties:

  1. The tarball should allow users to build the project, which is the entire purpose of all this. This means that at least all source code for the project has to be included.
  2. The tarballs should be signed, for example with PGP or minisign.
  3. The tarball should be possible to reproduce bit-by-bit by a third party using upstream’s version controlled sources and a pointer to which revision was used (e.g., git tag or git commit).
  4. The tarball should not require an Internet connection to download things.
    • Corollary: every external dependency either has to be explicitly documented as such (e.g., gcc and GnuTLS), or included in the tarball.
    • Observation: This means including all *.po gettext translations which are normally downloaded when building from version controlled sources.
  5. The tarball should contain everything required to build the project from source using as much externally released versioned tooling as possible. This is the “minimal” property lacking today.
    • Corollary: This means including a vendored copy of OpenSSL or libz is not acceptable: link to them as external projects.
    • Open question: How about non-released external tooling such as gnulib or autoconf archive macros? This is a bit more delicate: most distributions either just package one current version of gnulib or autoconf archive, not previous versions. While this could change, and distributions could package the gnulib git repository (up to some current version) and the autoconf archive git repository — and packages were set up to extract the version they need (gnulib’s ./bootstrap already supports this via the –gnulib-refdir parameter), this is not normally in place.
    • Suggested Corollary: The tarball should contain content from git submodule’s such as gnulib and the necessary Autoconf archive M4 macros required by the project.
  6. Similar to how the GNU project specify the ./configure interface we need a documented interface for how to bootstrap the project. I suggest to use the already well established idiom of running ./bootstrap to set up the package to later be able to be built via ./configure. Of course, some projects are not using the autotool ./configure interface and will not follow this aspect either, but like most build systems that compete with autotools have instructions on how to build the project, they should document similar interfaces for bootstrapping the source tarball to allow building.

If tarballs that achieve the above goals were available from popular upstream projects, distributions could more easily use them instead of current tarballs that include pre-generated content. The advantage would be that the build process is not tainted by “unnecessary” files. We need to develop tools for maintainers to create these tarballs, similar to make dist that generate today’s foo-1.2.3.tar.gz files.

I think one common argument against this approach will be: Why bother with all that, and just use git-archive outputs? Or avoid the entire tarball approach and move directly towards version controlled check outs and referring to upstream releases as git URL and commit tag or id. One problem with this is that SHA-1 is broken, so placing trust in a SHA-1 identifier is simply not secure. Another counter-argument is that this optimize for packagers’ benefits at the cost of upstream maintainers: most upstream maintainers do not want to store gettext *.po translations in their source code repository. A compromise between the needs of maintainers and packagers is useful, so this *-src.tar.gz tarball approach is the indirection we need to solve that. Update: In my experiment with source-only tarballs for Libntlm I actually did use git-archive output.

What do you think?

A Security Device Threat Model: The Substitution Attack

I’d like to describe and discuss a threat model for computational devices. This is generic but we will narrow it down to security-related devices. For example, portable hardware dongles used for OpenPGP/OpenSSH keys, FIDO/U2F, OATH HOTP/TOTP, PIV, payment cards, wallets etc and more permanently attached devices like a Hardware Security Module (HSM), a TPM-chip, or the hybrid variant of a mostly permanently-inserted but removable hardware security dongles.

Our context is cryptographic hardware engineering, and the purpose of the threat model is to serve as as a thought experiment for how to build and design security devices that offer better protection. The threat model is related to the Evil maid attack.

Our focus is to improve security for the end-user rather than the traditional focus to improve security for the organization that provides the token to the end-user, or to improve security for the site that the end-user is authenticating to. This is a critical but often under-appreciated distinction, and leads to surprising recommendations related to onboard key generation, randomness etc below.

The Substitution Attack

An attacker is able to substitute any component of the device (hardware or software) at any time for any period of time.

Your takeaway should be that devices should be designed to mitigate harmful consequences if any component of the device (hardware or software) is substituted for a malicious component for some period of time, at any time, during the lifespan of that component. Some designs protect better against this attack than other designs, and the threat model can be used to understand which designs are really bad, and which are less so.

Terminology

The threat model involves at least one device that is well-behaving and one that is not, and we call these Good Device and Bad Device respectively. The bad device may be the same physical device as the good key, but with some minor software modification or a minor component replaced, but could also be a completely separate physical device. We don’t care about that distinction, we just care if a particular device has a malicious component in it or not. I’ll use terms like “security device”, “device”, “hardware key”, “security co-processor” etc interchangeably.

From an engineering point of view, “malicious” here includes “unintentional behavior” such as software or hardware bugs. It is not possible to differentiate an intentionally malicious device from a well-designed device with a critical bug.

Don’t attribute to malice what can be adequately explained by stupidity, but don’t naïvely attribute to stupidity what may be deniable malicious.

What is “some period of time”?

“Some period of time” can be any length of time: seconds, minutes, days, weeks, etc.

It may also occur at any time: During manufacturing, during transportation to the user, after first usage by the user, or after a couple of months usage by the user. Note that we intentionally consider time-of-manufacturing as a vulnerable phase.

Even further, the substitution may occur multiple times. So the Good Key may be replaced with a Bad Key by the attacker for one day, then returned, and later this repeats a month later.

What is “harmful consequences”?

Since a security key has a fairly well-confined scope and purpose, we can get a fairly good exhaustive list of things that could go wrong. Harmful consequences include:

  • Attacker learns any secret keys stored on a Good Key.
  • Attacker causes user to trust a public generated by a Bad Key.
  • Attacker is able to sign something using a Good Key.
  • Attacker learns the PIN code used to unlock a Good Key.
  • Attacker learns data that is decrypted by a Good Key.

Thin vs Deep solutions

One approach to mitigate many issues arising from device substitution is to have the host (or remote site) require that the device prove that it is the intended unique device before it continues to talk to it. This require an authentication/authorization protocol, which usually involves unique device identity and out-of-band trust anchors. Such trust anchors is often problematic, since a common use-case for security device is to connect it to a host that has never seen the device before.

A weaker approach is to have the device prove that it merely belongs to a class of genuine devices from a trusted manufacturer, usually by providing a signature generated by a device-specific private key signed by the device manufacturer. This is weaker since then the user cannot differentiate two different good devices.

In both cases, the host (or remote site) would stop talking to the device if it cannot prove that it is the intended key, or at least belongs to a class of known trusted genuine devices.

Upon scrutiny, this “solution” is still vulnerable to a substitution attack, just earlier in the manufacturing chain: how can the process that injects the per-device or per-class identities/secrets know that it is putting them into a good key rather than a malicious device? Consider also the consequences if the cryptographic keys that guarantee that a device is genuine leaks.

The model of the “thin solution” is similar to the old approach to network firewalls: have a filtering firewall that only lets through “intended” traffic, and then run completely insecure protocols internally such as telnet.

The networking world has evolved, and now we have defense in depth: even within strongly firewall’ed networks, it is prudent to run for example SSH with publickey-based user authentication even on locally physical trusted networks. This approach requires more thought and adds complexity, since each level has to provide some security checking.

I’m arguing we need similar defense-in-depth for security devices. Security key designs cannot simply dodge this problem by assuming it is working in a friendly environment where component substitution never occur.

Example: Device authentication using PIN codes

To see how this threat model can be applied to reason about security key designs, let’s consider a common design.

Many security keys uses PIN codes to unlock private key operations, for example on OpenPGP cards that lacks built-in PIN-entry functionality. The software on the computer just sends a PIN code to the device, and the device allows private-key operations if the PIN code was correct.

Let’s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious device that saves a copy of the PIN code presented to it, and then gives out error messages. Once the user has entered the PIN code and gotten an error message, presumably temporarily giving up and doing other things, the attacker replaces the device back again. The attacker has learnt the PIN code, and can later use this to perform private-key operations on the good device.

This means a good design involves not sending PIN codes in clear, but use a stronger authentication protocol that allows the card to know that the PIN was correct without learning the PIN. This is implemented optionally for many OpenPGP cards today as the key-derivation-function extension. That should be mandatory, and users should not use setups that sends device authentication in the clear, and ultimately security devices should not even include support for that. Compare how I build Gnuk on my PGP card with the kdf_do=required option.

Example: Onboard non-predictable key-generation

Many devices offer both onboard key-generation, for example OpenPGP cards that generate a Ed25519 key internally on the devices, or externally where the device imports an externally generated cryptographic key.

Let’s apply the subsitution threat model to this design: the user wishes to generate a key and trust the public key that came out of that process. The attacker substitutes the device for a malicious device during key-generation, imports the private key into a good device and gives that back to the user. Most of the time except during key generation the user uses a good device but still the attacker succeeded in having the user trust a public key which the attacker knows the private key for. The substitution may be a software modification, and the method to leak the private key to the attacker may be out-of-band signalling.

This means a good design never generates key on-board, but imports them from a user-controllable environment. That approach should be mandatory, and users should not use setups that generates private keys on-board, and ultimately security devices should not even include support for that.

Example: Non-predictable randomness-generation

Many devices claims to generate random data, often with elaborate design documents explaining how good the randomness is.

Let’s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious design that generates (for the attacker) predictable randomness. The user will never be able to detect the difference since the random output is, well, random, and typically not distinguishable from weak randomness. The user cannot know if any cryptographic keys generated by a generator was faulty or not.

This means a good design never generates non-predictable randomness on the device. That approach should be mandatory, and users should not use setups that generates non-predictable randomness on the device, and ideally devices should not have this functionality.

Case-Study: Tillitis

I have warmed up a bit for this. Tillitis is a new security device with interesting properties, and core to its operation is the Compound Device Identifier (CDI), essentially your Ed25519 private key (used for SSH etc) is derived from the CDI that is computed like this:

cdi = blake2s(UDS, blake2s(device_app), USS)

Let’s apply the substitution threat model to this design: Consider someone replacing the Tillitis key with a malicious key during postal delivery of the key to the user, and the replacement device is identical with the real Tillitis key but implements the following key derivation function:

cdi = weakprng(UDS’, weakprng(device_app), USS)

Where weakprng is a compromised algorithm that is predictable for the attacker, but still appears random. Everything will work correctly, but the attacker will be able to learn the secrets used by the user, and the user will typically not be able to tell the difference since the CDI is secret and the Ed25519 public key is not self-verifiable.

Conclusion

Remember that it is impossible to fully protect against this attack, that’s why it is merely a thought experiment, intended to be used during design of these devices. Consider an attacker that never gives you access to a good key and as a user you only ever use a malicious device. There is no way to have good security in this situation. This is not hypothetical, many well-funded organizations do what they can to derive people from having access to trustworthy security devices. Philosophically it does not seem possible to tell if these organizations have succeeded 100% already and there are only bad security devices around where further resistance is futile, but to end on an optimistic note let’s assume that there is a non-negligible chance that they haven’t succeeded. In these situations, this threat model becomes useful to improve the situation by identifying less good designs, and that’s why the design mantra of “mitigate harmful consequences” is crucial as a takeaway from this. Let’s improve the design of security devices that further the security of its users!

OpenPGP master key on Nitrokey Start

I’ve used hardware-backed OpenPGP keys since 2006 when I imported newly generated rsa1024 subkeys to a FSFE Fellowship card. This worked well for several years, and I recall buying more ZeitControl cards for multi-machine usage and backup purposes. As a side note, I recall being unsatisfied with the weak 1024-bit RSA subkeys at the time – my primary key was a somewhat stronger 1280-bit RSA key created back in 2002 — but OpenPGP cards at the time didn’t support more than 1024 bit RSA, and were (and still often are) also limited to power-of-two RSA key sizes which I dislike.

I had my master key on disk with a strong password for a while, mostly to refresh expiration time of the subkeys and to sign other’s OpenPGP keys. At some point I stopped carrying around encrypted copies of my master key. That was my main setup when I migrated to a new stronger RSA 3744 bit key with rsa2048 subkeys on a YubiKey NEO back in 2014. At that point, signing other’s OpenPGP keys was a rare enough occurrence that I settled with bringing out my offline machine to perform this operation, transferring the public key to sign on USB sticks. In 2019 I re-evaluated my OpenPGP setup and ended up creating a offline Ed25519 key with subkeys on a FST-01G running Gnuk. My approach for signing other’s OpenPGP keys were still to bring out my offline machine and sign things using the master secret using USB sticks for storage and transport. Which meant I almost never did that, because it took too much effort. So my 2019-era Ed25519 key still only has a handful of signatures on it, since I had essentially stopped signing other’s keys which is the traditional way of getting signatures in return.

None of this caused any critical problem for me because I continued to use my old 2014-era RSA3744 key in parallel with my new 2019-era Ed25519 key, since too many systems didn’t handle Ed25519. However, during 2022 this changed, and the only remaining environment that I still used my RSA3744 key for was in Debian — and they require OpenPGP signatures on the new key to allow it to replace an older key. I was in denial about this sub-optimal solution during 2022 and endured its practical consequences, having to use the YubiKey NEO (which I had replaced with a permanently inserted YubiKey Nano at some point) for Debian-related purposes alone.

In December 2022 I bought a new laptop and setup a FST-01SZ with my Ed25519 key, and while I have taken a vacation from Debian, I continue to extend the expiration period on the old RSA3744-key in case I will ever have to use it again, so the overall OpenPGP setup was still sub-optimal. Having two valid OpenPGP keys at the same time causes people to use both for email encryption (leading me to have to use both devices), and the WKD Key Discovery protocol doesn’t like two valid keys either. At FOSDEM’23 I ran into Andre Heinecke at GnuPG and I couldn’t help complain about how complex and unsatisfying all OpenPGP-related matters were, and he mildly ignored my rant and asked why I didn’t put the master key on another smartcard. The comment sunk in when I came home, and recently I connected all the dots and this post is a summary of what I did to move my offline OpenPGP master key to a Nitrokey Start.

First a word about device choice, I still prefer to use hardware devices that are as compatible with free software as possible, but the FST-01G or FST-01SZ are no longer easily available for purchase. I got a comment about Nitrokey start in my last post, and had two of them available to experiment with. There are things to dislike with the Nitrokey Start compared to the YubiKey (e.g., relative insecure chip architecture, the bulkier form factor and lack of FIDO/U2F/OATH support) but – as far as I know – there is no more widely available owner-controlled device that is manufactured for an intended purpose of implementing an OpenPGP card. Thus it hits the sweet spot for me.

Nitrokey Start

The first step is to run latest firmware on the Nitrokey Start – for bug-fixes and important OpenSSH 9.0 compatibility – and there are reproducible-built firmware published that you can install using pynitrokey. I run Trisquel 11 aramo on my laptop, which does not include the Python Pip package (likely because it promotes installing non-free software) so that was a slight complication. Building the firmware locally may have worked, and I would like to do that eventually to confirm the published firmware, however to save time I settled with installing the Ubuntu 22.04 packages on my machine:

$ sha256sum python3-pip*
ded6b3867a4a4cbaff0940cab366975d6aeecc76b9f2d2efa3deceb062668b1c  python3-pip_22.0.2+dfsg-1ubuntu0.2_all.deb
e1561575130c41dc3309023a345de337e84b4b04c21c74db57f599e267114325  python3-pip-whl_22.0.2+dfsg-1ubuntu0.2_all.deb
$ doas dpkg -i python3-pip*
...
$ doas apt install -f
...
$

Installing pynitrokey downloaded a bunch of dependencies, and it would be nice to audit the license and security vulnerabilities for each of them. (Verbose output below slightly redacted.)

jas@kaka:~$ pip3 install --user pynitrokey
Collecting pynitrokey
  Downloading pynitrokey-0.4.34-py3-none-any.whl (572 kB)
Collecting frozendict~=2.3.4
  Downloading frozendict-2.3.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (113 kB)
Requirement already satisfied: click<9,>=8.0.0 in /usr/lib/python3/dist-packages (from pynitrokey) (8.0.3)
Collecting ecdsa
  Downloading ecdsa-0.18.0-py2.py3-none-any.whl (142 kB)
Collecting python-dateutil~=2.7.0
  Downloading python_dateutil-2.7.5-py2.py3-none-any.whl (225 kB)
Collecting fido2<2,>=1.1.0
  Downloading fido2-1.1.0-py3-none-any.whl (201 kB)
Collecting tlv8
  Downloading tlv8-0.10.0.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: certifi>=14.5.14 in /usr/lib/python3/dist-packages (from pynitrokey) (2020.6.20)
Requirement already satisfied: pyusb in /usr/lib/python3/dist-packages (from pynitrokey) (1.2.1.post1)
Collecting urllib3~=1.26.7
  Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting spsdk<1.8.0,>=1.7.0
  Downloading spsdk-1.7.1-py3-none-any.whl (684 kB)
Collecting typing_extensions~=4.3.0
  Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Requirement already satisfied: cryptography<37,>=3.4.4 in /usr/lib/python3/dist-packages (from pynitrokey) (3.4.8)
Collecting intelhex
  Downloading intelhex-2.3.0-py2.py3-none-any.whl (50 kB)
Collecting nkdfu
  Downloading nkdfu-0.2-py3-none-any.whl (16 kB)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pynitrokey) (2.25.1)
Collecting tqdm
  Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting nrfutil<7,>=6.1.4
  Downloading nrfutil-6.1.7.tar.gz (845 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cffi in /usr/lib/python3/dist-packages (from pynitrokey) (1.15.0)
Collecting crcmod
  Downloading crcmod-1.7.tar.gz (89 kB)
  Preparing metadata (setup.py) ... done
Collecting libusb1==1.9.3
  Downloading libusb1-1.9.3-py3-none-any.whl (60 kB)
Collecting pc_ble_driver_py>=0.16.4
  Downloading pc_ble_driver_py-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Collecting piccata
  Downloading piccata-2.0.3-py3-none-any.whl (21 kB)
Collecting protobuf<4.0.0,>=3.17.3
  Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting pyserial
  Downloading pyserial-3.5-py2.py3-none-any.whl (90 kB)
Collecting pyspinel>=1.0.0a3
  Downloading pyspinel-1.0.3.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from nrfutil<7,>=6.1.4->pynitrokey) (5.4.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil~=2.7.0->pynitrokey) (1.16.0)
Collecting pylink-square<0.11.9,>=0.8.2
  Downloading pylink_square-0.11.1-py2.py3-none-any.whl (78 kB)
Collecting jinja2<3.1,>=2.11
  Downloading Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting bincopy<17.11,>=17.10.2
  Downloading bincopy-17.10.3-py3-none-any.whl (17 kB)
Collecting fastjsonschema>=2.15.1
  Downloading fastjsonschema-2.16.3-py3-none-any.whl (23 kB)
Collecting astunparse<2,>=1.6
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting oscrypto~=1.2
  Downloading oscrypto-1.3.0-py2.py3-none-any.whl (194 kB)
Collecting deepmerge==0.3.0
  Downloading deepmerge-0.3.0-py2.py3-none-any.whl (7.6 kB)
Collecting pyocd<=0.31.0,>=0.28.3
  Downloading pyocd-0.31.0-py3-none-any.whl (12.5 MB)
Collecting click-option-group<0.6,>=0.3.0
  Downloading click_option_group-0.5.5-py3-none-any.whl (12 kB)
Collecting pycryptodome<4,>=3.9.3
  Downloading pycryptodome-3.17-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting pyocd-pemicro<1.2.0,>=1.1.1
  Downloading pyocd_pemicro-1.1.5-py3-none-any.whl (9.0 kB)
Requirement already satisfied: colorama<1,>=0.4.4 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (0.4.4)
Collecting commentjson<1,>=0.9
  Downloading commentjson-0.9.0.tar.gz (8.7 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: asn1crypto<2,>=1.2 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.0)
Collecting pypemicro<0.2.0,>=0.1.9
  Downloading pypemicro-0.1.11-py3-none-any.whl (5.7 MB)
Collecting libusbsio>=2.1.11
  Downloading libusbsio-2.1.11-py3-none-any.whl (247 kB)
Collecting sly==0.4
  Downloading sly-0.4.tar.gz (60 kB)
  Preparing metadata (setup.py) ... done
Collecting ruamel.yaml<0.18.0,>=0.17
  Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting cmsis-pack-manager<0.3.0
  Downloading cmsis_pack_manager-0.2.10-py2.py3-none-manylinux1_x86_64.whl (25.1 MB)
Collecting click-command-tree==1.1.0
  Downloading click_command_tree-1.1.0-py3-none-any.whl (3.6 kB)
Requirement already satisfied: bitstring<3.2,>=3.1 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (3.1.7)
Collecting hexdump~=3.3
  Downloading hexdump-3.3.zip (12 kB)
  Preparing metadata (setup.py) ... done
Collecting fire
  Downloading fire-0.5.0.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse<2,>=1.6->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.37.1)
Collecting humanfriendly
  Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting argparse-addons>=0.4.0
  Downloading argparse_addons-0.12.0-py3-none-any.whl (3.3 kB)
Collecting pyelftools
  Downloading pyelftools-0.29-py2.py3-none-any.whl (174 kB)
Collecting milksnake>=0.1.2
  Downloading milksnake-0.1.5-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: appdirs>=1.4 in /usr/lib/python3/dist-packages (from cmsis-pack-manager<0.3.0->spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.4)
Collecting lark-parser<0.8.0,>=0.7.1
  Downloading lark-parser-0.7.8.tar.gz (276 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2<3.1,>=2.11->spsdk<1.8.0,>=1.7.0->pynitrokey) (2.0.1)
Collecting asn1crypto<2,>=1.2
  Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
Collecting wrapt
  Downloading wrapt-1.15.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78 kB)
Collecting future
  Downloading future-0.18.3.tar.gz (840 kB)
  Preparing metadata (setup.py) ... done
Collecting psutil>=5.2.2
  Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
Collecting capstone<5.0,>=4.0
  Downloading capstone-4.0.2-py2.py3-none-manylinux1_x86_64.whl (2.1 MB)
Collecting naturalsort<2.0,>=1.5
  Downloading naturalsort-1.5.1.tar.gz (7.4 kB)
  Preparing metadata (setup.py) ... done
Collecting prettytable<3.0,>=2.0
  Downloading prettytable-2.5.0-py3-none-any.whl (24 kB)
Collecting intervaltree<4.0,>=3.0.2
  Downloading intervaltree-3.1.0.tar.gz (32 kB)
  Preparing metadata (setup.py) ... done
Collecting ruamel.yaml.clib>=0.2.6
  Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)
Collecting termcolor
  Downloading termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting sortedcontainers<3.0,>=2.0
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: wcwidth in /usr/lib/python3/dist-packages (from prettytable<3.0,>=2.0->pyocd<=0.31.0,>=0.28.3->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.2.5)
Building wheels for collected packages: nrfutil, crcmod, sly, tlv8, commentjson, hexdump, pyspinel, fire, intervaltree, lark-parser, naturalsort, future
  Building wheel for nrfutil (setup.py) ... done
  Created wheel for nrfutil: filename=nrfutil-6.1.7-py3-none-any.whl size=898520 sha256=de6f8803f51d6c26d24dc7df6292064a468ff3f389d73370433fde5582b84a10
  Stored in directory: /home/jas/.cache/pip/wheels/39/2b/9b/98ab2dd716da746290e6728bdb557b14c1c9a54cb9ed86e13b
  Building wheel for crcmod (setup.py) ... done
  Created wheel for crcmod: filename=crcmod-1.7-cp310-cp310-linux_x86_64.whl size=31422 sha256=5149ac56fcbfa0606760eef5220fcedc66be560adf68cf38c604af3ad0e4a8b0
  Stored in directory: /home/jas/.cache/pip/wheels/85/4c/07/72215c529bd59d67e3dac29711d7aba1b692f543c808ba9e86
  Building wheel for sly (setup.py) ... done
  Created wheel for sly: filename=sly-0.4-py3-none-any.whl size=27352 sha256=f614e413918de45c73d1e9a8dca61ca07dc760d9740553400efc234c891f7fde
  Stored in directory: /home/jas/.cache/pip/wheels/a2/23/4a/6a84282a0d2c29f003012dc565b3126e427972e8b8157ea51f
  Building wheel for tlv8 (setup.py) ... done
  Created wheel for tlv8: filename=tlv8-0.10.0-py3-none-any.whl size=11266 sha256=3ec8b3c45977a3addbc66b7b99e1d81b146607c3a269502b9b5651900a0e2d08
  Stored in directory: /home/jas/.cache/pip/wheels/e9/35/86/66a473cc2abb0c7f21ed39c30a3b2219b16bd2cdb4b33cfc2c
  Building wheel for commentjson (setup.py) ... done
  Created wheel for commentjson: filename=commentjson-0.9.0-py3-none-any.whl size=12092 sha256=28b6413132d6d7798a18cf8c76885dc69f676ea763ffcb08775a3c2c43444f4a
  Stored in directory: /home/jas/.cache/pip/wheels/7d/90/23/6358a234ca5b4ec0866d447079b97fedf9883387d1d7d074e5
  Building wheel for hexdump (setup.py) ... done
  Created wheel for hexdump: filename=hexdump-3.3-py3-none-any.whl size=8913 sha256=79dfadd42edbc9acaeac1987464f2df4053784fff18b96408c1309b74fd09f50
  Stored in directory: /home/jas/.cache/pip/wheels/26/28/f7/f47d7ecd9ae44c4457e72c8bb617ef18ab332ee2b2a1047e87
  Building wheel for pyspinel (setup.py) ... done
  Created wheel for pyspinel: filename=pyspinel-1.0.3-py3-none-any.whl size=65033 sha256=01dc27f81f28b4830a0cf2336dc737ef309a1287fcf33f57a8a4c5bed3b5f0a6
  Stored in directory: /home/jas/.cache/pip/wheels/95/ec/4b/6e3e2ee18e7292d26a65659f75d07411a6e69158bb05507590
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116951 sha256=3d288585478c91a6914629eb739ea789828eb2d0267febc7c5390cb24ba153e8
  Stored in directory: /home/jas/.cache/pip/wheels/90/d4/f7/9404e5db0116bd4d43e5666eaa3e70ab53723e1e3ea40c9a95
  Building wheel for intervaltree (setup.py) ... done
  Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26119 sha256=5ff1def22ba883af25c90d90ef7c6518496fcd47dd2cbc53a57ec04cd60dc21d
  Stored in directory: /home/jas/.cache/pip/wheels/fa/80/8c/43488a924a046b733b64de3fac99252674c892a4c3801c0a61
  Building wheel for lark-parser (setup.py) ... done
  Created wheel for lark-parser: filename=lark_parser-0.7.8-py2.py3-none-any.whl size=62527 sha256=3d2ec1d0f926fc2688d40777f7ef93c9986f874169132b1af590b6afc038f4be
  Stored in directory: /home/jas/.cache/pip/wheels/29/30/94/33e8b58318aa05cb1842b365843036e0280af5983abb966b83
  Building wheel for naturalsort (setup.py) ... done
  Created wheel for naturalsort: filename=naturalsort-1.5.1-py3-none-any.whl size=7526 sha256=bdecac4a49f2416924548cae6c124c85d5333e9e61c563232678ed182969d453
  Stored in directory: /home/jas/.cache/pip/wheels/a6/8e/c9/98cfa614fff2979b457fa2d9ad45ec85fa417e7e3e2e43be51
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492037 sha256=57a01e68feca2b5563f5f624141267f399082d2f05f55886f71b5d6e6cf2b02c
  Stored in directory: /home/jas/.cache/pip/wheels/5e/a9/47/f118e66afd12240e4662752cc22cefae5d97275623aa8ef57d
Successfully built nrfutil crcmod sly tlv8 commentjson hexdump pyspinel fire intervaltree lark-parser naturalsort future
Installing collected packages: tlv8, sortedcontainers, sly, pyserial, pyelftools, piccata, naturalsort, libusb1, lark-parser, intelhex, hexdump, fastjsonschema, crcmod, asn1crypto, wrapt, urllib3, typing_extensions, tqdm, termcolor, ruamel.yaml.clib, python-dateutil, pyspinel, pypemicro, pycryptodome, psutil, protobuf, prettytable, oscrypto, milksnake, libusbsio, jinja2, intervaltree, humanfriendly, future, frozendict, fido2, ecdsa, deepmerge, commentjson, click-option-group, click-command-tree, capstone, astunparse, argparse-addons, ruamel.yaml, pyocd-pemicro, pylink-square, pc_ble_driver_py, fire, cmsis-pack-manager, bincopy, pyocd, nrfutil, nkdfu, spsdk, pynitrokey
  WARNING: The script nitropy is installed in '/home/jas/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed argparse-addons-0.12.0 asn1crypto-1.5.1 astunparse-1.6.3 bincopy-17.10.3 capstone-4.0.2 click-command-tree-1.1.0 click-option-group-0.5.5 cmsis-pack-manager-0.2.10 commentjson-0.9.0 crcmod-1.7 deepmerge-0.3.0 ecdsa-0.18.0 fastjsonschema-2.16.3 fido2-1.1.0 fire-0.5.0 frozendict-2.3.5 future-0.18.3 hexdump-3.3 humanfriendly-10.0 intelhex-2.3.0 intervaltree-3.1.0 jinja2-3.0.3 lark-parser-0.7.8 libusb1-1.9.3 libusbsio-2.1.11 milksnake-0.1.5 naturalsort-1.5.1 nkdfu-0.2 nrfutil-6.1.7 oscrypto-1.3.0 pc_ble_driver_py-0.17.0 piccata-2.0.3 prettytable-2.5.0 protobuf-3.20.3 psutil-5.9.4 pycryptodome-3.17 pyelftools-0.29 pylink-square-0.11.1 pynitrokey-0.4.34 pyocd-0.31.0 pyocd-pemicro-1.1.5 pypemicro-0.1.11 pyserial-3.5 pyspinel-1.0.3 python-dateutil-2.7.5 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 sly-0.4 sortedcontainers-2.4.0 spsdk-1.7.1 termcolor-2.2.0 tlv8-0.10.0 tqdm-4.65.0 typing_extensions-4.3.0 urllib3-1.26.15 wrapt-1.15.0
jas@kaka:~$

Then upgrading the device worked remarkable well, although I wish that the tool would have printed URLs and checksums for the firmware files to allow easy confirmation.

jas@kaka:~$ PATH=$PATH:/home/jas/.local/bin
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.15-5D271572: Nitrokey Nitrokey Start (RTM.12.1-RC2-modified)
jas@kaka:~$ nitropy start update
Command line tool to interact with Nitrokey devices 0.4.34
Nitrokey Start firmware update tool
Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
System: Linux, is_linux: True
Python: 3.10.6
Saving run log to: /tmp/nitropy.log.gc5753a8
Admin PIN: 
Firmware data to be used:
- FirmwareType.REGNUAL: 4408, hash: ...b'72a30389' valid (from ...built/RTM.13/regnual.bin)
- FirmwareType.GNUK: 129024, hash: ...b'25a4289b' valid (from ...prebuilt/RTM.13/gnuk.bin)
Currently connected device strings:
Device: 
    Vendor: Nitrokey
   Product: Nitrokey Start
    Serial: FSIJ-1.2.15-5D271572
  Revision: RTM.12.1-RC2-modified
    Config: *:*:8e82
       Sys: 3.0
     Board: NITROKEY-START-G
initial device strings: [{'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.15-5D271572', 'Revision': 'RTM.12.1-RC2-modified', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G'}]
Please note:
- Latest firmware available is: 
  RTM.13 (published: 2022-12-08T10:59:11Z)
- provided firmware: None
- all data will be removed from the device!
- do not interrupt update process - the device may not run properly!
- the process should not take more than 1 minute
Do you want to continue? [yes/no]: yes
...
Starting bootloader upload procedure
Device: Nitrokey Start FSIJ-1.2.15-5D271572
Connected to the device
Running update!
Do NOT remove the device from the USB slot, until further notice
Downloading flash upgrade program...
Executing flash upgrade...
Waiting for device to appear:
  Wait 20 seconds.....

Downloading the program
Protecting device
Finish flashing
Resetting device
Update procedure finished. Device could be removed from USB slot.

Currently connected device strings (after upgrade):
Device: 
    Vendor: Nitrokey
   Product: Nitrokey Start
    Serial: FSIJ-1.2.19-5D271572
  Revision: RTM.13
    Config: *:*:8e82
       Sys: 3.0
     Board: NITROKEY-START-G
device can now be safely removed from the USB slot
final device strings: [{'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.19-5D271572', 'Revision': 'RTM.13', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G'}]
finishing session 2023-03-16 21:49:07.371291
Log saved to: /tmp/nitropy.log.gc5753a8
jas@kaka:~$ 

jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.19-5D271572: Nitrokey Nitrokey Start (RTM.13)
jas@kaka:~$ 

Before importing the master key to this device, it should be configured. Note the commands in the beginning to make sure scdaemon/pcscd is not running because they may have cached state from earlier cards. Change PIN code as you like after this, my experience with Gnuk was that the Admin PIN had to be changed first, then you import the key, and then you change the PIN.

jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
OK
ERR 67125247 Slut på fil <GPG Agent>
jas@kaka:~$ ps auxww|grep -e pcsc -e scd
jas        11651  0.0  0.0   3468  1672 pts/0    R+   21:54   0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit

Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......: 
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]

gpg/card> admin
Admin commands are allowed

gpg/card> kdf-setup

gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE5D2715720000 detected

1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit

Your selection? 3
PIN changed.

1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit

Your selection? q

gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon

gpg/card> lang
Language preferences: sv

gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m

gpg/card> login
Login data (account name): jas

gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt

gpg/card> forcesig

gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
      If the key generation does not succeed, please check the
      documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519

gpg/card> 
jas@kaka:~$ gpg --card-edit

Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]

jas@kaka:~$ 

Once setup, bring out your offline machine and boot it and mount your USB stick with the offline key. The paths below will be different, and this is using a somewhat unorthodox approach of working with fresh GnuPG configuration paths that I chose for the USB stick.

jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ cp -a gnupghome-backup-masterkey gnupghome-import-nitrokey-5D271572
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ gpg --homedir $PWD/gnupghome-import-nitrokey-5D271572 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Secret key is available.

sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>

gpg> keytocard
Really move the primary key? (y/N) y
Please select where to store the key:
   (1) Signature key
   (3) Authentication key
Your selection? 1

sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>

gpg> 
Save changes? (y/N) y
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ 

Don’t forget to change the PIN at this point. At this point it is useful to confirm that the Nitrokey has the master key available and that is possible to sign statements with it, back on your regular machine:

jas@kaka:~$ gpg --card-status
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 1
KDF setting ......: on
Signature key ....: B1D2 BD13 75BE CB78 4CF4  F8C4 D73C F638 C53C 06BE
      created ....: 2019-03-20 23:37:24
Encryption key....: [none]
Authentication key: [none]
General key info..: pub  ed25519/D73CF638C53C06BE 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec>  ed25519/D73CF638C53C06BE  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 5D271572
ssb>  ed25519/80260EE8A9B92B2B  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
ssb>  ed25519/51722B08FE4745A2  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
ssb>  cv25519/02923D7EE76EBD60  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
jas@kaka:~$ echo foo|gpg -a --sign|gpg --verify
gpg: Signature made Thu Mar 16 22:11:02 2023 CET
gpg:                using EDDSA key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$ 

Finally to retrieve and sign a key, for example Andre Heinecke’s that I could confirm the OpenPGP key identifier from his business card.

jas@kaka:~$ gpg --locate-external-keys aheinecke@gnupg.com
gpg: key 1FDF723CF462B6B1: public key "Andre Heinecke <aheinecke@gnupg.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   2  signed:   7  trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1  valid:   7  signed:  64  trust: 7-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2023-05-26
pub   rsa3072 2015-12-08 [SC] [expires: 2025-12-05]
      94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1
uid           [ unknown] Andre Heinecke <aheinecke@gnupg.com>
sub   ed25519 2017-02-13 [S]
sub   ed25519 2017-02-13 [A]
sub   rsa3072 2015-12-08 [E] [expires: 2025-12-05]
sub   rsa3072 2015-12-08 [A] [expires: 2025-12-05]

jas@kaka:~$ gpg --edit-key "94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1"
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.


pub  rsa3072/1FDF723CF462B6B1
     created: 2015-12-08  expires: 2025-12-05  usage: SC  
     trust: unknown       validity: unknown
sub  ed25519/2978E9D40CBABA5C
     created: 2017-02-13  expires: never       usage: S   
sub  ed25519/DC74D901C8E2DD47
     created: 2017-02-13  expires: never       usage: A   
The following key was revoked on 2017-02-23 by RSA key 1FDF723CF462B6B1 Andre Heinecke <aheinecke@gnupg.com>
sub  cv25519/1FFE3151683260AB
     created: 2017-02-13  revoked: 2017-02-23  usage: E   
sub  rsa3072/8CC999BDAA45C71F
     created: 2015-12-08  expires: 2025-12-05  usage: E   
sub  rsa3072/6304A4B539CE444A
     created: 2015-12-08  expires: 2025-12-05  usage: A   
[ unknown] (1). Andre Heinecke <aheinecke@gnupg.com>

gpg> sign

pub  rsa3072/1FDF723CF462B6B1
     created: 2015-12-08  expires: 2025-12-05  usage: SC  
     trust: unknown       validity: unknown
 Primary key fingerprint: 94A5 C9A0 3C2F E5CA 3B09  5D8E 1FDF 723C F462 B6B1

     Andre Heinecke <aheinecke@gnupg.com>

This key is due to expire on 2025-12-05.
Are you sure that you want to sign this key with your
key "Simon Josefsson <simon@josefsson.org>" (D73CF638C53C06BE)

Really sign? (y/N) y

gpg> quit
Save changes? (y/N) y
jas@kaka:~$ 

This is on my day-to-day machine, using the NitroKey Start with the offline key. No need to boot the old offline machine just to sign keys or extend expiry anymore! At FOSDEM’23 I managed to get at least one DD signature on my new key, and the Debian keyring maintainers accepted my Ed25519 key. Hopefully I can now finally let my 2014-era RSA3744 key expire in 2023-09-19 and not extend it any further. This should finish my transition to a simpler OpenPGP key setup, yay!

Apt Archive Transparency: debdistdiff & apt-canary

I’ve always found the operation of apt software package repositories to be a mystery. There appears to be a lack of transparency into which people have access to important apt package repositories out there, how the automatic non-human update mechanism is implemented, and what changes are published. I’m thinking of big distributions like Ubuntu and Debian, but also the free GNU/Linux distributions like Trisquel and PureOS that are derived from the more well-known distributions.

As far as I can tell, anyone who has the OpenPGP private key trusted by a apt-based GNU/Linux distribution can sign a modified Release/InRelease file and if my machine somehow downloads that version of the release file, my machine could be made to download and install packages that the distribution didn’t intend me to install. Further, it seems that anyone who has access to the main HTTP server, or any of its mirrors, or is anywhere on the network between them and my machine (when plaintext HTTP is used), can either stall security updates on my machine (on a per-IP basis), or use it to send my machine (again, on a per-IP basis to avoid detection) a modified Release/InRelease file if they had been able to obtain the private signing key for the archive. These are mighty powers that warrant overview.

I’ve always put off learning about the processes to protect the apt infrastructure, mentally filing it under “so many people rely on this infrastructure that enough people are likely to have invested time reviewing and improving these processes”. Simultaneous, I’ve always followed the more free-software friendly Debian-derived distributions such as gNewSense and have run it on some machines. I’ve never put them into serious production use, because the trust issues with their apt package repositories has been a big question mark for me. The “enough people” part of my rationale for deferring this is not convincing. Even the simple question of “is someone updating the apt repository” is not easy to understand on a running gNewSense system. At some point in time the gNewSense cron job to pull in security updates from Debian must have stopped working, and I wouldn’t have had any good mechanism to notice that. Most likely it happened without any public announcement. I’ve recently switched to Trisquel on production machines, and these questions has come back to haunt me.

The situation is unsatisfying and I looked into what could be done to improve it. I could try to understand who are the key people involved in each project, and may even learn what hardware component is used, or what software is involved to update and sign apt repositories. Is the server running non-free software? Proprietary BIOS or NIC firmware? Are the GnuPG private keys on disk? Smartcard? TPM? YubiKey? HSM? Where is the server co-located, and who has access to it? I tried to do a bit of this, and discovered things like Trisquel having a DSA1024 key in its default apt trust store (although for fairness, it seems that apt by default does not trust such signatures). However, I’m not certain understanding this more would scale to securing my machines against attacks on this infrastructure. Even people with the best intentions, and the state of the art hardware and software, will have problems.

To increase my trust in Trisquel I set out to understand how it worked. To make it easier to sort out what the interesting parts of the Trisquel archive to audit further were, I created debdistdiff to produce human readable text output comparing one apt archive with another apt archive. There is a GitLab CI/CD cron job that runs this every day, producing output comparing Trisquel vs Ubuntu and PureOS vs Debian. Working with these output files has made me learn more about how the process works, and I even stumbled upon something that is likely a bug where Trisquel aramo was imported from Ubuntu jammy while it contained a couple of package (e.g., gcc-8, python3.9) that were removed for the final Ubuntu jammy release.

After working on auditing the Trisquel archive manually that way, I realized that whatever I could tell from comparing Trisquel with Ubuntu, it would only be something based on a current snapshot of the archives. Tomorrow it may look completely different. What felt necessary was to audit the differences of the Trisquel archive continously. I was quite happy to have developed debdistdiff for one purpose (comparing two different archives like Trisquel and Ubuntu) and discovered that the tool could be used for another purpose (comparing the Trisquel archive at two different points in time). At this time I realized that I needed a log of all different apt archive metadata to be able to produce an audit log of the differences in time for the archive. I create manually curated git-repositories with the Release/InRelease and the Packages files for each architecture/component of the well-known distributions Trisquel, Ubuntu, Debian and PureOS. Eventually I wrote scripts to automate this, which are now published in the debdistget project.

At this point, one of the early question about per-IP substitution of Release files were lingering in my mind. However with the tooling I now had available, coming up with a way to resolve this was simple! Merely have apt compute a SHA256 checksum of the just downloaded InRelease file, and see if my git repository had the same file. At this point I started reading the Apt source code, and now I had more doubts about the security of my systems than I ever had before. Oh boy how the name Apt has never before felt more… Apt?! Oh well, we must leave some exercises for the students. Eventually I realized I wanted to touch as little of apt code basis as possible, and noticed the SigVerify::CopyAndVerify function called ExecGPGV which called apt-key verify which called GnuPG’s gpgv. By setting Apt::Key::gpgvcommand I could get apt-key verify to call another tool than gpgv. See where I’m going? I thought wrapping this up would now be trivial but for some reason the hash checksum I computed locally never matched what was on my server. I gave up and started working on other things instead.

Today I came back to this idea, and started to debug exactly how the local files looked that I got from apt and how they differed from what I had in my git repositories, that came straight from the apt archives. Eventually I traced this back to SplitClearSignedFile which takes an InRelease file and splits it into two files, probably mimicking the (old?) way of distributing both Release and Release.gpg. So the clearsigned InRelease file is split into one cleartext file (similar to the Release file) and one OpenPGP signature file (similar to the Release.gpg file). But why didn’t the cleartext variant of the InRelease file hash to the same value as the hash of the Release file? Sadly they differ by the final newline.

Having solved this technicality, wrapping the pieces up was easy, and I came up with a project apt-canary that provides a script apt-canary-gpgv that verify the local apt release files against something I call a “apt canary witness” file stored at a URL somewhere.

I’m now running apt-canary on my Trisquel aramo laptop, a Trisquel nabia server, and Talos II ppc64el Debian machine. This means I have solved the per-IP substitution worries (or at least made them less likely to occur, having to send the same malicious release files to both GitLab and my system), and allow me to have an audit log of all release files that I actually use for installing and downloading packages.

What do you think? There are clearly a lot of work and improvements to be made. This is a proof-of-concept implementation of an idea, but instead of refining it until perfection and delaying feedback, I wanted to publish this to get others to think about the problems and various ways to resolve them.

Btw, I’m going to be at FOSDEM’23 this weekend, helping to manage the Security Devroom. Catch me if you want to chat about this or other things. Happy Hacking!