x25519 – Simon Josefsson's blog

Classic McEliece goes to IETF and OpenSSH

Posted on 2023-12-10 by simon

My earlier work on Streamlined NTRU Prime has been progressing along. The IETF document on sntrup761 in SSH has passed several process points. GnuPG’s libgcrypt has added support for sntrup761. The libssh support for sntrup761 is working, but the merge request is stuck mostly due to lack of time to debug why the regression test suite sporadically errors out in non-sntrup761 related parts with the patch.

The foundation for lattice-based post-quantum algorithms has some uncertainty around it, and I have felt that there is more to the post-quantum story than adding sntrup761 to implementations. Classic McEliece has been mentioned to me a couple of times, and I took some time to learn it and did a cut’n’paste job of the proposed ISO standard and published draft-josefsson-mceliece in the IETF to make the algorithm easily available to the IETF community. A high-quality implementation of Classic McEliece has been published as libmceliece and I’ve been supporting the work of Jan Mojžíš to package libmceliece for Debian, alas it has been stuck in the ftp-master NEW queue for manual review for over two months. The pre-dependencies librandombytes and libcpucycles are available in Debian already.

All that text writing and packaging work set the scene to write some code. When I added support for sntrup761 in libssh, I became familiar with the OpenSSH code base, so it was natural to return to OpenSSH to experiment with a new SSH KEX for Classic McEliece. DJB suggested to pick mceliece6688128 and combine it with the existing X25519+sntrup761 or with plain X25519. While a three-algorithm hybrid between X25519, sntrup761 and mceliece6688128 would be a simple drop-in for those that don’t want to lose the benefits offered by sntrup761, I decided to start the journey on a pure combination of X25519 with mceliece6688128. The key combiner in sntrup761x25519 is a simple SHA512 call and the only good I can say about that is that it is simple to describe and implement, and doesn’t raise too many questions since it is already deployed.

After procrastinating coding for months, once I sat down to work it only took a couple of hours until I had a successful Classic McEliece SSH connection. I suppose my brain had sorted everything in background before I started. To reproduce it, please try the following in a Debian testing environment (I use podman to get a clean environment).

# podman run -it --rm debian:testing-slim
apt update
apt dist-upgrade -y
apt install -y wget python3 librandombytes-dev libcpucycles-dev gcc make git autoconf libz-dev libssl-dev

cd ~
wget -q -O- https://lib.mceliece.org/libmceliece-20230612.tar.gz | tar xfz -
cd libmceliece-20230612/
./configure
make install
ldconfig
cd ..

git clone https://gitlab.com/jas/openssh-portable
cd openssh-portable
git checkout jas/mceliece
autoreconf
./configure # verify 'libmceliece support: yes'
make # CC="cc -DDEBUG_KEX=1 -DDEBUG_KEXDH=1 -DDEBUG_KEXECDH=1"

You should now have a working SSH client and server that supports Classic McEliece! Verify support by running ./ssh -Q kex and it should mention mceliece6688128x25519-sha512@openssh.com.

To have it print plenty of debug outputs, you may remove the # character on the final line, but don’t use such a build in production.

You can test it as follows:

./ssh-keygen -A # writes to /usr/local/etc/ssh_host_...

# setup public-key based login by running the following:
./ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys

adduser --system sshd
mkdir /var/empty

while true; do $PWD/sshd -p 2222 -f /dev/null; done &

./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date

On the client you should see output like this:

OpenSSH_9.5p1, OpenSSL 3.0.11 19 Sep 2023
...
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: mceliece6688128x25519-sha512@openssh.com
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:YognhWY7+399J+/V8eAQWmM3UFDLT0dkmoj3pIJ0zXs
...

debug1: Host '[localhost]:2222' is known and matches the ED25519 host key.
debug1: Found key in /root/.ssh/known_hosts:1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
...
debug1: Sending command: date
debug1: pledge: fork
debug1: permanently_set_uid: 0/0
Environment:
  USER=root
  LOGNAME=root
  HOME=/root
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
  MAIL=/var/mail/root
  SHELL=/bin/bash
  SSH_CLIENT=::1 46894 2222
  SSH_CONNECTION=::1 46894 ::1 2222
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
Sat Dec  9 22:22:40 UTC 2023
debug1: channel 0: free: client-session, nchannels 1
Transferred: sent 1048044, received 3500 bytes, in 0.0 seconds
Bytes per second: sent 23388935.4, received 78108.6
debug1: Exit status 0

Notice the kex: algorithm: mceliece6688128x25519-sha512@openssh.com output.

How about network bandwidth usage? Below is a comparison of a complete SSH client connection such as the one above that log in and print date and logs out. Plain X25519 is around 7kb, X25519 with sntrup761 is around 9kb, and mceliece6688128 with X25519 is around 1MB. Yes, Classic McEliece has large keys, but for many environments, 1MB of data for the session establishment will barely be noticeable.

./ssh -v -p 2222 localhost -oKexAlgorithms=curve25519-sha256 date 2>&1 | grep ^Transferred
Transferred: sent 3028, received 3612 bytes, in 0.0 seconds

./ssh -v -p 2222 localhost -oKexAlgorithms=sntrup761x25519-sha512@openssh.com date 2>&1 | grep ^Transferred
Transferred: sent 4212, received 4596 bytes, in 0.0 seconds

./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date 2>&1 | grep ^Transferred
Transferred: sent 1048044, received 3764 bytes, in 0.0 seconds

So how about session establishment time?

date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=curve25519-sha256 date > /dev/null 2>&1; i=`expr $i + 1`; done; date
Sat Dec  9 22:39:19 UTC 2023
Sat Dec  9 22:39:25 UTC 2023
# 6 seconds

date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=sntrup761x25519-sha512@openssh.com date > /dev/null 2>&1; i=`expr $i + 1`; done; date
Sat Dec  9 22:39:29 UTC 2023
Sat Dec  9 22:39:38 UTC 2023
# 9 seconds

date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date > /dev/null 2>&1; i=`expr $i + 1`; done; date
Sat Dec  9 22:39:55 UTC 2023
Sat Dec  9 22:40:07 UTC 2023
# 12 seconds

I never noticed adding sntrup761, so I’m pretty sure I wouldn’t notice this increase either. This is all running on my laptop that runs Trisquel so take it with a grain of salt but at least the magnitude is clear.

Future work items include:

Use a better hybrid mode combiner than SHA512? See for example KEM Combiners.
Write IETF document describing the Classic McEliece SSH protocol
Submit my patch to the OpenSSH community for discussion and feedback, please review meanwhile!
Implement a mceliece6688128sntrup761x25519 multi-hybrid mode?
Write a shell script a’la sntrup761.sh to import a stripped-down mceliece6688128 implementation to avoid having OpenSSH depend on libmceliece?
My kexmceliece6688128x25519.c is really just kexsntrup761x25519.c with the three calls to sntrup761 replaced with mceliece6688128. This should be parametrized to avoid cut’n’paste of code, if the OpenSSH community is interested.
Consider if the behaviour of librandombytes related to closing of file descriptors is relevant to OpenSSH.

Happy post-quantum SSH’ing!

Update: Changing the mceliece6688128_keypair call to mceliece6688128f_keypair (i.e., using the fully compatible f-variant) results in McEliece being just as fast as sntrup761 on my machine.

Update 2023-12-26: An initial IETF document draft-josefsson-ssh-mceliece-00 published.

Streamlined NTRU Prime sntrup761 goes to IETF

Posted on 2023-05-12 by simon

The OpenSSH project added support for a hybrid Streamlined NTRU Prime post-quantum key encapsulation method sntrup761 to strengthen their X25519-based default in their version 8.5 released on 2021-03-03. While there has been a lot of talk about post-quantum crypto generally, my impression has been that there has been a slowdown in implementing and deploying them in the past two years. Why is that? Regardless of the answer, we can try to collaboratively change things, and one effort that appears strangely missing are IETF documents for these algorithms.

Building on some earlier work that added X25519/X448 to SSH, writing a similar document was relatively straight-forward once I had spent a day reading OpenSSH and TinySSH source code to understand how it worked. While I am not perfectly happy with how the final key is derived from the sntrup761/X25519 secrets – it is a SHA512 call on the concatenated secrets – I think the construct deserves to be better documented, to pave the road for increased confidence or better designs. Also, reusing the RFC5656§4 structs makes for a worse specification (one unnecessary normative reference), but probably a simpler implementation. I have published draft-josefsson-ntruprime-ssh-00 here. Credit here goes to Jan Mojžíš of TinySSH that designed the earlier sntrup4591761x25519-sha512@tinyssh.org in 2018, Markus Friedl who added it to OpenSSH in 2019, and Damien Miller that changed it to sntrup761 in 2020. Does anyone have more to add to the history of this work?

Once I had sharpened my xml2rfc skills, preparing a document describing the hybrid construct between the sntrup761 key-encapsulation mechanism and the X25519 key agreement method in a non-SSH fashion was easy. I do not know if this work is useful, but it may serve as a reference for further study. I published draft-josefsson-ntruprime-hybrid-00 here.

Finally, how about a IETF document on the base Streamlined NTRU Prime? Explaining all the details, and especially the math behind it would be a significant effort. I started doing that, but realized it is a subjective call when to stop explaining things. If we can’t assume that the reader knows about lattice math, is a document like this the best place to teach it? I settled for the most minimal approach instead, merely giving an introduction to the algorithm, included SageMath and C reference implementations together with test vectors. The IETF audience rarely understands math, so I think it is better to focus on the bits on the wire and the algorithm interfaces. Everything here was created by the Streamlined NTRU Prime team, I merely modified it a bit hoping I didn’t break too much. I have now published draft-josefsson-ntruprime-streamlined-00 here.

I maintain the IETF documents on my ietf-ntruprime GitLab page, feel free to open merge requests or raise issues to help improve them.

To have confidence in the code was working properly, I ended up preparing a branch with sntrup761 for the GNU-project Nettle and have submitted it upstream for review. I had the misfortune of having to understand and implement NIST’s DRBG-CTR to compute the sntrup761 known-answer tests, and what a mess it is. Why does a deterministic random generator support re-seeding? Why does it support non-full entropy derivation? What’s with the key size vs block size confusion? What’s with the optional parameters? What’s with having multiple algorithm descriptions? Luckily I was able to extract a minimal but working implementation that is easy to read. I can’t locate DRBG-CTR test vectors, anyone? Does anyone have sntrup761 test vectors that doesn’t use DRBG-CTR? One final reflection on publishing known-answer tests for an algorithm that uses random data: are the test vectors stable over different ways to implement the algorithm? Just consider of some optimization moved one randomness-extraction call before another, then wouldn’t the output be different? Are there other ways to verify correctness of implementations?

As always, happy hacking!