Streamlined NTRU Prime sntrup761 goes to IETF

The OpenSSH project added support for a hybrid key exchange combining the Streamlined NTRU Prime post-quantum key encapsulation method sntrup761 with their X25519-based default, in version 8.5 released on 2021-03-03. While there has been a lot of talk about post-quantum crypto generally, my impression is that implementation and deployment have slowed down over the past two years. Why is that? Regardless of the answer, we can try to collaboratively change things, and one effort that is strangely missing is IETF documents for these algorithms.

Building on some earlier work that added X25519/X448 to SSH, writing a similar document was relatively straightforward once I had spent a day reading OpenSSH and TinySSH source code to understand how it worked. While I am not perfectly happy with how the final key is derived from the sntrup761/X25519 secrets (it is a SHA-512 call on the concatenated secrets), I think the construct deserves to be better documented, to pave the road for increased confidence or better designs. Also, reusing the RFC5656§4 structs makes for a worse specification (one unnecessary normative reference), but probably a simpler implementation. I have published draft-josefsson-ntruprime-ssh-00 here. Credit here goes to Jan Mojžíš of TinySSH, who designed the earlier sntrup4591761x25519-sha512@tinyssh.org in 2018, Markus Friedl, who added it to OpenSSH in 2019, and Damien Miller, who changed it to sntrup761 in 2020. Does anyone have more to add to the history of this work?
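
To illustrate the combiner, here is what it boils down to: a single SHA-512 computation over the two concatenated shared secrets. This is a sketch using Nettle’s SHA-512 interface; the buffer names, lengths and exact concatenation order here are my own illustration, not code from OpenSSH or the draft.

#include <nettle/sha2.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch: derive the hybrid shared secret as SHA-512 over the
   concatenation of the sntrup761 KEM secret and the X25519 secret.
   Names and order are illustrative, not taken from the draft. */
void
hybrid_kdf (const uint8_t *kem_secret, size_t kem_len,
            const uint8_t *ecdh_secret, size_t ecdh_len,
            uint8_t out[SHA512_DIGEST_SIZE])
{
  struct sha512_ctx ctx;

  sha512_init (&ctx);
  sha512_update (&ctx, kem_len, kem_secret);   /* sntrup761 shared secret */
  sha512_update (&ctx, ecdh_len, ecdh_secret); /* X25519 shared secret */
  sha512_digest (&ctx, SHA512_DIGEST_SIZE, out);
}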

Once I had sharpened my xml2rfc skills, preparing a document describing the hybrid construct between the sntrup761 key-encapsulation mechanism and the X25519 key agreement method in a non-SSH fashion was easy. I do not know if this work is useful, but it may serve as a reference for further study. I published draft-josefsson-ntruprime-hybrid-00 here.

Finally, how about an IETF document on the base Streamlined NTRU Prime? Explaining all the details, and especially the math behind it, would be a significant effort. I started doing that, but realized it is a subjective call when to stop explaining things. If we can’t assume that the reader knows about lattice math, is a document like this the best place to teach it? I settled for the most minimal approach instead, merely giving an introduction to the algorithm and including SageMath and C reference implementations together with test vectors. The IETF audience rarely understands math, so I think it is better to focus on the bits on the wire and the algorithm interfaces. Everything here was created by the Streamlined NTRU Prime team; I merely modified it a bit, hoping I didn’t break too much. I have now published draft-josefsson-ntruprime-streamlined-00 here.
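
For orientation, and without trying to replace the draft or the original paper: all arithmetic in Streamlined NTRU Prime happens in the ring

R = Z[x]/(x^p - x - 1)

and its quotients R/3 and R/q, where sntrup761 uses p = 761, q = 4591 and weight w = 286 (the number of nonzero coefficients in the small secret polynomial). The public key and ciphertext on the wire are fixed-size byte strings derived from such polynomials, which is why a document can usefully focus on encodings and interfaces rather than on the lattice math.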

I maintain the IETF documents on my ietf-ntruprime GitLab page, feel free to open merge requests or raise issues to help improve them.

To gain confidence that the code was working properly, I ended up preparing a branch with sntrup761 for the GNU project Nettle and have submitted it upstream for review. I had the misfortune of having to understand and implement NIST’s CTR_DRBG to compute the sntrup761 known-answer tests, and what a mess it is. Why does a deterministic random generator support re-seeding? Why does it support non-full-entropy derivation? What’s with the key size vs block size confusion? What’s with the optional parameters? What’s with having multiple algorithm descriptions? Luckily I was able to extract a minimal but working implementation that is easy to read. I can’t locate CTR_DRBG test vectors; does anyone have some? Does anyone have sntrup761 test vectors that don’t use CTR_DRBG? One final reflection on publishing known-answer tests for an algorithm that uses random data: are the test vectors stable over different ways to implement the algorithm? Just consider if some optimization moved one randomness-extraction call before another; wouldn’t the output be different? Are there other ways to verify correctness of implementations?
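
For the curious, the part of CTR_DRBG actually needed for the known-answer tests is small: AES-256, a 48-byte entropy input, no derivation function and no personalization string. The following is a sketch of my understanding of that profile, written against Nettle’s AES interface; it is an illustration, not the code from my Nettle branch.

#include <nettle/aes.h>
#include <stdint.h>
#include <string.h>

/* Increment the 128-bit counter V, treated as a big-endian integer. */
static void
increment (uint8_t v[AES_BLOCK_SIZE])
{
  for (int i = AES_BLOCK_SIZE - 1; i >= 0; i--)
    if (++v[i])
      break;
}

/* CTR_DRBG update: encrypt three successive counter values and mix in
   48 bytes of provided data (if any) to form the new key and counter. */
static void
drbg_update (const uint8_t *provided, uint8_t key[32], uint8_t v[16])
{
  struct aes256_ctx aes;
  uint8_t tmp[48];

  aes256_set_encrypt_key (&aes, key);
  for (int i = 0; i < 3; i++)
    {
      increment (v);
      aes256_encrypt (&aes, AES_BLOCK_SIZE, tmp + i * AES_BLOCK_SIZE, v);
    }
  if (provided)
    for (int i = 0; i < 48; i++)
      tmp[i] ^= provided[i];
  memcpy (key, tmp, 32);
  memcpy (v, tmp + 32, 16);
}

/* Instantiate from a 48-byte seed, starting from an all-zero state. */
void
drbg_init (uint8_t key[32], uint8_t v[16], const uint8_t seed[48])
{
  memset (key, 0, 32);
  memset (v, 0, 16);
  drbg_update (seed, key, v);
}

/* Generate output one block at a time, then update the state. */
void
drbg_generate (uint8_t key[32], uint8_t v[16], uint8_t *out, size_t len)
{
  struct aes256_ctx aes;
  uint8_t block[AES_BLOCK_SIZE];

  aes256_set_encrypt_key (&aes, key);
  while (len > 0)
    {
      size_t n = len < AES_BLOCK_SIZE ? len : AES_BLOCK_SIZE;
      increment (v);
      aes256_encrypt (&aes, AES_BLOCK_SIZE, block, v);
      memcpy (out, block, n);
      out += n;
      len -= n;
    }
  drbg_update (NULL, key, v);
}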

As always, happy hacking!

EdDSA and Ed25519 go to IETF

After meeting Niels Möller at FOSDEM and learning about his Ed25519 implementation in GNU Nettle, I started working on a simple-to-implement description of Ed25519. The goal is to help implementers of various IETF (and non-IETF) protocols add support for Ed25519. As many are aware, OpenSSH and GnuPG have support for Ed25519 in recent versions, and OpenBSD releases since v5.5 (May 2014) are signed with Ed25519. The paper describing EdDSA and Ed25519 is not aimed at implementers, and does not include test vectors. I felt there was room for improvement to encourage wider adoption.

Our work is published in the IETF as draft-josefsson-eddsa-ed25519 and we are soliciting feedback from implementers and others. Please help us iron out the mistakes in the document, and point out what is missing. For example, what could be done to help implementers avoid side-channel leakage? I don’t think the draft is the place for optimized and side-channel-free implementations, and it is also not the place for a comprehensive tutorial on side-channel-free programming. But maybe there is a middle ground where we can say something more than we do today. Ideas welcome!

Dice Random Numbers

Generating data with entropy, or random number generation (RNG), is a well-known difficult problem. Many crypto algorithms and protocols assume random data is available. There are many implementations out there, including /dev/random in the BSD and Linux kernels and API calls in crypto libraries such as GnuTLS or OpenSSL. How they work can be understood by reading the source code. The quality of the data depends on the actual hardware and what entropy sources were available; the RNG implementation itself is deterministic, it merely converts data with supposed entropy from a set of data sources and then generates an output stream.

In some situations, like in virtualized environments or on small embedded systems, it is hard to find entropy sources of sufficient quantity. Rarely are there any lower-bound estimates on how much entropy there is in the data you get. You can improve the situation by using a separate hardware RNG, but there is deployment complexity in that, and from a theoretical point of view, the problem of trusting that you get good random data has merely moved from one system to another. (There is more to say about hardware RNGs; I’ll save that for another day.)

For some purposes, the available solutions do not inspire enough confidence in me because of their high complexity. Complexity is often the enemy of security. In crypto discussions I have said, only half-jokingly, that about the only RNG process I would trust is one that I can explain in simple words and implement myself with the help of pen and paper. Normally I use the example of rolling a normal six-sided die (a D6) several times. I have been thinking about this process in more detail lately, and felt it was time to write it down, regardless of how silly it may seem.

A die with six sides produces a random number between 1 and 6. It is relatively straightforward to convince yourself intuitively that it is not clearly biased: inspect that it looks symmetric and do some trial rolls. By repeatedly rolling the die, you can generate as much data as you need, time permitting.

I do not understand enough thermodynamics to know how to estimate the amount of entropy of a physical process, so I need to resort to intuitive arguments. It would be easy to just assume that a die produces 2.5 bits of entropy, because log_2(6)~=2.585. At least I find it easy to convince myself intuitively that 2.5 bits is an upper bound; there appears to be no way to get more entropy than that out of looking at a single die-roll outcome. I suspect that most dice have some form of defect, though, which leads to a very small bias that could be found with a large number of rolls. Thus I would propose that the amount of entropy of most D6’s is slightly below 2.5 bits on average. Further, to establish a lower bound: intuitively, it seems easy to believe that if the entropy of a particular D6 were closer to 2 bits than to 2.5 bits, this would be noticeable fairly quickly by trial rolls. That assumes the die does not have complex logic and machinery in it that would hide the patterns.

With the tinfoil hat on, consider a die with a power source and mechanics in it that allow it to decide which number it will land on: it could generate seemingly random patterns that still contain 0 bits of entropy. For example, suppose a D6 is built to produce the pattern 4, 1, 4, 2, 1, 3, 5, 6, 2, 3, 1, 3, 6, 3, 5, 6, 4, … this would mean it produces 0 bits of entropy (compare the numbers with the decimals of sqrt(2)). Other factors may also influence the amount of entropy in the output; consider rolling the die by just dropping it straight down from 1 cm/1 inch above the table. There could also be other reasons why the amount of entropy in a die roll is more limited; intuitive arguments are sometimes completely wrong! With this discussion as background, and for simplicity, going forward I will assume that my D6 produces 2.5 bits of entropy on every roll.
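
As a sanity check on the 2.5-bit figure, the Shannon entropy of a die with outcome probabilities p_1, ..., p_6 is

H = -(p_1*log_2(p_1) + p_2*log_2(p_2) + ... + p_6*log_2(p_6))

which is maximized by the uniform distribution p_i = 1/6, giving H = log_2(6) ~= 2.585 bits; any bias, however small, strictly lowers H, so 2.585 bits really is an upper bound and 2.5 bits is a conservative figure.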

We need to figure out how many times we need to roll it. I usually find myself needing a 128-bit random number (16 bytes). Crypto algorithms and protocols typically use power-of-2 data sizes. 64 bits of entropy results in brute-force attacks requiring about 2^64 tests, and for many operations, this is feasible with today’s computing power. Performing 2^128 operations does not seem possible with today’s technology. To produce 128 bits of entropy using a D6 that produces 2.5 bits of entropy per roll, you need to perform ceil(128/2.5)=52 rolls.

We also need to design an algorithm to convert the D6 output into the resulting 128-bit random number. While it would be nice from a theoretical point of view to let each and every bit of the D6 output influence each and every bit of the 128-bit random number, this becomes difficult to do with pen and paper. Update: This blog post used to include an algorithm here, however it was clearly wrong (written too late in the evening…) so I’ve removed it; I need to come back and think more about this.
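
One possible direction, which I have not analyzed carefully: roll the die twice and compute v = 6*(first roll - 1) + (second roll - 1), which for a fair die is uniform over 0..35. If v is 32 or larger, discard the pair and roll again; otherwise write v down as five bits. Twenty-six accepted pairs give 130 bits, and you drop the last two. Since 32 of the 36 outcomes are accepted, you should expect to need roughly 52 * 36/32, or about 59, rolls, and the rejection step avoids introducing any bias beyond whatever bias the die itself has.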

So what’s the next step? That depends on what you want to use the random data for. For some purposes, such as generating a high-quality 128-bit AES key, I would be done. The key is right there. To generate a high-quality ECC private key, you need to generate somewhat more randomness (matching the ECC curve size) and do a couple of EC operations. To generate a high-quality RSA private key, unfortunately you will need much more randomness, to the point where it makes more sense to implement a PRNG seeded with a strong 128-bit seed generated using this process. The latter approach is the general solution: generate 128 bits of data using the dice approach, and then seed a CSPRNG of your choice to get a large amount of data quickly. These steps are somewhat technical, and you lose the pen-and-paper properties, but code to implement these parts is easier to verify than verifying that you get good-quality entropy out of your RNG implementation.
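
As a sketch of that last step (illustrative only; in practice use a vetted CSPRNG construction), one could hash the 16-byte dice-derived seed into a 256-bit ChaCha key and read out the keystream, here using Nettle:

#include <nettle/sha2.h>
#include <nettle/chacha.h>
#include <stdint.h>
#include <string.h>

/* Sketch: expand a 128-bit dice-derived seed into a long pseudorandom
   stream by hashing it to a ChaCha key and emitting the keystream. */
void
expand_seed (const uint8_t seed[16], uint8_t *out, size_t outlen)
{
  struct sha256_ctx hash;
  struct chacha_ctx prng;
  uint8_t key[SHA256_DIGEST_SIZE];
  static const uint8_t nonce[CHACHA_NONCE_SIZE] = { 0 };

  sha256_init (&hash);
  sha256_update (&hash, 16, seed);
  sha256_digest (&hash, sizeof key, key);

  chacha_set_key (&prng, key);
  chacha_set_nonce (&prng, nonce);

  memset (out, 0, outlen);              /* keystream = encryption of zeros */
  chacha_crypt (&prng, outlen, out, out);
}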

Portable Symmetric Key Container (PSKC) Library

For the past few weeks I have been working on implementing RFC 6030, also known as Portable Symmetric Key Container (PSKC). So what is PSKC? The Portable Symmetric Key Container (PSKC) format is used to transport and provision symmetric keys to cryptographic devices or software.

My PSKC Library allows you to parse, validate and generate PSKC data. The PSKC Library is written in C, uses LibXML, and is licensed under LGPLv2+. In practice, PSKC is most commonly used to transport secret keys for OATH HOTP/TOTP devices (and other OTP devices) between the personalization machine and the OTP validation server. Yesterday I released version 2.0.0 of OATH Toolkit with the new PSKC Library. See my earlier introduction to OATH Toolkit for background. OATH Toolkit is packaged for Debian/Ubuntu and I hope to refresh the package to include libpskc/pskctool soon.

To get a feeling for the PSKC data format, consider the most minimal valid PSKC data:

<?xml version="1.0"?>
<KeyContainer xmlns="urn:ietf:params:xml:ns:keyprov:pskc" Version="1.0">
  <KeyPackage/>
</KeyContainer>

The library can easily be used to export PSKC data into a comma-separated value (CSV) format; in fact, the PSKC library tutorial concludes with that as an example. There is complete API documentation for the library. The command line tool is more useful for end-users and allows you to parse and inspect PSKC data. Below is an illustration of how you would use it to parse some PSKC data. First, the content of a file “pskc-figure2.xml”:

<?xml version="1.0" encoding="UTF-8"?>
<KeyContainer Version="1.0"
	      Id="exampleID1"
	      xmlns="urn:ietf:params:xml:ns:keyprov:pskc">
  <KeyPackage>
    <Key Id="12345678"
         Algorithm="urn:ietf:params:xml:ns:keyprov:pskc:hotp">
      <Issuer>Issuer-A</Issuer>
      <Data>
        <Secret>
          <PlainValue>MTIzNA==
          </PlainValue>
        </Secret>
      </Data>
    </Key>
  </KeyPackage>
</KeyContainer>

Here is how you would parse and pretty print that PSKC data:

jas@latte:~$ pskctool -c pskc-figure2.xml 
Portable Symmetric Key Container (PSKC):
	Version: 1.0
	Id: exampleID1
	KeyPackage 0:
		DeviceInfo:
		Key:
			Id: 12345678
			Issuer: Issuer-A
			Algorithm: urn:ietf:params:xml:ns:keyprov:pskc:hotp
			Key Secret (base64): MTIzNA==

jas@latte:~$
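
If you prefer to call the library directly instead of using pskctool, a minimal parse-and-print program looks roughly like this (written from memory, so check the function names against the PSKC Library Manual):

#include <stdio.h>
#include <stdlib.h>
#include <pskc/pskc.h>

/* Sketch: read PSKC XML on stdin and pretty-print it, roughly what
   "pskctool -c" does.  Error handling and cleanup are minimal. */
int
main (void)
{
  char buf[8192];
  size_t len = fread (buf, 1, sizeof buf, stdin);
  pskc_t *container;
  char *out;
  size_t outlen;

  if (pskc_global_init () != PSKC_OK
      || pskc_init (&container) != PSKC_OK
      || pskc_parse_from_memory (container, len, buf) != PSKC_OK
      || pskc_output (container, PSKC_OUTPUT_HUMAN_COMPLETE,
                      &out, &outlen) != PSKC_OK)
    {
      fprintf (stderr, "PSKC parsing failed\n");
      exit (EXIT_FAILURE);
    }

  fwrite (out, 1, outlen, stdout);
  pskc_done (container);
  return 0;
}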

For more information, see the OATH Toolkit website and the PSKC Library Manual.