The 2019 TLS certificate serial number mess

Remember the Great TLS Certificate Serial Number Brouhaha of March 2019? Millions of website certificates have been mis-issued! Everything is insecure! The sky is falling! Revoke and replace, ASAP!

I barely do, but I remember thinking it was a really stupid overreaction. Now I’ve gone back and reviewed what happened, and I’ll try to explain what I think I know about it. I’m now a little less convinced that the response was an overreaction, but I’m even more convinced that the issue itself was no threat to security. Regardless of which side you’re on, you’re invited to take this opportunity to shore up any deficit in your outrage level.

The CA/Browser Forum rules

There’s a consortium called the CA/Browser Forum, where web browser makers, and other less powerful parties, get together to decide which security certificates web browsers will trust. They establish the rules that the certificate authorities (CAs) have to abide by, if they want their certificates to be of any use. One of their rules is:

Effective September 30, 2016, CAs SHALL generate non-sequential Certificate serial numbers greater than zero (0) containing at least 64 bits of output from a CSPRNG.

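“CSPRNG” stands for “cryptographically secure pseudorandom number generator”: a random number generator whose output an attacker cannot feasibly predict. In cryptography, unpredictable random data is often loosely called “entropy”.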

Why would they insist that the serial number be random? This has not always been standard procedure. In the past, sequential serial numbers were common.

Now for a long detour.

The MD5 hash collision attack

In late 2008, a group of researchers (Sotirov, Stevens, et al.) successfully used an MD5 hash collision attack, targeting a particular CA, to acquire a counterfeit certificate-signing certificate that was trusted by the web browsers of the day. That’s a big deal.

The attack required them to predict exactly, byte for byte, what the metadata would be in a certificate they were going to purchase a few days later. They would then spend a few days crunching numbers to forge an (unsigned) counterfeit certificate, and a correlated private and public key for use with the legitimate certificate. Then they would purchase a legitimate certificate from the targeted CA. If the legitimate certificate exactly matched their prediction, the attack would work. They would be able to copy/paste the legitimate certificate’s signature onto their counterfeit certificate, and it would pass all validity tests.

The metadata they had to predict included things like the CA name, validity period, serial number, and various flags and other standard gunk. In this case, only two fields posed a real problem: the start of the validity period, and the serial number.

The purchasing process was done via the CA’s website, and they figured out that the validity period always started about 6 seconds after they pressed the final OK button. They chose a convenient timestamp a few days in the future, intending to purchase the certificate exactly 6 seconds before that.

The CA assigned serial numbers sequentially, so to get the serial number right, they had to predict how many certificates the CA would sell during those few days. Their strategy was to choose a number that was a slight overestimate, then later purchase dozens of certificates themselves to try to manipulate the serial number to be just right, at the right time.

They performed the attack, and… they failed. They just missed getting the right timestamp and/or serial number. But it seems their unusual behavior hadn’t set off any alarms at the CA, so they tried again the next week. And failed again. And again. They succeeded on their fourth attempt.

What can be done to defend against this attack? The best defense is to use a better hash function than MD5. This was done: Use of MD5 was discontinued.

Another thing that would have stopped this attack is if the CA had put something in the certificate whose value is unpredictable. The attack was possible because the certificates had virtually no entropy. Every byte, more-or-less, could be predicted. If the certificates signed by the CA had had even 8 bits of entropy, we can imagine the attack would have required about 256 times as much effort. It doesn’t take very much entropy to make this particular attack effectively impossible.

This was also done. There are lots of ways to put a random-valued field in a certificate, but the method chosen was to use random serial numbers. That’s why the aforementioned CA/Browser Forum rule exists.

It occurred to me that there might be another reason to randomize the serial number, having to do with certificate revocation, since certificates are revoked by serial number. I don’t quite know what attack that could protect against, but it doesn’t hurt.

I don’t imagine there’s anything preventing a CA from putting additional entropy in their certificates, beyond the serial number. It doesn’t matter where the entropy is, from a security standpoint. But it doesn’t help satisfy the rule that arbitrarily only counts the entropy of the serial number.

How many bits?

As mentioned, the CA/Browser Forum chose to require 64 bits of entropy. 64 is a nice round number to computer scientists, but it turns out to be somewhat pathological in this case. The serial number field is an integer (a signed one, in the encoded certificate), not an array of bits. The rule leaves it as an exercise to figure out how to pack 64 or more random bits into an integer, such that you will always get a positive integer. That isn’t difficult, but you first have to recognize that there is a problem to be solved, and it might mean you can’t utilize your programming language’s convenient native 64-bit integer functionality. 63 or 62 bits would have been a more convenient minimum.
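To make the pitfall concrete, here is a minimal Python sketch. It relies on the fact that a DER INTEGER is stored as a signed, big-endian two’s-complement value. The function names (`der_integer_content`, `naive_serial`) are made up for illustration, and the masking shortcut is only one plausible way to end up a bit short; I am not claiming it is what any particular CA’s software did.

```python
import secrets

def der_integer_content(n: int) -> bytes:
    """Content bytes of a DER INTEGER for a non-negative n:
    the shortest signed big-endian two's-complement form."""
    if n < 0:
        raise ValueError("serial numbers are never negative")
    length = n.bit_length() // 8 + 1   # value bits plus room for a sign bit
    return n.to_bytes(length, "big", signed=True)

def naive_serial() -> int:
    """Hypothetical shortcut: draw 64 random bits, then clear the top bit
    so the value is a positive signed 64-bit integer."""
    return secrets.randbits(64) & 0x7FFF_FFFF_FFFF_FFFF

# A full 64-bit value with its top bit set needs a ninth, all-zero byte
# in front to stay positive.
assert len(der_integer_content(2**64 - 1)) == 9

# After masking, every serial fits in 8 bytes, but only 63 bits are random.
assert all(naive_serial() < 2**63 for _ in range(1000))
```

The sketch ignores the astronomically unlikely case where the masked value comes out as zero, which the rule also forbids.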

I haven’t seen this mentioned, but if you want to be very strict, since zero is not allowed, a positive 64-bit unsigned integer has only $2^{64} - 1$ possible values, which leaves “only” about $64 - 7.8 \times 10^{-20}$ bits’ worth of entropy, not the required 64. If you want to just barely comply with the rules, one method would be to use a 65-bit unsigned integer, with the low bit always 1, and the remaining 64 bits random. (In the final DER-encoded form, it would use up to 72 bits, but that’s not what I’m talking about.) All your serial numbers would be odd numbers, but that’s okay.
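The 65-bit trick can be sketched in a few lines of Python. This is only an illustration of the idea in the paragraph above, using the standard library’s `secrets` module as the CSPRNG; the function name `odd_65_bit_serial` is hypothetical.

```python
import secrets

def odd_65_bit_serial() -> int:
    """64 CSPRNG bits in the high positions, with the low bit fixed to 1."""
    return (secrets.randbits(64) << 1) | 1

serial = odd_65_bit_serial()

# Always positive and odd, and never wider than 65 bits.
assert 0 < serial < 2**65 and serial % 2 == 1

# As a signed DER INTEGER it needs at most 9 content bytes (72 bits).
assert serial.bit_length() // 8 + 1 <= 9
```

Because the fixed bit is a 1, the result can never be zero, and all 64 random bits survive intact.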

Predictably, somebody fell into the trap. In late February 2019, it came to light that a certain certificate management application used by a number of CAs generated serial numbers having only 63 bits of entropy. Many certificates (millions?) generated by it were already in use.

Security implications

I’ll try to give some perspective about the security implications of this, or lack thereof.

63 bits only gives you half as many possible serial numbers as 64, as some were eager to point out. You could say it’s only half as secure.

In a way, that’s true. But that’s only if it’s the weakest link, which it’s probably not. And computers don’t mind doing twice as much work, so a factor of 2 is usually not very significant.

The MD5 attack would easily have been stopped by, say, 20 bits of entropy, which would have made the chance of success about 1 in 1.05 million, per attempt. Increasing that to 63 bits gives you an additional safety factor of $2^{43}$, or about 8.8 trillion. To have a good chance of success, you’d have to attempt the attack (which took a few days) around 10 million trillion times. Oh, and you would have had to discover the 63-bit issue yourself, work in secret, and finish the attack by March 2019, when all the CAs fixed the problem.
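For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes each attempt independently hits the right serial number with probability $2^{-\text{bits}}$, which is a simplification of the real attack, and `success_probability` is just a helper name I made up.

```python
import math

def success_probability(bits: int, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries guesses
    a serial number carrying `bits` bits of entropy."""
    p = 2.0 ** -bits
    # 1 - (1 - p)**attempts, computed so that a tiny p is not rounded away.
    return -math.expm1(attempts * math.log1p(-p))

per_attempt_20 = 2**20        # 1,048,576: about 1.05 million
extra_factor = 2**(63 - 20)   # 8,796,093,022,208: about 8.8 trillion
total_63 = 2**63              # 9,223,372,036,854,775,808: about 9.2 million trillion

assert per_attempt_20 * extra_factor == total_63

# Even after 2**63 attempts the attacker is not certain to have succeeded.
chance = success_probability(63, total_63)
assert 0.5 < chance < 1.0
```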

Alternatively, maybe it’s conceivable that some hypothetical hash function attack would be able to deal with a certain amount of entropy. Imagine the MD5 attack, but with the ability to generate multiple counterfeit certificate candidates, each with a different serial number. But it’s implausible that such an attack would just happen to max out at exactly 63 bits.

If the known weaknesses of MD5 had been somewhat more severe, then no amount of entropy would help. The attackers could have simply taken any certificate, with any serial number, and forged a corresponding counterfeit certificate in private, without any of the prediction rigamarole.

A broader issue is that the serial number rule is mainly about defending against an attack that is unlikely to ever be practical again. The lesson from the MD5 attack has hopefully been learned: The certificate world needs to stay ahead of the game, and mandate the use of strong hash functions with no sign of weakness. Randomizing the serial number is still a good idea, but even zero bits of entropy should be enough for basic security.

The response

The powers that be could have granted amnesty. They could have decided it wasn’t worth demanding that a lot of mainly innocent people spend who-knows-how-many total person-years updating certificates, in reaction to an obscure technical rule violation that’s not a security threat. But instead, they decided that all the improperly-issued certificates must be replaced, because rules are rules.

CAs do have a history of bad behavior, so there’s an understandable desire to be strict. The fact that the CAs’ customers are being punished as well could even be seen as a good thing. If you’re idealistic, you could predict that some customers will be annoyed enough that they will switch to a CA that they deem to be better at following the rules, ultimately encouraging CAs to improve their security practices.

The news coverage of this mess was interesting. Most articles, at least if you read down far enough, did admit that this was not a significant security issue, at least not at this time. I’d say that’s an understatement. Only a few really discussed the question of whether the response was appropriate, which I think maybe should have been the real story. Some took it as a given that the nuclear option (revoke and replace all the certificates) was the only way. I don’t necessarily think that was the wrong decision, but it wasn’t the only possible one.