I’ve been researching PAKE algorithms recently and there doesn’t seem to be a good explanation of Encrypted Key Exchange with Diffie Hellman (DH-EKE) out there. The best way to learn something is to teach it. So in that spirit, here follows my explanation of SPEKE and DH-EKE.
Alice and Bob want to exchange data. This data is sensitive, so it needs to be encrypted and authenticated. Traditionally, there are two methods that can be used to solve this problem.
The first method is to use public-key cryptography; most commonly x509 certificates. This can provide very strong cryptography. However, it also has some weaknesses. The biggest weakness is how to trust the remote certificate. PKI is the traditional solution. But this involves trusting a third party (or, more usually, many third parties). In the current climate of espionage, this is likely to be unacceptable to Alice and Bob. Additionally, in many x509 setups, the user authentication actually happens within the encrypted channel; often by sending a password over the wire. Ewww.
The second method is to use symmetric cryptography; most commonly a shared password. The problem here is that the strength of the cryptography is inversely related to the usability of the cryptography: no easy to remember password can have the same entropy as a randomly generated 4096-bit number.
Enter PAKE. Password Authenticated Key Exchange allows Alice and Bob to establish an ad hoc session key used to encrypt all their data with strong cryptography while establishing mutual trust through a shared password. Further more, when using PAKE, an attacker who is able to listen in on the conversation or modify the packets sent will learn nothing about the password used to authenticate the exchange nor about the strong session key itself. Nor is any trust on a third party required.
The basic principle of PAKE is to use a weak shared password to authenticate an exchange of strong, randomly generated public keys. Once completed, these keys can be used to derive session keys. Even though weak passwords are used, they are (hopefully) used in such a way so as not to jeopardize the creation of a strong session key.
While there are a variety of PAKE algorithms, in this article I will be focusing on two very similar algorithms: SPEKE and DH-EKE. These two algorithms are well defined and tested. That is not to say that they are perfectly secure. Both algorithms have various weaknesses. Some newer algorithms promise to solve some of these problems. It is, however, outside the scope of this article to evaluate all of these. So I’ll stick with two of the more traditional algorithms.
Diffie-Hellman Key Exchange
In order to understand DH-EKE or SPEKE, we must first understand the underlying Diffie-Hellman Key exchange (DHKE). DHKE allows two parties to create a key used for exchanging encrypted data from the exchange of two public keys from a set of public/private key pairs.
In the table above, uppercase values are private (they need to be secret) and lowercase values are public.
Here is how it works (these numbers correspond to the steps in the above table):
- Both parties agree on parameters of the exchange. These parameters are public and this agreement can happen over the network. The variable p is a large safe prime. Similarly, g is a generator for a finite cyclic group.
- Alice generates a random private key (A) and a corresponding public key (a). Bob does the same (B and a, respectively).
- Both parties exchange their public keys over the network.
- Alice calculates the encryption key (K) by raising Bob’s public key (b) to the power of her own secret key (A) modulo p. Similarly, Bob calculates the same encryption key (K) by raising Alice’s public key (a) to the power of his own secret key (B) modulo p.
- All network data is encrypted using the calculated key. Data can be decrypted by either party’s secret key. Since these are never transmitted over the network and it is not possible to determine the private key from the public key or the parameters, an eavesdropper cannot decrypt the data.
A few caveats are necessary at this point.
First, K is almost never used as the encryption key directly, since it may have some weak bits. Usually, a derived key (K’) is calculated using hashing to eliminate this weakness.
Second, while it is true that K (or K’) is secret and anything it encrypts can be decrypted only by the two parties who performed this exchange, this is completely vulnerable to a man-in-the-middle (MitM) attack. This is because neither side of the exchange is authenticated. Hence the need for an authenticated key exchange.
Simple Password Exponential Key Exchange (SPEKE)
SPEKE makes only one small change to the Diffie-Hellman Key Exchange. Rather than agreeing on a shared generator, each party will compute a generator (G; Step 2) by squaring the hash of some shared password. Because the generator (G) is now derived from a password, it is private.
Note that in Step 5, Alice’s computation of K will be the same as Bob’s computation of K if and only if the shared password used in Step 2 are the same. Hence, if only Alice and Bob know the shared password, then this algorithm is safe from a MitM attack. And since the generator (G) is private, no offline dictionary attack is possible.
Note well, however, that no validation of K has occurred. So while Alice knows that data encrypted with K cannot be read by anyone but Bob, she still doesn’t know whether the person on the other end of the exchange is, in fact, Bob. If it isn’t Bob on the other end, K will contain a nonsensical value and any data encrypted by it will be irretrievable (assuming the security of the underlying cryptography system).
SPEKE has two major weakness, however.
First, because the hash of the password is squared to create the generator (G), there is a guaranteed collision of an inverse hash. That is, both the hash of the password and the inverse of the hash of the password will output the same generator (G). This means that a clever attacker could, with some advanced planning, attempt at least two password guesses at once; halving the time required to successfully execute a dictionary attack.
Second, because the square of the hash of the password cannot efficiently generate a point on an elliptic curve, SPEKE is entirely incompatible with elliptic curve cryptography. It is possible to work around this by using an admissible encoding formula to ensure that all potential password hash values can be permuted to fall as a point on an elliptic curve.
Diffie-Hellman Encrypted Key Exchange
Like SPEKE, DH-EKE presumes that each side knows a shared secret. This is typically a hash of a password (Step 1). However, instead of changing the generator (g), DH-EKE encrypts one or both of the public keys during transmission (Step 4a, 4b, and 4c list the traditional variants; all of equal strength). This leads to two outcomes.
First, because this encryption is unauthenticated, so long as the public keys (a or b) are indistinguishable from random, it will be impossible to tell whether or not decryption of these values was successful or not; making offline dictionary attacks impossible.
Second, if Alice’s password is different from Bob’s, then Alice’s K will be different than Bob’s K. This implies that a MitM attack is also impossible. Note well, again, that no validation of K has occurred. Hence, if Alice encrypts something for Bob using K, it will be decryptable if and only if Bob is on the other side of the connection and he knows the password. Otherwise, the resulting encrypted data will be undecryptable.
Like SPEKE, DH-EKE has two weaknesses. Both of these weaknesses are caused by the presumption that the public keys are indistinguishable from random.
First, when using standard discrete log problems, at least some of the bits of the public keys are predictable. This is by definition since the value of the public key is between 1 and p-1 and the buffer it is sent in is a multiple of 8-bits. Thus, there will always be some zero bits. While this is not likely to enable an offline dictionary attack, it may be sufficient to allow offline elimination of some passwords. There may be some way to mitigate these weak bits, but there is no such mitigation standardized so any attempted mitigation is of unknown value.
Second, the same problem exists for elliptic curves, only much worse. This weakness at least has a workaround where admissible encodings can be used to hash the elliptic curve points before being encrypted. This has been well defined in the EC-DH-EKE with an admissible encoding variant; along with several other derivatives of this approach. It is beyond the scope of this article to enumerate all of these.
I have elected to omit key validation from the descriptions of SPEKE and DH-EKE above. However, key validation is necessary in most cases to establish trust. There are a variety of methods for implementing key validation. I have listed two commonly used validation techniques in the above table. Both of them use some undefined key derivation technique to derive K’ from K.
In validation Option 1, Alice and Bob independently create a random value. After exchanging the random values using the specified schema, each side can test that the returned random value matches the one that was sent. This proves that both sides have the same K.
In validation Option 2, Alice and Bob simply exchange hashed values of K’. Since either side can compute these values independently, each side can verify the other value. Again, this proves that both sides have the same K.