How Diffie-Hellman key-exchange works


(C) Gustaf Alhäll, released under CC BY 4.0
Original document can be found at hanicef.me
License can be found at creativecommons.org

Algorithm


The Diffie-Hellman key exchange function is a way of exchanging cryptographic keys while ensuring no-one can steal the keys in the process. It is commonly used in cryptographic algorithms for forward secrecy, a method that allows safe communication even after the main cryptographic key has been leaked.

Diffie-Hellman works like this: first, both parts must agree on a prime number, p, and find the primitive root modulo of p which we will call g. p and g are not secret, and can be safely transmitted over an unsafe communication without compromising the communication. Then, one side picks a random number, a, and the other picks a random number, b. a and b are kept secret, and are not transmitted to the other part.

Then, each side computes the following equation:

A = gᵃ mod p
B = gᵇ mod p

A and B is then sent to the other part; they do not have to be kept hidden. Both parts then compute the following equation:

s = Bᵃ mod p
s = Aᵇ mod p

Now, both parts know the same number, s, while making it practically impossible for anyone eavesdropping on the connection to be able to know s.

Finding a good value for p


One thing that isn't immediately obvious about Diffie-Hellman is what value to use for p. Picking a good value for p is vital for Diffie-Hellman to be secure, and bad values can make it very weak. First and foremost, p must be large, since p limits the amount of possible values. For example, if p is 23, then s can only be between 0-22, making brute force attacks far too easy.

The officially standardized way of finding p is by finding a value, q, that is also a prime that satisfies p = jq + 1 and j = (p - 1) / q. This will also be used to find g later.

The algorithm officially recommended for finding p and q is:

m = length of q
L = length of p
m' = ⌈m/160⌉
L' = ⌈m/160⌉
N' = ⌈m/1024⌉
SEED = random sequence of bytes where the length >= m
LN = length of SEED
   m'-1
U = Σ (SHA1(SEED + n) ⊕ SHA1((SEED + m' + n) mod 2ᴸᴺ)) * 2¹⁶⁰ⁿ
   n=0
q = U ∨ 2ᵐ⁻¹ ∨ 1

Before proceeding, make sure q is a prime using a primality algorithm. If not, generate a new SEED and recalculate U and q until q is a prime.

A typical primality algorithm is the Fermat primality test, which looks like this:

aᵖ⁻¹ ≡ 1 (mod p)

where p is the prime, and a is a random number 1 < a < p-1. Note that this must be repeated multiple times to be reliable, as it's known to give false positives in a few cases where p is not a prime. Also, keep in mind that there are other primality algorithms out there that might be better based on what you need to achieve; do your research before using this blindly!

The following steps must be repeated with a variable c, where c is a counter starting from 0 and incrementing by 1 for every iteration.

R = SEED + 2*m' + (L' * c)
   L'-1
V = Σ SHA1(R + n) * 2¹⁶⁰ⁿ
   n=0
W = V mod 2ᴸ
X = W ∨ 2ᴸ⁻¹
p = X - (X mod 2q) + 1

Now, if p > 2ᴸ⁻¹ and p is a prime, appropriate p and q values are now found. If this is not true, repeat the process until either the condition is true or c >= 4096 * N'. If the latter is true, no possible combination of p and q exists for the current q, and the algorithm must be restarted from the beginning.

Finding g


Now when we have good values for p and q, finding g is simple:

j = (p - 1) / q
h = any integer where 1 < h < p - 1 satisfies hᶨ mod p > 1
g = hᶨ mod p

Cryptographic security


Knowing how Diffie-Hellman works doesn't really help explain why it's so secure. The core to it's cryptographic strength is based on what is called the discrete logarithm problem. It's a mathematical problem that shows that for a cyclic group G = ⟨b⟩, the equation bᵏ = a where k = logᵦ a and a ∈ G has no known algorithm for finding a solution (Note that the german double-s, ß, is used as a replacement for b due to the b subscript is missing in Unicode).

A cyclic group is a group that can be generated using a single element g, so all elements in the group is represented as gᵃ, where a is an integer. While some cyclic groups G can quickly be solved, most of them cannot. A simple example of a group G that can be quickly solved is

G = ⟨10⟩ = { ..., 0.001, 0.01, 0.1, 1, 10, 100, 1000, ...}

That is, the group where the generator g is 10. Let's say that a is 1000, to make it simple for us, which makes k = log₁₀ 1000 = 3. Thus, this makes 10³ = 1000, which adds up. Try experimenting with different values for a, and you'll see that this is a simple case to solve.

The reason why this is a simple problem is because the generator is an integer. In that case, k will always be an integer, making the logarithm easy to calculate.

The groups that Diffie-Hellman relies on, though, are integers modulo of prime p. This group is a specific group where the generator involves a modular exponentiation, where the modulo is a prime, and the group is defined like this:

gᵃ = k mod p

This might look familiar: it looks very similar to the algorithm used in Diffie-Hellman, which we've seen earlier. The reason why this group is so important is because it carries a very special attribute: it's very easy to find gᵏ = a mod p if k = gᵃ mod p is already known. The former equation is the discrete logarithm of the latter equation, and that is what Diffie-Hellman is effectively solving during the key exchange.

However, note that integers modulo of prime p are group*s*, not a single group. This is why both parts must agree on a shared value g and p before the key exchange can begin. Which group to pick, though, is not an easy choice, and some groups are harder to solve that others.

Out of all the groups that exists, the Sophie German primes have been proven notoriously difficult to solve. This is because the group has a large prime order, meaning that there are many possible primes that fall into the group. This is important: the larger a group is, the harder it is to find the correct value as the chance of picking the right value, if you were to pick a random one, diminishes.

A prime number q is a Sophie German prime if p = 2q + 1 is also a prime. The corresponding prime, p, is called a "safe prime" and has the unique property that the order of the group q belongs to is only divisible by q and 2. Again, this might sound familiar to the algorithm used to calculate a value for p and q. In fact, that is exactly what the algorithm is doing: finding a p and q which ensures that p is a safe prime and q is a Sophie German prime.