Paul Crowley 1 - Stefan Lucks 2
LEVIATHAN  is a stream cipher proposed by David McGrew and Scott Fluhrer for the NESSIE project . Like most stream ciphers, it maps a key onto a pseudorandom keystream that can be XORed with the plaintext to generate the ciphertext. But it is unusual in that the keystream need not be generated in strict order from byte 0 onwards; arbitrary ranges of the keystream may be generated efficiently without the cost of generating and discarding all prior values. In other words, the keystream is ``seekable''. This property allows data from any part of a large encrypted file to be retrieved efficiently, without decrypting the whole file prior to the desired point; it is also useful for applications such as IPsec . Other stream ciphers with this property include block ciphers in CTR mode . LEVIATHAN draws ideas from the stream ciphers WAKE  and SEAL , and the GGM pseudo-random function (PRF) construction .
The keystream is bounded at bytes. Though the security
goals are stated in terms of key recovery, it is desirable that distinguishing
this keystream from a random binary string should be as computationally
expensive as an exhaustive search of the 128 or 256-bit keyspace.
Keystream generation is best modelled as a key-dependent function
, mapping a location
in the stream to a 32-bit output word; catenating consecutive values
of this function from 0 gives the entire keystream:
Finding for arbitrary is not especially fast. However, once this is done, intermediate values can usually be reused to find much more efficiently. This is because the internal structure of the cipher is based on a forest of binary trees, each of which generates words of output, as shown in Figure 1.
The notation we use to specify this function precisely is somewhat different from that given in , but is convenient for our purposes; we treat as a parameter, rather than as a word of state. The cipher is parameterised on and , where is divisible by 4 and ; LEVIATHAN sets and . denotes catenation of bit strings, bitwise complementation of , the XOR operation (addition in or as appropriate), and addition in the group , treating the first bit of the bitstring as the most significant and padding bitstrings shorter than bits with zeroes on the left. We specify the forest structure illustrated in Figure 1 recursively:
The internal state that functions , , and operate on (and the functions , , used to define them) is a 2-tuple of bitstrings ; we treat this as distinct from the catenated bitstring . The functions , , and operate on bytes within a word: and are rotates, while provides nonlinearity with the key-dependent permutations which map onto itself. These permutations are generated by the key schedule, which we omit. Note that and operate on each word of the tuple independently; mixing is provided by .
 gives a functionally different definition of ( ); the one given here is that intended by the authors  and used to generate the test vectors, though the difference is not relevant for our analysis.
We present two biases in the LEVIATHAN keystream that distinguish it from a random bit string. We know of no other attacks against LEVIATHAN more efficient than brute force.
Both attacks focus on consecutive pairs of outputs generated by . Clearly, LevPair generates the same -byte keystream as Lev, so a distinguisher for one is a distinguisher for the other. Such pairs are interesting because they are the most closely related outputs in the tree structure;  refers to attacks using such pairs as ``up-and-down attacks''. We can expand the formula for LevPair as follows:
From this we define functions LevAbove which generates the last common ancestor of such an output pair as illustrated in Figure 2, and PairCom which generates the output pair from the ancestor:
This structure gives us our first distinguisher. Though PairCom has the same domain as range, it is not in general bijective; it can be modelled more accurately as a random function. Thus a collision can occur in LevPair, given two distinct inputs, if there is a collision either in LevAbove or in PairCom, and if we model both as random functions the probability of an output collision for two random distinct inputs to LevPair is thus approximately , twice what it should be if the keystream were a random binary string.
For , this increased probability of collisions between output word pairs can be observed with a birthday attack after around output pairs ( bytes) have been generated; the techniques of  may be used to reduce the memory demands of this attack, though this slows the attack by a factor of approximately where is the height of the tree, since probes can no longer take advantage of the higher efficiency of sampling consecutive values of LevPair.
The definitions of the and functions are very similar; is the same as except that it treats its inputs in the opposite order, and inverts one of them. If did not apply bitwise inversion to its first input (call this function ), then the two functions would be related by (with Swap having the obvious definition ); this would mean in turn that for any , and thus that , with the result, as we shall see, that repeating pairs were visible in the output roughly twice as often as they should be. The inversion on the first input of breaks this symmetry; however, it turns out that it does not prevent a related attack.
Computation of PairCom requires 32 S-box lookups, but for each computation of the function the same 8-bit index, drawn from the least significant byte, is fed to each of the four S-boxes. Changes to the other bytes carry directly into the output of , without nonlinearity or mixing; in other words, where , we find . We call this least significant byte the index to the S-box. If is the input to PairCom, only bytes of are indices to S-boxes in , and only bytes are indices in ; by inverting only these two bytes in our pair , we can avoid the symmetry-breaking effect of the inversion as far as the nonlinear components are concerned, which results in the same four S-box indices being used in both the and branches of PairCom.
Figure 3 illustrates this attack. For an arbitrary -bit string , we define symbols for intermediate values in :
With these definitions, we find that :
From this it is clear that for any input of the appropriate form, one output word is the inverse of the other; or in other words, if we now XOR the two word outputs from PairCom together (which, conveniently, is the same as applying the LEVIATHAN compression function a second time), we find
Since we model LevAbove as a random function we conclude that inputs
to PairCom have probability of matching this form in
the normal calculation of LevPair. Where inputs do not match this
form, we assume that PairCom behaves as a random function and therefore
that for random not matching this form,
this assumption is borne out by experiment. From this we conclude
that LevPair is twice as likely as a random function to produce an
output such that
For , a test for the presence of this bias should therefore take on the order of samples of LevPair, ie bytes, as for the previous attack.
We looked for these biases on a reduced version of LEVIATHAN with .
For the PRF-PRF attack, we ran over 256 distinct keys generating 32-bit LevPair outputs for each, and sorting them to find collisions. We count as a collision each instance where a distinct pair of inputs result in the same output; thus, where outputs have the same value, we count this as distinct collisions. For a random function we would expect to find approximately3 collisions in total across all keys, while the PRF-PRF attack would predict an expected . The experiment found 2350336 collisions; this is standard deviations (SDs) from the expected value in the random function model, and SDs from the expected value in the model provided by the PRF-PRF attack. This shows that this model identifies a substantial bias in the cipher, but there is a further bias in the collision probability of roughly 0.38% yet to be accounted for.
For the S-box matching attack, we generated LevPair outputs for each of 256 keys, counting outputs with the property. A random function would generate an expected such outputs, while the S-box matching attack predicts that LevPair will generate an expected such outputs. The experiment found such outputs; this is SDs from the expected value in the random function model, and SDs from the expected value in the model provided by the S-box matching attack. Again, this shows that while a substantial source of bias has been identified, there is still a bias of 3.66% yet to be accounted for. Scott Fluhrer has reported finding this attack effective in experiments against the full LEVIATHAN with .
We have shown two forms of bias in the output of the LEVIATHAN keystream generator, either of which distinguish it from a random function with known bytes of output; we have not as yet found a way to recover key material using these distinguishers. These distinguishers can both be applied to the same portion of keystream for greater statistical significance. Both make use of compression in the PairCom function.
Despite these attacks, LEVIATHAN demonstrates that a tree-based cipher could offer many advantages. It is to be hoped that similar designs, offering the same speed and flexibility but resistant to this and other attacks, will be forthcoming.
Thanks to Rüdiger Weis for helpful commentary and suggestions, and to the LEVIATHAN authors for providing an implementation of the first experiment and for useful discussion.
URL for this paper: http://www.ciphergoth.org/crypto/leviathan