next

up previous
Next: Design of Mercy Up: Mercy: a fast large Previous: Mercy design goals

Subsections



Description of Mercy

Since most of Mercy's operations are based around 32-bit words, we define $ Z_{w}=Z_{2^{32}} $. Vectors are indexed from zero, so a vector $ P\in Z^{128}_{w} $ of 128 32-bit numbers will be indexed as $ (P_{0},P_{1},\ldots ,P_{127}) $. The symbol $ \oplus $ represents bitwise exclusive OR; where $ + $ appears in the figures with a square box around it, it represents addition in the ring $ Z_w $. Least significant and lowest indexed bytes and words appear leftmost and uppermost in the figures.

Note that some details that would be needed to build a specification of Mercy-based file encryption sufficient for interoperability, such as byte ordering within words, are omitted since they are irrelevant for cryptanalytic purposes.

T box

Figure 1: T box
\includegraphics{diagrams/t-table.eps}

The $ T $ box ( $ T:Z_{256}\rightarrow Z_{w} $, Figure 1) is a key-dependent mapping of bytes onto words. $ N $ represents multiplicative inverses in $ GF(2^{8}) $ with polynomial base $ x^{8}+x^{4}+x^{3}+x+1 $ except that $ N(0)=0 $. $ d_{0}\ldots d_{7} $ are key dependent bijective affine mappings on $ GF(2)^{8} $.

$\displaystyle T(x)=\sum ^{3}_{i=0}2^{8i}d_{2i+1}[N(d_{2i}[x])]$

Operation M

Figure 2: Operation M
\includegraphics{diagrams/m-op.eps}

$ M:Z_{w}\rightarrow Z_{w} $ (Figure 2) is drawn from David Wheeler's stream cipher WAKE [21]; it's a simple, key-dependent mapping on 32-bit words. The most significant byte of the input word is looked up in the $ T $ box, and the output XORred with the other three bytes shifted up eight bits; the construction of the $ T $ box ensures that this mapping is bijective.

$\displaystyle M(x)=2^{8}x\oplus T\left( \left\lfloor x/2^{24}\right\rfloor \right) $

Q state machine

Figure 3: Q state machine
\includegraphics{diagrams/q-op.eps}

The $ Q $ state machine (Figure 3) maps a four-word initial state and one word input onto a four-word final state and one word output ( $ Q:Z_{w}^{4}\times Z_{w}\rightarrow Z_{w}^{4}\times Z_{w} $ ) using taps from a linear feedback shift register and a nonlinear mixing function.

$\displaystyle Q(S,x)=((S_{3}\oplus y,S_{0},S_{1},S_{2}),y)\quad \textrm{where}\quad y=S_{2}+M(S_{0}+x)$ (1)

F function

Figure 4: F function
\includegraphics{diagrams/f-func.eps}

The $ F_{n} $ function ($ n\geq 8 $; Figure 4) accepts a 128-bit spice $ G\in Z_{w}^{4} $ and a $ 32n $-bit input $ A\in Z_{w}^{n} $ and generates a $ 32n $-bit output $ B\in Z_{w}^{n} $ ( $ F_{n}:Z^{n}_{w}\times Z^{4}_{w}\rightarrow Z^{n}_{w} $ ). $ F_{64} $ (usually just $ F $) is the F function for the Feistel rounds. Here $ S_{0\ldots n+4} $ represents successive 128-bit states of a state machine; $ U_{0\ldots n+3} \in Z_w $ are the successive 32-bit inputs to the state machine, and $ V_{0\ldots n+3} \in Z_w $ are the outputs.


$\displaystyle F_{n}(A,G)$ $\displaystyle =$ $\displaystyle B\quad \textrm{where}$  
$\displaystyle S_{0}$ $\displaystyle =$ $\displaystyle (A_{n-4},A_{n-3},A_{n-2},A_{n-1})$  
$\displaystyle (S_{i+1},V_{i})$ $\displaystyle =$ $\displaystyle Q(S_{i},U_{i})\qquad (0 \leq i < n+4)$  
$\displaystyle U_{i}$ $\displaystyle =$ $\displaystyle \left\{ \begin{array}{ll} G_{i} & (0 \leq i<4)\ A_{i+n-12} & (4\leq i<8)\ A_{i-8} & (8 \leq i < n) \end{array}\right.$  
$\displaystyle B_{i}$ $\displaystyle =$ $\displaystyle \left\{ \begin{array}{ll} V_{i+8} & (i<n-4)\ S_{n+4,i+4-n} & (n-4 \leq i < n) \end{array}\right.$  

Round structure

Figure 5: Round structure (decryption illustrated)
\includegraphics{diagrams/rounds.eps}

Mercy uses a six round Feistel structure (Figure 5) with partial pre- and post-whitening; unusually, the final swap is not omitted. The spice $ G\in Z^{4}_{w} $ (usually the sector number) goes through a ``spice scheduling'' procedure, analogous with key scheduling, through which the ``spice material'' $ G'\in Z_{w}^{24} $ is generated based on the input spice, using the $ F_{24} $ variant of the $ F $ function; this forms six 128-bit ``round spices''. Spice scheduling uses a dummy input to the F function; for this a vector of incrementing bytes $ H \in Z_w^{24} $is used. $ P\in Z_{w}^{128} $ represents the plaintext, $ C\in Z^{128}_{w} $ the ciphertext, and $ W_0, W_1 \in Z_{w}^{64} $ the whitening values. We describe decryption below; since Mercy is a straightforward Feistel cipher encryption follows in the straightforward way.

$\displaystyle H_{i}$ $\displaystyle =$ $\displaystyle \sum ^{3}_{j=0}2^{8j}(4i+j)$  
$\displaystyle G'$ $\displaystyle =$ $\displaystyle F_{24}(H, G)$  
$\displaystyle (L_{0},R_{0})$ $\displaystyle =$ $\displaystyle (C_{0\ldots 63},C_{64\ldots 127} \oplus W_1)$  
$\displaystyle (L_{i+1},R_{i+1})$ $\displaystyle =$ $\displaystyle (R_{i},L_{i}\oplus F_{64}(R_{i},G'_{4i\ldots 4i+3}))\qquad (0\leq i<6)$  
$\displaystyle (P_{0\ldots 63},P_{64\ldots 127})$ $\displaystyle =$ $\displaystyle (L_{6} \oplus W_0, R_{6})$  

Key schedule

The key is used to seed a CPRNG from which key material is drawn; [10] is used in the sample implementation (after discarding 256 bytes of output), and is convenient since it's small and byte oriented, but any strong CPRNG will serve. Then the procedure in Figure 6 generates the substitutions $ d_{0\ldots 7} $ and the 2048-bit whitening values $ W_{0},W_{1} $.

Figure 6: Key schedule pseudocode
\begin{figure}{\hspace{1in} \vbox{\textsf{ \begin{tabbing} \textbf{for} \=\+ \( ... ...i,j} + 2^{8k} \textbf{random\_byte} () \)\-\-\-\ \end{tabbing}}}} \end{figure}

An expected 10.6 random bytes will be drawn for each $ d $. Once $ d_{0\ldots 7} $ have been determined, a 1k table representing the $ T $ box can be generated. During normal use 1536 bytes of key-dependent tables are used.


next

up previous
Next: Design of Mercy Up: Mercy: a fast large Previous: Mercy design goals

mercy@paul.cluefactory.org.uk