Modelling the shrinking generator in terms of linear CA

This work analyses the output sequence from a cryptographic non-linear generator, the so-called shrinking generator. 
This sequence, known as the shrunken sequence, can be built by interleaving a unique PN-sequence whose characteristic polynomial serves as basis for the shrunken sequence's characteristic polynomial. 
In addition, the shrunken sequence can also be generated from a linear model based on cellular automata. 
The cellular automata proposed here generate a family of sequences with the same properties (period and characteristic polynomial) as the shrunken sequence. 
Moreover, these sequences appear several times along the cellular automata, each copy shifted by a fixed number of positions. 
The use of discrete logarithms allows the computation of such a number. 
The linearity of these cellular automata can be advantageously employed to launch a cryptanalysis against the shrinking generator and recover its output sequence.


Introduction
Private-key ciphers use the same key for encryption and decryption [18]. Thus, the key exchange between the two parties involved in the communication plays an important role. There are two types of private-key ciphers: stream ciphers and block ciphers. This work focuses on the first type. Stream ciphers are the fastest among the encryption procedures, so they are implemented in many technological applications. Assuming the messages are binary sequences, stream ciphers encrypt bits individually [20]. The ciphertext is obtained by XORing the original message with a generated keystream sequence. Decryption is performed in the same way, by XORing the ciphertext with the same keystream sequence. The main problem in stream ciphers is to generate, from a short key, a long keystream sequence that looks as random as possible. Maximal-length Linear Feedback Shift Registers (LFSRs) [11] generate PN-sequences with good cryptographic properties, but their linearity makes them vulnerable. Because of the Berlekamp-Massey algorithm [15], these PN-sequences must never be used directly as keystream sequences, but only as a basis for other more complex structures. On the other hand, some one-dimensional linear cellular automata (CA) generate the same PN-sequences as those generated by maximal-length LFSRs [4]. Therefore, CA can be considered as alternative generators to LFSRs. Moreover, some keystream generators can be modelled as linear CA [3,9,10].
In [9], the authors showed that the output sequence of a cryptographic sequence generator, the shrinking generator, can be obtained from a linear model based on CA that uses rules 150 and 90. In this work, we propose a different family of CA that uses rules 102 or 60 and generates the same sequence, that is the shrunken sequence. Furthermore, we provide some properties about the shrunken sequence and the interleaved PN-sequences contained in it. We also analyse the other different sequences generated by the CA.
This paper is organized as follows: Section 2 gives some basic notions needed to understand the rest of the work. In Section 3, the main results of this work are introduced. This section is divided into two main parts: Subsection 3.1, where some properties about the characteristic polynomial and the interleaved PN-sequences in the shrunken sequence are presented, and Subsection 3.2, where a characterization of the sequences obtained through the linear CA that generate the shrunken sequence is provided. In Section 4, some conclusions complete the work.

Preliminaries
In this section, the main concepts of this work are introduced. In Section 2.1, we recall the notions of shrinking generator and shrunken sequence. In Section 2.2, we define cellular automata and rules 102 and 60.
2.1. The shrinking generator. Before recalling the definition of the shrinking generator, we need to introduce the concept of decimation, which will be used several times throughout this paper. Let $\{v_i\}$ be a linear recursive sequence over a finite field. The decimation of the sequence $\{v_i\}$ by $d$ is a new sequence obtained by taking every $d$-th term of $\{v_i\}$ [8].
The shrinking generator was first introduced by Coppersmith et al. in [5]. It is composed of two maximal-length LFSRs, $R_1$ and $R_2$, with lengths $L_1$ and $L_2$, respectively, and $\gcd(L_1, L_2) = 1$. The PN-sequence $\{a_i\}$ generated by $R_1$ decimates the PN-sequence $\{b_i\}$ produced by the other register $R_2$. The decimation rule is very simple; given two bits $a_i$ and $b_i$, $i = 0, 1, 2, \ldots$, from both PN-sequences, the output sequence $\{s_j\}$ is obtained as follows: if $a_i = 1$, then $s_j = b_i$; if $a_i = 0$, then $b_i$ is discarded. The sequence $\{s_j\}$ is called the shrunken sequence and its period is $T = (2^{L_2} - 1)2^{L_1-1}$. The linear complexity of this sequence, denoted by $LC$, satisfies $L_2 2^{L_1-2} < LC \leq L_2 2^{L_1-1}$ and its characteristic polynomial has the form $p(x)^m$, where $2^{L_1-2} < m \leq 2^{L_1-1}$ and $p(x)$ is a primitive polynomial of degree $L_2$ [9]. Besides, the sequence is almost balanced, as its number of 1s is $2^{L_1+L_2-2}$. This generator is easy to implement and has good cryptographic properties, so it is suitable for applications in stream ciphers. Notice that the shrunken sequence is obtained by irregular decimation of a PN-sequence.
Example 1. Consider $R_1$ the LFSR with characteristic polynomial $p_1(x) = 1 + x + x^2$ and initial state 10. Consider also $R_2$ the LFSR with characteristic polynomial $p_2(x) = 1 + x + x^3$ and initial state 100. The shrunken sequence can be computed in the following way, keeping the bits of $\{b_i\}$ located at the positions where $a_i = 1$:
$\{a_i\}$: 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1
$\{b_i\}$: 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1
$\{s_j\}$: 1 0 1 1 1 0 0 0 1 1 0 1 0 1
The sequence has period 14 and it is not difficult to check that its characteristic polynomial is $p(x)^2 = (1 + x^2 + x^3)^2$; consequently, its linear complexity equals 6.
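The decimation rule and Example 1 can be reproduced with a few lines of Python. This is only a minimal sketch: the helper names `lfsr` and `shrink` are hypothetical, and it assumes the convention that a characteristic polynomial $\sum_k p_k x^k$ corresponds to the recurrence $\sum_k p_k x_{i+k} = 0$, which is the convention that reproduces the PN-sequence 1001110 of Example 2.

```python
def lfsr(taps, state, n):
    """Return n bits of the sequence x[i+L] = XOR of x[i+k] for k in taps.

    taps lists the exponents k < L with nonzero coefficient in the
    characteristic polynomial (assumed convention, see the lead-in).
    """
    bits = list(state)
    L = len(state)
    while len(bits) < n:
        i = len(bits) - L
        bits.append(sum(bits[i + k] for k in taps) % 2)
    return bits[:n]


def shrink(a, b):
    """Keep b[i] whenever a[i] == 1 and discard it otherwise."""
    return [bi for ai, bi in zip(a, b) if ai == 1]


# Example 1: p1(x) = 1 + x + x^2, state 10; p2(x) = 1 + x + x^3, state 100.
a = lfsr([0, 1], [1, 0], 21)      # 21 = T1 * T2 = 3 * 7 bits cover one full cycle
b = lfsr([0, 1], [1, 0, 0], 21)
s = shrink(a, b)
print("".join(map(str, s)))       # 10111000110101
print(len(s), sum(s))             # 14 8
```

The 14 output bits form one period $T = (2^3 - 1) \cdot 2^{2-1}$ and contain $2^{L_1+L_2-2} = 8$ ones, in agreement with the properties listed above.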
2.2. Cellular automata. Cellular automata (CA) are devices composed of a finite number of cells whose content is updated according to a rule or function with $k$ variables [6]. The state of the cell in position $i$ at time $t + 1$, denoted $x_i^{t+1}$, depends on the state of the $k$ neighbour cells at time $t$. If these rules are composed exclusively of XOR operations, then the CA are linear. Here, the CA we consider are regular (every cell follows the same rule), cyclic (extreme cells are adjacent) and one-dimensional. For $k = 3$, rules 102 and 60 are given by:
Rule 102: $x_i^{t+1} = x_i^t + x_{i+1}^t$, with outputs 0, 1, 1, 0, 0, 1, 1, 0 for the neighbourhoods 111, 110, 101, 100, 011, 010, 001, 000.
Rule 60: $x_i^{t+1} = x_{i-1}^t + x_i^t$, with outputs 0, 0, 1, 1, 1, 1, 0, 0 for the same neighbourhoods.
The numbers 01100110 and 00111100 are the binary representations of 102 and 60, respectively. This is the reason why they are called rule 102 and rule 60. In Figure 1, it is possible to see these rules using the terminology introduced by Stephen Wolfram [21], where a white square represents the digit 0 and a black square represents the digit 1. Figure 2 shows the CA-images generated by these rules after applying 15 iterations to the one-dimensional CA. It is possible to see the symmetry between both rules. Notice that, according to Wolfram's terminology, both rules are considered for $k = 3$, but with one null coefficient. For rule 102 the coefficient corresponding to the first component is null, that is, $x_i^{t+1} = 0 \cdot x_{i-1}^t + x_i^t + x_{i+1}^t$. In Table 1a we find an example of a linear, regular, cyclic cellular automaton that uses rule 102. Note that the linear, regular, cyclic automaton with the same length that uses rule 60 provides the same sequences, but these appear in reverse order (see Table 1b).
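As an illustration of the two rules, the following Python sketch compares the XOR form of rules 102 and 60 with a generic lookup based on Wolfram's rule number, for a cyclic one-dimensional CA. The function names and the initial state are hypothetical and chosen only for this example.

```python
def wolfram_step(cells, rule):
    """One update of a cyclic, regular CA with k = 3, using the Wolfram rule number."""
    n = len(cells)
    return [
        (rule >> (4 * cells[i - 1] + 2 * cells[i] + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def rule_102(cells):
    """XOR form of rule 102: each cell is added to its right neighbour."""
    n = len(cells)
    return [cells[i] ^ cells[(i + 1) % n] for i in range(n)]


def rule_60(cells):
    """XOR form of rule 60: each cell is added to its left neighbour."""
    return [cells[i - 1] ^ cells[i] for i in range(len(cells))]


state = [1, 0, 0, 1, 1, 1, 0]  # arbitrary illustrative state
print(rule_102(state) == wolfram_step(state, 102))   # True
print(rule_60(state) == wolfram_step(state, 60))     # True
# Mirror symmetry between both rules: rule 60 on the reversed state
# gives the reversed output of rule 102.
print(rule_60(state[::-1])[::-1] == rule_102(state))  # True
```

The last check reflects the symmetry between both rules mentioned above: reversing the order of the cells turns one rule into the other.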
Due to the speed and randomness of their sequences, CA are a very good basis for stream ciphers. Furthermore, their hardware implementation is simple and their regular structure makes it possible to find an efficient software implementation. The first cryptographic application of CA was published in [22]. In this work, Wolfram used rule 30 for building a stream cipher, which was afterwards broken by Meier and Staffelbach [16]. However, other authors have proposed stream ciphers based on CA over the years [7,12,19].

Figure 2. CA-images generated with rules 102 and 60.

Main results
Let $\mathbb{F}_2$ be the Galois field of 2 elements. From now on, we consider two registers $R_1$ and $R_2$, with characteristic polynomials $p_1(x), p_2(x) \in \mathbb{F}_2[x]$, lengths $L_1$ and $L_2$, $\gcd(L_1, L_2) = 1$, and $T_1 = 2^{L_1} - 1$ and $T_2 = 2^{L_2} - 1$ the periods of the corresponding PN-sequences, respectively. Besides, the PN-sequences generated by both registers are $\{a_i\}$ and $\{b_i\}$, respectively. We assume without loss of generality that $a_0 = 1$.

3.1. Characteristic polynomial and interleaved PN-sequences. In this section, we outline some of the most important properties of the shrunken sequence. In fact, we prove that the shrunken sequence is constructed by interleaving shifted versions of a unique PN-sequence. We also determine the characteristic polynomial of the interleaved sequences and that of the shrunken sequence.
Theorem 3.1. The sequences obtained decimating the shrunken sequence by $2^{L_1-1}$ are PN-sequences with period $T_2$. We call these sequences the interleaved PN-sequences of the shrunken sequence.
Proof. The sequence $\{a_i\}$ contains $2^{L_1-1}$ ones in the first $T_1$ bits. Suppose the locations of these ones in $\{a_i\}$ are $\{0, j_1, j_2, \ldots, j_{2^{L_1-1}-1}\}$. Then, according to the decimation rule, the shrunken sequence is given by
$b_0, b_{j_1}, \ldots, b_{j_{2^{L_1-1}-1}}, b_{T_1}, b_{T_1+j_1}, \ldots, b_{T_1+j_{2^{L_1-1}-1}}, b_{2T_1}, b_{2T_1+j_1}, \ldots$
Thus, the sequence $\{b_0, b_{T_1}, b_{2T_1}, \ldots, b_{(T_2-1)T_1}\}$ is obtained decimating the shrunken sequence by $2^{L_1-1}$. At the same time, this sequence can be obtained decimating the PN-sequence $\{b_i\}$ by $T_1$. Since $\gcd(L_1, L_2) = 1$, it is not difficult to check that $\gcd(T_1, T_2) = 1$, and according to [11, Ch. 4, Th. 2], this sequence is a PN-sequence with the same period as $\{b_i\}$, that is, $T_2$. The same argument applies to the decimated sequences starting at $b_{j_k}$, for $k = 1, \ldots, 2^{L_1-1} - 1$.
It is worth noticing that the $2^{L_1-1}$ interleaved PN-sequences of the shrunken sequence correspond to shifted versions of the same PN-sequence. Now, we introduce a minor result that will be useful to obtain later results.
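The following Python sketch illustrates Theorem 3.1 on the shrunken sequence of Example 1, where $2^{L_1-1} = 2$. The bit string is the one obtained under the assumed convention that $p_2(x) = 1 + x + x^3$ corresponds to the recurrence $b_{i+3} = b_{i+1} + b_i$; the helper `is_cyclic_shift` is hypothetical.

```python
# One period (T = 14) of the shrunken sequence of Example 1 (assumed convention).
s = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]
step = 2  # 2^(L1 - 1) with L1 = 2


def is_cyclic_shift(u, v):
    """True when v equals u rotated by some number of positions."""
    return len(u) == len(v) and any(u[k:] + u[:k] == v for k in range(len(u)))


# Decimate by `step`, starting at each of the first `step` positions.
interleaved = [s[k::step] for k in range(step)]
for seq in interleaved:
    print("".join(map(str, seq)))
# 1110100
# 0100111
print(is_cyclic_shift(interleaved[0], interleaved[1]))  # True
```

Both decimated sequences have period $T_2 = 7$ and the second one is the first one shifted, as stated above.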
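Lemma 3.2. If $p(x) \in \mathbb{F}_2[x]$ is the primitive polynomial that generates the interleaved PN-sequences of the shrunken sequence, then the polynomial $p(x)^{2^{L_1-1}}$ generates the shrunken sequence.
Proof. Let $p(x) = \sum_{i=0}^{L_2} p_i x^i$ be the primitive polynomial that generates the interleaved PN-sequences of the shrunken sequence. These sequences have the following form
$\{s_k, s_{k+2^{L_1-1}}, s_{k+2 \cdot 2^{L_1-1}}, \ldots\}$, for $k = 0, 1, \ldots, 2^{L_1-1} - 1$.
These PN-sequences have to fulfill the linear recurrence
$\sum_{i=0}^{L_2} p_i s_{k+(n+i)2^{L_1-1}} = 0$, for $n \geq 0$.
Thus, we deduce that the polynomial $\sum_{i=0}^{L_2} p_i x^{i 2^{L_1-1}} = p(x)^{2^{L_1-1}}$ generates the shrunken sequence.
Notice that $p(x)^{2^{L_1-1}}$ always generates the shrunken sequence, but this polynomial might not be the characteristic polynomial. In some cases the characteristic polynomial has the form $p(x)^m$, with $2^{L_1-2} < m < 2^{L_1-1}$. Now, we focus on the form of the primitive polynomial $p(x)$.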
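For the shrunken sequence of Example 1, both claims can be checked numerically. The sketch below first verifies the recurrence induced by $p(x)^2 = (1 + x^2 + x^3)^2 = 1 + x^4 + x^6$ and then computes the linear complexity with a textbook Berlekamp-Massey routine over $\mathbb{F}_2$. The bit string assumes the recurrence convention $b_{i+3} = b_{i+1} + b_i$ for $p_2(x)$, and the function name is hypothetical.

```python
def berlekamp_massey(bits):
    """Linear complexity of a binary sequence (standard GF(2) Berlekamp-Massey)."""
    n = len(bits)
    c = [0] * n  # current connection polynomial
    b = [0] * n  # connection polynomial before the last length change
    c[0] = b[0] = 1
    L, m = 0, -1
    for i in range(n):
        d = bits[i]  # discrepancy between the predicted and the actual bit
        for j in range(1, L + 1):
            d ^= c[j] & bits[i - j]
        if d:
            t = c[:]
            shift = i - m
            for j in range(n - shift):
                c[j + shift] ^= b[j]
            if 2 * L <= i:
                L, m, b = i + 1 - L, i, t
    return L


s = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]  # one period, T = 14
T = len(s)

# p(x)^2 = 1 + x^4 + x^6 gives the recurrence s[j+6] = s[j+4] + s[j].
print(all(s[(j + 6) % T] == s[(j + 4) % T] ^ s[j] for j in range(T)))  # True
# Two periods are enough for Berlekamp-Massey to settle on the true value.
print(berlekamp_massey(s * 2))  # 6
```

Here $m = 2^{L_1-1} = 2$, so in this example the generating polynomial $p(x)^{2^{L_1-1}}$ coincides with the characteristic polynomial.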
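Theorem 3.3. The primitive polynomial $p(x)$ that generates the interleaved PN-sequences of the shrunken sequence can be computed as
$p(x) = (x + \alpha^{T_1})(x + \alpha^{2T_1})(x + \alpha^{2^2 T_1}) \cdots (x + \alpha^{2^{L_2-1}T_1})$,
where $\alpha \in \mathbb{F}_{2^{L_2}}$ is a root of $p_2(x)$.
Proof. We know that the characteristic polynomial $p_2(x)$ can be seen as
$p_2(x) = (x + \alpha)(x + \alpha^2)(x + \alpha^{2^2}) \cdots (x + \alpha^{2^{L_2-1}})$,
where $\alpha \in \mathbb{F}_{2^{L_2}}$ is a primitive element in this field as well as a root of $p_2(x)$. We also know that any bit of the PN-sequence $\{b_i\}$ can be written as
$b_i = A_0 \alpha^i + A_0^2 \alpha^{2i} + \cdots + A_0^{2^{L_2-1}} \alpha^{2^{L_2-1} i}$, with $A_0 \in \mathbb{F}_{2^{L_2}}$ [14].
When $A_0 = 1$, we say the PN-sequence is in its characteristic phase.
The first interleaved PN-sequence of the shrunken sequence is $\{b_0, b_{T_1}, b_{2T_1}, \ldots\}$. As we saw before, any bit of this new PN-sequence can be calculated as follows
$b_{iT_1} = \sum_{j=0}^{L_2-1} A_0^{2^j} \alpha^{2^j i T_1}$,
where $\alpha^{2^j}$ ($j = 0, 1, 2, \ldots, L_2 - 1$) are the roots of $p_2(x)$. If $u_i = b_{iT_1}$ and $\beta = \alpha^{T_1}$, then any bit in the PN-sequence $\{u_i\}$ can be obtained as follows
$u_i = \sum_{j=0}^{L_2-1} A_0^{2^j} \beta^{2^j i}$.
Therefore, the generator polynomial of the PN-sequence $\{u_i\}$ is $p(x) = (x + \beta)(x + \beta^2) \cdots (x + \beta^{2^{L_2-1}})$.
Note that $p(x)$ only depends on $p_2(x)$ and $T_1$, while the polynomial $p_1(x)$ affects $m$. In this way, given a fixed polynomial $p_2(x)$, every primitive polynomial with degree $L_1$ would provide the same $p(x)$.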
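As a numerical illustration, the polynomial whose roots are $\beta^{2^j}$, with $\beta = \alpha^{T_1}$, can be expanded with elementary arithmetic in $\mathbb{F}_{2^{L_2}}$. The Python sketch below is only an illustration under stated assumptions: field elements are stored as integers whose bit $k$ is the coefficient of $\alpha^k$, polynomials in $x$ are lists of field elements (lowest degree first), and all function names are hypothetical.

```python
def gf_mul(a, b, poly, deg):
    """Multiply two elements of GF(2^deg), reducing modulo the bitmask poly."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> deg) & 1:
            a ^= poly
    return r


def gf_pow(a, e, poly, deg):
    """Compute a^e in GF(2^deg) by repeated multiplication (fine for small e)."""
    r = 1
    for _ in range(e):
        r = gf_mul(r, a, poly, deg)
    return r


def interleaved_poly(p2, L2, T1):
    """Expand the product of (x + alpha^(T1 * 2^j)) for j = 0, ..., L2 - 1."""
    alpha = 0b10
    coeffs = [1]
    for j in range(L2):
        root = gf_pow(alpha, T1 * 2 ** j, p2, L2)
        new = [0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            new[k + 1] ^= c                       # c * x
            new[k] ^= gf_mul(c, root, p2, L2)     # c * root
        coeffs = new
    return coeffs


# Example 1: p2(x) = 1 + x + x^3 (bitmask 0b1011), L2 = 3, T1 = 2^2 - 1 = 3.
print(interleaved_poly(0b1011, 3, 3))  # [1, 0, 1, 1], i.e. 1 + x^2 + x^3
```

All coefficients land in $\{0, 1\}$, as expected for a polynomial over $\mathbb{F}_2$, and the result agrees with the polynomial $p(x) = 1 + x^2 + x^3$ of Example 1.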
is the reciprocal polynomial of p 2 (x).
Consider now a cellular automaton that only uses rule 102. If the sequence in the 0-th column is $\{w_i\}$, then the structure of the remaining sequences can be seen in Table 2. It is easy to check that the sequences in columns whose indices are $2^j$, for $j = 0, 1, 2, \ldots$, have the form $\{w_i + w_{i+2^j}\}$. If we consider rule 60, the same sequences appear but in reverse order.
In this section we study the relation between the different generated sequences in the CA. Let us start with a minor result about PN-sequences.
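Lemma 3.4. Let $\{w_i\}$ be a PN-sequence generated by a primitive polynomial $p(x) \in \mathbb{F}_2[x]$ of degree $L_2$. Then, there exists a number $D$ such that $w_i + w_{i+1} = w_{i+D}$, for every $i \geq 0$.
Proof. Any bit of the PN-sequence $\{w_i\}$ can be computed as
$w_i = \sum_{j=0}^{L_2-1} A_0^{2^j} \alpha^{2^j i}$,
where $A_0 \in \mathbb{F}_{2^{L_2}}$, and $\alpha \in \mathbb{F}_{2^{L_2}}$ is a root of $p(x)$. Then, we have that
$w_i + w_{i+1} = \sum_{j=0}^{L_2-1} A_0^{2^j} \alpha^{2^j i} (1 + \alpha^{2^j}) = \sum_{j=0}^{L_2-1} A_0^{2^j} \alpha^{2^j i} (1 + \alpha)^{2^j}$.
Since $\mathbb{F}_{2^{L_2}}$ is a field and $\alpha$ is a primitive element, the sum of two elements in the field must be another element in the same field, that is, $1 + \alpha = \alpha^D$, for some $D \in \{2, 3, 4, \ldots, 2^{L_2} - 2\}$. Therefore,
$w_i + w_{i+1} = \sum_{j=0}^{L_2-1} A_0^{2^j} \alpha^{2^j (i+D)} = w_{i+D}$,
and the lemma is proven.
Below, we recall the notion of Zech logarithm in a finite field.
Definition 3.5. Let $\alpha \in \mathbb{F}_{2^{L_2}}$ be a primitive element. The Zech logarithm with base $\alpha$ is the map $Z_\alpha : \mathbb{Z}_{2^{L_2}-2} \to \mathbb{Z}^*_{2^{L_2}-2} \cup \{\infty\}$, such that each element $t \in \mathbb{Z}_{2^{L_2}-2}$ corresponds to $Z_\alpha(t)$, satisfying $1 + \alpha^t = \alpha^{Z_\alpha(t)}$.
Now, we can introduce the following result.
Theorem 3.6. Let $\{w_i\}$ be a PN-sequence generated by a primitive polynomial $p(x) \in \mathbb{F}_2[x]$ with root $\alpha$. Then, the number $D$ such that $w_i + w_{i+1} = w_{i+D}$ is unique and equals $Z_\alpha(1)$.
Proof. According to Lemma 3.4 and Definition 3.5, we know that $D = Z_\alpha(1)$ and this number is unique by the properties of Galois fields.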
From now on, we say that D is the number associated to the polynomial p(x) for the translation.
In Table 2 we saw that if we put a PN-sequence $\{w_i\}$ in the 0-th column, the next sequence corresponds to $\{w_i + w_{i+1}\}$. We deduce that the remaining sequences are obtained shifting this PN-sequence $D, 2D, 3D, \ldots$ positions (mod $2^{L_2} - 1$), respectively.
Example 2. For the polynomial $p(x) = 1 + x^2 + x^3$, we have that $\alpha^5 = 1 + \alpha$. Therefore, $D = Z_\alpha(1) = 5$. If we consider the initial state 100, the PN-sequence has the form 1001110. If this PN-sequence is located in the 0-th column of the CA (see Table 3), then the sequence in the first column is the same PN-sequence, but it starts in position 5. The next PN-sequence (in the second column) starts in position $2D \bmod 7 = 3$, and so on (see Table 3).
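The number $D$ of Example 2 can be obtained by listing the powers of $\alpha$ until $1 + \alpha$ is reached. The following Python sketch assumes that field elements are stored as integers whose bit $k$ is the coefficient of $\alpha^k$, and that the polynomial is given as a bitmask with the same layout; `associated_number` is a hypothetical name.

```python
def associated_number(poly, deg):
    """Return D with alpha^D = 1 + alpha, where alpha is a root of poly.

    poly is a bitmask (bit k = coefficient of x^k) of a primitive
    polynomial of degree deg over GF(2).
    """
    target = 0b11   # the element 1 + alpha
    elem = 1        # alpha^0
    for d in range(2 ** deg - 1):
        if elem == target:
            return d
        elem <<= 1                  # multiply by alpha
        if (elem >> deg) & 1:
            elem ^= poly            # reduce modulo poly
    raise ValueError("1 + alpha was not reached; is poly primitive?")


D = associated_number(0b1101, 3)    # p(x) = 1 + x^2 + x^3
w = [1, 0, 0, 1, 1, 1, 0]           # PN-sequence of Example 2
n = len(w)
column1 = [w[i] ^ w[(i + 1) % n] for i in range(n)]
column2 = [w[i] ^ w[(i + 2) % n] for i in range(n)]
print(D)                                                    # 5
print(column1 == [w[(i + D) % n] for i in range(n)])        # True
print(column2 == [w[(i + 2 * D) % n] for i in range(n)])    # True
```

The two comparisons confirm that the first and second columns are the PN-sequence starting in positions 5 and $10 \bmod 7 = 3$, respectively.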
The following result provides the associated number D * for reciprocal polynomials.
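Let $p(x) = p_0 + p_1 x + \cdots + p_{L_2} x^{L_2}$ and $p^*(x)$ be two reciprocal primitive polynomials with degree $L_2$ and let $D$, $D^*$ be the associated numbers for the translation, respectively. Then, we have that $D + D^* = 2^{L_2}$.
Proof. The reciprocal polynomial has the form $p^*(x) = p_{L_2} + p_{L_2-1} x + p_{L_2-2} x^2 + \cdots + p_1 x^{L_2-1} + p_0 x^{L_2}$. Let $\{w_i\}$ be the PN-sequence generated by $p(x)$. We consider the initial state
$\{u_0, u_1, \ldots, u_{L_2-1}\} = \{w_{2^{L_2}-2}, w_{2^{L_2}-3}, \ldots, w_{2^{L_2}-L_2-1}\}$.
The next bit of the PN-sequence $\{u_i\}$ can be generated using $p^*(x)$ as follows
$u_{L_2} = p_{L_2} u_0 + p_{L_2-1} u_1 + \cdots + p_1 u_{L_2-1} = p_{L_2} w_{2^{L_2}-2} + p_{L_2-1} w_{2^{L_2}-3} + \cdots + p_1 w_{2^{L_2}-L_2-1}$. (1)
On the other hand, we know that
$w_{2^{L_2}-L_2-2} = p_1 w_{2^{L_2}-L_2-1} + p_2 w_{2^{L_2}-L_2} + \cdots + p_{L_2} w_{2^{L_2}-2}$. (2)
From expressions (1) and (2), we deduce that $u_{L_2} = w_{2^{L_2}-L_2-2}$. Following the same procedure, we obtain that the first $2^{L_2} - 1$ bits of the PN-sequence $\{u_i\}$ generated by $p^*(x)$ are $\{w_{2^{L_2}-2}, w_{2^{L_2}-3}, \ldots, w_1, w_0\}$, that is, $u_i = w_{2^{L_2}-2-i}$. According to Theorem 3.6 there exists a number $D$ such that $w_i + w_{i+1} = w_{i+D}$. Hence, with indices taken modulo $2^{L_2} - 1$, $u_i + u_{i+1} = w_{2^{L_2}-2-i} + w_{2^{L_2}-3-i} = w_{2^{L_2}-3-i+D} = u_{i+1-D}$, so $D^* \equiv 1 - D \pmod{2^{L_2} - 1}$, that is, $D + D^* = 2^{L_2}$.
The next lemma gives the value of the number associated to $p(x)^2$.
Lemma 3.7. According to Theorem 3.6, there exists a number $D$ such that $w_i + w_{i+1} = w_{i+D}$, for the PN-sequence $\{w_i\}$ generated by a primitive polynomial $p(x) \in \mathbb{F}_2[x]$. Then, given the polynomial $p(x)^2$, there exists an associated number $D_2 = 2D$ such that $t_i + t_{i+2} = t_{i+D_2}$, for every sequence $\{t_i\}$ generated by $p(x)^2$.
Proof. Let $p(x) = p_0 + p_1 x + \cdots + p_{L_2-1} x^{L_2-1} + x^{L_2}$. Then $p(x)^2 = p_0 + p_1 x^2 + p_2 x^4 + \cdots + p_{L_2-1} x^{2L_2-2} + x^{2L_2}$ and the bits of the sequence $\{t_i\}$ generated by this polynomial can be obtained as
$t_{i+2L_2} = p_0 t_i + p_1 t_{i+2} + \cdots + p_{L_2-1} t_{i+2L_2-2}$.
If we take all elements with even index, $\{t_0, t_2, \ldots\}$, and substitute $t_{2i} = u_i$, we have the sequence $\{u_0, u_1, u_2, \ldots\}$ with $u_{i+L_2} = p_0 u_i + p_1 u_{i+1} + \cdots + p_{L_2-1} u_{i+L_2-1}$, which is a PN-sequence generated by $p(x)$.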
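For the reciprocal pair $1 + x^2 + x^3$ and $1 + x + x^3$, the associated numbers can also be found directly from the sequences, without field arithmetic, by searching for the shift that matches $\{w_i + w_{i+1}\}$. The Python sketch below uses hypothetical helper names and assumes the convention that $\sum_k p_k x^k$ corresponds to the recurrence $\sum_k p_k w_{i+k} = 0$.

```python
def pn_sequence(taps, state, period):
    """One period of the sequence x[i+L] = XOR of x[i+k] for k in taps."""
    bits = list(state)
    L = len(state)
    for i in range(period - L):
        bits.append(sum(bits[i + k] for k in taps) % 2)
    return bits


def shift_number(w):
    """Smallest d with w[i] + w[i+1] = w[i+d] for all i (indices mod len(w))."""
    n = len(w)
    col = [w[i] ^ w[(i + 1) % n] for i in range(n)]
    return next(
        d for d in range(n)
        if all(col[i] == w[(i + d) % n] for i in range(n))
    )


w = pn_sequence([0, 2], [1, 0, 0], 7)   # p(x)  = 1 + x^2 + x^3 -> 1001110
u = pn_sequence([0, 1], [1, 0, 0], 7)   # p*(x) = 1 + x + x^3   -> 1001011
D, D_star = shift_number(w), shift_number(u)
print(D, D_star, D + D_star)            # 5 3 8
```

In this instance the two numbers add up to $8 = 2^{L_2}$. The brute-force search is only practical for short registers; for larger $L_2$ a discrete logarithm computation would be needed.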
On the other hand, if we take elements with odd index, $\{t_1, t_3, \ldots\}$, and substitute $t_{2i+1} = v_i$, we obtain the PN-sequence $\{v_0, v_1, v_2, \ldots\}$, which satisfies the same recurrence. The polynomial $p(x)$ generates both PN-sequences and, as a consequence, we have that $u_i + u_{i+1} = u_{i+D}$ and $v_i + v_{i+1} = v_{i+D}$. Then, the sequence $\{t_i\}$ generated by $p(x)^2$ satisfies
$t_{2i} + t_{2i+2} = u_i + u_{i+1} = u_{i+D} = t_{2i+2D}$,
$t_{2i+1} + t_{2i+3} = v_i + v_{i+1} = v_{i+D} = t_{2i+1+2D}$.
In this case the number $D_2$ associated to $p(x)^2$ is $D_2 = 2D$ and it is easy to check that this sequence is repeated every two columns along the whole CA.
It is possible to extend this proof for higher powers of 2.
Theorem 3.8. According to Theorem 3.6, there exists a number $D$ such that $w_i + w_{i+1} = w_{i+D}$, for the PN-sequence $\{w_i\}$ generated by a primitive polynomial $p(x) \in \mathbb{F}_2[x]$. For the polynomial $p(x)^{2^t}$, there exists a number $D_{2^t} = 2^t D$, such that $t_i + t_{i+2^t} = t_{i+D_{2^t}}$, for every sequence $\{t_i\}$ generated by this polynomial.
Proof. In this case, the same procedure considered in Lemma 3.7 is followed to prove this result.
The characteristic polynomial of the shrunken sequence has the form $p(x)^m$, with $2^{L_1-2} < m \leq 2^{L_1-1}$ and $p(x)$ as given in Theorem 3.3. This means that the polynomial $p(x)^{2^{L_1-1}}$ can also generate the shrunken sequence (see Lemma 3.2). If we locate the shrunken sequence $\{s_j\}$ in the 0-th column of the CA, the $2^{L_1-1}$-th column will have the form $\{s_j + s_{j+2^{L_1-1}}\}$. According to previous results, this sequence is the shrunken sequence starting in position $D_{2^{L_1-1}}$. Therefore, we have that the shrunken sequence appears in the columns whose indices are multiples of $2^{L_1-1}$.
Let us prove now that the characteristic polynomial of the remaining sequences in the CA is $p(x)^m$ as well.
Theorem 3.9. The characteristic polynomial of the sequences generated by the CA is that of the shrunken sequence, that is, $p(x)^m$, where $p(x)$ has the form given in Theorem 3.3.
Proof. Let $p(x)^m = q_0 + q_1 x + q_2 x^2 + \cdots + q_{L-1} x^{L-1} + x^L$, with $L = m L_2$, be the characteristic polynomial of the shrunken sequence
$\{s_0, s_1, s_2, \ldots, s_{L-2}, s_{L-1}, s_L = \sum_{i=0}^{L-1} q_i s_i, \ldots\}$.
Assume that $u_j = s_j + s_{j+1}$; then the sequence $\{s_j + s_{j+1}\}$ in the next column can be seen as
$\{u_0, u_1, u_2, \ldots, u_{L-2}, u_{L-1}, u_L = \sum_{i=0}^{L-1} q_i u_i, \ldots\}$.
Consequently, the polynomial $p(x)^m$ can also generate this sequence. However, this sequence could also be generated by another polynomial with lower degree (a divisor of $p(x)^m$). Consider the first $2^{L_1-1} + 1$ sequences in the CA, $\mathbf{s}_0, \mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_{2^{L_1-1}}$. We assume $\mathbf{s}_0$ is the shrunken sequence. As we saw above, $p(x)^m$ generates $\mathbf{s}_1$ as well, but suppose that $p(x)^k$, with $k < m$, generates $\mathbf{s}_1$. By the same argument used above, $p(x)^k$ generates $\mathbf{s}_2$ as well, and following the same idea $p(x)^k$ generates $\mathbf{s}_{2^{L_1-1}}$. However, we know that $\mathbf{s}_{2^{L_1-1}}$ is the same sequence as $\mathbf{s}_0$, the shrunken sequence, but starting in position $D_{2^{L_1-1}} = 2^{L_1-1} D$, where $D$ is the number associated to $p(x)$, so its characteristic polynomial is $p(x)^m$. This is a contradiction, so every sequence in the CA has characteristic polynomial $p(x)^m$.
Theorem 3.10. The length of the CA that generates the shrunken sequence in the 0-th column is where T is the period of the shrunken sequence and D has the form given in Theorem 3.6.
Consider the shrunken sequence obtained in Example 1. The characteristic polynomial of this shrunken sequence is $p(x)^2 = (1 + x^2 + x^3)^2$. The number associated to $p(x)^2$ is $D_2 = 2 \cdot D = 2 \cdot 5 = 10$. In Table 4 there is an example of a cyclic regular linear CA that generates this sequence. This CA has length 14 and it generates two different sequences, the shrunken sequence and another sequence, both with the same period 14. Each of these two sequences appears 7 times, with different shifts. For example, the shrunken sequence appears in columns 0, 2, 4, 6, 8, 10, 12. In column 2 the shrunken sequence appears but starting in position 10, in column 4 in position $20 \bmod 14 = 6$, and so on. The other sequence appears seven times as well, in columns 1, 3, 5, 7, 9, 11, 13, with the same shifts.
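This example can be simulated with a short Python sketch. It builds an initial state for a cyclic rule-102 CA of length 14 from the first bit of each column, runs the CA, and reads the shift of every even column with respect to the shrunken sequence. The way the initial state is derived and the helper `shift_of` are assumptions of this sketch, and the bit string is the shrunken sequence of Example 1 under the recurrence convention $b_{i+3} = b_{i+1} + b_i$.

```python
s = [1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1]  # shrunken sequence, T = 14
T = len(s)

# Column k+1 is column k added to its own one-position shift (rule 102).
cols = [s]
for _ in range(T - 1):
    prev = cols[-1]
    cols.append([prev[j] ^ prev[(j + 1) % T] for j in range(T)])

# Initial state of the CA: first bit of every column.
row = [c[0] for c in cols]
rows = [row]
for _ in range(T - 1):
    row = [row[k] ^ row[(k + 1) % T] for k in range(T)]  # cyclic rule 102
    rows.append(row)


def shift_of(seq, ref):
    """Position d such that seq is ref starting in position d."""
    n = len(ref)
    return next(
        d for d in range(n)
        if all(seq[i] == ref[(i + d) % n] for i in range(n))
    )


column = lambda k: [r[k] for r in rows]
print(column(0) == s)                                     # True
print([shift_of(column(k), s) for k in range(0, T, 2)])   # [0, 10, 6, 2, 12, 8, 4]
```

The shifts of the even columns are the multiples of $D_2 = 10$ reduced modulo 14, which matches the positions quoted in the example.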

3.3. Applications. The shrinking generator and other generators based on irregular decimation [13,17] were created to increase the linear complexity of the PN-sequences. However, in Section 3.1 we have seen that the shrunken sequence generated by two maximal-length LFSRs of lengths $L_1$ and $L_2$, respectively, is composed of $2^{L_1-1}$ interleaved PN-sequences (which are the same PN-sequence but shifted). Furthermore, in Section 3.2, we have seen that the shrunken sequence can always be computed as the output sequence of a linear cellular automaton. We can take advantage of such linear structures and their properties to design a cryptanalysis against the shrinking generator. According to the previous ideas, in [1,2] the authors proposed an attack based on an exhaustive search over the initial state of the first register. Thus, the complexity of the brute-force attack is reduced by a factor $2^{L_2}$. We see that this generator can be easily broken, which compromises the security of the information encrypted by this structure.

3.4. Comparison with 90/150 CA. In [9], the authors proposed a family of linear, null, hybrid CA that also generated the shrunken sequence. These CA were a combination of both rules 90 and 150. Given two maximal-length LFSRs, it is necessary to perform the algorithm given in [4] and carry out a concatenation procedure to find the CA that generates the shrunken sequence. This fact makes it impossible to predict the form of the CA in advance. In this work, since the CA are regular and we only use rule 102 (or 60), we do not need to look for the CA form. On the other hand, the length of our CA is, at most, $T = 2^{L_1-1}(2^{L_2} - 1)$, which is greater than $(2^{L_1} - 1)L_2$ (the length of the CA given in [9]). However, our CA generate only $2^{L_1-1}$ different sequences; the remaining sequences in the CA are shifted copies of these, which is an advantage compared to the 90/150 CA. In conclusion, our CA are longer, but this disadvantage becomes less relevant when we consider the complex process developed in [9] to obtain the CA. Besides, the effective length is reduced to $2^{L_1-1}$, since only this number of distinct sequences is repeated along the CA.

Conclusions
The introduction of decimation to break the linearity of the PN-sequences has proved ineffective, since the shrunken sequence can be modelled as the output sequence of a linear model based on CA. In this work, we analyse a family of one-dimensional, linear, regular and cyclic CA based on rule 102 that describes the behaviour of the shrinking generator, which was designed as a non-linear generator. These CA generate a family of sequences with the same characteristic polynomial and period as those of the shrunken sequence. Besides, the shrunken sequence is composed of PN-sequences whose characteristic polynomial serves as a basis for the characteristic polynomial of the shrunken sequence. Therefore, this sequence is vulnerable to a cryptanalysis that takes advantage of the linearity and similarity between the sequences generated by the CA.