ANOTHER LOOK AT SUCCESS PROBABILITY OF LINEAR CRYPTANALYSIS

Abstract. This work studies the success probability of key recovery attacks based on a single linear approximation. Previous works analysed the success probability under different hypotheses on the distributions of correlations for the right and wrong key choices. This work puts forward a unifying framework of general key randomisation hypotheses. All previously used key randomisation hypotheses, as well as zero correlation attacks, arise as special cases of the general framework. Expressions for the success probability are derived under both the settings of the plaintexts being sampled with and without replacement. Compared to previous analyses, we uncover several new cases which have not been considered in the literature. For most of the cases which have been considered earlier, we provide complete expressions for the respective success probabilities. Finally, the full picture of the dependence of the success probability on the data complexity is revealed. Compared to the extant literature, our work provides a deeper and more thorough understanding of the success probability of single linear cryptanalysis.


Introduction
Linear cryptanalysis [26] is a fundamental method of attacking a block cipher. To apply linear cryptanalysis, it is required to first obtain an approximate linear relation between the input and the output of a block cipher. Obtaining such a relation for a well designed cipher is a non-trivial task and requires a great deal of ingenuity along with a very careful examination of the internal structure of the mapping which defines the target block cipher. The present work does not address this aspect of linear cryptanalysis and it will be assumed that a linear relation is available.
The goal of (linear) cryptanalysis of a block cipher is to recover a portion of the secret key in time less than that required by a brute force algorithm to try out all possible keys. The portion of the key which is proposed to be recovered is called the target sub-key. An attack with such a goal is called a key recovery attack. A weaker goal is to be able to distinguish the output of the block cipher from that of a uniform random permutation and such attacks are called distinguishing attacks. In this work, we will concentrate only on key recovery attacks.
To apply linear cryptanalysis, it is required to obtain some data corresponding to the secret key. Such data consists of plaintext-ciphertext pairs (P_i, C_i), i = 1, ..., N, where C_i is obtained by encrypting P_i using the secret key. The plaintexts are chosen randomly. Typically, they are considered to be chosen under uniform random sampling with or without replacement.
There are well known attacks which essentially use statistical methods to determine the secret key from the available plaintext-ciphertext pairs. The output of the attack is a set of candidate values for the target sub-key. The attack is successful with some probability P_S if the correct value of the target sub-key is in the set of candidate values. The size of the set of candidate values is also an important parameter. An attack is said to have an a-bit advantage if the size of the set of candidate values is a fraction 2^(-a) of the number of possible values of the target sub-key [33].
The goal of a statistical analysis of an attack is to obtain a relation between the three fundamental parameters N, P_S and a. In this work, we concentrate on obtaining P_S as a function of N and a and closely examine the behaviour of P_S as a function of N.
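For concreteness, the well-known expression from Selçuk [33] for the classical setting, P_S = Φ(2√N·|ε| − Φ^(-1)(1 − 2^(-a-1))), where ε is the bias of the linear approximation (introduced below), can be evaluated numerically. The sketch below uses only the Python standard library; the bias and advantage values are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def selcuk_success_probability(N, eps, a):
    """P_S = Phi(2*sqrt(N)*|eps| - Phi^{-1}(1 - 2^{-a-1})), the classical
    expression from Selcuk [33], shown here purely for illustration."""
    nd = NormalDist()
    return nd.cdf(2.0 * sqrt(N) * abs(eps) - nd.inv_cdf(1.0 - 2.0 ** (-a - 1)))

# Hypothetical parameters: a bias of the order of Matsui's DES approximation.
eps, a = 1.19 * 2.0 ** -21, 8
for N in (2 ** 41, 2 ** 42, 2 ** 43):
    print(N, round(selcuk_success_probability(N, eps, a), 4))
```

Under this expression P_S increases monotonically with N; a central theme of this work is determining when such monotonicity can fail under other hypotheses.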
Broadly speaking, a key recovery attack proceeds by testing each value of the target sub-key against the linear approximation with respect to the available data. For the correct choice κ* of the target sub-key, the linear approximation holds with some probability p_{κ*}, while for an incorrect choice κ ≠ κ* of the target sub-key, the linear approximation holds with some other probability p_{κ,κ*}. The basis of the attack is a difference between p_{κ*} and p_{κ,κ*}. The detailed examination of the internal structure of the block cipher leads to an estimate of p_{κ*}, while p_{κ,κ*} is obtained from an analysis of the behaviour of a uniform random permutation.
To perform a statistical analysis, it is required to hypothesise the values of p_{κ*} and p_{κ,κ*}. The hypothesis on p_{κ*} is called the right key randomisation hypothesis, while the hypothesis on p_{κ,κ*} is called the wrong key randomisation hypothesis. Until a few years ago, it was typical to hypothesise that p_{κ*} is a constant p ≠ 1/2 while p_{κ,κ*} = 1/2.
The adjusted wrong key randomisation hypothesis was introduced by Bogdanov and Tischhauser in [12]. Based on a previous work by Daemen and Rijmen [15], it was hypothesised that p_{κ,κ*} is itself a random variable following the normal distribution N(1/2, 2^(-n-2)). A later work by Ashur, Beyne and Rijmen [1] also used the adjusted wrong key randomisation hypothesis. The difference between [12] and [1] is in the manner in which the plaintexts P_1, ..., P_N were assumed to be chosen: sampling with replacement was considered in [12], while sampling without replacement was considered in [1]. Both the works [12,1] observed a non-monotonic dependence of the success probability on N and provided possible explanations for this phenomenon. The statistical methodology used in [12,1] is based on an earlier work by Selçuk [33] using order statistics.
Blondeau and Nyberg [8] considered the adjusted right key randomisation hypothesis where p_{κ*} was assumed to follow N(p, (ELP − 4ε^2)/4), where ELP stands for the expected linear probability (or potential) of the underlying block cipher and ε = p − 1/2. In the formulation in [8], it was assumed that p ≠ 1/2, while a later work [7] by the same authors considered the case p = 1/2. For the case p ≠ 1/2, [8] considers the plaintexts to be sampled with replacement, while for the case p = 1/2, [7] considers both sampling with and without replacement. In both [8] and [7], the adjusted right key randomisation hypothesis was considered in conjunction with the adjusted wrong key randomisation hypothesis. The statistical methodology used in both of these papers is based on the hypothesis testing based approach.
Our contributions. We perform a complete and generalised analysis of success probability in linear cryptanalysis using a single linear approximation. More specific details of our contributions are given below.
General key randomisation hypotheses: Following the formalisation of the adjusted wrong and right key randomisation hypotheses, we introduce the general key randomisation hypotheses. The general right key randomisation hypothesis models p_{κ*} as a random variable following N(p, s_0^2) and the general wrong key randomisation hypothesis models p_{κ,κ*} as a random variable following N(1/2, s_1^2). The standard (resp. adjusted) right key randomisation hypothesis is obtained by letting s_0 ↓ 0 (resp. setting s_0^2 = (ELP − 4ε^2)/4), while the standard (resp. adjusted) wrong key randomisation hypothesis is obtained by letting s_1 ↓ 0 (resp. setting s_1^2 = 2^(-n-2)). A significant portion of the analysis is done using the general key randomisation hypotheses and the results obtained are then made specific by setting appropriate values of s_0 and s_1.

Approximate heuristic distributions of the test statistic: For a statistical analysis to be possible, the distributions of the test statistic under both the right and the wrong key assumptions are required. These distributions are obtained as compound distributions. There is, however, a fundamental difficulty. Following previous works [12,1,8,7], the quantities p_{κ*} and p_{κ,κ*} are modelled using normal distributions. As a result, it is possible for these quantities to take values outside the range [0, 1]. Since p_{κ*} and p_{κ,κ*} are probabilities, this is meaningless. So, the compound distributions of the test statistic cannot be rigorously obtained. Instead, we provide heuristic derivations of approximations of these distributions under certain assumptions. These derivations cannot be made formal unless the assumption of normality on p_{κ*} and p_{κ,κ*} is dropped. We note that none of the previous works [12,1,7,8] discusses or even identifies this issue. In obtaining the distributions of the test statistic we separately consider the cases where the plaintexts are sampled with and without replacement.
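The out-of-range issue just described can be quantified directly: for a normal model of p_{κ*} with variance s_0^2, the probability of leaving [0, 1] is bounded by Chebyshev's inequality, and for the normal itself it is far smaller still, though always positive. A small sketch (the block size n below is unrealistically small, chosen purely so that the tail probability is visible):

```python
from statistics import NormalDist

def prob_outside_unit_interval(mean, var):
    """Probability that a N(mean, var) random variable falls outside [0, 1]."""
    nd = NormalDist(mu=mean, sigma=var ** 0.5)
    return nd.cdf(0.0) + (1.0 - nd.cdf(1.0))

# Toy block size n = 4 (hypothetical, only to make the numbers visible).
n = 4
s0_sq = 2.0 ** -n                    # the largest variance the hypothesis allows
chebyshev_bound = s0_sq / 0.25       # Pr[|p - 1/2| > 1/2] <= var / (1/2)^2
normal_tail = prob_outside_unit_interval(0.5, s0_sq)
print(chebyshev_bound, normal_tail)  # the normal tail is positive but far smaller
```

The positive value of `normal_tail` is precisely the reason the compound-distribution derivations can only be heuristic: the normal model assigns non-zero mass to meaningless probability values.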
Analysis of the case p ≠ 1/2: This is the classical scenario for block ciphers and, starting from the seminal work of Matsui [26], most papers on linear cryptanalysis of block ciphers have addressed this scenario. For this case, a previous work by Selçuk [33] provided an expression for the success probability. This expression, however, is incomplete, as we substantiate later. The subsequent works [12,1] follow Selçuk's approach and hence also obtain incomplete expressions for the success probability. In contrast, the present work provides the complete expression for the success probability. We refer to Section 5 for details. The expression for the success probability can be derived in two different ways. The first method is based on an order statistics approach while the second method uses statistical hypothesis testing. We derive expressions for the success probability using both the order statistics and the hypothesis testing methods. The expressions for the success probability obtained using the two different approaches are different. They turn out to be equal if certain assumptions and approximations used by Selçuk in [33] are applied to the expression obtained from the order statistics based approach. Some theoretical limitations of the order statistics approach were pointed out in [29]. In the present work, we identify two additional implicit independence assumptions that need to be made to apply this approach. In contrast, the hypothesis testing based analysis does not suffer from these theoretical limitations, nor does it require any such assumptions or approximations. So, from a theoretical point of view, the hypothesis testing based approach is more satisfying. Consequently, we take the expression obtained from the hypothesis testing based approach to be the relevant expression for the success probability. To the best of our knowledge, the expression for the success probability that we obtain does not appear earlier in the literature.
It has been mentioned in [12,1] that in certain cases, the success probability does not increase monotonically with the number of plaintexts. In this work, we perform a thorough analysis of the dependence of the success probability on N. This covers standard/adjusted right/wrong key randomisation hypotheses as well as sampling with/without replacement. Our analysis shows that in most cases the success probability increases monotonically with N. There are indeed a few cases where this does not hold. For such cases, either |ε| < 2^(-n/2-1) · max(1, γ) or ELP − 4ε^2 is correspondingly small, where γ is a quantity determined through the standard normal distribution function Φ by the advantage a and the size m of the target sub-key, and n is the block size. In other words, non-monotonicity of the success probability in N is observed only in certain cases where either ε is very small or ELP − 4ε^2 is very small. The previous analyses [12,1] of the dependence of the success probability on N were done only for the standard right key and adjusted wrong key randomisation hypotheses. Even for this case, the analysis in [12,1] did not reveal the complete picture that this work presents.

Analysis of the case p = 1/2: For p = 1/2 (equivalently ε = 0) and s_0 ↓ 0, p_{κ*} takes the constant value 1/2. This corresponds to the zero correlation attack introduced in [11]. The case of p = 1/2 and s_0^2 = ELP/4 was considered in [7]. In this case, the means of both p_{κ*} and p_{κ,κ*} are 1/2 and a hypothesis test for the means cannot be done. So, [7] sets up a test of hypothesis for the variances of the two random variables. As mentioned above, the work [7] only considers the case of adjusted right and adjusted wrong key randomisation hypotheses. Based on our formulation of the general key randomisation hypotheses, we also set up a test of hypothesis for the variance, leading to a general expression for the success probability. This expression is then instantiated to specific combinations of standard/adjusted right and wrong key randomisation hypotheses.
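The non-monotonic behaviour for very small ε (the p ≠ 1/2 setting above) can be reproduced numerically. The expression below is an order-statistics-style formula for the standard right and adjusted wrong key randomisation hypotheses with sampling with replacement, in which the wrong-key variance 1/(4N) is inflated by the additive term 2^(-n-2); this specific form is our own sketch of the effect reported in [12], not a quotation of the expressions derived later in this paper.

```python
from math import sqrt
from statistics import NormalDist

def ps_std_right_adj_wrong(N, eps, a, n):
    """Sketch: P_S = Phi(2*sqrt(N)*|eps| - gamma*sqrt(1 + N*2^{-n})) with
    gamma = Phi^{-1}(1 - 2^{-a-1}).  The factor sqrt(1 + N*2^{-n}) reflects
    the inflated wrong-key variance under the adjusted hypothesis
    (an assumption made for this illustration)."""
    nd = NormalDist()
    gamma = nd.inv_cdf(1.0 - 2.0 ** (-a - 1))
    return nd.cdf(2.0 * sqrt(N) * abs(eps) - gamma * sqrt(1.0 + N * 2.0 ** -n))

# A bias below the 2^{-n/2-1} threshold: the curve rises and then falls with N.
n, a, eps = 64, 8, 2.0 ** -33
for e in (50, 60, 64):
    print(e, ps_std_right_adj_wrong(2.0 ** e, eps, a, n))
```

Note that whenever the curve is non-monotone in this sketch, the success probability itself stays small, matching the observation that non-monotonicity arises only for very small ε.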
In the case of adjusted right and adjusted wrong key randomisation hypotheses, [7] provides an informal argument that the success probability increases monotonically with the number of plaintexts. In this work, we provide a formal proof that for p = 1/2, in all cases (i.e., standard/adjusted right/wrong key randomisation hypotheses as well as sampling with/without replacement) the success probability increases monotonically with N.

A summary of the results: Table 1 provides a summary of the results for various combinations of standard/adjusted right/wrong key randomisation hypotheses and whether the plaintexts are sampled with or without replacement. For each such combination, we indicate whether the case has been previously studied and mention the place in this work where the new expression for the success probability for that case can be obtained.
For p ≠ 1/2, there are a total of eight cases, out of which four have been previously tackled. To the best of our knowledge, for the other four cases, the expressions for the success probabilities that we provide have not appeared previously. For the four cases where expressions for the success probabilities were previously known, we provide the complete expressions for the success probabilities.
For p = 1/2, there are also a total of eight cases. Out of these, the settings of standard right and adjusted wrong key randomisation hypotheses correspond to the zero correlation attack. Expressions for the success probability of key recovery zero correlation attacks are not given in [11] or in the follow-up work [13]. As indicated in Table 1, expressions for the success probability have previously appeared in only two of the eight cases arising for p = 1/2. To the best of our knowledge, for the other six cases, the expressions for the success probabilities that we provide have not appeared earlier. Out of the two cases that were known, in one case, the expression for the success probability that we obtain is the same as that obtained earlier; for the other case, we obtain a more accurate expression for the success probability, as is explained later.

In Table 1, "type" denotes whether p = 1/2 or not; wr (resp. wor) denotes sampling with (resp. without) replacement; RKRH (resp. WKRH) is an abbreviation for right (resp. wrong) key randomisation hypothesis; std (resp. adj) denotes whether the standard (resp. adjusted) key randomisation hypothesis is considered.
Previous and related work. Linear cryptanalysis was first proposed by Matsui in [26]. This paper took p_{κ*} to be a constant different from 1/2. Until recently, almost all papers on linear cryptanalysis also considered this setting. Junod [22] gave a detailed analysis of Matsui's ranking method [26,27]. This work introduced the notion of order statistics in linear cryptanalysis. The idea was further developed by Selçuk in [33], where he used a well known asymptotic result from the theory of order statistics to arrive at an expression for the success probability. Building on a work by Daemen and Rijmen [15], a paper by Bogdanov and Tischhauser [12] introduced the adjusted wrong key randomisation hypothesis where p_{κ,κ*} is assumed to follow a normal distribution with mean 1/2. The work [12] considered the plaintexts to be sampled with replacement. A later work by Ashur, Beyne and Rijmen [1] analysed the success probability under the adjusted wrong key randomisation hypothesis in the setting where the plaintexts are sampled without replacement. Blondeau and Nyberg [8] considered the setting of adjusted right and wrong key randomisation hypotheses where plaintexts are sampled with replacement.
Zero correlation attack was introduced by Bogdanov and Rijmen in [11]. In the setting of zero correlation attack p κ * is assumed to be equal to 1/2. The work [11] considered a single zero correlation linear approximation. Both distinguishers and key recovery attacks were proposed in [11]. The distinguisher is general and works for all block ciphers whereas the key recovery attacks were for specific ciphers. Reduction in data complexity of zero correlation attacks using several linear approximations was given by Bogdanov and Wang [13]. This work also described a general distinguishing algorithm. Blondeau and Nyberg [7] considered the case where p κ * and p κ,κ * both follow normal distributions with the mean of both distributions equal to 1/2. They analysed both the settings of sampling of plaintexts with and without replacements.
Analyses of attacks using multiple linear approximations have been reported in the literature [27,21,4,24,2,23,3,17,28,19,8,20,29,30,31,32]. There have also been several subsequent works [10,6,35] on multiple and multidimensional zero correlation attacks. Since this paper is concerned only with the basic setting of a single linear approximation, we do not discuss the various aspects which arise in the context of multiple linear approximations.

Linear cryptanalysis: Background and statistical model
Let E_K denote a block cipher, i.e., for each secret key K, the map E_K is a bijection from the set {0, 1}^n to itself. The n-bit input to the block cipher is called the plaintext and the n-bit output of the block cipher is called the ciphertext.
Block ciphers are generally constructed by composing round functions where each round function is parametrised by a round key. The round functions are also bijections of {0, 1}^n to itself. The round keys are produced by applying an expansion function, called the key scheduling algorithm, to the secret key K. Denote the round keys by k^(0), k^(1), ... and the round functions by R^(0), R^(1), .... Let K^(i) denote the concatenation of the first i round keys, i.e., K^(i) = k^(0) || ··· || k^(i-1), and let E^(i)_{K^(i)} denote the composition of the first i round functions, i.e., E^(i)_{K^(i)} = R^(i-1)_{k^(i-1)} ∘ ··· ∘ R^(0)_{k^(0)}. A block cipher may have many rounds and, for the purposes of estimating the strength of a block cipher, a cryptanalytic attempt may target only some of these rounds. Such an attack is called a reduced round cryptanalysis. Suppose an attack targets the first r + 1 rounds, where the block cipher may possibly have more than r + 1 rounds. For a plaintext P, we denote by C the output after r + 1 rounds, i.e., C = E^(r+1)_{K^(r+1)}(P), and by B the output after r rounds, i.e., B = E^(r)_{K^(r)}(P).

Linear approximation: Any block cipher cryptanalysis starts off with a detailed analysis of the structure of the block cipher. This results in one or more relations between the plaintext P, the input to the last round B and possibly the expanded key K^(r). In case of linear cryptanalysis, a linear relation of the following form is obtained:
⟨Γ_P, P⟩ ⊕ ⟨Γ_B, B⟩ = ⟨Γ_{K^(r)}, K^(r)⟩,    (1)

where Γ_P, Γ_B ∈ {0, 1}^n and Γ_{K^(r)} ∈ {0, 1}^{nr} denote the plaintext mask, the mask to the input of the last round and the key mask respectively.
A relation of the form given by (1) is called a linear approximation of the block cipher. Such a linear approximation usually holds with some probability which is taken over the random choices of the plaintext P. Obtaining such a linear approximation and the corresponding probability is a non-trivial task and requires a lot of ingenuity and experience. This forms the basis on which the statistical analysis of block ciphers is built. Define

L = ⟨Γ_P, P⟩ ⊕ ⟨Γ_B, B⟩.    (2)

Inner key bit: Let z = ⟨Γ_{K^(r)}, K^(r)⟩. Note that for a fixed but unknown key K^(r), z is a single unknown bit. Since the key mask Γ_{K^(r)} is known, the bit z is determined only by the unknown but fixed K^(r). Hence, there is no randomness in either K^(r) or z. The bit z is called the inner key bit.

Target sub-key: A linear relation of the form (1) usually involves only a subset of the bits of B. In order to obtain these bits from the ciphertext C, it is required to partially decrypt C by one round. This involves a subset of the bits of the last round key k^(r). We call this subset of bits of the last round key the target sub-key. The ciphertext C is obtained by encrypting P using a key K. By κ* we denote the value of the target sub-key corresponding to the key K. We are interested in a key recovery attack where the goal is to find κ*.
Let the size of the target sub-key be m. These m bits are sufficient to partially decrypt C by one round and obtain the bits of B involved in the linear approximation. There are 2^m possible choices of the target sub-key out of which only one is correct. The purpose of the attack is to identify the correct value.

Probability and bias of a linear approximation: Let P be a plaintext chosen uniformly at random from {0, 1}^n; C be the corresponding ciphertext; and B be the result of partially decrypting C with a choice κ of the target sub-key. The random variable B depends on the choice κ that is used to partially invert C. Further, C depends on the correct value κ* of the target sub-key and hence so does B. So, the random variable L defined in (2) depends on κ and κ* and we write L_{κ,κ*} to emphasise this dependence. For κ = κ*, we will simply write L_{κ*}. Define

p_{κ,κ*} = Pr[L_{κ,κ*} = 0] = 1/2 + ε_{κ,κ*} and p_{κ*} = Pr[L_{κ*} = 0] = 1/2 + ε_{κ*}.    (3)

Here ε_{κ,κ*} and ε_{κ*} are the biases corresponding to incorrect and correct choices of the target sub-key respectively. The secret key K is a fixed quantity and so the randomness arises solely from the uniform random choice of P.

Statistical model of the attack: Let P_1, ..., P_N, with N ≤ 2^n, be chosen randomly following some distribution from the set {0, 1}^n of all possible plaintexts. It is assumed that the adversary possesses the N plaintext-ciphertext pairs (P_j, C_j); j = 1, 2, ..., N, where C_j = E_K(P_j) for some fixed key K. Using the linear approximation and the N plaintext-ciphertext pairs, the adversary has to find κ* in time faster than a brute force search on all possible keys of the block cipher.
For each choice κ of the target sub-key, it is possible for the attacker to partially decrypt each C_j by one round to obtain B_{κ,j}; j = 1, 2, ..., N. Note that B_{κ,j} depends on κ even though C_j may not do so. Clearly, if κ = κ*, then the C_j's depend on κ, while if κ ≠ κ*, C_j has no relation to κ.
For j = 1, ..., N, let X_{κ,z,j} denote the binary value obtained by evaluating the linear approximation on the pair (P_j, B_{κ,j}) together with the choice z of the inner key bit. Thus X_{κ,z,j} is determined by the pair (P_j, C_j), the choice κ of the target sub-key and the choice z of the inner key bit. Since C_j depends upon K and hence upon κ*, X_{κ,z,j} also depends upon κ* through C_j. The randomness in X_{κ,z,j} arises from the randomness in P_j and also possibly from the previous choices P_1, ..., P_{j-1}. X_{κ,z,j} is binary valued and the probability Pr[X_{κ,z,j} = 1] potentially depends upon the following quantities:
- z: the choice of the inner key bit;
- p_{κ*} or p_{κ,κ*}: the probabilities of the linear approximation as given in (3);
- j: the index determining the pair (P_j, C_j).
This models a general scenario which captures a possible dependence on the index j. The dependence on j will be determined by the joint distribution of the plaintexts P 1 , . . . , P N . In the case that P 1 , . . . , P N are independent and uniformly distributed, Pr[X κ,z,j = 1] does not depend on j. On the other hand, suppose that P 1 , . . . , P N are sampled without replacement. In such a scenario, Pr[X κ,z,j = 1] does depend on j.
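The effect of the sampling mode shows up concretely in the variance of the count of plaintexts satisfying the linear relation. In the simulation sketch below (a small toy population stands in for the plaintext space, and the fraction of plaintexts satisfying the relation is a hypothetical value), sampling without replacement shrinks the variance by the finite-population correction (2^n − N)/(2^n − 1):

```python
import random
from statistics import variance

random.seed(1)
POP = 2 ** 10            # toy "plaintext space" of size 2^10 (illustration only)
ONES = 544               # hypothetical: the relation holds for 544/1024 plaintexts
labels = [1] * ONES + [0] * (POP - ONES)
N, TRIALS = 512, 2000
p = ONES / POP

def count(without_replacement):
    """Number of sampled plaintexts for which the relation holds."""
    idx = (random.sample(range(POP), N) if without_replacement
           else random.choices(range(POP), k=N))
    return sum(labels[i] for i in idx)

var_wr = variance(count(False) for _ in range(TRIALS))
var_wor = variance(count(True) for _ in range(TRIALS))
fpc = (POP - N) / (POP - 1)              # finite-population correction
print(var_wr, N * p * (1 - p))           # close to the binomial variance
print(var_wor, N * p * (1 - p) * fpc)    # close to the hypergeometric variance
```

This variance gap is one reason the with- and without-replacement settings must be analysed separately in what follows.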
Test statistic: For each choice κ of the target sub-key and each choice z of the inner key bit, let T_{κ,z} ≡ T(X_{κ,z,1}, ..., X_{κ,z,N}) denote a test statistic. Then T_{κ,z} is a random variable whose randomness arises from the randomness of P_1, ..., P_N. Define X_{κ,z} = X_{κ,z,1} + ··· + X_{κ,z,N} and W_{κ,z} = X_{κ,z}/N. Since flipping z flips each X_{κ,z,j}, we have W_{κ,1} = 1 − W_{κ,0} and hence |W_{κ,1} − 1/2| = |W_{κ,0} − 1/2|. So, the test statistic T_{κ,z} considered below does not depend on the value of z and it is sufficient to consider z = 0.
Remark. To simplify notation, we will write X_{κ,j} and X_κ instead of X_{κ,0,j} and X_{κ,0} respectively; W_κ and T_κ instead of W_{κ,0} and T_{κ,0} respectively. Using this notation, the test statistic T_κ is defined in the following manner:

T_κ = |W_κ − 1/2| = |X_κ/N − 1/2|.

This test statistic was considered by Matsui [26].
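As an illustration of how T_κ is used to rank candidate sub-keys, the toy simulation below draws the counts X_κ directly (the right key with a hypothetical bias, wrong keys with bias 0, rather than via an actual cipher) and keeps the 2^(m-a) highest-ranked candidates:

```python
import random

random.seed(7)
N = 4096
m = 4                    # toy: m-bit target sub-key (illustrative)
a = 2                    # advantage: keep the top 2^(m-a) candidates
correct = 0b1010         # hypothetical correct sub-key value
eps = 0.04               # hypothetical bias for the right key (toy magnitude)

# Simulate the counts X_k directly: the right key sees bias eps, wrong keys 0.
counts = {k: sum(random.random() < 0.5 + (eps if k == correct else 0.0)
                 for _ in range(N))
          for k in range(2 ** m)}

# Matsui's test statistic T_k = |X_k/N - 1/2| for each candidate sub-key.
T = {k: abs(c / N - 0.5) for k, c in counts.items()}

# Rank candidates by T_k and keep the 2^(m-a) highest.
candidates = sorted(T, key=T.get, reverse=True)[: 2 ** (m - a)]
print(correct in candidates, sorted(candidates))
```

With these toy parameters the right key's deviation (≈ eps) is several standard deviations above the typical wrong-key deviation, so the correct sub-key lands in the candidate list with very high probability.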
There are 2^m choices of the target sub-key and so there are 2^m random variables T_κ. The distribution of T_κ depends on whether κ is correct or incorrect. To perform a statistical analysis of an attack, it is required to obtain the distribution of T_κ under both correct and incorrect choices of κ. Later we consider this issue in more detail.
Success probability: An attack will produce a set (or a list) of candidate values of the target sub-key. The attack is considered successful if the correct value κ* of the target sub-key is in the output set. The probability of this event is called the success probability of the attack.

Advantage: An attack is said to have advantage a if the size of the set of candidate values of the target sub-key is equal to 2^(m-a). In other words, a fraction 2^(-a) of the possible 2^m values of the target sub-key is produced by the attack.

Data complexity: The number N of plaintext-ciphertext pairs required for an attack is called the data complexity of the attack. Clearly, N depends on the success probability P_S and the advantage a. One of the goals of a statistical analysis is to be able to obtain a closed form relation between N, P_S and a.

Key-alternating and long-key ciphers: We recall the definitions of key-alternating and long-key block ciphers from [14]. A key-alternating block cipher consists of an alternating sequence of unkeyed rounds and simple bitwise additions of the round keys. Well known examples of key-alternating ciphers are AES, Serpent and Square, while ciphers such as DES, IDEA, Twofish, RC5 and RC6 are not key-alternating ciphers. A long-key block cipher is a key-alternating cipher where the round keys are considered to be independent and uniformly distributed.

Expected linear probability or potential: The linear probability (or potential) of a linear approximation is the square of its correlation. In [14], the expected linear probability (ELP) of a characteristic over a key-alternating cipher is defined to be the average linear probability of that characteristic over the associated long-key cipher. More generally, the ELP can also be defined for iterative ciphers by taking the average linear probability over all round keys, ignoring the key schedule.
Notation on normal distributions: By N (µ, σ 2 ) we will denote the normal distribution with mean µ and variance σ 2 . The density function of N (µ, σ 2 ) will be denoted by f(x; µ, σ 2 ). The density function of the standard normal will be denoted by φ(x) while the distribution function of the standard normal will be denoted by Φ(x).

General key randomisation hypotheses
Recall the definitions of p_{κ,κ*} and p_{κ*} from (3). The corresponding biases are ε_{κ,κ*} and ε_{κ*}. For obtaining the distributions of W_{κ*} and W_κ, κ ≠ κ*, it is required to hypothesise the behaviour of p_{κ*} and p_{κ,κ*} respectively. The two standard key randomisation hypotheses are the following.

Standard right key randomisation hypothesis: p_{κ*} = p, for some constant p, for every choice of κ*.

Standard wrong key randomisation hypothesis: p_{κ,κ*} = 1/2 for every choice of κ* and κ ≠ κ*.
The standard wrong key randomisation hypothesis was formally considered in [18], though it was used in earlier works. Modification of this hypothesis has been considered in the literature. Based on an earlier work [15] on the distribution of correlations for a uniform random permutation, the standard wrong key randomisation hypothesis was relaxed in [12]. Under the standard wrong key randomisation hypothesis, the bias ε_{κ,κ*} = 0. In [12], it was suggested that instead of assuming ε_{κ,κ*} to be 0, ε_{κ,κ*} should be assumed to follow a normal distribution with expectation 0 and variance 2^(-n-2). This is stated more formally as follows.

Adjusted wrong key randomisation hypothesis: ε_{κ,κ*} ∼ N(0, 2^(-n-2)), equivalently p_{κ,κ*} ∼ N(1/2, 2^(-n-2)), for every choice of κ* and κ ≠ κ*.

Remarks.
1. In this hypothesis, there is no explicit dependence of the bias on either κ or κ*.
2. From (4), ε_{κ,κ*} should take values in [−1/2, 1/2]. If ε_{κ,κ*} is assigned a value outside the range [−1/2, 1/2], then p_{κ,κ*} takes a value outside the range [0, 1]. Since p_{κ,κ*} is supposed to be a probability, this is meaningless. On the other hand, a random variable following a normal distribution can take any real value. So, the above hypothesis may lead to ε_{κ,κ*} taking a value outside the range [−1/2, 1/2], which is not meaningful. The reason why such a situation arises is that in [15], a discrete distribution has been approximated by a normal distribution without adjusting for the possibility that the values may fall outside the meaningful range. From a theoretical point of view, assuming ε_{κ,κ*} to follow a normal distribution cannot be formally justified. Hence, the adjusted wrong key randomisation hypothesis must necessarily be considered to be a heuristic assumption.
3. The variance 2^(-n-2) is an exponentially decreasing function of n and, by Chebyshev's inequality, Pr[|ε_{κ,κ*}| > 1/2] ≤ 2^(-n). In other words, p_{κ,κ*} takes values outside [0, 1] with exponentially low probability.
4. The formal statement of the adjusted wrong key randomisation hypothesis appears as Hypothesis 2 in [12] and places the normal distribution on |ε_{κ,κ*}|, i.e., the condition in Hypothesis 2 of [12] is on the absolute value of ε_{κ,κ*} rather than on ε_{κ,κ*} itself. Since the absolute value is by definition a non-negative quantity, it is not meaningful to model its distribution using a normal. In fact, the proof of Lemma 5.9 in the thesis [36] makes use of the hypothesis without the absolute value, i.e., it uses the hypothesis as stated above. Further, the later work [1] also uses the hypothesis without the absolute value. So, in this work we will use the hypothesis as stated above and without the absolute value sign.
While the adjusted wrong key randomisation hypothesis was used in [12] and later in [1], both of these works used the standard right key randomisation hypothesis. Modification of the right key randomisation hypothesis was considered in [8] and [7].

Adjusted right key randomisation hypothesis: p_{κ*} ∼ N(p, (ELP − 4ε^2)/4), where ε = p − 1/2.

Remarks. The first two points made in the context of the adjusted wrong key randomisation hypothesis also hold in the present case.
1. It is required to assume that the variance (ELP − 4ε^2)/4 ≤ 2^(-n). Then the variance is an exponentially decreasing function of n and, by Chebyshev's inequality, Pr[|p_{κ*} − p| > 1/2] ≤ 2^(-n+2). In other words, p_{κ*} takes values outside [0, 1] with exponentially low probability. Without the assumption of an exponentially low value for the variance, it is not possible to argue that the probability of p_{κ*} taking values outside [0, 1] is exponentially small. This point is not mentioned in [7].
2. The work [8] considers the case p ≠ 1/2 (equivalently, ε ≠ 0). This is the classical case of linear cryptanalysis and corresponds to the situation where the correlation of the right key is non-zero.
3. The work [7] considers the case p = 1/2 (equivalently, ε = 0). For p = 1/2, ε = 0 and so the variance is ELP/4. The variance for the adjusted wrong key randomisation hypothesis is 2^(-n-2). In [7] it is assumed that the variance for the adjusted right key randomisation hypothesis is greater than that of the adjusted wrong key randomisation hypothesis, which is equivalent to ELP > 2^(-n). In our analysis, we do not make this assumption and instead work out both the cases ELP > 2^(-n) and ELP < 2^(-n).
Motivated by the above, we formulate the following general key randomisation hypotheses for both the right and the wrong key.
General right key randomisation hypothesis: p_{κ*} ∼ N(p, s_0^2), where p is a fixed value and s_0^2 ≤ 2^(-n). Given p, ε = p − 1/2 is the bias and 2ε is the correlation.
General wrong key randomisation hypothesis: p_{κ,κ*} ∼ N(1/2, s_1^2), where s_1^2 ≤ 2^(-n).

We note the following.
1. As s_0 ↓ 0, the random variable p_{κ*} becomes degenerate and takes the value of the constant p. In this case, the general right key randomisation hypothesis becomes the standard right key randomisation hypothesis.
2. For p = 1/2 and s_0 ↓ 0, the random variable p_{κ*} becomes degenerate and takes the constant value 1/2. The class of attacks arising from this setting was introduced in [11] and such attacks are called zero correlation attacks. For such attacks, we must necessarily have s_1^2 > 0, as otherwise both the right and wrong key randomisation hypotheses become the same and so the attack will fail.
3. In [14], it was shown that the fixed key correlation for a long-key block cipher corresponds to the choice p = 1/2. This formed the motivation in [7] for considering the case p = 1/2 in the adjusted right key randomisation hypothesis, where s_0^2 was taken to be ELP/4. We note, however, that not all block ciphers are long-key ciphers and so the assumption p = 1/2 cannot be made in general. So, while the case p = 1/2 is a valid choice of study for the adjusted right key randomisation hypothesis, it is not the only choice. The case p ≠ 1/2 is also an equally valid choice of study.
4. More generally, for p = 1/2, we must have s_0 ≠ s_1, as otherwise both the right and wrong key randomisation hypotheses become the same and it will not be possible to mount an attack.
5. As s_1 ↓ 0, the random variable p_{κ,κ*} becomes degenerate and takes the value 1/2. In this case, the general wrong key randomisation hypothesis becomes the standard wrong key randomisation hypothesis.
6. For s_0^2 = (ELP − 4ε^2)/4, the general right key randomisation hypothesis becomes the adjusted right key randomisation hypothesis.
7. For s_1^2 = 2^(-n-2), the general wrong key randomisation hypothesis becomes the adjusted wrong key randomisation hypothesis.
So, the general key randomisation hypotheses cover both the standard and the adjusted right and wrong key randomisation hypotheses. Further, they also cover zero correlation attacks. In view of this, we perform the statistical analysis of the success probability in terms of the general key randomisation hypotheses and later deduce the special cases of the standard and the adjusted key randomisation hypotheses. This provides a unifying view of the entire analysis.
Remark. The issues discussed in Points 1 to 3 as part of the remarks after the adjusted wrong key randomisation hypothesis also hold for both the general right and the general wrong key randomisation hypotheses. In particular, we note that the requirements s_0^2 ≤ 2^{-n} and s_1^2 ≤ 2^{-n} have been imposed so that, using Chebyshev's inequality, the probabilities of p_{κ*} and p_{κ,κ*} taking values outside the range [0, 1] can be shown to be exponentially small.
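As an illustration of the role of these bounds, the following sketch (Python; the value of n is a hypothetical example, not taken from any particular cipher) applies Chebyshev's inequality with the variance at its allowed maximum 2^{-n}:

```python
import math

def chebyshev_tail_bound(var, t):
    # Chebyshev's inequality: Pr[|X - mu| >= t] <= var / t^2.
    return var / t**2

# If p_kappa* ~ N(p, s0^2) with s0^2 <= 2^{-n} and p = 1/2, the distance
# from the mean to the boundary of [0, 1] is 1/2, so the probability of
# p_kappa* falling outside [0, 1] is at most 2^{-n+2}: exponentially small in n.
n = 128
bound = chebyshev_tail_bound(2.0**-n, 0.5)
```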

4. Distributions of the test statistic
Given the behaviour of p_{κ*} and p_{κ,κ*} modelled by the two general key randomisation hypotheses, the main task is to obtain normal approximations of the distributions of W_{κ*} and W_κ as given by (8). The distributions of W_{κ*} and W_κ depend on whether P_1, . . . , P_N are chosen with or without replacement. We consider these two cases separately.

4.1.
Distributions of W_{κ*} and W_κ, κ ≠ κ*, under uniform random sampling with replacement. In this case, P_1, . . . , P_N are chosen under uniform random sampling with replacement so that P_1, . . . , P_N are assumed to be independent and uniformly distributed over {0, 1}^n.
First consider W_{κ*} whose distribution is determined from the distribution of p_{κ*}. Recall that X_{κ*} = X_{κ*,1} + · · · + X_{κ*,N}. Since P_1, . . . , P_N are independent, the random variables X_{κ*,1}, . . . , X_{κ*,N} are also independent. Under the general right key randomisation hypothesis, p_{κ*} is modelled as a random variable following N(p, s_0^2) and so the density function of p_{κ*} is f(p; p, s_0^2). The distribution function of X_{κ*} is approximated as follows. The sum within the integral is the distribution function of the binomial distribution and can be approximated by N(Np, Np(1 − p)). In this approximation, the variance of the normal also depends on p, which makes it difficult to proceed with further analysis. Using (10), it is possible to approximate p(1 − p) as 1/4. This approximation, however, is valid only for p ∈ [p − θ_0, p + θ_0] and under the assumption that (ε + θ_0)^2 is negligible. In particular, the approximation is not valid for values of p close to 0 or 1. The probability that p is not in [p − θ_0, p + θ_0] is exponentially small as shown in (9). So, we break up the integral in (13) in a manner such that the approximation p(1 − p) ≈ 1/4 can be made in the range p − θ_0 to p + θ_0 and it is possible to show that the contribution to (13) for p outside this range is negligible.
The sum inside the integral is approximated by the distribution function of N(Np, Np(1 − p)). The range of the integration over p is from p − θ_0 to p + θ_0. Using (10), it follows that for p ∈ [p − θ_0, p + θ_0] the normal distribution N(Np, Np(1 − p)) can be approximated as N(Np, N/4) (i.e., p(1 − p) ≈ 1/4) under the assumption that (ε + θ_0)^2 is negligible. Note that the above analysis has been done to ensure that the range of p is such that this approximation is meaningful.
The last equality follows from Proposition 1 in Section A.2. Comparing (13) and (16), it may appear that a roundabout route has been taken to essentially replace the sum inside the integral by a normal approximation. On the other hand, without taking this route, we do not see how to justify that the variance of this normal approximation is approximately N/4.
From (18), the distribution of X_{κ*} is approximately N(Np, s_0^2 N^2 + N/4). Consequently, the distribution of W_{κ*} = X_{κ*}/N − 1/2 is approximately given as follows. For W_κ with κ ≠ κ*, we need to consider the general wrong key randomisation hypothesis where p_{κ,κ*} is modelled as a random variable following N(1/2, s_1^2). A similar analysis as above is carried out where, instead of (9) and (10), the relations (11) and (12) respectively are used. In particular, for p ∈ [1/2 − ϑ_1, 1/2 + ϑ_1], it is required to approximate N(Np, Np(1 − p)) by N(N/2, N/4), i.e., p(1 − p) ≈ 1/4. The validity of this approximation for p ∈ [1/2 − ϑ_1, 1/2 + ϑ_1] follows from (12) where s_1^2 2^{n/2} is considered to be negligible. Again, we note that the approximation p(1 − p) ≈ 1/4 is not valid for values of p near 0 or 1. The analysis yields the following approximation.
Remark. For the adjusted wrong key randomisation hypothesis, i.e., with s_1^2 = 2^{-n-2}, in [12] the distribution of W_κ for κ ≠ κ* was stated without proof to be N(0, 1/2^{n+2} + 1/(4N)). Lemma 5.9 in the thesis [36] also stated this result and offered as proof the fact that the sum of two independent normally distributed random variables is also normally distributed. While this fact is well known, it is not relevant to the present analysis.
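The normal form N(ε, s_0^2 + 1/(4N)) for W_{κ*} can be checked by simulation. The sketch below (Python with NumPy; all parameter values are illustrative, not taken from any cipher) draws p_{κ*} from N(p, s_0^2), then X_{κ*} from Bin(N, p_{κ*}), and compares the empirical mean and variance of W_{κ*} = X_{κ*}/N − 1/2 with ε and s_0^2 + 1/(4N):

```python
import numpy as np

rng = np.random.default_rng(0)
p, s0, N, trials = 0.5 + 2.0**-6, 2.0**-8, 4096, 20000

# General right key randomisation: p_kappa* ~ N(p, s0^2), clipped to [0, 1]
# (the clipping matters only with exponentially small probability).
p_star = np.clip(rng.normal(p, s0, size=trials), 0.0, 1.0)
# Sampling with replacement: X_kappa* ~ Bin(N, p_kappa*).
X = rng.binomial(N, p_star)
W = X / N - 0.5

eps = p - 0.5                      # predicted mean of W_kappa*
pred_var = s0**2 + 1.0 / (4 * N)   # predicted variance s0^2 + 1/(4N)
```

The empirical moments of W agree with (eps, pred_var) to within sampling error.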

4.2.
Distributions of W_{κ*} and W_κ, κ ≠ κ*, under uniform random sampling without replacement. In this scenario, the plaintexts P_1, . . . , P_N are chosen according to uniform random sampling without replacement. As a result, P_1, . . . , P_N are no longer independent and correspondingly neither are X_{κ,1}, . . . , X_{κ,N}. So, the analysis in the case of sampling with replacement needs to be modified.
We first consider the distribution of W_{κ*} in the scenario where p_{κ*} is a random variable. A fraction p_{κ*} of the 2^n possible plaintexts P satisfies the condition ⟨Γ_P, P⟩ ⊕ ⟨Γ_B, B⟩ = 1. Let us say that a plaintext P is 'red' if the condition ⟨Γ_P, P⟩ ⊕ ⟨Γ_B, B⟩ = 1 holds for P; otherwise, we say that P is 'white'. So there are p_{κ*} 2^n red plaintexts in {0, 1}^n and the other plaintexts are white. For k ∈ {0, . . . , N}, the event X_{κ*} = k is the event of picking k red plaintexts in N trials from an urn containing 2^n plaintexts out of which p_{κ*} 2^n are red and the rest are white. Under the general right key randomisation hypothesis it is assumed that p_{κ*} follows N(p, s_0^2) so that the density function of p_{κ*} is taken to be f(p; p, s_0^2). An analysis along the lines of (14) to (15) using (9) shows that the sum within the integral is the distribution function of the hypergeometric distribution Hypergeometric(N, 2^n, p2^n). If N ≪ 2^n, then the hypergeometric distribution approximately follows Bin(N, p); on the other hand, if N/2^n = t ∈ (0, 1), then the hypergeometric distribution approximately follows N(Np, N(1 − t)p(1 − p)). For p ∈ [p − θ_0, p + θ_0], from (10) the normal distribution N(Np, N(1 − N/2^n)p(1 − p)) is approximated as N(Np, N(1 − N/2^n)/4) under the assumption that (ε + θ_0)^2 is negligible. Again, we note that the approximation holds in the mentioned range of p and it is not valid for values of p close to 0 or 1. The last equality follows from Proposition 1 in Section A.2. So, X_{κ*} approximately follows N(Np, s_0^2 N^2 + N(1 − N/2^n)/4) and since W_{κ*} = X_{κ*}/N − 1/2 we have that the distribution of W_{κ*} is approximately given as follows. For W_κ with κ ≠ κ*, we need to consider the general wrong key randomisation hypothesis where p_{κ,κ*} is modelled as a random variable following N(1/2, s_1^2). In this case, it is required to use (11) and (12) instead of (9) and (10) respectively.
In particular, as in the case of sampling with replacement, we note that for p ∈ [1/2 − ϑ_1, 1/2 + ϑ_1] it is required to approximate N(Np, N(1 − N/2^n)p(1 − p)) by N(N/2, N(1 − N/2^n)/4), i.e., p(1 − p) ≈ 1/4. The validity of this follows from (12) and the approximation is not valid for values of p near 0 or 1. With these approximations, the resulting analysis shows the following approximate distribution.
Remark. In [1], for the adjusted wrong key randomisation hypothesis, i.e., with s_1^2 = 2^{-n-2}, the distribution of W_κ for κ ≠ κ* was stated to be N(0, 1/(4N)). We note the following issues.
1. The supporting argument in [1] was given to be the fact that if two random variables X and Y are such that X ∼ N(aY, σ_1^2) and Y ∼ N(µ, σ_2^2), then X ∼ N(aµ, σ_1^2 + a^2 σ_2^2) (see Proposition 2 in the appendix for a proof). This argument, however, is not complete. The distribution function of X_κ for κ ≠ κ* is given by (24). After interchanging the order of the sum and the integration, one can apply the normal approximation of the hypergeometric distribution. It is not justified to directly start with the normal approximation of the hypergeometric distribution as has been done in [1].
2. The issue is more subtle than simply a question of interchanging the order of the sum and the integral. After applying the normal approximation of the hypergeometric distribution one ends up with N(N/2, N(1 − N/2^n)p(1 − p)), which is then approximated as N(N/2, N(1 − N/2^n)/4). This requires assuming that (p − 1/2)^2 is negligible. Clearly, this assumption is not valid for values of p close to 0 or 1. On the other hand, the approximation is justified for p ∈ [1/2 − ϑ_1, 1/2 + ϑ_1] under the assumption that s_1^2 2^{n/2} = 2^{-2-n/2} is negligible (see (12)). Also, the probability that p takes values outside of [1/2 − ϑ_1, 1/2 + ϑ_1] is exponentially low as shown in (11). So, it is required to argue that the integral in (24) can be restricted to the range 1/2 − ϑ_1 to 1/2 + ϑ_1 and that the contribution of the integral outside this range is negligible. This can be done in a manner similar to that done in Steps (14) to (15). In [1], the assumption that (p − 1/2)^2 is negligible has been made for all values of p, which is not justified.
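The without-replacement approximation, with the extra factor (1 − N/2^n) in the variance, can likewise be checked by simulation. In this sketch (Python with NumPy; the parameters are illustrative, with a toy block size n = 16 so that t = N/2^n is non-negligible), X_{κ*} is drawn from the hypergeometric distribution with p_{κ*} 2^n red plaintexts:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 16                      # toy block size so that N/2^n is non-negligible
M = 2**n                    # total number of plaintexts
N = M // 4                  # t = N/2^n = 1/4
p, s0, trials = 0.5 + 2.0**-6, 2.0**-8, 20000

p_star = np.clip(rng.normal(p, s0, size=trials), 0.0, 1.0)
red = np.rint(p_star * M).astype(np.int64)   # number of 'red' plaintexts
X = rng.hypergeometric(red, M - red, N)      # sampling without replacement
W = X / N - 0.5

eps = p - 0.5
pred_var = s0**2 + (1 - N / M) / (4 * N)     # s0^2 + (1 - N/2^n)/(4N)
```

The empirical variance of W matches pred_var, which is visibly smaller than the with-replacement value s_0^2 + 1/(4N).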

5.
Success probability for attacks with p ≠ 1/2. The general right key randomisation hypothesis postulates p_{κ*} ∼ N(p, s_0^2). In this section, we consider the success probability of attacks in the case p ≠ 1/2. As mentioned earlier, this is the classical scenario of linear cryptanalysis.
From (8), the test statistic is T_κ = |W_κ| where W_κ = (X_{κ,1} + · · · + X_{κ,N})/N − 1/2. To obtain the success probability of the attack it is required to obtain the distributions of T_κ for the two scenarios when κ = κ* and when κ ≠ κ*. These are obtained from the distributions of W_{κ*} and W_κ for κ ≠ κ*, which have been obtained in Section 4. Suppose the following holds:
W_{κ*} ∼ N(µ_0, σ_0^2) and, for κ ≠ κ*, W_κ ∼ N(0, σ_1^2). (25)
From (19) and (22), note that the condition µ_0 ≠ 0 corresponds to ε ≠ 0.
We now consider the derivation of the success probability of linear cryptanalysis in terms of µ_0, σ_0 and σ_1 using both the order statistics based analysis and the hypothesis testing based analysis. From the expressions given in (19), (20), (22) and (23), we see that σ_0 and σ_1 depend on N whereas µ_0 = ε, which is a constant.

5.1.
Order statistics based analysis. This approach is based on a ranking methodology used originally by Matsui [26] and later formalised by Selçuk [33]. The idea is the following. There are 2^m random variables T_κ corresponding to the 2^m possible values of the target sub-key. Suppose the variables are denoted as T_0, . . . , T_{2^m−1} and assume that T_0 = |W_0| corresponds to the choice of the correct target sub-key κ*, where W_0 follows the distribution of W_{κ*} which is N(µ_0, σ_0^2). Let T_{(1)}, . . . , T_{(2^m−1)} be the order statistics of T_1, . . . , T_{2^m−1}, i.e., T_{(1)}, . . . , T_{(2^m−1)} is the ascending order sort of T_1, . . . , T_{2^m−1}. So, the event corresponding to a successful attack with a-bit advantage is T_0 > T_{(2^m q)}, where q = 1 − 2^{-a}. Using a well known result on order statistics, the distribution of T_{(2^m q)} can be assumed to approximately follow a normal distribution. Using this result, P_S can be approximated in the following manner.
Some criticisms: The order statistics based approach is crucially dependent on the normal approximation of the distribution of the order statistics. In the statistics literature, this result appears in an asymptotic form. Using the well known Berry-Esséen theorem, a concrete upper bound on the error in such an approximation was obtained in [29]. A key observation is that the order statistics result is applied to 2^m random variables and for the result to be applied even in an asymptotic context, it is necessary that 2^m is sufficiently large. A close analysis of the hypothesis of the theorem and the error bound in the concrete setting showed the following issues. We refer to [29] for details.
m must be large: This condition arises from a convergence requirement on one of the quantities in the theorem showing the result on order statistics. For the error in such convergence to be around 10^{-3}, it is required that m should be at least around 20 bits.
So, if the size of the target sub-key is small, then the applicability of the order statistics based analysis is not clear.
m − a must be large: This condition arises from the requirement that the error in the normal approximation is small. If the error is to be around 10^{-3}, then m − a should be at least around 20 bits. Recall that a is the advantage of the attack. So, for attacks with high advantage, the applicability of the order statistics based analysis is not clear.
Independence assumptions: We identify two assumptions that are required for the analysis to be meaningful. These were implicitly used by Selçuk in [33]. We know of no previous work where these assumptions have been explicitly highlighted.
1. The approximation of the distribution of the order statistic T_{(2^m q)} by a normal is a key step in the order statistics based approach. As mentioned above, this follows from a standard result in mathematical statistics. The hypothesis of this result requires the random variables T_1, T_2, . . . , T_{2^m−1} to be independent and identically distributed. It indeed holds that T_1, T_2, . . . , T_{2^m−1} are identically distributed. However, the randomness of all of these random variables arises from the randomness of P_1, . . . , P_N and so these random variables are certainly not independent. So, the independence of these random variables is a heuristic assumption.
2. Considering W_0 and T_{(2^m q)} to follow normal distributions, it is assumed that W_0 − T_{(2^m q)} (and W_0 + T_{(2^m q)}) also follows a normal distribution. A sufficient condition for W_0 − T_{(2^m q)} to follow a normal distribution is that W_0 and T_{(2^m q)} are independent. If W_0 and T_{(2^m q)} are not independent, then it is not necessarily true that W_0 − T_{(2^m q)} follows a normal distribution even if W_0 and T_{(2^m q)} individually follow normal distributions. So, in assuming W_0 − T_{(2^m q)} to follow a normal distribution, it is implicitly assumed that W_0 and T_{(2^m q)} are independent. Since the randomness of both W_0 and T_{(2^m q)} arises from the randomness in P_1, . . . , P_N, they are clearly not independent. As a result, the assumption that W_0 − T_{(2^m q)} follows a normal distribution is also a heuristic assumption.
In short, the above two assumptions can be summarised as assuming that the test statistics corresponding to different choices of the sub-key are independent. We note that such assumptions are sometimes made in the context of cryptanalysis, though it is a bit surprising that the above assumptions do not seem to have been explicitly mentioned in the literature.
In later works on multiple linear and multiple differential cryptanalysis, the order statistics based analysis has been used in a number of papers [12,19,5]. The above mentioned issues, i.e., both m and m − a have to be large; and the assumption that the test statistics for different choices of the sub-key are independent, apply to all such works.
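Under the heuristic independence assumption discussed above, the ranking experiment can be simulated directly. The sketch below (Python with NumPy; m, a and the distribution parameters are illustrative choices of ours) draws the 2^m − 1 wrong-key statistics as independent |N(0, σ_1^2)| variables, T_0 as |N(µ_0, σ_0^2)|, and estimates the probability of the success event T_0 > T_{(2^m q)} with q = 1 − 2^{-a}:

```python
import numpy as np

rng = np.random.default_rng(2)
m, a = 10, 4
mu0, sigma = 0.02, 0.005      # sigma0 = sigma1 = sigma, e.g. 1/(2*sqrt(N))
q = 1.0 - 2.0**-a
rank = int(2**m * q)          # index of the order statistic T_(2^m q)
trials = 2000

wrong = np.abs(rng.normal(0.0, sigma, size=(trials, 2**m - 1)))
T0 = np.abs(rng.normal(mu0, sigma, size=trials))
# T_(rank) is the rank-th smallest of the 2^m - 1 wrong-key statistics.
T_rank = np.sort(wrong, axis=1)[:, rank - 1]
success = np.mean(T0 > T_rank)
```

For these parameters the empirical success rate is high (close to 1), consistent with µ_0 being several standard deviations above the relevant order statistic.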

5.2.
Hypothesis testing based analysis. Statistical hypothesis testing for analysing block cipher cryptanalysis was carried out as early as [2] in the context of distinguishing attacks. For distinguishing attacks using integral and zero correlation linear cryptanalysis, this framework has been used in [10]. For analysing key recovery attacks on block ciphers, the hypothesis testing based approach has been used in [9,34,8,29].
The idea of the hypothesis testing based approach is simple and intuitive. For each choice κ of the target sub-key, let H_0 be the null hypothesis that κ is correct and H_1 be the alternative hypothesis that κ is incorrect. The test statistic T_κ = |W_κ| is used to test H_0 against H_1, where the distributions of W_κ are as in (25) for both κ = κ* and κ ≠ κ*. The following hypothesis test is considered: reject H_0 if and only if T_κ ≤ t. Here t is a threshold whose exact value is determined depending on the desired success probability and advantage. Such a hypothesis test gives rise to two kinds of errors: H_0 is rejected when it holds, which is called a Type-1 error; and H_0 is accepted when it does not hold, which is called a Type-2 error. If a Type-1 error occurs, then κ = κ* is the correct value of the target sub-key but the test rejects it and so the attack fails to recover the correct value. So, the attack is successful if and only if a Type-1 error does not occur, i.e., the success probability is P_S = 1 − Pr[Type-1 error]. On the other hand, for every Type-2 error, an incorrect value of κ gets labelled as a candidate key. So, the number of times that a Type-2 error occurs is the size of the list of candidate keys.
Suppose the hypothesis test given in (28) is applied to T_κ for all κ ∈ {0, 1}^m. Let P_S = 1 − Pr[Type-1 error]. Then P_S is given by (29).
Proof. First assume µ_0 > 0. Let α = Pr[Type-1 error] and β = Pr[Type-2 error] and so P_S = 1 − α. For each κ ≠ κ*, let Z_κ be a binary valued random variable which takes the value 1 if and only if a Type-2 error occurs for κ. So, Pr[Z_κ = 1] = β. The size of the list of candidate keys returned by the test is Σ_{κ≠κ*} Z_κ and so the expected size of the list of candidate keys is (2^m − 1)β. The expected number of times that a Type-2 error occurs is 2^{m−a}. So, β = 2^{m−a}/(2^m − 1). The Type-1 and Type-2 error probabilities are calculated as follows. Using β = 2^{m−a}/(2^m − 1) in (34), we obtain t = σ_1 γ. Substituting t in (33) and noting that P_S = 1 − α, we obtain (29). If µ_0 < 0, then an analysis similar to the above shows that the resulting expression for the success probability is still given by (29).
Remarks.

1. We have
γ = Φ^{-1}(1 − 2^{m−a−1}/(2^m − 1)) = Φ^{-1}(1 − 2^{−a−1+lg(2^m/(2^m−1))}),
where lg is logarithm to base two. We will be interested in attacks where the advantage a is at least lg(2^m/(2^m − 1)) so that γ can be assumed to be non-negative.
2. The computation in (30) does not require the Z_κ's or the T_κ's to be independent.
3. The theoretical limitations of the order statistics based analysis (namely, that m and m − a are large and the heuristic assumption that the T_κ's are independent) are not present in the hypothesis testing based analysis.
4. Comparing (29) to (27), we find that the two expressions are equal under the following two assumptions: (a) 2^m/(2^m − 1) ≈ 1: this holds for moderate values of m but is not valid for small values of m. (b) σ_0 ≫ σ_q: this assumption was used in [33] and we provide more details later.
In the rest of the work, we will use (29) as the expression for the success probability.
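The expression (29) is straightforward to evaluate numerically. The following sketch (Python, standard library only; the function names and the sample parameters are ours) assumes the two-term form P_S = Φ((|µ_0| − σ_1γ)/σ_0) + Φ((−|µ_0| − σ_1γ)/σ_0) with γ = Φ^{-1}(1 − 2^{m−a−1}/(2^m − 1)) as derived in Section 5.2:

```python
from statistics import NormalDist

_std = NormalDist()  # standard normal: cdf = Phi, inv_cdf = Phi^{-1}

def success_probability(mu0, sigma0, sigma1, m, a):
    # gamma = Phi^{-1}(1 - 2^{m-a-1}/(2^m - 1)); threshold t = sigma1 * gamma.
    gamma = _std.inv_cdf(1.0 - 2.0**(m - a - 1) / (2.0**m - 1))
    t = sigma1 * gamma
    # Two-term form: Phi((|mu0| - t)/sigma0) + Phi((-|mu0| - t)/sigma0).
    return _std.cdf((abs(mu0) - t) / sigma0) + _std.cdf((-abs(mu0) - t) / sigma0)

# Standard hypotheses (s0 = s1 = 0), sampling with replacement:
# sigma0 = sigma1 = 1/(2*sqrt(N)).
def ps_std_wr(eps, N, m, a):
    s = 0.5 / N**0.5
    return success_probability(eps, s, s, m, a)
```

For example, with ε = 2^{-10}, m = 20 and a = 10, the success probability is roughly 0.76 at N = 2^{22} and approaches 1 as N increases.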

5.3. Success probability under general key randomisation hypotheses.
The distributions of W_{κ*} and W_κ for κ ≠ κ* are respectively given by (19) and (20) for the case of sampling with replacement and by (22) and (23) for the case of sampling without replacement. These expressions can be compactly expressed in the following form: W_{κ*} ∼ N(µ_0, σ_0^2) and W_κ ∼ N(0, σ_1^2), where µ_0 = ε, σ_0^2 = s_0^2 + σ^2 and σ_1^2 = s_1^2 + σ^2, with
σ^2 = 1/(4N) for sampling with replacement;
σ^2 = (1 − N/2^n)/(4N) for sampling without replacement.

Suppose the hypothesis test given in (28) is applied to T_κ for all κ ∈ {0, 1}^m and the expected number of times that a Type-2 error occurs is 2^{m−a}. Then, as in Section 5.2, the success probability P_S is given by (38).
Let P_S^{(wr)} denote the success probability when sampling with replacement is used and let P_S^{(wor)} denote the success probability when sampling without replacement is used. Using the corresponding expressions for σ from (37) in (38), we obtain the expressions (39) and (40) for P_S^{(wr)} and P_S^{(wor)} respectively.

Remarks.
1. If N ≪ 2^n, then P_S^{(wor)} is essentially equal to P_S^{(wr)}. The expression for P_S^{(wor)} given by (40) becomes useful only when the fraction N/2^n is non-negligible.
2. In the case of sampling with replacement, due to the birthday paradox, having N greater than 2^{n/2} is not really useful, since repetitions will begin to occur.
In the following sections, we will instantiate the expressions for P_S with specific values of s_0 and s_1. To differentiate between these cases, we will use superscripts on P_S denoting the different possible cases. The notation for these superscripts is as follows.
1. The superscript nz will denote that the success probabilities are for the case of attacks where p ≠ 1/2.
2. The superscripts wr and wor will denote sampling with replacement and sampling without replacement respectively.
3. The superscript std will denote that the standard key randomisation hypothesis is considered for both the right and the wrong key.
4. The superscript adj will denote that the adjusted key randomisation hypothesis is considered for both the right and the wrong key.
5. The superscript radj will denote the adjusted right key randomisation hypothesis and the standard wrong key randomisation hypothesis.
6. The superscript wadj will denote the adjusted wrong key randomisation hypothesis and the standard right key randomisation hypothesis.

5.4.
Success probability under standard key randomisation hypotheses. As discussed in Section 3, the standard key randomisation hypotheses are obtained from the general key randomisation hypotheses by letting s_0 ↓ 0 and s_1 ↓ 0. Using these conditions in (39) and (40) leads to the expressions (41) and (42) for the success probabilities in the two cases of sampling with and without replacement. For the standard right and wrong key randomisation hypotheses, the setting p = 1/2 is not meaningful, since in this case it is not possible to distinguish between right and wrong keys. So, we do not introduce the superscript nz in P_S^{(wr,std)} and P_S^{(wor,std)}.
Success probability in [33]: Selçuk [33] had obtained an expression for the success probability under the standard key randomisation hypotheses and under the assumption that P_1, . . . , P_N are chosen uniformly with replacement. The expression for P_S^{(wr,std)} given by (41) was not obtained in [33]. This is due to the following reasons.
1. For analysing the success probability, Selçuk [33] employed the order statistics based approach. As discussed in Section 5.1, in this approach the T's are written as T_0, . . . , T_{2^m−1} and it is assumed (without loss of generality) that T_0 corresponds to the right key. With this set-up, an attack with a-bit advantage is successful if T_0 > T_{(2^m q)} where q = 1 − 2^{-a}. Selçuk [33] instead considers success to be the event W_0 > T_{(2^m q)} along with the condition W_0/µ_0 > 0. Since the T's can take only non-negative values, it follows that T_{(2^m q)} ≥ 0 and so the event W_0 > T_{(2^m q)} implies W_0 > 0 and hence µ_0 > 0. Conversely, if µ_0 < 0, then for the condition W_0/µ_0 > 0 to hold we must have W_0 < 0, in which case the event W_0 > T_{(2^m q)} is an impossible event. So, the condition W_0 > T_{(2^m q)} subsumes the condition W_0/µ_0 > 0 for µ_0 > 0 and has probability 0 for µ_0 < 0. No justification is provided in [33] for considering success to be W_0 > T_{(2^m q)} instead of T_0 > T_{(2^m q)}. From (26) we see that the event W_0 > T_{(2^m q)} is a sub-event of T_0 > T_{(2^m q)}, which is the event that the attack is successful.
2. It is assumed that σ_0 ≫ σ_q. This is justified in [33] by providing numerical values for a in the range 8 ≤ a ≤ 48 and it is mentioned that the assumption especially holds for success probability 0.8 or more.
Under the above two assumptions, the expression for success probability obtained in [33] is
P_S = Φ(2√N |ε| − Φ^{-1}(1 − 2^{-a-1})). (43)
Assume that m is large so that 2^m − 1 ≈ 2^m and so γ ≈ Φ^{-1}(1 − 2^{-a-1}). Then the right hand side of (43) becomes equal to the first term of (41). This shows that the expression for the success probability obtained in [33] is incomplete.
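The approximation γ ≈ Φ^{-1}(1 − 2^{-a-1}) used above is numerically very tight for moderate m. A quick check (Python, standard library; m and a are sample values of ours):

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf
m, a = 24, 12
gamma_exact = inv(1.0 - 2.0**(m - a - 1) / (2.0**m - 1))  # exact gamma
gamma_approx = inv(1.0 - 2.0**(-a - 1))                   # 2^m - 1 ~ 2^m
# For these parameters the two values differ by less than 1e-6.
```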
To the best of our knowledge, no prior work has analysed the success probability of linear cryptanalysis with the standard key randomisation hypotheses and under the condition where P_1, . . . , P_N are chosen uniformly without replacement. So, the expression for P_S^{(wor,std)} given by (42) is the first such result.

5.5.
Success probability under adjusted wrong key randomisation hypothesis. Setting s_1^2 = 2^{-n-2} converts the general wrong key randomisation hypothesis to the adjusted wrong key randomisation hypothesis. Also, we let s_0 ↓ 0, so that the general right key randomisation hypothesis simplifies to the standard right key randomisation hypothesis. Using these conditions for s_0 and s_1 in (39) and (40) provides the expressions (44) and (45) for the success probabilities in the two cases of sampling with and without replacement.
Expressions for the success probability with the adjusted wrong key randomisation hypothesis and the standard right key randomisation hypothesis were obtained in [12] and [1]. Both works followed the order statistics approach as used by Selçuk. The work [12] considered the setting of uniform random choice of P_1, . . . , P_N with replacement whereas [1] considered the setting of uniform random choice of P_1, . . . , P_N without replacement. Under the approximation 2^m ≈ 2^m − 1, the expressions obtained in [12] and [1] are equal to the first terms of (44) and (45) respectively. The reason why the complete expressions were not obtained in [12,1] is similar to the reason why Selçuk was not able to obtain the complete expression for P_S^{(wr,std)}. Both P_S^{(nz,wr,wadj)} and P_S^{(nz,wor,wadj)} can be seen to be functions of |ε|, N and γ. Since γ itself is a function of the advantage a and the size of the target sub-key m, it follows that both P_S^{(nz,wr,wadj)} and P_S^{(nz,wor,wadj)} are functions of |ε|, N, a and m.

5.6.
Success probability under adjusted right key randomisation hypothesis. Setting s_0^2 = (ELP − 4ε^2)/4 converts the general right key randomisation hypothesis to the adjusted right key randomisation hypothesis. Also, we let s_1 ↓ 0, so that the general wrong key randomisation hypothesis simplifies to the standard wrong key randomisation hypothesis. In this case, we have s_0^2 > s_1^2.
Using these conditions in (39) and (40) leads to the expressions (46) and (47) for the success probabilities in the two cases of sampling with and without replacement. To the best of our knowledge, no prior work has analysed the success probability of single linear cryptanalysis for the adjusted right key randomisation hypothesis and the standard wrong key randomisation hypothesis in the situations where the plaintexts P_1, . . . , P_N are chosen with and without replacement. So, the expressions for P_S^{(nz,wr,radj)} and P_S^{(nz,wor,radj)} given by (46) and (47) are new.

5.7.
Success probability under adjusted key randomisation hypotheses. Setting s_0^2 = (ELP − 4ε^2)/4 converts the general right key randomisation hypothesis to the adjusted right key randomisation hypothesis. Also, we let s_1^2 = 2^{-n-2}, so that the general wrong key randomisation hypothesis simplifies to the adjusted wrong key randomisation hypothesis. Assume that ELP − 4ε^2 > 2^{-n}. Using these conditions in (39) and (40) leads to the expressions for the success probabilities in the two cases of sampling with and without replacement. The expression for success probability under the adjusted key randomisation hypotheses under sampling with replacement was earlier obtained in [8]. For sampling without replacement the authors derived the distribution of the test statistic under both the null and the alternate hypotheses, but the paper does not give an expression for the success probability. The expression obtained in [8] for sampling with replacement is approximate and is given by the first term of (48).

6.
Dependence of P_S on N for attacks with p ≠ 1/2. Recall that the general right key randomisation hypothesis postulates p_{κ*} ∼ N(p, s_0^2). In this section, we study the dependence of the success probability on N for attacks where p ≠ 1/2.
Consider the setting of the general key randomisation hypotheses where P_1, . . . , P_N are chosen with replacement. From (19) and (20), we have µ_0 = ε, σ_0^2 = s_0^2 + 1/(4N) and σ_1^2 = s_1^2 + 1/(4N). Since s_0 and s_1 are constants (i.e., independent of N), both σ_0 and σ_1 decrease as N increases. So, as N increases the normal curve for W_{κ*} becomes more concentrated around the mean ε. This is shown in Figures 1, 2 and 3. Also, since γ is a constant, t = σ_1 γ also decreases as N increases. So, π = Pr[W_{κ*} ≤ t] is a function of N. One may expect π to be a monotonic decreasing function of N (and 1 − π to be a monotonic increasing function of N), but this does not necessarily hold as we explain below.
Let N_1 < N_2, and let t_1 = γ√(s_1^2 + 1/(4N_1)) and t_2 = γ√(s_1^2 + 1/(4N_2)), so that t_2 < t_1. Let π_1 and π_2 be the values of π corresponding to N_1 and N_2. There are two possibilities.
t_2 ≤ x_0: From Figures 1 and 2, in both cases, it can be noted that the area under the curve corresponding to N_1 is more than the area under the curve corresponding to N_2. So, π_1 > π_2. In other words, increasing N leads to π going down and correspondingly 1 − π going up. As a result, in this case, the first term in the expression for success probability given by (33) increases with N.
t_2 > x_0: In this case, we have x_0 < t_2 < t_1. From Figure 3, it is no longer clear that the area under the curve corresponding to N_1 is more than the area under the curve corresponding to N_2. So, it cannot be definitely said that π_1 is more than π_2 and so 1 − π does not necessarily go up. As a result, it can no longer be said that the first term in the expression for success probability given by (33) increases with N.
Note that the above explanation is purely statistical in nature. It is entirely based upon the expressions for the variances of the two normal distributions.
In the above discussion, we have tried to explain the possible non-monotonic behaviour of the probability Pr[W_{κ*} ≤ t] for the case of sampling with replacement. Considering this specific case makes it easy to see the role of the dependence of the variances on N in determining possible non-monotonicity. The explanation extends to the complete expression for the success probability as well as to the case of sampling without replacement.
Explanations for non-monotonic behaviour have been provided in [12,1]. In [12], non-monotonicity has essentially been attributed to the strategy of sampling with replacement leading to duplicates. The later work [1] observed non-monotonicity even for the strategy of sampling without replacement and so the explanation based on the occurrence of duplicates could not be applied. Instead, [1] provides an explanation for non-monotonicity for both sampling with and without replacement based on the ranking strategy used in the order statistics based approach. As we have seen, expressions for the success probability can be obtained without using the order statistics based approach. So, an explanation of non-monotonicity based on the order statistics based approach is not adequate. Instead, as we have tried to explain above, the phenomenon is better understood by considering that the variances of the two normal distributions in question are monotone decreasing in N.
Figure 3. Case x_0 < t_2 < t_1.
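The possible non-monotonicity is easy to exhibit numerically. The sketch below (Python, standard library only; all parameter values are illustrative choices of ours) evaluates the two-term success probability expression with s_0 = 0 < s_1 (standard right key, adjusted wrong key with s_1^2 = 2^{-n-2} for n = 64) on a grid of N, and contrasts it with the standard hypotheses s_0 = s_1 = 0:

```python
from statistics import NormalDist

_std = NormalDist()

def ps(eps, s0, s1, N, m, a):
    # sigma^2 = 1/(4N) for sampling with replacement (cf. (37)).
    var = 1.0 / (4.0 * N)
    sigma0 = (s0**2 + var) ** 0.5
    sigma1 = (s1**2 + var) ** 0.5
    gamma = _std.inv_cdf(1.0 - 2.0**(m - a - 1) / (2.0**m - 1))
    t = sigma1 * gamma
    return _std.cdf((abs(eps) - t) / sigma0) + _std.cdf((-abs(eps) - t) / sigma0)

m, a = 24, 12
eps, s1 = 2.0**-32, 2.0**-33          # s1^2 = 2^{-n-2} with n = 64
grid = [2.0**k for k in range(58, 72, 2)]
# s0 = 0 < s1: the sequence of success probabilities rises and then falls.
vals = [ps(eps, 0.0, s1, N, m, a) for N in grid]
# s0 = s1 = 0 (standard hypotheses): the sequence only rises.
std_vals = [ps(eps, 0.0, 0.0, N, m, a) for N in grid]
```

This matches the first two points of the analysis in Section 6.1: with s_0 ≥ s_1 the success probability is increasing in N, while with s_0 < s_1 it need not be.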
6.1. Analysis. Consider the general expression for the success probability P_S as given by (38). The subsequent expressions for the success probability with/without replacement and under standard/adjusted key randomisation hypotheses are all obtained as special cases of (38). In (38), the quantities s_0, s_1 and γ are constants which are independent of N and only σ depends on N as shown in (37). Further, from (37), it is clear that σ is a decreasing function of N for both the cases of sampling with and without replacement. We analyse the behaviour of P_S as a function of N and identify, in Theorem 6.1, the situations where P_S is a monotonic increasing function of N.
Proof. We proceed by taking derivatives with respect to N. Since σ is a decreasing function of N, dσ/dN < 0.
Using the definition of the standard normal density function and some simplifications, dP_S/dN can be expressed as the product of dσ/dN and a function f_1(σ). Since dσ/dN < 0, we have that dP_S/dN > 0 if and only if f_1(σ) < 0 if and only if f_2(σ) > 0. If s_0 ≥ s_1, then f_2(σ) > 0 and so in this case we have dP_S/dN > 0, which implies that P_S is an increasing function of N. This proves the first point. Now consider the case s_0 < s_1. If (s_1^2 − s_0^2)γ ≥ |ε|√(σ^2 + s_1^2), then f_2(σ) < 0 and so dP_S/dN < 0, which implies that P_S is a decreasing function of N. This proves the second point.
So, suppose that $s_0 < s_1$ and $(s_1^2 - s_0^2)\gamma < |\epsilon|\sqrt{\sigma^2 + s_1^2}$ both hold. By the condition of this case, we have $0 < \delta < 1$. Also, we have the assumption that $\delta$ is small enough that $\delta^3$ and higher powers of $\delta$ can be ignored. Then the condition $f_2(\sigma) > 0$ reduces to an inequality in which $2\gamma$ appears on both sides; cancelling $2\gamma$ on both sides and rearranging the terms shows the third point.
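The role of $N$ enters the analysis only through $\sigma$. As a quick numerical sanity check of this monotonicity, the following sketch uses the variance expressions $\sigma^2 = 1/(4N)$ (sampling with replacement) and $\sigma^2 = (2^n - N)/(4N(2^n - 1))$ (sampling without replacement) stated later in the text; the block size $n = 16$ is an assumed toy value.

```python
from fractions import Fraction

def sigma2_wr(N):
    # sampling with replacement: sigma^2 = 1/(4N)
    return Fraction(1, 4 * N)

def sigma2_wor(N, n):
    # sampling without replacement: sigma^2 = (2^n - N)/(4N(2^n - 1))
    return Fraction(2 ** n - N, 4 * N * (2 ** n - 1))

n = 16  # assumed toy block size
Ns = (1, 10, 100, 1000, 2 ** n)
wr = [sigma2_wr(N) for N in Ns]
wor = [sigma2_wor(N, n) for N in Ns]

# both variance contributions decrease strictly as N grows
assert all(u > v for u, v in zip(wr, wr[1:]))
assert all(u > v for u, v in zip(wor, wor[1:]))

# without replacement the sampling variance vanishes at N = 2^n (full codebook)
assert sigma2_wor(2 ** n, n) == 0
```

Exact rational arithmetic avoids any floating point ambiguity in the comparisons; note that the two expressions coincide at $N = 1$ and separate as $N$ approaches $2^n$.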
Fisher information: Suppose a random variable $Y$ follows a distribution whose density is given by a function $g(y; \theta_1, \theta_2, \ldots)$, where $\theta_1, \theta_2, \ldots$ are the finitely many parameters specifying the density function. A relevant question is how much information the random variable $Y$ carries about one particular parameter $\theta_i$. Fisher information is a well known measure in statistics for quantifying this information. The Fisher information about a parameter $\theta \in \{\theta_1, \theta_2, \ldots\}$ carried in the random variable $Y$ is defined to be
$$I_Y(\theta) = E\left[\left(\frac{\partial}{\partial \theta} \ln g(Y; \theta_1, \theta_2, \ldots)\right)^2\right].$$
In particular, if $Y \sim N(\mu, \sigma^2)$, then $I_Y(\mu) = \sigma^{-2}$. In other words, the information about the mean contained in the random variable $Y$ is inversely proportional to $\sigma^2$. So, as the variance increases, the information about the mean contained in $Y$ decreases.
We view the first point of Theorem 6.1 in the context of Fisher information. Recall that $p_{\kappa^*}$ is a random variable following $N(p, s_0^2)$ and $p_{\kappa,\kappa^*}$ is a random variable following $N(1/2, s_1^2)$. So, $I_{p_{\kappa^*}}(p) = s_0^{-2}$ and $I_{p_{\kappa,\kappa^*}}(1/2) = s_1^{-2}$. From the first point of Theorem 6.1 we have that if $s_0 \ge s_1$, then $P_S$ is an increasing function of $N$ for all $N > 0$. Put in terms of Fisher information, this is equivalent to saying that if $I_{p_{\kappa,\kappa^*}}(1/2) \ge I_{p_{\kappa^*}}(p)$, then $P_S$ is an increasing function of $N$. More explicitly, if the information about the mean contained in $p_{\kappa^*}$ is not more than the information about the mean contained in $p_{\kappa,\kappa^*}$, then increasing $N$ increases the success probability. Viewed differently, if the variability of $p_{\kappa^*}$ is at least as much as the variability of $p_{\kappa,\kappa^*}$, then the chance of the attack being successful increases as the number of observations increases.
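The standard fact invoked here, that the Fisher information about the mean of $N(\mu, \sigma^2)$ equals $\sigma^{-2}$, can be verified numerically. A minimal sketch (the parameter values are illustrative, not taken from the analysis) evaluates the expectation of the squared score by quadrature:

```python
import math

def normal_pdf(y, mu, sigma):
    # density of N(mu, sigma^2)
    return math.exp(-(y - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fisher_info_mean(mu, sigma, lo=-30.0, hi=30.0, steps=100000):
    # numerically evaluate E[(d/dmu log f(Y; mu, sigma^2))^2],
    # where the score function is (y - mu)/sigma^2, by the trapezoid rule
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        y = lo + i * h
        score = (y - mu) / sigma ** 2
        w = 0.5 if i in (0, steps) else 1.0
        total += w * score ** 2 * normal_pdf(y, mu, sigma)
    return total * h

# the Fisher information about the mean of N(mu, sigma^2) is 1/sigma^2
for sigma in (0.5, 1.0, 2.0):
    assert abs(fisher_info_mean(0.0, sigma) - 1.0 / sigma ** 2) < 1e-4
```

Doubling $\sigma$ quarters the information, matching the intuition in the paragraph above: a noisier $p_{\kappa^*}$ carries less information about its mean.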
Applying Theorem 6.1 to the case of the standard key randomisation hypotheses, we have $s_0 \downarrow 0$ and $s_1 \downarrow 0$. So, by the first point of Theorem 6.1, it follows that both $P_S^{(nz,wr,std)}$ and $P_S^{(nz,wor,std)}$ are increasing functions of $N$ for all $N > 0$.
6.2. Adjusted wrong key randomisation hypothesis. In this case $s_1^2 = 2^{-n-2}$. Also, assuming the standard right key randomisation hypothesis (as in [12,1]), $s_0 \downarrow 0$. So, Points 2 and 3 of Theorem 6.1 apply. This case is divided into two subcases.

Sampling with replacement: In this case, $\sigma^2 = 1/(4N)$. Let $N_0^{(wr)} = (s_1^2 - \epsilon^2)/(4\epsilon^2 s_1^2)$. For sampling with replacement, it is more meaningful to consider $2^{n/2}$ to be the upper bound for $N$, since beyond a sample size of $2^{n/2}$ there will be too many repetitions in the sample.

We have $s_1 = 2^{-1-n/2}$ and $\gamma = \Phi^{-1}\left(1 - 2^{m-a-1}/(2^m - 1)\right)$, where $\log_2\left(2^m/(2^m - 1)\right) < a \le m$. The maximum value of $\gamma$ is achieved for $a = m$; this value is $\Phi^{-1}\left((2^{m+1} - 3)/(2^{m+1} - 2)\right)$, which is around 8.21 for $m \le 64$. So, $s_1\gamma$ is not much greater than $s_1$. It seems reasonable to assume that in practice the value of $\epsilon$ will turn out to be such that $\max(s_1, s_1\gamma) < |\epsilon|$. Under this condition, both $P_S^{(nz,wr,wadj)}$ and $P_S^{(nz,wor,wadj)}$ are increasing functions of $N$ for $0 < N \le 2^n$. In other words, the anomalous non-monotonic behaviour will mostly not occur in practice. The non-monotonic behaviour is observed only when the value of $|\epsilon|$ is small enough to be less than either $s_1$ or $s_1\gamma$.
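The magnitudes in the preceding paragraph can be checked with the standard library's normal distribution. In the sketch below, $n = 128$ and $m = 48$ are assumed illustrative values, not parameters of any particular cipher:

```python
from statistics import NormalDist

Phi_inv = NormalDist().inv_cdf

n, m = 128, 48  # assumed block size and target sub-key size
s1 = 2.0 ** (-1 - n / 2)  # s1^2 = 2^{-n-2}

for a in (8, 16, 32, m):
    gamma = Phi_inv(1 - 2.0 ** (m - a - 1) / (2 ** m - 1))
    # gamma stays single-digit even for the maximal advantage a = m, so the
    # threshold max(s1, s1 * gamma) remains a small multiple of 2^{-1-n/2}
    assert 0 < gamma < 10
    assert max(s1, s1 * gamma) < 2.0 ** (-n / 2 + 3)
```

Since the threshold $\max(s_1, s_1\gamma)$ is of the order $2^{-n/2}$, any bias $|\epsilon|$ large enough to be exploitable in practice comfortably exceeds it, which is the basis of the claim that the non-monotonic behaviour should rarely occur.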
We further note the following point. The distribution of $W_\kappa$ for $\kappa \ne \kappa^*$ is approximated as $N(0, 2^{-n-2} + 1/(4N))$ for sampling with replacement and as $N(0, 1/(4N))$ for sampling without replacement. As explained in Sections 4.1 and 4.2, both of these approximations require the assumption that $(p - 1/2)^2$ is negligible for $p \in [1/2 - \vartheta_1, 1/2 + \vartheta_1]$. From (12), the assumption is meaningful only if we consider $s_1^2 \cdot 2^{n/2} = 2^{-2-n/2}$ to be negligible. So, the derivation of the distribution of $W_\kappa$ for $\kappa \ne \kappa^*$ is meaningful only if $2^{-2-n/2}$ is considered to be negligible. Consequently, it is perhaps not meaningful to apply the analysis for values of $|\epsilon|$ lower than $2^{-2-n/2}$. This is a further argument that the analysis actually shows $P_S$ to be a monotone increasing function of $N$ in the range where the analysis is actually meaningful.
Remarks. The following comments are based on the assumption that $\gamma \approx \Phi^{-1}(1 - 2^{-a-1})$, i.e., $2^m \approx 2^m - 1$.
1. In [12] it was stated, without proof, that the first term of $P_S$ is an increasing function of $N$. We note that the complete picture of the dependence of the success probability on $N$ was not provided in either [12] or [1].

6.3. Adjusted right key randomisation hypothesis. Assuming the standard wrong key randomisation hypothesis, we have $s_1 \downarrow 0$. So, by Point 1 of Theorem 6.1, the success probability expressions given by (46) and (47) are both monotonically increasing, for both sampling with and without replacement, for all $N > 0$. For sampling with replacement, it is more meaningful to consider $2^{n/2}$ to be the upper bound for $N$, since beyond a sample size of $2^{n/2}$ there will be too many repetitions in the sample.
• $P_S^{(wr,adj)}$ is an increasing function of $N$ in the range $0 < N < N_1^{(wr)}$; and
• $P_S^{(wr,adj)}$ is a decreasing function of $N$ in the range $N > N_1^{(wr)}$.
For sampling without replacement, $P_S^{(wor,adj)}$ is a decreasing function of $N$ in the range $0 < N < N_1^{(wor)}$ and is an increasing function of $N$ for $N_1^{(wor)} < N \le 2^n$.

Table 2 summarises the results of the detailed analysis of the dependence of the success probability on the number of plaintexts. Based on this table, we have the following result.
Theorem 6.2. For $p \ne 1/2$, a necessary condition for the success probability $P_S$ to decrease with increase in $N$ for some range of $N$ is the following: either $|\epsilon| \le \max(s_1, s_1\gamma)$ where $s_1^2 = 2^{-n-2}$, or $4\epsilon^2 \le \mathrm{ELP} < 4\epsilon^2 + 2^{-n}$.

Proof. For both sampling with and without replacement, the monotone decreasing feature occurs either for the case of the standard right key randomisation hypothesis together with the adjusted wrong key randomisation hypothesis, or for the case of the adjusted right key randomisation hypothesis together with the adjusted wrong key randomisation hypothesis. First consider the case of the standard right key randomisation hypothesis and the adjusted wrong key randomisation hypothesis. In this case $s_1^2 = 2^{-n-2}$. From Table 2, we observe that if $\max(s_1, s_1\gamma) < |\epsilon|$ then $P_S$ increases monotonically with $N$. So a necessary condition for $P_S$ to decrease with increase in $N$ for some range of $N$ is $|\epsilon| \le \max(s_1, s_1\gamma)$.
In the case of the adjusted right key randomisation hypothesis and the adjusted wrong key randomisation hypothesis, we observe from Table 2 that for $P_S$ to decrease monotonically with increase in $N$ for some range of $N$, it must necessarily hold that $4\epsilon^2 \le \mathrm{ELP} < 4\epsilon^2 + 2^{-n}$.
Combining the two cases gives the desired result.
Theorem 6.2 states that for $P_S$ to decrease with increase in $N$ over any range of $N$, either $|\epsilon|$ must be very small or $\mathrm{ELP} - 4\epsilon^2$ must be very small. Both of these conditions are unlikely to occur in practice.
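To get a feel for how restrictive these conditions are, the following sketch plugs in assumed illustrative values: a block size $n = 128$, a hypothetical bias $\epsilon = 2^{-20}$, and the upper bound on $\gamma$ quoted earlier in the text.

```python
# Illustrative magnitudes for the necessary condition of Theorem 6.2
n = 128                    # assumed block size
s1 = 2.0 ** (-1 - n / 2)   # s1^2 = 2^{-n-2}
gamma = 8.21               # upper bound on gamma quoted in the text
eps = 2.0 ** (-20)         # a hypothetical (already quite small) bias

# first alternative: |eps| <= max(s1, s1*gamma) fails for any practical bias,
# since the threshold is of the order 2^{-n/2}
assert max(s1, s1 * gamma) < eps

# second alternative: ELP must land in a window of width 2^{-n} above 4*eps^2,
# an astronomically narrow target
window_width = 2.0 ** (-n)
assert window_width < 1e-38
```

Even a bias as small as $2^{-20}$ exceeds the threshold $\max(s_1, s_1\gamma) \approx 2^{-62}$ by many orders of magnitude, illustrating why decreasing behaviour of $P_S$ is not expected in practice.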

7. Success probability for attacks with $p = 1/2$

The general key randomisation hypothesis postulates $p_{\kappa^*} \sim N(p, s_0^2)$. In this section, we consider the case $p = 1/2$. We first derive the success probability under the general key randomisation hypotheses and then plug in appropriate values of $s_0$ and $s_1$ to obtain the success probabilities under particular key randomisation hypotheses. Under the general key randomisation hypotheses, $p_{\kappa^*} \sim N(p, s_0^2)$ while $p_{\kappa,\kappa^*} \sim N(1/2, s_1^2)$. In the case of a zero correlation attack, $\epsilon = 0$, which is equivalent to $p = 1/2$. So, in this case, $p_{\kappa^*} \sim N(1/2, s_0^2)$. As a result, for zero correlation attacks, there is no difference between the means of $p_{\kappa^*}$ and $p_{\kappa,\kappa^*}$. The statistical methodology then becomes a test for the variance. So, we assume $s_0^2 \ne s_1^2$, as otherwise both distributions are the same and it is not possible to distinguish between them. A consequence of $s_0^2 \ne s_1^2$ is that the situation of standard right and standard wrong key randomisation hypotheses does not arise.

Table 2. Summary of the different cases and sub-cases showing the dependence of the success probability on the data complexity for $p \ne 1/2$. Here $n$ is the block size and $\epsilon = p - 1/2$.

The first task in the statistical analysis is to determine the distributions of the test statistics. These are determined by the distributions of $W_{\kappa^*}$ and of $W_\kappa$ for $\kappa \ne \kappa^*$. The analysis in Section 4 determines these distributions in the general case. The distribution of $W_\kappa$ remains unchanged for both sampling with and without replacement and is respectively given by (20) and (23). The distributions of $W_{\kappa^*}$ for the cases of sampling with and without replacement are obtained by putting $\epsilon = 0$ in (19) and (22).
Using $\sigma$ as defined in (37) and setting $\sigma_0^2 = s_0^2 + \sigma^2$ and $\sigma_1^2 = s_1^2 + \sigma^2$ as before, we have $W_{\kappa^*} \sim N(0, \sigma_0^2)$ and $W_\kappa \sim N(0, \sigma_1^2)$ for $\kappa \ne \kappa^*$. Since in this case the means of $W_{\kappa^*}$ and $W_\kappa$ are equal, a test of hypothesis for the variance is used. The test statistic used is $4T_\kappa^2$, where $T_\kappa = |W_\kappa|$ as defined in (8), and the following hypothesis test is considered.
Proof. We provide the proof for the case $\sigma_0 < \sigma_1$, the other case being similar. The Type-1 error probability is given by (55) and the Type-2 error probability is given by (56). Putting $\beta = 2^{-a}$ and equating the expressions for $t$ given in (55) and (56), we obtain the desired expression for $P_S$. The general expression for the success probability given by Theorem 7.1 is instantiated to specific cases to obtain expressions for the success probability under the standard/adjusted, right and wrong key randomisation hypotheses, and also under sampling with and without replacement. The specific cases of the key randomisation hypotheses are obtained by substituting appropriate values of $s_0$ and $s_1$, while the cases of sampling with and without replacement are obtained by substituting appropriate values of $\sigma$ as given by (37). In all cases, we will work under the assumption that $N \le 2^n$.

Notation. As in the case of attacks with non-zero correlation, we will use superscripts on $P_S$ to differentiate between the various cases. The superscripts have the same meaning as in the previous case. The only new superscript is z, which denotes that the corresponding success probability is for $p = 1/2$.

7.1. Success probability under the adjusted wrong key randomisation hypothesis. We set $s_1^2 = 2^{-n-2}$, so that the general wrong key randomisation hypothesis simplifies to the adjusted wrong key randomisation hypothesis. Also, we set $s_0^2 \downarrow 0$, so that the general right key randomisation hypothesis simplifies to the standard right key randomisation hypothesis. In this case, we have $s_1 > s_0$, and from Theorem 7.1 and (37) we obtain the following:

Remarks.
1. For the case of sampling with replacement, it is not meaningful to take $N$ greater than $2^{n/2}$, as then repetitions will begin to occur. In Section 8, we show that $P_S$ increases monotonically with $N$. So the maximum value of $P_S^{(z,wr,wadj)}$ is achieved for $N = 2^{n/2}$. With this value of $N$, and assuming $2^{-n/2} \ll 1$, the maximum value that $P_S^{(z,wr,wadj)}$ can achieve is $2^{-a}$. In other words, $P_S^{(z,wr,wadj)}$ degrades exponentially with the advantage $a$.

2. For the case of sampling without replacement, as $N$ becomes close to $2^n$, $P_S^{(z,wor,wadj)}$ gets close to 1. In Section 8, we show that $P_S$ increases monotonically with $N$. So, the minimum value of $P_S^{(z,wor,wadj)}$ is achieved by setting $N = 1$. With this value of $N$, and assuming $2^n/(2^n - 1) \approx 1$, the minimum value of $P_S^{(z,wor,wadj)}$ is $2^{-a-1}$. More generally, the data complexity $N$ can be expressed in terms of $P_S^{(z,wor,wadj)}$ as in (59).

The setting of $p = 1/2$ and $s_0^2 \downarrow 0$ results in $p_{\kappa^*}$ being a constant taking the value $1/2$. This is the setting of zero correlation attacks, which was introduced in [11]. The work [11] provided a distinguisher which requires $2^{n-1}$ chosen plaintexts. It is implicit in the analysis that these plaintexts are distinct. While key recovery attacks on particular ciphers were outlined in [11], a general analysis of zero correlation key recovery attacks does not appear there. A follow-up work [13] showed how to reduce the data complexity of a distinguishing attack using multiple zero correlation linear approximations. To the best of our knowledge, no prior work has analysed the success probability of a single zero correlation key recovery attack. So, the expressions for $P_S^{(z,wr,wadj)}$ and $P_S^{(z,wor,wadj)}$ given by (57) and (58) do not appear in the literature. Further, the expression for the data complexity given by (59) also does not appear in the literature.
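The qualitative behaviour described above can be reproduced with a small numerical sketch. The test below is an illustrative variance test, not the paper's exact expressions (54)-(58): the threshold is chosen so that a wrong-key statistic, with variance $s_1^2 + \sigma^2$, falls below it with probability $2^{-a}$, and success means the right-key statistic, with variance $s_0^2 + \sigma^2$, falls below the threshold. All parameter values are assumed.

```python
from statistics import NormalDist

Phi = NormalDist().cdf
Phi_inv = NormalDist().inv_cdf

def success_prob(s0, s1, sigma, a):
    # Illustrative variance test with s1 > s0: a wrong-key statistic is
    # N(0, s1^2 + sigma^2), the right-key statistic is N(0, s0^2 + sigma^2).
    # The threshold t lets a wrong key fall below it with probability 2^{-a};
    # success means the right-key statistic falls below t.
    sd0 = (s0 ** 2 + sigma ** 2) ** 0.5
    sd1 = (s1 ** 2 + sigma ** 2) ** 0.5
    t = sd1 * Phi_inv((1 + 2.0 ** (-a)) / 2)
    return 2 * Phi(t / sd0) - 1

s0, s1, a = 1e-5, 1e-3, 8   # assumed values with s1 > s0, as in Section 7.1
# sigma shrinks as N grows, so the success probability should increase with N
probs = [success_prob(s0, s1, 0.5 / N ** 0.5, a) for N in (10, 10 ** 3, 10 ** 5, 10 ** 7)]
assert all(u < v for u, v in zip(probs, probs[1:]))
assert all(0 < q < 1 for q in probs)
```

As $\sigma \downarrow 0$ the two variances separate towards $s_0^2$ and $s_1^2$, the variance test becomes easier, and the success probability rises, consistent with the monotonicity result established in Section 8.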

7.2. Success probability under the adjusted right key randomisation hypothesis. Setting $s_0^2 = \mathrm{ELP}/4$ converts the general right key randomisation hypothesis to the adjusted right key randomisation hypothesis. Also, we let $s_1^2 \downarrow 0$, so that the general wrong key randomisation hypothesis simplifies to the standard wrong key randomisation hypothesis. In this case, we have $s_0 > s_1$, and from Theorem 7.1 and (37) we obtain the expressions (60) and (61). To the best of our knowledge, no prior work has analysed the success probability for the adjusted right key randomisation hypothesis and the standard wrong key randomisation hypothesis, for sampling either with or without replacement. So, the expressions for $P_S^{(z,wr,radj)}$ and $P_S^{(z,wor,radj)}$ given by (60) and (61) are the first such results.

7.3. Success probability under the adjusted key randomisation hypotheses. Setting $s_1^2 = 2^{-n-2}$ converts the general wrong key randomisation hypothesis to the adjusted wrong key randomisation hypothesis. Also, we let $s_0^2 = \mathrm{ELP}/4$, so that the general right key randomisation hypothesis simplifies to the adjusted right key randomisation hypothesis. There are now two cases, namely $s_0 > s_1$ (equivalently, $\mathrm{ELP} > 2^{-n}$) and $s_0 < s_1$ (equivalently, $\mathrm{ELP} < 2^{-n}$), corresponding to the two cases of Theorem 7.1. Expressions for the success probability with the adjusted key randomisation hypotheses, under both sampling with and without replacement, were obtained in [7] for the case $\mathrm{ELP} > 2^{-n}$. The expression for the success probability for sampling with replacement obtained in [7] is the same as the one obtained above. On the other hand, for sampling without replacement, the expression for the success probability obtained in [7] is different from the one obtained above. The difference arises from the fact that [7] uses (without justification) a non-standard normal approximation of the hypergeometric distribution which differs from the one available in the literature.
(See Appendix A.3 for a brief summary of the literature on the normal approximation of the hypergeometric distribution.)

8. Dependence of $P_S$ on $N$ for attacks with $p = 1/2$

Recall that the general key randomisation hypothesis postulates $p_{\kappa^*} \sim N(p, s_0^2)$. In this section, we study the dependence of the success probability $P_S$ on $N$ for the case $p = 1/2$. This is determined by the following result.
Theorem 8.1. Consider $P_S$ to be given by (54), where $s_0$ and $s_1$ are positive and independent of $N$, while $\sigma > 0$ is a monotone decreasing function of $N$. Then $P_S$ is an increasing function of $N$ for all $N > 0$.
Proof. First consider the case $s_0 > s_1$, in which (54) can be rewritten and differentiated with respect to $N$. For sampling without replacement, we have
$$\frac{d(b/N)}{dN} = \frac{d}{dN}\left(\frac{2^n}{N(2^n - 1)} - \frac{1}{2^n - 1}\right) = -\frac{2^n}{(2^n - 1)N^2} < 0.$$
(For sampling with replacement, $b = 1$ and $d(1/N)/dN = -1/N^2 < 0$.) This shows that for $s_0 > s_1$, $P_S$ is an increasing function of $N$ for all $N > 0$. Now consider the case $s_0 < s_1$, for which (54) can again be rewritten and differentiated with respect to $N$. Since in this case $s_0 < s_1$, $d(\sigma_1/\sigma_0)/dN > 0$ if and only if $d(b/N)/dN < 0$. We have seen above that $d(b/N)/dN < 0$. This shows that for $s_0 < s_1$, $P_S$ is an increasing function of $N$ for all $N > 0$.
So, in all cases, $P_S$ is an increasing function of $N$ for all $N > 0$.
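The key step of the proof, that $b/N$ is strictly decreasing in $N$ for sampling without replacement, is easy to verify with exact rational arithmetic; $n = 16$ is an assumed toy block size.

```python
from fractions import Fraction

def b_over_N(N, n):
    # sampling without replacement: b/N = 2^n / (N(2^n - 1)) - 1/(2^n - 1)
    return Fraction(2 ** n, N * (2 ** n - 1)) - Fraction(1, 2 ** n - 1)

n = 16  # assumed toy block size

# b/N is strictly decreasing over the whole range 1 <= N <= 2^n
seq = [b_over_N(N, n) for N in range(1, 2 ** n + 1, 997)]
assert all(u > v for u, v in zip(seq, seq[1:]))

# the closed-form derivative -2^n / ((2^n - 1) N^2) agrees with the exact
# one-step difference -2^n / ((2^n - 1) N (N + 1)) up to O(1/N^3)
N = 1000
deriv = Fraction(-(2 ** n), (2 ** n - 1) * N ** 2)
step = b_over_N(N + 1, n) - b_over_N(N, n)
assert abs(step - deriv) < Fraction(1, 10 ** 8)
```

Note that $b/N$ reaches exactly $0$ at $N = 2^n$, which is the full-codebook case where the sampling variance vanishes.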

Conclusion
In this paper, we have carried out a detailed and complete analysis of the success probability of linear cryptanalysis. This has been done within a single unifying framework which provides deeper insight and a better understanding of how the success probability behaves with respect to the data complexity.
This follows from a standard result in mathematical statistics. (See [37] for a proof of the asymptotic version of the result and [29] for a proof of the concrete version.) Further suppose $T_i = |W_i|$, where $W_i$ follows $N(0, \sigma_1^2)$. Then $T_i$ follows a half-normal distribution whose density function is $f(y) = \frac{2}{\sigma_1\sqrt{2\pi}} \exp\left(-y^2/(2\sigma_1^2)\right)$ for $y \ge 0$, and the distribution function $F(y)$ is obtained by integrating the density function $f(y)$. In this case, $T_{(2^m q)}$ approximately follows $N(\mu_q, \sigma_q^2)$.

A.2. Compound normal. Recall that the density function of $N(\mu, \sigma^2)$ is denoted as $f(x; \mu, \sigma^2)$.
Proposition 2. Let $X$ and $Y$ be two random variables such that $X|Y \sim N(aY, \sigma_1^2)$ and $Y \sim N(\mu, \sigma_2^2)$, where $a$ is a constant. Then $X \sim N(a\mu, \sigma_1^2 + a^2\sigma_2^2)$.

Proof. Let $f_{X|Y}(x|y)$ and $f_{X,Y}(x, y)$ denote the conditional and joint densities of the random variables $X$ and $Y$, respectively. Also, let $f_Y(y)$ and $f_X(x)$ denote the marginal densities of the random variables $Y$ and $X$, respectively. Then
$$f_X(x) = \int_{-\infty}^{\infty} f_{X|Y}(x|y)\, f_Y(y)\, dy = \int_{-\infty}^{\infty} f(x; ay, \sigma_1^2)\, f(y; \mu, \sigma_2^2)\, dy = f(x; a\mu, \sigma_1^2 + a^2\sigma_2^2).$$
The last equality follows from Proposition 1. So, $X \sim N(a\mu, \sigma_1^2 + a^2\sigma_2^2)$.

A.3. Hypergeometric distribution. Suppose an urn contains $N$ distinguishable balls, of which $R$ are red and the rest are white. A sample of size $n$ is chosen from the urn without replacement. For $k \in \{0, \ldots, n\}$, the probability that there are exactly $k$ red balls in the sample is
$$p(k; n, N, R) = \frac{\binom{R}{k}\binom{N-R}{n-k}}{\binom{N}{n}}.$$
Here p(k; n, N, R) is the probability mass function of the hypergeometric distribution H(k; n, N, R).
Let $p = R/N$ and $q = 1 - p$. According to Problem 2 in Section 11 of Chapter II of Feller [16], the hypergeometric probabilities can be bounded by binomial-type expressions. Consequently, if $N \gg n$, then $p(k; n, N, R) \approx \binom{n}{k} p^k q^{n-k}$. In other words, if $N \gg n$, then the hypergeometric distribution is well approximated by the binomial distribution.
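The binomial approximation can be checked directly from the probability mass function; the sizes below are assumed for illustration, with $N \gg n$.

```python
from math import comb

def hyper_pmf(k, n, N, R):
    # p(k; n, N, R) = C(R, k) C(N - R, n - k) / C(N, n)
    return comb(R, k) * comb(N - R, n - k) / comb(N, n)

def binom_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# illustrative sizes with N >> n; here p = R/N = 0.3
N, R, n = 10 ** 6, 3 * 10 ** 5, 20
p = R / N

# the pmf sums to 1 ...
assert abs(sum(hyper_pmf(k, n, N, R) for k in range(n + 1)) - 1.0) < 1e-9

# ... and is close to the binomial pmf term by term when N >> n
for k in range(n + 1):
    assert abs(hyper_pmf(k, n, N, R) - binom_pmf(k, n, p)) < 1e-3
```

The term-by-term discrepancy is of the order $n^2/N$, so shrinking the urn (or enlarging the sample) makes the binomial approximation visibly worse.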
Another approximation of the hypergeometric distribution, by the normal distribution, appears in Problem 10 in Section 7 of Chapter VII of Feller [16]. Suppose $t \in (0, 1)$ and $p$ are such that $n/N \to t$ and $R/N \to p$ as $n, N, R \to \infty$. Let $h = 1/\sqrt{Np(1-p)t(1-t)}$ and suppose $k$ varies so that $h(k - np) \to x$. Then $p(k; n, N, R) \sim h\,\phi(x)$, where $\phi$ denotes the standard normal density. Consequently, if $Y$ is a random variable following the hypergeometric distribution $H(k; n, N, R)$, then $Y$ approximately follows $N(pn, Np(1-p)t(1-t))$. Conditions for the normal approximation to be meaningful and bounds on the error in the