SOME CRYPTANALYTIC AND CODING-THEORETIC APPLICATIONS OF A SOFT STERN ALGORITHM

Abstract. Using the class of information set decoding algorithms is the best known way of decoding general codes, i.e. codes that admit no special structure, in the Hamming metric. The Stern algorithm is the origin of the most efficient algorithms in this class. We consider the same decoding problem but for a channel with soft information. We give a version of the Stern algorithm for a channel with soft information that includes some novel steps of ordering vectors in lists, based on reliability values. We demonstrate how the algorithm constitutes an improvement in some cryptographic and coding-theoretic applications. We also indicate how to extend the algorithm to include multiple iterations and soft output values.


1. Introduction
For a general code with no special structure used for communication on the binary symmetric channel (BSC), the maximum-likelihood decoding problem (with some assumptions) is NP-hard. Still, decoding random linear codes is a central problem for many applications in cryptography, for example code-based crypto. Information set decoding (ISD) algorithms are the most promising candidates for solving instances of this problem.
The performance of these algorithms determines the security and hence the necessary parameters for many cryptosystems. The development of ISD algorithms includes the Prange algorithm [23], the Lee-Brickell algorithm [18], the Stern algorithm [25], Canteaut-Chabaud [8], Ball-Collision Decoding [6], Finiasz-Sendrier [13], BJMM [3] and the recent improvement from May and Ozerov [19]. The Stern algorithm is the starting point for the most efficient algorithms in this class, as it introduced a collision step that significantly decreased the complexity.
In this paper, we consider the decoding problem for a general code with no special structure used for communication on the Additive White Gaussian Noise (AWGN) channel using the Euclidean metric. This is motivated by the fact that we have seen some recent applications for such decoding algorithms in coding theory and cryptography. One such application is the recently proposed version of the McEliece Public Key Cryptosystem (PKC) using soft information [2]. Another is the use of such algorithms in side-channel cryptanalysis, see, e.g., [22]. A third one is a new hybrid decoding of low-density parity-check (LDPC) codes in space telecommand links [1].
The soft decoding problem has been studied extensively in coding and communication, see, e.g., [10,17], but mostly for special codes allowing efficient decoding. The study of general codes has been less intense. Early work by Chase [9] was followed by some work in the communication direction, and a highly cited paper is [14]. More recently, fast soft-decision decoding of linear codes was considered in [1,11,26,28] and by Wu and Hadjicostis in [30]. The same problem, considered in the context of side-channel cryptanalysis, can be found in [4,5,12].
In this paper, we give a version of the Stern algorithm for the decoding problem with soft information, named the soft Stern algorithm. The algorithm reuses some ideas from previous work, such as ordered statistics [14]. It uses the idea of sorting of error vectors in lists, based on reliability values [27], and presents a novel way of combining this idea with the structure of the Stern algorithm. This leads to better performance compared to previously suggested algorithms like the one in [22]. Initially we consider a one-pass algorithm that succeeds with some probability q. We can then repeat this one-pass algorithm to achieve a higher success probability, where the way it is repeated depends on the application. Later, we briefly consider extending the algorithm to also allow for multiple iterations.
Next, we demonstrate how this new algorithm can be used in cryptographic and coding theory applications. First, we present a very efficient attack on an idea of using soft information in McEliece-type cryptosystems presented at ISIT 2016 [2]. Not only do we severely break the proposed schemes, but our algorithm shows that the whole idea of using soft information in this way is not fruitful. Secondly, we show how our algorithm can be applied to side-channel attacks. The problem of soft decoding of general codes appears in side-channel attacks in both [21] and [22]. Using our algorithm, both of those attacks can be improved. Thirdly, we show how our algorithm can be used to improve the hybrid decoding of low-density parity-check (LDPC) codes [1]. Finally, we indicate that by using soft output, our algorithm can be applied to the problem of decoding product codes.
The remaining parts of the paper are organized as follows. In Section 2 we give some preliminaries on coding theory and the considered channel. Section 3 gives an overview of the new algorithm, and in Section 4 we give a complete example of the algorithm. In Section 5 we analyze its time complexity and give simulation results demonstrating the improvement compared to previously proposed algorithms. In Section 6 we indicate how to generalize our algorithm by allowing for multiple iterations and soft output. In Section 7 we cover different applications of the algorithm. Finally, Section 8 concludes the paper.

2. Preliminaries
We present some basic concepts in coding theory. Let F_2 denote the binary finite field, |x| the absolute value of x for x ∈ R, and ln(·) the logarithm with base e. Let π be a permutation of {1, . . . , n} and π^(−1) its inverse. For a matrix G, we let π(G) denote the matrix obtained from G by permuting its column indices according to π.

2.1. Basics in coding theory.
Definition 1. An [n, k] binary linear code C is a k-dimensional vector subspace of F_2^n. Its co-dimension is r = n − k, characterizing the redundancy of the code. A generator matrix G of the linear code C is defined as a k × n matrix in F_2^(k×n) whose rows form a basis of the code. Equivalently, the code can be defined by a matrix H in F_2^(r×n) whose kernel is the code C, called a parity-check matrix of C. For a length-n vector v, the support supp(v) is defined as the set of positions in which v is nonzero.

Suppose an [n, k] binary linear code C with generator matrix G is used for transmission on the AWGN channel. Let c = (c_1, c_2, · · · , c_n) be a codeword to be transmitted. In Binary Phase-Shift Keying (BPSK) transmission, the codeword c is mapped to a bipolar sequence ĉ = (ĉ_1, ĉ_2, · · · , ĉ_n), where ĉ_i ∈ R, through ĉ_i = (−1)^(c_i), for 1 ≤ i ≤ n. For any binary vector x, we use the notation x̂ to denote the result after applying the above mapping.
After transmission, where AWGN noise is added, the received vector is denoted r = (r_1, r_2, · · · , r_n), r_i ∈ R for 1 ≤ i ≤ n, where r_i = ĉ_i + w_i and the w_i are i.i.d. Gaussian random variables with zero mean and standard deviation σ. Since the values are real-valued, we say that we have soft information. If the noise were binary, we would have worked with hard information. If we translate each value of the r vector to its most probable binary value in c, we make a so-called hard decision.
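The channel model above can be sketched in a few lines of Python (an illustrative simulation; the codeword and the value of σ are arbitrary choices):

```python
import random

def bpsk(c):
    """Map a binary codeword c to the bipolar sequence, c_hat_i = (-1)^c_i."""
    return [(-1) ** ci for ci in c]

def awgn(c_hat, sigma, rng=random):
    """Add i.i.d. zero-mean Gaussian noise with standard deviation sigma."""
    return [ci + rng.gauss(0.0, sigma) for ci in c_hat]

def hard_decision(r):
    """Translate each soft value to its most probable bit (r_i >= 0 -> bit 0)."""
    return [0 if ri >= 0 else 1 for ri in r]

c = [0, 1, 1, 0, 1]            # arbitrary example codeword
r = awgn(bpsk(c), sigma=0.3)   # received soft values
```

With a small σ, `hard_decision(r)` usually recovers c, but occasional sign flips are exactly the bit errors discussed below.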
In the continuation, when discussing the reliability value of a position we refer to r i . When discussing how reliable a position is we refer to the absolute value of r i .
Our soft-decision decoding problem is now the following: find the most likely transmitted codeword when receiving r. We consider maximum-likelihood decoding (MLD). It is well known that the MLD metric becomes the squared Euclidean distance and that the codeword c closest to a received vector r is the one that minimizes the distance

D(ĉ, r) = Σ_(i=1)^n (r_i − ĉ_i)²

(see, e.g., [29]). For binary codes, it is common to use the log-likelihood ratio (LLR), which is defined as

L_i = ln( p(r_i | c_i = 0) / p(r_i | c_i = 1) ),

where p(r_i | c_i) is the pdf of r_i conditioned on c_i. After some calculations one can rewrite this for the AWGN channel as

L_i = 2r_i / σ².

We point out that we actually only need soft information in LLR form, and the algorithm to be proposed works for any noise distribution, not just AWGN.
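As a small illustration, MLD under the squared Euclidean distance can be carried out by brute force over a tiny code (the length-3 repetition code below is our own illustrative choice, not from the text):

```python
import math

def llr(r, sigma):
    """Per-position LLR for BPSK over AWGN: L_i = 2 r_i / sigma^2."""
    return [2.0 * ri / sigma ** 2 for ri in r]

def sq_euclid(c_hat, r):
    """D(c_hat, r) = sum_i (r_i - c_hat_i)^2."""
    return sum((ri - ci) ** 2 for ci, ri in zip(c_hat, r))

def mld(codewords, r):
    """Brute-force maximum-likelihood decoding under the Euclidean metric."""
    return min(codewords, key=lambda c: sq_euclid([(-1) ** b for b in c], r))

# tiny illustrative code: the length-3 repetition code
code = [[0, 0, 0], [1, 1, 1]]
r = [0.9, -0.2, 0.4]
best = mld(code, r)
```

For this r the all-zero codeword wins, since two of the three soft values point towards bit 0.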
Finally, we introduce a class of codes for later use.

Definition 2.
A low-density parity-check (LDPC) code is a linear code admitting a sparse parity-check matrix, while a moderate-density parity-check (MDPC) code is a linear code with a denser, but still sparse, parity-check matrix.
In previous work, the Hamming weight of the row vectors is usually employed to characterize sparsity; LDPC codes have small constant row weights, while MDPC codes have row weights O(√(n log n)). These classes of codes are of interest since they are used in the code-based cryptosystems described next.

Algorithm 1 The Stern algorithm
Input: Generator matrix G, parameters p, l
1. Choose a column permutation π and form π(G), meaning that the columns in G are permuted according to π.
2. Bring the generator matrix π(G) to systematic form G* = (I | Q | J); if the first k columns are not independent, go to 1.
3. Build a list L_1 of all weight-p vectors z^(1) that are nonzero only in the first half of the information set, each stored together with φ(z^(1)G*); build a list L_2 analogously for the second half.
4. Sort the lists and search for collisions φ(z^(1)G*) = φ(z^(2)G*). For each collision, compute the codeword c = (z^(1) + z^(2))G*.
5. If some c has Hamming weight w, return π^(−1)(c); otherwise go to 1.

2.2. Soft McEliece. The QC-MDPC scheme was proposed in [20]. The codes used in the cryptosystem are linear codes with sparse parity-check matrices of the form

H = (H_0 | H_1 | · · · | H_(n_0−1)),    (1)

where n_0 is a small integer, each block H_i, 0 ≤ i ≤ n_0 − 1, is a circulant matrix of size r × r, and H_(n_0−1) is invertible. For simplicity, we assume that n_0 = 2 throughout the paper, unless otherwise specified. Thus, we consider codes with rate R = 1/2, length n = 2r, and dimension k = r.

Soft McEliece [2] is a recent code-based McEliece PKC proposal using soft information. Instead of generating intentional errors from a Hamming ball, the authors generate noise according to a Gaussian distribution. In the key generation, as in the QC-MDPC scheme, they generate a sparse parity-check matrix H of the form (1) and use it as the secret key. The public key is the (dense) generator matrix G in systematic form corresponding to H. Given a message u ∈ F_2^k, let c = uG be the encoded codeword and ĉ the corresponding vector in R^n. The ciphertext is

r = ĉ + w,

where w = (w_1, w_2, . . . , w_n) and each w_i (1 ≤ i ≤ n) is AWGN. The generation of w is repeated until the number of bit errors in r reaches a certain threshold. The decryption, i.e. decoding the received vector, can be performed using an iterative soft LDPC/MDPC decoder that uses the secret H, see [2,20].
2.3. The Stern algorithm. The Stern algorithm finds a codeword of low Hamming weight w in the code described by G. Transform the generator matrix G to systematic form

G* = (I | Q | J),

where I is the k × k identity matrix, Q is a k × l matrix and J is a k × (n − k − l) matrix. Let φ(x) denote the value of a vector x in positions k + 1 to k + l, i.e., φ(x) = (x_(k+1), x_(k+2), · · · , x_(k+l)). The algorithm description is given in Algorithm 1.
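A minimal single-round sketch of the collision idea in Algorithm 1, written for the [7,4] Hamming code with G already in systematic form (the permutation and restart steps are omitted, and the parameter handling is our own simplification):

```python
from itertools import combinations

def stern_one_round(G, w, p, l):
    """One round of the Stern collision step on an already-systematic G = (I | Q | J).

    Looks for a codeword of Hamming weight w with weight p in each half of the
    information set and weight 0 in the l positions right after it."""
    k, n = len(G), len(G[0])

    def codeword(rows):
        c = [0] * n
        for i in rows:
            c = [a ^ b for a, b in zip(c, G[i])]
        return c

    def phi(c):                      # positions k+1 .. k+l (0-indexed: k .. k+l-1)
        return tuple(c[k:k + l])

    half = k // 2
    L1 = {}
    for rows in combinations(range(half), p):
        L1.setdefault(phi(codeword(rows)), []).append(rows)
    for rows2 in combinations(range(half, k), p):
        c2 = codeword(rows2)
        for rows1 in L1.get(phi(c2), []):      # collision: XOR is zero on l positions
            c = [a ^ b for a, b in zip(codeword(rows1), c2)]
            if sum(c) == w:
                return c
    return None

# [7,4] Hamming code in systematic form
G = [[1,0,0,0,0,1,1],
     [0,1,0,0,1,0,1],
     [0,0,1,0,1,1,0],
     [0,0,0,1,1,1,1]]
found = stern_one_round(G, w=3, p=1, l=1)   # a minimum-weight codeword
```

With p = 1 and l = 1, the collision between rows 2 and 4 yields a weight-3 codeword.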

3. A soft version of the Stern algorithm
We now present, as the main contribution, a version of the Stern algorithm that uses soft information.
3.1. A one-pass soft Stern algorithm. Receiving the vector r, one can obtain a binary vector by making bitwise hard decisions. We define sgn(r_i) = 0 if r_i ≥ 0 and sgn(r_i) = 1 otherwise. Assuming that c_i is uniformly distributed over F_2, according to Bayes' law, the conditional probability Pr[c_i = sgn(r_i) | r_i], denoted p_i, can be written as

p_i = 1 / (1 + e^(−|L_i|)).    (2)

Also, define the bias as τ_i = |p_i − 1/2|. The problem of recovering the message from a ciphertext is solved by finding a minimum-weight codeword in the linear code whose generator matrix is G extended with the additional row (sgn(r_1), sgn(r_2), . . . , sgn(r_n)).
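The quantities p_i and τ_i can be computed directly from the LLRs; the identity p_i = 1/(1 + e^(−|L_i|)) used below is a standard rewriting of the Bayes expression in terms of the LLR magnitude, under the convention L_i = 2r_i/σ²:

```python
import math

def hard_bit(r_i):
    """Bitwise hard decision: r_i >= 0 maps to bit 0 under BPSK."""
    return 0 if r_i >= 0 else 1

def p_correct(L_i):
    """p_i = Pr[c_i = sgn(r_i) | r_i] = 1 / (1 + exp(-|L_i|)) for LLR L_i."""
    return 1.0 / (1.0 + math.exp(-abs(L_i)))

def bias(L_i):
    """tau_i = |p_i - 1/2|; large bias means a reliable position."""
    return abs(p_correct(L_i) - 0.5)
```

An LLR of 0 carries no information (p_i = 1/2, zero bias), while |L_i| grows with reliability.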
This would, however, give a poor performance compared to what can be achieved when we use the soft information. Instead, we suggest to use the Stern algorithm as a basis and to modify the different steps to make use of the soft information in the best possible way. Initially, we consider only a single round in this algorithm, which will give a (small) probability q of success. In many (cryptographic) applications this is sufficient as one might repeat the decoding attempt many times and thus achieve an expected complexity which is a factor 1/q larger than the complexity of a single round. Later on, in Section 6.2, we indicate how to extend the algorithm to allow for multiple iterations.
The new algorithm can be divided into three steps in the following way.

3.1.1. Transformation. This step performs a column permutation and some transformations. Instead of selecting a random column permutation as in the original Stern algorithm, we consider only a single round and use a permutation that puts the most reliable positions in the first k + l columns. These columns will correspond to the information set and l additional positions. Firstly, all n coordinates r_i are sorted according to the absolute value of their LLR, and then we choose a set S containing the k + l most reliable coordinates. Denote the set containing the other positions by O. We use π to denote a permutation such that π(S) = {1, . . . , k + l}.
The second condition on π is that the first k columns of π(G) are independent, forming a basis. We then derive a systematic generator matrix G* from G by permuting the columns using π and performing Gaussian elimination, giving

G* = (I | Q | J),

where Q is a k × l matrix. The received vector r is permuted accordingly, giving the vector π(r). The first k positions now form an information set, denoted I.
We next perform a transformation to ensure that the reliability value for each variable in the information set is positive. We first determine the most likely value of the variables in the information set, denoted by m, where m_i = sgn(r_(π^(−1)(i))), for 1 ≤ i ≤ k. This m corresponds to the codeword c = mG*. Then the vector π(r) is transformed to the vector r′ = (r′_1, . . . , r′_n), where

r′_i = (−1)^(c_i) · π(r)_i, for 1 ≤ i ≤ n.    (3)

Since adding the fixed codeword c maps the code bijectively onto itself, the transformation has not changed the problem, but the first k positions now all have positive reliability, which eases the description in the continuation.
For the next step, we will consider the shortened code generated by (I | Q) and try to find a list of codeword candidates close to r′ in the first k + l positions. For the columns with indices in {k + 1, . . . , k + l}, corresponding to the matrix Q in G*, we determine a syndrome s by s_i = sgn(r′_(k+i)), for 1 ≤ i ≤ l.
Codewords of the shortened code are vectors c^(s) such that c^(s)H′ = 0, where H′ is the (k + l) × l matrix obtained by stacking Q on top of the identity matrix I_l. As we change the signs of positions k + 1, . . . , k + l to be all positive when we introduce the syndrome, our problem is finding the most probable low-weight vectors z such that zH′ = s, assuming that the reliability values in positions 1, . . . , k + l are all positive, i.e., assuming r′_i ≥ 0, for 1 ≤ i ≤ k + l. We next partition the set π(S) = {1, . . . , k + l} into two disjoint equal-sized parts, S_1 and S_2, such that the products ∏_(i∈S_1) p_i and ∏_(i∈S_2) p_i are approximately equal, with p_i as defined in (2). For simplicity, we assume that S_1 = {1, . . . , (k + l)/2} and S_2 = {(k + l)/2 + 1, . . . , k + l}. In the algorithm, this is yet another condition to consider when selecting π. In the original Stern algorithm the choice of indices for the two sets does not influence the performance, but in the soft case it does, and this is the reason for the above condition.
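A quick sanity check that a codeword (u, uQ) of the shortened code vanishes under the stacked matrix H′ (the 3 × 2 matrix Q below is an arbitrary example of ours):

```python
def mul(v, M):
    """Binary vector-matrix product v M over F_2."""
    cols = len(M[0])
    return [sum(v[i] & M[i][j] for i in range(len(v))) % 2 for j in range(cols)]

def stacked_parity(Q, l):
    """H' = (Q stacked on I_l): a (k+l) x l binary matrix with c H' = 0
    for every codeword c = (u, uQ) of the shortened code."""
    H = [row[:] for row in Q]
    H += [[1 if i == j else 0 for j in range(l)] for i in range(l)]
    return H

# illustrative k = 3, l = 2 example
Q = [[1, 0], [1, 1], [0, 1]]
H = stacked_parity(Q, 2)
u = [1, 0, 1]
c = u + mul(u, Q)          # codeword of the shortened code
```

Over F_2, cH′ = uQ + uQ = 0, which the product below confirms.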
3.1.2. Creating bit patterns and partial syndromes. We now create the most probable low-weight error words z^(1) having nonzero values only in S_1. We store the corresponding partial syndrome for the code with transposed parity-check matrix H′, created as (z^(1), 0)H′. As all reliability values are positive, the zero word is the most likely one, followed by the different vectors of weight one, etc. Let z^(1) run through the T length-(k + l)/2 binary vectors with the largest probability ∏_(i∈S_1) Pr[c_i | r′_i]. We build a table L_1 storing each selected z^(1) together with the vector (z^(1), 0)H′. The table L_1 is sorted according to this partial syndrome.

Algorithm 2 The soft Stern algorithm

Input: Generator matrix G, received vector r, parameters T = 2^l, δ

Step 1:
(1a) Choose a column permutation π such that 1) the first k + l positions in π(G) have the k + l largest |r_i|'s, 2) the first k columns are independent, and 3) the products ∏_(i∈S_1) p_i and ∏_(i∈S_2) p_i are approximately equal.
(1b) Make the permuted generator matrix π(G) systematic. Permute and transform the received sequence r to make the reliability value of each coordinate in positions 1, 2, . . . , k positive, following (3), giving r′.
(1c) Calculate the corresponding partial syndrome s and change the sign of any negative values of r′_(k+1), . . . , r′_(k+l).

Step 2: Construct a list L_1 storing the most probable vectors z^(1) and the corresponding partial syndromes (z^(1), 0)H′. Then construct another list L_2 storing the most probable vectors z^(2) and the corresponding partial syndromes s + (0, z^(2))H′.

Step 3: Sort the two lists according to their partial syndromes and search for collisions. For each colliding pair (z^(1), 0)H′ and s + (0, z^(2))H′, create a new vector u from the first k entries of (z^(1), z^(2)) and compute the corresponding ĉ = (ĉ_1, . . . , ĉ_n), with ĉ_i = (−1)^(c_i), where c = uG*. If D(ĉ, r′) ≤ δ, invert the transformations to get the codeword close to the original r and return it. If no c with D(ĉ, r′) ≤ δ is found, return failure.
We now repeat the same thing but for the subset S 2 , creating another table L 2 in a similar manner. In this case we run through the most probable vectors z (2) with nonzero positions only in S 2 . Each entry in the table consists of z (2) together with the partial syndrome s + (0, z (2) )H sorted according to the latter. Note that we add s in this case.
3.1.3. Colliding partial syndromes. Next, we search for partial syndrome collisions between the tables L_1 and L_2. On average we obtain T²/2^l colliding vectors. Later we assume that we choose T ≈ 2^l to minimize the time complexity.
For each collision, we add the corresponding vectors (z^(1), 0) and (0, z^(2)), and create a new vector u by choosing its first k entries. Then we get a candidate codeword uG*. As a final step, we check the Euclidean distance between each candidate codeword and the received vector r′. If it is sufficiently small we return it as the desired closest codeword.
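The collision search over the two sorted lists can be sketched as a linear merge (illustrative; the strings stand in for the z vectors, and the syndromes are arbitrary examples):

```python
def collide(L1, L2):
    """Find all pairs with equal partial syndromes in two lists sorted by syndrome.

    Each list entry is (syndrome_tuple, vector)."""
    out, i, j = [], 0, 0
    while i < len(L1) and j < len(L2):
        s1, s2 = L1[i][0], L2[j][0]
        if s1 < s2:
            i += 1
        elif s1 > s2:
            j += 1
        else:
            # a run of equal syndromes on both sides: emit all cross pairs
            i2 = i
            while i2 < len(L1) and L1[i2][0] == s1:
                i2 += 1
            j2 = j
            while j2 < len(L2) and L2[j2][0] == s1:
                j2 += 1
            out += [(v1, v2) for _, v1 in L1[i:i2] for _, v2 in L2[j:j2]]
            i, j = i2, j2
    return out

L1 = sorted([((0, 1), 'a'), ((1, 0), 'b')])
L2 = sorted([((1, 0), 'c'), ((1, 1), 'd')])
pairs = collide(L1, L2)
```

With T entries per list and l syndrome bits, this merge produces the roughly T²/2^l collisions mentioned above in time linear in the list sizes plus the output size.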

3.2. How to create the most probable vectors. In this part, we explain how to create the T most probable vectors required in Step 2 of the previous description.
Since the reliability values are all transformed to be positive, for the partitions S_m, m = 1, 2, the most probable pattern is the all-zero vector 0, with probability P_m = ∏_(i∈S_m) p_i. The probability of a pattern with ones in the positions in J and zeros elsewhere is exp(−Σ_(j∈J) L_j) · P_m.
Let I_(i_1,i_2,...,i_k), where i_1 < i_2 < · · · < i_k, denote the bit pattern with the value 1 in positions i_1, i_2, . . . , i_k, and the value zero in the other positions. For such a bit pattern, the function value is f(I) = Σ_(j∈I) L_j, where I = {i_1, i_2, . . . , i_k}. Now, let R_i denote the set of bit patterns with a 1 in position i and zeros in all positions after i. Sort the elements in R_i by their function values in increasing order to form the list R_i. Given a pattern I ∈ R_i, by the successor of I we mean the next pattern in R_i.
To solve our original problem we use a binary tree, where each node represents one of the lists R 2 , R 3 , . . . , R (k+l)/2 . Initially, let the nodes store the top element in each list R i , being the patterns I 2 , I 3 , . . . , I (k+l)/2 , respectively. Also, let each node store an index value 0. The root of the tree will have the pattern with the smallest function value, which initially is I 2 . Each parent node in the tree has a smaller function value than its child nodes.
Let A denote a list of the bit patterns we have found so far, together with their corresponding function values. Initialize this list with the all-zero pattern at index 0 and the pattern with a 1 in the first position at index 1.
In each step of our algorithm we add the pattern of the root node to A. Assume the root node has the pattern I i1,i2,...,im,i . Then we replace the node label by the next pattern in the list R i . This is found by starting in A at the index of the pattern I i1,i2,...,im and finding the next pattern I j1,j2,...,jn,i in A such that j n < i. If no such pattern exists, we have used all patterns in R i and we can delete the node from the tree. Otherwise, we replace the node label by the pattern I j1,j2,...,jn,i and we also store the index in A of the pattern I j1,j2,...,jn . In either case, we end by rearranging the tree such that each parent node has a smaller function value than its child nodes.
The most expensive part of the algorithm is rearranging the tree. This requires at most log_2((k + l)/2) function comparisons. If we store the function value for each pattern in A, calculating the function value for a new pattern in the tree only requires a single addition.
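The tree-based enumeration above is a best-first search; a compact functional equivalent using a binary heap instead of the explicit tree (an alternative data structure, not the authors' exact one) looks as follows. It assumes all L_j > 0 and L sorted in increasing order:

```python
import heapq

def most_probable_patterns(L, T):
    """Enumerate the T index sets J with smallest f(J) = sum_{j in J} L[j].

    Assumes all L[j] > 0 and L sorted increasingly; a smaller f(J) means a
    more probable error pattern. Returns (f(J), J) pairs in order."""
    out = [(0.0, [])]                    # the all-zero pattern is most probable
    if not L:
        return out[:T]
    heap = [(L[0], [0])]                 # patterns keyed by their last index
    while heap and len(out) < T:
        f, J = heapq.heappop(heap)
        out.append((f, J))
        last = J[-1]
        if last + 1 < len(L):
            # successor 1: extend the pattern by the next position
            heapq.heappush(heap, (f + L[last + 1], J + [last + 1]))
            # successor 2: slide the last position one step to the right
            heapq.heappush(heap, (f - L[last] + L[last + 1], J[:-1] + [last + 1]))
    return out[:T]

pats = most_probable_patterns([1, 2, 4, 8], 6)
```

Each index set is generated exactly once, and the heap never holds more than 2T entries, matching the logarithmic per-pattern cost discussed above.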

3.2.1. Example of how to find the most probable vectors. An example of how to find the most probable bit patterns is illustrated in Figure 1, where we work with vectors of length 8. For the sake of clarity we work with the whole bit patterns, but storing only the indices of the positions with the value 1 is of course more efficient. At the beginning the list A = [(00000000, 0), (10000000, 0.1622)]. In each step we add the root node and its corresponding function value to A. We use the index of the root node to determine where in A we start looking for a new bit pattern. When we have found the next pattern we modify the root node and rearrange the tree. Notice that the bit pattern 11000000 does not have a successor. Therefore, after adding it to A, the corresponding node is deleted from the tree. The bit patterns in A give us the 7 most probable bit patterns. By looking at the root node we see that pattern number 8 is 01100000.

4. A decoding example
This section contains a complete example of how a message is encoded, how Gaussian noise is added, how the errors are corrected using the proposed soft Stern algorithm, and finally how the original message is recovered. The extended binary Golay code is a linear, systematic, error-correcting code with parameters (n, k, d_min) = (24, 12, 8) and generator matrix G.

Let us use l = 4 and T = 2^l = 2^4 = 16. After performing a permutation of the positions such that the first k + l are the most reliable, such that the first k columns of the generator matrix are linearly independent, and such that the first k + l positions are split into two parts with approximately equal products of p_i values, we obtain an r* vector and a corresponding systematic generator matrix G*.

Encoding the message m, where each of the k positions of m is 1 if the corresponding position in r* is positive and 0 otherwise, and then changing the sign of each position in r* where the corresponding position in the encoded vector mG* equals 1, results in a vector r′.

Using the first half of the LLR values to create the T most probable vectors of the form (z_1, 0) and their corresponding syndromes (z_1, 0)H′, we get a first list of vectors and syndromes. Using the last half of the LLR values to create the T most probable vectors of the form (0, z_2) and their corresponding syndromes (0, z_2)H′ + s, we get a second list. Colliding these vectors gives a list of possible candidates for a solution, where the first half of each row corresponds to z_1 and the second half corresponds to z_2.

For each candidate we invert the permutation corresponding to the sorting of the LLR values. Then we pick the first k bits and create the message u_0. We then encode the message using G* to c_0 = u_0G* and transform each 1 to −1 and each 0 to 1, creating ĉ_0. Then we calculate the Euclidean distance between ĉ_0 and r′.
The vector c_0 that will lead us to the original message is probably the one with the smallest Euclidean distance. In this example the smallest Euclidean distance corresponds to candidate number 4.
Inverting the sorting step and picking the first k bits gives us back the original message.

5. Complexity analysis and simulations
A suitable complexity measure is given by C_one-pass/Pr[A], where C_one-pass is the complexity of one pass of the algorithm and A represents the event that, after the permutation and the transformation, the actual error pattern in the first (k + l) positions is the sum of two vectors, one from each of the two lists (i.e., that the soft Stern algorithm will find the correct message).
When estimating the complexity of matrix operations, we note that we can implement the vector-matrix multiplication vM incrementally by adding one row of M to an already computed and stored product ṽM, where supp(ṽ) ⊂ supp(v) and d_H(ṽ, v) = 1. Thus, C_one-pass measured in simple bit-oriented operations is roughly given by

C_Gauss + 4T · (n − k) + C_create,

where C_Gauss is the complexity of Gaussian elimination, usually 0.5nk², and C_create is the complexity of creating the most probable vectors. From Section 3.2 we have that C_create = 2T log_2((k + l)/2). Notice that the cost of creating the lists is low compared to calculating the partial syndromes and colliding them.
The probability Pr[A] is the product of Q_l and the probability that the actual error pattern, restricted to S_1 and S_2, is covered by the stored candidate sets. Here Q_l is the probability that a uniformly random binary k × (k + l) matrix has full rank. Also, P^(i) is the set containing the T index sets corresponding to the T different vectors in L_i, for i = 1, 2. For the sizes of k used in this paper, with very good precision we have

Q_l ≈ ∏_(i=l+1)^∞ (1 − 2^(−i)).

Here Q_0 ≈ 0.2888, and for each new column that is added the probability of not getting a matrix with full rank is roughly halved. Thus the probability of getting a full-rank matrix increases quickly with l. The next subsection estimates the remaining factors of the probability Pr[A].
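The approximation for Q_l is easy to evaluate numerically (a sketch under the stated large-k approximation; the truncation at 200 product terms is our own choice):

```python
def Q(l, terms=200):
    """Approximate prob. that a random binary k x (k+l) matrix has full rank k,
    in the large-k limit: Q_l ~ prod_{i=l+1}^{inf} (1 - 2^{-i})."""
    p = 1.0
    for i in range(l + 1, l + 1 + terms):
        p *= 1.0 - 2.0 ** (-i)
    return p

q0 = Q(0)   # approximately 0.2888
```

The values confirm both claims in the text: Q_0 ≈ 0.2888, and 1 − Q_l roughly halves with each additional column for moderate l.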

5.1. Estimating and simulating Pr[A]. As Pr[A] depends on the received vector r, it appears to be quite complicated to provide a useful explicit expression or approximation for E(Pr[A]), where the expectation is over r. We choose instead to provide thorough simulation results to illustrate how Pr[A] compares to previous algorithms. In our comparison, Pr[A] directly translates to the success probability of the algorithm in question. We have simulated the following algorithms:

• The soft Stern algorithm as proposed in this paper.
• Ordered Statistics Decoding (OSD). As explained in, for example, [15,27,30], we select the k most reliable and independent positions to form the most reliable basis (MRB). The error patterns in the list are chosen according to their reliability.
• The Box-and-Match approach [28]. The essence of this algorithm is Stern-type, choosing the operated information set from the most reliable positions (i.e., an extension of MRB). However, it ignores the bit reliability when building the two colliding lists. For ease of comparison, we estimate the performance of its variant similar to the newly proposed algorithm, but without choosing the error patterns in the colliding lists according to their reliability.
• A hard-decision Stern algorithm. This is a simple approach where we first make a hard decision in each position and then apply the original Stern algorithm. Each position of the received vector is X_i ∼ N(1, σ) if zero is sent, and the hard decision gives a bit error if X_i < 0. The bit error probability is thus p = (1/2)·erfc(1/(√2σ)), and the probability of t errors is (n choose t) p^t (1 − p)^(n−t). The simulation results show that, for the simulated parameter settings, this algorithm performs much worse than its three counterparts. We have therefore removed it from the plots for readability.

For simplicity of analysis we compare single-iteration versions of the algorithms. Techniques for taking advantage of the soft information in multiple iterations are discussed briefly in Section 6.2, and can be applied to any of the algorithms. For a fair comparison, we assume that the complexity of one round is approximately C_Gauss + C · (n − k)T, where C is a small constant. Thus, we assume that for every algorithm the size of one list is limited to T = 2^l (to 2T for OSD, since only one list is built in this algorithm). The comparison of E(Pr[A]) for various k, σ, T is shown in the figures below. In all figures we let n = 2k, i.e., a code rate of 1/2, and we ignore the Q_l factor.
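The hard-decision error model in the last bullet is easy to check numerically (σ = 0.8 and the trial count below are arbitrary test values):

```python
import math
import random

def ber(sigma):
    """Hard-decision bit error probability for BPSK on the AWGN channel:
    p = Pr[N(1, sigma) < 0] = 0.5 * erfc(1 / (sqrt(2) * sigma))."""
    return 0.5 * math.erfc(1.0 / (math.sqrt(2.0) * sigma))

def ber_monte_carlo(sigma, trials=200000, seed=7):
    """Estimate the same probability by sampling the channel directly."""
    rng = random.Random(seed)
    errs = sum(1 for _ in range(trials) if rng.gauss(1.0, sigma) < 0.0)
    return errs / trials

def prob_t_errors(n, t, p):
    """Binomial: (n choose t) p^t (1-p)^(n-t)."""
    return math.comb(n, t) * p ** t * (1 - p) ** (n - t)

p = ber(0.8)
p_mc = ber_monte_carlo(0.8)
```

The analytic and simulated values agree to within Monte Carlo noise, and the binomial terms sum to 1 as expected.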
We look at two different scenarios: one with parameters applicable to a cryptographic scenario and one with parameters applicable to a coding-theoretic scenario.
We have implemented the algorithm in Sage [24]. The implementation covers the algorithm as described in Section 3. It was used to create the example in Section 4 and for simulating the success probability in this section. The source code of the implementation can be obtained upon request.

5.1.1. Cryptographic scenario. In cryptographic applications of general soft decoding algorithms it is not uncommon to see a very small, but still non-negligible, success probability Pr[A], and a large value of T is typically used. To compare the performance of the algorithms in a cryptographic scenario we use large σ values. We let σ vary between 0.65 and 1 (in the latter case the capacity of the channel and the code rate are equal), and we let T = 2^l = 2^20. In Figure 2, we plot the logarithm of the success probability as a function of σ for k = 256 and k = 1024. In both cases our soft Stern algorithm performs much better than the other algorithms. Notice that the scale on the y-axis is not the same in the two plots.

5.1.2. Coding scenario. In a coding scenario it is crucial that the word error probability 1 − Pr[A] is small. The acceptable value of T is smaller than in the cryptographic setting. To compare the algorithms we look at their probability of failing for small σ values. We vary σ between 0.4 and 0.65 and let T = 2^l = 2^10. In Figure 3, we plot the failure probability as a function of σ for k = 128 and k = 512. Again, in both cases our soft Stern algorithm outperforms the other algorithms, and again the scale on the y-axis is not the same in the two plots.
6. Generalizations

6.1. Soft output. The algorithm can easily be modified to allow for soft output. The algorithm above outputs either the codeword ĉ closest to the received vector r, or the first vector that is within some distance δ of r. Instead, we can keep a number of the vectors c_i closest to r. Based on the probabilities of each of the corresponding bit patterns, we can then calculate the weighted average for each position. The algorithm can then output soft information.

6.2. Multiple iterations. If we are unsuccessful in our one-pass algorithm, we may want to allow for one or more further iterations to increase the success probability. We then suggest swapping one column from S with one column from O. We want to take advantage of the reliability values, while still having a degree of randomness in the swapping. The technique we suggest is the approach experimentally found to be optimal in [22]. Here the probability of swapping out i ∈ S is proportional to the probability that the corresponding position is wrongfully classified, that is,

(1 − p_i) / Σ_(i′∈S) (1 − p_(i′)),

where p_i is the conditional probability of a correct bitwise hard decision, as defined in (2). The probability of swapping in j ∈ O is proportional to the squared bias of j, that is,

τ_j² / Σ_(k∈O) τ_k²,

where τ_j is the respective bias, i.e., τ_j = |p_j − 1/2|. The complexity can be analysed by employing a Markov-chain model, as was done in [7,8].
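The swapping distributions of Section 6.2 can be sketched as follows (the numeric inputs in the test are illustrative):

```python
import random

def swap_probabilities(p_in, tau_out):
    """Selection distributions for one re-randomization step.

    p_in[i]: prob. of a correct hard decision for position i in S;
    swap-out prob. is proportional to (1 - p_in[i]).
    tau_out[j]: bias of position j in O; swap-in prob. proportional to tau_out[j]**2."""
    w_out = [1.0 - p for p in p_in]
    z = sum(w_out)
    out_dist = [w / z for w in w_out]
    w_in = [t * t for t in tau_out]
    z = sum(w_in)
    in_dist = [w / z for w in w_in]
    return out_dist, in_dist

def pick_swap(p_in, tau_out, rng=random):
    """Sample one position to leave S and one to enter it."""
    out_dist, in_dist = swap_probabilities(p_in, tau_out)
    i = rng.choices(range(len(p_in)), weights=out_dist)[0]
    j = rng.choices(range(len(tau_out)), weights=in_dist)[0]
    return i, j
```

Unreliable positions in S are thus swapped out preferentially, while strongly biased positions in O are swapped in preferentially, keeping some randomness in both choices.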
7. Applications

7.1. Breaking soft McEliece. In [2], using soft information to enhance the performance of an attacking algorithm is discussed, but no attacks below the claimed security level are presented. We show that soft McEliece can be broken by a trivial variant of Algorithm 2. The adversary employs a simplistic version, keeping one element in each list: she chooses l = 0 and the considered error pattern is 0 in the k most reliable positions.
The attack can also be described as follows. The adversary chooses the k most reliable indices to form an information set I, makes a bit-wise hard decision sgn(·), and calculates the message u via Gaussian elimination. She then tests whether this is a valid message. Otherwise, the adversary selects another ciphertext and tries again (if a single ciphertext can be broken, the scheme is considered insecure). For one pass, the attack succeeds if (i) the sub-matrix corresponding to this information set is invertible and (ii) there are no errors among these positions. For the proposed parameters, the latter probability is 0.98 if 80-bit security is targeted, and the expected complexity for recovering one message is about 3.5 Gaussian eliminations.
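The 3.5 figure is consistent with combining the two success conditions, taking Q_0 ≈ 0.2888 as the asymptotic probability that a random binary k × k matrix is invertible (this derivation is our reading of the text, sketched below):

```python
# Expected number of Gaussian eliminations for the simplistic attack:
# 1 / (Pr[information set error-free] * Pr[sub-matrix invertible]).
P_ERROR_FREE = 0.98   # value reported for the 80-bit security parameters
Q0 = 0.2888           # asymptotic invertibility probability (Section 5)
expected_work = 1.0 / (P_ERROR_FREE * Q0)   # roughly 3.5
```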
We give some intuition for why this basic strategy solves the decoding problem in soft McEliece for the proposed security parameters. In [2], one key security argument is that the total number of bit errors in one ciphertext follows a modified binomial distribution, which gives at least $\frac{n}{2}\operatorname{erfc}\!\left(\frac{1}{\sqrt{2}\sigma}\right)$ bit errors on average. However, among the most reliable coordinates the number of bit errors is very small. The expected number of bit errors among the $\frac{n}{2}$ most reliable bits is only 0.022 (respectively 0.015) using the parameters for 80-bit (respectively 128-bit) security. Most of the error positions are among the least reliable ones.

7.1.1. Moving to a higher noise level. Though this simplistic attack works well for soft McEliece, Algorithm 2 performs much better as the size of the targeted instance increases. One should therefore employ the full algorithm when attacking cryptosystems with a reasonable security level.
A higher noise level increases the decryption (decoding) error probability. If $(n, \sigma) = (7202, 0.66)$, for instance, the 3601 most reliable bits are error-free only with probability $2^{-13.0}$. Hence, on average about 29,000 Gaussian eliminations are required by the simplistic attack. By using the soft Stern algorithm, setting $l = 23$ and choosing a suitable $\delta$ in Algorithm 2, we can reduce the expected complexity to around 23 Gaussian eliminations.
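The error-count figures above are easy to reproduce numerically. The sketch below evaluates the closed-form expected total error count $\frac{n}{2}\operatorname{erfc}(\frac{1}{\sqrt{2}\sigma})$ and Monte-Carlo-estimates the average number of errors among the $n/2$ most reliable positions, for the $(n, \sigma) = (7202, 0.66)$ instance; the all-zero codeword and the $0 \mapsto +1$ BPSK mapping are assumed for the simulation.

```python
import math
import random

def expected_total_errors(n, sigma):
    """Expected number of bit errors per ciphertext for BPSK over AWGN:
    (n/2) * erfc(1 / (sqrt(2) * sigma))."""
    return (n / 2) * math.erfc(1 / (math.sqrt(2) * sigma))

def errors_in_reliable_half(n, sigma, trials=200, seed=1):
    """Monte Carlo estimate of the average number of errors among the
    n/2 most reliable received positions (all-zero codeword assumed)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        r = [1.0 + rng.gauss(0.0, sigma) for _ in range(n)]
        idx = sorted(range(n), key=lambda i: -abs(r[i]))[: n // 2]
        total += sum(1 for i in idx if r[i] < 0)  # hard-decision errors
    return total / trials
```

For $(n, \sigma) = (7202, 0.66)$, the total expected error count is in the hundreds, while the errors among the most reliable half average only a handful per ciphertext, consistent with the $2^{-13.0}$ error-free probability quoted above.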

7.2. Applications in side-channel attacks. Transforming certain problems in side-channel analysis into the problem of decoding random linear codes originates in [5]. In this context, although the noise distribution is not exactly Gaussian, soft information can still be exploited, making Algorithm 2 more efficient than other ISD variants. In the side-channel attacks of [21,22], the following modified version of the LPN problem occurs. Here, $\mathrm{Ber}(p)$ denotes a random variable that takes the value 1 with probability $p$ and 0 with probability $1-p$, and $\langle \cdot, \cdot \rangle$ denotes the binary inner product of two vectors.
Definition 3 (Learning Parity with Variable Noise (LPVN) [22]). Let $s \in \mathbb{F}_2^k$ and let $\psi$ be a probability distribution over $[0, 0.5]$. Given $n$ uniformly sampled vectors $a_i \in \mathbb{F}_2^k$, $n$ error probabilities $\epsilon_i$ sampled from $\psi$, and noisy observations $b_i = \langle a_i, s \rangle + e_i = c_i + e_i$, where $e_i$ is sampled from $\mathrm{Ber}(\epsilon_i)$, find $s$.
They solve the problem by translating it into decoding a random linear code with soft information. They apply Stern's algorithm, but they do not sort the error patterns based on their probability of occurring. In this case the noise is not Gaussian, but with some minor modifications our algorithm can still be applied. We sort the positions based on the $\epsilon_i$ values; the smaller $\epsilon_i$ is, the more reliable the position is. After having done the transformations, such that the all-zero vector is the most probable vector in an index set $S$, the probability of a bit pattern with ones in the positions of $J \subset S$ and zeros in the other positions is
$$\prod_{j \in J} \epsilon_j \prod_{j \in S \setminus J} (1 - \epsilon_j).$$
With some minor adjustments, the method for finding the most probable bit patterns, described in Section 3.2, can now be used.
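The pattern probability above, and the ordering of error patterns it induces, can be illustrated with a short brute-force sketch. The names here are ours, and the exhaustive enumeration is only a stand-in for the efficient ordered enumeration of Section 3.2.

```python
from itertools import combinations

def pattern_prob(J, eps):
    """Probability of the bit pattern with ones exactly in J, given
    per-position error probabilities eps (all < 0.5 after the
    transformation that makes the all-zero pattern most probable)."""
    p = 1.0
    for j in range(len(eps)):
        p *= eps[j] if j in J else 1.0 - eps[j]
    return p

def most_probable_patterns(eps, max_weight, top):
    """Enumerate all patterns of weight at most max_weight over the index
    set and return the `top` most probable ones (brute force)."""
    S = range(len(eps))
    pats = [frozenset(J) for w in range(max_weight + 1)
            for J in combinations(S, w)]
    return sorted(pats, key=lambda J: -pattern_prob(J, eps))[:top]
```

Note that with all $\epsilon_j < 0.5$ the empty pattern is always ranked first, and a single error in the least reliable position (largest $\epsilon_j$) comes next, matching the intuition behind the sorting step.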
7.3. Hybrid decoding. Another problem suited for our algorithm can be found in [1], where the problem of decoding linear codes with soft information appears. The authors analyze two codes proposed for space telecommanding, both LDPC codes, with $(n, k) = (128, 64)$ and $(n, k) = (512, 256)$, respectively. A hybrid approach to decoding is used: first an efficient iterative decoder is applied, and in the few cases where it fails, a decoder based on ordered statistics is used, which drastically reduces the risk of decoding failure. However, the proposed ordered statistics algorithm does not make use of a Stern-type speed-up. It orders the positions by decreasing reliability and then tries different error patterns in the $k$ most reliable positions. Using our soft Stern algorithm, we instead divide the $k + l$ most reliable positions into two sets and look for collisions between the partial syndromes of the bit error patterns in the two sets. Adding such a Stern-type modification would greatly improve the ordered statistics decoder.

7.4. Product codes. An application of the soft Stern algorithm with soft output is the decoding of product codes. Consider the serial concatenation of two codes that do not have an efficient soft-output decoder; a small example would be the Golay code. An iterative decoder for such a product code can be constructed by using the soft Stern algorithm with soft output together with a message-passing network (Tanner graph) connecting the code symbols of the product code. Investigating this idea in more detail is an interesting research direction.
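The Stern-type collision step proposed for the ordered statistics decoder in Section 7.3 can be sketched as follows. This is an illustrative brute-force version: the function name, the representation of $H$ as a list of rows, and the per-set weight limits are our own choices for the sketch.

```python
from itertools import combinations

def stern_collisions(H, set1, set2, s, w1, w2):
    """Find pairs of error patterns (weight <= w1 on set1, weight <= w2 on
    set2) whose partial syndromes sum to the target syndrome s over GF(2).
    H is the parity-check matrix as a list of 0/1 rows; set1 and set2
    partition the k + l most reliable positions."""
    def syndrome(positions):
        return tuple(sum(row[j] for j in positions) % 2 for row in H)

    # Hash the partial syndromes of all patterns on set1 ...
    table = {}
    for w in range(w1 + 1):
        for J in combinations(set1, w):
            table.setdefault(syndrome(J), []).append(J)
    # ... then probe with patterns on set2, looking for a collision with s.
    matches = []
    for w in range(w2 + 1):
        for K in combinations(set2, w):
            need = tuple(a ^ b for a, b in zip(s, syndrome(K)))
            for J in table.get(need, []):
                matches.append((J, K))
    return matches
```

The hash table replaces the exhaustive trial of error patterns over all $k$ positions by a meet-in-the-middle search over the two halves, which is the source of the speed-up.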

Conclusions
We have presented a new information set decoding algorithm using bit reliability, called the soft Stern algorithm. The algorithm outperforms what has been previously suggested for decoding general codes on the AWGN channel and similar tasks.
It can be utilized for a very efficient message-recovery attack on a recently proposed McEliece-type PKC named Soft McEliece [2], for an improved hybrid approach to decoding LDPC codes as in [1], and for side-channel attacks as in [21,22]. We have also mentioned its use for decoding product codes.
Some modifications, such as running multiple iterations of the algorithm and producing soft output values, were discussed but not explicitly analyzed. Future work may include further analysis of the algorithm's use in iterative decoding, as well as deriving the exact algorithmic steps for the multiple-iteration case.