Power decoding Reed-Solomon codes up to the Johnson radius

Power decoding, or "decoding using virtual interleaving", is a technique for decoding Reed–Solomon codes up to the Sudan radius. Since the method's inception, it has been an open question whether it is possible to use this approach to decode up to the Johnson radius – the decoding radius of the Guruswami–Sudan algorithm. In this paper we show that this can be done by incorporating a notion of multiplicities. Like the original Power decoding, the proposed algorithm is a one-pass algorithm: decoding follows immediately from solving a shift-register type equation, which we show can be done in quasi-linear time. It is a "partial bounded-distance decoding algorithm" since it will fail to return a codeword for a few error patterns within its decoding radius; we investigate its failure behaviour theoretically as well as give simulation results. This is an extended version where we also show how the method can be made practically faster using a re-encoding or a syndrome formulation.

1. Introduction. Power decoding was originally proposed by Schmidt, Sidorenko and Bossert for low-rate Reed–Solomon (RS) codes [41]. Using shift-register synthesis techniques, the method can decode as many errors as the Sudan algorithm [47]. As opposed to Sudan's list decoder, Power decoding is a one-pass algorithm where decoding is realised by solving a simultaneous shift-register problem; however, Power decoding always returns at most one codeword and will for a few error patterns simply fail. Simulations indicate that this occurs very rarely for random errors. The Sudan decoder generalises to the Guruswami–Sudan decoder [17] by introducing the multiplicity parameter, improving the decoding radius for all rates up to the Johnson radius [18]. Since [41], it has been an open question whether it is likewise possible to introduce a "multiplicity parameter" into Power decoding and thereby increase the decoding radius up to the Johnson radius.
We settle this question in the affirmative. The overall behaviour of the obtained decoder is similar to Power decoding: the equations are of a generalised shift-register type, and no root-finding as in Guruswami–Sudan is necessary.
1.1. Related Work. Power decoding was introduced in [41, 42]: for low-rate RS codes, it was shown how one can compute generalised syndromes from "powering" the received word, and that these can be used for efficient decoding by solving a multi-sequence shift-register synthesis problem. One chooses a "powering degree" ℓ: higher ℓ yields a better decoding radius, but is admissible only for lower-rate codes.
In [42], a bound on the failure probability was given for RS codes over binary extension fields when ℓ = 2, and a general conjecture was given based on simulation results. The failure behaviour was then further examined in [54] and [32], where bounds on the failure probability were obtained over any field for ℓ = 2 and ℓ = 3.
In [32], a reformulation of Power decoding was given based on Gao's decoder [14], and this was used to show that whether or not Power decoding fails depends only on the error pattern, and not on the sent codeword. The Guruswami–Sudan algorithm [17] is a polynomial-time list-decoding algorithm up to the Johnson radius J_{n,k} = n − √(n(k − 1)) [18]. "List-decoding" means that the algorithm will return all codewords within the decoding radius. For the algorithm one chooses two parameters s, ℓ ∈ Z+, usually dubbed "the multiplicity" respectively "the list size". They satisfy s ≤ ℓ, and they need to grow large for attaining the best decoding radius: for a decoding radius of J_{n,k} − εn, one needs s, ℓ ∈ O(1/ε) for any ε ∈ R+. See [31, p. 58] for an extreme numerical example with ε ≈ 1/n².
As noted already in [42], Power decoding is related to Guruswami–Sudan with s = 1 (also known as "Sudan decoding" after [47]): choosing the same value for ℓ yields (almost) exactly the same decoding radius. Computationally, there are more similarities, as noted below.
One approach for fast interpolation in Guruswami–Sudan has been to formulate "interpolation key equations", as in [40] for the case s = 1, and [53] for the general case. These are shift-register-type equations whose solutions result in an interpolation polynomial. They are related to Power decoding: the generalised syndromes in [40] equal those of the original Power decoding [41]. However, the two sets of key equations are inherently different: the solution to the Power decoding equations yields the error locator, while no clear notion of an error locator is known for the Guruswami–Sudan algorithm. Similarly, the key equations that we derive in Section 3 bear a resemblance to the equations of [53], and it is an interesting question what the algebraic relation between the two approaches is.
The Wu decoding algorithm [52] is an amalgamation of classical key equation decoding [8] and the Guruswami–Sudan algorithm: one first attempts half-the-minimum-distance decoding using the classical key equation (see the following section). If this fails, the polynomials computed in the failed attempt are then used to set up a problem solvable by an F(x)-variant of the Guruswami–Sudan algorithm. One again needs interpolation and root-finding sub-algorithms which are similar to, but slightly more involved than, those for Guruswami–Sudan; see e.g. [7, 10, 49] for work on these. The best complexities for these steps equal those of the Guruswami–Sudan algorithm [7, 10]. However, from a practical perspective, the Wu algorithm is slightly more complicated to implement. The Wu algorithm is also a list-decoding algorithm, and also decodes up to J_{n,k}. Also here one needs to choose parameters s, ℓ, whose growth relates to the decoding radius as in the Guruswami–Sudan algorithm [7].
1.2. Organisation. In Section 2 we give an introduction to the previous key equation-based decoding algorithms: half-the-minimum-distance decoding and Power decoding. In Section 3, we then derive the new key equations: non-linear relations between known polynomials, revealing the error. We derive a decoding radius in Section 4, and relate it directly to that of the Guruswami–Sudan algorithm. Power decoding will fail on certain error patterns within this radius, however, and we investigate this in Section 5. In Section 6 we give simulation results. In Section 7 we show how to efficiently solve the key equations. In Section 8 and Section 9 we investigate re-encoding respectively syndrome reformulations of the proposed key equations, providing practical – if not asymptotic – speedups to the decoder.
The decoding method has been implemented in Sage v8.0 [45] and can be downloaded from http://jsrn.dk/code-for-articles, together with the code for running the simulation.

2. Preliminaries and Existing Key Equations.
In complexity discussions, we count arithmetic operations in the field F. We will use ω as the exponent for matrix multiplication, i.e. 2 ≤ ω ≤ 3. We use O∼(·) as big-O but ignoring log-factors. In a few places we also use M(n) to denote the complexity of multiplying together two polynomials of degree at most n; we can trivially use M(n) ∈ O(n²), or we can have M(n) ∈ O∼(n), see e.g. [50].
2.1. GRS codes. Consider some finite field F. Choose n ≤ |F| as well as distinct α_1, ..., α_n ∈ F and non-zero (not necessarily distinct) β_1, ..., β_n ∈ F. For any k ≤ n, the corresponding [n, k] generalised Reed–Solomon (GRS) code is C = { (β_1 f(α_1), ..., β_n f(α_n)) : f ∈ F[x], deg f < k }. The α_i are called evaluation points and the β_i column multipliers. C has minimum distance d = n − k + 1, which is the maximal possible according to the Singleton bound.
Note that the column multipliers can be ignored in decoding: we simply compute r′ = (r_1/β_1, ..., r_n/β_n) = c′ + e′, where c′ is in the code C′ which has the same evaluation points α_i but where all β_i = 1, and e′ is an error vector with the same number of errors as e. In the remainder of the article, we therefore assume β_i = 1.
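As a small illustration, removing the column multipliers amounts to one modular inversion per position. The field size, multipliers and received word below are our own toy choices, not from the paper:

```python
# Reducing a GRS received word to the beta_i = 1 case: divide each received
# symbol by its column multiplier. Toy values over GF(13); betas and r are
# illustrative stand-ins.
p = 13                      # field size (prime)
betas = [2, 5, 7, 1, 3]     # non-zero column multipliers
r = [4, 0, 9, 6, 1]         # received word

# beta^(p-2) = beta^(-1) mod p by Fermat's little theorem
r_prime = [ri * pow(bi, p - 2, p) % p for ri, bi in zip(r, betas)]
print(r_prime)              # the word now lives in the code with all beta_i = 1
```

Decoding then proceeds on r_prime as if all multipliers were 1.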
Introduce two essential polynomials, immediately computable by the receiver: G = ∏_{i=1}^{n} (x − α_i), and R ∈ F[x] with deg R < n such that R(α_i) = r_i for i = 1, ..., n. G can be pre-computed, while R is computed upon receiving r using Lagrange interpolation.
Key equation decoders revolve around the notion of an error locator Λ and an error evaluator Ω: Λ = ∏_{i∈E} (x − α_i), where E = { i : e_i ≠ 0 } is the set of error positions with ǫ = |E|, and Ω is given by Λ(R − f) = ΩG, where f = ev^{−1}(c) is the message polynomial; note deg Ω < deg Λ = ǫ. The following simple relation is at the heart of our investigations: Lemma 2.1. Λf ≡ ΛR mod G. Proof. The closed formula for Lagrange interpolation implies that R − f vanishes at α_i for all i ∉ E; hence Λ(R − f) vanishes at all the α_i and is divisible by G. The objects c, r, e, Λ, etc. introduced here will be used in the remainder of the article.

2.2. Classical Key Equations.
Let us revisit the key equation implicit in Gao's decoder [14], which follows directly from Lemma 2.1: Λf ≡ ΛR mod G. (1) This is a non-linear equation in the unknowns Λ and f, and it is not immediately obvious how to build an efficient decoder around it. The good – and classical – idea is to linearise the relation: we replace the sought quantities Λ and Λf with unknowns λ and ψ, both in F[x], such that ψ ≡ λR mod G. This is now a linear relation with infinitely many solutions. We further restrict the solutions by requiring deg ψ ≤ deg λ + k − 1. Note that this is satisfied if λ is replaced by Λ and ψ by Λf. Finally, we seek such λ, ψ where λ is monic and has minimal degree. The hope is now that λ = Λ even though we solved for a much weaker relation than (1); effectively, it is therefore the low degree of (ΛR mod G) which is used to solve for Λ. Solving such requirements for λ and ψ is sometimes known as rational function reconstruction [50] or Padé approximation [3]. They are easy to solve for in complexity O(n²) or O∼(n), using e.g. the extended Euclidean algorithm [13, 14, 48].
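To make this concrete, here is a minimal half-the-minimum-distance decoder in the spirit of Gao's algorithm [14], running the extended Euclidean algorithm on (G, R) and stopping when the remainder degree drops below (n+k)/2. The code parameters, helper names, and the hand-rolled polynomial arithmetic are our own illustrative choices, a sketch rather than the paper's implementation:

```python
# Sketch of Gao-style decoding via the extended Euclidean algorithm over GF(p).
# Polynomials are coefficient lists, lowest degree first. Toy [7,3] RS code.
p = 13
alphas = [1, 2, 3, 4, 5, 6, 7]
n, k = 7, 3

def trim(a):
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def padd(a, b):
    m = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(m)])

def pmul(a, b):
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return trim(res)

def pdivmod(a, b):
    # quotient and remainder of a divided by b (b non-zero, not nec. monic)
    a, q = a[:], [0] * max(1, len(a) - len(b) + 1)
    inv = pow(b[-1], p - 2, p)
    while len(a) >= len(b) and a != [0]:
        c, d = a[-1] * inv % p, len(a) - len(b)
        q[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(q), a

def lagrange(ys):
    # interpolant R with R(alpha_i) = ys[i]
    R = [0]
    for i, ai in enumerate(alphas):
        num, den = [1], 1
        for j, aj in enumerate(alphas):
            if j != i:
                num = pmul(num, [-aj % p, 1])
                den = den * (ai - aj) % p
        R = padd(R, pmul(num, [ys[i] * pow(den, p - 2, p) % p]))
    return R

def gao_decode(r):
    G = [1]
    for a in alphas:
        G = pmul(G, [-a % p, 1])
    g0, g1 = G, lagrange(r)
    v0, v1 = [0], [1]            # track v in the Bezout relation g = u*G + v*R
    while g1 != [0] and 2 * (len(g1) - 1) >= n + k:   # stop: deg g1 < (n+k)/2
        quo, rem = pdivmod(g0, g1)
        g0, g1 = g1, rem
        v0, v1 = v1, padd(v0, [(-c) % p for c in pmul(quo, v1)])
    f1, rem = pdivmod(g1, v1)    # hope: g1 = v1 * f exactly
    return f1 if rem == [0] and len(f1) <= k else None

f = [3, 1, 4]                                   # message, deg < k
c = [sum(fc * pow(a, i, p) for i, fc in enumerate(f)) % p for a in alphas]
r = c[:]
r[0] = (r[0] + 5) % p                           # two errors: within (d-1)/2 = 2
r[3] = (r[3] + 1) % p
print(gao_decode(r))
```

The quotient g1/v1 being exact plays the role of the check λ = Λ: when too many errors occur, the division leaves a remainder and the sketch returns None.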
Whenever 0 is not an evaluation point, i.e. α_i ≠ 0 for all i, the equation can be rewritten into the more classical syndrome key equation [8]. First some notation: for p ∈ F[x], let rev_d(p) denote the reversal of the coefficients of p at degree d, i.e. rev_d(p) = x^d p(x^{−1}) for some integer d ≥ deg p. To lighten the notation, we will often omit the d-argument when there is an implied upper bound on the degree of the polynomial being reversed; to be precise, note that we then reverse at the upper bound on the degree, and not at the actual degree, which might happen to be lower.
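The reversal operator is straightforward on coefficient lists; a direct sketch (the function name is ours):

```python
# rev_d(p) = x^d * p(1/x): reverse the coefficient list of p, padded up to degree d.
def rev(pcoeffs, d):
    assert d >= len(pcoeffs) - 1, "d must be at least deg p"
    out = [0] * (d + 1)
    for i, c in enumerate(pcoeffs):
        out[d - i] = c
    return out

# p(x) = 1 + 2x: rev_3(p) = x^3 * (1 + 2/x) = 2x^2 + x^3
print(rev([1, 2], 3))  # → [0, 0, 2, 1]
```

Note how reversing at d = 3 rather than at deg p = 1 pads with zeros at the low end, matching the convention of reversing at the implied degree bound.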
Introduce S(x) as the power series expansion of rev(R)/rev(G) truncated at x^{n−k}. Then by reversing Lemma 2.1 at degree ǫ + n − 1 we get rev(Λ)rev(R) ≡ rev(Ω)rev(G) mod x^{n−k}. Since x ∤ rev(G) this implies the well-known formula rev(Λ)S ≡ rev(Ω) mod x^{n−k}. A (now less obvious) algebraic relation exists between rev(Λ) and rev(Ω). One can again show that this approach will succeed, i.e. in the end λ = rev(Λ), whenever ǫ ≤ ⌊(d − 1)/2⌋ [8]. Slightly stronger, one can show that the approach will succeed if and only if the Gao key equation approach succeeds [32].
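Since rev(G) has a non-zero constant term, S can be computed by truncated power-series division. A naive quadratic-time sketch follows; the coefficient lists revG and revR are arbitrary illustrative stand-ins, not computed from a real R and G:

```python
# S = rev(R)/rev(G) mod x^(n-k) by power-series inversion over GF(p).
# revG, revR are illustrative coefficient lists (lowest degree first).
p = 13
m = 4                      # truncation order n - k
revG = [1, 5, 0, 2, 7]     # constant term non-zero since x does not divide rev(G)
revR = [3, 0, 11, 6, 1]

def series_inv(a, m):
    # b with a*b = 1 mod x^m; requires a[0] != 0
    b = [pow(a[0], p - 2, p)]
    for i in range(1, m):
        s = sum(a[j] * b[i - j] for j in range(1, min(i, len(a) - 1) + 1)) % p
        b.append(-b[0] * s % p)
    return b

def series_mul(a, b, m):
    # product truncated at x^m
    return [sum(a[j] * b[i - j] for j in range(i + 1)
                if j < len(a) and i - j < len(b)) % p for i in range(m)]

S = series_mul(revR, series_inv(revG, m), m)
print(S)
```

By construction S * revG agrees with revR up to the truncation order, which is exactly the property the syndrome key equation exploits.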

2.3. Simply Powered Key Equations. (Simple) Power decoding, or decoding by virtual interleaving [42], is a generalisation of (1) where not one but multiple non-linear relations between Λ and f are identified, essentially still based on Lemma 2.1. The original formulation of [42] is based on the classical syndrome key equation, while powering the Gao key equation was described in [32]. We will begin with the latter: Lemma 2.2 (Simply Powered key equations). For any t ∈ Z+ then Λf^t ≡ ΛR^t mod G. Proof. By Lemma 2.1 we have Λ(R^t − f^t) = Λ(R − f)(R^{t−1} + R^{t−2}f + ... + f^{t−1}) = ΩG(R^{t−1} + ... + f^{t−1}) ≡ 0 mod G. Again this gives non-linear relations between Λ and f. To solve them efficiently, we choose some ℓ and linearise the first ℓ of the equations, introducing unknowns λ, ψ_1, ..., ψ_ℓ ∈ F[x]. We then solve for λ, ψ_t such that λ is monic and of minimal degree such that ψ_t ≡ λR^t mod G and deg ψ_t ≤ deg λ + t(k − 1) for t = 1, ..., ℓ. Finally, we hope that the found λ = Λ. In that case f = ψ_1/λ and decoding is finished.
By regarding the linearised problem as a linear system of equations over F, and counting available coefficients versus constraints, one arrives at an expression (3) for the greatest number of errors we should expect to be decodable. This argument does not imply that we will necessarily succeed when the bound is satisfied: the constructed system might have spurious "false solutions" of degree less than or equal to that of Λ. In such rare cases decoding might fail for fewer errors than (3). Bounding the probability that this occurs has proven difficult: we now know upper bounds when ℓ = 2, 3 [32, 42], and Schmidt, Sidorenko, and Bossert posed a conjecture, backed by simulation, on the probability in general [42].
From (3) we can determine the value of ℓ that maximises the decoding radius. Whenever k/n > 1/3, one should simply choose ℓ = 1, i.e. classical key equation decoding. Thus simple Power decoding is only useful for low-rate codes. Note that (3) is almost the same bound as that of the Sudan decoding algorithm [47], which is the Guruswami–Sudan algorithm with multiplicity 1.
Power decoding was originally described using a syndrome formulation instead of (3) [41]: we restrict ourselves to the case where 0 is not an evaluation point, and we define S^(t) as the power series expansion of rev(R^(t))/rev(G) truncated at x^{n−t(k−1)−1}, where R^(t) is the unique polynomial of degree less than n such that R^(t) ≡ R^t mod G. Then it follows from Lemma 2.2, by the same rewriting as in Section 2.2 [32], that rev(Λ)S^(t) ≡ rev(Ω_t) mod x^{n−t(k−1)−1} for t = 1, ..., ℓ, where the Ω_t are certain polynomials of degree at most ǫ − 1 that we omit defining explicitly. It can be shown using the same rewriting that Power syndrome decoding fails if and only if Power Gao decoding fails [32].
3. New Key Equations. In this section we describe the main result of the paper: a new generalisation of Power decoding where we introduce a second parameter, the multiplicity. The resulting relations will again be non-linear in Λ and f, and we will employ a linearisation strategy similar to before.
The generalised key equations are described in the following theorem: Theorem 3.1. For any s, ℓ ∈ Z+ with ℓ ≥ s, then Λ^s f^t = Σ_{i=0}^{t} binom(t,i)(−1)^i (Λ^{s−i}Ω^i) R^{t−i}G^i for t = 1, ..., s − 1, and Λ^s f^t ≡ Σ_{i=0}^{s−1} binom(t,i)(−1)^i (Λ^{s−i}Ω^i) R^{t−i}G^i mod G^s for t = s, ..., ℓ. Proof. For t < s we simply rewrite Λ^s f^t = Λ^{s−t}(Λf)^t = Λ^{s−t}(ΛR − ΩG)^t and expand binomially using Lemma 2.1; this finishes the first part of the theorem. If t ≥ s, we instead expand Λ^t f^t = (ΛR − ΩG)^t; for i = s, ..., t, the summand contains Λ^{t−i}G^i, which is divisible by Λ^{t−s}G^s, so the sum may be truncated at i = s − 1 modulo Λ^{t−s}G^s. Since Λ | G, dividing through by Λ^{t−s} yields the second part. The above theorem describes ℓ equations in the (algebraically related) "unknowns" Λ^s, Λ^{s−1}Ω, ..., ΛΩ^{s−1} as well as Λ^s f, ..., Λ^s f^ℓ. These are "key equations" in the following sense: inner products of the unknowns Λ^{s−i}Ω^i with vectors of known polynomials (the ±binom(t,i)R^{t−i}G^i) equal the unknowns Λ^s f^t modulo G^s – and hence have surprisingly low degree.
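The two cases of Theorem 3.1 can be checked numerically on a toy example. The following self-contained sketch is our own construction (toy code over GF(13), hand-rolled polynomial arithmetic), assuming the sign convention Λ(R − f) = ΩG; it verifies the exact equalities for t < s and the congruences modulo G^s for t ≥ s, with s = 2, ℓ = 3:

```python
# Numerical check of the powered key equations for s = 2, l = 3 on a toy GRS
# code over GF(13). Polynomials are coefficient lists, lowest degree first.
# Our own illustrative construction; Omega is defined by Lam*(R - f) = Omega*G.
from math import comb

p = 13
alphas = [1, 2, 3, 4, 5, 6]
n, k, s, l = 6, 2, 2, 3

def trim(a):
    while len(a) > 1 and a[-1] == 0:
        a.pop()
    return a

def padd(a, b):
    m = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)) % p
                 for i in range(m)])

def pmul(a, b):
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return trim(res)

def pdivmod(a, b):
    a, q = a[:], [0] * max(1, len(a) - len(b) + 1)
    inv = pow(b[-1], p - 2, p)
    while len(a) >= len(b) and a != [0]:
        c, d = a[-1] * inv % p, len(a) - len(b)
        q[d] = c
        for i, y in enumerate(b):
            a[i + d] = (a[i + d] - c * y) % p
        trim(a)
    return trim(q), a

def prod(polys):
    out = [1]
    for q in polys:
        out = pmul(out, q)
    return out

def ppow(a, e):
    return prod([a] * e)

f = [5, 7]                                        # message, deg < k
c = [sum(fc * pow(a, i, p) for i, fc in enumerate(f)) % p for a in alphas]
E = [1, 4]                                        # error positions
r = c[:]
for i in E:
    r[i] = (r[i] + 3) % p

G = prod([[-a % p, 1] for a in alphas])           # G = prod (x - alpha_i)
Lam = prod([[-alphas[i] % p, 1] for i in E])      # error locator

R = [0]                                           # Lagrange interpolant of r
for i, ai in enumerate(alphas):
    num, den = [1], 1
    for j, aj in enumerate(alphas):
        if j != i:
            num = pmul(num, [-aj % p, 1])
            den = den * (ai - aj) % p
    R = padd(R, pmul(num, [r[i] * pow(den, p - 2, p) % p]))

# Omega via the exact division of Lam*(R - f) by G
Om, remainder = pdivmod(pmul(Lam, padd(R, [(-x) % p for x in f])), G)
assert remainder == [0]

checks = []
for t in range(1, l + 1):
    lhs = pmul(ppow(Lam, s), ppow(f, t))
    rhs = [0]
    for i in range(min(t, s - 1) + 1):
        coef = comb(t, i) * (-1) ** i % p
        rhs = padd(rhs, prod([[coef], ppow(Lam, s - i), ppow(Om, i),
                              ppow(R, t - i), ppow(G, i)]))
    diff = padd(lhs, [(-x) % p for x in rhs])
    if t < s:
        checks.append(diff == [0])                          # exact equality
    else:
        checks.append(pdivmod(diff, ppow(G, s))[1] == [0])  # congruence mod G^s
print(checks)
```

All three checks pass on this example, in line with the theorem.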
The relations of Theorem 3.1 are highly non-linear, and solving for Λ and f directly would be computationally infeasible. Instead we linearise the relations: we derive weaker, linear relations from Theorem 3.1 which can be solved efficiently: Problem 3.2. Find a vector (λ_1, ..., λ_s, ψ_1, ..., ψ_ℓ) ∈ F[x]^{s+ℓ} with λ_1 monic and such that the following requirements are satisfied: 1a. ψ_t = Σ_{i=0}^{t} binom(t,i)(−1)^i λ_{i+1} R^{t−i}G^i for t = 1, ..., s − 1; 1b. ψ_t ≡ Σ_{i=0}^{s−1} binom(t,i)(−1)^i λ_{i+1} R^{t−i}G^i mod G^s for t = s, ..., ℓ; 2. deg λ_1 ≥ deg λ_{i+1} + i for i = 1, ..., s − 1; 3. deg ψ_t ≤ deg λ_1 + t(k − 1) for t = 1, ..., ℓ. Clearly Λ = (Λ^s, Λ^{s−1}Ω, ..., ΛΩ^{s−1}, Λ^s f, ..., Λ^s f^ℓ) satisfies the requirements. The strategy is to find a minimal solution, by which we mean that deg λ_1 is minimal, and then hope that this solution is actually Λ. If that turns out to be the case, decoding can be completed simply by computing f = ψ_1/λ_1. Whether we can expect that to be the case is addressed in Sections 4 and 5.
The complete decoding algorithm is given as Algorithm 1, where we assume a solver for Problem 3.2. Note that Problem 3.2 could be solved as a series of linear systems in the coefficients of the λ_i, one system for each guess at deg λ_1. A much more efficient algorithm for solving Problem 3.2 is addressed in Section 7, where we obtain the complexity O∼(sℓ^ω n) for Algorithm 1 (or O∼(s²ℓ^{ω−1} n) relying on the unpublished [37]).
Remark 3.3. The shape of the equations of Theorem 3.1 bears a striking resemblance to certain approaches for solving the interpolation phase of the Guruswami–Sudan algorithm: the F[x]-module characterisation as in [6, 23], and the (intermediate) interpolation key equations as in [53, Eqn. (31)]. However, the Guruswami–Sudan algorithm has, a priori, nothing to do with the error locator, and the true connection between the two sets of key equations is unclear. For instance, it is not known if one can easily obtain the error locator from a Guruswami–Sudan interpolation polynomial or vice versa.

Algorithm 1 Efficient Power Decoding with Multiplicities
Remark 3.4. The original Power decoding can be described by analogy with decoding of certain Interleaved RS codes [42]. It would be interesting to find a similar analogue for the key equations of Theorem 3.1.
4. Decoding Radius. We will now discuss how many errors Algorithm 1 will usually be able to correct. When calling this a "decoding radius" we need to be wary: indeed, the method will fail for certain received words whenever the number of errors is at least d/2, and this is unavoidable since it is a unique decoding algorithm. Therefore, "decoding radius" really involves two parts: 1) how many errors should we at most expect to be able to correct; and 2) what is the probability that we will fail when the number of errors is at most this. In this section we will answer the first of these questions, and turn to the latter in Section 5.
The decoding radius upper bound that we will derive is based on linear algebra: when the number of errors ǫ is large enough, then solutions to Problem 3.2 that are smaller than the sought Λ will appear. Proposition 4.1. Consider a received word r and the corresponding instance of Problem 3.2. There is a vector v = (λ̂_1, ..., λ̂_s, ψ̂_1, ..., ψ̂_ℓ) satisfying Items 1a and 1b of Problem 3.2 as well as the relaxed degree conditions 2' and 3', where τ_Pow(s, ℓ) is given by (5). Proof. Satisfying Items 1a and 1b of Problem 3.2 as well as Items 2' and 3' above is a homogeneous linear set of restrictions on the coefficients of the λ̂_i: the linear combinations on the right-hand side of Items 1a and 1b should have bounded degree, either directly or reduced modulo G^s. If there are more coefficients than constraints, there will be a solution.
Let us write τ = τ_Pow(s, ℓ) for brevity; we will derive that if τ satisfies (5), then there will be a solution to the homogeneous system. For every t = 1, ..., s − 1, Item 1a imposes C_t constraints. For Item 1b, ψ̂_t has bounded degree modulo G^s, so this gives C_t constraints for each t = s, ..., ℓ as well. The total number of coefficients in λ̂_1, ..., λ̂_s is K, and the condition for a guaranteed solution is then K > Σ_{t=1,...,ℓ} C_t. Thus, there must be a solution satisfying Items 1a, 1b, 2' and 3' for τ satisfying (5). The solution λ̂_1, ..., λ̂_s guaranteed by Proposition 4.1 will not necessarily solve Problem 3.2: it might e.g. be that deg λ̂_1 < deg λ̂_2 + 1 ≤ sτ_Pow(s, ℓ). However, it is natural to suspect that, once there are solutions to the system of Proposition 4.1, there will be solutions with deg λ̂_1 = sτ_Pow(s, ℓ), and such solutions will necessarily also solve Problem 3.2. The minimal solution to Problem 3.2 will in such cases not be the Λ that we are looking for. Therefore, we might expect to fail whenever ǫ > τ_Pow(s, ℓ). This intuition is completely backed by simulation, see Section 6: with high probability, decoding seems to fail if ǫ > τ_Pow(s, ℓ), but for a few error patterns it does succeed after all. We will therefore regard τ_Pow(s, ℓ) as the decoding radius of Algorithm 1.
The expression τ_Pow(s, ℓ) turns out to be related to something very well known: Corollary 4.2. Denote by τ_GS(s, ℓ) the maximal decoding radius of the Guruswami–Sudan algorithm on C with multiplicity s and list size ℓ. Then τ_Pow(s, ℓ) equals τ_GS(s, ℓ) up to a small additive term, with τ_GS(s, ℓ) − τ_Pow(s, ℓ) → 0 as s, ℓ → ∞.
Taken over all s and ℓ, the decoding radius of Guruswami–Sudan describes a curve J(n, d) = n − √(n(n − d)), often called the Johnson radius after [18]. For any integer τ < J(n, d) there exist infinitely many choices of s, ℓ such that τ = ⌊τ_GS(s, ℓ)⌋. Thus, by Corollary 4.2, Power decoding is similarly bounded by the Johnson radius (for s, ℓ → ∞ then τ_GS(s, ℓ) − τ_Pow(s, ℓ) → 0). The corollary even allows us to use closed-form expressions for small s and ℓ already analysed for the Guruswami–Sudan algorithm: Proposition 4.3. As long as τ < J(n, d) then τ_Pow(s(τ), ℓ(τ)) ≥ τ, where Proof. Since τ < J(n, d), it is a valid decoding radius for the Guruswami–Sudan algorithm, and so by [31, p. 53], τ_GS(ŝ(τ), ℓ̂(τ)) ≥ τ. Therefore Corollary 4.2 gives us τ_Pow(ŝ(τ), ℓ̂(τ)) ≥ τ, so we are done if ŝ(τ) ≤ s(τ). But s_min(τ) is monotonically increasing for 0 < τ < J(n, d), so s(τ) is non-decreasing. Remark 4.4. We remark that the condition of Proposition 4.3 that τ < J(n, d) seems almost always to be satisfied: for n < 100 an exhaustive search found only 50 choices of the triple (n, k, τ) for which it was not, and in 48 of these cases n = k + 3. As an example of the tightness of the closed expressions, consider the large parameters [n, k] = [243320, 131155]: here the list size of Proposition 4.3 never exceeds the minimal possible by more than 1 for all possible decoding radii.
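For concreteness, the Johnson radius is easy to tabulate against the half-the-minimum-distance radius; note that the two forms n − √(n(n − d)) and n − √(n(k − 1)) coincide since n − d = k − 1. The example parameters below are our own choices (two of them reappear in Section 5):

```python
# Johnson radius J(n, d) = n - sqrt(n(n - d)) = n - sqrt(n(k - 1)) versus the
# half-the-minimum-distance radius floor((d - 1)/2). Example parameters are
# illustrative choices.
from math import sqrt, floor

def johnson(n, k):
    return n - sqrt(n * (k - 1))

for n, k in [(64, 27), (255, 100), (256, 63)]:
    d = n - k + 1
    half = (d - 1) // 2
    print(f"[{n},{k}]: half-distance {half}, Johnson {floor(johnson(n, k))}")
```

The gap between the two radii is what the multiplicity parameter buys: for instance, for the [64, 27] code the Johnson radius allows several more errors than the 18 of half-distance decoding.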
5. Failure Behaviour. We will move on to investigate how Power decoding fails when at most τ_Pow(s, ℓ) errors occur. There are two ways in which Algorithm 1 can give an unwanted answer: firstly, the algorithm can return "fail"; or secondly, the algorithm can return a different codeword than the sent one. For a specific sent codeword c and received word r, we say that Power decoding fails if one of the two following conditions is satisfied: 1. Algorithm 1 returns "fail". 2. There exists c′ ∈ C, c′ ≠ c, such that dist(r, c′) ≤ dist(r, c).
Recall that when Algorithm 1 does not return "fail", it always returns a codeword of minimal distance to the received word. So if neither of the above conditions is satisfied, Algorithm 1 returns the correct answer. Conversely, if only Item 2 above is satisfied and dist(r, c′) = dist(r, c), then c might still be correctly returned. However, it is much more likely that the found solution to the key equation in Line 3 will be some mix of the solutions corresponding to the two errors r − c and r − c′, in which case decoding will fail. For the sake of a cleaner definition, we therefore consider this possibility a failure as well.
We will begin by showing that the error vector alone determines whether the method succeeds. This drastically simplifies further examination of the failure behaviour. It allows us first to show the – quite expected – property that the method never fails when fewer than d/2 errors occur. Secondly, it allows us to give a closed upper bound on the failure probability when (s, ℓ) = (2, 3). Lastly, we discuss the relation between Power decoding failing and there being multiple codewords close to the received word. Proposition 5.1. Whether Power decoding fails depends only on the error e, and not on the sent codeword c. Proof. It suffices to show that if Power decoding fails for r as received word, then Power decoding also fails for r + ĉ where ĉ is any codeword. If decoding fails on input r, this is because there exist λ_1, ..., λ_s, ψ_1, ..., ψ_ℓ ∈ F[x] which solve Problem 3.2, and where λ_1 ≠ Λ^s and deg λ_1 ≤ deg Λ^s. Assume this is the case. Let R̂ be the Lagrange interpolant corresponding to r + ĉ as received word, i.e. R̂ = R + f̂ where f̂ = ev^{−1}(ĉ) and deg f̂ < k. We will show that there exist ψ̂_1, ..., ψ̂_ℓ ∈ F[x] such that the λ_i, ψ̂_t form a solution to Problem 3.2 for R̂ in place of R. Therefore, Power decoding will also fail for r + ĉ as received word.
Consider for t = 1, ..., ℓ the following expansion: Therefore, the above equals where by "≡" we mean "=" when t < s and congruence modulo G^s when t ≥ s. We set ψ̂_t to be the last expression above. By hypothesis, deg ψ_t ≤ deg λ_1 + t(k − 1). Since deg f̂ < k, we therefore get deg ψ̂_t − t(k − 1) ≤ deg λ_1. This means the λ_i, ψ̂_t indeed form a solution to Problem 3.2 for R̂, as we set out to prove.
The proved implication can immediately be applied in the other direction, since −ĉ is also a codeword, showing the bi-implication.
For the case P(0), we need to prove that Λ | λ_1 and ψ_s = 0. Consider the s'th key equation of Problem 3.2, which is satisfied by the λ_{i+1} and ψ_s: Υ^s divides each term of the sum, as well as the modulus G^s, and so it must divide ψ_s. However, we have deg ψ_s ≤ deg λ_1 + s(k − 1) ≤ sǫ + s(k − 1) < s(n − ǫ) = deg Υ^s, where the last inequality holds since 2ǫ < n − k + 1. Thus ψ_s = 0.
Returning to (6), we can then conclude Λ | λ_1 R^s, since Λ divides every other term in the sum as well as the modulus. This implies Λ | λ_1 since gcd(Λ, R) = 1.
For the inductive step, assuming P(t − 1) we will prove P(t) for 1 ≤ t < s. Consider now the (s − t)'th key equation, i.e.
Similar to before, Υ^{s−t} divides every term of the sum, so it divides ψ_{s−t}. By P(t − 1), Λ^{t−i} | λ_{i+1} for i = 0, ..., t − 1, and therefore it remains to show that Λ^{t+1−i} | λ_{i+1} for i = 0, ..., t. For j = 1, ..., t, multiply the (s − j)'th key equation by R^j and relax it to a congruence modulo G^s. We obtain t + 1 homogeneous linear equations in the λ_{i+1}R^{s−i}G^i of the form: Subtracting the j'th equation from the (j − 1)'st for j = 1, ..., t, we eliminate λ_1 and get This can be continued to get a series of equation systems, that is, for t′ = 1, ..., t, we have a system: For t′ = t, the system (which is one equation) implies that Λ^{t+1} | λ_{t+1}R^{s−t}G^t, since Λ^{t+1} divides all the sum's other terms and the modulus, and this implies Λ | λ_{t+1}.
We can now go to the t′ = t − 1 system and regard either of its two equations, and we conclude similarly that Λ^{t+1} | λ_t R^{s−t+1}G^{t−1}, since Λ^{t+1} is now seen to divide all other terms of the sum as well as the modulus. This implies Λ² | λ_t. Continuing with decreasing t′ we can iteratively conclude the remaining divisibilities. This finishes the induction step, establishing P(t) for t = 0, ..., s − 1. As mentioned, this implies a contradiction, finishing the proof.
We are now in a position to bound the probability that Power decoding fails if errors of a given weight are drawn uniformly at random, for the case (s, ℓ) = (2, 3). Note that by Proposition 4.1, these parameters allow the decoder to improve upon both half-the-minimum-distance decoding and the original Power decoding whenever the rate is between 1/6 and 1/2, for long enough codes.
Proof. By Proposition 5.1, we can consider the probability over the choice of error vector, and simply bound the failure probability when the sent codeword was 0.
Since we know by Proposition 5.2 that the failure probability is zero when ǫ < d/2, we can also assume ǫ ≥ d/2. Fix now the number of errors ǫ and the error positions E, implying a specific Λ. For a given error e = r with these non-zero positions, we will call r, or R, "bad" if for R there exist λ_i, ψ_t solving Problem 3.2 such that λ_1 ≠ Λ^s while deg λ_1 ≤ deg Λ^s. Consequently, Power decoding fails only for bad error values. Denote by S_Λ ⊂ F[x] the set of bad R. We will give an upper bound N on the size of S_Λ, and so N/(q − 1)^ǫ bounds the probability that, for the fixed error positions, Power decoding fails (since for each position, we have q − 1 choices of an error value). N will turn out to be independent of the choice of Λ, and thus N/(q − 1)^ǫ is a bound on the probability that Power decoding fails for any error of weight ǫ.
By assumption, the following equations are satisfied: Since R(α_i) = 0 whenever i ∉ E, we have Υ | R where Υ = G/Λ. Thus the above implies Υ | ψ_1 and Υ² | ψ_t for t = 2, 3. Furthermore, letting g := gcd(λ_1, Λ), we can conclude that g = gcd(ψ_t, Λ) for all t. The regular form of the above three equations allows eliminating λ_1 to obtain: From this we first note that G | (ψ_2 − Rψ_1). We will use this fact momentarily. With the two above equations we continue to eliminate λ_2 and rewrite: This leaves the simple relation where ψ̂_t := ψ_t/Υ^{min(2,t)}, which is a polynomial by our earlier observations. Thus, whenever R is bad, there is a triple (ψ̂_1, ψ̂_2, ψ̂_3) ∈ F[x]³ satisfying the above relation as well as certain degree bounds on the ψ̂_t. We will count the number of such triples momentarily. However, to thusly bound the number of bad error values, we have to determine how many different R could have the same triple. Recall that determining R up to congruence modulo Λ suffices, since this determines the error values. However, by our previous observation we have This means that for a given triple (ψ̂_t)_t, having gcd(ψ̂_t, Λ) = g, there can be at most q^{deg g} possible choices of R.
Consider now γ ≥ 0. We choose again first f_2 in one of q^{K_2−1} ways. Then f_1 f_3 must lie in the set {Bf_2² + pA | p ∈ F[x], deg p ≤ γ}, having cardinality at most q^{γ+1}. For each of these choices of f_1 f_3, we can again choose f_1 and f_3 in at most (q − 1)2^{K_1+K_3} ways.
The bound of Proposition 5.3 demonstrates a rapid, exponential decrease in the probability of failure as the number of errors decreases away from τ_Pow(2, 3). The bound only becomes non-trivial a few errors below τ_Pow(2, 3), due to the term 0.29ǫ/log q − 1/4 in the exponent. For instance, for a [64, 27] code over GF(64), τ_Pow(2, 3) = 20 1/4, but the bound is less than 1 only for ǫ ≤ 19, also for the unsimplified bound (9). Such a penalty is not observed in simulations, however, and seems to be an artefact of our proof. For the [64, 27] code, decoding succeeds almost always with 20 errors (see the next section). Similarly, for a [256, 63] code over GF(256), the bound is only non-trivial for ǫ < 108, while (9) would be slightly better with ǫ < 110; however, in simulations decoding works almost always up to ⌊τ_Pow(2, 3)⌋ = 112.
Proof. Consider the high-error failure probability of Proposition 5.3: 4(q^{−8})^{(τ_Pow(2,3)−ǫ)−(0.29ǫ/log q−1/4)} ≤ 4(q^{−8n})^{δ−0.29ǫ/(n log q)+1/(4n)}, where δ = τ_Pow(2, 3)/n − ǫ/n. Asymptotically δ approaches some positive constant, while the other terms in the exponent vanish. The low-error case is similar.

5.1. Failure Behaviour in Relation to List Decoding. It is natural to ask if the failure behaviour of Power decoding is linked to whether or not there are multiple codewords close to the received word, i.e. the list of codewords that e.g. the Guruswami–Sudan algorithm would return. There seems, however, to be no clear relation like this, as we explain below.
Consider that c ∈ C was sent and r was received. Suppose Power decoding has a decoding radius of τ, and that we have a list decoder of the same decoding radius. Consider that c′ ∈ C is another codeword, and assume that all other codewords are farther from r than c or c′. Then there are the following possibilities: Clearly, both Power decoding and the list decoder will fail to recover c in Items 5 and 6. In Items 1–4, the list decoder is guaranteed to recover c on a list, though for Items 1–3 that list will have length at least 2.
For Power decoding it is less clear-cut. Firstly, for Items 1 and 2, Power decoding "fails" according to the definition given at the beginning of Section 5. Indeed, for Item 2, Power decoding is guaranteed to return c′ or fail. For Item 1, however, Algorithm 1 might be lucky and find c, but in all likelihood the obtained solution to Problem 3.2 will be some linear combination of the two solutions corresponding to c and c′; probably Line 4 or at least Line 5 of Algorithm 1 will return "fail". For Items 3 and 4, Power decoding will probably obtain c; but in either case, one can construct examples where it will fail. That is, whether or not there is only one codeword within radius τ, Power decoding might succeed or it might fail.
This last example was found by random generation of error vectors of weight 9, after roughly 47 000 successful decoding trials.As an aside, the failure probability bound of Proposition 5.3 gives the trivial bound 1 for the failure probability in this case.
6. Simulation Results. The proposed decoding algorithm has been conceptually implemented in Sage v8.0 [45], and is available for download at http://jsrn.dk/code-for-articles. The implementation follows the approach of Section 7, computing a solution basis using the Mulders–Storjohann algorithm [26]. The asymptotic complexity of the implementation is therefore O(ℓ³s²n²).
To evaluate the failure probability, we have selected a range of code and decoding parameters and run the algorithm for a large number of random errors. More precisely, for each set of parameters, and each decoding radius τ, we have created N = 10⁵ random errors of weight exactly τ and attempted to decode a received word r = c + e for some randomly chosen c (though, of course, Proposition 5.1 implies that shifting by c makes no difference). We have limited the decoding radii used to ⌊τ_Pow(s, ℓ)⌋ + {−1, 0, 1}. The results are listed in Table 1.
Table 1. Simulation results. P_f(τ) denotes the observed probability of decoding failure (no result or wrong result) with random errors of weight exactly τ. τ_bnd indicates the number of errors ǫ for which Proposition 5.3 yields a bound < 1 (where applicable); in parentheses is the value if the probability estimate of (9) is used instead.
As is evident, τ_Pow(s, ℓ) very clearly describes the number of errors we can rely on correcting: the probability of failure appears to decay exponentially in τ_Pow(s, ℓ) − ǫ, as we might expect when extrapolating from the bound of Proposition 5.3. In fact, the failure probability is so low that it is difficult to observe failing cases for randomly selected errors.
The case having the highest failure rate is the very low-rate [21, 3] code over GF(23). For such a low-rate code, τ_Pow(s, ℓ) is quite close to the covering radius, and there is a significant probability that a random error will yield a received word which is closer to another codeword; in this case, Power decoding always fails. We performed another simulation for this code with 10⁴ random errors of weight exactly 14, decoding with the Guruswami-Sudan list decoder. This simulation gave a 16.1% chance that another codeword was at least as close to the received word as the sent codeword. Thus most of the 19.7% failures of Power decoding stem from this.
7. Efficient Solving of the Key Equations. To solve Problem 3.2, we will leverage existing algorithms by modelling Problem 3.2 as a simultaneous Hermite Padé approximation (SH Padé), a well-studied computational problem. This problem does not fit Problem 3.2 perfectly, so to describe the modelling from one to the other we will introduce some technical notions pertaining to the solution sets of SH Padé problems. The upshot is Algorithm 2 and Corollary 7.10, stating that we can rely completely on existing sophisticated algorithms to solve Problem 3.2 in complexity O∼(ℓ^ω n) (or the faster O∼(s² ℓ^{ω−1} n) if we rely on the unpublished [37]).

Definition 7.1 (Simultaneous Hermite Padé approximation). Given a matrix A ∈ F[x]^{s×ℓ}, moduli Γ_1, …, Γ_ℓ ∈ F[x] and degree bounds T_1, …, T_s, N_1, …, N_ℓ ∈ Z_{≥0}, find a non-zero λ = (λ_1, …, λ_s) ∈ F[x]^s with deg λ_i < T_i for i = 1, …, s such that ψ = (ψ_1, …, ψ_ℓ) := λA rem (Γ_1, …, Γ_ℓ) satisfies deg ψ_t < N_t for t = 1, …, ℓ.
(The modulo operation is element-wise, i.e. the i'th entry of λA is congruent to ψ_i modulo Γ_i.) SH Padé approximations have appeared elsewhere in coding theory: for the interpolation step of the Guruswami-Sudan and Wu decoding algorithms for Reed-Solomon and other codes [10, 53], and for decoding of Hermitian algebraic-geometry codes [29]. Computing solutions to these very general forms of Padé approximation goes back much further in the computer algebra community, though the Γ_i are usually powers of x; see e.g. [4, 5] and the references therein. In the generality above, the problem was first considered in [31], solved using row reduction of F[x] matrices, and shortly thereafter in [10], solved as an F-linear system exhibiting block-Hankel structure [9]. Even more general notions include minimal approximant bases, or order bases [15, 16], and relation bases [27].
First we define a measure of how far a solution is from the degree bounds:

Definition 7.2. For a given SH Padé problem with A ∈ F[x]^{s×ℓ} as well as Γ_i, T_i, N_t, and a vector λ = (λ_1, …, λ_s), the discrepancy δ ∈ Z of λ (wrt. the SH Padé problem) is given as

δ = max( max_{i=1,…,s} (deg λ_i − T_i) , max_{t=1,…,ℓ} (deg ψ_t − N_t) ) ,

where ψ = (ψ_1, …, ψ_ℓ) = λA rem (Γ_1, …, Γ_ℓ). Note that λ is a solution to an SH Padé problem if and only if its discrepancy is negative. We wish to link the type of degree restrictions of Definition 7.1 with those of Problem 3.2; this leads to the following lemma:

Lemma 7.3. Consider a received word r, let τ ∈ Z_{≥0} be a chosen decoding radius and assume at most τ errors occurred. Consider the SH Padé approximation defined by A = [A_{i,t}] ∈ F[x]^{s×ℓ} as well as Γ_t, T_i and N_t, where

A_{i,t} = \binom{t}{i} R^{t−i} G^i for t = 1, …, s − 1 and i = 0, …, s − 1 ,
A_{i,t} = \binom{t}{i} R^{t−i} G^i mod G^s for t = s, …, ℓ and i = 0, …, s − 1 ,

and where Γ_t, T_i and N_t match the moduli and degree restrictions of Problem 3.2. Let λ = (λ_1, …, λ_s) be a solution to this SH Padé approximation with minimal discrepancy among solutions whose discrepancy equals deg λ_1 − T_1. Then (λ, ψ) is a minimal solution to the instance of Problem 3.2 corresponding to r, where ψ = λA rem (Γ_1, …, Γ_ℓ).
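With a discrepancy of the form max(deg λ_i − T_i, deg ψ_t − N_t) in hand, testing a candidate λ against an SH Padé instance is mechanical. The following sketch represents polynomials as coefficient lists over a small prime field; the modulus `P = 101` and all helper names are our illustrative choices, not the article's code:

```python
from math import inf

P = 101  # hypothetical prime field GF(101) for illustration

def deg(f):
    """Degree of a coefficient list; the zero polynomial gets -inf."""
    d = len(f) - 1
    while d >= 0 and f[d] % P == 0:
        d -= 1
    return d if d >= 0 else -inf

def polymul(f, g):
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % P
    return out

def polyadd(f, g):
    n = max(len(f), len(g))
    return [((f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)) % P
            for i in range(n)]

def polymod(f, g):
    """Remainder of f modulo g (g non-zero)."""
    f = [c % P for c in f]
    dg = deg(g)
    inv_lead = pow(g[dg], P - 2, P)      # inverse of leading coefficient
    while deg(f) >= dg:
        d = deg(f) - dg
        c = f[deg(f)] * inv_lead % P
        for i in range(dg + 1):
            f[i + d] = (f[i + d] - c * g[i]) % P
    return f

def discrepancy(lam, A, Gammas, T, N):
    """delta = largest degree-bound violation of (lambda, psi),
    where psi = lambda * A rem (Gamma_1, ..., Gamma_l)."""
    s, l = len(A), len(A[0])
    psi = []
    for t in range(l):
        acc = [0]
        for i in range(s):
            acc = polyadd(acc, polymul(lam[i], A[i][t]))
        psi.append(polymod(acc, Gammas[t]))
    d1 = max(deg(lam[i]) - T[i] for i in range(s))
    d2 = max(deg(psi[t]) - N[t] for t in range(l))
    return max(d1, d2), psi
```

A candidate λ is then a solution exactly when `discrepancy(...)[0] < 0`.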
Proof. We first prove that λ is well-defined, that is, that there is such a solution to the SH Padé problem. We claim that if (λ′, ψ′) is a minimal solution to the instance of Problem 3.2 (which we know exists), then λ′ is such a solution to the SH Padé approximation. Note first, by the assumption on the decoding radius, that the degree bounds of Item 2 and Item 3 of Problem 3.2 imply the discrepancy condition.
The other direction is very similar: assume now that λ is a solution to the SH Padé problem with discrepancy deg λ_1 − T_1, and let ψ = (ψ_1, …, ψ_ℓ) = λA rem (Γ_1, …, Γ_ℓ). Item 1b of Problem 3.2 is obviously satisfied; for Item 1a, we know λA_{*,t} ≡ ψ_t mod Γ_t for t = 1, …, ℓ, where A_{*,t} denotes the t'th column of A. Note that for t < s, both sides of this congruence have degree less than sτ + t(n − 1) + 1. For these values of t we have Γ_t = x^{sτ+t(n−1)+1}, and so the congruence lifts to equality, i.e. Item 1a. Item 2 and Item 3 of Problem 3.2 follow directly from the discrepancy condition on λ together with (10).
For minimality, assume conversely that Problem 3.2 had a solution (λ′, ψ′) smaller than (λ, ψ). Since λ′ is then a solution to the SH Padé problem satisfying the discrepancy restriction, this contradicts the minimality of λ.
7.1. Solution bases for Padé approximations. Lemma 7.3 states that special solutions to a specific SH Padé problem are actually minimal solutions to Problem 3.2. Many algorithms for solving SH Padé problems, and in particular the fastest ones known, actually find a basis of all solutions, for a notion of "basis" which we introduce momentarily. We will now show that such a basis must contain a solution satisfying the constraints of Lemma 7.3, which hence will be a minimal solution to Problem 3.2. This section uses a number of concepts which are standard in the polynomial matrix literature, but less so in coding theory. They will not be used outside this section.
The degree of a vector v ∈ F[x]^m or matrix A ∈ F[x]^{m′×m} is the maximal degree of its entries. The leading matrix of A, denoted LM(A) ∈ F^{m′×m}, has (i, j)'th entry equal to the coefficient of x^{d_i} in A_{i,j}, where d_i is the degree of the i'th row of A. The leading indices of v, denoted leads(v) ⊆ {1, …, m}, are the indices of v which have degree deg v. In other words, LM(A) is non-zero exactly at the leading indices of the rows of A. We also introduce shifted variants of the above notions: given a "shift" h ∈ Z^m, then deg_h v := deg(v x^h), where x^h is the diagonal matrix with entries x^{h_1}, …, x^{h_m}. Similarly deg_h A := deg(A x^h); LM_h(A) := LM(A x^h); and leads_h(v) := leads(v x^h). Note that if h has negative entries, this notation may formally pass over the ring of Laurent polynomials.
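These shifted notions can be computed directly from the description above; the following is a minimal sketch with polynomials as coefficient lists (all helper names are ours, and non-zero rows are assumed):

```python
from math import inf

def deg(f):
    """Degree of a coefficient list; the zero polynomial gets -inf."""
    d = len(f) - 1
    while d >= 0 and f[d] == 0:
        d -= 1
    return d if d >= 0 else -inf

def shifted_row_degree(row, h):
    """deg_h of a row: max over entries of deg(f_j) + h_j."""
    return max(deg(f) + hj for f, hj in zip(row, h))

def leads(row, h):
    """Indices (0-based here) attaining the shifted row degree."""
    d = shifted_row_degree(row, h)
    return {j for j, (f, hj) in enumerate(zip(row, h)) if deg(f) + hj == d}

def leading_matrix(A, h):
    """LM_h(A): entry (i, j) is the coefficient of x^(d_i - h_j) in A[i][j],
    where d_i is the h-shifted degree of row i."""
    LM = []
    for row in A:
        d = shifted_row_degree(row, h)
        lm_row = []
        for f, hj in zip(row, h):
            k = d - hj
            lm_row.append(f[k] if 0 <= k < len(f) else 0)
        LM.append(lm_row)
    return LM
```

For example, the row (x, 1) with shift h = (0, 2) has deg_h = 2, leading index {1} and leading-matrix row (0, 1).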
We say that a matrix A is h-row reduced if LM_h(A) has full row rank, see [19, Ch. 6.3.2] and [4]. For any M ∈ F[x]^{m′×m} with full row rank, there always exists another matrix A which is h-row reduced and has the same row space as M; see e.g. [26] for a succinct iterative algorithm. Row reduced matrices derive their interest from having minimal row degrees among all bases of the same row space; this property can be generalised to the predictable degree property [19, Ch. 6.3.2], of which we will use the following variant:

Proposition 7.5. Let h ∈ Z^m be a shift, let A ∈ F[x]^{m′×m} be h-row reduced, and let a_1, …, a_{m′} be the rows of A. Let v ∈ F[x]^m be any vector in the row space of A. Then there exists q = (q_1, …, q_{m′}) ∈ F[x]^{m′} such that v = qA and

deg_h v = max_i ( deg q_i + deg_h a_i ) .

Proof. The existence of q is trivial since v is in the row space of A. Let t = max_i(deg q_i + deg_h a_i) and let I = { i : deg q_i + deg_h a_i = t }. Note that deg_h(q_i a_i) ≤ deg q_i + deg_h a_i for each i, and so deg_h v ≤ t. Let q̄ ∈ F^{m′} be the scalar vector whose i'th entry is the leading coefficient of q_i if i ∈ I and 0 otherwise. Let v̄ ∈ F^m be the scalar vector of x^t coefficients of v x^h. Then

v̄ = q̄ LM_h(A) , (11)

which is non-zero since q̄ ≠ 0 and LM_h(A) has full row rank. Therefore deg_h(v) = t. Further, leads_h(v) is then the set of indices of non-zero entries of v̄, i.e. of q̄ LM_h(A), i.e. a subset of the indices of non-zero columns of LM_h(A′), where A′ consists of the rows of A indexed by I.
We are now in a position to define a notion of "basis" of all solutions; see Definition 7.6. Not only will the rows of a solution basis B be solutions themselves; the main point is that they span every single solution in a predictable way: any solution must be a linear combination of the rows of the complete, h-row reduced matrix B′, but due to the predictable degree property, for the h-degree of a vector to be negative, it must be spanned only by vectors with negative h-degree themselves, and with coefficients of bounded degree.
We now see an easy algorithm for solving SH Padé problems: set up M and compute a row reduced matrix B′ which is left-equivalent to M. This could e.g. be done using the iterative Mulders-Storjohann algorithm [26], or using the reduction from row reduction to order basis computation [15, 16]. The latter yields a complexity of O∼((s + ℓ)^ω D), where D = max_i T_i + max_t deg Γ_t.
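A bare-bones version of the iterative Mulders-Storjohann idea can be sketched as follows: repeatedly cancel the leading term of the larger of two rows sharing the same pivot column, until all pivots are distinct (weak Popov form, which is in particular row reduced). This unshifted sketch over a hypothetical prime field GF(101) is for intuition only and makes no claim to the complexity of the cited algorithms:

```python
P = 101  # hypothetical prime field for illustration

def deg(f):
    d = len(f) - 1
    while d >= 0 and f[d] % P == 0:
        d -= 1
    return d

def row_degree(row):
    return max(deg(f) for f in row)

def pivot(row):
    """Rightmost column attaining the row degree (a common tie-breaking)."""
    d = row_degree(row)
    return max(j for j, f in enumerate(row) if deg(f) == d)

def sub_shifted(f, g, c, sh):
    """f - c * x^sh * g over GF(P)."""
    out = list(f) + [0] * max(0, len(g) + sh - len(f))
    for i, b in enumerate(g):
        out[i + sh] = (out[i + sh] - c * b) % P
    return out

def mulders_storjohann(M):
    """Apply 'simple transformations' until all pivots are distinct."""
    M = [list(r) for r in M]
    while True:
        piv, clash = {}, None
        for i, row in enumerate(M):
            if row_degree(row) < 0:
                continue                       # skip zero rows
            p = pivot(row)
            if p in piv:
                k = piv[p]                     # reduce the higher-degree row
                clash = (k, i) if row_degree(M[k]) <= row_degree(row) else (i, k)
                break
            piv[p] = i
        if clash is None:
            return M
        j, i = clash                           # row j reduces row i
        p = pivot(M[i])
        di, dj = row_degree(M[i]), row_degree(M[j])
        c = M[i][p][deg(M[i][p])] * pow(M[j][p][deg(M[j][p])], P - 2, P) % P
        M[i] = [sub_shifted(M[i][t], M[j][t], c, di - dj)
                for t in range(len(M[i]))]
```

On the 2×2 input with rows (x², 1) and (x, 1), a single transformation replaces the first row by (0, 1 − x), after which the pivots are distinct.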
We will continue the discussion a bit further, since a result from [37] allows a faster algorithm if we only compute the first s columns of a solution basis; see Definition 7.7 and Proposition 7.8. To solve Problem 3.2 using SH Padé approximations in the complexity of Proposition 7.8, the only remaining piece is to prove that a solution specification must contain a row to which we can apply Lemma 7.3.

Proposition 7.9. Consider an SH Padé approximation problem with A ∈ F[x]^{s×ℓ} as well as Γ_i, T_i, N_t. If there exists a solution λ ∈ F[x]^s such that its discrepancy equals deg λ_1 − T_1, then such a solution with minimal discrepancy will appear in a solution specification.
Proof. Let B ∈ F[x]^{(s+ℓ)×(s+ℓ)} be a completed, h-row reduced matrix, left-equivalent to M, corresponding to the solution specification, where h and M are as in Lemma 7.4. Let λ ∈ F[x]^s be a solution with minimal discrepancy among those whose discrepancy equals deg λ_1 − T_1. Then there is ψ ∈ F[x]^ℓ and q = (q_1, …, q_{s+ℓ}) ∈ F[x]^{s+ℓ} such that (λ|ψ) = qB and deg_h(λ|ψ) = deg λ_1 − T_1, i.e. 1 ∈ leads_h(λ|ψ). By Proposition 7.5 there is then a row b_i = (λ′|ψ′) of B with 1 ∈ leads_h(b_i) and deg_h b_i ≤ deg_h(λ|ψ); in particular, λ′ is a solution whose discrepancy equals deg λ′_1 − T_1 and is at most that of λ. To not contradict our choice of λ, equality must hold, and λ′ is a satisfactory solution appearing in a solution specification.
A complete algorithm for solving Problem 3.2 using solution specifications of SH Padé approximations is given as Algorithm 2.
Proof. Correctness follows from the results of this section and the last; note that the requirements of Proposition 7.8 are satisfied in our case and that D ∈ O(sn). For complexity, we merely need to argue that computing the solution specification of the SH Padé approximation dominates. Indeed, A can be computed using dynamic programming in O(ℓ s M(sn)) by remarking that each R^{t−i} G^i can be computed as the product of two previously computed terms of roughly half the size. The only other non-trivial computation is that of ψ, which can also be carried out in complexity O(ℓ s M(sn)).
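The dynamic-programming remark can be illustrated abstractly: memoise the terms R^a G^b and build each one as the product of two previously computed terms of roughly half the total size. In the toy check below, integers stand in for the polynomials R and G, and `make_term` is our illustrative name:

```python
def make_term(R, G, mul, one):
    """Return term(a, b) computing R^a * G^b, where each term is built
    from two memoised terms of roughly half the total size a + b."""
    memo = {(0, 0): one, (1, 0): R, (0, 1): G}

    def term(a, b):
        if (a, b) not in memo:
            h = (a + b) // 2          # split the exponent budget in half
            a1 = min(a, h)
            b1 = h - a1
            memo[(a, b)] = mul(term(a1, b1), term(a - a1, b - b1))
        return memo[(a, b)]

    return term

# toy check: R = 2, G = 3 as stand-ins for polynomials
term = make_term(2, 3, lambda x, y: x * y, 1)
```

With polynomial multiplication in place of integer multiplication, filling the whole table of entries needed for A costs O(ℓ s) products of polynomials of degree O(sn), matching the O(ℓ s M(sn)) bound in the proof.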
Remark 7.11. For short block lengths, it can be of interest to consider the computational complexity when not using fast arithmetic, i.e. taking M(n) = O(n²) and n^ω = O(n³). In this regime, a much simpler algorithm than those mentioned in Corollary 7.10 is to compute a solution basis by applying the Mulders-Storjohann row reduction algorithm [26]. This yields the complexity O(ℓ³s²n²), which is similar to the complexities of interpolation algorithms for the Guruswami-Sudan algorithm in this regime, see e.g. [22, 34, 53].
8. Re-Encoding. "Re-encoding" is a simple technique invented by Kötter and Vardy, originally for reducing the computational burden of the interpolation step in the Guruswami-Sudan algorithm [21]. It is especially powerful when using different multiplicities at each point, such as in the Kötter-Vardy soft-decision decoding version of Guruswami-Sudan [20]. For the regular Guruswami-Sudan algorithm, and in the usual asymptotic analysis where k/n is considered a constant, re-encoding does not change the asymptotic cost; however, it can have a significant practical impact on the running time, especially for higher-rate codes. We will now show that the re-encoding transformation applies readily to Power decoding as well. Let r be the received word. Using Lagrange interpolation, we can easily compute the unique ĉ = ev(f̂) ∈ C such that ĉ and r coincide on the first k positions. Clearly, decoding r̂ = r − ĉ immediately gives a decoding of r, and thanks to Proposition 5.1 we know that Power decoding will succeed on r̂ if and only if it succeeds on r. The idea of re-encoding is that the k leading zeroes of the resulting r̂ can be utilised to reduce the computational cost of decoding r̂.
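The re-encoding transformation itself is a few lines of Lagrange interpolation; the sketch below works over a hypothetical prime field GF(23) and uses our own helper names:

```python
P = 23  # hypothetical prime field for illustration

def lagrange_eval(points, x):
    """Evaluate, at x, the Lagrange interpolant through the given
    (alpha, value) pairs over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def reencode(r, alphas, k):
    """Subtract the codeword agreeing with r on the first k positions;
    the result has zeroes there (the re-encoding transformation)."""
    pts = list(zip(alphas[:k], r[:k]))
    chat = [lagrange_eval(pts, a) for a in alphas]  # the codeword c-hat
    return [(ri - ci) % P for ri, ci in zip(r, chat)]
```

Since the interpolant has degree below k, `chat` is indeed a codeword, and the returned word differs from r by a codeword, so decoding it decodes r.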
Assume therefore for this section that r is the received word after re-encoding, and therefore has k leading zeroes. That means Ĝ | R, where Ĝ = ∏_{i=1}^{k} (x − α_i). Consider the linearised key equations of Problem 3.2. Each of them is now divisible by Ĝ^{min(s,t)}, and so we can divide through by this factor. The elements ψ̂_t = ψ_t / Ĝ^{min(s,t)} and R^{t−i} G^i / Ĝ^{min(s,t)} are all polynomials, but of much lower degree than before. Thus, we can solve directly for the λ_i and ψ̂_t, which have fewer coefficients. The degree restriction on ψ̂_t becomes that on ψ_t reduced by min(s, t)·k = deg Ĝ^{min(s,t)}. The complete decoding algorithm is exactly as Algorithm 1 with Line 3 replaced by the re-encoded key equations, and where f in Line 4 can be computed as f = ψ̂_1 Ĝ / λ_1.
To solve the re-encoded key equations, we proceed exactly as before: the following analogue of Lemma 7.3 links the restrictions on the λ_i and ψ̂_t to an SH Padé approximation, and its proof is analogous to that of Lemma 7.3.

Lemma 8.1. Consider a received word r whose first k positions are 0, let τ ∈ Z_{≥0} be a chosen decoding radius and assume at most τ errors occurred. Consider the SH Padé approximation defined by A = [A_{i,t}] ∈ F[x]^{s×ℓ} as well as Γ_t, T_i and N_t, given by the re-encoded key equations. Let λ = (λ_1, …, λ_s) be a solution to this SH Padé approximation with minimal discrepancy among solutions whose discrepancy equals deg λ_1 − T_1. Then (λ, ψ) is a minimal solution to the instance of Problem 3.2 corresponding to r, where ψ = λA rem (Γ_1, …, Γ_ℓ).
As mentioned, the asymptotic complexity of solving the SH Padé approximation of Lemma 8.1 is not lower than that of Corollary 7.10 in the usual asymptotic regime where k ∈ Θ(n). However, considering Proposition 7.8, the quantity D becomes s(τ + n − k) rather than s(τ + n). This should give a noticeable constant-factor speed-up for the complete decoding algorithm.

9. Syndrome Key Equations. As described in Section 2, the first key equation decoding algorithm was based on the notion of a syndrome polynomial [8], and similarly, Power decoding without multiplicities was first described using a similar list of key equations [42]. The key equations of Theorem 3.1 can likewise be rewritten to be based on syndrome polynomials, which we will show in this section. As is usual for syndrome-formulated key equations, we will assume that 0 is not used as an evaluation point; therefore x ∤ G. Furthermore, due to a non-essential technicality, we will assume s < n. If this did not hold, the following analysis of parameters would be slightly more complicated, but not impossible.
Recall the reversal operator rev_d(p) which we defined in Section 2.2. For a given value of the multiplicity s, define the following variants of the powered Lagrange interpolant R, as well as a generalised notion of syndrome:

R^{(i,t)} := R^{t−i} mod G^{s−i} ,    S^{(i,t)} := rev(R^{(i,t)}) / rev(G)^{s−i} .
Note the degree at which the reversal operator in rev(R^{(i,t)}) is applied: if t − i ≤ s − i, then R^{(i,t)} = R^{t−i}, so the degree upper bound is (t − i)(n − 1). If t − i > s − i, then deg R^{t−i} > deg G^{s−i} since we have assumed s < n, and therefore deg R^{(i,t)} ≤ (s − i)n − 1.
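Since rev(G) has a non-zero constant term (the leading coefficient of G), dividing by rev(G)^{s−i} can be carried out as a truncated power-series inversion. A minimal sketch over a hypothetical prime field GF(23), with our own helper names:

```python
P = 23  # hypothetical prime field for illustration

def rev(f, d):
    """rev_d(f): reverse the coefficients of f regarded as degree <= d."""
    assert len(f) <= d + 1
    padded = list(f) + [0] * (d + 1 - len(f))
    return padded[::-1]

def inverse_mod_x(g, m):
    """Power-series inverse of g modulo x^m (g[0] must be non-zero)."""
    inv = [pow(g[0], P - 2, P)] + [0] * (m - 1)
    for i in range(1, m):
        # coefficient i of g*inv must vanish: solve for inv[i]
        acc = 0
        for j in range(1, i + 1):
            acc = (acc + (g[j] if j < len(g) else 0) * inv[i - j]) % P
        inv[i] = (-acc * inv[0]) % P
    return inv
```

A syndrome such as S^{(i,t)} truncated to the modulus of its key equation is then a product of rev(R^{(i,t)}) with the truncated inverse of rev(G)^{s−i}.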
If s = 1, then S^{(1,1)} equals the classical syndrome polynomial S which we used in Section 2.2, and S^{(1,t)} equals the syndromes S^{(t)} discussed in Section 2.3. The syndromes S^{(i,t)} also appear (with a slightly different definition) in the interpolation key equations for Guruswami-Sudan by Gentner et al. [53]. We can then formulate the (markedly more involved) syndrome variant of Theorem 3.1:

Theorem 9.1. For any s, ℓ ∈ Z_+ with ℓ ≥ s, there exist g_t ∈ F[x] for t = s, …, ℓ such that

∑_{i=0}^{t} \binom{t}{i} rev(Λ^{s−i} Ω^i) S^{(i,t)} ≡ 0 mod x^{t(n−k)} for t = 1, …, s − 1 ,
∑_i \binom{t}{i} rev(Λ^{s−i} Ω^i) S^{(i,t)} x^{ι_{i,t}} ≡ rev(g_t) mod x^{̺_t} for t = s, …, ℓ ,

for suitable shifts ι_{i,t} and moduli x^{̺_t}.

Proof. We need to distinguish between two cases: t < s and t ≥ s. Assume first t < s. Since R^{(i,t)} = R^{t−i}, Theorem 3.1 gives us an equality whose left-hand side has degree at most ǫs + t(n − 1), obtained by counting the degree upper bounds. Every term in the sum has the same degree bound, so reversing at this degree we get

∑_{i=0}^{t} \binom{t}{i} rev(Λ^{s−i} Ω^i) rev(R^{(i,t)}) rev(G)^i = rev(Λ^s f^t) x^{t(n−k)}
⟹ ∑_{i=0}^{t} \binom{t}{i} rev(Λ^{s−i} Ω^i) rev(R^{(i,t)}) rev(G)^i ≡ 0 mod x^{t(n−k)}
⟺ ∑_{i=0}^{t} \binom{t}{i} rev(Λ^{s−i} Ω^i) S^{(i,t)} ≡ 0 mod x^{t(n−k)} ,

where the last line follows from rev(G)^s being invertible modulo x^{t(n−k)}. This concludes the case t < s.
For the case t ≥ s, we proceed similarly. In the congruence of Theorem 3.1, we can readily replace R^{t−i} G^i with R^{(i,t)} G^i modulo G^s; this introduces the polynomial g_t of the theorem statement, for some rev(g_t) ∈ F[x]. Counting degrees on the right-hand side immediately bounds deg g_t as the theorem states; note that this degree bound equals ̺_t + sǫ + t(k − 1) in all cases. We can now reverse the equation as in the previous case. When t > s, the degree bounds on the summands are not all the same, so each summand must be multiplied by an appropriate power of x to align the reversals, which is the origin of the factors x^{ι_{i,t}}.

10. Conclusion. The exact failure behaviour of the decoding method remains largely open. For s = 1, i.e. the original Power decoding, the failure probability has previously been bounded only for ℓ = 2, 3. The case s > 1 seems no easier to analyse: Proposition 5.1 simplifies the equations one needs to analyse, and this was instrumental in the one case for which we were able to bound the failure probability, (s, ℓ) = (2, 3). For these parameters, the decoding radius improves upon the case s = 1 whenever the rate is within ]1/6; 1/2[. The claimed decoding radius of the decoder for other parameters was backed by simulations on a range of codes: these demonstrate a failure probability which seems to decay exponentially as the number of errors is reduced.
We also discussed two variants of the decoding method which reduce the cost in practice: re-encoding and a syndrome formulation. Either method roughly replaces the complexity's dependency on n with n − k. A more detailed analysis, including concrete choices of basis reduction algorithms, is necessary to determine which one is fastest in practice.
The proposed decoding algorithm has already been adapted to improve the decoding of Interleaved RS codes [36]. Power decoding has previously been applied to other codes as well, e.g. Complex RS codes [25], and it seems clear that the proposed addition of multiplicities can aid those applications as well. Another interesting question is to extend Power decoding to soft-decision decoding, similar to the Kötter-Vardy variant of the Guruswami-Sudan algorithm [20].
11. Acknowledgements. The author would like to thank Vladimir Sidorenko, Martin Bossert and Daniel Augot for discussions on Power decoding and this paper. The author gratefully acknowledges the support of the Digiteo foundation, project IdealCodes, while he was with Inria, and also, while the author was with Ulm University, the support of the German Research Council "Deutsche Forschungsgemeinschaft" (DFG) under grant BO 867/22-1.

Proposition 5.1. The success of Power decoding r = c + e depends only on the error e.

Definition 7.6. Consider a given SH Padé problem with A ∈ F[x]^{s×ℓ} as well as Γ_t, T_i and N_t. Let B′ be any matrix which is left-equivalent 3 to M and h-row reduced, where M and h are as in Lemma 7.4. Let B ∈ F[x]^{m×(s+ℓ)} consist of the rows of B′ with negative h-degree. Then B is a solution basis to the SH Padé problem.
To allow for efficient solving, we forget this relation and replace rev(Λ) and −rev(Ω) by unknowns λ and ω, and solve for the minimal-degree λ satisfying λS ≡ ω mod x^{n−k} and deg λ > deg ω. This time the modulus is a power of x; solving such an equation for λ and ω is a classical shift-register synthesis problem.

[Table 1 columns: [n, k]_q | (s, ℓ) | τ_Pow | P_f(⌊τ_Pow⌋ − 1) | P_f(⌊τ_Pow⌋) | P_f(⌊τ_Pow⌋ + 1) | τ_bnd]

The result of [37] - which is not yet published - allows a faster algorithm if we only compute the first s columns of a solution basis:

Definition 7.7. Consider a given SH Padé problem with A ∈ F[x]^{s×ℓ} as well as Γ_t, T_i and N_t. A solution specification is a matrix L ∈ F[x]^{m×s} together with discrepancies δ_1, …, δ_m < 0 such that there is a matrix B ∈ F[x]^{m×ℓ} for which [L | B] is a solution basis whose rows have h-degrees δ_1, …, δ_m.

Proposition 7.8 ([37]). Consider an SH Padé approximation problem with A ∈ F[x]^{s×ℓ} as well as Γ_i, T_i, N_t, satisfying s < ℓ, T_i < deg lcm(Γ_1, …, Γ_ℓ) for i = 1, …, s, and N_t < deg Γ_t for t = 1, …, ℓ. There exists an algorithm which computes a solution specification; for our parameters, this is what yields the complexity O∼(s² ℓ^{ω−1} n) of Corollary 7.10.

Algorithm 2: Solving Problem 3.2 using SH Padé approximation
Input: R, G ∈ F[x] and s, ℓ, τ ∈ Z_+ with s ≤ ℓ.
Output: A minimal solution (λ|ψ) to Problem 3.2 if one exists, or fail.
1. Compute A ∈ F[x]^{s×ℓ} as in Lemma 7.3, and set Γ_t, T_{i+1}, N_t as in that lemma.
2. (L, δ) ← a solution specification of the SH Padé problem given by A and Γ_t, T_{i+1}, N_t.
3. λ ← a minimal h-degree row of L among rows with 1 ∈ leads_h(λ), where h is as in Lemma 7.4. If there is no such row, return fail.
4. ψ ← λA rem (Γ_1, …, Γ_ℓ).
5. return (λ|ψ).