On the condition number of the critically-scaled Laguerre Unitary Ensemble

We consider the Laguerre Unitary Ensemble (also known as the Wishart Ensemble) of sample covariance matrices $A = XX^*$, where $X$ is an $N \times n$ matrix with iid standard complex normal entries. Under the scaling $n = N + \lfloor \sqrt{ 4 c N} \rfloor$, $c>0$ and $N \rightarrow \infty$, we show that the rescaled fluctuations of the smallest eigenvalue, largest eigenvalue and condition number of the matrices $A$ are all given by the Tracy--Widom distribution ($\beta = 2$). This scaling is motivated by the study of the solution of the equation $Ax=b$ using the conjugate gradient algorithm, in the case that $A$ and $b$ are random: for such a scaling the fluctuations of the halting time for the algorithm are empirically seen to be universal.


Introduction
Consider the sample covariance matrix $A = XX^*$ where $X$ is an $N \times n$ matrix with iid entries drawn from some distribution $F$. Our main result is a limit theorem (see Theorem 1.3) for the condition number of these matrices when $F \sim X_c$, where $X_c$ has the standard complex normal distribution, and

$$n = N + \alpha, \qquad \alpha = \lfloor \sqrt{4cN} \rfloor, \qquad c > 0. \tag{1.1}$$

The study of this scaling is motivated by a computational problem discussed below. In the case that $F \sim X_c$ we refer to the matrices $\{A\}$ as lying in the Laguerre Unitary Ensemble (LUE).
The Tracy--Widom ($\beta = 2$) distribution [25] is given by $F_2(s) = \det(I - K_{\mathrm{Ai}}|_{L^2((s,\infty))})$. Here $K_{\mathrm{Ai}}|_{L^2((s,\infty))}$ represents the integral operator with kernel $K_{\mathrm{Ai}}$ acting on $L^2((s,\infty))$. The function $F_2(s)$ is the distribution function for the largest eigenvalue of a random Hermitian matrix in the edge-scaling limit as $N \to \infty$. For a positive Hermitian matrix $A$, let $\lambda_{\max}$ be the largest eigenvalue of $A$, $\lambda_{\min}$ be the smallest, and let $\kappa = \lambda_{\max}/\lambda_{\min}$ be the condition number. Fix $c > 0$. We prove Theorems 1.1, 1.2 and 1.3 below.

History of the problem. The study of the eigenvalues, and in particular the condition number, of random positive definite matrices has a rich history in mathematics and statistics, going back at least to the seminal paper of Goldstine and von Neumann [10]. The exact distributions of the largest and smallest eigenvalues of sample covariance matrices with iid columns were computed in [23] and [17], respectively, in terms of infinite series and hypergeometric functions for any finite $N$ and $n$. When $F$ is either a standard real or standard complex Gaussian distribution and $\alpha = 0$, Edelman [7] determined the scaling limit of the smallest and largest eigenvalues and the condition number as $N \to \infty$. It can also be shown that the condition number distribution is heavy-tailed for finite $N$ because the density of $\lambda_{\min}$ does not vanish near zero: $\lambda_{\min}$ is exponentially distributed with parameter $N/2$ [7]. As noted by Johnstone [15], a by-product of Johansson's work on last-passage percolation [14] is that the fluctuations of the largest eigenvalue of LUE with $\alpha$ fixed are given in terms of the Tracy--Widom distribution $F_2(t)$ as $N \to \infty$. When $\alpha = cN$, $c > 0$, it can be shown that the smallest eigenvalue also has Tracy--Widom fluctuations in the $N \to \infty$ limit [1] and that the condition number has fluctuations given by the convolution of two independent, but scaled, Tracy--Widom distributions. This result does not appear to be explicitly stated in the literature, but it follows from the asymptotic independence of the extreme eigenvalues [2]. See [13] for the case of n N 3.
In terms of interpolation between the limiting condition number distribution $F_E(t)$ at $\alpha = 0$ and the convolution of two Tracy--Widom distributions when $\alpha = cN$, we see that the scaling (1.1) is sufficiently strong to force $\lambda_{\min}$ away from zero and give pure Tracy--Widom statistics for the condition number. Assuming independence of the smallest and largest eigenvalues, Theorems 1.1 and 1.2 express the fluctuations of $\kappa$ in terms of $\xi^{(1)}_{\mathrm{GUE}}$ and $\xi^{(2)}_{\mathrm{GUE}}$, iid random variables with $P(\xi^{(1)}_{\mathrm{GUE}} \le t) = F_2(t)$. Then, using a Neumann series, we have the formal expansion (1.3). If $\alpha = cN$, the first two terms in this expansion are of the same order and dominate the expansion. Thus, it is clear why the convolution is involved. Also, for $\alpha = 0$ it is clear that such an expansion cannot be justified. In the case of (1.1), $\alpha \ll \nu$ and just the first term is dominant. Lemma 4.6 rigorously justifies the formal expansion in this case. It does not appear that the scaling (1.1) has been treated previously in the literature.
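Although the theorems are asymptotic, the prediction is easy to probe numerically. The following is a minimal Monte Carlo sketch (sizes, seeds and function names are our own illustrative choices, not the paper's) that samples the condition number of LUE under the scaling (1.1):

```python
import numpy as np

def sample_lue_condition_number(N, c=1.0, rng=None):
    """Sample kappa = lambda_max / lambda_min for A = X X^* under (1.1)."""
    rng = rng if rng is not None else np.random.default_rng()
    n = N + int(np.floor(np.sqrt(4 * c * N)))     # n = N + floor(sqrt(4 c N))
    # iid standard complex normal entries
    X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
    eigs = np.linalg.eigvalsh(X @ X.conj().T)     # real spectrum, ascending order
    return eigs[-1] / eigs[0]

# e.g. np.array([sample_lue_condition_number(200) for _ in range(2000)])
# should, after centering and scaling, resemble Tracy-Widom fluctuations
```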
Remark 1.1. Our calculations in this paper are for the case $\alpha = (4cN)^\gamma$, $\gamma = 1/2$. Note that the first term on the right-hand side of (1.3) is still dominant for $0 < \gamma < 1$. For this reason we anticipate a similar Tracy--Widom limit theorem for the condition number for $0 < \gamma < 1$. In addition, we anticipate that the conclusions of the numerical studies that we discuss below when $\gamma = 1/2$ will be unchanged for $0 < \gamma < 1$.
Our method of proof makes use of the asymptotics of Laguerre polynomials $\{L^{(\alpha)}_n(x)\}_{n \ge 0}$, which are orthogonal with respect to the weight $x^\alpha e^{-x}\,dx$ on $[0,\infty)$. These asymptotics are derived via the Deift--Zhou method of nonlinear steepest descent as applied to orthogonal polynomials [4] (see also [3] for an introduction). For fixed $\alpha$, this problem was addressed by Vanlessen [27] for the generalized weight $x^\alpha e^{-Q(x)}\,dx$ (see also [21] for $Q(x) = x$). From the classical work of Szegő [24] and [27], the asymptotic expansion of $L^{(\alpha)}_N(x)$ as $N \to \infty$ near $x = 0$ is given in terms of Bessel functions. Forrester [9] noted that this implies the statistics of the smallest eigenvalue are given in terms of a determinant involving the so-called Bessel kernel.
As is seen in Theorem 1.1, under the scaling (1.1), the asymptotics of the Laguerre polynomials are given in terms of the Airy function, giving rise to the Airy kernel and producing Tracy--Widom statistics. This was noted first by Forrester [9]. This transition from Bessel to Airy can also be seen by considering the weight $x^\alpha e^{-x - t/x}\,dx$ for varying $t$ [28]. The difference here is that the transition is induced via the parameter $\alpha$ that is naturally present in the Laguerre polynomials.
A computational problem. Our motivation for considering the scaling (1.1) comes from a computational problem. In numerical analysis, the condition number of a positive-definite $N \times N$ matrix $A$ is arguably the most important scalar quantity associated to the matrix. Specifically, it controls the loss in precision that is expected when solving the system $Ax = b$. The condition number can also be tied directly to the difficulty encountered in solving the system by iterative methods. This is evident, in particular, in the conjugate gradient algorithm used to solve $Ax = b$ [12]. The conjugate gradient algorithm is stated as follows (see [11] for an overview): Given an initial guess $x_0$ (we use $x_0 = b$), compute $r_0 = b - Ax_0$ and set $p_0 = r_0$. For $k = 1, \ldots, N$:

1. Compute the residual $r_k = r_{k-1} - a_{k-1} A p_{k-1}$ where $a_{k-1} = \dfrac{\langle r_{k-1}, r_{k-1} \rangle_2}{\langle p_{k-1}, A p_{k-1} \rangle_2}$.
2. Compute the iterate $x_k = x_{k-1} + a_{k-1} p_{k-1}$.
3. Compute the next search direction $p_k = r_k + b_{k-1} p_{k-1}$ where $b_{k-1} = \dfrac{\langle r_k, r_k \rangle_2}{\langle r_{k-1}, r_{k-1} \rangle_2}$.

A runnable sketch of these steps is given below.
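The following is a minimal Python sketch of the iteration (1)-(3), together with the halting criterion used below; the function name, the max_steps cap and other conventions are our own illustrative choices.

```python
import numpy as np

def conjugate_gradient_halting_time(A, b, eps=1e-10, max_steps=None):
    """Run CG on Ax = b with x0 = b; return the first k with ||r_k||_2 <= eps."""
    if max_steps is None:
        max_steps = 10 * A.shape[0]          # slack for finite-precision slowdown
    x = b.copy()                             # initial guess x0 = b
    r = b - A @ x                            # r0 = b - A x0
    p = r.copy()                             # p0 = r0
    rho = np.vdot(r, r).real                 # <r, r>_2
    for k in range(1, max_steps + 1):
        Ap = A @ p
        a = rho / np.vdot(p, Ap).real        # a_{k-1} in step 1
        x = x + a * p                        # step 2: update the iterate
        r = r - a * Ap                       # step 1: update the residual
        rho_new = np.vdot(r, r).real
        if np.sqrt(rho_new) <= eps:          # halting criterion ||r_k||_2 <= eps
            return k
        p = r + (rho_new / rho) * p          # step 3: b_{k-1} = <r_k,r_k>/<r_{k-1},r_{k-1}>
        rho = rho_new
    return max_steps
```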
If $A$ is strictly positive definite, $x_k \to x = A^{-1}b$ as $k \to \infty$. Geometrically, the iterates $x_k$ are the best approximations of $x$ over larger and larger affine Krylov subspaces $K_k$, as $k \uparrow N$. In exact arithmetic, the method takes at most $N$ steps; in calculations with finite-precision arithmetic the number of steps can be much larger than $N$. The quantity one monitors over the course of the conjugate gradient algorithm is the norm $\|r_k\|_2$. By [16], the residuals satisfy a bound of the form

$$\frac{\|r_k\|_2}{\|r_0\|_2} \le 2\sqrt{\kappa}\left(\frac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^{k}, \tag{1.4}$$

from which we see that the larger $\kappa$ is, the slower the convergence. It is remarkable that the bound (1.4) does not depend explicitly on $n$, only implicitly through $\kappa$. We note that (1.4) was derived in [16] under the assumption of exact arithmetic, but (1.4) is also useful in calculations with finite precision, provided the effect of rounding errors is anticipated to be small. We are interested in the statistical behavior of the conjugate gradient algorithm (1)-(3) when $A$ and $b$ are chosen randomly. Let $b$ have iid entries distributed according to a distribution $\bar F$ which may differ from the matrix-entry distribution $F$. We use the pair $E = (F, \bar F)$ to refer to the ensemble, encoding the distribution of the entries of both $A$ and $b$ in $Ax = b$. Let $\epsilon > 0$, $E$, $N > 0$ and $n > 0$ be given. For a pair $(A, b)$ we define the halting time $T_{\epsilon,E,N,n} = T_{\epsilon,E,N,n}(A, b)$ to be the smallest integer such that $\|r_k\|_2 = \|r_k(A, b)\|_2 \le \epsilon$. The residuals in the conjugate gradient method decrease monotonically, thus $\|r_k(A, b)\|_2 \le \epsilon$ for $k \ge T_{\epsilon,E,N,n}$.
In [6], the authors performed a numerical study of $T_{\epsilon,E,N,n}$ as a random variable. For $n = N + \lfloor 2\sqrt{N} \rfloor$, Monte Carlo simulations for different ensembles $E$ were used to show that the fluctuations of $T_{\epsilon,E,N,n}$ had a universal limiting form. (By universal, we mean that the histogram for $\tau_{\epsilon,E,N,n}$ is independent of $E$ for $N$ sufficiently large and $\epsilon$ sufficiently small.) The motivational goal of this paper is to further our understanding of the universality that is observed in [6] under the scaling (1.1). To this end, we focus on the condition number $\kappa = \kappa_{N,n} = \kappa_{N,n}(A)$ of the matrix $A$ and use the rigorous results above, numerical simulations of $\tau_{\epsilon,E,N,n}$ [6] and $\kappa_{N,n}$, and the estimate (1.4), to infer properties of the halting time distribution. Numerical results presented in Appendices A.1-A.3 illustrate the sharply different behavior of the condition number and halting time distributions in the cases $\alpha = 0$, $\alpha = 2\sqrt{N}$ and $\alpha = N$. Universality appears to depend critically on the choice $\alpha = 2\sqrt{N}$: if $\alpha = 0$ or $\alpha = N$, the histogram for $\tau_{\epsilon,E,N,n}$ does not appear to have a universal form. The numerical experiments also reveal an interplay between the tightness of the condition number distribution and the tightness of the halting time distribution that is consistent with the following upper bound on $T_{\epsilon,E,N,n}$, obtained by taking a logarithm of (1.4):

$$T_{\epsilon,E,N,n} \le \frac{\log\left(\epsilon/(2\sqrt{\kappa}\,\|r_0\|_2)\right)}{\log\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)} + 1. \tag{1.5}$$

In order to use this bound to estimate the expectation of $T^j_{\epsilon,E,N,n}$, $j = 1, 2, \ldots$, one must estimate the corresponding moments of the right-hand side of (1.5). We make some observations (a small evaluation sketch follows this list):

(i) The bound (1.5) does not provide an a priori bound on the higher moments of $T_{\epsilon,E,N,n}$, $N$ and $n$ fixed, if the distribution $P(\kappa_{N,n} \le s)$ is heavy-tailed.
(ii) The right-hand side of (1.5) may diverge as $N \to \infty$ even when all moments are finite for finite $N$.
(iii) When the system is well-conditioned, for example if $\kappa_{N,n} = C + N^{-a}\xi$, $a > 0$, $C \ge 1$, for a fixed, positive random variable $\xi$ with exponential tails, all moments of $T_{\epsilon,E,N,n}$ are bounded by an $N$-independent constant.
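For reference, the right-hand side of (1.5) is trivial to evaluate once $\kappa$ is sampled; a sketch follows (using the reconstructed constant from (1.4), which should be treated as illustrative):

```python
import numpy as np

def halting_time_bound(kappa, r0_norm, eps):
    """Right-hand side of (1.5): an upper bound on the CG halting time."""
    rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
    return np.log(eps / (2.0 * np.sqrt(kappa) * r0_norm)) / np.log(rho) + 1.0
```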
In Appendix A.1, we present numerical experiments on both $\kappa_{N,n}$ and $\tau_{\epsilon,E,N,n}$ in the ill-conditioned case, $n = N$, assuming $\bar F$ is uniform on $[-1, 1]$ and $F \sim X_c$ or $F$ is Bernoulli. The numerical results in Figures 11(b) and 11(c) show that $\tau_{\epsilon,E,N,n}$ does not converge as $n, N \to \infty$ (note from Figure 11(b), in particular, that the kurtosis for both distributions does not converge). This case illustrates point (i): from [7] we see that the condition number distribution has infinite expectation and hence the right-hand side of (1.5) is infinite. This is consistent with the numerical results, since the empirical mean of the halting time is (much) larger than $N$. Since the maximum number of steps in the conjugate gradient method is $N$ in exact arithmetic, this also shows that round-off errors have degraded the accuracy of the computation.
The well-conditioned case, $n = 2N$, is considered in Appendix A.2. We assume $\bar F$ is uniform on $[-1, 1]$ and $F \sim X_c$ or $F$ is Bernoulli. In contrast with the case above, the condition number $\kappa_{N,n}$ satisfies (iii) in the limit $N \to \infty$ with $a = 2/3$, implying, in particular, by (1.5), that the moments of $T_{\epsilon,E,N,n}$ remain bounded as $N \to \infty$. The numerical results in Figures 12(b) and 12(d) indicate that the random variable $\tau_{\epsilon,E,N,n}$ remains discrete in the large $N$ limit. Indeed, this is necessarily true for any sequence of integer-valued random variables $(X_N)_{N \ge 0}$ with $\sup_N \mathrm{Var}[X_N] < \infty$: clearly, in such a case the fluctuations cannot have a limiting distribution with a density. The only universal limit possible here is a statistically trivial point mass.

Finally, we turn to the critical scaling, $n = N + 2\sqrt{N}$, in Appendix A.3. It follows from Theorem 1.3 that the moments of the condition number diverge as $N \to \infty$. As in (ii), the estimates on the right-hand side of (1.5) diverge as $N \to \infty$. This divergence is consistent with an empirically observed divergence in the mean and variance of $T_{\epsilon,E,N,n}$. Indeed, in contrast with the well-conditioned case, such a divergence is necessary to obtain a non-trivial limiting distribution for $\tau_{\epsilon,E,N,n}$. Further, in contrast with the ill-conditioned case, the empirical mean of the halting time, while large, is (much) smaller than $N$, and rounding errors do not appear to play a dominant role.

The paper is organized as follows. In Section 2 we review the global eigenvalue density for LUE and discuss its connection to Laguerre polynomials and hence to a Riemann-Hilbert problem. In Section 3, we use classical Riemann-Hilbert analysis to rigorously determine the asymptotics of the Laguerre polynomials that appear in the global eigenvalue density. In Section 4 we use these asymptotics to prove limit theorems for the distribution of the largest and smallest eigenvalues, along with the condition number; this final section contains the proofs of the main results. As the behavior of the conjugate gradient algorithm is universal with respect to the choice of $E$ above, it is sufficient to consider one particular ensemble. For this reason we have decided to study the analytically tractable case $E = \mathrm{LUE}$.

We include a table of notation to guide the reader:
• $2N(2\log 2 + 1)$, page 12
• $\log_\to$: the branch of the logarithm that is cut on $[0, \infty)$ with a real limit from above, page 10
• $(\cdot)^\gamma_\to$: the branch of the root that is cut on $[0, \infty)$ with a positive limit from above, page 10
• the analytic prefactor for $S_\leftarrow(z)$, page 20

We first recall the definition of the Laguerre kernel. These ideas are well-known and may be found in [9, Section 2]. Let $A = XX^*$ where $X$ is an $N \times (N + \alpha)$ matrix of iid standard complex Gaussian random variables. Then the eigenvalues $0 \le \lambda_{\min} = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_N = \lambda_{\max}$ of $A$ have the joint probability density

$$p(\lambda_1, \ldots, \lambda_N) = \frac{1}{Z_N} \prod_{1 \le j < k \le N} (\lambda_k - \lambda_j)^2 \prod_{j=1}^N \lambda_j^\alpha e^{-\lambda_j},$$

where $Z_N$ is a normalization constant. The statistics of eigenvalues are more conveniently expressed as determinants involving Laguerre polynomials.
Recall that the Laguerre polynomials, $\{L^{(\alpha)}_n\}_{n \ge 0}$, are a family of orthogonal polynomials on $[0, \infty)$, orthogonal with respect to the weight $e^{-x} x^\alpha$. We normalize them as follows [20]:

$$\int_0^\infty L^{(\alpha)}_n(x) L^{(\alpha)}_m(x)\, x^\alpha e^{-x}\,dx = \frac{\Gamma(n + \alpha + 1)}{n!}\,\delta_{nm}.$$

We then define the associated wavefunctions, orthonormal with respect to Lebesgue measure on $[0, \infty)$,

$$\phi_k(x) = \sqrt{\frac{k!}{\Gamma(k + \alpha + 1)}}\, x^{\alpha/2} e^{-x/2} L^{(\alpha)}_k(x),$$

and the correlation kernel

$$K_N(x, y) = \sum_{k=0}^{N-1} \phi_k(x)\phi_k(y).$$

The kernel $K_N$ defines a positive, trace-class operator on $L^2([a, b])$. Since $K_N$ has finite rank, it is clearly trace class. To see that $K_N$ is positive, consider $f \in C^\infty((s, t))$ with compact support and note that $\langle K_N f, f \rangle = \sum_{k=0}^{N-1} |\langle \phi_k, f \rangle|^2 \ge 0$. It is by now classical that the statistics of the eigenvalues $\lambda_1 \le \cdots \le \lambda_N$ may be expressed in terms of Fredholm determinants of the kernel $K_N$ [3, 9] (which are well-defined since $K_N$ is trace-class). In particular, the statistics of the extreme eigenvalues are recovered from the determinantal formula for the probability that no eigenvalue lies in a given interval. By the Christoffel--Darboux formula [24], we may also write $K_N(x, y)$ in terms of $\phi_N$ and $\phi_{N-1}$ alone, divided by $x - y$. Thus, questions about the asymptotic behavior of $K_N(x, y)$ as $N \to \infty$ reduce to the study of the large $N$ asymptotics of $L^{(\alpha)}_N$. What is new in this paper is the study of these asymptotics in the scaling regime for $\alpha$, see (1.1).
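For small $N$ the kernel can be evaluated directly from these definitions; the following sketch (our own naming, assuming modest $N$ and $\alpha$ so that no overflow occurs) uses the orthonormal wavefunctions above:

```python
import numpy as np
from scipy.special import eval_genlaguerre, gammaln

def wavefunction(k, alpha, x):
    """phi_k(x) = sqrt(k! / Gamma(k+alpha+1)) x^{alpha/2} e^{-x/2} L_k^{(alpha)}(x)."""
    log_norm = 0.5 * (gammaln(k + 1) - gammaln(k + alpha + 1))
    return np.exp(log_norm + 0.5 * alpha * np.log(x) - 0.5 * x) * eval_genlaguerre(k, alpha, x)

def laguerre_kernel(N, alpha, x, y):
    """K_N(x, y) = sum_{k=0}^{N-1} phi_k(x) phi_k(y)."""
    return sum(wavefunction(k, alpha, x) * wavefunction(k, alpha, y) for k in range(N))
```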

The Riemann-Hilbert approach to Laguerre polynomials
To compute the asymptotics of the Laguerre polynomials we use their representation in terms of the solution of a Riemann-Hilbert problem and follow [27] for the general theory. We also refer to [21] for some explicit calculations. We note that we use Riemann-Hilbert theory, as opposed to the integral representation for Laguerre polynomials, to determine the appropriate asymptotics. This is because in the scaling region of interest the Riemann-Hilbert method gives a direct and algorithmic approach to the difficulties arising from turning point considerations. Define the rescaled polynomials $\hat L_N(x)$ (cf. [21]). Here $\{\pi_j(x)\}$ are the monic orthogonal polynomials for the rescaled weight. This scaling is chosen so that the asymptotic density of the zeros of $\hat L_N(x)$ as $N \to \infty$ is supported on the interval $[0, 1]$ (see [21] for $\alpha$ fixed). Following [8] we define $Y(z)$ in terms of $\pi_N$, $\pi_{N-1}$ and their Cauchy transforms, where $C_\Gamma$ denotes the Cauchy integral operator and the weight is given by (2.1). In the remainder of the manuscript we use the notation $Y_\pm(z)$, $z \in \Gamma$, to denote the boundary values of an analytic function $Y(z)$ from the left ($-$) or the right ($+$) side as one moves along an oriented contour $\Gamma$. See Figure 1 for a schematic of a contour $\Gamma$. By general theory, the matrix $Y(z)$ defined in equation (2.6) is the unique solution to the following Riemann-Hilbert problem. (We allow the $\pm$ to be in either the sub- or super-script for notational convenience.)
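For orientation, the problem takes the standard Fokas--Its--Kitaev form; the following is a sketch of its generic statement, with $w$ denoting the rescaled weight (the precise normalization should be read off from (2.6)):

$$Y(z) \text{ analytic in } \mathbb{C} \setminus [0, \infty), \qquad Y_+(x) = Y_-(x)\begin{pmatrix} 1 & w(x) \\ 0 & 1 \end{pmatrix} \text{ for } x \in (0, \infty), \qquad Y(z) = \left(I + O(z^{-1})\right) z^{N\sigma_3} \text{ as } z \to \infty.$$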
Since $\det(Y(z)) \equiv 1$, this identity is equivalent to equation (2.7). The goal of the remainder of this section is to solve for $Y(z)$ asymptotically as $N \to \infty$. This requires a series of explicit transformations or deformations:

• $Y \to T$, which normalizes the behavior at infinity,
• $T \to S$, so that the jump matrices for $S$ tend uniformly to the identity matrix on closed subsets of $\mathbb{C} \setminus [0, 1]$, and
• $S \to E$, where the jump matrices of $E$ tend to the identity matrix in $L^2 \cap L^\infty$.
This procedure is now standard and a general reference is [4].

The first deformation: normalization at infinity
Before we proceed we introduce additional notation to fix branch cuts. For $\gamma \in \mathbb{R}$ let $(z)^\gamma_\leftarrow$ have its branch cut on $(-\infty, 0]$ such that $(z)^\gamma_\leftarrow > 0$ for $z > 0$, i.e., the principal branch. By contrast, let $(z)^\gamma_\to$ have its branch cut on $[0, \infty)$ in order that $(z)^\gamma_{\to,+} > 0$ for $z > 0$, when the real axis is oriented left-to-right, i.e., the limit from above is positive. Note that for $\operatorname{Im} z > 0$, $(z)^\gamma_\to = (z)^\gamma_\leftarrow$, while for $\operatorname{Im} z < 0$ the two branches differ by the factor $e^{2\pi i \gamma}$. We similarly define branches of the logarithm. The map $z \mapsto \log_\leftarrow(z) = \log(z)$ denotes the principal branch of the logarithm, whereas $\log_\to$ has its branch cut on $[0, \infty)$ so that $\log_{\to,+}(z) > 0$ for $z > 1$. It is then clear that

$$\log_\to(z) = \begin{cases} \log_\leftarrow(z), & \text{if } \operatorname{Im} z > 0, \\ \log_\leftarrow(z) + 2\pi i, & \text{if } \operatorname{Im} z < 0. \end{cases}$$
In this notation the left arrow $\leftarrow$ is used for emphasis and signifies the principal branch. It is sometimes omitted when there is no confusion. Also, $\sqrt{\cdot}$ is always used to denote the non-negative square root of a non-negative number.
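These conventions are easy to mirror numerically; a small sketch (using numpy's principal logarithm, with names of our own choosing):

```python
import numpy as np

def log_left(z):
    """Principal branch log_<-: cut on (-inf, 0]."""
    return np.log(complex(z))

def log_right(z):
    """log_->: cut on [0, inf); agrees with log_left for Im z > 0."""
    w = np.log(complex(z))
    return w if w.imag >= 0 else w + 2j * np.pi
```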
As in [5], we remove the polynomial behavior of $Y(z)$ at infinity using the log transform of the equilibrium measure, as follows. In our case $\alpha/N \to 0$ as $N \to \infty$ and the equilibrium measure (EM) is the so-called Marchenko-Pastur distribution [19]

$$d\mu(s) = \frac{2}{\pi}\sqrt{\frac{1 - s}{s}}\,\chi_{[0,1]}(s)\,ds, \qquad (2.8)$$

where $\chi_A$ denotes the characteristic function of the set $A$. We define the log transform $g(z)$ of $\mu$. We also introduce the following functions to simplify the analysis of the asymptotic behavior of $g$ as $z \to \infty$ and its jumps across the interval $(0, \infty)$; see (2.14).

Proof. Parts (i) and (ii) follow directly from the definition of $g(z)$ and $\log_\to(z)$. To establish (iii), we first show that $g(z) + \phi_\to(z)$ extends to an entire function of $z$. First, it is clear that the function is bounded in the finite plane, and analytic for $\operatorname{Re} z < 0$. For $\operatorname{Re} z > 0$ we check the boundary values: for $0 < \operatorname{Re} z < 1$ they agree, and if $z > 1$ we use (2.13) and (2.14), so that (2.15) holds. This shows that $g(z) + \phi_\to(z)$ is an entire function because $0$ and $1$ would have to be bounded, isolated singularities. To establish (2.12) we turn to an explicit integration of $\phi_\to(z)$. Consider the integral for $0 < z < 1$, and then its analytic continuation so that it is analytic in $\mathbb{C} \setminus [0, \infty)$. We find that if $\theta = \arcsin\sqrt{z}$ then the integral can be evaluated in closed form; if we take the $(-)$ sign, we actually need $\pi - \theta$ to obtain the correct branch. From this, the continuation can be written in terms of $\psi_\to(z)$. First, it can be shown that if $\psi_\to(z) = c \in \mathbb{R}$ then $z > 1$. But for $z > 1$, $\pm\psi^\pm_\to(z) > 0$, so that $\psi_\to$ must map into either the upper- or lower-half plane. We find that $\operatorname{Im} \psi_\to(z) > 0$ for $z \in \mathbb{C} \setminus [0, \infty)$. From this we find the analytic continuation to the whole complex plane, where $\log_\leftarrow$ can be replaced with $\log_\to$ because $\operatorname{Im} \psi_\to(z) \ge 0$. Also, for this reason, $\log_\leftarrow \psi_\to(z)$ is analytic anywhere $\psi_\to(z)$ is. This proves (2.12).
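The equilibrium measure (2.8) can be checked against sampled spectra; a sketch follows (with $c = 1$ and illustrative sizes), using the fact that the support $[0, 1]$ corresponds to the rescaling $\lambda \mapsto \lambda/(4N)$:

```python
import numpy as np

N = 400
alpha = int(np.sqrt(4 * N))                        # scaling (1.1) with c = 1
n = N + alpha
rng = np.random.default_rng(0)
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
eigs = np.linalg.eigvalsh(X @ X.conj().T) / (4 * N)

hist, edges = np.histogram(eigs, bins=40, range=(0, 1), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
density = (2 / np.pi) * np.sqrt((1 - centers) / centers)   # d mu / ds from (2.8)
# hist should approximate `density` for large N
```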

The second deformation: lensing
We factor the matrix $J_T$ on $(0, 1)$ into a product of lower-triangular, off-diagonal and upper-triangular factors. This allows a lensing of the problem. Let $\Gamma_\uparrow$ and $\Gamma_\downarrow$ be contours as in Figure 2 and set $S(z) = T(z)$ if $z$ is outside the region enclosed by $\Gamma_\uparrow$ and $\Gamma_\downarrow$, $S(z) = T(z)$ times the appropriate triangular factor if $z$ is inside the region enclosed by $[0, 1]$ and $\Gamma_\uparrow$, and similarly if $z$ is inside the region enclosed by $[0, 1]$ and $\Gamma_\downarrow$.
We obtain the following Riemann-Hilbert problem for S.
Riemann-Hilbert Problem 2.3. The function $S(z)$ satisfies the following properties.

Figure 2: The jump contours $\Gamma$ for $S$. The region bounded by $\Gamma_\uparrow$ and $\Gamma_\downarrow$ is called the "lens".
This deformation is valid for any choice of contours Γ ↑ and Γ ↓ arranged as in Figure 2.
Proof. We now obtain an upper bound on the real part of $\phi_\to$. For $\operatorname{Im} z > 0$ the relevant root function is analytic, and it is clear that for $\arg z \in (0, \pi)$ its imaginary part has a fixed sign. From this, a simple estimate follows for $\operatorname{Im} z > 0$. Assume $0 \le \operatorname{Re} z \le 1$, $\operatorname{Im} z > 0$, and consider the real part of the integral defining $\phi_\to$. The first term is purely imaginary and the only contribution to the real part is from the second term. The quantity on the right-hand side is bounded uniformly above by a negative constant provided $\operatorname{Im} z \ge \delta > 0$. A similar argument applies for $\operatorname{Im} z < 0$, resulting in the same estimate. From this discussion, the claimed bound follows for some positive constants $D_\delta$, $c_\delta$. For $z > 1$ we also estimate the real part of $\phi_\to(z)$; the relevant expression is a monotonic, strictly increasing function, giving the estimate for some new positive constants $D_\delta$, $c_\delta$. These estimates hold even in the case $\alpha \to \infty$ because $e^{-(2\alpha + 2)z + \alpha \log z} \le 1$ for $z \ge 1$. The lemma follows from a redefinition of $c_\delta$, $D_\delta$.
We require the following lemma in the sequel.
Proof. We first establish a sign condition on the imaginary part. Assume that there exist two points $a, b \in \mathbb{C}_+$ at which the imaginary parts have opposite signs. It follows that there exists a point $z^*$ on the line that connects $a$ and $b$ at which the imaginary part vanishes, which is impossible because $g$ is positive in the upper-half plane, $f_+(s) = 0$ for $0 < s < 1$ and $f_+(s) > 0$ for $s < 0$. For $z \in \mathbb{C}_-$, $\phi_\to(z) = \overline{\phi_\to(\bar z)}$ because the two functions are equal for $z < 0$. Thus, it remains to show that $\phi^+_\to(z)$ vanishes on $[0, 1]$ only at $z = 0$. This holds because $\phi^+_\to$ is a strictly monotone function on $(0, 1)$, and the lemma follows.
This leads us to consider the solution of the Riemann-Hilbert problem obtained by removing the jumps on Γ ↑ , Γ ↓ and [1, ∞).
Riemann-Hilbert Problem 2.4. We seek the function $S_\infty(z)$ with the following properties. We now show that $S_\infty$ exists by an explicit construction. We first start with the determination of a matrix function $N$ which satisfies the stated conditions. Direct calculation shows that $M = U^{-1} N U$ satisfies the corresponding conditions. It is also important to note the local behavior at the endpoints. We write the jump condition for $S_\infty(z)$ as in (2.24) and make an ansatz involving a scalar function $d(z)$. Then (2.24) turns out to be equivalent to the condition $d_+(z) d_-(z) = e^{\hat w(z)}$. Formally, taking the logarithm and letting $h(z) = \log d(z)$, we reduce to an additive jump condition. To find $h(z)$ we let $h(\infty) = 0$ (so $d(\infty) = 1$) and we recall (2.16).
Following [21], we claim that an explicit contour integral is an appropriate choice. First, one checks the jump for $z > 1$. Finally, $\lim_{z \to \infty} h(z) = \frac{1}{2}\alpha(2\log 2 + 1) + \frac{1}{2}$ follows from (2.16). Then $D(z) = e^{h(z)\sigma_3}$, $D_\infty := e^{(\alpha \log 2 + \frac{1}{2}(\alpha + 1))\sigma_3}$, and $S_\infty$ is assembled from $N$ and $D$. We note that we can also write $h$ in another form using (2.13).

The asymptotics of critically-scaled Laguerre polynomials

From the above calculations, one may guess that $S(z) \approx S_\infty(z)$ in some sense as $N \to \infty$. However, this is not justified because the convergence of the jump matrix of $S$ to the jump matrix of $S_\infty$ is not uniform. In fact, because of the singular behavior of $S_\infty(z)$ at $z = 0$, we will see that $S(z) \not\approx S_\infty(z)$ there. We now develop local parametrices to solve the Riemann-Hilbert problem for $S$ locally near $z = 0$ and $z = 1$. This requires the construction of the so-called Airy and Bessel parametrices.
Remark 3.1. Unless otherwise noted, we use the convention that error terms are interpreted entrywise. Specifically, $O(\cdot)$ should be treated as a scalar, but it can be a different function in each component.

The classical Airy parametrix.
First, we divide the complex plane for the variable $\xi$ into sectors; see Figure 3. Let $\omega = e^{2\pi i/3}$ and let $\mathrm{Ai}(\xi)$ denote the Airy function. Define $P_{\mathrm{Ai}}$ by (3.1). From the asymptotic calculations in Appendix B.1, it follows that $P_{\mathrm{Ai}}$ solves the following Riemann-Hilbert problem: Riemann-Hilbert Problem 3.1.
We will come back to this and use it heavily later. We also rewrite the asymptotics in a more convenient form:

The classical Bessel parametrix.
The Airy parametrix has the characteristic that four contours exit from the origin in the $\xi$ plane. This is the case, locally, near $z = 1$ in our Riemann-Hilbert problem for $S$. Near $z = 0$, the contours look more like those in Figure 4. Define $P_{\mathrm{Bessel}}$ by (3.2). Here $I_\alpha$, $K_\alpha$, $H^{(2)}_\alpha$ and $H^{(1)}_\alpha$ are the modified Bessel and Hankel functions [20]. From the calculations in (B.2) and the jump condition established in [18] we have the following. Figure 4: Dividing the complex plane to define the Bessel parametrix with the contour $\Sigma_{\mathrm{Bessel}} = \beta_1 \cup \beta_2 \cup \beta_3$.
Riemann-Hilbert Problem 3.2. When $\alpha$ is given by (1.1), the function $P_{\mathrm{Bessel}}$ solves the stated problem. As in the case of the Airy parametrix, we rewrite this in a more convenient form. These asymptotics apply for all $\xi$ with $|\arg \xi| < \pi$. Furthermore, the asymptotics remain valid up to the boundary, $\arg \xi = \pm\pi$.

Mapping the Airy parametrix.
As it stands, the function $P_{\mathrm{Ai}}$ solves a Riemann-Hilbert problem that resembles the jumps of $S$ near $z = 1$, but not exactly. We perform a change of variables and pre-multiply by an analytic matrix function to make this match exact. There is an additional constraint: we want the resulting local solution to also match with $S_\infty$ in an appropriate manner. Consider, for $z > 1$, the expansion of $\phi_\leftarrow$, where $\hat G(z)$ has a convergent power series in $s - 1$ with real coefficients. Define the function $f_\leftarrow(z)$, which is analytic in a neighborhood of $z = 1$. Here $(\cdot)^{2/3}$ denotes the principal branch. We then have $(f_\leftarrow(z))^{3/2} = \frac{3}{2}\phi_\leftarrow(z)$ for $z$ close to one. We establish the following facts about $f_\leftarrow$: the first follows from (3.3), and it shows, in particular, that $f_\leftarrow$ is one-to-one (conformal) near $z = 1$.
This follows because $\hat G(z)$ is real for $z$ real.
First, let $\delta'$ be sufficiently small so that $f_\leftarrow$ is one-to-one on $B(1, \delta')$. Let $L > 0$ be the largest value such that $(-L, L)$ lies in the image of the real points. Assume that $\operatorname{Im} f_\leftarrow(a) > 0$ and $\operatorname{Im} f_\leftarrow(b) < 0$ for $a, b \in B(1, \delta) \cap \mathbb{C}_+$. Then on the line that connects $a$ to $b$ there must be a value $z^*$ such that $f_\leftarrow(z^*) \in (-L, L)$, and this contradicts that $f_\leftarrow$ is one-to-one. Thus $f_\leftarrow(B(1, \delta) \cap \mathbb{C}_+)$ must be mapped into either the upper- or lower-half plane.
We see that $\operatorname{Im} f_\leftarrow(1 + i\epsilon) > 0$ for $\epsilon$ sufficiently small, and the claim follows. Figure 6: The pullback of $K_\delta \cap \Sigma_{\mathrm{Ai}}$ to create the contour $\Gamma_{\mathrm{Ai}}$.
Let $\delta > 0$ be sufficiently small so that $f_\leftarrow$ is one-to-one, analytic, and maps $\mathbb{C}_+$ into $\mathbb{C}_+$ when restricted to $B(1, \delta)$. Let $K_\delta = f_\leftarrow(B(1, \delta))$. It follows that $K_\delta$ is an open neighborhood of the origin. Define a contour $\Gamma_{\mathrm{Ai}} := f_\leftarrow^{-1}(K_\delta \cap \Sigma_{\mathrm{Ai}})$. See Figures 5 and 6 for a graphical representation of this procedure and how this affects the precise definition of $\Gamma_\uparrow$ and $\Gamma_\downarrow$ inside the ball $B(1, \delta)$.
We prove the following lemma in Appendix B.3.
Lemma 3.1. The functions $S_\leftarrow(z)$ and $M_{\mathrm{Ai}}(z)$ have the following properties: $M_{\mathrm{Ai}}(z)$ is analytic in a neighborhood of $z = 1$, and $S_\leftarrow(z)$ has the same jumps as $S(z)$ in a neighborhood of $z = 1$.

Mapping the Bessel parametrix.
We perform the same steps for $P_{\mathrm{Bessel}}$ so that it solves the Riemann-Hilbert problem for $S(z)$ near $z = 0$. As before, we want the resulting local solution to match with $S_\infty$; but, as we will see, this is impossible, and this complication will modify the entire discussion that follows.
Consider the expansion for $z < 0$. Now, because $\sqrt{1 - s}$ has a convergent Taylor series about $s = 0$ with real coefficients, we find an expression involving an analytic function $T(z)$ whose Taylor series has real coefficients. Define $f_\to(z)$ accordingly. It is clear that this function is analytic in a neighborhood of $z = 0$. We further note that the corresponding identity holds for $z \in [0, \infty)$ sufficiently small, because $f_\to(z)$ and $-\phi_\to(z)$ are both positive for $z < 0$. We establish the following facts concerning $f_\to$: the first follows directly from the definition of $f_\to$.
This follows from the fact that the Taylor series of √ 1 − s has real coefficients.
The same argument used to show that $f_\leftarrow(B(1, \delta) \cap \mathbb{C}_\pm) \subset \mathbb{C}_\pm$ can be applied to $-f_\to(z)$, and this claim follows. Note that $\phi_\to(z)$ only vanishes at $z = 0$ by (2.22). In the same way as before, let $\delta > 0$ be sufficiently small so that $f_\to$ is one-to-one, analytic, and maps $\mathbb{C}_+$ into $\mathbb{C}_-$ when restricted to $B(0, \delta)$. Let $L_\delta = f_\to(B(0, \delta))$. It follows that $L_\delta$ is an open neighborhood of the origin. Define a contour $\Gamma_{\mathrm{Bessel}} := f_\to^{-1}(L_\delta \cap \Sigma_{\mathrm{Bessel}})$. We reverse the orientation of all contours. See Figures 7 and 8 for a graphical representation of this procedure and how this affects the precise definition of $\Gamma_\uparrow$ and $\Gamma_\downarrow$ inside the ball $B(0, \delta)$.
We prove the following lemma in Appendix B.4.
• M Bessel (z) is analytic in a neighborhood of z = 0.
To account for the fact that (3.9) has a factor of $e^{(2c/\phi_\to(z))\sigma_3}$ we consider an additional Riemann-Hilbert problem.
First, we look closer at the jump matrix that $A_\infty$ must satisfy. Note that $A_\infty$ is independent of $N$. This jump matrix is analytic across $(0, 1)$: indeed, this can be checked directly for $0 < z < 1$. Following standard theory, the solution is unique if it exists. We now construct an integral representation for $A_\infty(z)$, assuming it exists. Once we have the representation it is a rather simple matter to check that the formula gives a bona fide solution. First consider $B(z) := A_\infty(z) N(z)$. Then $B(z)$ is analytic in $\mathbb{C} \setminus \Gamma_\delta$, $\Gamma_\delta = \partial B(0, \delta) \cup [0, 1]$, and it has the following properties. Let $(f_1(z), f_2(z))$ be the first row of $B(z)$. It follows that $f_2(z) = O(z^{-1})$ as $z \to \infty$ and we are led to consider $g(z) = (z(z - 1))^{1/2} f_2(z) f_1(z)$. Then, for some $\ell \in \mathbb{C}$ to be determined below,

$$g_+(z) = g_-(z), \quad |z| = \delta \text{ or } 0 < z < 1, \qquad g(\infty) = \ell.$$
Because $B(z)$ should have at most fourth-root singularities at $z = 0, 1$, we require $g(z)$ to be bounded at $z = 0, 1$. Thus $g(z)$ is entire, $g(z) \equiv \ell$, and hence we are led to the following Riemann-Hilbert problem for $f_1(z)$ (see (3.10)). Now, using the principal branch of the logarithm, consider the function $r(z) = (\log f_1(z))(z(z - 1))^{-1/2}$. We choose $\ell$ to enforce the last condition. The function $r(z)$ is the sum of a function that satisfies the first jump condition and a function that satisfies the second condition. We first claim that the logarithm is well-defined: this is true as the argument of the logarithm is never negative. Indeed, because $(z(z - 1))^{1/2} > 0$ for $z > 1$ and $(z(z - 1))^{1/2} < 0$ for $z < 0$, the claim follows. It also follows that $\hat r_+(z) + \hat r_-(z) = \log(z(1 - z))$, and hence $r$ is recovered by a Cauchy integral. Then $r(z) = r_1/z + O(z^{-2})$ as $z \to \infty$. We must have $r_1 = 0$, so we choose $\ell = \ell(c)$ accordingly. Because $\phi_\to(z) = \overline{\phi_\to(\bar z)}$ for $z \notin [0, \infty)$, the exponent is real, implying that $\ell \in i\mathbb{R}_+$.

Figure 9: The full definition of the contours $\Gamma_\uparrow$ and $\Gamma_\downarrow$. The definition inside the shaded regions is given by $\Gamma_{\mathrm{Ai}}$ and $\Gamma_{\mathrm{Bessel}}$ and the remainder is a straight line connecting the two contours. Note that the contours are continuous but not necessarily smooth.
It is easy to see that both $f_1$ and $f_2$ have at worst fourth-root singularities at $z = 0, 1$.

Using (3.4) and (3.14), from here on we let $0 < \delta < 1/2$ be sufficiently small so that $f_\leftarrow$ and $f_\to$ are one-to-one on $B(1, \delta)$ and $B(0, \delta)$, respectively. We establish the following:

Riemann-Hilbert Problem 3.4. The function $E(z)$ satisfies the following properties: it is analytic off $\Sigma_R$ and continuous up to the contour $\Sigma_R$.

The jump conditions for E(z) are given by
where the error terms tend to zero in $L^2 \cap L^\infty$ as $N \to \infty$.
Using (3.14) and (3.13) and Lemmas 3.2 and 3.1, it is clear that the jump matrix for $E$ has the correct behavior on $\partial B(0, \delta)$ and $\partial B(1, \delta)$. On $\Gamma_\uparrow$, $\Gamma_\downarrow$ and $(0, \infty)$ the jump matrix for $E(z)$ is as follows. Because $A_\infty(z)$, $A_\infty^{-1}(z)$, $N(z)$ and $N^{-1}(z)$ are all uniformly bounded on the contours being considered, we only need to consider $D(z) J_S(z) D^{-1}(z)$; the required estimate for $z \ge 1 + \delta$ follows.
From classical theory [3] (see also [26]) it follows that $E(z)$ exists for $N$ sufficiently large, and $E(z) \to I$, $E'(z) \to 0$ uniformly for $z$ bounded away from $\Sigma_R$. We have thus proved the following theorem concerning the asymptotics of Laguerre polynomials.

The extreme eigenvalues and the condition number
In this section, we prove Theorem 1.1, Theorem 1.2 and Theorem 1.3. The proof of Theorem 1.1 and Theorem 1.2 relies on the asymptotic results of Section 3 and some basic operator theory. Since the kernel $K_N$ is related to $Y$ through equation (2.7), we apply Theorem 3.1 to show that, after suitable rescaling, the kernels $K_N$ converge to the Airy kernel $K_{\mathrm{Ai}}$ at both the soft and hard edge. Standard results in operator theory are then applied to establish convergence of the associated Fredholm determinants. The proof of Theorem 1.3 follows from these results, though an additional lemma is necessary to account for the fact that the largest and smallest eigenvalues are not independent.

Uniform convergence of K N at the hard edge
We first consider the smallest eigenvalue. Let $\delta^*$ be as in Theorem 3.1, fix $\delta \in (0, \delta^*)$ and consider $x$ in the interval $(0, \delta]$. We use equations (3.6)-(3.7), relating $S$ to the Bessel kernel, and Theorem 3.1(a). It follows from the definition of $M_{\mathrm{Bessel}}(z)$ that $\tilde M_{\mathrm{Bessel}}(z) := D_\infty M_{\mathrm{Bessel}}(z) M^{-\frac{1}{2}\sigma_3}$ is an analytic function in $B(0, \delta)$ that is independent of $N$ (also of $M$ and $\alpha$). It is convenient to introduce the functions $V$ and $W$. A straightforward calculation then allows us to define the analytic function $B(y) := E(y)\tilde M_{\mathrm{Bessel}}(y)$ and rewrite equation (2.7) accordingly. The study of the asymptotics of $K_N$ has now been reduced to the asymptotics of $V$ and $W$. We further simplify $V$ and $W$ using the definition and properties of the Bessel parametrix. Comparing (3.2) and (4.1), we see that we must consider the Bessel parametrix $P_{\mathrm{Bessel}}(\xi)$ with argument $\xi = M^2 f_\to(x)$, $x \in (0, 1)$. The relevant regime here for the Bessel parametrix is (III). Indeed, for $x \in (0, 1)$, $\phi^+_\to(x)$ is purely imaginary with a positive imaginary part, so that $f_\to(x) < 0$. Furthermore, because $f_\to$ maps the upper-half plane to the lower-half plane (at least locally), $\xi^{1/2} = -\frac{1}{2}M\phi^+_\to(x)$ (negative imaginary part) and $(-\xi)^{1/2} = -\frac{i}{2}M\phi^+_\to(x)$ (positive real part). Recall also that $P_{\mathrm{Bessel}}^{-1}$ is easily computed since $P_{\mathrm{Bessel}}$ has determinant one. From (3.2)(III), we obtain (4.3). In the second line of (4.3) we have used the Bessel function identity $J_\alpha(x) = \frac{1}{2}H^{(1)}_\alpha(x) + \frac{1}{2}H^{(2)}_\alpha(x)$. A similar calculation for $W$ yields (4.4). The above formulas hold in the region $x \in (0, 1)$. It is convenient to extend the range of definition of $V$ and $W$ by adopting the convention $V(x) = W(x) = 0$ for $x \le 0$. We now apply classical asymptotic formulas for the Bessel function $J_\alpha$ to obtain quantitative decay estimates on $V$ and $W$. These will imply convergence results for $K_N$. More precisely, define the rescaled variables $\hat x$, $\hat y$ and the rescaled kernel $\hat K_N$ through $\hat K_N(x, y)\,dy := K_N(\hat x, \hat y)\,d\hat y$.
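The Bessel-Hankel identity used above is easy to confirm numerically; a quick sketch with arbitrary test values:

```python
import numpy as np
from scipy.special import jv, hankel1, hankel2

x, a = 3.7, 2.5    # arbitrary test point and order
assert np.allclose(jv(a, x), 0.5 * (hankel1(a, x) + hankel2(a, x)))
```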
We then have the following convergence result. Proposition 4.1. As $N \to \infty$ the rescaled kernels converge pointwise, and the convergence is uniform for $(x, y)$ in any compact subset of $(-\infty, L]^2$ for any $L \in \mathbb{R}$. If $x = y$ then the limit is determined by continuity. Further, there exists a positive, piecewise-continuous function $G : (-\infty, L)^2 \to (0, \infty)$ such that the stated domination holds. The technical lemmas that underlie this result are stated below, but proved in Appendix C.
Assume $x \in (-\infty, L]$, $L \in \mathbb{R}$. Then there exist constants $C_L > 0$ and $A > 0$ such that if $\alpha > A$ the stated estimates hold, where the error terms are uniform in $x$.

Uniform convergence of K N at the soft edge
We write the expansion for $1 < z < 1 + \delta$. From (2.12) and (2.13) we have the corresponding expressions, which we simplify by noting that $B(z) := E(z) A_\infty(z) D_\infty M_{\mathrm{Ai}}(z) M^{-\frac{1}{6}\sigma_3}$ has no $N$ dependence, is analytic on $(1 - \delta, 1 + \delta)$ and has a constant determinant. In view of (2.7) we define the relevant kernel expressions. It follows from [20, (9.2.8)] that $\det P_{\mathrm{Ai}}(M^{2/3} f_\leftarrow(z)) = (2\pi i)^{-1}$, so that the determinant factors can be evaluated explicitly. By analytic continuation, the same formula holds for $1 - \delta < z < 1$, and then for appropriate values of $x$ and $y$. Next, for $z \ge 1 + \delta$, we examine the remaining factors. Define the $N$-independent function $\tilde B(z) = E(z) A_\infty(z) N(z)$ and two further functions, again for appropriate values of $x$ and $y$. Consider the scaling operator induced by the rescaled variable $\check x$, uniformly for $x$ in a compact set. Define $\check K_N(x, y)$ through the equality $\check K_N(x, y)\,dy = K_N(\check x, \check y)\,d\check y$.
The following follows directly from (4.9).
where the error terms are uniform in x.

Proofs of the main theorems
Our main tools for our proofs are from [22]. Here $\mathcal{J}_1$ is the set of trace-class operators, with the trace norm. From [3] it is known that $\det(I - K_N|_{L^2((t,s))})$ gives the probability that there are no eigenvalues in the interval $(t, s)$. Thus, taking into account the initial scaling by $\nu$,

$$P(\lambda_{\min}/\nu \ge t) = \det(I - K_N|_{L^2((0,t))}), \qquad P(\lambda_{\max}/\nu \le t) = \det(I - K_N|_{L^2((t,\infty))}).$$
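The Fredholm determinants above are also numerically accessible; the following sketch evaluates $F_2(s) = \det(I - K_{\mathrm{Ai}}|_{L^2((s,\infty))})$ by Gauss-Legendre discretization in the style of Bornemann (node count and truncation point are illustrative choices of ours):

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import airy

def airy_kernel(x, y):
    """K_Ai(x, y), with the diagonal handled by its limiting value."""
    ax, apx, _, _ = airy(x)
    ay, apy, _, _ = airy(y)
    diag = np.isclose(x, y)
    return np.where(diag,
                    apx * apy - 0.5 * (x + y) * ax * ay,       # limit on x = y
                    (ax * apy - apx * ay) / np.where(diag, 1.0, x - y))

def tracy_widom_F2(s, m=60, cutoff=12.0):
    """det(I - K_Ai) on (s, cutoff), discretized with m Legendre nodes."""
    nodes, weights = leggauss(m)                               # nodes on [-1, 1]
    x = s + 0.5 * (cutoff - s) * (nodes + 1.0)                 # map to [s, cutoff]
    w = 0.5 * (cutoff - s) * weights
    sw = np.sqrt(w)
    K = airy_kernel(x[:, None], x[None, :])
    return np.linalg.det(np.eye(m) - sw[:, None] * K * sw[None, :])

# tracy_widom_F2(-1.27) is approximately 0.5 (the median of F_2)
```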
The same statements hold for $\check K_N(x, y)$ and $\hat K_N(x, y)$.

From Theorems 4.2 and 4.1 we have the convergence of the corresponding Fredholm determinants.
Also, $\nu = 4N + o(\alpha^{-1}) = \alpha^2/c$, so that this result can be simplified. We use the following lemma. Proof. First, consider the relevant limit; because $\epsilon$ is arbitrary, the lemma follows.
To see that our choice of $c_N$, $\hat c_N$, $d_N$ and $\hat d_N$ fits the hypotheses of this lemma, note the preceding limits; the theorem follows.
Following the arguments in the proof of Theorem 1.1, we have sufficient conditions for the corresponding convergence. Written another way, using that $\nu = 4M$, this proves Theorem 1.2.
Before we prove Theorem 1.3 we prove a critical lemma from first principles.
Lemma 4.6. Assume two sequences of random variables $(X_n)_{n \ge 0}$, $(Y_n)_{n \ge 0}$ and two sequences of real numbers $(a_n)_{n \ge 0}$, $(b_n)_{n \ge 0}$ satisfy the following properties:

• $Y_n > 0$ a.s. and $a_n, b_n > 0$,
• $Y_n = a_n + b_n \hat Y_n$ where $\hat Y_n \to \xi$ in distribution,
• $a_n/b_n \to \infty$, and
• $\frac{a_n}{b_n}|X_n - 1| \to 0$ in probability.

Then $X_n/Y_n$ has the same limiting fluctuations as $1/Y_n$.

Proof. We claim that for each $t \in \mathbb{R}$ (where $t$ is a point of continuity for $F(t)$)

$$\lim_{n \to \infty} P\left(\frac{X_n}{Y_n} \le \frac{1}{a_n + b_n t}\right) = 1 - F(t), \tag{4.11}$$

where $F(t) = P(\xi \le t)$. To see this, compute

$$P\left(\frac{X_n}{Y_n} \le \frac{1}{a_n + b_n t}\right) = P\left(\frac{X_n}{a_n + b_n \hat Y_n} \le \frac{1}{a_n + b_n t}\right) = P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n\right).$$

For $1 > \epsilon > 0$, consider

$$P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n\right) = P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n,\ \frac{a_n}{b_n}|X_n - 1| \ge \epsilon\right) + P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n,\ \frac{a_n}{b_n}|X_n - 1| < \epsilon\right).$$
It is clear, by the fourth assumption, that the first term here vanishes as $n \to \infty$, so we concentrate on the latter. It is certainly true that for $t \ge 0$ and sufficiently large $n$,

$$\limsup_{n \to \infty} P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n,\ \frac{a_n}{b_n}|X_n - 1| < \epsilon\right) \le \limsup_{n \to \infty} P\left(-\epsilon + (1 - \epsilon)t \le \hat Y_n\right).$$

Then we use the fact that $P(A \cap B) = P(A) + P(B) - P(A \cup B)$ with

$$A_n = \left\{\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n\right\}, \qquad B_n = \left\{\frac{a_n}{b_n}|X_n - 1| < \epsilon\right\},$$

to find

$$P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n,\ \frac{a_n}{b_n}|X_n - 1| < \epsilon\right) = P(A_n \cap B_n) = P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n\right) + P(B_n) - P(A_n \cup B_n).$$
It is also clear, by the third assumption, that $\lim_{n \to \infty} P(B_n) = \lim_{n \to \infty} P(A_n \cup B_n) = 1$. We use the estimate

$$P\left(\frac{a_n}{b_n}(X_n - 1) + X_n t \le \hat Y_n\right) \ge P\left(\epsilon + (1 + \epsilon)t \le \hat Y_n\right).$$
We have shown that the upper and lower bounds match in the limit; letting $\epsilon \downarrow 0$ demonstrates the claim. For $t < 0$, this argument can be adapted by replacing $(1 \pm \epsilon)$ with $(1 \mp \epsilon)$. We now modify things and consider, for $t$ a point of continuity of $F(t)$,

$$P\left(\frac{X_n}{Y_n} \le \frac{1}{a_n}\left(1 - \frac{b_n}{a_n}t\right)\right).$$
We note that (for fixed $t$) $\frac{1}{a_n}\left(1 - \frac{b_n}{a_n}t\right) = \frac{1}{a_n + b_n t}(1 + E_n(t))$ for an error term $E_n(t) \to 0$, and, setting $\tilde X_n = X_n(1 + E_n(t))$, consider

$$\frac{a_n}{b_n}(\tilde X_n - 1) = \frac{a_n}{b_n}(X_n - 1) + \frac{a_n}{b_n}E_n(t) X_n.$$
It then follows that $\frac{a_n}{b_n}|\tilde X_n - 1| \to 0$ in probability, and from this we may apply (4.11) with $X_n$ replaced by $\tilde X_n$ to state that the analogous limit holds for any $s$ which is a point of continuity of $F(s)$. If we set $s = t$ we find the desired statement. This proves the lemma.
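A toy Monte Carlo makes the mechanism of Lemma 4.6 concrete; all the choices below ($a_n = n$, $b_n = n^{1/3}$, Gaussian stand-ins) are our own illustrative assumptions:

```python
import numpy as np

n, samples = 10_000, 200_000
rng = np.random.default_rng(0)
a_n, b_n = float(n), n ** (1.0 / 3.0)
Y_hat = rng.standard_normal(samples)            # stand-in for Y-hat_n -> xi
Y = a_n + b_n * Y_hat                           # Y_n = a_n + b_n * Y-hat_n > 0 w.h.p.
X = 1.0 + rng.standard_normal(samples) / n      # (a_n/b_n)|X_n - 1| -> 0 in probability

for t in (-1.0, 0.0, 1.0):
    lhs = (X / Y <= 1.0 / (a_n + b_n * t)).mean()
    # lhs should be close to P(xi >= t), the standard normal survival function here
    print(t, lhs)
```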
Proof of Theorem 1.3: The proof of this theorem relies critically on Lemma 4.6. We note the decomposition of $\kappa$ as a ratio of the extreme eigenvalues. For any $L > 0$ there exists $N^* > 0$ such that for $N > N^*$ the relevant limsup estimate holds, because $\nu/\alpha \to \infty$. A similar estimate follows for the other extreme eigenvalue. Then, because $L$ is arbitrary, the hypotheses of Lemma 4.6 are verified. This proves the theorem.

A Motivating numerical calculations
In this appendix, we discuss the simulations of the halting time $T_{\epsilon,E,N,n}$ that motivated the study of the critically-scaled ensemble (1.1). We consider the normalized halting time

$$\tau_{\epsilon,E,N,n} = \frac{T_{\epsilon,E,N,n} - \bar T_{\epsilon,E,N,n}}{\sigma_{\epsilon,E,N,n}},$$

where $\bar T_{\epsilon,E,N,n}$ and $\sigma_{\epsilon,E,N,n}$ represent the sample average and sample standard deviation, respectively, taken over the $M_1$ samples. We plot the histogram of $\tau_{\epsilon,E,N,n}$ in Figure 10 for three choices of $F$. This computation indicates universality for $\tau_{\epsilon,E,N,n}$.
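A sketch of the Monte Carlo experiment follows (relying on the conjugate_gradient_halting_time sketch from the introduction; ensemble choices and sizes are illustrative):

```python
import numpy as np

def simulate_tau(N, c=1.0, trials=1000, eps=1e-10, seed=0):
    """Sample the normalized halting time tau for LUE with uniform b."""
    rng = np.random.default_rng(seed)
    n = N + int(np.sqrt(4 * c * N))                # critical scaling (1.1)
    times = np.empty(trials)
    for i in range(trials):
        X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2)
        A = X @ X.conj().T                         # LUE sample covariance matrix
        b = rng.uniform(-1, 1, N).astype(complex)  # F-bar uniform on [-1, 1]
        times[i] = conjugate_gradient_halting_time(A, b, eps)
    return (times - times.mean()) / times.std()   # normalized halting time tau
```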
Notation. When $F$ is a Bernoulli random variable, taking values $\pm 1$ with equal probability, we call the resulting ensemble the positive definite Bernoulli ensemble (PBE). We also refer to the pair $E = (F, \bar F)$ as PBE (or LUE if $F \sim X_c$). Here $\bar F$ is understood to be uniform on $[-1, 1]$.

A.1 Ill-conditioned random matrices
In this section we consider the distribution of T ,E,N,N , i.e. n = N in the case of LUE and PBE. The limiting distribution for the condition number is given in (1.2) for LUE. We plot a simulated histogram for the condition number when N = 100 and when N = 196 in Figure 11(a) again for LUE. In Figure 11(b) we plot the corresponding simulated halting time distribution once again for LUE. The computed moments are shown in Figure 11(c) for LUE and PBE and indicate that the fluctuations are not universal (compare with Figure 13(c) below).

A.2 Well-conditioned random matrices
Here we consider the distribution of $T_{\epsilon,E,N,2N}$, i.e. $n = 2N$, in the case of LUE and PBE. It is known that the distribution of the condition number has a limit with finite mean. This is demonstrated in Figure 12(a) for $N = 100$ and $N = 196$ for LUE. A simulated histogram for $T_{\epsilon,E,N,2N}$ is shown in Figure 12(b) for LUE. From this plot, it is apparent that the discrete nature of the distribution will persist as $N \to \infty$, in agreement with the discussion in the introduction.

A.3 Critically-scaled random matrices

Here we consider the distribution of $T_{\epsilon,E,N,n}$ for $n = N + \sqrt{4cN}$ ($c = 1$) in the case of LUE and PBE. In Figure 13(a) we examine the condition number, which we know by Theorem 1.3 has Tracy--Widom fluctuations for LUE. In Figure 13(b) we examine the distribution of $T_{\epsilon,E,N,N+\sqrt{4N}}$ for LUE. The calculations show a limiting form for the halting time (see also Figure 10). In this case, the moments of the condition number are unbounded as $N \to \infty$, but the limiting distribution is not heavy-tailed and a non-trivial limit exists. Numerical experiments also indicate that this phenomenon persists for the scalings $n = N + (4cN)^\gamma$ if $0 < \gamma < 1$. Universality is also apparent from Figure 13.

B The Parametrices
In this appendix we present the asymptotic calculations for the Airy and Bessel parametrices.

B.1 The Airy parametrix
We use the connection formulas for the Airy function ($\omega = e^{2\pi i/3}$) and refer to Figure 3 for the sectors I, II, III and IV. Here $\mathrm{Ai}(\xi)$ is the Airy function. We also calculate the large-$\xi$ asymptotics of the matrix functions in (3.1), uniformly for $|\arg \xi| \le \pi - \epsilon$ for any $\epsilon > 0$.
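The connection formula underlying these jump computations can be verified directly; a quick numerical sketch at an arbitrary complex point:

```python
import numpy as np
from scipy.special import airy

w = np.exp(2j * np.pi / 3)
z = 0.8 - 0.3j                                   # arbitrary test point
total = airy(z)[0] + w * airy(w * z)[0] + w**2 * airy(w**2 * z)[0]
assert abs(total) < 1e-12                        # Ai(z) + w Ai(wz) + w^2 Ai(w^2 z) = 0
```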

B.3 Proof of Lemma 3.1
We consider $S_\leftarrow(z)$ and seek a function $M_{\mathrm{Ai}}(z)$ so that the product matches with the outer solution $S_\infty$. We make the ansatz below. Using $M = N + \frac{1}{2}(\alpha + 1)$, (2.17), (2.18) and (2.26), we find the required matching uniformly for $|z - 1| = \delta$. This calculation depends critically on (2.26). We now show that $M_{\mathrm{Ai}}(z)$ is analytic in a neighborhood of $z = 1$. First note the following.

Then by direct calculation
From this we find that it remains to show that the ratio $\frac{f_\leftarrow(z)}{z(z - 1)}$ is analytic and does not vanish in a neighborhood of $z = 1$. But this follows directly from the expression for $f_\leftarrow(z)$ in terms of a power series.
Finally, we check the jumps of $S_\leftarrow(z)$. In the following calculations we leave off $\pm$ signs for boundary values of functions that are analytic in a neighborhood of the point under consideration. Recall that the contours $\Gamma_\uparrow$ and $\Gamma_\downarrow$ in a neighborhood of $z = 1$ are defined in Figure 6. For $z \in \Gamma_\uparrow$, $f_\leftarrow(z) \in \gamma_2$, and the jump may be computed directly. The same calculation follows for $z \in \Gamma_\downarrow$. For $z \in (1, 1 + \delta)$ we have $f_\leftarrow(z) > 0$, and the jump is again as required. We note that both $S_\leftarrow$ and $S_\leftarrow^{-1}$ are analytic in $B(1, \delta) \setminus \Gamma_{\mathrm{Ai}}$ and are continuous up to the contour. This follows from the fact that $S_\leftarrow^{-1}$ has unit determinant. To see this, note that $P_{\mathrm{Ai}}$ has constant determinant in each sector of $\mathbb{C} \setminus \Sigma_{\mathrm{Ai}}$ by Liouville's formula, and the fact that this constant is the same in each sector follows from the fact that the jump matrices have unit determinant. Thus $S_\leftarrow(z)$ has a determinant that is independent of both $M$ and $z$, and the determinant is found to be unity by examining its asymptotics.

B.4 Proof of Lemma 3.2
We consider We first check the asymptotic behavior for |z| bounded away from zero.
This follows from (2.25). Note the extra factor of e 2c/φ→(z)σ3 when comparing this with (B.4). We now check that M Bessel (z) is analytic. So, we must consider From this, the question of analyticity of M Bessel (z) is reduced to the question of analyticity of the function . This is clearly analytic in a neighborhood of z = 0 because f → (0) = 0 but f → (0) < 0. Now, we check the jumps. Recall that Γ ↑ and Γ ↓ in a neighborhood of z = 0 are defined in Figure 8. For z ∈ Γ ↑ , f → (z) ∈ β 3 (see Figure 4). The limit to Γ ↑ from above (+ side) is the same as limit into β 3 from below (− side) so that Then we find that e −(α+1)πi+2N φ→(z)+w(z) = e 2N φ→(z)+ŵ(z) and the jump agrees with the corresponding jump for S(z). For z ∈ Γ ↓ , f → (z) ∈ β 1 and Then because log → z = log ← z + 2πi we have e (α+1)πi+2N φ→(z)+w(z) = e 2N φ→(z)+ŵ(z) and again, this is the same as the corresponding jump for S(z). Finally, for z ∈ (0, δ) we have f → (z) ∈ β 2 and 0 .
Define $t$ as a function of $z$ by $t(z) = -\frac{M}{\alpha}i\phi^+_\to(z)$, so that $J_\alpha(\alpha t) = J_\alpha(-iM\phi^+_\to(z))$ to match (4.3) and (4.4). From the estimates in (C.2) it follows that $t(z) \le 4\frac{M}{\alpha}\sqrt{z}$, so that if $t(z^*) = 1$ then $\left(\frac{\alpha}{4M}\right)^2 \le z^*$, and $\zeta(t(z)) > 0$ for $z \le z^*$, as $t(z)$ is an increasing function of $z$. Furthermore, we know that $\zeta \ge 2^{1/3}(1 - t)$ for $0 < t \le 1$, so that the first bound follows. From the last line of (C.2) we obtain the second. As we will see, (C.4) is useful near $z = 0$ and (C.3) is useful for slightly larger values of $z$. Using (C.1) we have the stated estimate for a constant $C_1$; the constant $C_2$ is determined below in (C.10).
It also follows that $zJ_\alpha''(z) + J_\alpha'(z) = z^{-1}(\alpha^2 - z^2)J_\alpha(z)$, and therefore the derivative bound follows. Then we have, for $0 < z \le \delta$, the stated estimate. This follows because $1 - (1 + y)^{1/2} \ge -y/2$ for $y \le 0$. Then $1 - 4Mc/\alpha^2 = O(N^{-1/2})$ and the second term in brackets vanishes as $N \to \infty$; thus the bound holds for sufficiently large $N$. Next, for $-1 \le x \le L$, consider the relevant inequality, whose right-hand side is a decreasing function of $x$. Given $L$, we choose $\zeta_1$ so that this condition holds and the estimate for $-\zeta_1\alpha^{-1} \le \zeta \le 0$ in (C.5) can be used for $-1 \le x \le L$. We note that by (C.8) this choice of $C_2$ satisfies $C_2 \le \left(\frac{\alpha}{4M}\right)^2$. We estimate the derivative using (C.7). From (C.6) we obtain an expression whose second term we consider first; we note that $\hat g(x)\hat g(y) \le D_L\, g(x)g(y)$.
For the first term, we assume $|x - y| \ge 1$, and we have

$$\left|\frac{1}{2\pi i}\frac{V(\hat x)W(\hat y)}{x - y}\right| \le C_L^2\, g(x)g(y).$$
To determine the uniform limit of $\check K_N$ we use the estimates of Appendix D. This proves the proposition.