A gradient-based method for atmospheric tomography

In Adaptive Optics (AO) systems for ground-based telescopes, one aims at mechanically correcting for atmospheric aberrations by means of quickly moving deformable mirrors. In complex AO systems, which are using several light sources and aim at a good reconstruction in a large field of view, the derivation of optimal mirror commands from measured light typically includes the problem of atmospheric tomography. As the computational effort for such a limited-angle tomography problem is strongly increasing for growing telescope sizes, fast algorithms are needed. 
 We present a novel algorithm for atmospheric tomography that takes real-life effects such as tip/tilt indetermination, cone effect and spot elongation into account. Furthermore, we discuss two models for the tip and tilt components of an incoming wavefront and incorporate them into the reconstruction. 
We find a fast step size choice for our Gradient-based iteration and compare it with different existing step size choices. Numerical results are demonstrated for two different AO systems on a 42 m telescope, using the European Southern Observatory's end-to-end simulation tool, OCTOPUS.

1. Introduction. The problem of atmospheric tomography arises in astronomical Adaptive Optics (AO), a technology which is primarily used in large ground-based telescopes in order to correct for atmospheric aberrations. Due to layers of warm and cold air, originally plane light wavefronts from astronomical objects get distorted while travelling through the Earth' atmosphere, resulting in blurred images of observed astronomical objects. AO compensates for these atmospheric perturbations by means of deformable mirrors that are adapting in real time to the shape of the distortions. Thus, the reflected light wavefronts are flattened and image quality improves. For a detailed description of Adaptive Optics, we refer to, e.g., [28,29].
In complex AO systems, several light sources are used: bright astronomical objects as Natural Guide Stars (NGS) and artificially created laser beacons as Laser Guide Stars (LGS). By means of wavefront sensors (WFS), the incoming light of each guide star is measured. The calculation of the shape of a deformable mirror from these WFS measurements typically involves a limited-angle tomography problem. Currently, the standard method for solving this ill-posed inverse problem is by a precomputation of the (generalized) inverse of the matrix that maps wavefront measurements to mirror shapes, the so called matrix-vector-multiplication (MVM). A thorough description of the mathematical challenges in this field can be found, e.g. in [8].
In particular, for the new generation of Extremely Large Telescopes (ELT), the computational complexity of the reconstruction algorithm becomes a major issue. The deformable mirrors have to be controlled in real-time, e.g. at around 500 Hertz. This reduces the time for one reconstruction step to 1 ms. For the presented simulation results, one second of real time, i.e. 500 time steps, are performed. The dimension of the planned European ELT (E-ELT), along with a realistic setting of six LGS and three NGS arranged in a field of view of one arc minute, would yield a system matrix of the order of 10 4 × 10 4 to be inverted in each time step.
To solve the atmospheric tomography problem in AO, many methods have been developed: In [10], the authors present a minimum-mean-square-error estimator, while [11] gives a back-projection tomography algorithm. Iterative solvers, such as conjugate gradient (CG) based methods are presented, e.g. in [6]. Efficient multigrid preconditioners were investigated in [6,12,13], whereas in [45,43] Fourier domain preconditioners were found. Atmospheric reconstruction for future ELTs has to be performed within an extremely tight time frame, which is barely reachable even on future hardware. Recent reconstruction methods attempt to fulfil the stringent speed requirements. Methods for solving the Bayesian MAP estimator are the CG-based fractal iterative method (FrIM) [5,35,38,37], and the wavelet-based Finite Element Wavelet Hybrid Algorithm (FEWHA) [47,48,46]. Additionally, the atmospheric tomography has been solved in a matrix-free approach via the Kaczmarz algorithm, presented in [24,33].
This paper is based on the idea that the problem of determining the DM shapes from WFS data is split in three independent steps [24]: the reconstruction of the incoming wavefronts, the problem of atmospheric tomography with wavefronts as input data and the computation of the optimal mirror correction. Due to the limited time frame for determining the DM shapes, there is a growing need for methods that yield good qualitative performance with very low computational complexity. We propose a Gradient-based method to solve the subproblem of atmospheric tomography iteratively, in order to achieve a fast reconstruction. Furthermore, we include LGS deficiencies, such as spot elongation, cone effect and tip/tilt indetermination in our models. We propose two different models for tip/tilt indetermination, the Gradient-tilt and the Zernike-tilt, and investigate their differences and similarities. For the forward operators, we give the corresponding adjoint operators for two differently weighted inner products. We summarize the noise model for spot elongated measurements and review the prior statistics of the atmosphere. Two approaches for the Gradient-based iteration are presented, a least-squares model without statistical information and a Tikhonov-type model including prior information.
In order to speed up the method, we develop a simplified and accelerated step size choice for the Gradient iteration and compare it to several existing step size methods. For the numerical demonstration of the algorithm, we consider two different AO systems, a Multi-Conjugate Adaptive Optics (MCAO) system [26,8] and a Multi-Object Adaptive Optics (MOAO) system [1,42]. We chose this combination to demonstrate the versatility of our algorithm, because the two systems differ not only in their size of field of view and guide star setting, but also intrinsically in the design and arrangement of deformable mirrors. We study the quality performance of the Gradient-based method in detail in the context of an E-ELT setting on OCTOPUS, the ESO end-to-end simulation tool.
The paper is organized as follows: In Section 2, we introduce the problem setting, develop a forward model including real-life effects such as tip/tilt indetermination, cone effect and spot elongation. The Gradient-based method is described in Section 3, where we present two choices of inner products, the corresponding adjoint operators and the realization for two main AO systems, as well as different step size choices and the corresponding computational cost estimates. Numerical performance is demonstrated in Section 4. We conclude with a summary and an outlook to future work in Section 5.
2. Atmospheric tomography. Contrary to many standard methods in AO, we follow a 3-step approach, first introduced in [24], that splits the problem of determining the optimal mirror shape(s) from WFS measurement data into three independent steps that can be solved with different algorithms. In the first step, the incoming wavefronts ϕ αg of guide stars in directions α g are determined from WFS sensor data s αg = (s x αg , s y αg ) via a wavefront reconstruction algorithm. Then, in the second step, a limited angle tomography problem is solved, i.e. discrete atmospheric layers are reconstructed from the incoming wavefronts of all guide stars. The final step is the determination of the optimal mirror shape(s) from the discrete atmosphere, i.e. by simple projection in the case of MOAO or LTAO (Laser Tomography Adaptive Optics) systems, or by more elaborate algorithms for MCAO systems, see e.g. [24,47]. For a graphical illustration of the 3-step approach, see Figure 1.
For an accurate reconstruction of the atmosphere, light sources are needed, i.e. an astronomical object or natural guide star (NGS). However, only a limited skycoverage can be reached with NGS, therefore, laser guide stars (LGS) are created by strong laser beacons that stimulate the sodium layer of the atmosphere.
The emitted light is affected by several problems [8]: first, the light source is only finitely high, thus, the light travels down in a cone. This cone effect can be treated rather easily in the tomography step, see Section 2.2. Second, due to the fact that the laser beam travels up and down the same way, the tip/tilt component of the atmospheric aberrations cannot be measured correctly. This problem of tip/tilt indetermination is treated in Section 2.3. Third, the Shack-Hartman (SH) WFS measurements from LGS are affected by spot elongation, an effect explained in Section 2.4 together with a turbulence model for the atmosphere.
In this paper, we will mainly focus on the second step of the 3-step approach, the atmospheric tomography. This is a limited angle tomography problem and, thus, an ill-posed inverse problem. Due to the discontinuous dependence of the data on the solution, regularization techniques are needed in order to obtain a stable and efficient reconstruction.
In the following, we assume a SH WFS, but want to point out that this choice is by no means necessary and the Gradient-based algorithm for atmospheric tomography with incoming wavefronts as data presented below can be easily used with a Pyramid WFS as well.
For a SH WFS, one has to solve and Γ = (Γ 1 , . . . , Γ G ) the combined SH operator for all guide star directions, ϕ = (ϕ α1 , . . . , ϕ α G ) the incoming wavefronts and s = (s α1 , . . . , s αg ) the WFS measurement data. In the following we omit the subscript g for the SH operator, as it is independent of the specific guide star direction: 2.2. Forward model. The second step of the 3-step approach is the solution of the atmospheric tomography problem. Given the reconstructed wavefronts ϕ αg in each guide star direction, we have to solve a limited angle tomography problem, see Figure 2, where we want to reconstruct a discrete atmosphere Φ = (Φ (1) , . . . , Φ (L) ) with layers at heights h l . This is an ill-posed inverse problem and requires regularization. In the following, we present a Gradient-based approach that allows to regularize the problem on the one hand by the number of iterations, on the other by the optional inclusion of a statistics-driven penalty term. Following [32,33] and the principles of geometrical optics, we can model the light propagation through the turbulence. For NGS and LGS we assume geometric light Figure 2. Illustration of the atmospheric tomography problem with three guide star directions, three WFS and three atmospheric layers propagation with the forward operators: where Ω D is the (ring-shaped) aperture of the telescope and Ω l the area on layer l "seen" by the wavefront sensors, with the shifted version of the aperture The direction of GS g is given by the two-dimensional vector α g = (α (1) g , α (2) g ) which corresponds to the 3D directional vector (α (1) g , α (2) g , 1). The forward model presented above can be used for incoming wavefronts from LGS and NGS and just differs in the values of the parameter c l denoting whether the cone effect is taken into account. Note that it is also possible to consider the tomography operator (2) between the spaces L l=1 H 1 (Ω l ) to H 1 (Ω D ), see [9].
2.3. Tip/tilt indetermination. When introducing LGS with SH WFS in an AO system, one faces the problem of tip/tilt indetermination, i.e. the planar component of the LGS data is wrongly measured. Thus, additional NGS measurements are used to obtain correct tip/tilt information. In some AO systems, instead of full NGS sensor data, so-called tip/tilt sensors (TTS) are used, which have only a small number of subapertures with two values on each, i.e. the x-and y-coordinate of the normal vector of the plane describing the tip and tilt.
There are several possible models for the tip/tilt data (see e.g. [41]), we will follow two of them: the Gradient-tilt and the Zernike-tilt.

Gradient-tilt.
The Gradient definition of tip/tilt has been formulated, e.g. in [23]. The values t x and t y can be modelled as the averages of the derivatives of the incoming wavefronts. The forward model for N tip/tilt sensors with 1 subaperture reads as with r = (x, y) ∈ Ω D and c l as in (4). Note that c l can be omitted as the light source for TTS are NGS, i.e. c l = 1. The operator G αg is well defined for with s > 1 2 . This smoothness condition, i.e. H 1 (Ω l ), makes the formal derivation and computation of the adjoint G * αg costly, see [23]. 2.3.2. Zernike-tilt. Contrary to the Gradient-tilt, the Zernike-tilt, or best-fit tilt, is defined on the space of square integrable functions. The Zernike-tilt is described by the normal vector to the best-fit plane of the wavefront distortion. Furthermore, the two components are equal exactly the coefficients of the two low-order Zernike polynomials [41]. The corresponding forward operator for N tip/tilt sensors with 1 subaperture each reads as To derive the constant c, we use the fact that the Zernike-tilt it is a least-squares linear approximation to the phase over the aperture, i.e. for a plane the Zernike-tilt is exactly the corresponding normal vector. Thus, the choice of c depends on the geometry of the aperture. First, assume a square aperture , then for an arbitrary plane with coefficients A, B, C ∈ R it holds that and thus, c = d 2

. For circular apertures Ω
i.e., c = D 2 4 . For circular apertures with central obstruction, i.e., a ring {r ∈ R 2 : . For a wavefront ϕ sufficiently smooth, we can expand it by means of Taylor series and compare the two tip/tilt definitions: and Ω D = {r ∈ R 2 : d ≤ r ≤ D} be an annular aperture with 0 ≤ d < D. Then, for the Zernike-tilt t Z and the Gradient-tilt t G , it holds that > 0 only depending on the geometry of the aperture.
Proof. We expand ϕ by means of Taylor series, i.e., there exists θ ∈ (0, 1) such that for all r ∈ B D , Then, the components of the Zernike-tilt are , while the components of the Gradient-tilt are from Definition (4) above. Then, the Euclidean norm of the difference in the two tip/tilt descriptions fulfils Thus, for plane, bilinear and bi-quadratic wavefronts, Zernike-and Gradient-tilt yield the same values. Only when introducing cubic or higher order terms, the tip/tilt values differ. A realistic value for C, as used in our numerical examples in Section 4, is close to C ∼ 17.
In AO systems with a combination of LGS and NGS, both with SH WFS, one needs to remove the incorrect tip/tilt information from LGS data. Therefore, the operator Π is introduced for Gradient-tilt (as in [33]) and for Zernike-tilt with r = (x, y) ∈ Ω D : 2.4. Turbulence and spot elongation modelling. In an atmospheric tomography problem, one aims to accurately reconstruct the turbulent atmosphere [40,39,8]. To model these turbulences, we make the standard assumption that an atmosphere Φ can be described by a finite number of thin layers (Φ (1) , . . . , Φ (L) ) at heights h l for l = 1, . . . , L [27]. Let us assume that the incoming wavefronts are measured by a Shack-Hartmann (SH) wavefront sensor [28]. Following the random setting in [14,47,48], SH WFS data can be modelled as random variable S, the turbulent layers as random variable φ and the additive noise due to spot elongation as random variable η ∼ N (0, C η ) with zero mean, i.e.
with Γ the SH WFS operator (see Section 2.1) and A the tomography operator (see Section 2.2). The separate layers are assumed to be uncorrelated and zero centered, thus the covariance matrix C Φ of the Gaussian random variable φ with zero mean has a block-diagonal structure is the covariance matrix of layer l. In the following, we will employ a wavelet approximation of the Kolmogorov model [27,28] with covariance matrices C Φ (l) described in [14,47,48]. The additive noise due to spot elongation in (7) arises due to the thickness of the sodium layer and results in a light source that is an elongated spot rather than a point. Depending on the laser launch position, the height of the LGS and the sodium layer density, one can model the corresponding covariance matrix C η for the noise [4,36]. The error in the measurements due to spot elongation follows a normal distribution with zero mean and covariance matrix C η , thus, noisy SH WFS data can be modelled as s δ = s + C 1/2 η η, with η white noise and s being the concatenation of SH WFS data s αg = [s x αg s y αg ] in all LGS directions α g . The noise covariance matrix C η is a block diagonal matrix with blocks C αg for each LGS α g (and blocks σ 2 I of weighted identity for NGS) and has been described, e.g. in [36]. We use the approach presented in [3] where C αg is given as block diagonal matrix with 2 × 2 blocks for each subaperture i, with θ the full width at half maximum of a non elongated spot, the elongation vector and x LLT the laser launch position, FWHM N a the full width at half maximum of the sodium density profile, h LGS the height of the LGS at 90km and σ ∼ 1/photons per subap the noise variance of a non-elongated measurement. The noise variance σ in (8) can also vary for each guide star direction, i.e., σ g might be different for different directions α g . Note that this does not affect the presented Gradient-based algorithm. The easily computable inverse of (8) reads as The parameter α η allows to tune the influence of the spot elongation. For α η = 0 the covariance in equation (8) equals the covariance for a uniform and decorrelated case.
3. A Gradient-based method for atmospheric tomography. We want to reconstruct the discrete atmosphere Φ, that is the solution to the system with ϕ the incoming wave-fronts or tip/tilt data from all guide stars. The operator A maps the turbulent layers onto the incoming data: with G the number of LGS and Y ∈ {L 2 (Ω D ), R 2 } for the N NGS. For AO systems with full 84 × 84 SH WFS for the NGS, the space Y equals L 2 (Ω D ), while for AO systems with TTS with only 1 sub-aperture Y is set to R 2 . In the following, we denote the function space for the incoming wavefronts and tip/tilt data as The operator equation (9) can be solved with the Gradient-based method (Algorithm 1) in two possible ways: First, the method of steepest descent can be applied to the least-squares functional (approach without noise models) We call this the approach without noise models. Second, the method of steepest descent can be applied to the Tikohonov-type functional with α Φ ∈ [0, ∞). We call this the approach with noise models, where a regularized version of (9) with the a-priori information about the atmosphere is solved. Furthermore, for spot-elongated measurements, the inclusion of the noise covariance matrix into the data misfit term is possible. Note that the approach with noise models, i.e. (11), is similar to the maximum a-posteriori estimation [15]. Contrary to standard Landweber iteration and existing methods for atmospheric tomography, such as the Kaczmarz method [24], the Gradient-based approach allows to include the statistical models for the atmospheric turbulence and the noise due to spot elongation, which is of particular interest in cases of low photon flux.
The incoming wavefronts follow a linear model with additive noise ϕ δ αg = Γ † (s αg + C 1/2 αg η) . Thus the inverse of the noise covariance matrix of the incoming wavefronts can be described as C η −1 = Γ T C −1 η Γ, see, e.g., [15]. In order to account for the fact that the incoming wavefronts lie in a Sobolev space H s with s ≈ 2 [27], we approximate the corresponding transfer matrix U in H 2 .
However, we do not want to actually calculate U −1 Γ T as this would destroy any advantage of separating step 1 and 2 in the three step approach. As Γ T Γ is a discrete approximation to the negative Laplace operator with homogeneous Neumann boundary conditions [13], we approximate numerically αg Γ, such that for all guide star directions combined C η −1 ∼ Γ † C −1 η Γ, which can be easily pre-calculated, e.g., by means of the CuReD algorithm or a truncated SVD. The gradient of (11) then reads as Algorithm 1 Gradient-based method for atmospheric tomography j :=0 Φ 0 := initial guess begin steepest descent iterations 3.1. Spaces and inner products. We will consider the incoming wavefronts as elements of the space L 2 (Ω D ), and use the canonical inner product For the atmospheric layers, we can use the canonical inner product, given by with the c 2 n -profile γ l as the relative strength of layer l with L l=1 γ l = 1 .
However, in a gradient based method, this inner product leads to optimal updates only within the area that is covered by all guide stars. Thus we would like to define a new, weighted inner product that also gives optimal updates in areas that are covered by less or even just one guide star.
To this end we define on the layer l the overlap function with ω l (r) ∈ {1, 2, . . . , G + N }, for all r ∈ Ω l . We define Λ i l := {x ∈ Ω l : ω l (x) = i} for i = 1, . . . , G + N and immediately observe: We can now define the new inner product on the layers by and the weighted inner product on the atmosphere by For the approach without noise models, i.e. the minimization of (10), the stepsize τ ∈ [0, ∞) in the method of steepest descent (Algorithm 1) then reads as with Z ∈ {Ξ, L}. Similarly for (11) when noise models are included In the case of Z = Ξ, i.e. the inner product ., . Ξ , the stepsize in (18) can be simplified to τ j = 1 2 . 3.2. The adjoint operators. The adjoint of the operator depends on the chosen inner products. In this Section we present the adjoint operators for full LGS or NGS data and the adjoint of the Zernike-tilt operator, for the case of Gradient-tilt we refer to [23].
With the inner product in (14), the L 2 -adjoint A * αg : be written as and (T αgh l Φ)(r) := Φ(c l r + h l α g ), with c l as above, see also [32,33]. Note that we use the L 2 −adjoint here due to its superior computational efficiency. For the choice of H 1 (Ω D ) instead of L 2 (Ω D ), we refer to [32,9]. The adjoint of A with respect to the inner product (17) can be formulated as:

Proposition 2. The adjoint of the operator A is given by
Proof. Setting r = c l r + h l α g we obtain dr = 1 c 2 l dr and r ∈ Ω D ⇐⇒ r ∈ Ω(h l α g ). Thus we have The inner product on the wavefronts can be rewritten as follows: For the Zernike-tilt operator in (4), the adoint can be computed as follows: Proposition 3. The adjoint of Z αg is given by Proof. We use t = (t x , t y ) T and u(x, y) = t x · x + t y · x. Furthermore, r = r−αgh l c l and, thus, dr = 1 c 2 l dr. Then, with the definition of the Zernike-tilt in (4), it follows that

Adaptation to different sensor geometries
The theory of the Zernike-tilt can be easily extended to different TTS geometries: Let us assume a p × p sensor with p ∈ N. The forward operator then reads as   y) for (x, y) ∈ Ω i and i = 1, . . . , p. Furthermore, Proof. Similarly as in the Proof for Proposition 3 and with the fact that Ω i ∩Ω j = ∅ 3.3. The Gradient-based method for specific AO systems. We now apply the abstract idea of the Gradient-based method to Multi-Conjugate AO systems as well as Multi-Object AO system and compare different stepsize choices, 3.3.1. Multi-Conjugate Adaptive Optics. For MCAO, the task of atmospheric tomography is to solve the operator equation (9) from LGS measurements and additional tip/tilt measurements. Thus, the operator A and the vector of measurements ϕ are given as A solution can be obtained via applying the method of steepest descent on the corresponding least squares functional (10). The gradient of the functional then reads as For cases with high noise, i.e. low photon flux with spot elongation, we minimize (11), with the gradient and α Φ ∈ [0, ∞). In the case of a combination of LGS and TTS, where the TTS have only 1 subaperture each, we have to reconstruct the tip/tilt information differently than in the case of full NGS data. The separated and integrated tip/tilt reconstruction have been introduced in [23] in the case of Gradient tilt. For Zernike tilt, the algorithms are analogous while the computational effort of the adjoint tip/tilt operator is considerably reduced due to the L 2 -setting.

Multi-Object Adaptive Optics.
For MOAO, no separate treatment of the tip/tilt data is necessary as we assume to have full NGS measurements. Thus, equation (9) now reads as The gradient of the least-squares functional in (10) reads as while the corresponding gradient of the Tikhonov functional (11) can be formulated as and α Φ ∈ [0, ∞).

Fast stepsize.
One of the crucial points of a gradient iteration is the choice of the stepsize τ as the stepsize is responsible for the convergence and the convergence speed. For the classical Landweber method the stepsize is chosen constantly, which yields very slow convergence speed. The classical steepest descent stepsize as in (18), reads as which, for the denominator, includes the application of operators A αg to each gradient in all directions and the calculation of G + N scalar products.
A novel and computationally faster stepsize choice is chosen as . (26) Contrary to the classical steepest descent stepsize, the so called fast stepsize choice (26) is a non-monotone stepsize. Due to this fact, the convergence behaviour is improved, see Figure 4. The fast stepsize is applicable to all problems Ax = b, where the operator A is decomposed in a vector of operators. For any operator equation of this type, the stepsize (26) only involves the evaluation of the two inner products. Here, (26) is tailored to the application of (atmospheric) tomography, as it relies on simplifying the denominator in the presence of several data directions.
The difference can be described as follows: while the classical steepest descent stepsize uses "smoothed" information over all directions, the fast stepsize conserves the information of each direction separately. This is reflected in the choice of the denominator. Let the norms involving the residual phases directions α g be defined as The norm p shows a monotone behaviour over the number of iterations. On the contrary, the norm of the sum over all residuals q behaves non monotonously and similarly to d 2 L . This stems from the fact that q equals d (1) if h 1 = 0. Moreover, note that the residual in each direction directly relates to the gradient in each direction defined as The fast stepsize is chosen as the ratio of d 2 L and q, where the number of guide stars G + N is only needed for correct scaling. The numerical effort for evaluating (26) decreases significantly, as no additional application of A αg is needed. The already computed residual can be used instead, fewer summations are included and a better possibility for parallelization is given, see also Table 1. This is a crucial improvement, as the computational complexity for an AO reconstruction algorithm is severely restricted due to time constraints.
For comparison, we considered the well-known non-monotone stepsize choice of Barzilai-Borwein [2], i.e., with ∆Φ = Φ j −Φ j−1 and ∆d = J (Φ j )−J (Φ j−1 ), and compared the performance of the algorithm in Figure 4. According to the numerical results, the stepsize choice in (26) yields so far the best quality results, in particular for few Gradient iterations. For the exact simulation setting and the evaluation criterion, we refer to Section 4.
The pseudo-code for the Gradient-based method with the simplified and accelerated stepsize computation (26) for MOAO can be seen in Algorithm 2. ., . L vs. fast stepsize in (26) vs. Barzilai-Borwein stepsize in (28). See Section 4 for simulation details.
In Table 1, we provide the computational complexity of the operators used in the Gradient-based method with and without noise models. We focus on a system with full SH WFS for LGS and NGS, i.e. no TTS. The parameter n is the number of active apertures, L is the number of reconstructed layers and G is as the number of guide star directions. The computational effort is given as number of floating point operations in total and per process with the number of parallel processes equal to Algorithm 2 Gradient-based method with accelerated stepsize computation max(L, G). Note that both, the Gradient method with and without noise models, are parallelizable on layers L and in guide star directions G.
One can observe that the Gradient method without noise models is considerably faster than the Gradient method on the Tikhonov functional, as no costly noise models (inverse turbulence covariance or inverse spot elongation covariance) are needed. The stepsize choice in (26) yields the lowest computational complexity for the Gradient method without noise models. Under consideration of noise models, the numerical cost is higher due to the inclusion of the inverse turbulence covariance. The Barzilai and Borwein stepsize choice has very low computational cost for both cases, however, needs the storage of the previous update as well as the previous gradient, both on all reconstruction layers L. The choice of the weighted inner product ., . Ξ only slightly reduces the effort of one Gradient iteration.
4. Numerical results. In this Section, we describe the numerical performance of the algorithms presented in terms of quality using OCTOPUS [17], the end-to-end simulation tool of the European Southern Observatory (ESO). We use two test cases, a Multi-Conjugate Adaptive Optics (MCAO) system and a Multi-Object Adaptive Optics (MOAO) system on the European Extremely Large Telescope, simulated with a telescope diameter of 42 m.
(L + 1)n (237L + 2)n 2n 239n SD steps. ., . Ξ --  The simulation results include different types and levels of noise: Cases without and with noise due to spot elongation are simulated. In both cases, the level of photon flux is varied, typically from a few photons per subaperture to 10000. The photon flux relates directly to the noise level in the WFS measurements, i.e. the noise variance in the absence of spot elongation is σ 2 ∼ 1/photons per subap. In the case of spot elongation the covariance of the WFS measurements is given in (8).
To evaluate the reconstruction quality, we use the long exposure (LE) Strehl ratio [16] and simulate one second of real time, i.e. 500 time steps. Strehl ratio is a measure of image quality ∈ [0, 1] related to the L 2 -error, where 1 is best and 0 worst. As in MCAO, one is interested in a good reconstruction quality over the whole field of view, we evaluate the quality in 25 directions, arranged in a square of 1 arcmin of length. In MOAO, we only observe one direction of interest in the zenith. The reconstruction in MCAO is performed on three artificial layers at the heights of the deformable mirrors, whereas in MOAO nine layers are reconstructed.
We compare the Gradient-based method with the reference results provided by the ESO: for MCAO the reference method is the MVM, for MOAO the Fractal Iterative Method (FrIM) [35], both tuned by ESO. For the Gradient-based method, the initial guess is chosen as zero in the first time step. From then onward, a warmrestarting technique is used: At time step t > 0 the initial guess is set to the last iterate from the previous time step, i.e. Φ t 0 := Φ t−1 i . Choosing Φ 0 = 0 in each time step, would severely decrease quality performance when maintaining the used (low) number iterations, necessary to fulfil the speed requirements. More sophisticated choices of initial guess, e.g. by predictive control have not been tested extensively and are left to future work. First, we consider LGS without spot elongation. In Figures 6 and 7, one can observe that only 2 to 5 iterations are necessary for a comparable reconstruction quality in MCAO. In Figure 8, we can observe a superior quality over a variety of flux levels, when using the Zernike-tilt instead of the Gradient-tilt. This holds for MCAO (with tip/tilt reconstruction from TTS data) as well as for MOAO (with full NGS data). As expected, one can, in general, observe that the noise level (i.e. the inverse of photon flux) influences all algorithms and settings similarly: the higher the number of photons that reach each subaperture, the better the reconstruction quality. For the MOAO system we compare the inner products ., . L in (14) and ., . Ξ in (17) in Figure 9. The results for the inner product ., . Ξ lie around 5 percent lower than ., . L in the case of Gradient-tilt. For the Zernike-tilt, the difference is less pronounced. The usage of different inner products entails different gradients and, thus, results in different iterations. Let us now consider cases with spot elongation, i.e. Figure 10. In the MCAO system, we can observe an improvement when using the Gradient-based method with noise models, especially in the low flux regime. Moreover, the Zernike-tilt reconstruction shows better results than the Gradient-tilt. However, for photon flux levels below 200ph/subaperture/frame, the algorithm still shows deficiencies in the reconstruction quality.
In MOAO, this problem is even more pronounced due to the wide FoV. The Gradient-based method with noise models only yields small improvements for the  [48,25] and the Kaczmarz method [33], the Gradient-based method performs similarly for cases without spot elongation. A detailed comparison in the case of MOAO can be found, e.g., in [25]. We can conclude that the quality performance of the algorithm suffers from the spot elongation noise compared to 2-step methods, such as FEWHA or FrIM. This stems from the separation of the atmospheric tomography step into two individual subproblems -the wavefront reconstruction and the tomography step on incoming phases. In particular, the wavefront reconstruction algorithm used in the numerical tests, the CuReD algorithm, includes the summation of measurements along x-and y-axes, which removes the information of spot elongation. One possible remedy for the 3-step approach for LGS, is the application of a different wavefront reconstruction algorithm, which is more sensitive to the problem of spot elongation.
The 3-step approach with the Gradient-based method is particularly well suited for AO systems with NGS. The method is very fast and well parallelizable, and without the drawback of spot elongation, it yields good quality results. Furthermore, it is very versatile, as different algorithms for the three individual stepswavefront reconstruction, tomography and fitting step -can be combined.

Conclusion and outlook.
In this paper, we have introduced a novel iterative method based on the Gradient method for solving the problem of atmospheric tomography. Based on previous work in iterative reconstruction methods for AO systems, we developed a fast and versatile method and studied its qualitative performance in the context of MCAO and MOAO. Contrary to standard Landweber iteration and existing methods for atmospheric tomography, such as the Kaczmarz method, the Gradient-based approach allows to include the statistical models for the atmospheric turbulence and the noise due to spot elongation. Two types of tip/tilt models have been investigated, the Zernike-tilt and the Gradient-tilt. A new analysis of the Zernike-tilt is presented and simulation results suggest that a quality improvement can be reached at a lower computational cost. In addition, we presented a new stepsize choice for the gradient iteration, which decreases the numerical effort and yields superior quality results in numerical simulations. A thorough analysis of the fast stepsize choice is left to future work. We demonstrated a fast convergence of the algorithm, in particular for MCAO, and illustrated the performance in terms of quality for high-and lowflux regimes.
We believe that the Gradient-based method is a promising algorithm, in particular for NGS systems. Due to its fast and flexible character, it can be applied in different AO systems without major changes or precomputations. The improvement of the method with respect to spot elongation in wide field of view AO and the incorporation of predictive schemes are left for future study.