A stochastic approach to reconstruction of faults in elastic half space

We introduce in this study an algorithm for the imaging of faults and of slip fields on those faults. The physics of this problem are modeled using the equations of linear elasticity. We define a regularized functional to be minimized for building the image. We first prove that the minimum of that functional converges to the unique solution of the related fault inverse problem. Due to inherent uncertainties in measurements, rather than seeking a deterministic solution to the fault inverse problem, we then consider a Bayesian approach. In this approach the geometry of the fault is assumed to be planar, it can thus be modeled by a three dimensional random variable whose probability density has to be determined knowing surface measurements. The randomness involved in the unknown slip is teased out by assuming independence of the priors, and we show how the regularized error functional introduced earlier can be used to recover the probability density of the geometry parameter. The advantage of the Bayesian approach is that we obtain a way of quantifying uncertainties as part of our final answer. On the downside, this approach leads to a very large computation since the slip is unknown. To contend with the size of this computation we developed an algorithm for the numerical solution to the stochastic minimization problem which can be easily implemented on a parallel multi-core platform and we discuss techniques aimed at saving on computational time. After showing how this algorithm performs on simulated data, we apply it to measured data. The data was recorded during a slow slip event in Guerrero, Mexico.


Introduction
Subduction zones around the world are periodically prone to devastating earthquakes. The 2011 Tohoku Oki earthquake in Japan was a stark reminder of that occurrence. Experts are now warning the public and policy makers that in North America, the Pacific Northwest is in great danger of being struck by a massive earthquake in the near future partly because there is strong evidence of such events in the recent past [3]. Of course it is not possible at this stage to say when this may happen again. A better knowledge of the structure of the subduction zone and the interface between oceanic and continental crusts in the Pacific Northwest, together with an assessment of the mechanical stress budget will undoubtedly help geophysicists make progress in predictive skills. Deformations in the vicinity of major subduction zones have been continuously recorded for some time using geodetic networks (GPS, tiltmeters) as well as broadband seismological networks. GPS networks have revealed the existence of periods of reversed motion relative to the interseismic motions in many subduction zones worldwide [8,9]. These reversals of movement are interpreted as aseismic slow slip events (SSE) occurring deep beneath the subduction zone below the locked seismogenic zone. Prior to two recent studies [28,30], the geometry profiles of subduction zones have been derived for the most part from seismology and are therefore poorly constrained. In a separate step, using these geometry profiles, investigators have studied slip distributions for SSE's from GPS time series, occasionally augmented by InSAR data. In this paper, we introduce and analyze error functionals for the reconstruction of fault geometries based on surface measurements of displacement fields, and we derive a stochastic inversion procedure which relies on these functionals. The physics of our problem are modeled using the equations of linear elasticity and the data for the fault inverse problem consists of measurements of surface displacements. Evidently, GPS surface measurements are inherently tainted by errors. There are also errors due to using a PDE model, which of course can only give a simplified sketch of complex geophysical processes occurring in subduction zones. In this paper we take into account these errors by seeking to determine probability densities for the geometry thanks to a Bayesian formulation. We assume that the fault is planar which is a common assumption in geophysics, at least for the active part of a fault (the part where the slip is non zero) during a slow slip event. With this assumption only the joint probability density of three scalar parameters giving the equation of the plane containing the fault has to be determined, thanks to the assumption that the geometry parameters and the slip field on the fault are independent. This paper is organized as follows. In section 2 we introduce a mathematical formulation for modeling slow slip events on faults. This model relies on the equations of linear isotropic elasticity in half space. We review existence and uniqueness results for the forward problem and a uniqueness result for the fault inverse problem. This result ensures that if surface displacement fields are known on an open set of the top boundary then it is possible to reconstruct the fault and the slip on that fault. In section 3 we introduce a regularized functional for reconstructing faults and slip fields from surface measurements. We prove that as the regularization constant C tends to zero, the reconstructed profile and slip field obtained by minimizing that functional converge to the actual profile and slip that produced the surface displacements. Let us point out here that this result is not trivial since, although this inverse problem is linear in the slip field, it is non -linear in the geometry of the fault. In section 4 we define another reconstruction functional which involves only a finite number of surface measurements and slip fields in a finite dimensional subspace. In that case it is not possible to invoke the uniqueness result for the inverse problem proved in earlier work. However, we are able to prove that if the number of measurement points is sufficiently large, if the subspace over which this functional is minimized is large enough, and if the regularization constant is sufficiently small, then the solution to this discrete minimization problem can be arbitrarily close to the actual profile and slip that produced the surface displacements. In section 5 we take into consideration that the number N of surface displacement measurements is low and these measurements are uncertain, so rather than seeking a deterministic solution to the fault inverse problem, we consider a Bayesian approach to solving the fault inverse problem. As customary in Bayesian modeling of inverse problems, the difference between measured data and predicted surface displacements for a given geometry of the fault and a given slip is assumed to be a Gaussian random variable with mean zero. The random slip field is also assumed to be Gaussian and we make the assumption that the prior geometry parameters and the prior slip field are independent. Thanks to this independence assumption, recovering only the probability density of the geometry parameters becomes a computationally tractable problem, albeit by use of advanced computational techniques discussed further. As further motivation for this Bayesian approach, we prove in section 5 that the recovered probability density of the geometry parameters tends to zero for all geometry parameters different from those of the true profile as the number of measurement points grows large and the variance of the measurements and the regularization parameter for the reconstructed slip field become small. Our proposed algorithm is amenable to implementation on a parallel multi-core computational platform. The combination of relevant linear algebra techniques and parallel implementation led to great savings in computational time. In section 6 we apply this reconstruction algorithm to the case of the 2007 Guerrero, Mexico SSE. We first examine three test cases with numerically generated data for the inverse problem. The surface points for the test cases are the same as those where geophysicists sampled real world measurements. All length scales and noise levels have same order of magnitude as those observed in the real world. Different geometries are considered and in one case we add a systematic error due to imperfections in the model. Our last numerical computation involves real world measurements and results in the reconstruction of the part of the subduction interface beneath the Guerrero region which was active during the 2007 SSE. In this last simulation the only benchmarks for our calculation are geometries estimated by other authors (in most cases, based on other physical processes). We observe that many of the profiles found by other authors fall in the plus or minus one standard deviation envelope of the profile derived in this present study.

Mathematical model and uniqueness result 2.1 Forward problem
Using the standard rectangular coordinates x = (x 1 , x 2 , x 3 ) of R 3 , we define R 3− to be the open half space x 3 < 0. The derivative in the i-th coordinate will be denoted by ∂ i . In this paper we only consider the case of linear, homogeneous, isotropic elasticity; the two Lamé constants λ and µ will be two positive constants. For a vector field u = (u 1 , u 2 , u 3 ), the stress and strain tensors will be denoted as follows, and the stress vector in the normal direction n will be denoted by Let Γ be a Lipschitz open surface which is strictly included in R 3− . Let u be the displacement field solving T n u is continuous across Γ, (2.3) [u] = g is a given jump across Γ, where e 3 is the vector (0, 0, 1). In [30], we defined the functional space V of vector fields u defined in R 3− \ Γ such that ∇u and u Lipschitz surface containing Γ. We define the Sobolev spaceH 1 2 (Γ) 2 to be the set of restrictions to Γ of tangential fields in H 1 2 (∂D) 2 supported in Γ. We proved in [30] the following theorem Theorem 2.1 Let g be inH 1 2 (Γ) 2 . The problem (2.1-2.4) has a unique solution in V. In addition, the solution u satisfies the decay conditions (2.5).
In this paper we will only consider forcing terms g which are tangential to Γ. Physically, this reflects that the fault Γ is not opening or starting to self intersect: only slip is allowed. We recall that if g is continuous, the support of g, supp g, is equal to the closure of the set of points in Γ where g is non zero; in general supp g is defined in the sense of distributions.

Fault inverse problem
Can we determine both g and Γ from the data u given only on the plane x 3 = 0? Many investigators have studied uniqueness and stability results for inverse boundary problems. Earlier studies include papers such as Sylvester and Uhlmann's, [26], regarding the isotropic conductivity equation where it is proved that the knowledge of the Dirichlet to Neumann boundary operator uniquely determines smooth conductivities. In [19], Lee and Uhlmann showed that this is still true in the anisotropic case, up to a diffeomorphism. On the subject of cracks, Friedman and Vogelius [10] proved that, in dimension 2, it suffices to apply two adequately chosen forcing terms on the boundary to uniquely determine cracks in the framework of the conductivity equation. The case of the two dimensional elasticity equation was considered by Beretta et al. in [5]. Stability results for linear cracks were derived; Beretta et al. proposed in [4] a MUSIC type algorithm for determining the position of these linear cracks from boundary measurements.
We note, however, that the case of interest in our present paper is substantially different for two main reasons: first, the forcing term g is given on the fault Γ, and second, our problem is three dimensional. In [30], we proved the following result: Theorem 2.2 Let Γ 1 and Γ 2 be two bounded open surfaces, with smooth boundary, such that each of them is included in a rectangle strictly contained in R 3− . For i in {1, 2}, assume that u i solves (2.1-2.5) for Γ i in place of Γ and g i , a tangential field inH 1 2 (Γ i ) 2 , in place of g. Assume that g i has full support in Γ i , that is, supp g i = Γ i . Let V be a non empty open subset in {x 3 = 0}. If u 1 and u 2 are equal in V , then Γ 1 = Γ 2 and g 1 = g 2 .
There is a Green's tensor H such that the solution u to problem (2.1-2.4) can also be written out as the convolution on Γ Γ H(x, y)g(y) dσ(y), (2.6) The practical determination of this adequate half space Green's tensor H was first studied in [21] and later, more rigorously, in [27]. Due to formula (2.6) we can define a continuous mapping M from tangential fields g inH 1 2 (Γ) 2 to surface displacement fields u(x 1 , x 2 , 0) in L 2 (V ) where u and g are related by (2.1-2.5). Theorem (2.2) asserts that this mapping is injective, so an inverse operator can be defined. It is well known, however, that such an operator M is compact, therefore its inverse is unbounded. It is thus clear that any stable numerical method for reconstructing g from u(x 1 , x 2 , 0) will have to use some regularization process. In fact, in practice, our problem is even more challenging due to the fact that the geometry of the fault Γ is also unknown. A numerical solution to determining Γ and g from u(x 1 , x 2 , 0) will have to use a priori regularizing assumptions on g and must be tested for robustness to noise.

A functional for the regularized reconstruction of planar faults
Let R be a closed rectangle in the plane x 3 = 0. Let B be a set of (a, b, d) such that the set We assume that B is a closed and bounded subset of R 3 . It follows that that the distance between Γ m and the plane x 3 = 0 is bounded below by the same positive constant for all m in B. In this section we assume that slips are supported in such sets Γ m (meaning that their supports are included in Γ m , but they could be different from Γ m ). We can then map all these fields into the rectangle R. We thus obtain displacement vectors for x in V by the integral formula for any g in H 1 0 (R) and m in B, where σ is the surface element on Γ m and H m (x, y 1 , y 2 ) is derived from the Green's tensor H for y on Γ m . We now assume that V is a bounded open subset of the plane x 3 = 0 and for a fixedũ be in L 2 (V ), and a fixed m in B we define the functional where C(x) is a diagonal positive definite 3 by 3 matrix for x in V , which is continuous in x, and C is a positive constant. In formula (3.3) we intentionally used C −1 rather than C because we will later view it as a covariance term. Define the operator It is clear that A m is linear, continuous, and compact. The functional F m,C can also be written as, where in L 2 (V ) we use the norm and in H 1 0 (R) we use the norm In the remainder of this paper, for the sake of simplifying notations, both L 2 (V ) and H 1 0 (R) will be abbreviated by ; context will eliminate any risk of confusion.
Proof : The result holds thanks to classic Tikhonov regularization theory (for example, see [18], Theorem 16.4).
For h m,C as in the statement of Proposition 3.1 we set, Proposition 3.2 f C is a Lipschitz continuous function on B. It achieves its minimum value on B.
Proof : We first note that the term H m (x, y 1 , y 2 ) and all its derivatives are uniformly bounded for x in V , (y 1 , y 2 ) in R and m in B thanks to (3.1). It follows that H m (x, y 1 , y 2 ) is Lipschitz continuous in for m in B with uniform Lipschitz constants for (y 1 , y 2 ) in R and x in V , so there is a positive constant L such that for any m and m in B, for all (y 1 , y 2 ) in R, and all x in V . It follows that there is a constant F such that for all m and m in B. By minimality for h m ,C , and given that we can switch the roles of m and m , we found that Finally we just recall that B is compact to claim that f C achieves its minimum value.
The following theorem explains in what sense the argument of the minimum of the functional F m,c converges to the slip solving the fault inverse problem, and how the argument of the minimum of f C converges to the geometry parameter solving the fault inverse problem.
Theorem 3.1 Assume thatũ = Amh for somem in B and someh in H 1 0 (R). Let C n be a sequence of positive numbers converging to zero. Let m n be any sequence in B such that f Cn (m n ) minimizes f Cn (m) for m in B and set f Cn (m n ) = F mn,Cn (h mn,Cn ). Then m n converges tom, h mn,Cn converges toh in H 1 0 (R), and A mn h mn,Cn converges toũ in L 2 (V ).
Proof : We first note that Arguing by contradiction, assume that m n does not converge tom. After possibly extracting a subsequence, we may assume that m n converges to m * in B with m * =m. By (3.9) h mn,Cn is bounded in H 1 0 (R): after possibly extracting a subsequence, we may assume that h mn,Cn is weakly convergent to some h * in H 1 0 (R). Next we observe that A mn is norm convergent to A m * and we recall that A m * is compact: it follows that A mn h mn,Cn converges strongly to A m * h * , thus we may take the limit as n approaches infinity of (3.9) to find As m * =m, this contradicts uniqueness Theorem 2.2. The same argument can be carried out to show that A mn h mn,Cn converges toũ. We now show that h mn,Cn converges toh in H 1 0 (R). Since A mn h mn,Cn converges toũ, A mn is convergent to Am in norm, and by (3.9) h mn,Cn is bounded, we can claim that Amh mn,Cn converges toũ. Let v be in L 2 (Γ). The dot product in L 2 (V ) and in H 1 0 (R) associated with the norms (3.6) and (3.7) will be denoted by ( , ). As and Am is injective (due to theorem 2.2, so the range of A * m is dense), this shows that h mn,Cn converges weakly toh in H 1 0 (Γ). To obtain strong convergence we recall that due to (3.9), h mn,Cn ≤ h and we write which tends to zero due to the weak convergence of h mn,Cn toh.

A functional for the reconstruction of planar faults from a finite set of surface measurements
For j = 1, .., N , let P j be points on the surface x 3 = 0 andũ(P j ) be measured displacements at these points. Let F p be an increasing sequence of finite-dimensional subspaces of where A m was defined in (3.4), C > 0 is a constant. As to the constants C (j, N ), simply put, they relate F disc m,C to F m,C as N tends to infinity. More precisely we assume that C is smooth and that for all positive integer k, and for all where D l ϕ is a partial derivative of ϕ with total order l and β is a positive integer depending on k. We also assume that C (j, N ) > 0 for all positive integer N and all j = 1, .., N .
The functional F disc m,C achieves a unique minimum on F p .
According to Proposition 4.1, F disc m,C achieves its minimum at some h disc m,C in F p . We set is a Lipschitz continuous function on B and achieves its minimum value on B.

Proof :
The proof is similar to that of Proposition 3.2.
We now discuss the connection between the continuous and the discrete reconstruction functionals. We will assume that the surface dataũ is given byũ = Amh, for some sliph in H 1 0 (R) andm in B. Evidently, in the extreme case where the number of measurement points is N = 1, we should expect no relation between h disc m,C andh. In this section we want to analyze the convergence properties of h disc m,C and the minimizer of f disc C as the number of measurement points N tends to infinity, p tends to infinity, and C tends to zero. Related proofs are rather intricate and technical, so we placed them in Appendix.
Then for all η > 0, there is an N 0 in N, a p 0 in N, and two positive constants C 0 and C 1 such that if N > N 0 , p > p 0 , and C 0 N −β < C < C 1 then Proof : The proof is given in Appendix.
Remarks: Note that m disc depends implicitly on N , C, and p. To interpret the condition C 0 N −β < C in Theorem 4.1, we recall that as C tends to zero, intuitively speaking, the functional F disc m,C tends to an error (or misfit) calculated on the N surface measurementsũ(P j ). Theorem 4.1 states that a sufficient requirement for the reconstructed geometry parameter m disc to approach the real geometry parameterm is for the regularization parameter C to tend to zero and the subspace F p to become large, all the while the number of measurement points N tends to infinity with a rate such that C 0 N −β < C. Roughly speaking, this means that N should not tend to infinity too slowly as C tends to zero.

Model derivation
In our stochastic model we assume that the geometry parameter m = (a, b, d) in B, the slip field g in F p , and the measurementsũ(P j ), are related by where A m is given by (3.4), m and g are now random variables, and E in R 3N is additive noise, which is also assumed to be a random variable. We assume that E has a normal probability density ρ noise with mean zero and diagonal covariance matrix such that Accordingly, the probability density of the measurementũ meas knowing the geometry parameter m and the slip field g is Next, we assume that the random variables m in B and g in F p are independent. The prior distribution of m, ρ prior is assumed to be uninformative, that is, ρ prior (m) ∝ 1 B (m) and the prior distribution of g is assumed to be Gaussian with mean zero and given by Applying the Bayesian theorem and independence of the priors of m and g, we write ρ(m, g|ũ meas ) ∝ ρ(ũ meas |m, g)ρ Fp (g)ρ prior (m). (5.5) In this work we are only interested in recovering the posterior probability density of m, so we integrate formula (5.5) in g over F p to obtain the marginal posterior density of m. It turns out that this can be done explicitly thanks to the Gaussian formulation. Introducing the 3N by 3N diagonal matrix D such that we can state, Proposition 5.1 Integrating in g over F p the right hand side of formula (5.5), we find where F disc m,C is given by (4.1), h disc m,C is defined by (4.3), I q is the identity operator of the q dimensional subspace F p , A m , initially defined by (3.4), is here restricted to a linear operator from F p to R 3N .

Proof:
The proof is given in Appendix.

Proof of convergence of stochastic model to the unique solution
of the deterministic inverse problem as covariance tends to zero Investigators have examined the limit probability laws for inverse problems as the number of measurement points grows large [1] or as the random fluctuations of the media are small [6] but these studies pertained to source location in media with propagating waves, so the connection to our framework is non trivial and a detailed mathematical analysis is likely to be involved. We can still state and prove the following result regarding the pointwise convergence of the posterior distribution ρ(m|ũ meas ) for small noise covariance. The proof relies on estimates shown in the previous two sections. Formally, the limit case where the covariance defined by C tends to zero can be interpreted in light of Theorem 4.1. Suppose that the measurementsũ meas (P j ) were produced by a slip on a fault whose geometry was given bym. As C tends to zero, ρ(m|ũ meas ) tends formally to the Dirac measure centered atm, so the probability density ρ(m|ũ meas ) will achieve its maximum arbitrarily close tõ m. More precisely, we now set where τ > 0 is a constant that will tend to infinity, 1 B is the indicator function of B, and I τ is a normalizing constant. Note the new surface covariance is τ −1 C, so letting τ → ∞ will make this rescaled covariance tend to zero. Observe also that in this formulation, since both C −1 and C are both rescaled by τ , h disc m,C is independent of τ .
Proposition 5.2 Suppose that the probability density of m knowing surface measurements is given by (5.8) whereũ in the definition (4.1) of F disc m,C is given byũ = Amh for somem in B andh in H 1 0 (R) and h disc m,C is as in (4.3). Let m 0 be in B such that m 0 =m. Then for all large enough N and p, and small enough C, the posterior distribution ρ τ (m|ũ meas ) evaluated at m 0 converges to zero as τ → ∞. Additionally, this convergence is uniform in m 0 , as long as m 0 remains bounded away fromm.
Proof : Fix m 0 different fromm. It is clear that the exponential and the fraction in formula (5.8) converges to zero as τ tends to infinity, so the crux of the proof is to account for the normalizing constant I τ . According to Theorem 4.1, for a fixed N , p, and C, provided N and p are large enough and C is small enough, the minimum of f disc C will occur in the ball with centerm and radius |m−m0| 2 . We now fix such an N , p, and C. Let m be a point where f disc C will achieve its minimum in this ball. Let h m0 be the element in F p where F disc m0,C achieves its minimum and h m be the element in F p where F disc m ,C achieves its minimum. Set . We must have γ > 0 since m 0 is not in the ball with center m and radius |m−m0| 2 . But by continuity, there is a positive α such that if |m − m | ≤ α, Let h m be the element in H 1 0 (R) where F disc m,C achieves its minimum. Necessarily, if |m−m | ≤ α, We can now estimate the normalizing constant I τ . Now letting α be the maximum of the largest eigenvalue of A m D 2 A m for m in B, and we have, It follows that which tends to zero as τ → ∞. We note that this convergence is uniform in m 0 , as long as m 0 remains bounded away fromm: this is due to Theorem 4.1.

Discrete problem and size of computation
As mentioned earlier, H 1 0 (R) is approximated by the q -dimensional vector space over R, F p . The slip field g will be approximated by g (p) and the term ∇g in (4.1) will be approximated by Dg (p) for ∂ y1 g and Eg (p) for ∂ y2 g where D and E are two invertible q by q matrices. The term (A m g)(P j ), j = 1, .., N , in (4.1) is approximated by Ag (p) , where A is a 3N × q matrix (the reader should bear in mind that this matrix depends on m). The singular values of the continuous operator A m are known to decrease very fast, [11]. Accordingly, the singular values of A decrease fast too. The discrete equivalent of minimizing (4.1) is to minimize where we use the usual Euclidean norms on R 3N and on R q and u (3N ) in R 3N stands for the dataũ meas (P j ), j = 1, .., N . If C > 0 is known, this amounts to solving the following linear equation In practice this inverse problem is vastly underdetermined since 3N << q. Even if 3N was greater or equal than q, it would not be possible to set C = 0 since the singular values of A decay very fast. We set a grid of points in B Thus all together, if C is known the q × q linear system (5.12) has to be solved |I| times.
Here we recall R is two dimensional; in our practical calculations q was 50 2 or 100 2 , while |I| was about 50 3 . Finally, an appropriate value for C had to be computed by iterations, so for each value of (i 1 , i 2 , i 3 ) the linear system (5.12) had to be solved multiple times (the number of iterations depended on (i 1 , i 2 , i 3 ), it was on a range from 5 to 50, so all together linear system (5.12) had to be solved about 10 6 to 10 7 times). In fact our calculation was only possible to perform thanks to the use of adequate linear algebra techniques and a parallel multi core implementation which are described in details in subsequent paragraphs.

Algorithm for selecting the regularizing constant C
Some general suggestions for selecting the regularizing constant C can be found in the literature, [12,13,16]. We note, however, that some well known methods (for example, truncated SVD) are inapplicable since we are not within the framework of L 2 regularization. The additional challenge in our case is that we have as many matrices A as different points (a i1 , b i2 , d i3 ) for the chosen grid of B. Our method for selecting a constant C uses the following lemma.
Proof : The proof is given in Appendix. Solving for C(i 1 , i 2 , i 3 ) for (i 1 , i 2 , i 3 ) ∈ I is expensive since in practice I has a large cardinality, and for each (i 1 , i 2 , i 3 ) in I a non-linear inequality has to be solved where at each iterative step a large linear system has to be solved.
We employed several techniques for cutting down computational time.
1. The Green's tensor for half space linear elasticity H(x, y) is notoriously expensive to compute. It is possible, however, to use a simplified form since we know that x 3 = 0. Formulas that reduce the number of operations were given in [27]. Additional savings in computational time were achieved for the fault inverse problem as we computed values H(x, y) by passing a single, large array.
2. Woodbury's formula is remarkably helpful for a fast computation of (A D 2 A + C(D D + E E)) −1 A D 2 u (3N ) : this is because (D D + E E) −1 can be precomputed and stored, and 3N << q, so computing the SVD of DA is cheap, and A D 2 A has low rank.
3. Finally, since the set of indices I is typically large, it is greatly beneficial to use a multi-core parallel implementation. After pre-computing a few variables, matrices, and tables, computing all the constants C(i 1 , i 2 , i 3 ) can be done in parallel.
This led us to the following algorithm.
Algorithm for computing 3N ) ) < Err use a non-linear iterative solver to find C(i 1 , i 2 , i 3 ) % at each iteration use Woodbury's formula for computing We are now able to define a uniform regularization constant C by setting To justify this choice, first we note that for each (i 1 , i 2 , i 3 ) in I, intuitively speaking, C(i 1 , i 2 , i 3 ) will give rise to the most regular (or the least energy) pseudo-solution to the equation Ag (p) = u (3N ) with error tolerance Err. This is in line with previous studies of de-stabilization of faults [7,14,29] and general minimum energy principles in physics. Next, picking the same C for all (i 1 , i 2 , i 3 ) ∈ I will make it possible to find the geometry m that will best replicate the surface measurementsũ meas (P j ) with a uniform control of energy exerted by the slip.

Algorithm for computing the probability density ρ(m|u (3N ) )
The probability density of ρ(m|u (3N ) ) is given by , (5.14) where I is a normalizing constant (which is unknown at the beginning of the calculation) and Again, since the cardinality of I is rather large, we are careful to pre-compute and store adequate terms. This led us to the following algorithm.
Algorithm for computing the probability density ρ(u (3N ) Evaluate ρ(m|u (3N ) )/I by using formula (5.14) end After applying the algorithm, there only remains to compute the normalizing constant I. Since A m is a smooth function of m it follows that g (p) is in turn smooth in m, and so is ρ(m|u (3N ) )/I. The exponential in formula (5.14) shows that ρ(m|u (3N ) )/I is also rapidly decaying way from its maximum, so ρ(m|u (3N ) )/Idm can be efficiently evaluated by the three dimensional trapezoidal rule.

Numerical results
We present in this section numerical results for the recovery of the fault geometry parameter m from surface measurements. Ultimately, we will show results pertaining to the specific case of the 2007 Slow Slip Event (SSE) which occurred in the Guerrero region of Central Mexico. Figure 1 shows locations where surface displacements have been recorded. Over time, some recording stations have closed while others have opened: it is therefore preferable not to use all these stations in the training step. Specifically, we will use measurements from the stations ACAP, ACYA, CAYA, COYU, CPDP, DEMA, DOAR, IGUA, MEZC, UNIP, YAIG. In effect, these will be the points P j in our computations, for j = 1, .., N , and N is equal to 11. We use a rectangular system of coordinates x 1 , x 2 , x 3 centered at ACAP: the x 1 direction runs West-East, the x 2 direction runs South-North, and the x 3 direction runs down-up. In effect, this assumes that the Earth is locally flat. Units for distances will be kilometers. Local geography is ignored, so x 3 = 0 at each of these 11 stations. The medium Lamé coefficients λ and µ will be set to 1, which results in a Poisson ratio 0.25, a commonly agreed upon value for Earth's rocks. We refer to [28,30] for an account of how raw displacement data was collected day after day. The data was then completed and smoothed, as explained in [28]. The error bar on the data can be estimated by comparing the smoothed data to the raw data. Here we have to emphasize that finding the most optimal and accurate estimates of the average and the standard deviation of displacement fields is beyond the scope of our work. However, satisfactory estimates are easy to find and provide a good starting point for addressing the stochastic fault inverse problem. The effective maximum of |ũ(P j )| is about 100 mm. The standard deviation on measurements of horizontal displacements can be estimated to .8 mm, and 2 mm for the standard deviation of vertical displacements. We will show in this section three test cases before covering the real world case. In the test cases, the surface points P j will be the same as the ones used in the real world data case. We simulated data and added gaussian noise with same covariance as the one estimated in the real world data case. In the test cases we made sure to set faults with depths that are consistent with what geophyscists expect to find in that region (in general, these depths are not deeper than 80 km, due to the thickness of Earth's crust) and to produce surface displacements with the same order of magnitude as those observed for the 2007 Guerrero SSE. In each case we set the center of the rectangle R to be the average of the coordinates of P j weighted by |ũ(P j )|. The lengths of the sides of the rectangle R can be set by first examining a large area which includes all the P j s and then re-focusing it from a reconstructed h. Alternatively, the size of rectangle R can be estimated by applying the quasi constant slip method presented in section 3.1 of [28].

First test case
In our first example,m is such that a = −0.3, b = −0.15, d = −14. A sketch of the fault Γ, of the slip fieldh, and the resulting surface measurementsũ(P j ) is shown in Figure 2. After surface displacements were computed following formula (3.2), Gaussian noise with zero mean was added. We picked a covariance matrix with diagonal terms equal to (.5 mm) 2 for horizontal displacements and (1.5 mm) 2 for vertical displacements. In Figure 3 we show computed selected values of C(i 1 , i 2 , i 3 ) near C for different values of Err, see definitions 5.1 and (5.13). It is not possible to point to a single preferred value for Err, but we should expect it to be at least twice the standard deviation of the measurements. In Figure 3 we show selected values of C(i 1 , i 2 , i 3 ) for the relative error Err/ u (3N ) between 0.01 and 0.2. Since there is no preferred value of Err, we choose several possible C: in this particular case k · 10 −4 , for k = 1.. 10. In Figure 4 we show computed marginal distributions for the geometry parameters a, b, and d for the value C = 10 −3 and three different assumed values of σ hor and σ ver , the standard deviation for horizontal and vertical measurements.

Second test case
In our second test,m is such that a = −0.3, b = 0.15, d = −25. The slip field for producing the surface data is sketched in Figure 5. This is a more challenging case since this field is non-convex. In addition, for this combination of geometry and slip field only a few points P j contribute valuable information for the surface displacement field. In theory, with continuous data on an open set of the surface x 3 = 0 this should not be a problem, but in practice, with a limited number of observation points our algorithm does not perform as well as previously. The most likely recovered values for m are about −0.2, 0.1, −27, this is not as close to the correct values as in the previous case. In Figure 7 we show the reconstructed slip field for this most likely geometry. Note how one of the two connected components ofh is better reconstructed than the other one.

Third test case
In our third test,m is such that a = 0.1, b = −0.15, d = −24. In this case we illustrate how (modest) modeling errors may impact the reconstruction algorithm. Here, the direction of slip is not in line with the direction of steepest ascent, while in the reconstruction step we wrongly assume that these two directions are the same. These two directions and the fault are sketched in Figure 8. In addition, noise was added to the surface measurements as in the previous two cases. In Figure 9 we show computed marginal distributions for the geometry parameters a, b, and d. The computed maximum likelihood for m are achieved at .12, -.14, -20, so in this "wrong model" case these values are not as close to the original values that were used to produce data as they were in the first test case.

Application to the case of measured surface displacements during the 2007 SSE in Guerrero, Mexico
We now show the most interesting case as far as applications are concerned. We start from measurements relative to the 2007 SSE in Guerrero, Mexico, which were processed as described earlier: both u(P j ) and standard deviation on these measurements were estimated. We show in Figure 10 computed marginal distributions for the geometry parameters a, b, and d for the constant C set to 6 · 10 −4 . Next we fix (approximate) most likely values for the geometry parameters a, b, d to −.13, −.19, −18, and we compute expected slip on the fault and standard deviation: results are shown in Figure 11. Here we need to point out that once the geometry of the fault is fixed, we only need to solve a linear stochastic inverse problem: this is rather trivial since there is a linear relationship between the covariance matrix of the data and the covariance matrix of the slip on the field. In the case of measured data, we can only validate our calculation by comparing our reconstructed fault to those offered by earlier studies: see [17,22,25] for the geometry of the fault (these studies were based on seismicity and gravity), [23,24] for the profile of the slip on the fault, and [28,30] for combined (deterministic) studies of simultaneous reconstruction of geometry and slip fields. In Figure 11,  Figure 4: Test case 1. Computed marginal distributions for the geometry parameters a, b, and d. The blue star curve corresponds to the assumption σ hor = 1, σ ver = 3, the red circle curve corresponds to the assumption σ hor = 2, σ ver = 6, and the orange cross curve corresponds to the assumption σ hor = 3, σ ver = 9.
steepest descent the depth is between 27 km and 33 km (these are plus or minus 1 standard deviation intervals). This is comparable and rather on the high side of depths found in other studies, see Figure 10 in [30].   The blue star curve corresponds to the assumption that σ hor = .5, σ ver = 1.5, the red circle curve corresponds to the assumption that σ hor = 1, σ ver = 3, and the orange cross curve corresponds to the assumption that σ hor = 2, σ ver = 6.

Appendix
The following two lemmas are needed for the proof of Theorem 4.1.
Lemma 7.1 Let h disc m,C be the unique point where F disc m,C achieves its minimum in F p . There are two positive constants C 0 and C 1 such that h disc m,C is uniformly bounded in H 1 0 (R) for all m in B, p in N, N in N and C such that C 0 N −β < C < C 1 .

Proof:
Withh as in the statement of Theorem 3.1, leth p be the orthogonal projection ofh on F p . As Since h p ≤ h , given that A m is continuous in m, B is compact, and A m continuously maps H 1 0 (R) into smooth functions on V , by (4.2), Then γ > 0.
Proof : Arguing by contradiction, assume that γ = 0. Then there is a sequence g n in H 1 0 (R) such that g n ≤ M and V |C − 1 2 (A m g n −ũ)| 2 converges to zero as n → ∞. A subsequence of g n is weakly convergent in H 1 0 (R) to some h * . It will still be denoted by g n for the sake of simpler notations. As the operator A m is compact, we find at the limit that Since m =m, this contradicts uniqueness Theorem 2.2.
Proof of Theorem 4.1: Arguing by contradiction, assume that there exist an η > 0 and three sequences N n in N, p n in N, and C n in (0, 1) such that N n → ∞, p n → ∞ and C n → 0 while C −1 n N −β n is bounded above and denoting we have that |m n −m| > η. As B is compact, after possibly extracting a subsequence, we may assume that m n converges to some m * in B, with m * =m. Since C n tends to zero, applying Theorem 3.1, there is a sequence m n which converges tom and such that where F m n ,Cn (h m n ,Cn ) = min F m n ,Cn (g), so F m n ,Cn (h m n ,Cn ) converges to zero. Fix > 0. Set F disc mn,Cn (h disc mn,Cn ) = min g∈Fp n F disc mn,Cn (g). Let h m n ,Cn,p be the orthogonal projection of h m n ,Cn on F p . We first note that the convergence of h m n ,Cn to h implies that h m n ,Cn,p converges to h m n ,Cn as p → ∞, uniformly in n. Thus, using minimality of h disc mn,Cn , F disc mn,Cn (h disc mn,Cn ) ≤ F disc m n ,Cn (h m n ,Cn,pn ) ≤ F disc m n ,Cn (h m n ,Cn ) + , for all n large enough. Using again the boundedness of h m n ,Cn , we can write that for all n large enough, F disc m n ,Cn (h m n ,Cn ) ≤ F m n ,Cn (h m n ,Cn ) + , and since F m n ,Cn (h m n ,Cn ) converges to zero we infer that for all n large enough, F disc mn,Cn (h disc mn,Cn ) ≤ 3 . Setting g = h disc m,C + h, it follows that DA m g − Dũ meas 2 + C g 2 = DA m h disc m,C − Dũ 2 + C h disc m,C 2 + DA m h 2 + C h 2 , Next we set q = dim F p and we introduce an orthonormal basis e 1 , ..., e q of F p which diagonalizes A m D 2 A m . Let µ 2 j be such that A m D 2 A m e j = µ 2 j e j . We can now integrate exp(− 1 2 DA m h 2 − 1 2 C h 2 ) for h over F p by just rotating the natural basis of F p to the orthonormal basis e 1 , ..., e q to obtain .
In [30], Appendix B, we showed how D and E can be chosen assuming that we use a regular grid on R. For that particular choice, D and E are equal to 2 while D −1 and E −1 are bounded by √ q. (we used q for an upper bound for D and E but that bound can be improved to √ q by observing that the block matrix M defined in appendix B of [30] is the sum of the identity and a m-nilpotent matrix with norm 1, where m = √ q). As for any x in R q x T (A D 2 Ag (p) + C(D D + E E))x ≥ C Dx 2 + C Ex 2 ≥ 4C q x 2 , (7.8) (A D 2 A + C(D D + E E)) −1 exists for all C > 0 and is a continuous function of C. Since g (p) solves (5.12), g (p) and D(Ag (p) − v (3N ) ) are also continuous functions of C in (0, ∞). Left multiplying (5.12) by g p and applying the Cauchy Schwartz inequality we find To find the limit of Ag (p) −u (3N ) as C tends to zero we first recall that A D 2 (u (3N ) −v (3N ) ) = 0. By definition there is an x in R p such that DAx = Dv (3N ) . From (5.12),