Optimal Local Multi-scale Basis Functions for Linear Elliptic Equations with Rough Coefficient

This paper addresses a multi-scale finite element method for second order linear elliptic equations with arbitrarily rough coefficient. We propose a local oversampling method to construct basis functions that have optimal local approximation property. Our methodology is based on the compactness of the solution operator restricted on local regions of the spatial domain, and does not depend on any scale-separation or periodicity assumption of the coefficient. We focus on a special type of basis functions that are harmonic on each element and have optimal approximation property. We first reduce our problem to approximating the trace of the solution space on each edge of the underlying mesh, and then achieve this goal through the singular value decomposition of an oversampling operator. Rigorous error estimates can be obtained through thresholding in constructing the basis functions. Numerical results for several problems with multiple spatial scales and high contrast inclusions are presented to demonstrate the compactness of the local solution space and the capacity of our method in identifying and exploiting this compact structure to achieve computational savings.


Introduction
Many problems of practical importance in science and engineering have multi-scale feature: composite materials modeling and flows in porous media are typical examples of such kind. In many cases, the qualities of interest are only related to the large-scale properties of the solutions. However, the fine-scale features of the model can have significant impact on the large-scale properties of the solutions, thus one needs to use very fine mesh to resolve the small-scale variations of the problem even though only large-scale properties of the solutions are required. The computational cost can be prohibitive. For these so-called multi-scale problems, efficient upscaling methods that allow us to incorporate the small-scale features of the problem into the large-scale properties of the solutions are desired.
In this work, we use the following example of second order linear elliptic equation with homogeneous Dirichlet boundary condition to illustrate our upscaling methodology, −div(a(x)∇u(x)) = f (x), x ∈ Ω; u(x)| ∂Ω = 0, (1.1) where Ω is a convex polygon domain in R d with d = 2, 3. We assume that the equation is uniformly elliptic, i.e., there exist λ min > 0 and λ min > 0 such that We do not assume any regularity of the coefficient a(x) ∈ L ∞ (Ω), which may have multiple spatial scales, thus the above equation (1.1) can be used to model diffusion process in strongly heterogeneous media. We also assume that in (1.1) the forcing function f (x) ∈ L 2 (Ω). Then the existence of solution to (1.1), u(x) ∈ H 1 0 (Ω) follows immediately from the Lax-Milgram theorem, and we have However, due to the roughness of the coefficient a(x), the solution to (1.1) u(x) loses regularity and ceases to be in H 2 (Ω). Classical finite element method uses piecewise polynomials to approximate the solution space, and its convergence depends on the following approximation property ∥u(x) − Ju(x)∥ H 1 0 (Ω) ≤ Ch∥u(x)∥ H 2 (Ω) , (1.4) and the regularity result ∥u(x)∥ H 2 (Ω) ≤ C∥f (x)∥ L 2 (Ω) , (1.5) where Ju is the piecewise polynomial interpolation of u(x), and h is the underlying mesh size. Classical finite element method may fail for these problems (1.1), since ∥u(x)∥ H 2 cannot be bounded by ∥f (x)∥ L 2 (Ω) in (1.5). It is actually showed in [6] that the polynomial finite elements can perform arbitrarily badly in this setting, thus (1.1) can serve as a typical example of multi-scale problems.
One of the strategies to numerically solve the multi-scale problem (1.1) when classical finite element methods fail is using problem-dependent basis that incorporates properties of the coefficient a(x), to approximate the solution space. To be specific, one constructs basis functions φ 1 (x), φ 2 (x), . . . φ n (x) ∈ H 1 0 (Ω), (1.6) that may depend on the elliptic operator −div(a(x)∇(·)), and find numerical solution using the Galerkin projection. Namely, we find u h (x) within the trial space V h , such that The numerical solution defined above satisfies the following optimal property under the energy norm where the energy norm is equivalent to the H 1 0 (Ω) norm, and defined as ∥u(x)∥ 2 a = a(u(x), u(x)) = Ω ∇u(x) t a(x)∇u(x)dx. (1.11) In this work we will employ the above strategy to numerically solve (1.1). Note that to obtain the numerical solution u h (x) from the Galerkin projection (1.8), one needs to solve a linear system of size n × n. Thus to make the computational cost small, we want the number of the basis functions used in (1.6) to be small. Besides, we want the basis functions in (1.6) to have compact support such that the stiffness matrix formed in (1.8) is sparse thus easy to compute and invert.
We propose an effective method to construct basis functions (1.6) with optimal local approximation property in this paper. Our method is based on the compactness of the solution operator to (1.1) restricted on local regions of the domain. To be specific, we introduce the following operator where D i is local subset of Ω of size O(H), and H is chosen according to the desired order of accuracy. We construct local basis functions that can approximate the range of T i with controlled accuracy, and combine them together to get the approximate solution space to (1.1). The compactness of the operator T i , after removing the singular part, will be demonstrated numerically in section 2.
On each local region of the domain, D i , we divide the local solution u i (x) into two orthogonal parts with respect to energy norm (1.9): an a(x)-harmonic part, and a local bubble part. We show that the bubble part of the solution is small and its compact structure can be easily identified by inverting the elliptic equation (1.1) locally on each region D i . We consider approximating the a(x)-harmonic part of the solution space using a special type of basis functions that are a(x)-harmonic on each D i (but not across the boundary of D i ), and call basis functions of such type multi-scale basis. Due to the smallness of the bubble part of the solution, we demonstrate that multi-scale basis functions are optimal in approximating the solution space for fixed local boundary conditions on ∂D i .
The a(x)-harmonic part of the solution only depends on the restriction of the solution on the boundary of the local regions D i , and we seek to identify the compact structure of the trace of the solution space on ∂D i . Using a primary set of interpolation basis functions, ψ i (x), we can reduce our problem to approximating the solution space on each edge of the coarse mesh, e. We introduce an oversampling operator that maps the solution on an oversampling domain W to the solution restricted on an edge of the mesh e. Then we employ the compactness of the oversampling operator to construct the optimal boundary basis functions on each edge e. The optimal choice of the interpolation basis functions ψ i (x) can also be identified by solving least square problems. With these local boundary basis functions, we construct basis functions (1.6) that approximate the a(x)-harmonic part of the solution space by solving some local boundary value problems. The resulting basis functions (1.6) are a(x)-orthogonal with respect to the bubble part of the solution space, and because of this property we can add the bubble part back to our numerical solutions by simply solving some local cell problems.
The resulting method consists of two stages: in the offline stage we identify the local compact structure of the solution space, and build multi-scale basis functions and the corresponding stiffness matrix in (1.8); in the online stage, for any given forcing function f (x) ∈ L 2 (Ω), we solve the equation (1.1) efficiently using the multi-scale basis functions constructed offline with a very low computation cost. Our method can achieve significant computational savings in the multi-query setting where equation (1.1) need to be solved for multiple times with different forcing functions f (x) ∈ L 2 (Ω).
Numerical examples for several problems with rough coefficients and high-contrast channels are presented. Our method achieves high accuracy for these problems, and these numerical results suggest that our method is very robust and works equally well for problems without scale separation, or have high contrast channels. We demonstrate our methodology through the second-order scalar elliptic equation, but it can be applied to other linear elliptic problems like elasticity equations without much difficulty.
There is a fast growing literature on multi-scale methods for elliptic equations with rough coefficient, and we briefly review some related work below. The classical homogeneziation theory [7] considers a(x) with periodic structure a(x) = a(x, x ϵ ), and derives an effective equation governing the asymptotic behaviors of the solutions as the small scale ϵ goes to 0. In [21,18,22,14], the authors proposed the Multi-scale finite element method (MsFEM), in which the multi-scale basis functions are constructed by solving local elliptic problems with homogeneous right hand side, and the convergence analysis of MsFEM in the periodic setting was given in [18,15]. An oversampling technique [18] was proposed to reduce the resonance error introduced due to the artificial boundary conditions in solving the local problems. This present work can be viewed as a continuation of the MsFEM, and we propose a robust method to choose and enrich the local boundary conditions using the oversampling operator. In [27], the authors showed that the solutions gain an order of regularity with respect to the Harmonic coordinate [3], and constructed multi-scale basis functions using this property. The Harmonic coordinate was recently employed in [11] for global upscaling. In [8,28], the flux norm was introduced and employed to show the compactness of the solution space and construct (localized) basis functions. In [29] the polyharmonic spline was employed, which was later put in the Bayesian inference setting [26]. In [24], the generalized finite element method was proposed, which provides a general framework to combine local approximation spaces together using a partition of unity formulation. In [5,4], the local special basis functions are constructed by solving some local spectral problems, and then combined together using the partition of unity framework. The generalized multi-scale finite element method [12] is a systematic method to construct multi-scale basis functions for a family multi-scale problems with parameters. The Heterogeneous multi-scale method [1,30] numerically decomposes the structure of the medium into a micro-scale and a macro-scale, and compute solutions of cell problems on the micro-scale to get the local homogenized matrices. Finally we remake that multi-scale methods using problem-dependent basis functions were also employed to solve problems in which the coefficient has high contrast inclusions [9].
The remaining part of this paper is organized as follows. In section 2, we demonstrate the compactness of the solution operator restricted on local regions of the spatial domain. In section 3, we divide the solutions on each local region of the domain to different parts corresponding to the trace of the solution on the edges of the coarse mesh, and identify their compact structures separately. And with this compact structure, we construct multi-scale basis functions that have optimal local approximation property. In section 4, numerical results are presented to demonstrate the capacity of our method in identifying and exploiting the compactness of the solution space to achieve computational savings. Concluding remarks are made in section 5.

Compactness of the Solution Space Restricted on Local Regions of the Domain
The existence of finite number of basis functions (1.6) that can approximate the solutions space to (1.1) up to any accuracy is implied by the compactness of the solution operator, T , which maps from the forcing function f (x) ∈ L 2 (Ω) to the corresponding solution u(x) ∈ H 1 0 (Ω).
The compactness of T is demonstrated in [23,8], and it was employed for elliptic equations with random input data recently in [19,20] for stochastic model reduction.
To be specific, the solution operator T can be decomposed as where L −1 maps f (x) ∈ H −1 (Ω) to the solution u(x) ∈ H 1 0 (Ω), and I L 2 (Ω)→H −1 (Ω) is the embedding operator from L 2 (Ω) to H −1 (Ω). From (1.3), we can see that L −1 is continuous and indeed a homomorphism, and the compactness of I L 2 (Ω)→H −1 (Ω) is well known based on the Sobolev space theory [16]. Thus the compactness of T follows from the decomposition (2.2). To quantify the approximability of T by a finite-rank operator, we consider its Kolmogorov-n width, which is defined in below.
Definition 2.1 (Kolmogorov n-width). For a compact linear operator T that maps between two Hilbert spaces, we define its Kolmogorov n-width as where T n runs over all rank-n linear operators.
Due to the fact that L −1 is a homomorphism, one can easily see that the Komogorov-n width of T is only different from that of I L 2 (Ω)→H −1 (Ω) by a factor that depends on λ min , λ max (1.2) and Ω, cd n (I L 2 (Ω)→H −1 (Ω) ) ≤ d n (T ) ≤ Cd n (I L 2 (Ω)→H −1 (Ω) ). (2.4) The Kolmogorov-n width of the embedding operator is well-known [23,8], and we have The approximation property (2.6) is optimal, and does not depend on the regularity of the coefficient a(x). For practical applications in multi-scale problems, we want the basis functions φ i (x) to have local support such that the corresponding stiffness matrix in (1.8) is sparse and easy to invert. However, the basis functions in (2.6) whose existence is implied by (2.4), (2.5) and (2.6) may be nonlocal.
Since our objective is finding basis functions (1.6) with local support, we consider a local region of the domain, D with diameter O(H), and a slightly larger local domain which contains D, W , which we call the oversampling region. We consider the restriction of the solutions to (1.1) on W , (2.7) The local solution u W (x) can be divided into two parts, (2.10) We call the first part u 1 W (x) the local a(x)-harmonic part, and the second part u 2 W (x) the local bubble part. The two parts are orthogonal with respect the local inner product, a W (·, ·), The local bubble part u 2 W (x) is small in the sense that Then we consider a local solution operator T D that maps f (x) to the local a(x)-harmonic part, u 1 W (x) restricted on D, and we want to construct local basis functions on D that can approximate the range of T D . To demonstrate the compactness of T D , we choose a set of orthonormal basis in the domain and range of T D to discretize T D as a matrix, and compute the decay of its singular values. We consider the following choice of coefficient in (1.1), which has multiple fine spatial scales and is illustrated in Figure 1a, where   The fast decay of singular values of T D implies that we can use a very small number of local basis functions, to be specific, the first several left singular vectors of T D , to get very good local approximation property. However, we cannot afford to construct T D explicitly since it is a solution operator and its construction involves solving the equation (1.1) a large amount of times (globally). It is known that for a low-rank operator, the main action of T D can be captured in its image on some random vectors. This, to some degree, explains the success of some global upscaling methods [13,11,27] that use sampled global solutions to (1.1) to approximate the solution space to (1.1) locally.
We will not pursue this perspective in this work. Instead, we decompose T D using a global operator and a local oversampling operator, and construct local multi-scale basis functions employing the compactness of the oversampling operator. The resulting method does not involve any global solving of the equation (1.1), and will be detailed in the next section.
3 Identify the compact structure of the solution space using Oversampling In this section, we identify the compact structure of the local solution space through oversampling, and use it to construct basis functions (1.6). In our numerical examples, the domain Ω is chosen to be [0, 1] × [0, 1], and we discretize Ω using a coarse square mesh of size H, which should be chosen according to the desired order of accuracy. With this discetization, we have where D i have disjoint interiors. Underlying this coarse mesh, we use a triangle fine mesh of size O(h), which is a refinement of the coarse mesh. The fine mesh size h should be chosen such that it can resolve the small scale variation of the multi-scale coefficient in (1.1). In our method we solve the equation (1.1) on the coarse mesh, and the basis functions that we use are constructed and saved using linear basis functions on the fine mesh. The two level discretization is illustrated in Figure 2.

The multi-scale basis
We first introduce a special class of problem-dependent basis functions (1.6), which we call the multiscale basis.
Definition 3.1 (Multi-Scale basis). For a discretization of Ω (3.1), we consider basis functions If they are a(x)-harmonic on each coarse element of the coarse discretizaion, D j , then we call them multi-scale basis functions.
Clearly, the multi-scale basis functions are determined by their restrictions on the boundary of coarse elements ∂D i since they are a(x)-harmonic in each D i . We denote We have the following proposition, which implies that if the desired accuracy is O(H), multi-scale basis functions are optimal for fixed local boundary conditions on Γ (3.4).
Namely, if only O(H) accuracy in the energy norm is desired, the multi-scale basis can perform as well as other set of basis functions, given that the local boundary conditions of basis functions are the same.
To prove the above proposition, we first divide the solution u(x) to (1.1) into two parts. On each coarse mesh element D i , we consider and divide it to an a(x)-harmonic part and a local bubble part, as we did in (2.8), is the local a(x)-harmonic part, and u 2 i (x) is the local bubble part. Combine these local decompositions from all coarse elements D i together we get, One can see that the two parts u 1 (x), u 2 (x) are orthogonal with respect to the a(·, ·) inner product (1.9), Besides, the combination of the local bubble parts is small according to (2.12). To be specific, Next we prove the proposition 3.1.
Proof. Denote the numerical solution using Thus u 2 (x) is a-orthogonal to u 1 (x) − u ms h (x), and according to (3.11) we have Then we consider u e (x) = u(x) − u h (x), and divide it to two parts as we did for u(x) in (3.9). We get Consequently, we have According to (3.12), we have since they are equal on Γ and a(x)-harmonic on each D i .
Finally based on (3.15), (3.17), and the optimal property (1.10), we have and complete the proof.
As we have showed in (3.11), the bubble part of the solution u 2 (x) is small and of O(H) in the energy norm, thus can be neglected if the desired accuracy in the numerical solution is O(H). In our method in this work, we use multi-scale basis functions in (1.6) to approximate the solution space, which are locally a(x)-harmonic functions, and are a(x)-orthogonal to the bubble part of solution. Due to this a(x)-orthogonality and the Galerkin projection formulation in (1.8), multi-scale basis functions only approximate the a(x)-harmonic part of the solution and will not bring in additional errors in the bubble part. Thus we can recover the bubble part of solution u 2 (x) independently by solve some local bubble problems (2.10). And adding u 2 (x) back to u MS h (x), we can get numerical solution that is free of error in the bubble part. This is one of the advantages of using multi-scale basis functions in (1.6).
To construct local multi-scale basis functions, we divide the a(x)-harmonic part of the solution u 1 (x) into different parts corresponding to different edges of the coarse mesh, and approximate them separately.

Decomposition of the a(x)-harmonic part of the solution
To identify the compact structure of the a(x)-harmonic part of the solution, we first introduce a set of primary interpolation multi-scale basis ψ i (x), i = 1, . . . n based on the coarse mesh node points x 1 , x 2 , . . . x n , (3.20) We also require that ψ i (x) is supported on the four coarse elements around x i . For example, we can simply choose the multi-scale basis ψ i (x) to be linear on the boundaries of coarse elements. We will discuss about the optimal choice of these primary interpolation basis functions in subsection 3.4.
For f (x) ∈ L 2 (Ω), and the spatial dimension d = 2, 3, we have that u(x) is Hölder continuous on Ω [17], so we can consider the interpolation of u 1 (x), namely the a(x)-harmonic part of the solution, using the primary basis functions ψ j (x), and get the residual, For a coarse mesh element D i , we denote its four nodes points as x i1 , x i2 , x i3 and x i4 , then we get the restriction of the residual (3.21) on D i , (a) Decomposition of the interpolation residual.

Oversampling Region W
Target edge e D u D d x i1 x i2 x j1 x j2 x j3 x j4 x j5 x j6 x j7 x j8 x j9 x j10 Since the residual u 1 e (x)| Di vanishes on the node points x ij , j = 1, 2, 3, 4, we can divide the trace of u 1 e on ∂D i to four parts, corresponding to the four edges of D i , e 1 , e 2 , e 3 , e 4 , This decomposition is illustrated in Figure 3a. Each part in the above decomposition (3.23) belongs to H 1/2 (∂D i ) since they vanish on the node points, and we can extend them to D i to get four a(x)-harmonic components of u 1 e (x). We denote them as Combining these local decompositions together, we have where v e (x) is the a(x)-harmonic extension of the interpolation error on the edge e to its two neighbor elements. In the above decomposition (3.25), we are actually dividing the error u e i (x) in the a(x)-harmonic part of solution into different parts corresponding to errors on different edges e. This is possible since u e i (x) vanishes on the node points, thanks to the interpolation operation using ψ i (x) (3.21). We seek to construct boundary basis functions on each edge e that approximate v e (x), and combine them together to get the whole trial space. We introduce the following operator for the edge e with endpoints x i1 and x i2 , which maps f (x) ∈ L 2 (Ω) to the interpolation residual of the solution on e, (3.26) The left singular vectors of T e form the optimal local boundary basis functions. However, T e is a global operator and its construction involves solving the equation (1.1) globally. In the next section, we decompose T e as a global solution operator and a local oversampling operator, and construct boundary basis functions that approximate the range of T e through the oversampling operator.

The oversampling operator
To identify the compact structure of the solution space restricted on the edge e, we put it in an oversampling region that we denote by W . In our numerical examples, we use the square mesh of size H for the coarse discretization, and the oversampling region W is chosen as the union of the six elements around the edge e. It is illustrated in Figure 3b. We remark that our method is also applicable to other types of discretizations like triangular mesh. We denote the solution on W as u W (x).
We remark that the idea of identifying the local structure of the solution space by putting it in a larger region, namely oversampling, was first proposed in [18] to reduce the resonance error due to artificial local boundary conditions, and this strategy was later employed in [2,5,10].
We denote T W as the operator that maps f (x) ∈ L 2 (Ω) to the oversampling solution u W (x) = u(x)| W , and T W →e as the operator that maps u W (x) to the solution restricted on the edge e: (3.27) We also introduce the interpolation residual operator using (3.20), P e , With the above definitions, the operator T e (3.26) can be decomposed as We call the operator P e T W →e in the above decomposition (3.29) the oversampling operator, which maps the solution on W , u W (x) to the interpolation residual, which we denote by v e (x), where x i1 and x i2 are the two endpoints of e, and ψ i (x) is the primary interpolation basis (3.20).
The solution to (1.1) on W , u W (x) can be divided into two parts, the a(x)-harmonic part u 1 W (x) and the bubble part u 2 W (x). We employ the compactness of the oversampling operator (3.30) to construct basis functions in H 1/2 (e) that vanish at x i1 and x i2 , and approximate the range of (3.26). To be specific, we use the first several left singular vectors of P OS as the basis functions associated with e. We first introduce appropriate inner products for the domain and range space of P OS .
On the edge e, the image of T e , v e (x) ∈ H 1/2 (e) and vanishes on the two endpoints. We consider its a(x)-harmonic extension to the upper and lower coarse elements respectively, as showed in Figure 3b, and denote them as v u e (x) and v d e (x). Then we define In the domain of the operator P e T W →e , namely, u W (x), we define its inner product as (3.32) With the above inner products, we compute the singular value decomposition of the oversampling operator P OS . To discretize the domain of P OS , we consider its two parts, the a(x)-harmonic part, and the bubble part. The a(x)-harmonic part only depends on the trace of u W (x) on ∂W , we discretize H 1/2 (∂W ) using all the fine mesh piecewise linear functions. If ∂W intersects with ∂Ω, then we choose H 1/2 (∂W ) to be fine mesh basis functions that vanish on ∂Ω. The bubble part of the solution u 2 W (x) only depends on f W (x) = f (x)| W , and we discretize f (x) using piecewise constant functions on the coarse mesh. The following lemma justifies this discretization of f (x) ∈ L 2 (Ω). (3.33) According to (1.3), the above Lemma implies that if we want to obtain O(H) accuracy in the energy norm of the numerical solution, we can only consider piecewise constant forcing functions on the coarse mesh. If higher accuracy is desired in the numerical solution for f (x) ∈ L 2 (Ω), we can simply refine the discretization of L 2 (Ω) in the oversampling operator (3.30). We remark that if f (x) has higher regularity, we can get better approximation result in (3.33), for example (3.34) With the above discretization of u W (x), we truncate the singular values of P OS to O(H) and select the corresponding left singular vectors as the boundary basis functions e. We denote them as which vanish on the two endpoints of e. According to the oversampling operator in (3.29), and our truncation criteria, we have the following approximation property Then we extend these boundary basis functions to the two neighbourhood coarse elements D u and D d as a(x)-harmonic functions, by solving We have the following error estimate using the trial space (3.39).
The proof of the convergence result (3.40) follows directly from the decomposition of the solution operator (3.29) and the truncation in the singular value decomposition of the oversampling operator.
Proof. We choose c j e as the ones in (3.36), and denote Then we consider ∥u ms h (x) − u(x)∥ a , since the basis functions in (3.39) are multi-scale basis, namely, they are a(x)-harmonic on each D i , we have where u 1 (x) and u 2 (x) are the a(x)-harmonic part and bubble part of the solution u(x).
Then we divide ∥u ms h (x) − u 1 (x)∥ 2 a to different parts on D i . For each part, according to the approximation property (3.36), and the definition (3.31), we have Then using the optimal approximation property (1.10), we finish the proof.
To make the number of multi-scale basis functions in (1.6) small, we want the singular values of P OS = P e T W →e decay fast. P OS can be divided into two parts: the first part acts on the a(x)-harmonic part of u W (x), and we denote it as P 1 OS ; the second part acts on the bubble that only depends on f W (x), and we denote it as P 2 OS . For the first part, similar analysis has been done in [5] in a slightly different setting, and the method there also applies to our problem. We have the following result. OS that acts on a(x)-harmonic functions on W as σ k , then we have the following upper bound on the decay of σ k : for any ϵ > 0, there exist C such that where d is the dimension of the domain Ω.
The second part P 2 OS is small according to (2.12), and the decay rate of its singular values can be obtained from (2.5) using a simple scaling argument.
We will see in our numerical results section that the singular values of P OS decay very fast, and a very small number of boundary basis functions can achieve high local approximation accuracy.
As we have shown previously, the basis functions we obtain are multi-scale basis functions, thus are a(x)-orthogonal to the bubble part of the solution space. This gives us the flexibility to add the bubble parts back to the numerical solutions at local regions where higher accuracy is desired by simply solving some local bubble problems (2.10). In our truncation of the singular values of the local compact operator P e T W →e , we choose the threshold to be O(H), since O(H) accuracy is required in (3.40). If we need higher accuracy than H, for example, O(ϵ) where h ≪ ϵ ≪ H. (3.48) Then we can truncate the singular values of P e T W →e by ϵ, by doing which, the resulting multi-scale basis functions are able to approximate the a(x)-harmonic part of the solution space up to O(ϵ) accuracy. Then by adding back the bubble part of the solution u 2 (x) to the numerical solution u MS h (x), we can get O(ϵ) accuracy in our final numerical solutions. Namely, our upscaling strategy allows us to get arbitrarily high accuracy that is permitted by the fine mesh discretization.

Optimal primary interpolation basis functions
In constructing the multi-scale basis functions in the previous section, we need to choose a set of primary interpolation basis functions (3.20) first, which allows us to reduce the problem to approximating the solutions space restricted on each edge e, (3.29). The choice of these interpolation basis will affect the oversampling operator P OS = P e T W →e . In this subsection, we identify the optimal interpolation basis functions by solving local under-determined least square problems.
The oversampling operator for edge e, P OS , depends on the interpolation basis functions ψ i1 (x) and ψ i2 (x) associated with the two endpoints of e, and we seek optimal primary interpolation basis functions ψ i1 (x) and ψ i2 (x) that make the singular values of P OS have the fastest decay.
We consider the following optimization problem where the norm ∥ · ∥ a is defined in (3.32).
Under certain conditions, the optimization problem (3.49) has unique solutions, and the unique solutions are optimal interpolation basis functions. We have the following theorem.
be the solution to (3.49), and φ i1 (x) and φ i2 (x) be two other multi-scale interpolation basis functions. Denote the corresponding oversampling operators using these interpolation basis as (3.30) as P * OS and P OS , and let σ * i , i = 1, 2, . . . and σ i , i = 1, 2, . . . be their singular values. Assume that both σ * i and σ i are arranged in descending order. Then we have which implies the interpolation basis in (3.49) make the singular value of the oversampling operator (3.30) have the fastest decay.
The proof of Theorem 3.1 is given in the appendix.
Note that in the minimization problem (3.49), the two interpolation basis functions can be constructed independently by solving under-determined least square problems. By going over the oversample regions for each edge of the coarse mesh, we can construct the optimal interpolation basis functions (3.20) on all the boundaries of local regions. Then we can extend them locally to a(x)-harmonic functions by solving some local boundary value problems as in (3.37) to get the interpolation basis (3.20).
We will see in our numerical results section that, in some cases, the optimal interpolation basis functions (3.49) themselves are enough to approximate the solution space of (1.1). Namely, there is no need to construct the multi-scale basis functions associated with each edge (3.37).
We also remark that the optimization problem (3.49) can be interpreted as a Bayesian Inference problem as in [29,26,25]. Interested readers may consult the references for more information.

Implementation of the whole method
The proposed multi-scale finite element method consists of two stages, the offline stage and the online stage. In the offline stage, we identify the compact structure of the solution space. In the online stage, for a given forcing function, we compute the numerical solution using the offline basis functions.
The offline stage involves the following procedures, 1. Build the a(x)-harmonic extension operator on each oversampling region W .
On each oversampling region W , we build the local a(x)-harmonic extension operator, which maps the boundary condition which belongs to H 1/2 (∂W ) to a(x)-harmonic functions on W . This step requires solving a series of boundary value problems on W .
The a(x)-harmonic extension operator and the inner product (3.32) together induces a norm on H 1/2 (W ). We need this weighted norm in the SVD of the oversampling operator.
Discretize the domain of the oversampling operator P OS using the strategy described in section 3.3, and build the restriction operator that maps u W (x) to u W (x i1 ) and u W (x i2 ). Then we solve the optimization problem (3.49), which is essentially an under-determined least square problem.
Solving the optimization problem (3.49) on each oversampling region only gives us the boundary conditions of the primary interpolation basis functions, we need to solve local boundary value problems to get their local a(x)-harmonic extensions as the interpolation basis functions.

Build the oversampling operator.
Using the optimal interpolation basis functions we constructed in the previous step, we build the oversampling operator P OS , (3.3).

Construct the H 1/2 (e) norm.
Construct the H 1/2 (e) norm (3.31), which requires solving a series of local boundary value problems on the two neighbour coarse elements of e . 6. Compute the SVD of the oversampling operators.
Using the inner product (3.32) and (3.31) in the oversampling operator to compute its singular value decomposition. We truncate the singular values to ϵ, and save the corresponding left singular vectors, which are basis functions of H 1/2 (e), v 1 e (x), v 2 e (x), . . . v ke e (x).

Construct multi-scale basis functions associated with each edge e.
We extend the basis functions of H 1/2 (e), namely v i e (x), to the two neighbourhood coarse elements of e by solving local boundary value problems (3.37).

Compute the stiffness matrix.
Combining the multi-scale interpolation basis functions with the multi-scale basis functions associated with each edge of the coarse mesh, we get the trial space (3.51) Then we save these multi-scale basis functions, and compute the stiffness matrix, The online stage involves the following procedures, 1. Compute the load vector.
For a given forcing function f (x) ∈ L 2 (Ω), we compute the corresponding load vector 2. Compute the online numerical solution.
Using the load vector b (3.53) and stiffness matrix (3.52), we solve the linear system and get the online numerical solution on the fine mesh Recall that the multi-scale basis functions associated with the edges vanish on the coarse grid node points. So if we only want the large scale solution, we can simply select the coefficients in c that correspond to the primary multi-scale interpolation basis (3.20).

Recover the bubble part of the solution.
Solve the local boundary value problem (2.10) on each coarse mesh element D i , and get u 2 i (x). Combine these local bubble parts together and add them back to Galerkin solution (3.55), we get If O(H) accuracy is required in the numerical solution, this step is unnecessary since the bubble part does not impact the large scale properties of the solution.
Note that in the offline stage, we need to solve a series of boundary value problems for each edge of the coarse mesh to construct the oversampling operator (3.30), and then compute its singular value decomposition, which is computationally expensive. However, the constructions of multi-scale basis functions on each edge are independent from each other, thus the offline stage can be implemented on a parallel machine to accelerate the computation. In the online stage, the main computational cost comes from solving the linear system (3.54). Our numerical results in the next section suggest that a small number of basis functions are enough to obtain the coarse mesh accuracy, O(H), thus the linear system (3.54) is small and sparse. This implies that the online computation in our method is efficient, and our method can bring in significant computational savings in the multi-query setting, where the equation (1.1) needs to be solved for multiple times using different forcing functions.

Numerical Results
In this section, we present numerical examples that have multiple-scale features and high-contrast channels to demonstrate the capacity of our method in identifying and exploiting the compact structure of the local solution space to achieve computational savings (in the online stage). We discretize the domain of the problems Ω = [0, 1] × [0, 1] using a two level mesh as showed in Figure 2. The coarse mesh is of size H = 1/32, and the fine mesh is of size h = 1/1024.

An example with multiple spatial scales
The first example we consider is one that has multiple spatial scales. The coefficient is given by (2.14), and it is visualized in Figure 1a. To measure the error in our online numerical solution, we need to choose a reference solution. Since the multi-scale basis functions are constructed and saved on the fine mesh of size h, we will use the piecewise linear finite element solution on the fine mesh as the reference.
In the online stage, we choose the forcing function f (x, y) to be f (x, y) = 1, (x, y) ∈ Ω. (4.2) Recall that the basis functions that we use are a(x)-harmonic in each D i , and we can add back the bubble part of the solution by simply solving some local cell problems. Our online numerical solution, and the numerical solution after correction using the bubble part are plotted in Figure 4. We can see that the numerical errors in the online numerical solution is very small.  We measure the error of the numerical solution in the energy norm and L 2 norm. We denote the numerical solution as u MS H (x), and the corrected solution using the bubble part as u H (x). We compute The results are listed in Table 1. Then we consider only using the primary interpolation basis functions ψ i (x), i = 1, . . . N in the trial space (3.51), namely, we do not use the multi-scale basis functions associated with each edge of the coarse mesh. The error in the corresponding numerical solution is also listed in Table 1. We can that the relative error in L 2 is also small, which means the numerical solution can capture the large-scale property of the solution. However, the errors in the energy norm and L 2 norm are both significantly larger than that in (4.3), which implies the necessity of enriching the trial space using the multi-scale basis functions associated with the edges of the coarse mesh if higher accuracy is required.  To further demonstrate the convergence rate in (3.40) and (3.41), we consider a sequence of coarse mesh size with H = 2 −k , k = 3, 4, 5, 6, 7. For each H, we compute the error in the online numerical solution, the decay of the numerical error with H is plotted in Figure 5. From the plot, we can clearly see that the decay rates of the online numerical error agree with the estimates (3.40) and (3.41).

An example without scale-separation
In this subsection, we consider an example of the coefficient a(x) without scale-separation.
a(x, y) = |ã| + 0.5, (4.4) whereã is normally distributed on the node point of a mesh of size 1 128 . And a(x, y) on the fine mesh is obtained using piecewise linear interpolation. This coefficient a(x, y) is rough and has no clear scaleseparation. The configuration of the coefficient (4.4) is illustrated in Figure 6. We discretize the spatial domain Ω using a two level mesh as showed in Figure 2, and then we solve the optimization problem (3.49) and build the oversampling operator (3.30) on the fine mesh. We  truncate the singular value decomposition of the oversampling operator to ϵ = H. After the offline stage, the average number of multi-scale basis functions associated with each edge of the coarse mesh, (4.1), is k e ≈ 1.00. The smallness ofk e reflects the compactness of the local solution space, and the efficiency of our method in the offline stage since the stiffness matrix is small and sparse.
In the online stage, we choose the forcing function f (x) to be same as (4.2), and measure the error of the online numerical solution using the fine mesh solution as reference. The numerical solutions are plotted in Figure 7. We can see that the errors in the online numerical solutions are small. We measure the error (4.3) in the energy norm and the L 2 norm. The results are summarized in Table 2. Again we see that our method achieves very high accuracy in the online stage, which reflects that the good performance of our method does not depend on the scale-separation of the coefficient.

An example with high-contrast channels
In this subsection, we consider an example with high-contrast channels. The high contrast in the coefficient violates our uniform ellipticity assumption (1.2), and brings in additional difficulty. The coefficient that we consider here is the one with multiple scales (2.14) added with some high conductivity patches and channels. log 10 a(x) is plotted in Figure 8, which has very strong heterogeneity. We discretize the problem in the spatial domain as the previous two examples, and build the oversampling operator for each edge e of the coarse mesh. We truncate the singular value decomposition   1). Namely, on average, we use less than one multi-scale basis function for each edge of the coarse mesh, which implies the compactness of the solution space on local regions of the domain for this problem with high contrast coefficient.
In the online stage, we choose the forcing function to be (4.2), and use fine mesh solution as reference. The numerical errors are plotted in Figure 9. We can see that our numerical solutions have high accuracy and can capture the large-scale properties of the solution. The numerical errors of the solutions (4.3)  are listed in Table 3. Again, we see the high accuracy of our online numerical solution.

Concluding Remarks
In this paper, a novel multi-scale finite-element method is proposed, which is based on the compactness of the solution space on local regions of the spatial domain, and does not depend on any scale-separation or periodicity assumption of the coefficient. By introducing a set of primary interpolation basis functions, we reduce the problem of approximating the local solution space to approximating the trace of the solution on each edge of the coarse mesh. We construct basis functions for each edge of the coarse mesh separately employing an oversampling operator which is local and compact. The optimal primary interpolation basis is also identified by solving a series of under-determined least square problems.
The resulting method involves two stages: in the offline stage we identify the local compact structure of the solution space; in the offline stage we solve the problem efficiently exploiting this compact structure. The offline computation is expensive but it only requires solving local problems and is parallel in nature. We can rigorously control the error in the online numerical solutions by thresholding in the offline stage. The resulting basis functions are called multi-scale basis because they are a(x)-harmonic on each coarse element, which have optimal approximation property. Multi-scale basis functions are orthogonal to the bubble part of the solution space, and this gives us the flexibility to put back the bubble part of the solution in regions where high accuracy is desired.
Numerical results suggest that our method is very robust and can achieve high accuracy for the challenging problems without scale-seperation, or have high-contrast inclusions. e x i1 x i2 x i3 x i4 x i5 Oversampling Region W Figure 10: The oversampling region.

Proof of Theorem 3.1
Recall that the restriction operator P W →e (3.27), maps u(x)| W to H 1/2 (e) ∩ C α (e), P W →e : u W (x) → u e (x) ∈ H 1/2 (e) ∩ C α (e). (.1) And we have introduced the following inner product (3.32) on the domain of T W →e , which denote by Note that u e (x) does not necessarily vanish on the two endpoints of e, thus the inner product we defined in (3.31) does not apply to v e (x). We first extend u e (x) to the boundary of its two neighborhood coarse elements D u , D d , Figure 10. To be specific, we let u e (x) vanish on x i3 , x i4 , x i5 , x i6 , and be piecewise linear on ∂(D u ∪D d ). Then we solve local boundary value problems on D u and D d separately to extend u e (x) to D u ∪ D d as a(x)-harmonic functions. With this a(x)-harmonic extension to D u and D d , we introduce the following inner product, which agrees with (3.31) for u e (x) that vanishes on x i1 and x i2 . Next we consider P 1 and P 2 , which are the bounded linear functional that maps u W (x) ∈ V W to its values on x i1 and x i2 separately. We can easily see that P 1 and P 2 are linearly independent, thus there exist φ i1 (x) and φ i2 (x), such that P j (φ xi k (x)) = δ jk , j,k = 1, 2. (. We denote the intersection of the kernel of P 1 (x) and P 2 as V 0 W , which is closed subspace of V W . And we denote the projection of ψ i1 (x) and ψ i2 (x) to the orthogonal complement of V 0 W , (V 0 W ) ⊥ , as ψ * i1 (x) and ψ * i2 (x), namely, where the orthogonality is in the sense of (.2). Then we have ψ * ij (x ik ) = δ jk , ∥ψ ij (x)∥ VW ≥ ∥ψ * ij (x)∥ VW , j,k = 1, 2. (.6) And we finish the proof of the existence of unique solutions ψ * i1 (x) and ψ i2 (x) to the optimization problem (3.49). Next we proof the property (3.50).
We choose ψ * i1 (x)| e and ψ * i2 (x)| e as the interpolation basis in (3.3), and get the oversampling operator P * OS . For any other two interpolation basis functions, ψ i1 (x) and ψ i2 (x), we denote the corresponding oversampling operator as P OS , then we compare their singular values, σ * k , and σ k . We use the following characterization of singular values, ∥P OS u(x)∥ H 1/2 (e) .
( .7) where V k is a k-dimensional subspace of V W .