Regularization of a backwards parabolic equation by fractional operators

The backwards diffusion equation is one of the classical ill-posed inverse problems, related to a wide range of applications, and has been extensively studied over the last 50 years. One of the first methods was that of {\it quasireversibility\/} whereby the parabolic operator is replaced by a differential operator for which the backwards problem in time is well posed. This is in fact the direction we will take but will do so with a nonlocal operator; an equation of fractional order in time for which the backwards problem is known to be ``almost well posed.'' We shall look at various possible options and strategies but our conclusion for the best of these will exploit the linearity of the problem to break the inversion into distinct frequency bands and to use a different fractional order for each. The fractional exponents will be chosen using the discrepancy principle under the assumption we have an estimate of the noise level in the data. An analysis of the method is provided as are some illustrative numerical examples.

(Communicated by the associate editor name) Abstract. The backwards diffusion equation is one of the classical ill-posed inverse problems, related to a wide range of applications, and has been extensively studied over the last 50 years. One of the first methods was that of quasireversibility whereby the parabolic operator is replaced by a differential operator for which the backwards problem in time is well posed. This is in fact the direction we will take but will do so with a nonlocal operator; an equation of fractional order in time for which the backwards problem is known to be "almost well posed. " We shall look at various possible options and strategies but our conclusion for the best of these will exploit the linearity of the problem to break the inversion into distinct frequency bands and to use a different fractional order for each. The fractional exponents will be chosen using the discrepancy principle under the assumption we have an estimate of the noise level in the data. An analysis of the method is provided as are some illustrative numerical examples.
1. Introduction. The setting is in a bounded, simply connected domain Ω with smooth (C 2 ) boundary ∂Ω. L is a uniformly elliptic second order partial differential operator defined in Ω with sufficiently smooth coefficients and subject to boundary values on ∂Ω that, for simplicity, we take to be of homogeneous Dirichlet type so that the domain of L can be taken to be H 2 (Ω) ∩ H 1 0 (Ω). There is, however, a completely parallel situation if the boundary conditions are of impedance type.
Thus we have u t − Lu = 0, (x, t) ∈ Ω × (0, T ) u(x, t) = 0 (x, t) ∈ ∂Ω × (0, T ) where u 0 is unknown and has to be determined from the final time value for some T > 0 and a measured function g(x) taken over the domain Ω. This problem is well-known, and easily shown, to be extremely ill-posed. While often viewed as the "backwards heat problem," it in fact arises anytime a diffusion process has to be reversed and governs a wide variety of applications. Some of these would dictate an initial state not governed by a smooth function u 0 but one with significant information residing in the mid and high frequency bands. The reversal from a final state might not be through a strict time-process. An example here is the degradation of an image by a blurring process; the backwards problem becomes one of de-blurring. Indeed, the solution to (1) with (2) can be represented in the form g(x) = Ω K(x − y; T )u 0 (y) dy. This is a Fredholm integral equation of the first kind for the initial state u 0 and its inversion corresponds to deblurring from a perturbation of a Gaussian kernel.
Further recent application examples we wish to mention is identification of airborne contaminants [1] and imaging with acoustic or elastic waves in the presence of strong attenuation, which leads to a similar setting after reformulation as a first order in time system [2,9,23], arising, e.g., in photoacoustic tomography [22].
Given its physical importance, (1) with (2) has received considerable attention over the last sixty years and in the next section we review some of these approaches as they will have relevance to the main results of this paper.
The standard regularization technique to invert a compact operator is to replace it by a "nearby" operator with a bounded inverse and for the case of (1) this has been with either another differential operator or what is in effect a truncated singular value decomposition of the original. Our approach will be rather different; we seek to replace the parabolic equation (1) with a fractional subdiffusion operator whereby the time derivative u t becomes D α t u for some α, 0 < α < 1. D α t − Lu = 0, (x, t) ∈ Ω × (0, T ).
The rationale behind this lies in the fact that the parabolic equation arises from a diffusion model based on a Markov process in which the current state of the system is determined from only the previous state. The model based on (3) is non-Markovian and the value of the current state depends on all previous states; indeed these have to be retained in the solution of (3). Thus in contrast to the parabolic differential operator we are now using a non-local operator. This fact allows for a more transparent reversal in time. At the solution level the exponential function inherent in (1) is replaced by a Mittag Leffler function for the subdiffusion operator and the decay of this for large argument is only linear. Indeed, it has been shown, [31], that the backwards subdiffusion problem is only mildly ill-conditioned; equivalent to a two derivative loss in space. However, as we shall see there are several complexities involved and the replacement as a regularizer cannot be done without some care. On the other hand, the subdiffusion equation (3) is itself of considerable importance in applications. If D α t is represented by a single fractional exponent α then backwards inversion can be accomplished in a straightforward way. However, if there are multiple exponents involved, that is D α t = M i=1 q i D αi t or, more generally if D α t represents a fully distributed fractional derivative then the regularization techniques discussed in this paper are exactly the pattern that would have to be followed.
2. Quasi-Reversibility and Random Walk models. In this section we provide some background on why we view regularization of the backwards parabolic equation by replacing it by one of subdiffusion type is in a sense natural from a physical motivation standpoint.
Dating from the late 1960s the initial attack on the inverse diffusion problem (1), (2) was by the method of quasi-reversibility whereby the parabolic operator was replaced by a "nearby" differential operator for which the time reversal was well posed and the approach was popularized in the book by Lattes and Lions, [24]. Some examples suggested were adding a term ǫu tt so that the equation (1) was of hyperbolic type or a fourth order operator term ǫL 2 (thus converting the heat equation into the beam equation with lower order terms). The difficulty with both these perturbations is that the new operators require either further initial or further boundary conditions that are not transparently available. It should also be noted that the idea of adding a small, artificial term to a differential operator in order to improve the ill-conditioning of a numerical scheme, such as adding artificial viscosity to control the behaviour of shocks, is even older.
The quasi-reversibility approach by Showalter, [32,33,34] was to instead use the pseudoparabolic equation which has a natural setting of H 2 (Ω) ∩ H 1 0 (Ω) and subject to the single initial condition u ǫ (x, 0) = u 0 (x). There is an interesting history to this equation. It occurs independently in numerous applications such as a two-temperature theory of thermodynamics and flow in porous media [4,11,8] and is known in the Russian literature as an equation of Sobolev type. Of course, in these applications the additional term ǫL u t was part of the extended model and not added merely for a stabilizing effect.
The operator B ǫ := (I − ǫL) −1 L is a bounded operator on H 2 (Ω) ∩ H 1 0 (Ω) for ǫ > 0. Thus the full group of operators exp(−tB ǫ ) is easily defined by the power series (−t) n n! (B ǫ ) n under conditions on the resolvent R(λ, L) which are satisfied by any strongly elliptic operator. Under such conditions, exp(−tB ǫ ) converges to exp(−tL) in the strong topology as ǫ → 0 and is the basis of Yosida's proof of the Hille-Phillips-Yosida Theorem which shows the existence of semigroups of differential operators. There are known error estimates on the rate of this convergence. The quasi-reversibility step is to recover an approximation to u 0 (x) by computing u 0 (x; ǫ) = exp(−T B ǫ )g(x). One has to select an ǫ > 0, depending on the expected noise level δ in g and using u 0 (x; ǫ) as the approximation to the initial u 0 . Thus replacing the heat equation by the pseudoparabolic equation is a regularizing method for solving the backwards heat problem and well-studied in the literature.
Of course, there were other approaches. For example, a blending of the quasireversibility ideas with those of logarithmic convexity led Showalter to suggest that retaining the heat equation but introducing the quasi-boundary value in place of the final value (2), gives superior reconstructions. Several authors have followed this idea, for example, [10]. A summary article on some of this earlier work can be found in [3] and a more comprehensive discussion in the book [18]. Our approach in this paper will be to take a rather different regularizing equation; one of fractional order in time giving rise to a subdiffusion model. We give some background on this to show why it is feasible and, in a sense to be defined, natural.
The starting point for fractional calculus is the Abel fractional integral operator, ds . The former is the Riemann-Liouville derivative of order α and the latter is the Djrbashyan-Caputo derivative. Note that these are nonlocal operators and have a definite starting point a.
While obviously related, there are clear and important differences. The Riemann-Liouville version allows definition in a wider class of function spaces and this is important from an analysis perspective. One disadvantage from a differential equations viewpoint is how initial conditions should be interpreted. For example, the R-L derivative of a constant is nonzero; in fact it is unbounded at the origin a. The Djrbashyan-Caputo version has no such drawback and is thus more frequently seen in applications involving initial or boundary data. It is the one we will take in this paper. Since we will use t = 0 as the initial point throughout, we will simply write ∂ α t to denote C 0 D α t . This fractional derivative was studied extensively by the Armenian mathematician M . M. Djrbashyan, in his 1966 book (in Russian); an English translation of this appeared in 1993, [12]. However, there was a considerable amount of earlier work on the topic but only available in the Russian literature. The geophysicist Michele Caputo rediscovered this version of the fractional derivative in (1967), [6], as a tool for understanding seismological phenomena, and later with Francesco Mainardi in viscoelasticity where the memory effect of these derivatives was crucial, [7].
In addition to sharing the same initial/boundary conditions, the fractional diffusion equation has additional connections with the parabolic operator u t − Lu which will be useful for subsequent understanding and which we now describe below. The heat equation can be viewed as the macroscopic limit of the basic continuous time random walk (ctrw) process where after each time step δt a random direction is chosen and the walker moves in that direction a length δx. If δt, δx → 0 such that the ratio K = (δx) 2 /δt is held constant, then it is easily seen that the heat equation u t − K△u = 0 ensues. The value of K, the diffusion constant, couples the space and time scales. In the more general situation, one assumes that the temporal and spatial increments ∆t n = t n − t n−1 and ∆x n = x n − x n−1 are independent, identically distributed random variables, following probability density functions ψ(t) and λ(x), respectively, which are the waiting time and jump length distributions, respectively. Thus the probability of ∆t n lying in an interval [a, b] is P (a < ∆t n < b) = b a ψ(t) dt and the probability of ∆x n lying in an interval [c, d] is P (c < ∆x n < d) =  [29], that the ctrw framework recovers the classical diffusion equation, as long as the waiting time pdf ψ(t) has a finite mean and the jump length pdf λ(x) has finite first and second moments. Thus this more general setting case reduces to the basic Gaussian process described by the fundamental solution of the heat equation. This is a realization of the Central Limit Theorem.
On the other hand, if the mean waiting time τ is infinite which could occur, for example, when the particle might be trapped in a certain potential well, then we could, for example, have a waiting time pdf with the asymptotic behavior ψ(t) ∼ A t 1+α as t → ∞, for some α ∈ (0, 1), and A > 0. The (asymptotic) power law decay is heavy tailed and allows occasional very large waiting time between consecutive walks. The closer α is to zero, the slower is the decay and more likely a long waiting time will take place.
It turns out that this changes the dynamics of the stochastic process completely. Assuming a fixed spatial step size ∆x, the combined pdf p(x, t) in the physical domain is now given by where M α (z) is a particular version of the Wright function to be described in the next section and the diffusion coefficient K α is again a coupling between the spatial and temporal scales. Taking α → 1 in (7) recovers the fundamental solution of the heat equation. It also can be shown that (7) is the fundamental solution of the subdiffusion operator (6). This ties in the fact that the fractional diffusion equation (6) results from a ctrw with a temporal pdf given by the above asymptotic behaviour. For classical (Brownian) motion the mean square deviation of the particle from its starting point obeys the relation x 2 ∝ t whereas the subdiffusion model gives x 2 ∝ t α . For some direct applications involving the subdiffusion process see, for example, [35].
In the above analysis we could have assumed a finite waiting time but dropped the assumption of finite variance on the pdf λ. This indeed leads to a fractional derivative in space but we shall not use this approach. There are alternative ways to define a space fractional derivative that will better suit our regularization purpose and we will briefly introduce one standard approach in Section 3.4 as it will have relevance to our analysis of regularization operators.
As well as the above tie in between the parabolic and subdiffusion equations there are some fundamental differences. The most important of these from our current perspective is the fact that the decay of the solution of (1) is exponential in time leading to severe ill-conditioning of the backwards problem. On the other hand that of (6) is only linear decay in time and resulting in the backwards problem being only very mildly ill-conditioned. There are many caveats and details that must be resolved but the basic principle behind the regularization of the backwards parabolic by the backwards fractional diffusion equation relies on this key fact.
The important point we wish to stress is the fact that considering fractional order operators is natural in the sense that they also arise from similar random walk processes just as in the parabolic case. Their distinguishing features give rise to exactly the properties that we desire in our regularizing equation: the nonlocal fractional operator "stores" all previous time values and this history record enables a feasible backwards in time reconstruction.
We should point out that in the discussion of random walks we needed to correlate the space and time scales through a diffusion constant K. This is incorporated into the leading coefficients of the operator L, but if we had L = △ then an explicit K would have to be brought in through K△. The units of K are distance 2 /time and even for excellent conductors such as metals this is typically quite small, of the order of 10 −5 . Thus in our scaling of 1 we should consider the presence of the coupled values KT . By scaling K to unity we are in fact scaling the values of the final time T . Thus values of T in the paper of the order of 10 −2 − 10 −3 actually represent fairly long waiting times.
In the next section we shall provide some necessary background information on the key special functions needed for the subdiffusion operator; those of Mittag-Leffler and of Wright. We will also look at fractional powers of elliptic operators. Finally, some background information on regularization methods, in particular the discrepancy principle for choosing regularization parameters, is provided. Section 4 will describe various regularization strategies for the recovery of u 0 in (1) subject to (2) that rely on fractional derivatives and Section 6 provides a convergence analysis for some of them. Section 5 will show some numerical reconstructions based on the algorithms presented in Section 4.

Fractional operators and regularization parameter choice.
Here we collect background material to be used in the following sections. First we describe the main function of fractional calculus, the Mittag-Leffler function, as well as the Wright function needed for a description of the fundamental solution of the fractional subdiffusion operator. Second, we introduce some notation for fractional derivatives and collect a few basic lemmas that will be needed for our analysis and to obtain a representation theorem for the subdiffusion operator which will be the core of our regularization methods.
3.1. The Mittag-Leffler function E α,1 (−x). An essential component of fractional derivative formulations is the two-parameter Mittag-Leffler function E α,β (z) defined by This generalizes the exponential function ubiquitous to classical diffusion; The following lemmata will be needed, these can be found in many sources including [12,14,20].
From Lemma 3.3 we obtain the stability estimate . Proof. The first inequality in (13) is an immediate consequence of the lower bound in Lemma 3.3. The second inequality can be obtained by using the reflection formula Γ(z)Γ(1 − z) = π sin(πz) , which allows to estimate . This lemma when taken together with filtering of the data with some function f γ so that λf γ (λ) ≤ C γ implies a bound on the noise propagation in time fractional reconstruction of u 0 . The fact that the noise amplification grows only linearly with 1 1−α as α → 1 is one of the key facts that renders fractional backwards diffusion an attractive regularizing method.

BARBARA KALTENBACHER AND WILLIAM RUNDELL
We proceed by deriving an estimate of the the L p norm of w of the form with C > 0 independent of α and λ.
The reason for the importance of this function in subdiffusion lies in the fact that the Laplace transform of a Wright function is a Mittag-Leffler function Of course this is really used in reverse to obtain the inverse Laplace transform of a certain Mittag-Leffler function.
One case of the Wright function relevant to fractional diffusion is the following M -Wright function, [28]

Solution of the subdiffusion equation.
Combining the lemmas in sections 3.1 and 3.2 we obtain the following results Lemma 3.8. The initial value problem for the fractional ordinary differential equation ∂ α t u + λu = 0 for x > 0 and 0 < α < 1 with u(0) = 1, has solution u(t) given by .
The solution satisfies From the above we easily obtain by separation of variables and using the eigenvalues and -functions The solution of (6) is given by By taking Fourier transforms in space and Laplace in time using the above lemmas give, [28] Lemma 3.10. The fundamental solution G α (x, t) is given by Note that for α ∈ (0, 1), for every t > 0, the function x → G α (x, t) is not differentiable at x = 0; in fact it fails to be Lipschitz at x = 0.
The limited smoothness of the fundamental solution results in limited smoothness of the subdiffusion equation. The following result, [31], is critical Lemma 3.11. Let 0 < α < 1 and u 0 ∈ L 2 (Ω). Then there exists a unique weak 3.4. Space fractional derivatives. While one can use derivatives based on the Abel integral for space variables there is also a considerable literature on fractional powers of operators. For example, the Fourier transform of −△ defined on R n has symbol ξ 2 and hence the fractional power of order β of −△ can be defined as the pseudodifferential operator whose symbol is ξ β . In the case of bounded domains Ω we can proceed as follows.
We define an operator Since A is a self-adjoint, uniformly elliptic operator, the spectrum of A is entirely composed of eigenvalues and counting according to the multiplicities, we can set 0 < λ 1 ≤ λ 2 . . .. By φ j ∈ H 2 (Ω) ∩ H 1 0 (Ω), we denote the L 2 (Ω) orthonormal eigenfunctions corresponding to λ j . Then from [21], the fractional power A β is defined for any β ∈ R by Next we introduce a spaceḢ s (Ω) bẏ By definition, we have the following equivalent form: The standard pseudoparabolic equation can be generalized to elliptic operators A and B not necessarily of the same order. Continuing with same structure, assuming that both A and B are positive operators in the sense that Ax, x > 0, Bx, x > 0 we form the equation If both operators are of the same order we have a straightforward perturbation of the pseudoparabolic equation. If the order of A is greater than that of B then C ǫ := (I + ǫA) −1 B will be bounded (in fact compact) on L 2 and a full group e −tCǫ will result. Conversely, if the order of B is greater than that of A then C ǫ is unbounded and we will obtain a semigroup once again from C ǫ in the limit as ǫ → 0. Our interest here is in the case that B = −L and A = (−L) β ; that is a fractional power of −L. ( This β−pseudoparabolic equation is no longer a regularizer for the backwards parabolic equation u t + Lu = 0 if β < 1 but we expect it to have partial regularizing properties and the exploration of this will be studied in the next sections. Finally, we can combine both space and time fractional derivatives to obtain Define µ n by µ n = λn Then the solution to (28) has the representation while the solution to equation (29) has the representation 3.5. The Morozov Discrepancy Principle. As in every regularization method, certain parameters have to be chosen appropriately as part of a trade-off such that on one hand the ill-posed problem is stabilized, but on the other the approximation error arising from the modification of the problem by the addition of the stabilizing terms does not become too large. Regularization parameters appearing in the methods considered in this paper are, for example: the fractional orders α and β of the time or space derivatives, respectively; the multiplier ǫ in these pseudoparabolic equation; and later in the paper, the indices K i at which we split the frequency band for treatment with different methods. There exists a large body of literature on regularization parameter choices; an overview on regularization parameter choice rules with many relevant references can be found in [13,Chapter 4], [16,Chapter 7], and more recently, in [26,Chapters 2,3].
In this paper, we will follow a rather classical, but also versatile, paradigm for regularization parameter choice, namely the discrepancy principle. This dates back to Morozov's well-known paper [30]. The idea is to choose, out of a family of regularized problems, the most stable one such that the residual is of the order of magnitude of the expected noise level. In the context of, e.g. subdiffusion regularization u δ 0 (x; α) = u(x, 0), where u solves (6) with given noisy final data g δ , the discrepancy principle requires one to choose α such that the difference between the final data simulated from the reconstruction exp(LT )u δ 0 (x; α) differs from the noisy data g δ by not more than the noise level δ, while α is kept as far away as possibly from the critical valueα = 1 where δ is an estimate on the L 2 norm of the noise, cf. (32). We will actually apply this in a relaxed, easier to compute manner, and to a smoothed version of the data.
Of course a crucial point here is knowledge of the noise level (or of a good estimate on it), which is admittedly not available in some applications. On the other, whenever δ is known, the discrepancy can often be proven to yield a convergent regularization method, even one with optimal convergence rates.
/bin/bash: a: command not found 4. Regularization strategies. We have outlined several possible candidates for a quasi-reversible regularizer for the backwards heat equation. In this section we provide an overall strategy and look at how individual regularizing equations fit in. The ultimate idea is to split the problem into distinct frequency bands and then combine to recover the value of u 0 (x). This is feasible since the mapping u 0 → g(x) is linear. Such a strategy is of course not new for this problem but the key is to recognize that each quasi-reversible component that we have described will perform differently over each frequency band and the problem is how to make the most effective combination.
Throughout we assume that the final value g(x) has been measured subject to a noise level, the magnitude of which we know. Clearly, knowing further information such as some of the moments of the probability density function of the noise is desirable, but we will simply assume that it has mean zero and a known maximum expected value, which leads to the deterministic noise bound with given δ > 0.

4.1.
Using a subdiffusion regularization. Perhaps the simplest possibility of regularization by a subdiffusion process is to replace the time derivative in (1) by one of fractional order ∂ α t , relying on the stability estimate from Lemma 3.4. However, there are two obstacles to this.
First, from (3.11) there is still some smoothing of the subdiffusion operator and the actual final value at t = T will lie inḢ 2 (Ω). The subdiffusion equation still decays to zero for large T and indeed the amplification factor A f rac (k, α) connecting the Fourier coefficients is and thus grows linearly in λ k .
Thus we must formg δ (x) the projection of the data g δ (x) ontoḢ 2 (Ω), in order that the amplification remain bounded for all λ k . This is easily accomplished by some conventional regularization method. Tikhonov regularization (or iterated versions of it) is not appropriate for this purpose, since due to its saturation at a finite smoothness level, it would not be able to optimally exploit the fact that we deal with infinitely smooth exact data g. Thus we employ Landweber iteration for this purpose which by setting w (i) = −Lp (i) can be interpreted as a gradient descent method for the minimization problem min p∈L 2 (Ω) and setg δ = w (i * ) for some appropriately chosen index i * . In practice we use the discrepancy principle for this purpose, while the convergence result in Lemma 6.1 employs an a priori choice of i * . In (35), the step size µ is assumed to satisfy Second, one has to check that none of the amplification coefficients in (34) exceeds that for the heat equation itself. However, as shown in [20] for any value of T there exists an N such that all amplification factors A f rac (k, α) in (34) exceed those of the parabolic problem A par (k) := e λ k T for k ≤ N . In this sense the low frequencies are more difficult to recover by means of the regularizing subdiffusion equation than by the parabolic equation itself. Of course for large values of λ the situation reverses as the Mittag-Leffler function decays only linearly for large argument. Figure 1 shows the plots of A f rac (k, α) for α = 0.5, 0.9, 1.  Thus we must modify the reconstruction scheme and there are several possibilities of which we describe two.

4.2.
Adding a fractional time derivative to the diffusion equation. The first of these is to take a multi-term fractional derivative replacing (6) by We must show that the solution u ǫ to (36) and subject to the same initial condition converges in L 2 (Ω) × (0, T ) to the solution of (1) as ǫ → 0. Equation (36) is a specific case of the more general multiterm fractional diffusion operator In this case the Mittag-Leffler function must be replaced by the multiterm version, [27,25] with considerable additional complications although the theory is now well-understood. While one can use (37) the complexity here arises from the 2M coefficients {q j , α j } that would have to be determined as part of the regularization process. Thus we restrict our attention to equation (36). We can calculate the fundamental solution to (36) as follows. First we consider the relaxation equation Taking Laplace transforms {t → s} we obtain Now the imaginary part of s + ǫs α + λ does not vanish if s is not real and positive so that the inversion of the Laplace transform can be accomplished by deforming the original vertical Bromwich path into a Hankel path H η surrounding the branch cut on the negative real axis and a small circle of radius η centre the origin. See, Chapter 4 of [15]. This gives 2πi Hη e st 1 + ǫs α−1 s + ǫs α + λ ds and as η → 0 we obtain H(r) = − 1 π Im 1+ǫs α−1 s+ǫs α +λ s=re iπ . Multiplying both numerator and denominator by the complex conjugate of s + ǫs α + λ gives 1 + ǫr α−1 e (α−1)πi (λ − r) + ǫr α e −απi (λ − r) 2 + 2ǫ(λ − r)r α cos(απ) + ǫ 2 r 2α = (λ − r) − ǫ 2 r 2α−1 + 2ǫr α cos(απ) − ǫλr α−1 e απi (λ − r) 2 + 2ǫ(λ − r)r α cos(απ) + ǫ 2 r 2α .
For ǫ > 0 and 0 < α < 1, H is strictly positive showing that the fundamental solution of the initial value problem (38) is also a completely monotone function. We can use the above to obtain the solution representation to equation (37) which becomes a potential regularizer for the backwards heat problem. Tauberian results for the Laplace transformf (s) of a sufficiently smooth function f (τ ) show that lim τ →∞ f (τ ) = lim s→0 +f (s). If in (41) we make the change of variables r → λρ, H(r, ·) →H(ρ, ·) then consider lim ρ→0 +H (ρ). We obtainH(ρ) ∼ ǫ sin(απ) π λ α−1 ρ α−1 . Now hold t = T fixed in (40) and we see that This indicates that the combined asymptotic behaviour of the two fractional terms in (36) defers to that of the lower fractional index, here α. This is in fact known even for the general multiterm case (37), see [25] . Thus given we have made the prior regularization of the data by mapping it intȯ H 2 (Ω), equation (36) will be a regularization method for the diffusion equation for ǫ > 0. The question then becomes how effective it performs.
The answer is, quite poorly and (43) shows why. If ǫ is very small then the asymptotic decay of the singular values of the map F : u 0 → g is again too great and the combination of the two derivatives is insufficent to control the high frequencies. This is particularly true the closer α is to unity. On the other hand, for lower frequency values of λ, the fractional derivative term plays a considerable role and the greater with increasing ǫ and decreasing α. Thus one is forced to select regularizing constants α and ǫ that will either decrease fidelity at the lower frequencies or fail to adequately control the high frequencies.
There is a partial solution to the above situation by taking instead of (36) the balanced version This ameliorates to some degree the concern at lower frequencies but has little effect at the higher frequencies.
We will not dwell on this version or its above modification as there are superior alternatives as will see in the next subsection. However, the lessons learned in the previous two versions shows the way to achieve both goals; low frequency fidelity and high frequency control.

Using split-frequencies.
Another alternative is to modify the reconstruction scheme as follows: for frequencies k ≤ K we recover the Fourier coefficients of u 0 by simply inverting the parabolic equation as is, using A par (k) and for frequencies k > K we use A f rac (k, α) defined in (34), i.e., The question remains how to pick K and α. For this purpose, we use the discrepancy principle: in both cases using the assumption on the noise level in g δ and its smoothed versiong δ , respectively. More precisely, we first of all apply the discrepancy principle to find K, which -according to existing results on truncated singular value expansion, see for example, [13] -gives an order-optimal (with respect to the L 2 norm) low frequency reconstruction u δ 0,lf (·, K). Then we aim at improving this reconstruction by adding higher frequency components that cannot be recovered by the pure backwards heat equation, which is enabled by a subdiffusion regularization acting only on these frequencies. The exponent α acts as a regularization parameter that is again chosen by the discrepancy principle. We refer to Section 6.2 for details on this procedure. This works remarkably well for a wide range of functions u 0 . It works less well if the initial value contains a significant amount of mid-level frequencies as well as those of low and high order. In this case the split-frequency idea can be adapted as follows.
As above we determine the value of K 1 = K using the discrepancy principle; this is the largest frequency mode that can be inverted using the parabolic amplification A par (k) given the noise level δ. We then estimate K 2 which will be the boundary between the mid and high frequencies. With this estimate we again use the discrepancy principle to determine the optimal α 2 where we will use A f rac (k, α 2 ) for those frequencies above K 2 to recover the Fourier coefficients of u 0 for k > K 2 . In practice we set a maximum frequency value K ℓ . By taking various values of K 2 we perform the above to obtain the overall best fit in the above scheme.
To regularize the mid frequency range we again use the subdiffusion equation with α = α 1 and choose this parameter by again using the discrepancy principle. I.e., we set Thus we solve the backwards diffusion equation in three frequency ranges (1, K 1 ), (K 1 , K 2 ) and (K 2 , K ℓ ) using (34) with α = 1, α = α 1 and α 2 .
We remark that this process could be extended whereby we split the frequencies into [1, K 1 ], (K 1 , K 2 ], . . . (K ℓ−1 , K ℓ ) and use the discrepancy principle to obtain a sequence of values α = 1, α 1 , . . . α m . We found that in general the values of α i decreased with increasing frequency. This is to be expected; although the asymptotic order of E α,1 is the same for all α < 1 the associated constant is not; larger values of α correspond a larger constant and give higher fidelity with the heat equation as Lemmas 3.4 and 3.5 show.

4.4.
Using space fractional regularization. The idea of the previous subsection can be carried over to fractional operators in space. Once again we look for frequency cut-off values and we illustrate with 3 levels, so we have K 1 and K 2 as above. The regularizing equation will be the β−pseudoparabolic as in (28). For the lowest frequency interval we choose ǫ = 0 so that we are simply again inverting the parabolic. for the mid range k ∈ (K 1 , K 2 ] we take β = 0.5 and for the high frequencies k ∈ (K 2 , K ℓ ] we use β = 1 so that we have the usual pseudoparabolic equation. This is a regularizer in L 2 (Ω) and so there is no need for the preliminary mapping of the data intoḢ 2 (Ω).
In each interval we compute the value of ǫ from the discrepancy principle and invert the corresponding amplification factors to recover u 0 from (30).
Variations are possible and in particular reserving the β−pseudoparabolic equation for mid-range frequencies and using a subdiffusion equation for the regularization of the high frequencies as in Section 4.3.

5.
Reconstructions. In this section we will show a few illustrative examples for L = △ in one space dimension based on inversion using the split-frequency model incorporating fractional diffusion operators since overall these gave the best reconstructions of the initial data. Comparisons between different methods is always subject to the possibility that, given almost any inversion method, one can construct an initial function u 0 that will reconstruct well for that method. As noted in the previous section we did find (36) or its modification (44) to be competitive. The β-pseudoparabolic equation (28) when used only for mid range frequencies and with a subdiffusion operator for the high frequencies can give comparable results to the double-split fractional. However, the difficulty lies in determining the pair of constants β and ǫ. It turns out that the optimal reconstruction using the discrepancy principle is not sensitive to β in the range 1 2 ≤ β ≤ 3 4 or even beyond, but it is sensitive to the choice of ǫ.
We have taken two noise levels δ on the data g(x) = u(x, T ) at which to show recovery of u 0 ; δ = 1% and δ = 0.1%. These may seem a low noise level but one must understand the high degree of ill-posedness of the problem and the fact that high Fourier modes very quickly become damped beyond any reasonable measurement level. One is reminded here of the quote by Lanczos, "lack of information cannot be remedied by any mathematical trickery" We have also taken the final time T to be T = 0.02. As noted in the introduction concerning equation scaling we in reality have the combination Kt for the parabolic equation in (1) where K is typically quite small. Since we have set K = 1 here our choice of final time is actually rather long. A decrease in our T by a factor of ten would result in much superior reconstructions for the same level of data noise.
Our first example is of a smooth function except for a discontinuity in its derivative near the rightmost endpoint so that recovery of relatively high frequency information is required in order to resolve this feature. Figure 2 shows the actual function u 0 together with reconstructions from both the single split-frequency method and with a double splitting. One sees the slight but significant resolution increase for the latter method. In the single split method the discrepancy principle chose K 1 = 4 and α = 0.92; for the double split K 1 = 4, K 2 = 10 and α 1 = 0.999, α 2 = 0.92.
It is worth noting that if we had to increase the noise to δ = 0.01 then not only would the reconstruction degrade but would do so more clearly near the singularity in the derivative. However, of more interest is the fact that the reconstructions from both methods would be identical. The discrepancy principle detects there is insufficient information for a second splitting, so that K 2 is taken to be equal to K 1 .
As a second example we chose a function made up by setting its Fourier coefficients and choosing these so that the first 7 are all around unity as are those in the range from 10 to 15. The reconstructions shown in Figure 3 are using the triple interval split frequency and as a comparison a truncated singular value decomposition from the parabolic equation with the parameters chosen again by the discrepancy  principle. As the figure shows, the svd reconstruction can only approximate the low frequency information in the initial state whereas the split-frequency model manages to capture significantly more. Note that the reconstruction here is better at those places where u 0 has larger magnitude.
If we had to reduce the noise level to δ = 0.001 or reduce the value of T , this difference would have been even more apparent. If we included the single split frequency reconstruction it would show a significant improvement over the svd but clearly poorer than the split into three bands. Indeed, a similar instance of u 0 benefits from a further splitting of frequency bands beyond the three level, see    Remark 1. In higher space dimensions, for special geometries, the eigenfunctions and -values of −△ can be computed analytically and everything would proceed as described. For more complex geometries one can rely on a numerical solver and there are many possibilities here, that even would include the multi-term fractional order derivative and more general elliptic operators L, see [19] and references within, or for the combined subdiffusion and fractional operator in space (29), [5]. The linearity of the problem with respect to u 0 would still allow a decomposition into frequency bands as described for the case in this section. Note that only a very limited number of eigenfunctions and -values is required for the reconstruction, since the high frequency part can be tackled directly via the fractional PDE of temporal order α 2 .
6. Convergence analysis. The goal of this section is to provide a convergence analysis in the sense of regularization, i.e., as the noise level δ tends to zero, for the split frequency approach from Section 4.3. Here the order of time differentiation α acts as a regularization parameter As a preliminary result we will show convergence of the subdiffusion regularization (33), (34), both with an a priori choice of α and with the discrepancy principle for choosing α.
6.1. Simple subdiffusion regularization. We first of all consider approximate reconstruction of u 0 as that is, the initial data of the solution u to the subdiffusion equation of order α in equation (6) with final datag δ anḢ 2 (Ω) smoothed version of the given final data and wherec δ k are its Fourier coefficients. We will show that u δ 0 (·; α) → u 0 in L 2 (Ω) as δ → 0 provided α = α(δ) is appropriately chosen. Note that the relation c k = e −λ k T a k holds, which corresponds to the identity g = exp(LT )u 0 .
Since most of the proofs here will be set in Fourier space, we recall the notation where {φ n } are eigenfunctions of −L on Ω with homogeneous Dirichlet boundary conditions and {λ n } are the corresponding eigenvalues enumerated according to value.
As a preliminary step, we provide a result onḢ 2 (Ω) (more generally,Ḣ 2s (Ω)) smoothing of L 2 data, which, in view of H 2 − L 2 -wellposedness of time fractional backwards diffusion, is obviously a crucial ingredient of regularization by backwards subdiffusion.
Recall the above mentioned Landweber iteration for definingg δ = w (i * ) where with s ≥ 1 and µ > 0 chosen so that A L 2 →L 2 ≤ 1.
for some C 1 , C 2 > 0 independent of T and δ.
The proof can be found in the Appendix. Existing results on convergence of Landweber iteration do not apply here due to the infinite order smoothness of the function we are smoothing; more precisely, the fact that it satisfies a source condition with an exponentially decaying index function.
Concerning convergence rates, observe, first of all, that the rate and stability estimates (14), (18) yields the following convergence rate for the time fractional reconstruction in case of very smooth data u 0 and noise free data.
where a k are the Fourier coefficients of the initial data and hence the right hand side is a very strong norm of u 0 . Convergence rates under weaker norm bounds on u 0 and with noisy data can be obtained similarly to [17] by means of Jensen's inequality and an appropriate choice of α.
In the noise free case we have It is readily checked that ϕ is convex and strictly monotonically increasing, and that the values of its inverse can be estimated as follows where C q > 0 is chosen such that log(z) ≤ C q z 1 2q for z ≥ 1 B .