Backward problem for a time-space fractional diffusion equation

In this paper, a backward problem for a time-space fractional diffusion process is considered. For this problem, we propose to construct the initial function by minimizing the data residual error in the Fourier space domain with a variable total variation (TV) regularizing term, which preserves edges as the TV regularizing term does while reducing the staircasing effect. The well-posedness of this optimization problem is obtained in a very general setting. In fact, we rewrite the time-space fractional diffusion equation as an abstract fractional differential equation and deduce our results using fractional operator semigroup theory; hence, our theoretical results can be applied to backward problems for differential equations with more general fractional operators. A modified Bregman iterative algorithm is then proposed to approximate the minimizer. The new feature of this algorithm is that the regularizing term is altered in each step, and we need not solve the complex Euler-Lagrange equation of the variable TV regularizing term (only a simple Euler-Lagrange equation must be solved). The convergence of this algorithm and the strategy for choosing parameters are also obtained. Numerical implementations are provided to support our theoretical analysis and to show the flexibility of our minimization model.


(Communicated by Hao-Min Zhou)
1. Introduction. The classical diffusion equation can be derived by applying the central limit theorem to the random walk problem: if we assume the distribution of particle jumps is Gaussian, we obtain normal diffusion equations. If instead we assume the particle jumps satisfy a Lévy distribution, then by the continuous time random walk (CTRW) model we derive the following time-space fractional diffusion equation (FDE):

∂_t^α v(t, x) = −(−∆)^β v(t, x), v(0, x) = u(x), x ∈ R², (2)

with α ∈ (0, 1], β ∈ (1/2, 1]. Here the time derivative is in the Djrbashian-Caputo sense, defined as

∂_t^α v(t, x) = ∫₀^t g_{1−α}(t − s) ∂_s v(s, x) ds,

where g_γ(t) := t^{γ−1}/Γ(γ), with Γ(γ) the Gamma function. Denote the Fourier transform of a function v by F(v) or v̂, and the inverse Fourier transform by F^{−1}(v) or v̌.
Then the space fractional derivative (−∆)^β can be defined via the Fourier transform as (−∆)^β v = F^{−1}(|ξ|^{2β} v̂). This type of fractional derivative operator is usually called the symmetric Riesz-Feller space fractional derivative operator.
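The Fourier definition above translates directly into a discrete sketch. The following Python snippet is our own illustration on a periodic grid (the grid size, domain length, and function names are assumptions, not the paper's scheme); it applies (−∆)^β via the FFT:

```python
import numpy as np

def fractional_laplacian(v, beta, L=2 * np.pi):
    """Apply (-Delta)^beta to samples of a period-L function on a
    uniform n-by-n grid, via (-Delta)^beta v = F^{-1}(|xi|^(2 beta) F v)."""
    n = v.shape[0]
    xi = 2 * np.pi * np.fft.fftfreq(n, d=L / n)   # angular frequencies
    xi1, xi2 = np.meshgrid(xi, xi, indexing="ij")
    symbol = (xi1**2 + xi2**2) ** beta            # |xi|^(2 beta)
    return np.real(np.fft.ifft2(symbol * np.fft.fft2(v)))
```

For β = 1 this reproduces −∆; for v(x) = cos(x₁) on [0, 2π)² the output is again cos(x₁), since |ξ| = 1 on the only active mode, and applying β = 1/2 twice agrees with applying β = 1 once.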
The fractional time-space diffusion equation (2) has attracted the attention of many researchers. From the physical point of view, two long papers [29,40] provide a good summary. From the stochastic point of view, the book [28] gives rigorous mathematical deductions. From the functional analysis point of view, Peng and Li [31] propose fractional semigroup theory; Li and Chen [24] propose the α-resolvent operator to provide a general theory for the fractional abstract Cauchy problem, which can be applied to FDE (2) and some more general FDEs. B. Baeumer et al. [6,5,3,4] propose the concept of stochastic solutions for fractional evolution equations and study FDEs using stochastic methods combined with operator semigroup theory.
In this paper, we focus on the backward problem for equation (2). As mentioned in a recent tutorial [21], the mathematical theory of inverse problems for FDEs is still in its infancy; however, there is already some pioneering work in this direction. Cheng et al. [12] establish uniqueness in an inverse problem for a one-dimensional fractional diffusion equation. Sakamoto and Yamamoto [34] establish the unique existence of weak solutions and the asymptotic behavior as time t goes to ∞; they also prove stability for the backward problem in time and uniqueness in determining an initial value. Liu and Yamamoto [25] study a backward problem for a time-fractional diffusion equation. Zhang and Xu [41] investigate an inverse source problem for a fractional diffusion equation, obtaining uniqueness by analytic continuation and the Laplace transform. Zheng and Wei [42] study the backward problem for space fractional diffusion equations; they show that the problem is severely ill-posed and propose a regularization method.
Recently, Wang and Liu [36] proposed to use more general anomalous diffusion models to describe the blurring effect, the corresponding backward problems giving the mathematical formulation of the de-blurring process in image restoration. By using a total variation regularization term, the discontinuities of the initial data can be recovered. Wang and Liu use time-fractional diffusion models; here we intend to use the more general time-space fractional diffusion model (2). Recovering initial data with discontinuities finds applications not only in the de-blurring process but also in recovering the initial temperature when the temperature field is known to diffuse according to some fractional equation. Apart from the usual smoothness assumptions, the initial temperature may also exhibit discontinuities, as shown in Figure 10.4 in [22].
In the following, we assume the initial data u(x), defined on R², has compact support in a bounded convex open subset Ω of the plane with Lipschitz continuous boundary ∂Ω, which is a reasonable assumption in many applications; in many applications of the backward diffusion problem, the initial distribution u(x) is in general not smooth. Denote by g^δ(x) the measurement data in Ω; our backward problem is to approximate v(0, x) = u(x) from g^δ(x). For some known error level δ > 0, the noisy data g^δ of the exact data g(·) := v(T, ·) satisfies ‖g^δ − g‖_{L²(Ω)} ≤ δ. Because v(T, x) is generated from the Cauchy problem (2), v(t, x) need not have compact support as u(x) does. Since we only use the measurement data v(T, x) in Ω, i.e., the values of v(T, x) outside of Ω play no role in our reconstruction process, we can define v(T, x) for x ∉ Ω such that

‖g^δ(·) − g(·)‖_{L²(R²\Ω)} = 0. (6)

Taking the Fourier transform with respect to x in (2), we obtain

∂_t^α v̂(t, ξ) = −|ξ|^{2β} v̂(t, ξ). (7)
By applying the Laplace transform with respect to t in (7), we can establish the relation between u(x) and v(T, x) in the frequency domain as

v̂(T, ξ) = E_{α,1}(−|ξ|^{2β} T^α) û(ξ) =: Ŝ(ξ) û(ξ), (8)

where E_{α,1}(·) is the Mittag-Leffler function defined as

E_{α,1}(z) = Σ_{k=0}^∞ z^k / Γ(αk + 1),

which can be seen as a generalization of the exponential function e^z. Denote Su = F^{−1}(Ŝ(ξ)û(ξ)); then we have

v(T, x) = (Su)(x). (10)

Intuitively, the operator S is a convolution operator with kernel F^{−1}(Ŝ(ξ)). In part 3 of Section 2, we will define the operator S (formula (60)) as the solution operator of an abstract fractional evolution equation.
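For moderate arguments, the series definition of E_{α,1} can be evaluated directly. The sketch below is our own illustration (function names are assumptions; robust evaluation for large arguments needs dedicated algorithms such as Podlubny's, used later in the paper); it also evaluates the frequency-domain forward symbol Ŝ(ξ):

```python
import math

def mittag_leffler(alpha, z, terms=120):
    """Partial sum of E_{alpha,1}(z) = sum_k z^k / Gamma(alpha k + 1).
    Adequate for moderate |z|; not a production evaluator."""
    return sum(z**k / math.gamma(alpha * k + 1) for k in range(terms))

def forward_symbol(alpha, beta, T, xi_sq):
    """S_hat(xi) = E_{alpha,1}(-|xi|^(2 beta) T^alpha), with xi_sq = |xi|^2."""
    return mittag_leffler(alpha, -(xi_sq**beta) * T**alpha)
```

Two sanity checks: E_{1,1}(z) = e^z, and E_{1/2,1}(−x) = e^{x²} erfc(x) for x > 0; the symbol decays in |ξ|, which is exactly the rapid smoothing of the forward process that makes the backward problem ill-posed.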
As mentioned in previous works [21,36], recovering u(x) from a noisy measurement of the exact v(T, x) based on relation (8) in the frequency domain is ill-posed due to the rapid decay of the forward process. Usually, two conventional methods, namely Tikhonov regularization and truncated Fourier transform regularization, are used to overcome this difficulty in the frequency domain. In 2013, Wang and Liu [36] proposed to use total variation (TV) regularization for the time fractional diffusion model. The TV regularization method successfully recovers the edges of the initial data and is robust to noise. However, TV regularization suffers from the staircasing effect, namely the transformation of smooth regions into piecewise constant regions. A linear combination of Tikhonov regularization and TV regularization with an adaptive weight is a natural way to reduce this effect. However, a linear combination of two regularization terms can be seen as a compromise between Tikhonov regularization and TV regularization, which may not reflect the specific properties of functions on local regions. Hence, Blomgren et al. [8] suggest letting the exponent in the regularization term depend on the data. Li et al. [23] study variable exponent TV regularization with exponent 1 < p̄(x) ≤ 2. Harjulehto et al. [16] discuss variable TV regularization allowing p̄(x) = 1 for some x by using techniques developed in [17]. Bollt et al. [9] focus on the following variational model:

min_u ∫_Ω |∇u|^{p̄(x)} dx + (λ/2)‖u − g‖²_{L²(Ω)}, (11)

where λ > 0 is a positive constant, g is the noisy image, and p̄(·) is defined as

p̄(x) = P_N(|∇(G_δ * g)(x)|²), (12)

where G_δ : R² → R is a symmetric mollifier centered at 0 belonging to C² ∩ W^{3,2}, and P_N : R⁺ → [1, 2] is a non-increasing C² function with P_N(N) = 1 for some positive real number N > 0; for example, P_N can be taken as a smooth non-increasing interpolation between P_N(0) = 2 and P_N(N) = 1. They discuss how parameter choices affect the recovered results and prove the existence and uniqueness of minimizers.
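The map from image data to exponent can be sketched as follows. This is our own minimal choice of P_N (a smoothstep; any smooth non-increasing function with P_N(0) = 2 and P_N(s) = 1 for s ≥ N fits the requirements), with a periodic Gaussian mollifier applied via the FFT; the parameter values are illustrative assumptions:

```python
import numpy as np

def P_N(s, N):
    """One admissible C^1 choice: smoothstep from P_N(0) = 2 down to
    P_N(s) = 1 for s >= N (non-increasing, as required)."""
    t = np.clip(s / N, 0.0, 1.0)
    return 2.0 - t * t * (3.0 - 2.0 * t)

def variable_exponent(g, N=0.01, sigma=1.0):
    """pbar(x) = P_N(|grad(G * g)(x)|^2), with a periodic Gaussian
    mollifier G of width sigma (in pixels) applied via FFT."""
    n = g.shape[0]
    k = np.fft.fftfreq(n) * n
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    G_hat = np.exp(-0.5 * (sigma * 2 * np.pi / n) ** 2 * (k1**2 + k2**2))
    gs = np.real(np.fft.ifft2(G_hat * np.fft.fft2(g)))
    gx = np.roll(gs, -1, axis=0) - gs       # forward differences
    gy = np.roll(gs, -1, axis=1) - gs
    return P_N(gx**2 + gy**2, N)
```

On a flat image the exponent is 2 everywhere (Tikhonov-like smoothing); across a jump it drops to 1 (TV-like edge preservation), which is exactly the local adaptivity discussed above.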
Recently, in [35], the author studies image decomposition problems by using variable TV regularization combined with variable Besov space.
We attempt to use the variable total variation regularization term to penalize our fractional backward diffusion problem. More specifically, we intend to use the following model:

min_{u∈K} ∫_Ω |∇u|^{p̄(x)} dx subject to ‖Su − g^δ‖_{L²(Ω)} ≤ δ, (14)

where δ > 0 is the error level, K is a suitable admissible set for the approximate solution, and p̄(x) is defined as in (12).
By applying the Lagrangian formulation, the variable TV restoration model (14) can be transformed into the following unconstrained minimization problem:

min_{u∈K} (λ/2)‖Su − g^δ‖²_{L²(Ω)} + ∫_Ω |∇u|^{p̄(x)} dx, (15)

where λ is a positive parameter that controls the tradeoff between a good fit to the measurement data and the regularity of the solution. For each δ > 0, there exists some λ such that (14) and (15) are equivalent.
Comparing (11) and our model (15), our forward operator is more complex than the identity operator. In order to solve (11), one can simply take

p̄(x) = P_N(|∇(G_δ * g)(x)|²), (16)

where g is the noisy image, so that the exponent does not change during the computation; by doing this, the Euler-Lagrange equation becomes simpler than the one obtained when the exponent depends on the unknown,

p̄(x) = P_N(|∇(G_δ * u)(x)|²). (17)

For clarity, the Euler-Lagrange equation for (15) with the fixed exponent (16) reads

λ S*(Su − g^δ) − div(p̄(x)|∇u|^{p̄(x)−2}∇u) = 0, (18)

while the Euler-Lagrange equation for (15) with (17), labelled (19), contains additional terms coming from the dependence of p̄ on u and is considerably more complex. For model (11), because the edges do not change much during the computation, the reduction (16) taken in [9] is suitable. However, for our model (15), the edges change dramatically during the evolution process, so we must update the value of p̄(x) during the computation. To make this clear, we consider an image as the initial data of the fractional diffusion equation (2); the initial data and the solution v(1, x) are shown in Figure 1, with v(1, x) on the right-hand side. The left image in Figure 2 is the boundary of the initial data detected by the Canny algorithm in the Matlab toolbox; the right image in Figure 2 is the boundary of v(1, x), also detected by the Canny algorithm. From these figures it is clear that the boundary of the initial data changes dramatically during the fractional evolution process, as claimed at the beginning of this paragraph.
In summary, theorems on existence, uniqueness and stability will be proved in a very general setting; then, restricted to the backward problem for equation (2), we propose a modified Bregman iterative algorithm to solve problem (15). We now describe the key points of our proofs. In order to prove the existence, uniqueness and stability of problem (15), we first generalize the theory constructed in [1] to our variable total variation regularization model, introducing along the way a concept named CBV-coercivity. After building the general theory, we need to verify that the operator S appearing in (10) satisfies the conditions of the general theory. One of the new ingredients of this paper is that we propose an abstract fractional evolution equation (58) and then prove that the solution operator of this abstract evolution equation satisfies the required conditions. Through the abstract formulation, we obtain existence, uniqueness and stability when the forward problem is (2), and in addition all the theoretical results can be applied to more general systems. More specifically, our results are valid for system (58) in Section 2 whenever the spatial derivative operator −A generates a C₀-semigroup and satisfies condition (67).
The second new ingredient of this paper is a modified Bregman iterative algorithm to solve problem (15). To the best of our knowledge, researchers either use Euler-Lagrange equations directly or construct an evolution process based on Euler-Lagrange equations to solve image restoration problems with a variable TV regularizing term. In the traditional image restoration problem, the edges do not change dramatically because the forward operator is the identity operator; hence one can assume (16), which greatly reduces the computational task. In our setting, the edges change during the diffusion process, as shown in Figure 2, so we must update the exponent p̄ in our algorithm. Bregman iteration [30] is an efficient method for solving TV-regularization-based image restoration. In the framework of Bregman iteration methods, we obtain p̄_{n+1} from the recovered image u_n, so in every iteration we can use the Euler-Lagrange equation as in the case of (16). Hence, on the one hand we allow the exponent p̄ to update at each iteration; on the other hand, a simple Euler-Lagrange equation can be used to reduce the computational load. However, in our modified Bregman iterative algorithm the regularization term changes its form at each iteration, so more techniques are needed for the theoretical analysis of our algorithm. In Section 3, we prove the convergence and provide a practical stopping criterion based on the detailed analysis.
The organization of this paper is as follows. In Section 2, we propose the concept of CBV-coercivity, build a general theory, and then apply the general theory to a general linear model with a variable TV regularizing term. Using operator semigroup and fractional operator semigroup theory, we prove that the backward problem for an abstract fractional differential equation satisfies the conditions of our general theory. In Section 3, we propose the modified Bregman iterative algorithm and provide a detailed theoretical analysis. Finally, numerical implementations are given in Section 4 to support our theoretical results and to show the validity of the proposed algorithm.
2. Existence, uniqueness and stability. In this section, we prove the existence, uniqueness and stability of our minimization problem (15) in a general setting. We first clarify some notation used throughout the rest of this paper.
• d stands for the dimension; Ω ⊂ R^d is a bounded domain with Lipschitz boundary;
• C^m stands for functions with continuous derivatives up to order m; C^m_c stands for compactly supported functions with continuous derivatives up to order m;
• W^{m,p} is the usual Sobolev space whose weak derivatives of order up to m belong to L^p; for simplicity, we denote H^m := W^{m,2}; H^m_0 stands for the closure of C^∞_c in H^m;
• For a subset Ω ⊂ R^d, χ_Ω stands for the indicator function, equal to 1 in Ω and 0 outside of Ω;
• |Ω| stands for the Lebesgue measure of Ω ⊂ R^d;
• If S is a bounded linear operator, ‖S‖ denotes the operator norm of S;
• BV in this paper stands for the space of functions of bounded variation, with norm ‖u‖_BV = ‖u‖_{L¹(Ω)} + ∫_Ω |Du|.

2.1. General theory. In this subsection, we build a general theory for the following unconstrained minimization problem:

min_{u∈L^q(Ω)} T(u). (20)

In order to use compactness properties of function spaces for unconstrained minimization problems, we introduce the following property: T is called CBV-coercive if T(u_n) → ∞ whenever J(u_n) → ∞, where J(·) satisfies

‖u‖_BV ≤ C J(u). (21)
Theorem 2.1. Suppose J satisfies (21) and T is CBV-coercive. If 1 ≤ q < d/(d−1) and T is lower semi-continuous, then problem (20) has a solution. If instead q = d/(d−1), the dimension d ≥ 2, and T is weakly lower semi-continuous, then a solution also exists. In either case, the solution is unique if T is strictly convex.
Proof. Let u_n be a minimizing sequence for T, in other words, T(u_n) → inf_u T(u). Since T is CBV-coercive, the u_n are BV-bounded. By Theorem 2.5 in [1], there exists a subsequence u_{n_k} which converges to some ū ∈ L^q(Ω); the convergence is weak in the case q = d/(d−1). By the (weak) lower semi-continuity of T, ū is a minimizer. Uniqueness of minimizers follows immediately from strict convexity.
Next, we consider a sequence of perturbed problems

min_{u∈L^q(Ω)} T_n(u). (24)

Theorem 2.2. Assume J satisfies (21), 1 ≤ q < d/(d−1), and that T and each of the T_n's are CBV-coercive, lower semi-continuous, and have a unique minimizer. Assume in addition:
1. Uniform CBV-coercivity: for any sequence v_n ∈ L^q(Ω), T_n(v_n) → ∞ whenever J(v_n) → ∞.
2. Consistency: T_n → T uniformly on J-bounded sets, i.e. given B > 0 and ε > 0, there exists N such that |T_n(u) − T(u)| < ε whenever n ≥ N and J(u) ≤ B.
Then problem (20) is stable with respect to the perturbations (24), i.e. if ū minimizes T and u_n minimizes T_n, then u_n → ū in L^q(Ω). If one replaces the lower semi-continuity assumption on T and each T_n by weak lower semi-continuity, then the convergence is weak.

Proof. Note that T_n(u_n) ≤ T_n(ū); by assumption 2, the T_n(u_n) are bounded, and hence by assumption 1 the u_n's are J-bounded. Recalling the property (21) of J, the u_n's are BV-bounded. Now suppose our result does not hold. By Theorem 2.5 in [1], there exists a subsequence u_{n_k} which converges in L^q(Ω) to a limit different from ū which, by the consistency assumption, is also a minimizer of T. But this contradicts the uniqueness of the minimizer ū of T.

2.2. Variable TV regularization for general linear problems. In this subsection, we consider the following special form of T:

T(u) = (λ/2)‖Su − g‖²_{L²(Ω)} + F(u), (27)

where F(u) is the variable total variation regularizing term, defined by duality with ν ranging over the set of C¹(Ω) functions with positive minimum and σ ranging over functions in C¹_c(Ω) with |σ(x)| ≤ 1; p̄ is defined as in (12), S is a bounded linear operator from L^q(Ω) to L²(Ω), g is a function in L²(Ω), and λ > 0 is a positive real number. As demonstrated in [9], if u and p̄ are C¹, the F(u) defined above is equivalent to ∫_Ω |∇u|^{p̄} dx.
So, instead of (15) of the previous section, in this subsection we consider T defined in (27). For this particular T, we define J as in (30), which obviously satisfies ‖u‖_BV ≤ C J(u).
Theorem 2.3. Assume 1 ≤ q ≤ d/(d−1), S is a bounded linear operator from L^q(Ω) to L²(Ω), and g is a function in L²(Ω). In addition, we assume that

Sχ_Ω ≠ 0. (31)

Then T defined in (27) is CBV-coercive with J defined in (30), and the functional T has a minimizer.
Proof. The lower semi-continuity or weak lower semi-continuity of F(u) follows from Theorem 9 and Theorem 12 in [9]; hence the (weak) lower semi-continuity of J(u) obviously holds. If we can prove that T in (27) is CBV-coercive, then by Theorem 2.1 the functional T has a minimizer, so the main task is to prove that T is CBV-coercive. Decompose u as u = ν + w, where ν is the mean-value part of u over Ω and w := u − ν ((32) and (33)). Using the Poincaré inequality and Hölder's inequality, there exists a positive constant C₁ := (|Ω| + 1)C such that ‖w‖_{L^q(Ω)} is controlled by F(u) for any admissible q (34); in the last inequality of (34) we used (35) of [9]. Using (34) and the decomposition (32), ‖u‖_{L^q} is bounded by the mean value of u plus F(u) (36). From the assumption (31), there exists C₂ > 0 such that the mean value of u is controlled by ‖Su‖_{L²} (37). From the definition of T and the decomposition (32), combining (36), (37) and (38), the quantity J(u) is controlled by T(u) up to constants ((40) and (41)). Considering the two cases separately, (42) and (44) yield the CBV-coercivity of T.

Remark 1. In the above theorem, if we assume the solution lies in a closed convex subset K of L^q with 1 ≤ q ≤ d/(d−1), T can be written with the constraint to K incorporated (45). Because this T is strictly convex, using the same procedure as in the proof of Theorem 2.3, we obtain the following theorem.
Theorem 2.4. Assume 1 ≤ q ≤ d/(d−1), S is a bounded linear operator from L^q(Ω) to L²(Ω), and g is a function in L²(Ω). In addition, assume that Sχ_Ω ≠ 0. Then T defined in (27) is CBV-coercive with J defined in (30), and the functional T has a unique constrained minimizer over K.
Next, we address the stability of minimizers of functionals of the form (27). Consider the perturbed functionals

T_n(u) = (λ/2)‖S_n u − g_n‖²_{L²(Ω)} + F(u).

Theorem 2.5. Assume 1 ≤ q ≤ d/(d−1), lim_{n→∞} ‖g_n − g‖_{L²(Ω)} = 0, the S_n's are bounded linear operators converging pointwise to S, and for each n,

‖S_n χ_Ω‖_{L²} ≥ γ > 0. (48)

Also assume each T_n has a unique minimizer u_n and that T has a unique minimizer ū. Then u_n → ū in L^q(Ω) (weakly, if only weak lower semi-continuity is assumed).

Proof. It suffices to show that conditions (1) and (2) of Theorem 2.2 hold. For condition (1), put u_n = ν_n + w_n as in (32) and (33), and repeat the proof of Theorem 2.3. Since ‖S_n χ_Ω‖_{L²} ≥ γ > 0, letting M be an upper bound on ‖S‖ and each ‖S_n‖, and m an upper bound on ‖g‖_{L²} and each ‖g_n‖_{L²}, one obtains uniform CBV-coercivity by the same argument as in the proof of Theorem 2.3. Condition (2) is obviously satisfied, which completes the proof.

2.3. Variable TV regularization for fractional backward diffusion. In this subsection, we first construct an abstract fractional evolution equation based on the following time-space fractional diffusion system for homogeneous media:

∂_t^α v(t, x) = −(−∆)^β v(t, x), v(0, x) = u(x), x ∈ R², t ∈ (0, T], (52)

with 0 < α ≤ 1, β ∈ (1/2, 1]. The initial data u is defined on R² with compact support in a bounded convex open subset Ω of the plane with Lipschitz continuous boundary ∂Ω.
Define the operator A = −∆ with domain

D(A) = H²(R²). (53)

Let X = L²(R²); then A is a densely defined, closed linear operator on X. By Theorem 2.4.1 in [11], A is m-accretive and −A generates a uniformly exponentially stable and contractive C₀-semigroup, denoted T(t). From the proof of Theorem 2.4.1 in [11], there exists a constant c > 0 such that

‖T(t)‖ ≤ e^{−ct}, (54)

where ‖·‖ is the operator norm. Considering (54), for 1/2 < β ≤ 1 we can define the following bounded linear operator:

A^{−β} = (1/Γ(β)) ∫₀^∞ t^{β−1} T(t) dt. (55)

Then, as illustrated in [11], [15] or [38], we can define the operator A^β as the inverse operator of A^{−β}; hence we have

A^β := (A^{−β})^{−1}, D(A^β) = Range(A^{−β}). (56)

For any x ∈ D(A), the above positive power of the operator A has the Balakrishnan representation [15]

A^β x = (sin(βπ)/π) ∫₀^∞ λ^{β−1} (λI + A)^{−1} A x dλ, (57)

which may provide a more intuitive picture to the reader. With these preparations, we can recast system (52) into the following abstract ordinary differential equation on the Banach space X = L²(R²):

∂_t^α v(t) = −A^β v(t), v(0) = u, t ∈ (0, T]. (58)

Remark 2. Usually the fractional Laplacian operator is defined via the Fourier transform as (−∆)^β v = F^{−1}(|ξ|^{2β} v̂). So, in order for the abstract form (58) to be meaningful, we need the equivalence of A^β defined in (57) and the usual definition by Fourier transform. By the Fourier characterization of the semigroup T(t) and Plancherel's theorem, the abstract definition is indeed equivalent to the usual definition by Fourier transform.
Remark 3. The operator A can be defined more generally as

A u = −div(a(x)∇u) + c(x)u,

where a(·), c(·) are functions in C¹, D(A) is defined as in (53), and in addition we suppose the operator A defined above satisfies the strong ellipticity condition. It is well known that this operator generates a contractive C₀-semigroup [13], and we can define its fractional power as in (55), (56) and (57), so the abstract fractional evolution equation (58) incorporates a natural generalization of the fractional Laplace operator. In the remaining part of this article we only present the proof for the Laplacian case, because once the general operator mentioned in this remark is defined appropriately, it satisfies all the properties of the operator semigroup that we use, and the proof is almost the same. Now we can prove the main theorem of this subsection.
Theorem 2.6. Let K be a closed convex subset of L²(Ω), and let g^δ be the measurement data at time T. Then the optimization problem

min_{u∈K} (λ/2)‖Su − g^δ‖²_{L²(Ω)} + F(u) (59)

has a unique minimizer over K for any fixed λ > 0.
Proof. We attempt to use Theorem 2.4 to obtain the result, so we need to verify that the operator S is bounded from L² to L² and that Sχ_Ω ≠ 0.
Step 1: boundedness of the operator S. Taking the measure μ in [19] to be δ_β(·) and using Theorem 3.7 in [19], we find that −A^β generates a bounded C₀-semigroup, denoted S_β(t); for simplicity, we write S_β for S_β(T). Using Corollary 2.10 in [7], we know that −A^β generates an α-order fractional semigroup as proposed in [31]. We denote the α-order fractional semigroup generated by −A^β by S_{α,β}(t); in particular, for t = T we write S_{α,β} for S_{α,β}(T). Instead of the semigroup property, this α-order fractional semigroup satisfies the functional equation of [31] in the strong operator topology. Noting our definition of v(T, x) for x ∉ Ω in Section 1, we have the following expression for the operator S:

Su = S_{α,β}(T) u. (60)

Now the meaning of the operator S is not restricted to the one used in (10). In order to obtain our results, we need the following lemma (restated in our setting), proved in Section 3 of [7].
Lemma 2.7. Suppose −A^β generates a bounded C₀-semigroup S_β(t), and let S_{α,β}(t) be the α-order fractional semigroup generated by −A^β with α ∈ (0, 1). Then the following subordination representation holds:

S_{α,β}(t) = ∫₀^∞ Φ_α(s) S_β(s t^α) ds, (61)

and (61) holds in the strong sense.
Here we only recall that Φ_α(t) is a probability density function satisfying

Φ_α(t) ≥ 0 for t ≥ 0 and ∫₀^∞ Φ_α(t) dt = 1. (63)

Considering Lemma 2.7 and the properties (63), for every u ∈ X we have

‖S_{α,β}(T) u‖ ≤ ∫₀^∞ Φ_α(s) ‖S_β(s T^α) u‖ ds ≤ M‖u‖,

where we used the fact that S_β(s) is a bounded C₀-semigroup.
Finally, let us consider the stability of the minimizer of (59) with respect to perturbations of the operator S and of g^δ for any fixed λ > 0. We state this result as follows.
Theorem 2.9. Let the hypotheses of Theorem 2.6 be satisfied. Define the two functionals

T(u) = (λ/2)‖Su − g‖²_{L²(R²)} + F(u), T_n(u) = (λ/2)‖S_n u − g_n‖²_{L²(R²)} + F(u),

with unique minimizers u and u_n over K, respectively. If lim_{n→∞} ‖g_n − g‖_{L²(R²)} = 0 and S_n converges pointwise to S as n → ∞, then u_n ⇀ u as n → ∞ in the weak topology of L²(Ω).
Proof. We will use Theorem 2.5 to obtain the above result; we only need to verify condition (48). From the proof of Theorem 2.6, we know that there exists a constant c > 0 such that

‖Sχ_Ω‖_{L²} ≥ c. (71)

Since we are concerned only with sufficiently large n and S_n converges to S pointwise, we can find N > 0 such that for n ≥ N,

‖S_n χ_Ω − Sχ_Ω‖_{L²} ≤ c/2. (72)

From (71) and (72), we obtain ‖S_n χ_Ω‖_{L²} ≥ c/2 > 0, which is condition (48).

Remark 4. For the general operator A mentioned in Remark 3, in order to obtain the same results as in the Laplacian case, we may need to assume the function c(·) has a small positive lower bound so as to verify the conditions mentioned in Lemma 2.8. In summary, the proofs of Theorem 2.6 and Theorem 2.9 only require that −A generates a contraction C₀-semigroup and satisfies condition (67). In addition, with some modifications, we may obtain similar results for the space distributed-order fractional diffusion equations studied in [19,20]. Because the modifications are not trivial, it may be more appropriate to report them in another paper.
3. Numerical approach. The well-posedness of our optimization problem (59) was obtained in the previous section. In this section, we propose a modified Bregman iterative method to solve (59) efficiently. The Bregman distance associated with a convex functional F(·) between points u and v is defined as

D_F^p(u, v) := F(u) − F(v) − ⟨p, u − v⟩, (74)

where p ∈ ∂F(v) = {w : F(u) − F(v) ≥ ⟨w, u − v⟩, ∀u} is a sub-gradient of F(·) at the point v. Our optimization problem (59) is then treated by replacing F(u) with the Bregman distance D_F^p(u, u*) with p ∈ ∂F(u*). Since the forward problem can be solved efficiently in the frequency domain, by the Parseval identity we have the following equivalent form:

min_{u∈K} (λ/2)‖Ŝû − ĝ^δ‖²_{L²(R²)} + F(u), (76)

where Ŝ is defined as in (8).
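As a concrete illustration of the definition (our own toy example: a smooth convex functional for which the subgradient is the ordinary gradient), the Bregman distance of F(u) = ½‖u‖² reduces to ½‖u − v‖²:

```python
import numpy as np

def bregman_distance(F, grad_F, u, v):
    """D_F^p(u, v) = F(u) - F(v) - <p, u - v> with p = grad_F(v),
    a subgradient of F at v; non-negative whenever F is convex."""
    p = grad_F(v)
    return F(u) - F(v) - float(np.vdot(p, u - v).real)

# For F(u) = 0.5 ||u||^2 the gradient at v is v itself, and
# D_F(u, v) = 0.5||u||^2 - 0.5||v||^2 - <v, u - v> = 0.5 ||u - v||^2.
F = lambda u: 0.5 * float(np.sum(u * u))
grad_F = lambda u: u
```

For non-smooth F, such as the (variable) total variation used here, any element p of the subdifferential may be used, which is exactly what the iteration below tracks.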
Instead of solving (14) directly, Osher et al. [30] proposed Bregman iterative regularization, which solves (76) approximately by the iterative formula

u^{m+1} = argmin_u D_F^{p^m}(u, u^m) + (λ/2)‖Ŝû − ĝ^δ‖²_{L²(R²)}, p^{m+1} = p^m − λ F^{−1}(Ŝ*(Ŝû^{m+1} − ĝ^δ)), (77)

for m = 0, 1, 2, ..., beginning with p⁰ = u⁰ = 0. In order to simplify the computation, we introduce

F_m(u) := ∫_Ω |∇u|^{p̄_m(x)} dx, (78)

where p̄_m(x) = P_N(|∇(G_δ * u_m)(x)|²). Replacing F(u), whose exponent p̄ depends on u, by the F_m(u) defined above significantly simplifies the Euler-Lagrange equation at each step, as illustrated in (18) and (19). Through this simplification we can still capture the change of the edges during the evolution process by solving only a simple Euler-Lagrange equation.
Using the definition of the Bregman distance (74), problem (77) becomes a sequence of minimization problems (79). From the results in Section 2, it is easy to see that u¹ is well defined; by arguments similar to those deriving (3.4), (3.5) and (3.6) in [36], we can deduce that the whole sequence {u_m : m ∈ N} is well defined. We first present a recursive procedure which solves (77) numerically (Algorithm 1; the iteration is repeated until some stopping condition is satisfied). Then we prove the properties of the sequence {u_m : m ∈ N} appearing in Algorithm 1 and establish a stopping criterion for the iteration based on (79). Since the regularizing term changes during the iteration in our case, the proof is more complex than in the TV regularization case.
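Algorithm 1 is not reproduced here, but its skeleton, Bregman iteration in "add back the residual" form together with the discrepancy-principle stopping rule discussed below, can be sketched as follows. This is our own toy instance: the inner minimization uses a simple Tikhonov surrogate F(u) = ½‖u‖², solvable in closed form, where the paper's method would instead solve the variable-TV Euler-Lagrange equation; A, lam, tau and all names are illustrative assumptions:

```python
import numpy as np

def bregman_iteration(A, g_delta, lam, delta, tau=1.1, max_iter=500):
    """Bregman iteration for min F(u) + lam/2 ||A u - g_m||^2 with the
    toy choice F(u) = 0.5 ||u||^2, stopped by the discrepancy principle
    ||A u_m - g_delta|| <= tau * delta."""
    n = A.shape[1]
    I = np.eye(n)
    g_m = g_delta.copy()
    u = np.zeros(n)
    for m in range(max_iter):
        # inner solve: (I + lam A^T A) u = lam A^T g_m
        u = np.linalg.solve(I + lam * A.T @ A, lam * A.T @ g_m)
        residual = g_delta - A @ u
        if np.linalg.norm(residual) <= tau * delta:
            return u, m + 1
        g_m = g_m + residual      # Bregman update: add the residual back
    return u, max_iter
```

With exact data the residual decreases geometrically and the iterate approaches the true pre-image; the stopping index plays the role of M in the parameter strategy analyzed below.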
Theorem 3.1. For any fixed λ > 0, the data-fitting error of the iteration is non-increasing, i.e.,

‖Ŝû^{m+1} − ĝ^δ‖_{L²(R²)} ≤ ‖Ŝû^m − ĝ^δ‖_{L²(R²)}.

Moreover, it follows that

‖Ŝû^M − ĝ^δ‖²_{L²(R²)} ≤ ‖Ŝû* − ĝ^δ‖²_{L²(R²)} + 2F(u*)/(λM),

where u* is the minimizer of the functional ‖Ŝû − ĝ^δ‖²_{L²(R²)} and δ > 0 is the known error level.
Proof. By direct computation, summing the basic estimates for the iterates into (83), taking u = u* in (85) and rewriting, summing over m = 1, ..., M (using u⁰ = p⁰ = 0), and finally using D^{p_M}_{F_M}(u*, u_M) ≥ 0, we arrive at the claimed bounds.

Theorem 3.2. Assume that u is the exact initial distribution. The optimal strategy for the regularizing parameters (λ, M) is given by (89); for such a strategy, we obtain the optimal convergence rate (90).

Proof. The results follow from Theorem 3.1 together with a standard estimate of F(u*) in terms of ‖∇u*‖_{L²}.

Since ‖∇u*‖_{L²} is unknown, the strategy (89) cannot be implemented numerically for choosing (λ, M). An implementable scheme is to fix λ and then choose the stopping index M such that

‖Ŝû^M − ĝ^δ‖_{L²(R²)} ≤ τδ

is satisfied for the first time, for some specified τ > 1.
For our problem, we must consider how to evaluate p̄(x) at each iteration. In order to simplify the computation and reduce the number of parameters that must be specified, we use a surrogate for p̄(x) = P_N(|∇(G_δ * u)(x)|²). The essential idea of p̄(x) is that its value is 1 near the edges and 2 away from the edges. So we first use a simple algorithm to detect the edges, and then build the exponent p̄(x) from the estimated edges. Denote by u_m the input of the (m+1)-st iteration; we need to specify p̄(x) from u_m. Using a simple edge detection algorithm, we obtain the estimated edge map E(u_m), an image with value 1 on the detected edges and value 0 elsewhere. Then we define

p̄ = 2I − G_δ * E(u_m),

where I is the matrix with the same dimensions as u_m all of whose elements are equal to 1, and G_δ, defined as in (12), is a smoothing kernel. Now we are ready to consider the algorithm for solving

min_u (λ/2)‖Ŝû − ĝ_m‖²_{L²(R²)} + F_m(u) (93)

with ĝ_m = ĝ^δ + f̂_m for m = 0, 1, .... First, we introduce a new function w to represent the gradient term ∇u in the optimization problem (93), which generates an equivalent constrained convex optimization problem (94). Second, we split the domain Ω into three parts: for some small constant ε > 0,

Ω₁ = {x ∈ Ω : p̄(x) ≤ 1 + ε}, Ω₂ = {x ∈ Ω : 1 + ε < p̄(x) < 2 − ε}, Ω₃ = {x ∈ Ω : p̄(x) ≥ 2 − ε}.

Approximately, we can take p̄(x) = 1 on Ω₁ and p̄(x) = 2 on Ω₃; hence we can rewrite (94) accordingly (96). Based on this domain decomposition, we define w₁ := w|_{Ω₁}, w₂ := w|_{Ω₂} and w₃ := w|_{Ω₃}. Third, using the splitting technique, we construct an iterative procedure that alternately solves a pair of easy subproblems: the first three subproblems, (97)-(99), may be called the 'w-subproblems' for fixed u = u*, and the last subproblem, (100), is the 'u-subproblem' for fixed w = w*. Using the definition of Fréchet derivatives and standard computations, we can easily obtain the minimizers of the subproblems (97), (99) and (100).
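The surrogate exponent 2·I − G_δ * E(u_m) can be sketched directly. This is our own minimal version: the edge map is assumed to be given (e.g. by Matlab's Canny detector, as in the text), and the mollifier is a periodic Gaussian applied via the FFT:

```python
import numpy as np

def exponent_from_edges(edge_map, sigma=1.0):
    """Surrogate pbar = 2*I - G * E(u_m): values dip towards 1 on
    detected edges and stay at 2 in smooth regions. edge_map is a
    0/1 image; G is a periodic Gaussian of width sigma (pixels)."""
    n = edge_map.shape[0]
    k = np.fft.fftfreq(n) * n
    k1, k2 = np.meshgrid(k, k, indexing="ij")
    G_hat = np.exp(-0.5 * (sigma * 2 * np.pi / n) ** 2 * (k1**2 + k2**2))
    smoothed = np.real(np.fft.ifft2(G_hat * np.fft.fft2(edge_map)))
    return 2.0 - np.clip(smoothed, 0.0, 1.0)
```

Because the mollifier spreads the thin edge map, the exponent dips below 2 in a band around each detected edge and equals 2 elsewhere, which is the qualitative behavior the surrogate is meant to capture.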
Because the deduction is standard, we omit the details and just give the results ((101)-(103)). For subproblem (98), a simple calculation yields the Euler-Lagrange equation (104). Taking g̃^m_{ij} to be a standard finite difference approximation of the right-hand side of (104) at x_{i,j} and t_m, we get an Euler-like updating scheme

w₂^{m+1}_{i,j}[u*] = w₂^m_{i,j}[u*] + ∆t g̃^m_{ij}. (105)

Here we use an adaptive step-size scheme. The new value w₂^{m+1}_{i,j}[u*] is accepted for each step in which the cost is improved, and the step ∆t is increased by a factor, ∆t → (1 + s)∆t, s > 0. For each unsuccessful step, the trial value is not used and the step size is decreased, ∆t → (1 − s)∆t.
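The adaptive step-size rule just described can be sketched generically (our own minimal version; cost and rhs are placeholders for the discrete energy and for the finite-difference approximation g̃ of the Euler-Lagrange right-hand side):

```python
def adaptive_euler_step(w, cost, rhs, dt=0.1, s=0.5, n_steps=20):
    """Euler-like updates w <- w + dt * rhs(w): accept a trial step and
    grow dt by (1 + s) when the cost improves; otherwise discard the
    trial and shrink dt by (1 - s), as in the text."""
    c = cost(w)
    for _ in range(n_steps):
        trial = w + dt * rhs(w)
        c_trial = cost(trial)
        if c_trial < c:                  # successful step: accept
            w, c = trial, c_trial
            dt *= (1 + s)
        else:                            # unsuccessful: keep w, cut dt
            dt *= (1 - s)
    return w
```

Because rejected trials leave w unchanged, the cost is monotonically non-increasing along the accepted iterates, so the scheme is robust even when the growing step size temporarily overshoots.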
In order to solve the optimization problem (94) via the subproblems (97) to (100), one would in principle need to solve these subproblems iteratively many times to obtain an accurate solution. However, as mentioned in [14], we actually only need to solve these subproblems with a few iterations. Hence, we need not solve subproblem (98) to very high accuracy; that is to say, we can run the iterative procedure (105) for just a few steps.
At last, we state the discrete versions of the gradient operator and the frequency operator. For a function f, the discrete version of ∇f is (∇f)_{i,j}, where λ is the regularization parameter and g_δ is the measured noisy data. For the TV regularization model, we refer to [30], which clearly describes how to solve it. The Tikhonov regularization model can be solved with a small modification of the algorithm stated in [30]: we just need to replace the Euler-Lagrange equation of problem (106) by that of problem (107).
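A common realization of the discrete gradient is by forward differences with one-sided zero boundary handling; the paper's exact stencil is not reproduced in this extraction, so the following is one standard choice.

```python
import numpy as np

def discrete_grad(f):
    """Forward-difference discrete gradient (a standard choice):
    (grad f)_{i,j} = (f_{i+1,j} - f_{i,j}, f_{i,j+1} - f_{i,j}),
    with the difference set to zero on the last row/column."""
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:-1, :] = f[1:, :] - f[:-1, :]
    gy[:, :-1] = f[:, 1:] - f[:, :-1]
    return gx, gy

f = np.arange(16.0).reshape(4, 4)
gx, gy = discrete_grad(f)
```

With this convention the adjoint of the discrete gradient is the negative discrete divergence with matching boundary conditions, which is what the Euler-Lagrange equations of the TV-type models require.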
In our numerical examples, we choose
For the noise, we take δ = 0.0005 and δ = 0.005 respectively. Because the value of λ determines the convergence rate of our algorithm, we take different λ for different noise levels. If λ is too large, ‖Su₁ − ĝ_δ‖_{L²} may already be less than τδ after the first iteration; in this case, we may incorporate more noise into our result u₁. If λ is too small, the iteration converges too slowly to obtain the final result in a reasonable number of steps, e.g., it may exceed 500 steps. For δ = 0.0005 and δ = 0.005, we take λ = 10¹¹ and λ = 10⁹ respectively in our numerical experiments.
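The interplay between λ and the discrepancy-principle stopping rule can be sketched with a generic outer Bregman loop. The inner solver and forward operator below are placeholders (a damped map standing in for one regularized inversion); only the residual feedback ĝ_m = ĝ_δ + f_m and the stopping test ‖Su_m − g_δ‖ ≤ τδ reflect the scheme in the text.

```python
import numpy as np

def bregman_with_discrepancy(solve_step, S, g_delta, delta, tau=1.01,
                             max_iter=500):
    """Outer Bregman loop with discrepancy stopping: iterate until
    ||S u_m - g_delta|| <= tau * delta or max_iter is reached.
    `solve_step(g)` is a placeholder for one inner regularized
    minimization with data g; the residual is fed back into the data."""
    f = np.zeros_like(g_delta)
    u = solve_step(g_delta)
    for m in range(1, max_iter + 1):
        u = solve_step(g_delta + f)          # one inner minimization
        r = g_delta - S(u)                   # data residual
        if np.linalg.norm(r) <= tau * delta: # discrepancy principle
            return u, m
        f = f + r                            # add residual back
    return u, max_iter

# toy run: identity forward map, damped inner solver
u, m = bregman_with_discrepancy(lambda g: 0.9 * g, lambda v: v,
                                np.array([1.0]), delta=0.01)
```

A stronger inner regularization (smaller λ in the paper's notation) corresponds to heavier damping here and hence more outer iterations before the residual drops below τδ, matching the observed trade-off.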
In order to avoid errors from numerically solving the forward fractional differential equation, we solve (2) to obtain the solution at time T using the Laplace transform, where the Mittag-Leffler function E_{α,1}(·) is computed to the desired accuracy by the standard algorithm provided by Podlubny [33].
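In Fourier variables, the forward solution is a multiplier of the form E_{α,1}(−|ξ|^{2β} T^α) applied to the initial data, consistent with the exact solution formula of Example 1 below. The truncated power series used here is only adequate for moderate arguments; Podlubny's algorithm [33], used in the paper, handles the full range. The helper names are illustrative.

```python
import math

def mittag_leffler(alpha, z, n_terms=60):
    """Truncated power series E_{alpha,1}(z) = sum_k z^k / Gamma(alpha*k + 1).
    Valid only for moderate |z|; a sketch, not a production evaluator."""
    return sum(z ** k / math.gamma(alpha * k + 1) for k in range(n_terms))

def forward_symbol(alpha, beta, xi_sq, T):
    """Fourier multiplier of the forward problem at frequency xi
    (xi_sq = |xi|^2): u_hat(xi, T) = forward_symbol * u0_hat(xi)."""
    return mittag_leffler(alpha, -(xi_sq ** beta) * T ** alpha)
```

For α = 1 the series reduces to the exponential, recovering the classical heat-semigroup multiplier exp(−|ξ|^{2β} T), which is a convenient consistency check.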
Denote x = (x₁, x₂) ∈ R² and ξ = (ξ₁, ξ₂) ∈ R². We generate the final measurement data with noise, where randn denotes pseudo-random numbers generated from the standard normal distribution. Here we add noise as in [10], where δ corresponds to a noise level of 100 × δ percent; e.g., δ = 0.05 corresponds to a noise level of 5%.
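The noise model above is additive Gaussian scaled by δ and can be sketched directly; the seeded generator is an illustrative choice for reproducibility, not part of the paper's setup.

```python
import numpy as np

def add_noise(g, delta, rng=np.random.default_rng(0)):
    """Noisy data as described in the text: g_delta = g + delta * randn,
    so delta = 0.05 corresponds to a 5% (= 100 * delta) noise level."""
    return g + delta * rng.standard_normal(g.shape)

g = np.zeros(10000)
g_delta = add_noise(g, delta=0.1)
```

The sample standard deviation of g_delta − g then concentrates around δ, which is how the stated noise level can be verified empirically.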
We use the relative error (RelErr) to quantitatively compare our solution with those based on TV regularization and Tikhonov regularization. For a given finite dimensional vector g and its noisy form g_δ representing the image, the above RelErr has the representation (109). Example 1. We consider u(x) = e^{−|x|²}, T = 1. In this case, the exact solution has the following form for v̂. We take α = 0.6, β = 1 and T = 1 to see the difference between the three models. In Table 1, the relative errors defined in (109) for the three methods are presented. Because the noise is added randomly, we run the three algorithms 100 times and report the averages. The reconstructions are shown in Figure 3, which shows that the recovered data have no visual difference from the original data.
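The relative error used throughout the experiments is, in the usual convention of (109), a normalized Euclidean distance reported as a percentage; a one-line sketch:

```python
import numpy as np

def rel_err(u_rec, u_true):
    """Relative error in the usual sense of (109):
    100 * ||u_rec - u_true||_2 / ||u_true||_2, reported in percent."""
    return 100.0 * np.linalg.norm(u_rec - u_true) / np.linalg.norm(u_true)

err = rel_err(np.array([1.1, 2.0]), np.array([1.0, 2.0]))
```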
Example 2. Consider a phantom model generated by the standard function phantom.m in Matlab with default parameters. We use the gray level (piecewise constant) of this image as the values of u(x), see Figure 4. In this example, we take α = 0.6, β = 0.9 and N = 256. In the following, we provide Table 2 to present the performance of the three different models. As in Example 1, we run the three algorithms 100 times and the data in Table 2 are the averages. The results recovered by the variable TV model are shown in Figure 5, which shows that the recovered functions are very similar to the original data.
for some unknown constant C. We take λ = 10¹¹, (1/4) × 10¹¹, (1/16) × 10¹¹ respectively, then run our program with τ = 1.01, using our stopping criterion (91) to obtain the iterative step M. For clarity, the values of M and RelErr for the different choices of λ are shown in Table 3.

Table 3. The values of RelErr with different parameters λ of the variable TV model for Example 2 (σ = 0.0005):

λ = 10¹¹:            M = 9,    RelErr = 13.0792%
λ = (1/4) × 10¹¹:    M = 34,   RelErr = 13.0342%
λ = (1/16) × 10¹¹:   M = 150,  RelErr = 13.0275%

From Table 3, we find that M = 9, 34, 150 when λ = 10¹¹, (1/4) × 10¹¹, (1/16) × 10¹¹ respectively, which indicates that the numerical results agree with the theoretical predictions. Although we choose three different values of λ and obtain three different maximum iteration numbers, the values of RelErr are almost the same, which demonstrates that the estimated results are not sensitive to the value of the parameter λ. The results in Table 3 are obtained by running our algorithm once; each run gives slightly different results since the noise is added randomly.
After these theoretical justifications, we want to clarify an interesting phenomenon which reveals an essential difference between the inverse problems for integer-order and fractional-order differential equations.
Discontinuity for normal and anomalous diffusion. In this part, we again use the phantom model generated by the standard function phantom.m in Matlab with default parameters (as in Example 2) as our initial data, and take T = 1, σ = 0.0005, β = 1 and the time derivative order α ranging over 0.5, . . . , 1. In order to provide a clear picture, we take 100 values of α in [0.5, 1]. We then use our variable TV regularizing model to recover the true initial data and plot the RelErr value for each α in Figure 6. Because the same model is used throughout, the degree of ill-posedness can intuitively be represented by the RelErr value: a small RelErr indicates that our model provides a good result, hence the degree of ill-posedness is weak; in contrast, a large RelErr indicates a strong degree of ill-posedness. From Figure 6, we clearly find that even for α = 0.99 the degree of ill-posedness is much weaker than for the integer-order equation. This implies that for α < 1 the degree of ill-posedness varies continuously, whereas for α = 1 it is much higher than for any α < 1; the degree of ill-posedness may not change continuously at the point α = 1. This observation may be explained by the properties of the Mittag-Leffler function E_{α,1}(z). For α = 1, it is an exponential function, whereas for any α < 1 it behaves like a polynomial function for large z (Theorem 1.3 in [32]). This property was also observed in [37], which proposes a fractional extension of the instantaneous frequency attribute to detect thin layers of sandstone formations; the authors use a fractional order of 0.99 and illustrate that being only 0.01 smaller than the integer order 1 brings very different results.

From the above two typical examples, it is obvious that our algorithm behaves like the L²-based Tikhonov regularization model when the initial function is smooth and like the TV regularization model when the initial function is piecewise constant.
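The decay gap between E_{α,1} for α < 1 and α = 1 can be made concrete for α = 1/2, where the known closed form E_{1/2,1}(−x) = exp(x²) erfc(x) is available; this special case is chosen purely for illustration.

```python
import math

def ml_half(x):
    """Closed form E_{1/2,1}(-x) = exp(x^2) * erfc(x), which decays only
    like 1/(sqrt(pi) * x) for large x (Theorem 1.3 in [32]), while the
    alpha = 1 case E_{1,1}(-x) = exp(-x) decays exponentially."""
    return math.exp(x * x) * math.erfc(x)

x = 20.0
poly_decay = ml_half(x)     # close to 1 / (sqrt(pi) * 20), about 2.8e-2
exp_decay = math.exp(-x)    # about 2.1e-9, many orders of magnitude smaller
```

The forward multiplier thus annihilates high frequencies exponentially only in the integer-order case, which is one way to see why the backward problem is far less ill-posed for α < 1.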
Hence, our model has more flexibility compared with Tikhonov regularization model and TV regularization model.

5. Conclusion. In this work, the backward problem for a time-space fractional diffusion equation is discussed under the regularization framework. A variable total variation (TV) regularizing term is employed to overcome the deficiencies of the Tikhonov and TV regularization methods. The existence, uniqueness and stability of the solution of the minimization problem with the variable TV regularizing term are obtained by using fractional operator semigroup theory. Meanwhile, a new modified Bregman iterative algorithm is proposed to approximate the minimizer. Concerning this new algorithm, both the convergence and the strategy of choosing parameters are illustrated theoretically. In the end, numerical tests are provided, which demonstrate the good performance and flexibility of the proposed method.
Finally, we point out some future directions along the line of this work. The first concerns the simultaneous inversion of the initial data and the fractional order. In some problems, we can estimate the fractional order first and then solve the backward problem; sometimes, however, we need to estimate the initial data and the fractional order simultaneously, which requires constructing more sophisticated algorithms, e.g., alternating iterative algorithms for the fractional order and the initial data. The second concerns generalizing the isotropic fractional diffusion equations employed here to anisotropic equations, for which the Fourier transform may not be directly applicable and new techniques need to be developed.