ANALYSIS OF A NONLOCAL-IN-TIME PARABOLIC EQUATION

. In this paper, we consider an initial boundary value problem for nonlocal-in-time parabolic equations involving a nonlocal in time derivative. We ﬁrst show the uniqueness and existence of the weak solution of the nonlocal-in-time parabolic equation, and also the spatial smoothing properties. More- over, we develop a new framework to study the local limit of the nonlocal model as the horizon parameter δ approaches 0. Exploiting the spatial smoothing properties, we develop a semi-discrete scheme using standard Galerkin ﬁnite element method for the spatial discretization, and derive error estimates de- pendent on data smoothness. Finally, extensive numerical results are presented to illustrate our theoretical ﬁndings.

1. Introduction. where Ω is a bounded convex polygonal domain in R d (d = 1, 2, 3) with a boundary ∂Ω, g and f are given functions and T > 0 is a positive terminal time. Here, associated with a given nonnegative, symmetric kernel function ρ δ (s) = ρ δ (|s|), the nonlocal operator G δ is defined by (1.2) whenever the limit exists in L 2 (0, T ) for any function v ∈ L 2 (−δ, T ). Throughout the paper, we assume that ρ δ has a compact support contained in [−δ, δ], and a In case sρ δ (s) / ∈ L 1 (0, δ), we assume further that the possible singularity of sρ δ (s) appears only at the origin, to ensure that the definition (1.2) makes sense. The positive parameter δ appearing in (1.2) represents the range of nonlocal interactions or memory spans, and is called the nonlocal horizon parameter. For suitably defined kernels, as δ → 0, nonlocal or memory effect diminishes, so that the zero-horizon limit of the nonlocal operator G δ u corresponds to the standard first order derivative d dt u, and hence the model (1.1) becomes the standard diffusion model in the local limit (see Section 4 for details). Meanwhile, in the extreme case that δ = ∞, the initial data v(t) := v(0) for all t ∈ (−∞, 0), and the kernel function is of fractional type without normalization, i.e., ρ ∞ (s) = α Γ(1−α) s −α−2 with α ∈ (0, 1), the nonlocal operator reproduces the Caputo fractional derivative for t > 0 (e.g. [18, p. 91] and [2]) provided that v ∈ C(−∞, T ] and v is piece-wisely differentiable with v ∈ L 1 (0, T ) ∩ C(0, T ] Consequently, the nonlocal parabolic model (1.1), with initial data g(x, t) := g(x) for all t < 0, becomes the time-fractional diffusion model u(x, t) = 0, on ∂Ω T ≥ t > 0, (1.4) u(x, 0) = g(x), in Ω.
We refer to [2] for more details. The model (1.4) has been used to describe subdiffusion dynamics [24]. A comprehensive survey on fractional order differential equations arising in many applications such as dynamical systems in control theory, electrical circuits with fractance, generalized voltage divider, viscoelasticity, electrochemistry, and model of neurons in biology, can be found in [4,6,12,24]. In comparison with the classical diffusion (the δ → 0 local limit) and fractional diffusion equations (the δ = ∞ limit), the nonlocal-in-time parabolic equation (1.1) provides an interesting intermediate model to study the 'finite history dependence' with a given δ ∈ (0, ∞).

1.2.
Relevant studies and our contributions. Our work is motivated by recent interests in nonlocal models. For instance, the nonlocal peridynamics (PD) theory, formulated in integro-differential equations, has attracted much attention as a possible alternative to classical partial differential equation formulation of solid mechanics [26,3,8,19,27,28]. The nonlocality in PD has been largely described by spatial nonlocal interactions but it is natural, from a modeling perspective, to consider nonlocal or memory effect as well. The theoretical study on (1.1) involving the nonlocal-in-time operator G δ remains limited, although there have been some works concerning the related nonlocal gradient operator in R d [7,9,22,11,23] in the form y − x |y − x| 2 ρ δ (|y − x|) dy, (1.5) where the nonnegative radial function ρ δ = ρ δ (|x|) ∈ L 1 (R d ) is compactly supported in the ball B δ (0) with the horizon parameter δ. In Section 2, similar to [7,22], we present some basic properties of the nonlocal calculus related to G δ and its adjoint operator, and discuss their localization theory in L 2 -sense.
To the best of our knowledge, this is the first study of the nonlocal-in-time parabolic model in the form of (1.1). The focus of this work is to establish the well-posedness and spatial smoothing properties associated with (1.1). As a direct application of the regularity results, we provide some preliminary discussion on the semi-discrete Galerkin finite element approximation. We are especially interested in the case of nonsmooth data, which is a typical case for related inverse problems, see, e.g., [17,31] for its local parabolic counterpart. Specifically, the main results are summarized below.
A similar argument yields an estimate when f ∈ L 2 (0, T ). The estimate for the short-time asymptotic behavior of the solution is given by Theorem 3.2 and Remark 3.5. Furthermore, exploiting the localization theory established in Section 2, a new framework is developed via a duality argument to show that the solution of the nonlocal problem indeed converges to that of its local counterpart (3.8) in the sense of L 2 (0, T ); c.f. Theorem 3.3 and Corollary 3.2. All these results can be easily extended to the case of infinite interval by letting T → ∞. Although our discussion focuses on positive λ, analogous estimates can be derived for λ = 0 in a finite interval, i.e., T < ∞, using the same idea (cf. Corollary 3.1 and Remark 3.6). However, without the decay property, the estimate for λ = 0 is more delicate in case of an infinite interval. A reasonable estimate in L ∞ -norm, provided that g ∈ L ∞ (−δ, 0) and f ∈ L p (0, ∞) (cf. Remark 3.4), requires more studies on Duhamel's principle, which is beyond the scope of the paper. The extension to the initial boundary value problem (1.1) is made next in Section 4 using the spectrum decomposition based on the results for (1.6) established in Section 3. We show in Theorem 4.1 that the problem (1.1) with a variables-separable initial data g(x, t) ∈ L ∞ (−δ, 0;Ḣ r (Ω)) with r ≥ 0 and trivial source data (f ≡ 0) has a unique weak solution which satisfies the a priori estimate u Ḣl (Ω) ≤ ct −(l−r)/2 g L ∞ (−δ,0;Ḣ r (Ω)) , with l ∈ [r, r + 2]. (1.7) Moreover, the above estimate can be slightly improved in case of non-integrable fractional kernels, i.e. ρ δ = (α + 1)s α−2 δ −α−1 with −1 < α < 0. In the latter case, it holds for r ≥ 0 that (cf. Theorem 4.3) , while the appearance of δ −α−1 is due to the normalization of the kernel. Furthermore, Theorem 4.4 shows that the second-order regularity pick-up still holds even if the variables-separable initial data is replaced by more general data, for integrable kernels, i.e., sρ δ (s) ∈ L 1 (0, δ), and the estimate is uniform in t, but with a δ-dependent constant. Meanwhile, we obtain the wellposedness and smoothing property for the inhomogeneous problem, by allowing less regular source term, cf. Theorem 4.2.
While the model (1.1) offers restricted spatial smoothing at a short time, we observe that, by repeating the argument, u = u(t) ∈Ḣ s+2k (Ω) for all t > (k − 1)δ, cf. Remark 4.1. This interesting and δ-dependent property is in sharp contrast with well-known facts on standard local parabolic equations [29] and some fractional diffusion models [25,14], since the solution of standard diffusion model has an infinite smoothing property while the subdiffusion has a limited smoothing property with order 2 for any t > 0. This reflects the distinct feature of the nonlocal model (1.1) that it has the same smoothing property as the one of anomalous subdiffusion at a short time, but exhibits the behavior of standard diffusion when t approaches infinity. The limited smoothing is also present for the semi-discrete Galerkin finite element approximation; see Section 6. In addition, we also study the localization of the nonlocal model (1.1) in the sense of L 2 (0, T ; L 2 (Ω)), with nonsmooth initial data g ∈ L ∞ (−δ, 0; L 2 (Ω)) and weak source data f ∈ L 2 (0, T ;Ḣ −1 (Ω)), which is fully confirmed by the numerical experiments.
As a direct application of the smoothing property, we then analyze in Section 5 a spatially semi-discrete scheme for the nonlocal model (1.1) and provide some preliminary results. In particular, we develop the semi-discrete finite element approximation using continuous piecewise linear functions, and derive the error estimates in terms of the data regularity. Numerical tests on various one-dimensional examples, including both regular and weak data, are given in Section 6, which illustrate and complement our theoretical study.
Throughout the paper, we use the notation c and C, with or without a subscript, to denote a generic positive constant, which may change at different occurrences, but is always independent of the horizon parameter δ, the mesh size h, the solution u, and the data g and f .

2.
Preliminaries. In this section, we collect useful facts on the nonlocal calculus and the localization theory of the nonlocal operator G δ given by (1.2). Some similar results for the operator G δ given in (1.5) have been studied in [22,23]. For completeness, we recall some useful results and outline their proofs for the simple one dimensional scalar version here.
2.1. Nonlocal calculus. Nonlocal vector calculus has been first introduced in [7] as a framework to analyze nonlocal models. It involves some basic nonlocal operators and associated integral identities similar to those in classical, local calculus. The first result needed for our study is a formula of nonlocal integration by parts. The proof is identical to that of [23,Theorem 1.4] and hence omitted here. Lemma 2.1. Suppose that u ∈ L 2 (−δ, T ), and ϕ ∈ C ∞ c (0, T ) with zero extension out of the interval (0, T ). Further, we suppose that (2.1) The next lemma gives an upper bound for the nonlocal operator for smooth functions in Sobolev spaces, see similar discussions in [5,23,22].
Proof. It follows the definition of G δ , Hölder's inequality and [5, Theorem 1] that This completes the proof of first inequality, and the second one follows analogously.
A similar argument for the nonlocal operator G * δ leads to the following. Corollary 2.1. Let the operator G * δ be defined by (2.2) with s 2 ρ δ (s) ∈ L 1 (0, δ). Then for ϕ ∈ H 1 (0, T + δ), it holds that As a result of previous discussion, a distributional nonlocal operator G δ may be defined on L 2 (−δ, T ) by a.e. in (0, T ). To this end, let us now define a related nonlocal function space equipped with the norm T ) The next lemma shows the density of smooth functions in V δ , a result missing from earlier works [22]. Lemma 2.3. Suppose that u ∈ V δ . Then there exists a sequence of smooth functions Proof. Let J be the standard mollifier [1, pp. 36]. For any a, b s.

Consequently, by defining
This completes the proof.
Proof. By Lemma 2.3, there exists a sequence of smooth functions {u n } ⊂ C ∞ [−δ, T ] such that u n → u and G δ u n → G δ (u) in L 2 (0, T ). Now, with reference to the definition (1.2), we introduce the operator G δ for small > 0 as First, using the density of C ∞ c (0, T ) in L 2 (0, T ) and the fact that lim we have that G δ (u) L 2 (0,T ) is bounded uniformly in , by the uniform boundedness principle, and hence G δ u is a weak limit of G δ u in L 2 (0, T ). Then it suffices to show that lim →0 G δ u L 2 (0,T ) = G δ u L 2 (0,T ) . To this end, we choose a sequence of smooth functions using the construction in the proof of Lemma 2.3. In particular, we choose a sequence of {(ρ n , σ n , )} s.t.
Then using the continuity of the translation operator and the property that J ρ * ϕ L 2 (0,T ) ≤ ϕ L 2 (0,T ) , we derive that where the last inequality is due to η 1 , η 2 ∈ C ∞ [−δ, T ]. Therefore, G δ u n L 2 (Ω) is bounded uniformly in and n, and hence lim By the definition (1.2), G δ (u) exists and G δ (u) = G δ (u).
Similarly, we may define the space V * δ by Then we have the following result of integration by parts.
, the desired result follows.

2.2.
Localization. Next, we state results on localization theory, which plays a crucial role in studying local limits of nonlocal models. They are again similar to the more general cases presented in [22]. Lemma 2.5. Let ϕ ∈ H 1 (−δ 0 , T ) for some δ 0 > 0 and the nonlocal operator G δ be given in (1.2). Then it holds that A localization result for the adjoint nonlocal operator G * δ may be derived via a similar argument.
Remark 2.1. Further generalizations of the localization theory of the nonlocal gradient G d in the strong topology of Sobolev spaces W 1,p (p = 2), or the space of functions of bounded variation, have been discussed in [23] and [22].
3. Nonlocal initial value problem. To prepare for the study on equation (1.1), we consider the space-independent nonlocal initial value problem (1.6) in this section. Our discussions are divided into the case of integrable kernels and that of more general kernels.
3.1. Well-posedness for integrable kernels. We start with the case of sρ(s) ∈ L 1 (0, δ), and, hereinafter, we denote b δ = δ 0 sρ δ (s) ds. The problem (1.6) can be seen as an standard integral equation with a nonlocal constrained initial value condition, whose well-posedness follows from the fixed point theorem, as stated in the following lemma.
Proof. We note the fact that u ∈ L ∞ (0, T ) is a solution of (1.6) if and only if it is a fixed point of the operator K such that for t ∈ (0, T ) We observe that Further, we have for ϕ, ψ ∈ W δ and t ∈ (0, T ), Then applying the fixed point theorem, there exists a unique solution u ∈ L ∞ (0, T ) for (1.6) which satisfies the a priori estimate.
The preceding estimate is suitable for large λ and may be exploited to pick up the regularity of (1.1). However, in the case that λ is close to 0, it is beneficial to develop a sharper a priori estimate. To this end, it suffices to consider the trivial initial data and a nontrivial right hand side. Now we define The existence of such a constant is assured by the fact that

QIANG DU, JIANG YANG AND ZHI ZHOU
Then it holds for ϕ ∈ W δ and t ∈ (0, c 0 δ) that The contraction property follows as before, and hence there exists a unique solution u ∈ W δ . Then repeating this argument, we may conclude the following result.
An estimate in L 2 -norm follows from Lemma 3.1 directly, i.e., for g ∈ L ∞ (−δ, 0) and f ∈ L ∞ (0, T ), the unique solution satisfies A sharper estimate (for large T ) in the L 2 -norm is given below.
Lemma 3.2. Suppose that g ∈ L ∞ (−δ, 0), f ∈ L 2 (0, T ) and sρ(s) ∈ L 1 (0, δ). Then there exists an unique solution satisfying Proof. Let the operator K be defined by (3.1) and By repeating the argument in the proof of Lemma 3.1 we have for any ϕ ∈ W δ , This shows the existence and uniqueness of the solution, and hence the a priori estimate.
Remark 3.1. As in Proposition 3.1, we may provide a sharper estimate in case of small λ. In particular, for g ∈ L ∞ (−δ, 0), f ∈ L ∞ (0, T ), and sρ(s) ∈ L 1 (0, δ), the initial value problem with λ ≥ 0 has a unique solution u ∈ L 2 (0, T ) such that Remark 3.2. In case of the integrable kernel associated with trivial right hand side, we can slightly strengthen the result, for g ∈ L 2 (−δ, 0). To this end, we let Then we obtain the well-posedness of the initial value problem as well as the following a priori estimate Well-posedness in case of non-integrable kernels. Next, we turn to the case of sρ δ (s) / ∈ L 1 (Ω), which is more delicate due to the weak singularity of the kernel. We thus consider the limit of a sequence of initial value problems for {u } >0 given by with the truncated nonlocal operator defined by (2.5). With the singularity removed, argument given in the previous subsection becomes applicable which then leads to the well-posedness of (3.2), i.e., there is a unique solution u of (3.2) satisfying the uniform bound and hence for λ > 0, G δ u L 2 (0,T ) + u L 2 (0,T ) ≤ C, with C > 0 independent of . Therefore u weakly converges (up to a subsequence) to u in L 2 (0, T ), and u satisfies the estimate The next lemma shows that u is indeed a solution of (1.6). Proof. From the definition of u and properties of weak convergence, the initial condition is satisfied automatically. It suffices to show that u satisfies the governing equation in (1.6). Taking ϕ ∈ C ∞ c (0, T ) with zero extension out of the interval (0, T ) and multiplying ϕ on both sides of (3.2), we have the following after integrating from 0 to T and interchanging the order of integration,

QIANG DU, JIANG YANG AND ZHI ZHOU
Using the proof of Lemma 2.2, we note the fact that Taking → 0 and applying (2.3), we derive that From the observation G δ u = f − λu ∈ L 2 (0, T ) and Lemma 2.4, we get that G δ u = G δ u ∈ L 2 (0, T ) and hence u satisfies (1.6) almost everywhere. Finally, the uniqueness follows from the comparison principle.
To sum up, we state the main result in this section, which gives the well-posedness of the initial value problem (1.6) and the a priori estimates.
Proof. The first estimate is a consequence of (3.4) and Lemma 3.3, while the second assertion follows from the similar argument (by replacing weak convergence with weak* convergence), and the uniform upper bound given in Lemma 3.1.
Remark 3.3. The preceding argument also works for unbounded intervals, i.e., when the governing equation of (1.6) is posed in (0, ∞). Suppose that g ∈ L ∞ (−δ, 0) and f ∈ L 2 (0, ∞), then there is a unique solution u ∈ V δ of the problem (1.1) and it holds that In view of Proposition 3.1 and 3.1, we may derive the a priori estimate for (1.6) in case of λ = 0, for arbitrary kernels. This is given in the following theorem.
Remark 3.4. Corollary 3.1 provides a uniform upper bound in case that λ = 0 in a finite interval, i.e., T < ∞. However, the estimate for the infinite interval is more delicate due to the lack of decay property. Analogously to the local counterpart, in which case u L ∞ (0,∞) ≤ |u(0)| + f L 1 (0,∞) , we expect a reasonable a priori bound in L ∞ (0, ∞) provided that g ∈ L ∞ (−δ, 0) and f ∈ L p (0, ∞). This requires further studies on Duhambel's principle and is beyond the scope of this paper.
3.3. Short-time behavior of the solution. The next theorem gives the shorttime asymptotic behavior of the solution of problem (3.8) that is useful to establish regularity pick-up in the Section 4. We are particularly interested in the case of non-integrable fractional kernels, i.e., ρ(s) = (α + 1)s α−2 δ −α−1 with α ∈ (−1, 0), which is given in the following theorem.
Proof. The unique existence has been given in 3.1. To prove the estimate, we first consider the case g ≥ 0. This implies u(t) ≥ 0 for any t ∈ [0, T ] by comparison principle. We then define which is the solution of the following initial value problem Next, we show that f v ∈ L ∞ (0, T ) and hence the problem is well-posed. In fact, due to the nonincreasing property of v, we have G δ v(t) ≥ 0, and hence for t ∈ [δ, T ] that where B(α+1, −α) denotes the Beta function such that B(α+1, −α) = Γ(α+1)Γ(−α) Hence We now show that f v ≥ 0. Combining (3.6) and (3.7) we achieve that for all t > 0, 1, −α)) .

QIANG DU, JIANG YANG AND ZHI ZHOU
Then by choosing we obtain that f v (t) ≥ 0 and hence the desired result. Finally, for a general initial data g ∈ L ∞ (0, T ), we consider the splitting g = g + − g − where g + ≥ 0 and g − ≤ 0 are in L ∞ (0, T ). Let u + and u − be solutions of problem (1.6) with initial data g + and g − respectively. Then it holds that which completes the proof of the lemma.
The basic idea is similar to the one of proving Theorem 3.2. Define u m (t) as We may verify that G δ u m + λu m ≥ 0 for t ∈ (0, δ), in view of the fact where the fact that δ 0 s 2 ρ δ (s) ds = 1 is used. 3.4. Local limits of nonlocal models. In this part, we develop a new framework to study the local limit of the nonlocal problem (1.6) as δ → 0, i.e., u−u δ L 2 (0,T ) → 0, where u δ is the solution of (1.6) and u is the solution of the following local initial value problem (3.8) Here we assume that g ∈ H 1 (−δ 0 , 0) with some δ 0 > 0 so that its trace at t = 0 is well defined. Now we give the main theorem of this part. Proof. By Theorem 3.1, we know that u δ L 2 (0,T ) is bounded uniformly in δ. Then there exists a weakly convergent subsequence such that u δ w ∈ L 2 (0, T ). Next, we show that w = u. In fact, for all ϕ ∈ C ∞ c (0, T ), it holds that (G δ (u δ ), ϕ) + (λu δ , ϕ) = (f, ϕ) for all δ.
Hence w + λw = f ∈ L 2 (0, T ). Next we shall show that w satisfies the initial condition w(0) = 0. Let C ∞ R [0, T ] denote the space consists of functions in C ∞ [0, T ] which can be extended to C ∞ [0, ∞) by zero extension. For any ϕ ∈ C ∞ R (0, T ), we may find a sequence {ϕ n } in C ∞ c (0, T ) such that lim n→0 ϕ n − ϕ L 2 (0,T ) = 0. Then using the uniform boundedness of G δ u δ L 2 (0,T ) , we have The using (3.9), the left hand side can be written by On the other hand, Lemma 2.1 and Corollary 2.3 yield that Consequently, we have (w , ϕ) = (w, ϕ ) for ϕ ∈ C ∞ R (0, T ), and hence w(0) = 0. Then w = u follows immediately from well-posedness of the local initial value problem (3.8).
Finally, we shall show that the convergence is indeed strong in L 2 (0, T ). To this end, we define a sequence of auxiliary problems: for a fixed δ, to find Hence v satisfies −v + λv = u ∈ L 2 (0, T ) and the preceding discussion shows that v(T ) = 0. Further, multiplying u δ on the equation (3.10) and integrating from 0 to T , and using Corollary 2.2 and the equation (1.6), we achieve that . On the other hand, we have As a result, the strong convergence is implied and the theorem is proved. Proof. Let u g be an H 1 -extension of g to H 1 (−δ, ∞). Then using corollary 2.2, we have G δ (u g ) L 2 (0,T ) + u g L 2 (0,T ) ≤ C where the constant C > 0 independent of δ. Then we consider the problem with the homogeneous initial data. Since the L 2 -norm of f is bounded uniformly in δ and G δ u g − u g L 2 (0,T ) → 0 by Lemma 2.5, we have w δ → w in L 2 (0, T ), where w is the solution of local initial value problem with right hand side f + u g + λu g . Then u δ = w δ + u g , the solution of the nonlocal problem (1.6), converges to u = w + u g , the solution of the local problem (3.8), in sense of L 2 (0, T ).
Remark 3.6. Analogously, the localization theory in a finite interval can be established in the case that λ = 0, in view of a priori estimates in Corollary 3.1.
Then by the a priori estimate in Lemma 3.1, we may prove that e L 2 (0,T ) ≤ cδ. However, the convergence rate for irregular or incompatible data is still an open question.
4. Nonlocal-in-time parabolic equations. In this section, we turn to the nonlocal-in-time parabolic equation (1.1) and study its well-posedness and smoothing properties. We first introduce some standard notation for convenience. For r ≥ −1, we denote byḢ r (Ω) ⊂ H −1 (Ω) the Hilbert space induced by the norm: for any r ≥ 1, and the norm · L r (0,T ;B) is defined by Next we consider the well-posedness of mode (1.1). It is of particular interest to consider the case of a weak data, which is widely used in inverse and control problems. Similarly to the standard parabolic counterpart, we may define the weak solution of the nonlocal-in-time parabolic equation (1.1), using the preceding notations.  Further, u ∈ L 2 (0, T ;Ḣ r+1 (Ω)) and u satisfies u L 2 (0,T ;Ḣ r+1 (Ω)) + G δ u L 2 (0,T ;Ḣ r−1 (Ω)) ≤ c g L ∞ (−δ,0;Ḣ r (Ω)) , where the constant c is independent of horizon parameter δ.
Proof. First, we give the solution representation of (1.1). Let g n (x) = g(x, t), ϕ n = p(t)c n where c n = q, ϕ n is a time-independent constant. Define u n as the solution of the initial value problem  λ r n | q, ϕ n | 2 = c g L ∞ (−δ,0;Ḣ r (Ω)) .
Next we show that u is a weak solution of (1.1). The initial condition is selfevident. To show that u satisfies the variational equation, we let u N = N n=1 u n (t) ϕ n (x), then u N satisfies with the initial condition u N (t) = N n=1 g(t), ϕ n ϕ n . We have immediately that u N L 2 (0,T ;Ḣ 1 (Ω)) + G δ u N L 2 (0,T ;Ḣ −1 (Ω)) ≤ C for some constant C uniform in N . Hence, u N u in L 2 (0, T ;Ḣ 1 (Ω)) and G δ u N G δ u in L 2 (0, T ;Ḣ −1 (Ω)) up to a subsequence by the duality argument. Then for By passing to weak limit we obtain that Then this equation is valid for all v ∈ L 2 (0, T ;Ḣ 1 (Ω)) by density argument. Thus, for all v ∈Ḣ 1 (Ω) As a result, u is a weak solution of (1.1). Consequently, G δ u = ∆u, which together with the a priori estimate on u yields the bound of G δ u.
Finally, we prove the uniqueness of the weak solution to (1.1) within the class given in Definition 4.1. To this end, we set g ≡ 0 and f ≡ 0 and prove that the problem (1.1) has only a trivial solution. Let u be the weak solution. Then by the definition of {(λ n , ϕ n )} ∞ n=1 , one observe that u n = (u, ϕ n ) must satisfy G δ u n (t) + λ n u n (t) = 0, and the initial condition u n (t) = 0 for all t ∈ (−δ, 0). Due to the existence and uniqueness of the initial value problem given in Theorem 3.1, we derive that u n = 0, n = 1, 2, .... Then by the completeness of {ϕ n }, we obtain that u ≡ 0 in (0, T ) × Ω.
Remark 4.1. In view of Theorem 4.1, for initial value g = p(t)q(x) ∈ L ∞ (−δ, 0; H r (Ω)), the solution u(t) of model (1.1) belongs toḢ r+2 (Ω) for all t > 0. Then for t > δ and t 0 ∈ (δ, t), the similar argument works for the problem with homogeneous Dirichlet boundary condition, and the initial condition y(t) = u(t + t 0 ) for all t ∈ (−t 0 , 0). Then we have y(t) ∈Ḣ r+4 (Ω) for any t > δ in view of the fact that Consequently u(t) ∈Ḣ r+4 (Ω) for all t > δ. Then repeating the discussion, we may deduce that u(t) ∈Ḣ r+2k (Ω) for all t > (k − 1)δ. This is in sharp contrast to the its counterpart estimates for the fractional diffusion [25] and the standard diffusion [29,Chapter 3]. Recall that the solution of heat equation is infinitely differentiable, irrespective of the smoothness of the initial data, while the fractional diffusion only exhibits the spatial smoothing property of order 2 even for t 0. Hence the nonlocal-in-time parabolic model (1.1) has the similar smoothing property as that of fractional diffusion for short time, i.e., t < δ, while for t 0 it exhibits the similar solution behavior as that of standard diffusion.
The next theorem gives the well-posedness for inhomogeneous source data.
This completes the proof.

4.2.
Smoothing properties for particular types of kernels. We now consider two special types of kernels, the non-integrable fractional kernels and the integrable kernels, that are of particular interest. All the proofs use similar techniques as ones used in the preceding section, and are thus either omitted or given in Appendix.
As we mentioned in Section 1, the case of non-integrable fractional kernels, i.e., sρ δ (s) = (α + 1)s α−1 δ −α−1 with α ∈ (−1, 0), relates closely to the fractional diffusion. We can improve the estimate in Theorem 4.1 in this case, and characterize the short time behavior similar to that of fractional diffusion. (−1, 0). Then the solution u of problem (1.1) satisfies Proof. The existence of the weak solution is given by Theorem 4.1. Now we deduce the a priori bound. By Theorem 3.2 we have for l ∈ [r, r + 2] for the solution of model (1.1). Recall that this is very close to the short time estimate for the fractional diffusion (cf. [25, Theorem 2.1]), while the appearance of δ −α−1 is due to the normalization of the kernel.
Next, we consider the case of integrable kernels, i.e., sρ δ (s) ∈ L 1 (0, δ), for which we may improve the smoothing property estimates given in the Theorem 4.1 even if the initial data is not variables-separable.
Proof. Without loss of generality, we consider the case that r = 0. The proof is based on the Banach fixed point theory. It is easy to observe that a function u ∈ L 2 (0, T ;Ḣ 2 (Ω)) is a solution of (1.1) if and only if it satisfies where b δ = δ 0 sρ δ (s) ds. We first show that the problem (4.2) has a unique solution in the set To this end, we set A = −∆ for convenience and define the operator K as with b δ = δ 0 sρ δ (s) ds. Then we have for ϕ ∈ W δ and t ∈ (0, T ] Hence the operator K maps W δ to W δ . Further, for ϕ, ψ ∈ W δ and t ∈ (0, T ], we derive that Then the contraction property follows directly from the fact b δ (b δ + A) −1 < 1.
Hence there exists a unique solution in W δ . Finally, we turn to the result of the lemma. For t ∈ (0, T ), we note that where the constant b δ is proportional to δ −1 . Then the estimate for G δ u L ∞ (0,T ;L 2 (Ω)) is a direct consequence of the triangle inequality.

4.3.
Local limits of nonlocal models. In this section, we discuss the zero-horizon limit of the nonlocal model (1.1). Specifically, let u δ be the solution of (1.1), and u be the solution of the standard heat equation The next theorem shows that u δ strongly converges to u in L 2 (0, T ; L 2 (Ω)).
Using spectral decomposition and Corollary 3.2 we have g n ∈ H 1 (−δ, 0) and hence This completes the proof of the theorem.
Remark 4.3. Theorem 4.5 states the convergence of nonlocal-in-time parabolic model to the standard diffusion in L 2 (0, T ; L 2 (Ω)) as δ approaches zero. However, the convergence rate still awaits further theoretical studies. Numerically, it highly depends on the smoothness of the initial data, i.e., it exhibits first order convergence for smooth data, but the order deteriorates for nonsmooth initial data; cf. Fig. 1 in Section 6.

5.
Semi-discrete finite element approximation. The smoothing properties obtained in preceding section can be used to analyze a semi-discrete finite element scheme. Using the notations from [29], we let {T h } (0 < h < 1) be a family of shape regular and quasi-uniform partitions of the domain Ω into d-simplices, called finite elements, with h denoting the maximum diameter. An approximate solution u h is sought in the finite element space X h of continuous piecewise linear elements over the triangulation T h X h = {χ ∈ H 1 0 (Ω) : χ is a linear function over τ ∀τ ∈ T h }. The semi-discrete Galerkin FEM for problem (1.1) reads: with t ∈ (−δ, 0), and a(u, w) = (∇u, ∇w) for u, w ∈ H 1 0 (Ω). Upon introducing the discrete Laplace operator ∆ h : the spatially semi-discrete problem can be written as where the discrete source term f h = P h f . Now we define the discrete norm ||| · |||Ḣ p (Ω) on X h for any p ∈ R Analogously, we introduce the associated spaces ||| · ||| L r (0,T ;Ḣ p (Ω)) , r ∈ [1, ∞], on the space X h . The following estimate is the discrete analogue of Theorem 4.2. It is essential for the analysis of the lumped mass method in Section 4.
Lemma 5.1. Let u h be the solution of (5.1) with g h ≡ 0. Then for arbitrary p > −1, In our analysis we the L 2 -orthogonal projection P h : L 2 (Ω) → X h and the Ritz projection R h : H 1 0 (Ω) → X h are used. They are, respectively, defined by We need some properties of the Ritz projection R h and the L 2 -projection P h [13].
Let u and u h be the solutions of (1.1) and (5.1), respectively. Then where the constant c is independent of the horizon δ.
Combing the preceding two estimates yields the desired assertion.
The argument in the proof of Theorem 5.1 together with Theorem 4.2 deduces the following estimate in case of nonsmooth data.
where the constant C is independent of the horizon δ.
In case of integrable kernel, i.e., sρ δ (s) ∈ L 1 (−δ, 0), the assumption in Theorem 5.1 can be weakened, in view of Theorem 4.1, and the similar error estimate can be achieved with a δ-dependent constant.
Then the localization property follows directly from Theorem 5.1 and 4.5.
6. Numerical results. Here we present some one-dimensional numerical tests to substantiate and complement the theoretical findings in Section 4 and 5.
In our computation, we divide the unit interval (0, 1) into N equally spaced subintervals, with a mesh size h = 1/N . The finite element space X h consists of continuous piecewise linear functions. We examine separately the spatial and temporal convergence rates at the terminal time t = 1. For the case of nonsmooth initial data, we are especially interested in the errors for t close to zero, and thus we also present the errors at t = 0.1, 0.01 and 0.001. To compute a numerical solution we have used also a time-stepping by discretizing the time interval, t n = nτ , n = 0, 1, ... , with τ being the time step size. In particular, we assume that δ = mτ and discretize the nonlocal operator by It can be easily verified that the truncation error is second order provided that the function is smooth enough. Moreover, one may establish the unconditional stability. Indeed, even though it is not presented here, our experiments present a pointwisely first-order convergence no matter how singular the initial data is. Whenever needed, we use very fine meshes in both space and time to compute a reference solution. Unless otherwise specified, we set τ = 10 −4 , so that the error incurred by temporal discretization is negligible. 6.2. Example (a): Smooth initial data. In our experiments, we consider fractional kernels of the type ρ δ (s) = (α + 1)s α−2 δ −α−1 with α > −1. We measure the accuracy of the approximation u h (t) by the error u − u h L 2 (0,T ;Ḣ p (Ω)) for p = 0, 1, cf. Table 1. In the table, rate refers to the ratio of the errors when the mesh size h halves, and the numbers in the bracket denote theoretically predicted convergence rates. The numerical results show O(h 2 ) and O(h) convergence rates for the L 2 (Ω)-andḢ 1 (Ω)-norms of the error, respectively, which fully confirm the theoretical estimates given in Theorem 5.1. When δ approaches 0, the solution of the nonlocal model converges to a solution of the local heat equation in L 2 (0, T ; L 2 (Ω)) (cf., Figure 1(a)), as stated in Theorem 5.2. One may observe a first order convergence, i.e. O(δ), which theoretically remains to be explored.  Tables 2-4, in which case g(x) ∈Ḣ −1/2− (Ω) for t ∈ (−δ, 0). The analysis for this very weak data is beyond the scope of the paper, but the numerical solutions presented here illustrate the limited smoothing of the nonlocal model (1.1). Since the solution is no longer be in L 2 (0, T ;Ḣ 1 (Ω)) in case of sρ(s) / ∈ L 1 (0, δ), here we  plot the pointwise error, i.e., (u − u h )(t) Ḣp (Ω) with p = 0, 1. In Tables 2 and 3, the mesh size is chosen to be h = 1/(10k + 1), and thus the Dirac-δ function is not aligned with the grid. We observe that the L 2 (D)-andḢ 1 (Ω)-norms of error is of order O(h 3/2 ) and O(h 1/2 ), respectively, in case that t < δ. However, for large t, i.e., t > δ, the solution u = u(t) could be smoother, cf. Remark 4.1. Hence, one may observe a higher-order convergence, see Table 4.1. This confirms the theoretical findings discussed in Section 4. In Table 4, we report the results for the case that the δ-function is supported at a grid point. Even if t ≤ δ, one may observe a convergence with orders O(h 2 ) and O(h) respectively for errors measured in L 2 (Ω)-andḢ 1 (Ω)-norms, i.e., the method exhibits a superconvergence of one-half order higher, which theoretically remains to be established. Figure 1(b) shows the convergence of the nonlocal models to the local one as δ → 0. Numerical results show that the rate of δ-convergence deteriorates for nonsmooth data, which reflects the strong dependence on data regularity and remains to be studied further. In this part, we present numerical results for example (c), i.e. g ≡ 0 and f ∈ L ∞ (0, T ;Ḣ s (Ω)) with s ∈ (−1, −1/2), in Tables 5 and 6. The results are similar to ones for example (b), i.e., if the deltafunction is not supported at a grid node, we observe that the method converges with the order O(h 1/2 ) inḢ 1 (Ω)-norm, while the convergence rate in the L 2 (Ω)norm is O(h 3/2 ). The numerical results fully confirm our theoretical predictions. However, in case that the delta-function is aligned with the grid, we observe a superconvergence with one-half higher order. Figure 1(c) plots the error u δ h − u 0 h L 2 (0,T ;L 2 (Ω)) as δ → 0, where u δ h and u h denote the solutions of nonlocal and local parabolic equations, respectively. One may observe a first order convergence, which confirms the theoretical finding in Theorem 4.5. Table 5. u − u h L 2 (0,T ;Ḣ p (Ω)) for example (c) with δ = 1, h = 1/m, τ = 10 −4 , T = 1. 3.41e-1 2.43e-1 1.71e-1 1.17e-1 7.72e-2 ≈ 0.52 (0.50) Table 6. u − u h L 2 (0,T ;Ḣ p (Ω)) for example (c) with δ = 1, h = 1/m, τ = 10 −4 , T = 1.
γ norm\m 10 20 40 80 160 320 rate −0.5 L 2 2.51e-4 6.29e-5 1.57e-5 3.93e-6 9.79e-7 2.42e-7 ≈ 2.00 H 1 9.33e-3 4.68e-3 2.34e-3 1.17e-3 5.51e-4 2.49e-4 ≈ 1.07 0.5 L 2 2.06e-4 5.18e-5 1.29e-5 3.23e-6 8.06e-7 1.99e-7 ≈ 2.02 H 1 7.12e-3 3.57e-3 1.79e-3 8.93e-4 4.20e-4 1.90e-4 ≈ 1.08 7. Conclusion. A nonlocal-in-time parabolic model (1.1) is studied in this paper. Its well-posedness and smoothing properties are established, based on similar results derived first for the space-independent nonlocal-in-time models. These mathematical theory allow us to further discuss the zero-horizon limit of the nonlocal model as the horizon parameter δ approaches zero, by establishing a novel duality argument. A semi-discrete scheme using piecewise linear finite element spatial approximation is also studied. Extensive numerical experiments are reported to confirm the theoretical analysis. One distinct property of the nonlocal-in-time model (1.1) is its limited smoothing property as we discussed in Section 4. In particular, one may observe a restricted spatial smoothing property at a small time, e.g., t < δ; cf. Theorem 4.1 and 4.3. However, we may repeat the argument to deduce a higher spatial regularity, say u(t) ∈Ḣ r+2k (Ω) for all t > (k − 1)δ, cf. Remark 4.1. Recall the fact that the solution of standard diffusion model is infinitely differentiable for all t > 0 [29] while the subdiffusion has a limited smoothing property with order 2 even if t approaches infinity [25]. Therefore the nonlocal model (1.1) has the same smoothing property as the one of anomalous diffusion at a short time, but exhibits the behavior of standard diffusion when t approaches infinity. This illustrates our point that (1.1) is in fact an intermediate case between standard and anomalous diffusion. The limited smoothing is numerically confirmed in Section 6, by evaluating the convergence order of the semi-discrete Galerkin finite element approximation.
Our analysis can be extended to problems with more general boundary conditions and/or spatially varying coefficients. As an example, one may replace the negative Laplacian with a generalized self-joint and positive definite second order elliptic operator, say A = −∇·a(x)∇+q(x), where a(x) is a symmetric and positive definite d×d matrix-valued measurable function on the domain with smooth entries, and q is an L ∞ -function. Then all the analytical studies in Section 4 and 5 can be repeated directly. Moreover, the argument can be extended to other types of boundary conditions, e.g., pure Neumann boundary condition whose spectrum contains a zero eigenvalue. In this case, we may treat nonzero eigenvalues analogously but applying Corollary 3.1 to the zero one so that all discussions in Section 4 and 5 remain valid.
The current work represents a first step towards the nonlocal-in-time parabolic model (1.1), and highly motivates further studies. First, the temporal regularity awaits rigorous theoretical analysis (e.g. see [10] and [2,25] respectively for its standard and fractional counterparts). This will help to derive a sharp estimate for the time-stepping (6.1). The truncation error of (6.1) is second-order provided a smooth solution but it only exhibits a point-wisely first-order convergence for a general initial data empirically due to its lack of temporal smoothing property. This often happens to time-dependent problems with a initial layer, e.g., [15,16,20,21]. Moreover, it will be beneficial to study the Duhamel's principle of the model (1.1) which not only will provide reasonable a priori estimates for (1.6) in case that λ = 0 in the infinite interval (cf. Remark 3.4), but also is a crucial tool to analyze the point-wise error estimate, i.e., (u − u h )(t) Ḣp (Ω) , for the semi-discrete scheme. Finally, it is interesting to explore the rate of δ-convergence, which is of first-order in case of smooth data but deteriorates for very weak data, as presented in Section 6.