Memory equations as reduced Markov processes

A large class of linear memory differential equations in one dimension, where the evolution depends on the whole history, can be equivalently described as the projection of a Markov process living in a higher dimensional space. Starting with such a memory equation, we propose an explicit construction of the corresponding Markov process. From a physical point of view, the Markov process can be understood as a change of the type of some quasiparticles along one-way loops. Typically, the arising Markov process does not have the detailed balance property. The method leads to a more realistic modeling of memory equations. Moreover, it carries the large toolbox of investigation tools for Markov processes over to memory equations, like the calculation of the equilibrium state, the asymptotic behavior and so on. The method can also be used for an approximate solution of some degenerate memory equations like delay differential equations.


Introduction
Memory equations describe the time evolution of some quantity taking into account the whole prehistory of the evolution: the past influences the future. Markov processes, or more generally time evolutions with the Markov property, describe the problem under the assumption that the evolution can be predicted knowing only the current state: the present influences the future. At first glance, by means of memory equations it is possible to investigate a wider class of problems, since evolution equations with the Markov property can be regarded as degenerate memory problems, where the dependence on the past is concentrated in one moment. But from a philosophical point of view, it seems natural that a complete description of a problem has to be a Markovian one, for the following reason: the Markov property means that the solution operator is a semigroup, i.e. it is time-invariant. Due to Noether's theorem, this invariance corresponds to the conservation of some energy, the dual variable of time. Thus, the Markov property is the typical property of a model where some energy is conserved. Conversely, if the evolution is governed by a non-Markovian equation, the description is not complete; some energy is lost, and one has to look for more degrees of freedom until the model becomes Markovian. In other words, it is to be expected that a non-Markovian description can be regarded as a part or restriction of a higher dimensional Markov process. This theoretical thought can be confirmed in various practical situations: • An arbitrary (nonlinear) dynamical system on a compact space Z can be equivalently formulated as a linear deterministic Markov process on the space of Radon measures on Z (see, e.g. [14]) via its Liouville equation.
• A general linear evolution equation that is nonlocal in space and time, including jumps and memory, on some domain in R^n can be understood as a limit of a diffusion process (a special Markov process) on a complicated Riemannian manifold (see [9]).
• The projection of a general Brownian motion (a special Markov process in phase space) on the coordinate space is a diffusion process if the initial velocity is Maxwellian (see [13]).
Hence, the idea that a memory equation can be regarded as part of a higher dimensional Markov process does not seem to be very surprising. Indeed, the main result of this paper is an explicit construction of an easily analyzable Markov process for a more or less arbitrary given memory kernel. Let us briefly review the basic facts about modeling and analyzing memory equations and Markov processes.

Memory Equations
Memory equations (MEs) are differential equations where the evolution depends not only on the current state but also on the past. MEs are a special case of functional differential equations, i.e. equations relating unknown functions and their derivatives at different argument values. The mathematical theory of functional differential equations (or integro-differential equations) is treated in [7,10].
From the viewpoint of modeling and analysis, MEs have attracted a lot of attention during the last decades. For example, they arise in modeling flows through fissured media [8,11] or in modeling heat conduction with finite wave speeds [6]. We consider MEs of convolution type. Such equations arise also as effective limits of homogenization problems, starting with the pioneering work of L.
Tartar [16]. The object of interest is a linear memory equation of the form

u̇(t) = −a u(t) + ∫_0^t K(t − s) u(s) ds,   u(0) = u_0,   (1)

where u : [0, ∞[ → R is a scalar state variable, u_0 ∈ R_≥0 and K : R_≥0 → R_≥0 is a positive real kernel. Please note that we focus on a scalar variable, but our considerations can be generalized to systems as well as to non-autonomous linear PDEs (like diffusion equations with time-dependent diffusion coefficients). Let us briefly explain the ME (1). In contrast to u̇ = −au, where the decay is quite fast, in this equation the decay is damped due to the influence of former states. The ME can be interpreted as a reduction of the mass into unknown depots. Phenomenologically, this can be modeled by a = a(t), which yields a non-autonomous equation. Another way to think about (1) is the following.
Introducing the function A defined by Ȧ = −K and A(0) = a, we get

u̇(t) = −(d/dt) ∫_0^t A(t − s) u(s) ds.

Integrating the above equation, we get

u(t) = u_0 − ∫_0^t A(t − s) u(s) ds,

which can be regarded as a continuous analogue of the time-discrete scheme

u_n = u_0 − Σ_{j=0}^{n−1} A_{n−j} u_j Δt.   (2)

Equivalently, using partial integration we get

u̇(t) = −A(t) u_0 − ∫_0^t A(t − s) u̇(s) ds.

This form is often considered (e.g. in [11]). Subsequently, we use the form (1). For solving a ME, the memory described by K(t) or A(t) has to be known for any time t ≥ 0. This is often postulated, i.e. K(t) is given by heuristic arguments. A typical and simple example is K_α(t) = α e^{−αt} for α > 0. Then K_α(t) ≥ 0 and ∫_0^∞ K_α(t) dt = 1. In this case, for α → +∞, the integral on the right-hand side of (1) tends to u(t), and the ME becomes an ordinary differential equation. In the same sense, a sequence of such integrals of convolution type can tend to a delay differential equation (DDE); that means K(t) = Σ_j α_j δ(t − t_j). So the kernel K can be interpreted as a measure on the time line that can be approximated by the "simplest" measures: convex combinations of δ-measures. Note that DDEs with the above kernel, i.e. of the form

u̇(t) = −a u(t) + Σ_j α_j u(t − t_j),

are solved with respect to an initial condition φ ∈ C([−max{t_j}, 0]). That means the solution space is infinite dimensional. On the other hand, from the modeling viewpoint it is difficult to derive an initial value φ for a DDE. Often the initial value φ is assumed to be constant or a simple given function. See e.g. [12] for more details, where the analysis and applications, especially for modeling aftereffect phenomena, are presented. The ME needs the initial value only at one fixed time, say t = 0. But for t ≥ max{t_j}, the DDE becomes a ME. This means that the beginning of the evolution is also modeled in the ME. In this sense, MEs include many types of differential equations like ODEs and DDEs. We remark that also from the modeling viewpoint it is more natural to treat kernels that are not located at precise time values but are smeared.
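The time-discrete scheme above suggests a direct numerical treatment. The following is a minimal sketch (parameter values hypothetical): an explicit Euler step for the ME, with the convolution approximated by the trapezoidal rule. For the exponential kernel K_α with a = 1 the consistency k(0) = a holds, so a non-trivial equilibrium is expected; the two-state Markov picture of the next chapter predicts the value u_0 α/(1 + α).

```python
import numpy as np

def solve_me(a, K, u0, T, dt):
    """Explicit Euler scheme for the memory equation
        u'(t) = -a u(t) + int_0^t K(t - s) u(s) ds,
    with the convolution approximated by the trapezoidal rule
    (a continuous analogue of the time-discrete scheme (2))."""
    n = int(round(T / dt))
    t = np.arange(n + 1) * dt
    u = np.empty(n + 1)
    u[0] = u0
    for k in range(n):
        w = K(t[k] - t[:k + 1]) * u[:k + 1]
        memory = dt * (np.sum(w) - 0.5 * (w[0] + w[-1]))  # trapezoid
        u[k + 1] = u[k] + dt * (-a * u[k] + memory)
    return t, u

# exponential kernel K_alpha(t) = alpha e^{-alpha t}; int K = 1 = a,
# so the decay is exactly balanced by reappearing mass in the long run
a, alpha, u0 = 1.0, 2.0, 1.0
t, u = solve_me(a, lambda s: alpha * np.exp(-alpha * s), u0, T=8.0, dt=0.002)
```

The trajectory decays, but the memory term damps the decay and the solution settles at a positive level instead of vanishing as it would for u̇ = −au.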
Another important property is the asymptotic behavior. The ME is a non-autonomous differential equation. The equilibrium cannot be calculated by setting u̇ = 0: generically, there is no non-trivial constant solution that makes the right-hand side vanish, so in this sense the ME has no equilibrium.

Markov Processes
There is a huge amount of literature on Markov processes (MPs), see e.g. [2,3,4]. Here we introduce our notation. Let Z be a given state space, a compact topological space, C := C(Z) the Banach space of continuous functions on Z and P := P(Z) the set of probability measures, i.e. the subset of Radon measures p on Z with p ≥ 0 and p(Z) = 1. A family T(t), t ≥ 0, of linear bounded operators in C is called a Markov semigroup if it is a semigroup, i.e. it satisfies T(t + s) = T(t)T(s) for all t, s ≥ 0 and T(0) = I, if it is positive, T(t) ≥ 0 in the cone sense of C, and if the constant function 1 is a fixed point of T(t) for all t ≥ 0, T(t)1 = 1. We refer to [1,5]. The semigroup property is often called the Markov property, and it is equivalent to the assumption that the trajectory depends only on the present time point and not on the past. A linear operator A on C is called a Markov generator if it is the generator of a Markov semigroup T(t). Then g(t) = T(t)g_0 is the solution of the equation

ġ(t) = A g(t)   (3)

for an initial value g_0 from the domain of A. This equation is called the backward Chapman-Kolmogorov equation. A MP is the result of the action of the adjoint semigroup T*(t) on a probability measure p_0, i.e. p(t) = T*(t)p_0. Any MP has at least one stationary probability measure µ ∈ P. It satisfies T*(t)µ = µ for all t ≥ 0. This is a consequence of the Markov-Kakutani theorem. The stationary probability measure µ is an element of the null-space of A*. In this paper we consider continuous-time MPs on discrete state spaces: Z = {z_0, ..., z_N} is a finite set of N + 1 states. In this case, we have C = R^{N+1} and P is the simplex of probability vectors, P := Prob({z_0, . . . , z_N}) := {p ∈ R^{N+1} : p_i ≥ 0, Σ_{i=0}^{N} p_i = 1}, a subset of R^{N+1}, too. A Markov semigroup is a real matrix family T(t) on R^{N+1} with nonnegative entries and row sums 1. Its adjoint is the transposed matrix family T*(t). A MP is p(t) = T*(t)p_0, where p_0 is some given probability vector.
It satisfies the set of equations

ṗ(t) = A* p(t),   (4)

where A* is the adjoint of the corresponding Markov generator. This equation is called the forward Chapman-Kolmogorov equation. In contrast to equation (3), which describes the evolution of moment functions, equation (4) describes the evolution of probability vectors. This means that one component of the vector p(t) can be understood as the probability of the corresponding state, regardless of the probabilities of the other states.
It is well known that equation (4) has a unique solution p(t) ∈ P if and only if the off-diagonal elements of A* are nonnegative and the columns of A* sum up to zero; the diagonal entries are then non-positive. A MP in R^{N+1} allows for different physical interpretations. Apart from the canonical interpretation as a probability vector, p can be understood as the concentrations or amounts of N + 1 different materials. We will follow this interpretation and assume that this amount of material is represented by particles of different types. These particles can transform into each other, changing their type, which can be understood as a linear reaction. The entries A_{ij} of the Markov matrix describe the rates of transforming particles of type z_j into particles of type z_i. Therefore, if we are only interested in the amount of material of one type, it is enough to consider the corresponding component of the vector p(t). The initial amount of material is p_0. Since A is a Markov generator, positivity of the concentrations is preserved and the total mass is conserved. If a Markov generator A = (A_{ij}) and its stationary state µ = (µ_i) satisfy A_{ij} µ_j = A_{ji} µ_i for all i, j ∈ {0, . . . , N}, the corresponding MP is said to have the detailed balance property. This is equivalent to the matrix (A_{ij}) being symmetric in the L²-Hilbert space weighted by µ. Such a matrix has to have real eigenvalues. We remark that the opposite is not true in general: a Markov process without detailed balance can have real eigenvalues, too; moreover, there may be no Hilbert space at all in which it is symmetric. From a physical point of view, the condition A_{ij} µ_j = A_{ji} µ_i means that any transition z_i ⇔ z_j is in a local equilibrium. Thus, the detailed balance case is easier to analyze, but it rarely appears in general. The systems that we consider do not have the detailed balance property, in principle.
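For a finite state space, the forward equation is just a linear ODE system, so these objects are easy to experiment with numerically. The following sketch (with hypothetical rates) checks the defining properties of a generator, evolves ṗ = A*p by an explicit scheme, and compares the result with the stationary vector taken from the null space of A*:

```python
import numpy as np

# adjoint generator A* on three states (hypothetical rates):
# nonnegative off-diagonal entries, columns summing to zero
a1, a2, b1, b2 = 2.0, 5.0, 8.0, 1.0
A_star = np.array([[-(a1 + a2), b1,   0.0],
                   [ a1,       -b1,   b2 ],
                   [ a2,        0.0, -b2 ]])
assert np.allclose(A_star.sum(axis=0), 0.0)             # mass conservation
assert np.all(A_star - np.diag(np.diag(A_star)) >= 0)   # positivity

# forward Chapman-Kolmogorov equation p' = A* p,
# solved by the explicit scheme p_{k+1} = (I + dt A*) p_k
n, t_end = 1 << 15, 10.0
M = np.eye(3) + (t_end / n) * A_star
p = np.linalg.matrix_power(M, n) @ np.array([1.0, 0.0, 0.0])

# stationary vector: eigenvector of A* for the eigenvalue 0
w, V = np.linalg.eig(A_star)
mu = np.real(V[:, np.argmin(np.abs(w))])
mu = mu / mu.sum()
```

The iteration preserves the total mass exactly (the columns of I + dt A* sum to one) and converges to µ, the element of the null-space of A*.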

What our paper deals with
In this paper, we connect the concepts of Markovian and non-Markovian dynamics, which seem to be different at first glance. Starting with a MP of a special form, we derive a ME for its first coordinate. The ME is a scalar differential equation, but our considerations can also be applied to PDEs. The resulting MP can be physically understood; the ME is governed by a kernel which is a sum of exponential functions. Then the opposite path is taken: starting with a ME with an exponential kernel, we find a MP whose first component again yields the ME. The other components can be understood as hidden degrees of freedom that have to be included in a complete description of the problem. This procedure is not unique, and thus it cannot be said that the hidden degrees of freedom are real physical variables. On the other hand, the construction of the MP out of the kernel is intuitive, since the kernel is approximated by its moments. This method can be used to approximate a general positive kernel, taking the enlargement of the MP into account. The simple case of two and three states is presented in chapter 2. In this case, all solutions and kernels can be calculated by hand. In chapter 3 we consider the general case; the main theorems are stated there. The method has many physical and mathematical advantages, both for the theory of MPs and of MEs. We want to highlight only two of them. Firstly, the modeling of a kernel for a ME is usually done by heuristic arguments. The method presented here can be used to model kernels in a more convenient manner, since the MP has an underlying physical meaning. Moreover, the beginning of the process is modeled as well. Secondly, the asymptotic behavior of a non-autonomous differential equation can immediately be calculated from the Markovian dynamics. The paper concludes with chapter 4, where we note the connection to delay differential equations, whose kernel is highly degenerate.
This is also reflected in the setting of the MP: the underlying Markov generator has a very special form. We observe that the solution of the ME converges to the equilibrium of the MP. The spectral functions of ME and MP also converge. Summarizing, we have the following connection of modeling levels: the ME is embedded into a Markov process MP' with a larger number of degrees of freedom. It is well known that a linear delay equation with delay T in a state space X can be regarded as an autonomous equation in the much larger space C([−T, 0], X), see e.g. [5]. There, the evolution of the delay equation is described by a semigroup of linear operators. This approach is not our aim in this paper; in our setting, the space of the MP' is typically not so large. Notation: in this paper, the Laplace transform is frequently used. Some of its properties are summarized in the appendix. MEs of convolution type have the important property that the Laplace transform maps them into multiplication operators. The Laplace transform L(u) of a real valued function u is denoted by û, û(λ) = L(u)(λ) = ∫_0^∞ e^{−λt} u(t) dt. If there is no confusion, we omit the hat on û and just write u or u(λ). Some analytical tools concerning Lagrange polynomials and simplex integrals are presented in the appendix, too.

Some simple Markov processes and memory equations
Before starting the general theory, we first present the basic ideas, focusing on simple low dimensional examples: MPs with two and three states. Despite their simplicity, nearly all phenomena of the general theory are already evident.

Two states
We consider a MP on a state space of two abstract states {z_0, z_1}, generated by the Markov generator with adjoint

A* = ( −a   b
        a  −b ).

The matrix A* describes the switching between the two states with given rates a ≥ 0, b ≥ 0. We can think of an amount of matter, represented by particles, which can occur in two types. For some reason we are interested only in particles of the first type.
The equation describing the evolution of the vector p = (u, v) reads ṗ = A*p with p(0) = p_0. We assume that in the beginning the total mass is concentrated in the first variable, i.e. p_0 = (u_0, 0). In other words, all particles have type z_0.
The stationary solution is µ = u_0/(a + b) (b, a). It is unique except in the uninteresting case a = b = 0. Any MP with two states has the detailed balance property. For (u, v) the system reads as

u̇ = −a u + b v,
v̇ = a u − b v.   (6)

Using the Laplace transform and writing u(λ) = L(u(t))(λ) and v(λ) = L(v(t))(λ), we obtain a system of equations for (u, v) in the form

λ u − u_0 = −a u + b v,
λ v = a u − b v.

This yields an equation for u in the form

λ u − u_0 = −a u + (a b)/(λ + b) u.

Using the inverse Laplace transform, we obtain a memory equation for u,

u̇(t) = −a u(t) + a ∫_0^t b e^{−b(t−s)} u(s) ds.   (7)

The kernel K(t) = b e^{−bt} describes a dependence of the current state on previous time moments. For b → ∞, K(t) tends to δ(t) and the equation becomes u̇ = 0. Thus, the right-hand side of equation (7) consists of two terms: the first one, −au, describes an exponential decay, whereas the second one, the memory term, describes an opposite effect: particles that disappear reappear after a while.

[Figure: the kernel b e^{−bt} for b = 1, . . . , 10.]

The time that passes between disappearing and reappearing decreases like 1/b. In the end, not all matter disappears as in the pure decay equation u̇ = −au; instead, an equilibrium between disappearance and reappearance arises. The same effect is caused by the MP, changing the type of the particles. A particle changes its type from z_0 to z_1 with rate a ≥ 0; it seems to disappear if we look only at type z_0. After a while it changes back to type z_0 (it reappears) with rate b ≥ 0. This gives the exponential time behavior e^{−bt} (corresponding to the memory kernel K(t) = b e^{−bt}), characteristic for MPs. The equation (7), or equivalently the system (6), can be solved explicitly. We obtain for the Laplace transform

u(λ) = u_0 (λ + b) / (λ (λ + a + b))

and for the solution itself

u(t) = u_0/(a + b) (b + a e^{−(a+b)t}).

The solution tends to an equilibrium state u_∞ = b u_0/(a + b), the first component of the stationary solution µ. It is not possible to calculate it from the memory equation (7) directly. Setting u̇ = 0, the equation

0 = −a u(t) + a ∫_0^t b e^{−b(t−s)} u(s) ds

does not have any non-trivial solution at all. Passing to the limit t → ∞ in (7) only yields the trivial identity 0 = −a u_∞ + a u_∞, which is satisfied by any value u_∞. Investigating only the solution of the memory equation, it is not clear why the trajectory u(t) stops at u_∞.
Whereas, looking from above, the trajectory (u(t), v(t)) has to stop at the stationary state µ, the intersection of the null-space of A* with the hyperplane u + v = u_0.
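The closed form of the first component can be checked independently: if u(t) = u_0/(a + b)(b + a e^{−(a+b)t}) and v = u_0 − u, then (u, v) should satisfy the two-state system exactly. A small numerical sanity check (rates hypothetical):

```python
import numpy as np

a, b, u0 = 1.5, 2.5, 1.0

def u_exact(t):
    # candidate first component of the two-state process
    return u0 / (a + b) * (b + a * np.exp(-(a + b) * t))

def v_exact(t):
    # mass conservation: u + v = u0
    return u0 - u_exact(t)

# residual of u' = -a u + b v along the trajectory (central difference)
t = np.linspace(0.0, 5.0, 2001)
h = 1e-6
du = (u_exact(t + h) - u_exact(t - h)) / (2.0 * h)
residual = du - (-a * u_exact(t) + b * v_exact(t))
```

The residual vanishes up to the finite-difference error, and the long-time limit is b u_0/(a + b), i.e. the first component of µ.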

Three states
A general memory kernel need not be concentrated at t = 0. It can describe a transfer of mass from much earlier times. This situation can be modeled by transitions through several quasiparticle types before the mass appears at its starting type again.
To understand the action of such a transition loop, we investigate in detail a special case of three states, namely the transformation of a fixed particle (type z_0) into two different quasiparticles. One of them (type z_1) can be transformed back into type z_0 immediately, whereas the other (type z_2) can be transformed back into type z_0 only in two steps, changing at first to type z_1. This process is illustrated in the picture.

From Markov to Memory
The simple MP on a state space of three abstract states {z_0, z_1, z_2} is described by the Markov generator with adjoint

A* = ( −(a_1 + a_2)   b_1     0
         a_1          −b_1    b_2
         a_2           0     −b_2 ).   (8)

Note that this is a Markov generator depending on four rates; a general Markov generator on R³ depends on six rates. The stationary state µ is the solution to A*µ = 0 and can be calculated easily as

µ = u_0/Z ( 1, (a_1 + a_2)/b_1, a_2/b_2 ),   Z = 1 + (a_1 + a_2)/b_1 + a_2/b_2.

The eigenvalues of the matrix (they always have non-positive real part) are λ_0 = 0 and

λ_{1,2} = −(a_1 + a_2 + b_1 + b_2)/2 ± sqrt( (a_1 + a_2 + b_1 + b_2)²/4 − (a_2 b_1 + (a_1 + a_2) b_2 + b_1 b_2) ).

Depending on a_1, a_2, b_1, b_2, the eigenvalues can be real (e.g. λ_1 = −5, λ_2 = −11 for a_1 = 2, a_2 = 5, b_1 = 8, b_2 = 1) or complex (e.g. λ_{1,2} = −9 ± 2i for a_1 = 2, a_2 = 5, b_1 = 8, b_2 = 3). (By the way, these are suitable values for an explicit solution with rational terms only.) This MP has the detailed balance property only if b_1 b_2 a_2 = 0, which is not interesting, since then the coupling chain is broken. Roughly speaking, the detailed balance property means that for any loop in one direction there is a loop backwards with the same product of rates. But this is not the case in our model. Thus, the MP under consideration violates the detailed balance property generically.
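The two eigenvalue examples quoted in the text can be verified directly. The sketch below uses a forward generator consistent with the three-state loop process described above (the precise matrix form is an assumption, but it reproduces the quoted spectra):

```python
import numpy as np

def generator_adjoint(a1, a2, b1, b2):
    # forward generator of the three-state loop process
    return np.array([[-(a1 + a2), b1,   0.0],
                     [ a1,       -b1,   b2 ],
                     [ a2,        0.0, -b2 ]])

# real spectrum: expected eigenvalues 0, -5, -11
ev_real = np.sort(np.linalg.eigvals(generator_adjoint(2, 5, 8, 1)).real)
# complex spectrum: expected eigenvalues 0, -9 + 2i, -9 - 2i
ev_cplx = np.linalg.eigvals(generator_adjoint(2, 5, 8, 3))
```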
The stationary state is unique if and only if the real parts of λ_{1,2} are strictly negative or, equivalently, if a_2 b_1 + (a_1 + a_2) b_2 + b_1 b_2 ≠ 0. Since the a_i, b_i are non-negative, the opposite is an uninteresting degenerate case that we exclude. Then the stationary state is the equilibrium state for any initial value. Note that nevertheless some of the a_i, b_i might be zero. As in the case of two states, we are interested only in the state z_0 of the system and ask for an evolution equation of this state. To do this, we introduce the notation p = (u, v_1, v_2) and look for the evolution of u with the initial state p_0 = (u_0, 0, 0). This is natural, since the states z_1 and z_2 are unknown, and there is no reason to assume that particles of types z_1, z_2 exist in the beginning. Equation (9), i.e. ṗ = A* p with p(0) = p_0, is now equivalent to the system

u̇ = −(a_1 + a_2) u + b_1 v_1,
v̇_1 = a_1 u − b_1 v_1 + b_2 v_2,
v̇_2 = a_2 u − b_2 v_2.
Passing to the Laplace transform, we obtain with u = Lu, v_i = Lv_i the system

λ u − u_0 = −(a_1 + a_2) u + b_1 v_1,
λ v_1 = a_1 u − b_1 v_1 + b_2 v_2,
λ v_2 = a_2 u − b_2 v_2,

or equivalently, introducing a = a_1 + a_2, we get

(λ + a) u − u_0 = b_1 v_1,
(λ + b_1) v_1 = a_1 u + b_2 v_2,
(λ + b_2) v_2 = a_2 u.

Here, v_1 and v_2 can be eliminated as

v_2 = a_2/(λ + b_2) u,   v_1 = 1/(λ + b_1) ( a_1 + a_2 b_2/(λ + b_2) ) u.

We conclude the following equation for u:

λ u − u_0 = −a u + ( a_1 b_1/(λ + b_1) + a_2 b_1 b_2/((λ + b_1)(λ + b_2)) ) u.   (10)

This is an equation for the first state only. It can be solved explicitly with respect to u, but at this moment this is not our aim: we are looking for an equation for u(t). We write

k(λ) = a_1 b_1/(λ + b_1) + a_2 b_1 b_2/((λ + b_1)(λ + b_2))

and, after the inverse transform, we get an equation for the function u(t), namely

u̇(t) = −a u(t) + (K ∗ u)(t),   u(0) = u_0,   (11)

where

K(t) = a_1 K_1(t) + a_2 K_2(t),   K_1(t) = b_1 e^{−b_1 t},   K_2(t) = (b_1 b_2)/(b_1 − b_2) ( e^{−b_2 t} − e^{−b_1 t} ).   (12)

So we obtain a memory equation with the kernel K. This equation describes the evolution of the first state of our physical system, depending on the whole past from 0 to time t. Obviously, this dependence is a result of the projection, since nothing else has been done. Thus, u(t) is the solution of two equivalent equations: a memory equation and a component of a Markov system.
The kernel K(t) = a_1 K_1(t) + a_2 K_2(t) is the sum of two parts, each of which is obviously positive. If we denote by m_i = ∫_0^∞ t K_i(t) dt the mean time of a kernel, we have

m_1 = 1/b_1,   m_2 = 1/b_1 + 1/b_2.

The first kernel K_1 describes a memory effect with small mean time and corresponds to the short loop z_0 → z_1 → z_0 in the MP. The other kernel K_2 describes a memory effect with longer mean time and corresponds to the longer loop z_0 → z_2 → z_1 → z_0. The relative coefficients a_i/a form a convex combination. The transitions z_0 → z_i with rates a_i split the whole number of particles into parts according to the loops. Let us summarize some properties of the kernel K(t).
• K(t) is a sum of exponentially decaying functions, whose exponents are diagonal elements of A.
• The arising memory equation is (11) with a = Σ_i a_i or, equivalently, k(0) = a.

Equation (10) can be solved explicitly. To get an explicit expression for u(t), we have to factorize the denominator, which leads, of course, to the same time behavior as determined by the eigenvalues of the MP. We compute the asymptotic behavior of the solution u(t), using the asymptotic properties of the Laplace transform. We obtain for the equilibrium state

u_∞ = u_0/Z,   Z = 1 + (a_1 + a_2)/b_1 + a_2/b_2.

For the other components we get in the same manner

v_{1,∞} = (a_1 + a_2)/b_1 · u_0/Z,   v_{2,∞} = a_2/b_2 · u_0/Z.

These are the parts of the initial mass that remain in the states z_1 and z_2.

From Memory to Markov
Now we go in the opposite direction and start with a kernel that is the sum of two exponentially decaying terms, i.e.

K(t) = c_1 e^{−α_1 t} + c_2 e^{−α_2 t}   (14)
with some real coefficients c_1, c_2. We assume c_i ≠ 0, otherwise we are in the case of two states. For definiteness, we assume α_1 > α_2 > 0. The α_i have to be strictly positive, otherwise the influence of the past does not decay. This kernel has to be written in the form (12) with positive coefficients. Comparing coefficients, we have to demand c_1 + c_2 ≥ 0 and c_2 ≥ 0. Both are consequences of the positivity of K(t), setting t = 0 and t → ∞. Now the MP is easily constructed. We set

b_1 = α_1,   b_2 = α_2,   a_1 = (c_1 + c_2)/α_1,   a_2 = c_2 (α_1 − α_2)/(α_1 α_2).

The entries b_1, b_2, a_2 of the matrix are strictly positive, and a_1 is non-negative. This guarantees the uniqueness of the stationary solution; moreover, it violates the detailed balance property. The existence of a positive equilibrium is guaranteed, and we have the equation

u̇(t) = −a u(t) + (K ∗ u)(t)

with the consistency property k(0) = a = a_1 + a_2 = (α_1 c_2 + α_2 c_1)/(α_1 α_2). Summarizing, we get the following result:

Proposition 2.1. The first component of the MP generated by A* given by (8) is the solution to the ME (11). Conversely, for a ME u̇ = −au + (K ∗ u) with a kernel (14) with parameters c_1, c_2, α_1, α_2 satisfying α_1 > α_2 > 0, c_1 + c_2 ≥ 0 and c_2 ≥ 0, one can construct a three dimensional MP whose first component coincides with the solution to the ME.
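The matching between kernel parameters and Markov rates can be sketched numerically. The rate formulas below are an assumption consistent with the kernel form (12) and with the consistency relation k(0) = a stated in the text; the values of c_i, α_i are hypothetical (note that c_1 may well be negative):

```python
import numpy as np

def rates_from_kernel(c1, c2, alpha1, alpha2):
    """Rates of the three-state loop process for the kernel
       K(t) = c1 exp(-alpha1 t) + c2 exp(-alpha2 t),
    assuming alpha1 > alpha2 > 0, c1 + c2 >= 0 and c2 >= 0."""
    b1, b2 = alpha1, alpha2
    a1 = (c1 + c2) / alpha1
    a2 = c2 * (alpha1 - alpha2) / (alpha1 * alpha2)
    return a1, a2, b1, b2

c1, c2, alpha1, alpha2 = -0.5, 2.0, 3.0, 1.0
a1, a2, b1, b2 = rates_from_kernel(c1, c2, alpha1, alpha2)

# kernel generated by the two loops of the MP, cf. (12)
t = np.linspace(0.0, 10.0, 1001)
K_loops = a1 * b1 * np.exp(-b1 * t) \
        + a2 * b1 * b2 / (b1 - b2) * (np.exp(-b2 * t) - np.exp(-b1 * t))
K_given = c1 * np.exp(-alpha1 * t) + c2 * np.exp(-alpha2 * t)
```

The loop kernel reproduces the given kernel exactly, with admissible (nonnegative) rates and the consistency a_1 + a_2 = (α_1 c_2 + α_2 c_1)/(α_1 α_2).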

General Memory Equations as Markov processes
In this chapter, we generalize the ideas from the last chapter to an arbitrary finite dimensional MP. Firstly, we show that the first coordinate of a special MP, consisting of different transformation loops, satisfies a suitable memory equation with a more or less general kernel. Then we go in the opposite direction: we show that a ME with a kernel of a special form yields the MP we started with. The construction of the MP is explicit.

From Markov to Memory
We consider a MP on N + 1 abstract states {z_0, z_1, . . . , z_N} whose generator has the adjoint

A* = ( −a     b_1    0     · · ·   0
        a_1  −b_1   b_2            0
        a_2   0    −b_2   b_3
        ⋮                   ⋱     b_N
        a_N   0     · · ·   0    −b_N ),   (15)

where a_j ≥ 0 and b_j > 0 for j = 1, . . . , N are non-negative rates and we set a := Σ_{j=1}^N a_j. The condition b_j > 0 is reasonable, since otherwise the loop is broken somewhere. The process p(t) is generated by the equation ṗ = A* p. We set p = (u, v_1, . . . , v_N) and understand this quantity as the concentration of some particles. We assume that at t = 0 the total mass is concentrated in the first coordinate, i.e. p_0 = (u_0, 0, . . . , 0). The equation preserves positivity of p and conserves the whole mass u + v_1 + ... + v_N = u_0. Thus, p is a vector on the positive simplex in R^{N+1}, intersected with the hyperplane u + v_1 + ... + v_N = u_0. Of interest for us is the first component, i.e. the amount of matter of particles of type z_0. A* is the generator of a special type of MP. It describes the change of types in the following way: particles of type z_0 can change their type to type z_i with rate a_i. The change of a particle of type z_i back to type z_0 does not happen in a direct way, but in i steps. Thus, we have an interaction between the N + 1 types in N loops (see the picture).
Easy calculations show that the stationary solution µ satisfying A*µ = 0 has the form

µ = u_0/Z ( 1, (Σ_{i=1}^N a_i)/b_1, (Σ_{i=2}^N a_i)/b_2, . . . , a_N/b_N ),

where Z is the suitable normalization such that Σ_{j=0}^N µ_j = u_0. Obviously,

Z = 1 + Σ_{j=1}^N (Σ_{i=j}^N a_i)/b_j.   (16)

For the zeroth coordinate we have µ_0 = u_0/Z. Since all b_j > 0, this stationary solution is unique and it is the equilibrium state for any initial condition. Let us check whether detailed balance with respect to µ is satisfied, i.e. whether A_{ij} µ_j = A_{ji} µ_i. Since transitions into z_0 come only from z_1, we have A_{0j} µ_j = A_{j0} µ_0 = 0 for j ≥ 2, and we obtain a_2 = a_3 = . . . = a_N = 0. Hence, the evolution of the states z_2, . . . , z_N is not coupled to the evolution of z_0 and z_1. In this case, we are back at N = 1, the two dimensional case, where every MP has the detailed balance property. That means, apart from trivial situations, the MP under consideration does not have the detailed balance property.
The equation ṗ = A* p is equivalent to the following system for p = (u, v_1, . . . , v_N):

u̇ = −a u + b_1 v_1,
v̇_j = a_j u − b_j v_j + b_{j+1} v_{j+1}   for j = 1, . . . , N − 1,
v̇_N = a_N u − b_N v_N.

Using the Laplace transform, we get the following equations for (u, v_1, . . . , v_N):

λ u − u_0 = −a u + b_1 v_1,
λ v_j = a_j u − b_j v_j + b_{j+1} v_{j+1}   for j = 1, . . . , N − 1,
λ v_N = a_N u − b_N v_N.

Eliminating v_N, v_{N−1}, . . . , v_1 successively, this yields for u

λ u − u_0 = −a u + Σ_{j=1}^N a_j ( Π_{i=1}^j b_i/(λ + b_i) ) u.   (17)

We define the kernel

k(λ) := Σ_{j=1}^N a_j k_j(λ),   k_j(λ) := Π_{i=1}^j b_i/(λ + b_i),

and hence the equation for the Laplace transformed variable u reads

λ u − u_0 = −a u + k(λ) u.   (18)

Now we formulate the memory equation in terms of t ≥ 0 and some properties of the kernel. For this purpose, we introduce some quantities connected with Lagrange polynomials (see the appendix for details) with pairwise distinct support points b_1, ..., b_N. From the theory of Lagrange polynomials it is well known that

Π_{i=1}^j 1/(λ + b_i) = Σ_{i=1}^j ( Π_{l=1, l≠i}^j (b_l − b_i) )^{−1} · 1/(λ + b_i).

Using this, we can transform the k_j(λ) back and obtain

K(t) = Σ_{j=1}^N a_j K_j(t),   (19)

K_j(t) = ( Π_{i=1}^j b_i ) Σ_{i=1}^j ( Π_{l=1, l≠i}^j (b_l − b_i) )^{−1} e^{−b_i t}.   (20)

The assumption b_i ≠ b_j for i ≠ j is not essential. If some or all of the b_i coincide, all of the following formulae can be obtained by suitable limits. For the Laplace transform k(λ) this is obvious. For K(t) we get more complicated terms, involving not only exponentials but also polynomials whose degree depends on the multiplicity of the b_i. We do not bore the reader with this technical complexity, since it is well known in the theory of Lagrange polynomials. Moreover, from a practical point of view, in a generic Markov matrix all entries can be chosen pairwise distinct. Surely the situation is different if the modeling requires equal b_i; this is the case for instance for DDEs and is considered in detail in chapter 4. Now we are ready for the following

Theorem 3.1. The first component u of the MP generated by A* with initial value p_0 = (u_0, 0, . . . , 0) is the solution to the memory equation

u̇(t) = −a u(t) + (K ∗ u)(t),   u(0) = u_0,   (21)

for t ≥ 0, with K given by (19)-(20) and a = Σ_j a_j = k(0); its Laplace transform has the representation

u(λ) = u_0 / ( λ + a − Σ_{j=1}^N a_j k_j(λ) ).   (22)

Moreover, K(t) ≥ 0 and u(t) → u_∞ = u_0/Z with Z given by (16).
Proof. From the definition of k(λ) it is clear that u(λ) defined by the MP is the solution to (18). If the inverse transformed function t → u(t) is regular enough, it is a solution to (21). Rewriting (17) as λ u(λ) = u_0 − (a − k(λ)) u(λ): since the k_j(λ) are analytic functions, bounded on the right half-plane, so is λ u(λ). Hence it follows from the properties of the Laplace transform that u(t) is continuously differentiable. Thus, it solves (21).
To calculate u_∞ we use the representation (22) and investigate the behavior of the k_j(λ) for λ → 0.
We have

k_j(λ) = 1 − λ Σ_{i=1}^j 1/b_i + O(λ²)   for λ → 0.

By definition a = Σ_{j=1}^N a_j, and hence it follows from (22) that

u_∞ = lim_{λ→0} λ u(λ) = u_0 / ( 1 + Σ_{j=1}^N a_j Σ_{i=1}^j 1/b_i ),

which is exactly the zeroth coordinate of µ, i.e. u_∞ = u_0/Z. The positivity of the K_j(t), t ≥ 0, follows from their representation by simplex integrals (see the appendix): up to the positive factor Π_{i=1}^j b_i, the function K_j(t) is an integral over the unit simplex of (−1)^{j−1} f^{(j−1)}(⟨b, s⟩) with f(x) = e^{−xt} and ⟨α, s⟩ = α_1 s_1 + α_2 s_2 + . . . + α_j s_j. Since (−1)^{j−1} f^{(j−1)}(⟨α, s⟩) = t^{j−1} e^{−⟨α,s⟩ t} ≥ 0 and all a_j ≥ 0, we conclude the positivity of the K_j(t) and therefore also K(t) ≥ 0. This completes the proof of the theorem.
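The loop kernels K_j can also be checked numerically: each K_j should be nonnegative, with total mass k_j(0) = 1 and mean time Σ_{i≤j} 1/b_i. A small sketch with hypothetical rates; the partial-fraction form is the assumption being tested:

```python
import numpy as np

def K_loop(t, b):
    """Kernel of one loop with pairwise distinct rates b = (b_1, ..., b_j):
    the inverse Laplace transform of prod_i b_i / (lambda + b_i),
    written in partial fractions."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for i, bi in enumerate(b):
        denom = np.prod([bl - bi for l, bl in enumerate(b) if l != i])
        out += np.exp(-bi * t) / denom
    return np.prod(b) * out

b = (1.0, 2.5, 4.0)
t = np.linspace(0.0, 40.0, 40001)
dt = t[1] - t[0]
K = K_loop(t, b)
# trapezoidal quadrature: total mass and mean time of the kernel
mass = np.sum((K[1:] + K[:-1]) * 0.5) * dt
mean = np.sum(((t * K)[1:] + (t * K)[:-1]) * 0.5) * dt
```

Despite the alternating signs of the partial-fraction coefficients, the kernel stays nonnegative, and its mean time is the sum of the mean sojourn times 1/b_i along the loop.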

From Memory to Markov
We consider memory equations of the form

u̇(t) = −a u(t) + (K ∗ u)(t),   u(0) = u_0,

where a > 0 is a real parameter and K is a positive kernel. The aim is to embed the evolution of u into a MP by introducing new variables. Our main assumptions are K(t) ≥ 0 and ∫_0^∞ K(t) dt = a. Clearly, starting with some given K(t), we want to end up with a kernel of the shape (19)-(20). Then, going back to the Laplace form (17), the entries of the Markov generator matrix can be read off immediately. The kernels (20) are positive, although they are linear combinations of exponentials with possibly negative coefficients.
It may seem that any nonnegative kernel K(t) can be represented in such a form. But this is not the case: a counterexample shows that the resulting system for the coefficients can lead to a contradiction. We think there is no hope to find a corresponding MP for an arbitrary nonnegative kernel. Therefore we go another way and try to derive a class of sensible kernels starting from physical considerations. Furthermore, the following reasoning shows how the time interval of the memory effect is connected with the rates of the loops of the MP. First of all we have to ask: how can one model a meaningful kernel for a ME? We can assume that the dependence on the past is concentrated at some time points before the present, say t − t_1, ..., t − t_N, where the t_j are ordered time values, i.e. 0 < t_1 < t_2 < · · · < t_N, with some coefficients γ_1, ..., γ_N with γ_i ≥ 0 and Σ γ_i = 1 that give the relative proportion of each time point. The corresponding memory kernel of such an ansatz is

K̃(t) = a Σ_{j=1}^N γ_j δ(t − t_j)

(here δ means the "δ-function", the density of the Dirac measure). The kernel K̃ occurs when starting from a discrete time model, like equation (2). Clearly, this is only a first guess: a real memory kernel seems to be more smeared. Therefore, we can try to find kernels K̃_j(t) with mean time at t_j. We will show that such kernels K̃_j(t) can be found and that it is possible to find a suitable MP for them. Note that this does not determine the kernels K̃_j uniquely, of course. We show that our kernels of shape (19) are suitable for this.

Proposition 3.2. Let a sequence 0 < t_1 < t_2 < · · · < t_N < ∞ be given, where the increments (t_j − t_{j−1}) are pairwise distinct. Then there are kernels K(t) = Σ_{j=1}^N a_j K_j(t) such that K ≥ 0, ∫_0^∞ K(t) dt = a and ∫_0^∞ t K_j(t) dt = t_j.
Proof. We define b_j ∈ R via t_i = Σ_{j=1}^i 1/b_j, i.e. b_j = 1/(t_j − t_{j−1}) with t_0 := 0. Since the t_i are increasing, we get b_j > 0. Since the increments (t_j − t_{j−1}) are pairwise distinct, the b_j are pairwise distinct. We define K by (19)-(20) with these b_j and prove that K satisfies the desired properties. Using the Laplace transform, we get k_j(λ) = Π_{i=1}^j b_i/(λ + b_i), and hence ∫_0^∞ K_j(t) dt = k_j(0) = 1 and ∫_0^∞ t K_j(t) dt = −k_j'(0) = Σ_{i=1}^j 1/b_i = t_j. The positivity follows as in Theorem 3.1.

This yields the claim of the proposition. Now we can formulate the converse statement: let K be a kernel of the shape (19)-(20) with coefficients α_j ≥ 0 and pairwise distinct rates β_j > 0, and let α = Σ_j α_j. Let u be the solution to the equation u̇(t) = −αu + K ∗ u with u(0) = u_0. Then there is a MP ṗ = A* p in R^{N+1}, generated by a Markov matrix A and an initial condition p(0), such that u(t) = p_0(t), the zeroth component of p(t).
Proof. Define the Markov generator matrix of the loop process via a = α, a_i = α_i, b_i = β_i. The initial condition for the MP is p_0 = (u_0, 0, . . . , 0). The claim then follows from Theorem 3.1.
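The construction in the proof of Proposition 3.2 is a one-liner numerically; the following sketch (mean times hypothetical) computes the rates b_j from prescribed mean times and checks that the mean times Σ_{i≤j} 1/b_i of the loop kernels are recovered:

```python
import numpy as np

def rates_from_times(times):
    """Given ordered mean times 0 < t_1 < ... < t_N with pairwise distinct
    increments, return b_j = 1/(t_j - t_{j-1}), so that t_i = sum_{j<=i} 1/b_j."""
    times = np.asarray(times, dtype=float)
    gaps = np.diff(np.concatenate(([0.0], times)))
    if not np.all(gaps > 0):
        raise ValueError("times must be strictly increasing")
    return 1.0 / gaps

times = [0.5, 1.2, 3.0]
b = rates_from_times(times)
recovered = np.cumsum(1.0 / b)   # mean times of the loop kernels K_j
```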
For the asymptotic behavior of the ME, we immediately get the following statement.
Let u be the solution to the equation u̇(t) = −au + K ∗ u with u(0) = u_0. Then u(t) → u_∞ as t → ∞, where u_∞ = u_0/Z and Z is given by (16).

Remarks
1. Kernels built from the same exponentials but with suitably chosen multiplicities m_i ∈ ℕ may approximate a δ-kernel better. In particular, this allows one to take into account more moments than only the first one or, equivalently, to allow the b_i to be equal. This is possible without any principal problems (see the note above Theorem 3.1). A special case is treated in the next chapter, where one delay is approximated arbitrarily precisely. To prove positivity of the corresponding functions, Lemma 5.1 from the appendix can be used.
Kernels like in (17) are rational functions of degree N, having poles in the left half-plane. They approximate meromorphic functions. This makes it possible to consider more general kernels than linear combinations of exponentials, at least approximately.
2. There are other (similar) MPs that lead to an ME and vice versa. For example, an MP with a correspondingly modified generator can also be used for embedding the presented exponential kernels. Such MPs can be understood in the same manner as in the picture on page 14, but with reversed arrows. However, this approach is more difficult from a technical point of view.
3. The presented results can be applied in various ways. We focus on ordinary differential equations to present the general idea. Linear MEs in infinite-dimensional spaces, like diffusion equations with time-dependent diffusion coefficients, are also possible.
Moreover, the well-known tools for investigating MPs, like inequalities for Lyapunov functions (see [14]), can now be carried over to explore MEs.
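The first remark can be illustrated numerically. The following sketch is ours, not from the paper; the delay T and the multiplicities m are example values. It shows one concrete family with an m-fold repeated rate, whose members have mean T and shrinking variance, so they concentrate around T as m grows.

```python
import numpy as np
from math import factorial

# Numerical illustration (our example, not from the text): kernels with an
# m-fold repeated rate b,
#   K_m(t) = b^m t^(m-1) exp(-b t) / (m-1)!,
# have mean m/b and variance m/b^2.  Fixing the mean at T via b = m/T and
# increasing m, the kernels concentrate around the delay T.
T = 2.0
t = np.linspace(0.0, 30.0, 300001)
dt = t[1] - t[0]
results = {}
for m in (1, 4, 16, 64):
    b = m / T
    K = b**m * t**(m - 1) * np.exp(-b * t) / factorial(m - 1)
    mean = np.sum(t * K) * dt
    var = np.sum((t - mean) ** 2 * K) * dt
    results[m] = (mean, var)
    print(m, round(mean, 3), round(var, 3))  # mean stays near T, variance ~ T^2/m
```

The printed means stay at T while the variances decay like T²/m, which is the concentration behind the δ-kernel approximation.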

A special Markov process leads to a delay differential equation
In this section, we consider a special form of the MP. We set a_j = 0 for j = 1, 2, …, N − 1 and put a_N = a, as well as b_j = b ∈ ℝ for all j. Using the observation from the last section, we consider a general cyclic MP with one single but long loop: mass leaves the state 0 at rate a and passes through the chain of states 1, 2, …, N, each at rate b, before returning to state 0. The MP in ℝ^{N+1} is generated by the corresponding cyclic matrix A. We assume that the initial mass is concentrated in the first reservoir. Then the equation reads ṗ(t) = A*p(t) with p(0) = p_0, where p = (u, v_1, v_2, …, v_N)^T and p_0 = (u_0, 0, …, 0)^T.
The stationary solution is p_∞ = (u_0/Z)(1/a, 1/b, …, 1/b)^T, where Z = 1/a + N/b = (b + aN)/(ab). Note that the system does not have the detailed balance property. We get u̇ = −au + bv_N together with v̇_1 = au − bv_1 and v̇_j = bv_{j−1} − bv_j for j = 2, …, N. Solving the chain for v_N and inserting it into the equation for u, we get u̇(t) = −au(t) + (K ∗ u)(t), where we introduced the kernel K(t) = a b^N t^{N−1} e^{−bt}/(N − 1)!. Moreover, ∫_0^∞ K(t)dt = a and the mean time of K/a is N/b. Putting b = N/T, we approximate the Laplace transform of the kernel δ_T, i.e. K̂(λ) = a(b/(λ + b))^N = a(1 + λT/N)^{−N} → ae^{−λT} = a δ̂_T(λ) as N → ∞.
Hence, we conclude K → aδ_T, and the limiting delay differential equation (DDE) reads u̇(t) = −au(t) + au(t − T). Let us note that the initial condition u|_{[0,T]}(t) = e^{−at}u_0 results from the modeling ansatz; no other initial condition is possible. Let us compute the limiting stationary solution for N → ∞ of the first coordinate of the MP. This means the MP has long loops, but mass is transferred with a high rate. We have Z = (Na + b)/(ab). Putting b = N/T, we conclude for the zeroth coordinate of the stationary solution μ_0 = u_0/(1 + aT). The solution of the DDE and the stationary solution μ_0 of the MP can be seen in the picture; the solution of the DDE converges nicely to μ_0.

Finally, we remark on some properties of the spectrum. The spectrum of the DDE is obtained by inserting e^{λt} for λ ∈ ℂ into the equation (see e.g. [12]). This yields, for given a, T ≥ 0, the equation λ = −a + ae^{−λT}. This transcendental equation (in λ ∈ ℂ) has in general infinitely many discrete solutions. The eigenvalues of A* for fixed N ∈ ℕ are given by the characteristic equation φ(λ) = (λ + a)(λ + b)^N − ab^N = 0, which can be computed easily. Hence, setting b = N/T, we get φ(λ) = 0 if and only if a/(λ + a) = (1 + λT/N)^N. For N → ∞, the right-hand side converges to e^{λT}. So, in the limit, λ ∈ ℂ satisfies the equation a/(λ + a) = e^{λT}, i.e. the same equation as (24). Hence, one can say that not only the solutions but also the spectra of the MP and of the ME converge to each other. Note that the convergence of the spectrum is very slow, as slow as the convergence of (1 + λT/N)^N to the exponential function.
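The convergence described in this section can be reproduced with a short simulation. The following sketch is our own code; a, T, N and the step sizes are example values, and the single-loop chain structure is the one described above. It integrates the MP by explicit Euler and the DDE by Euler with the prescribed history, comparing both with the limit u_0/(1 + aT).

```python
import numpy as np

# Numerical check (our sketch): the cyclic MP with one loop of length N,
#   u' = -a*u + b*v_N,  v_1' = a*u - b*v_1,  v_j' = b*v_{j-1} - b*v_j,
# with b = N/T approximates the DDE  u'(t) = -a*u(t) + a*u(t-T)
# with history u(t) = exp(-a*t)*u0 on [0,T]; both tend to u0/(1 + a*T).
a, T_delay, u0 = 1.0, 1.0, 1.0
N = 400
b = N / T_delay
dt, t_end = 2.5e-4, 12.0
steps = round(t_end / dt)

# Markov process p' = A^T p, integrated by explicit Euler.
p = np.zeros(N + 1)
p[0] = u0
for _ in range(steps):
    dp = np.empty_like(p)
    dp[0] = -a * p[0] + b * p[N]
    dp[1] = a * p[0] - b * p[1]
    dp[2:] = b * p[1:N] - b * p[2:]
    p = p + dt * dp
u_mp = p[0]

# DDE solved by Euler with the prescribed history on [0, T].
delay_steps = round(T_delay / dt)
tgrid = np.arange(steps + 1) * dt
u = np.empty(steps + 1)
u[:delay_steps + 1] = u0 * np.exp(-a * tgrid[:delay_steps + 1])
for n in range(delay_steps, steps):
    u[n + 1] = u[n] + dt * (-a * u[n] + a * u[n - delay_steps])
u_dde = u[-1]

u_inf = u0 / (1.0 + a * T_delay)        # limiting stationary value
print(u_mp, u_dde, u_inf)
```

For these parameters all three printed values agree closely; the DDE conserves u(t) + a∫_{t−T}^t u(s)ds, which forces the limit u_0/(1 + aT).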

Laplace transform
Here, we summarize some facts about the Laplace transform. More details can be found, e.g., in [15]. For a given function u : [0, ∞) ∋ t ↦ u(t) ∈ ℝ that does not grow faster than an exponential function in time, the Laplace transform is defined by û(λ) = (Lu)(λ) = ∫_0^∞ e^{−λt}u(t)dt.
The Laplace transform has an interesting asymptotic behavior. The limit for large times, u(t) → u_∞ as t → ∞, can be calculated with the Laplace transform: it holds that λû(λ) → u_∞ as λ → 0. Thus, there is no need to know the whole solution u(t) if one is interested only in the equilibrium state. This is important since, in general, for non-autonomous equations the equilibrium state cannot be calculated by setting u̇ = 0. Let us note that uniform convergence on compact sets of t ∈ ℝ_+ carries over to uniform convergence on compact sets of λ in the domain of analyticity. To carry over positivity properties between the original and the transform, the following lemma (Lemma 5.1) is useful.

Proof. Let K(t) ≥ 0. Since K(0) ≥ 0, we get ∑_{j=1}^N γ_j ≥ 0, i.e. the claim holds for m = 0. For m ≥ 0, we get 0 ≤ ∑_j γ_j e^{−α_j t}, as n → ∞, which proves the claim of the lemma.
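The final-value property can be illustrated numerically. In the following sketch (ours; the function u and all parameters are example choices), the Laplace integral is truncated and evaluated by trapezoidal quadrature.

```python
import numpy as np

# Numeric illustration (our example) of the final-value property
#   lambda * u_hat(lambda) -> u_inf  as lambda -> 0.
# Here u(t) = u_inf + exp(-t), so u_hat(lambda) = u_inf/lambda + 1/(lambda+1)
# and lambda * u_hat(lambda) = u_inf + lambda/(lambda+1).
u_inf = 0.4

def u(t):
    return u_inf + np.exp(-t)

def laplace(lam, t_max=3000.0, dt=1e-3):
    # trapezoidal quadrature of the truncated Laplace integral
    t = np.arange(0.0, t_max, dt)
    f = np.exp(-lam * t) * u(t)
    return dt * (f.sum() - 0.5 * (f[0] + f[-1]))

vals = {}
for lam in (1.0, 0.1, 0.01):
    vals[lam] = lam * laplace(lam)
    print(lam, vals[lam])   # tends to u_inf = 0.4 as lambda -> 0
```

The printed values match the closed form u_∞ + λ/(λ + 1) and approach u_∞ as λ shrinks, without ever computing u(t) for all times explicitly.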

Simplex integrals
In Theorem 3.1, we proved the positivity of the kernel K(t) using an integral over a simplex. This is based on the following observation. Let S_{n−1} ⊂ ℝ^n be the simplex defined as S_{n−1} = {s ∈ ℝ^n | s_i ≥ 0, s_1 + … + s_n = 1}.
We consider functions g : ℝ^n → ℝ and their integrals over S_{n−1}. We have ∫_{S_{n−1}} g(s)σ(ds) = √n ∫ g(s_1, …, s_{n−1}, 1 − s_1 − … − s_{n−1}) ds_1…ds_{n−1}, where σ(ds) is the Lebesgue surface measure on S_{n−1} and √n is the Jacobian factor of the projection of S_{n−1}. Let f : ℝ → ℝ be a smooth enough function, f^{(k)} its k-th derivative, and let x_1, …, x_n be given distinct real values. Set g(s) = f(⟨x, s⟩), where ⟨x, s⟩ = x_1s_1 + x_2s_2 + … + x_ns_n is the scalar product in ℝ^n. Now, using induction, one can prove that ∫ f^{(n−1)}(⟨x, s⟩) ds_1…ds_{n−1} = ∑_{i=1}^n f(x_i)/∏_{k≠i}(x_i − x_k), with s_n = 1 − s_1 − … − s_{n−1} as above. This formula gives a powerful tool to switch between expressions connected with Lagrange polynomials and expressions connected with simplex integrals. In Theorem 3.1, we used this formula with f(x) = e^{−xt}.
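This switching formula can be tested by Monte Carlo integration. The following sketch is ours; the nodes x_i and the choice f = exp are example values. It samples the uniform distribution on the simplex (a Dirichlet(1, …, 1) distribution, which differs from the unnormalized simplex integral by the factor (n − 1)!) and compares the result with the divided difference in its Lagrange form.

```python
import numpy as np

# Monte Carlo check (our sketch) of the Hermite-Genocchi-type identity
# behind the simplex-integral formula: for distinct nodes x_1, ..., x_n,
#   (1/(n-1)!) * E[ f^(n-1)(<x, S>) ] = sum_i f(x_i) / prod_{k!=i} (x_i - x_k),
# where S is uniform on the simplex S_{n-1} and the right-hand side is the
# divided difference in its Lagrange form.
rng = np.random.default_rng(0)
x = np.array([0.3, 1.1, 2.0, 2.7])   # example nodes
n = len(x)
f = np.exp                            # f = exp, hence f^(n-1) = exp as well

# right-hand side: divided difference in Lagrange form
dd = sum(f(x[i]) / np.prod([x[i] - x[k] for k in range(n) if k != i])
         for i in range(n))

# left-hand side: Monte Carlo over the uniform (Dirichlet(1,...,1))
# distribution on the simplex
S = rng.dirichlet(np.ones(n), size=2_000_000)
mc = np.mean(f(S @ x)) / np.prod(np.arange(1, n))   # divide by (n-1)! = 6
print(dd, mc)
```

Both printed numbers agree to Monte Carlo accuracy, confirming the bridge between simplex integrals and Lagrange-type expressions.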

Lagrange polynomials
Here we summarize basic facts from the theory of Lagrange polynomials. Let x_1, …, x_j be distinct real values. Then L_i^j(x) = ∏_{k≠i} (x − x_k)/(x_i − x_k) is a polynomial of degree j − 1, and we have L_i^j(x_k) = δ_{ik} with δ_{ik} the Kronecker symbol. Hence, the polynomial P(x) = ∑_{i=1}^j p_i L_i^j(x) of degree j − 1 satisfies P(x_i) = p_i. Now, let us fix z ∈ ℝ. Seeking a polynomial P(x) = q_0 + q_1x + … + q_{j−1}x^{j−1} with the condition P(x_i) = p_i = x_i/(z + x_i), we get coefficients q_i, with q_0 = P(0) = ∑_{i=1}^j (x_i/(z + x_i)) L_i^j(0) among them. Hence, the constant term can be expressed, on the one hand, through the data x_i/(z + x_i) and, on the other hand, through the coefficients of the interpolating polynomial. Note that in our explanation we use ψ_i^j = (−1)^{j−1}L_i^j(0) and put z = λ.
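These facts are easy to verify numerically. The following sketch is ours; the nodes x_i and the value of z are example choices. It checks the basis property L_i^j(x_k) = δ_ik and computes q_0 both from the monomial coefficients of the interpolant and from the Lagrange form.

```python
import numpy as np

# Quick verification (our sketch) of the Lagrange-basis facts used here:
# L_i(x) = prod_{k!=i} (x - x_k)/(x_i - x_k) satisfies L_i(x_k) = delta_ik,
# and the interpolant of p_i = x_i/(z + x_i) has constant term
# q_0 = P(0) = sum_i p_i * L_i(0); psi_i = (-1)^(j-1) * L_i(0) as in the text.
x = np.array([0.5, 1.5, 2.5])   # example nodes x_1, ..., x_j with j = 3
z = 2.0                          # example value of z
j = len(x)

def L(i, t):
    # i-th Lagrange basis polynomial evaluated at t
    return np.prod([(t - x[k]) / (x[i] - x[k]) for k in range(j) if k != i])

B = np.array([[L(i, xk) for xk in x] for i in range(j)])  # should be identity

p = x / (z + x)                       # interpolation data p_i = x_i/(z + x_i)
q = np.polyfit(x, p, j - 1)           # monomial coefficients, highest first
q0 = q[-1]                            # constant term q_0 of P
q0_lagrange = sum(p[i] * L(i, 0.0) for i in range(j))
psi = [(-1) ** (j - 1) * L(i, 0.0) for i in range(j)]
print(np.allclose(B, np.eye(j)), q0, q0_lagrange, psi)
```

The basis matrix comes out as the identity, and the two computations of q_0 coincide up to rounding, illustrating the two expressions for the constant term.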