CONVERGENCE RATES FOR SEMISTOCHASTIC PROCESSES

Abstract. We study processes that consist of deterministic evolution punctuated at random times by disturbances of random severity; we call such processes semistochastic. Under appropriate assumptions such a process admits a unique stationary distribution. We develop a technique for establishing bounds on the rate at which the distribution of the random process approaches the stationary distribution. An important example of such a process is the dynamics of the carbon content of a forest whose deterministic growth is interrupted by natural disasters (fires, droughts, insect outbreaks, etc.).


1. Introduction. This line of research began with a question from an ecologist: how should one model the carbon content of an ecosystem that experiences randomly occurring catastrophes of random severity? The role of disturbances such as droughts, forest fires, and insect outbreaks in the dynamics of carbon has been discussed in [46], [38], [9], and [44]. In the absence of disturbances, the amount of carbon in an ecosystem increases naturally due to photosynthesis and eventually approaches the carrying capacity of the ecosystem. On occasion, however, an extreme event causes significant destruction of the ecosystem and consequently a drastic reduction in the amount of carbon stored in it.
To model the carbon content of an ecosystem, continuous-time, continuous-state-space semistochastic processes were studied by Leite, Petrov, and Weng in [31], where formulae were derived for the densities of the corresponding stationary distributions. Semistochastic processes like the one studied in [31] are a particular case of the so-called piecewise deterministic Markov processes (PDMPs). The dynamics of PDMPs lacks a diffusive component; PDMPs have been used in applications to growth-fragmentation processes, storage models, exposure to contaminants, and communication networks, among others (for recent results and references see, e.g., [32,7,30,14]). A general framework for studying PDMPs was developed by Davis [17] (see also his book [18]).
Given that a stationary distribution exists (and can be calculated explicitly as in [31]), a natural question is: At what rate does the process approach its stationary distribution? If the time-dependent distributions are absolutely continuous, then one approach to resolving this would be to study the evolution of the corresponding time-dependent densities which is governed by an integro-differential PDE. An alternative and more general approach is to analyze the time-evolution of the corresponding distributions through their action on observables, which is the picture dual to using the integro-differential PDE. In this paper we adopt the latter approach which works also for distributions that are not absolutely continuous.
In this paper, we use purely probabilistic methods to establish explicitly computable bounds on convergence rates; consequently, our methods for determining convergence rates are quite different from the methods used in [31] to derive exact formulae for the stationary distributions. The methods we use originated in the study of discrete-time Markov chains and are based on establishing a combination of minorization and drift conditions. These approaches go back to Doeblin, and appear in various forms in [33,36,43,40,41]. For an overview of more recent work see, e.g., [45,32]. Roughly speaking, minorization conditions are lower bounds on the probability of transitioning in one step from any initial value to some specified region of the state space. Drift conditions, on the other hand, are needed when the state space is unbounded and the stochastic process may drift arbitrarily far away. Drift conditions allow us to control the process on some bounded set while also keeping track of the probability of the process drifting out of the set. For a detailed description of the minorization and drift conditions see Section 3.2, in particular equations (19) and (20).
While the problem of modeling the carbon content of an ecosystem was the original inspiration for this project, our work applies to any problem admitting a semistochastic model, e.g., population dynamics, optimal harvesting, virus reproduction, and some of the problems mentioned earlier in this Introduction.
What follows is a brief introduction to the concept of a semistochastic process. By a semistochastic process we mean a continuous-time, continuous-state process {X_t}, with state space X, which consists of intervals of deterministic evolution punctuated by random events. The random events we consider occur on time scales much shorter than the typical inter-event time, and are therefore modeled as instantaneous. These processes are doubly stochastic in the sense that each event has a random severity in addition to the random time at which it occurs. Consequently, these processes are quite different from other types of stochastic processes and can be used to model dynamical systems that lack conservation laws, see [17,42]. Semistochastic processes do share some common features with what are typically referred to as stochastic clearing processes, see [47]. A clearing process, however, consists of epochs of random growth punctuated by instantaneous returns to the initial value once a critical threshold is reached. A semistochastic process replaces the random growth in a clearing process with deterministic growth and replaces the deterministic "clearing" with randomly occurring disturbances.
The operator-theoretic framework we set up to study the dynamics of semistochastic processes applies equally well to scalar- and vector-valued stochastic processes, but we restrict our attention to scalar processes when deriving bounds on convergence rates. In the scalar case we are thus interested in sample paths that are piecewise continuous and, almost surely, right-continuous with left-hand limits (càdlàg). We furthermore focus on disturbances that correspond to a decrease in value. The model one should have in mind is the carbon content of a forest that grows deterministically and is interrupted at random times by natural disasters which reduce the amount of carbon. We note that the techniques we use can be adapted to handle more general disturbances as well.
In the time between two consecutive disturbances, {X_t} evolves deterministically, governed by the autonomous ordinary differential equation

d/dt X_t = v(X_t).

To describe when the disturbances occur, we specify a rate function Λ(x), which measures the instantaneous rate of occurrence of the disturbances; we refer to Λ(x) as the jump rate for the process. Our problem gains another element of randomness from the varying severity of the disturbances. In order to describe this severity, we introduce random variables Y_n^- and Y_n corresponding to the n-th pre- and post-disturbance values, respectively. If the n-th disturbance occurs at time T, then Y_n^- and Y_n are defined via

Y_n^- = lim_{t↑T} X_t,    Y_n = X_T.

In the simplest case, we can then model the severity by stipulating a multiplicative relation between Y_n^- and Y_n: an additional random variable A_n is defined by setting Y_n = A_n Y_n^-.

Figure 1. Schematic for pre- and post-disturbance levels.
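The dynamics just described is straightforward to simulate. The sketch below makes illustrative choices for concreteness — the vector field v(x) = 1 − x on [0, 1], a constant jump rate Λ(x) ≡ lam, and uniform severity multipliers A_n ~ U(0, 1) — which are assumptions of this sketch, not specifications from the text.

```python
import random

# Illustrative simulation of a semistochastic path: deterministic growth
# dX/dt = v(X) with v(x) = 1 - x, interrupted at random times by
# multiplicative disturbances.  All parameter choices here are assumptions
# made for concreteness.

def simulate(x0=0.5, lam=1.0, t_max=10.0, dt=1e-3, seed=0):
    rng = random.Random(seed)
    x, t, path = x0, 0.0, [(0.0, x0)]
    while t < t_max:
        x += (1.0 - x) * dt           # Euler step for the deterministic growth
        if rng.random() < lam * dt:   # disturbance occurs w.p. Lambda(x)*dt + o(dt)
            x *= rng.random()         # post-disturbance value Y_n = A_n * Y_n^-
        t += dt
        path.append((t, x))
    return path

path = simulate()
```

Each sample path grows toward the carrying capacity between disturbances and drops by a random factor whenever a disturbance fires, matching the schematic in Figure 1.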
Having specified the types of processes we propose to study, we now mention some works that study similar processes, usually under different assumptions or with different goals. The most common difference stems from the fact that most research on semistochastic processes concerns population dynamics, and demographers generally study processes with discrete state spaces.
An interesting application of semistochastic processes is proposed by Bartoszyński in [5] to model the development of the rabies virus in an infected host. In this model, the population of the virus naturally decreases exponentially due to the immunological response of the host, but also has random upward jumps due to the viral life cycle. The state space of the model Bartoszyński constructs is discrete and the occurrence of jumps is allowed to depend on the current population.
Continuous-time, continuous-state-space processes subject to random catastrophes are studied by Gripenberg in [21]. Gripenberg derives an expression for stationary distributions using a limit theorem from [1] based on the concept of Harris recurrence. There is a connection between the type of recurrence condition established in [21] and the minorization conditions we establish; however, the issue of convergence rates is not addressed by Gripenberg. Biedrzycka and Tyran-Kamińska [8] use operator-theoretic techniques to address the question of existence of invariant densities for similar processes.
Hanson and Ryan in [23] and [24] examine optimal harvesting problems of populations governed by similar processes with discrete state spaces, though they do not allow for the randomization of the severity of disturbances. They do, however, consider the possibility of populations experiencing both sudden decreases (jumps down) and sudden increases (jumps up). With slight modifications, the results we present can also be applied in these situations. Hanson and Tuckwell also study similar processes with discrete state spaces in [25,26,27], though their focus is generally on the computation of extinction times. The problem of determining extinction times in semistochastic models is addressed more recently by Cairns in [13].
Transient distributions of discrete-time, discrete-state-space processes, e.g., a birth/immigration-death process with binomial catastrophes, have been studied in [20,28].
Recent work [10,3,4] develops inverse techniques for growth-fragmentation phenomena (like cell division and polymerization) that are quite similar to our model. In particular, these authors propose a method for calibrating the jump rate from empirical measurements.
The plan of the paper is as follows: in Section 2 we state our main results on convergence rates; in Section 3 we prove these results by establishing a combination of minorization and drift conditions; we conclude with Section 4, in which we apply our results to concrete examples.
2. Statements of the main results. We start by revisiting the properties of the semistochastic process {X_t}. In the time between two consecutive disturbances, X_t evolves deterministically, governed by the autonomous ordinary differential equation

d/dt X_t = v(X_t).    (2)

We assume throughout that the vector field v(x) is such that (2) admits a unique global solution for any initial value x(0); global Lipschitz continuity of v(x) is a sufficient condition. The flow of (2) with initial condition x(0) = x_0 is denoted by φ_t(x_0), and the time needed to evolve deterministically from x_0 to x_1 > x_0 is denoted by ψ(x_0, x_1). Thus, in the absence of disturbances, we have

X_t = φ_t(x_0),    φ_{ψ(x_0, x_1)}(x_0) = x_1.    (3)

We assume that the occurrences of disturbances are governed by the jump rate Λ(x):

P(a disturbance occurs in (t, t + ∆t] | X_t = x) = Λ(x) ∆t + o(∆t)

as ∆t → 0. Furthermore, to determine the severity of individual disturbances, we define the multiplier density ρ(x, α) with the property that, for any a ∈ (0, 1),

P(Y_n ≤ a Y_n^- | Y_n^- = x) = ∫_0^a ρ(x, α) dα,

where Y_n is the n-th post-disturbance random variable and Y_n^- is the n-th pre-disturbance random variable. It will be convenient to introduce the quantity ζ(x), the expected fractional loss resulting from a single disturbance,

ζ(x) = ∫_0^1 (1 − α) ρ(x, α) dα.    (5)

Larger values of ζ(x) thus correspond to an expectation of more severe disturbances, and the limiting value ζ ≡ 0 would correspond to purely deterministic growth. All of this can be consolidated by specifying the infinitesimal generator L of {X_t}, whose action on observables f from the appropriate Banach space is given by

[Lf](x) = v(x) f'(x) + Λ(x) ∫_0^1 [f(αx) − f(x)] ρ(x, α) dα.    (6)

Corresponding to the generator L is a Markov semigroup U^t, which can be specified by its action on observables:

[U^t f](x) = E[f(X_t) | X_0 = x].

If the distribution of X_0 is µ_0, then the distribution of X_t is µ_t = µ_0 U^t. In order to quantify the convergence rates, we use the total variation distance d_TV, defined for any two distributions µ_1 and µ_2 by

d_TV(µ_1, µ_2) = sup_B |µ_1(B) − µ_2(B)|,

the supremum being over all measurable sets B. We are now ready to state our first result.
Theorem 2.1. Let {X_t} be a semistochastic process with generator (6) on a bounded state space X = [0, k], satisfying (i) 0 < λ_* ≤ Λ(x) ≤ λ^* < ∞ for all x ∈ X, for some constants λ_* and λ^*, (ii) ρ(x, α) ≥ ρ_* for all x ∈ X and α ∈ [0, 1], for some constant ρ_* > 0, and (iii) v(0) > 0. Then {X_t} converges exponentially fast to its unique stationary distribution π. Namely, for any time increment ∆t > 0 and any initial distribution µ_0,

d_TV(µ_t, π) ≤ (1 − ε_∆t)^{⌊t/∆t⌋},    (7)

where ε_∆t is given by (8) in terms of the flow φ and the transit time ψ defined in (3).
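As an aside, the generator formalism can be sanity-checked numerically by comparing [Lf](x) with a finite-difference approximation of d/dt E[f(X_t) | X_0 = x] at t = 0. The sketch below uses illustrative parameters chosen here for concreteness (f(x) = x, v(x) = 1 − x, constant jump rate, uniform multiplier density), not the hypotheses of the theorem.

```python
import random

# Illustrative sanity check of the generator formula (6).  With the assumed
# choices f(x) = x, v(x) = 1 - x, Lambda(x) = lam, rho(x, alpha) = 1, (6) gives
#   [Lf](x) = (1 - x) + lam * int_0^1 (alpha*x - x) dalpha = (1 - x) - lam*x/2,
# which should match (E[f(X_h) | X_0 = x] - f(x)) / h for small h.

def one_step(x, lam, h, rng):
    y = x + (1.0 - x) * h            # deterministic drift over a step of size h
    if rng.random() < lam * h:       # disturbance occurs w.p. lam*h + o(h)
        y *= rng.random()            # multiplicative severity alpha ~ U(0, 1)
    return y

def generator_estimate(x, lam=1.0, h=0.01, n=400_000, seed=2):
    rng = random.Random(seed)
    total = sum(one_step(x, lam, h, rng) for _ in range(n))
    return (total / n - x) / h       # finite-difference approximation of [Lf](x)

x, lam = 0.8, 1.0
exact = (1.0 - x) - lam * x / 2      # value predicted by (6); here -0.2
est = generator_estimate(x, lam)
```

The Monte Carlo estimate agrees with the generator prediction up to sampling error of order a few percent.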
Remark 1. The bound on the rate of convergence given by (7) can be optimized by choosing a value of ∆t that makes the bound tighter. It is intuitively reasonable to expect that a value of ∆t minimizing (1 − ε_∆t)^{t/∆t} exists. If ∆t is too small, then a disturbance is unlikely to occur in the short time interval of length ∆t. If, on the other hand, ∆t is chosen too large, then we do not control the process over a long time interval during which many disturbances of varying severity may occur, which makes it impossible to exploit any features of the deterministic growth. Put differently, the optimal value of ∆t corresponds to an appropriate balance between the stochastic and deterministic components of the dynamics: for ∆t too small, we observe only the deterministic component, while for ∆t too large, we observe mainly the stochastic one. The mathematical intuition behind the existence of an "optimal" value of ∆t can be seen from the text in Section 3.3 and, in particular, from Figure 2.
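The optimization just described amounts to a one-dimensional search over ∆t. In the sketch below, eps_example is a hypothetical stand-in for the minorization constant ε_∆t of (8) — it is chosen only to reproduce the qualitative behavior discussed above (vanishing as ∆t → 0 and decaying for large ∆t), not the paper's actual formula.

```python
import math

# Grid search for the dt minimizing the per-unit-time factor
# (1 - eps(dt))**(1/dt).  eps_example is a HYPOTHETICAL minorization
# constant with the qualitative shape discussed in the text.

def eps_example(dt, lam=1.0):
    return 0.25 * dt**2 * math.exp(-lam * dt)   # placeholder for (8)

def rate(dt, eps=eps_example):
    return (1.0 - eps(dt)) ** (1.0 / dt)        # smaller is a better bound

def optimal_dt(grid, eps=eps_example):
    return min(grid, key=lambda dt: rate(dt, eps))

grid = [k / 100.0 for k in range(1, 501)]
best = optimal_dt(grid)
```

With this placeholder the minimizer lies at an interior value of ∆t, in line with the balance between the deterministic and stochastic components described in the remark.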
The general strategy of the proof of Theorem 2.1 is the following (the complete proof is given in Section 3). We begin by discretizing the process: we fix ∆t > 0 and study the resulting discrete-time Markov chain with transition kernel U^∆t. We then construct a uniform minorization for this discretization, which yields well-known exponential bounds on the convergence rates. It remains only to apply the fact that d_TV(µ_t, π) is monotonically decreasing in t to obtain bounds for the original continuous-time process.
While the restriction v(0) > 0 may seem unusual for biological models, it is a reasonable assumption for the carbon content problem, since even after a complete catastrophe there is regrowth. The specific case of v(x) = 1 − x with state space X = [0, 1] was considered in [31] as a model for carbon content in an ecosystem and meets all the conditions of Theorem 2.1. In establishing a uniform minorization, it is essential that the state space be bounded. While this is the case in most applications, such as the carbon content problem, it is mathematically restrictive. Though the proof requires additional work, we are also able to state a result for semistochastic processes with unbounded state space.
Theorem 2.2. Let {X_t} be a semistochastic process with generator (6) on the state space X = [0, ∞), satisfying (i) 0 < λ_* ≤ Λ(x) ≤ λ^* < ∞ for all x ∈ X, for some constants λ_* and λ^*, (ii) ρ(x, α) ≥ ρ_* for all x ∈ X and α ∈ [0, 1], for some constant ρ_* > 0, and (iii) the vector field v(x) satisfies the growth condition (10) and vanishes for at most finitely many x. Then {X_t} has a unique stationary distribution π to which it converges at an exponential rate. Namely, for any initial distribution µ_0 and any ∆t > 0, the estimate (11) holds, with Φ given by (9), ε_{∆t,κ} defined in the proof, and κ any number satisfying (14).

Remark 2. The details of the proof of Theorem 2.2 are provided in Section 3. The strategy of the proof is similar to that of Theorem 2.1. The primary difference is that here we cannot establish a uniform minorization. Instead, we establish a combination of drift and minorization conditions, which enables us to apply a result of Rosenthal [43] (see also [40,41]) to produce the desired bounds on convergence rates.
3. Proofs of the main results. The proofs of Theorems 2.1 and 2.2 follow similar ideas, so we develop them in parallel. In Section 3.1 we derive some inequalities for the Markov semigroup U^t and relate the rates of convergence to the stationary distribution of the continuous-time semigroup U^t and of its discretization (see (17) below). We define the drift condition and state some results on minorization in Section 3.2. The bounds on the rates of convergence for bounded and unbounded state spaces are derived in Sections 3.3 and 3.4, respectively.
3.1. Some useful inequalities. We separate the infinitesimal generator into two components, L = L_0 + L_1, where

[L_0 f](x) = v(x) f'(x) − Λ(x) f(x)

corresponds to deterministic evolution plus a loss term, and

[L_1 f](x) = Λ(x) ∫_0^1 f(αx) ρ(x, α) dα

reflects the "gain". We introduce the semistochastic survival function

S(x, t) = exp( − ∫_0^t Λ(φ_s(x)) ds ),

which represents the conditional probability of starting at x and evolving deterministically for time t with no occurrence of a disturbance. The sub-Markov semigroup U_0 generated by L_0 is then

[U_0^t f](x) = S(x, t) f(φ_t(x)).

The Markov semigroup U^t can be computed iteratively, as given in the following proposition.

Proposition 1. Let U^t be a strongly continuous Markov semigroup with infinitesimal generator L, and assume that L = L_0 + L_1, with L_0 generating the sub-Markov semigroup U_0^t. Then the action of U^t on an observable f can be decomposed as

U^t f = U_0^t f + ∫_0^t U_0^s L_1 U^{t−s} f ds.    (16)

Proof. Let 0 ≤ s ≤ t, and recall that U^0 and U_0^0 are both identity operators. Then

d/ds [U_0^s U^{t−s} f] = U_0^s (L_0 − L) U^{t−s} f = −U_0^s L_1 U^{t−s} f.

Integrating over s from 0 to t and solving for U^t f yields the result.
Combining this with the expression (6) for L, and noticing that U_0^t in (16) is positive, we obtain

Lemma 3.1. If U^t is a Markov semigroup with infinitesimal generator L given by (6), then for any nonnegative observable f,

[U^t f](x) ≥ [U_0^t f](x) + ∫_0^t [U_0^s L_1 U_0^{t−s} f](x) ds.

Next, we establish an inequality linking convergence rates for continuous-time Markov processes to those of their discretizations. We discretize the continuous-time process {X_t} by sampling it at times separated by a fixed increment ∆t. The choice of a constant separation time ∆t allows for a straightforward comparison between the continuous-time process {X_t} and the discretized process {X_{n∆t}}_{n≥0}. The optimal value of ∆t (recall Remark 1) can be selected in each particular example, as illustrated in Section 4.
Lemma 3.2. Let Q = U^∆t be the transition kernel of the discretization of {X_t} with time step ∆t, and let π be the stationary distribution; then for any initial distribution µ_0 of X_0,

d_TV(µ_0 U^t, π) ≤ d_TV(µ_0 Q^n, π),

where n = ⌊t/∆t⌋ is the greatest integer less than or equal to t/∆t.

Proof. Write t = n∆t + τ with 0 ≤ τ < ∆t. Then, for any observable f with 0 ≤ f ≤ 1,

|µ_0 U^t f − π f| = |(µ_0 Q^n − π) U^τ f| ≤ d_TV(µ_0 Q^n, π),

where we used the invariance of π and the fact that 0 ≤ U^τ f ≤ 1.

3.2. Minorization and drift conditions.
A Markov chain X_n with transition kernel Q on a state space X is said to satisfy a minorization condition on a subset A ⊆ X if there is a probability measure η on X, a positive integer n_0, and a number ε > 0 such that

Q^{n_0}(x, B) ≥ ε η(B)    (18)

for all x ∈ A and any measurable set B ⊆ X. By appropriately redefining Q (replacing it with Q^{n_0}), we can write this condition as

[Qf](x) ≥ ε ∫_X f dη    (19)

for any nonnegative observable f and all x ∈ A. If in these conditions the subset A is the whole state space X, we say that X_n admits a uniform minorization.
The following theorem can be found in [19] or [34].
Theorem 3.3. If there exists n_0 ∈ N such that the transition kernel Q of a Markov chain on a state space X satisfies (18) for all x ∈ X and any measurable set B ⊆ X, then for any initial distribution µ_0, the total variation distance to the unique stationary distribution π satisfies

d_TV(µ_0 Q^n, π) ≤ (1 − ε)^{⌊n/n_0⌋}.
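A minimal finite-state illustration of Theorem 3.3 (with n_0 = 1): for a toy 2×2 kernel — chosen here for illustration, not taken from the paper — whose rows overlap in a common measure of mass ε, the total variation distance to stationarity is dominated by (1 − ε)^n.

```python
# Toy check of the Doeblin bound of Theorem 3.3 with n0 = 1: each row of Q
# dominates eps * eta, where eta is the normalized row-wise minimum, so
# d_TV(mu0 Q^n, pi) should be at most (1 - eps)**n.

Q = [[0.9, 0.1],
     [0.3, 0.7]]
eps = 0.4            # mass of the row-wise minimum: min(0.9, 0.3) + min(0.1, 0.7)
pi = [0.75, 0.25]    # stationary distribution: pi Q = pi

def step(mu):
    return [sum(mu[i] * Q[i][j] for i in range(2)) for j in range(2)]

def tv(mu, nu):
    return 0.5 * sum(abs(a - b) for a, b in zip(mu, nu))

mu, gaps = [1.0, 0.0], []
for n in range(1, 16):
    mu = step(mu)
    gaps.append((1 - eps) ** n - tv(mu, pi))   # nonnegative if the bound holds
```

Here the actual decay factor of this chain is 0.6 = 1 − ε, so the Doeblin bound is attained up to a constant.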
In the proof of Theorem 2.2 we need to impose an additional condition. A Markov chain X_n with state space X satisfies a drift condition if there exist a nonnegative function V : X → R_{≥0}, a number β < 1, and a finite b ∈ R such that

E[V(X_{n+1}) | X_n = x] ≤ β V(x) + b    (20)

for all x ∈ X. The function V is sometimes referred to as a Lyapunov function in the literature. When a uniform minorization is unavailable, one can first establish a drift condition and subsequently minorize on a subset A of X, obtaining the following result, proved in [43, Theorem 12].
Theorem 3.4. Suppose a Markov chain {X_n} with transition kernel Q on a state space X satisfies the drift condition (20) and the minorization condition (18) on the set A = V^{−1}([0, κ]) ⊆ X, for some κ satisfying (14). Then the Markov chain {X_n} has a unique stationary distribution π, and for any 0 < r < 1 and any n ∈ N, we have

d_TV(µ_0 Q^n, π) ≤ (1 − ε)^{rn} + θ^{(1−r)n} Θ^{rn} (1 + b/(1 − β) + E_{µ_0}[V(X_0)])    (21)

for any initial distribution µ_0, with θ and Θ given by (12).
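The drift condition (20) is easy to verify numerically for simple chains. The sketch below uses a toy chain unrelated to the semistochastic model — X_{n+1} = X_n/2 + E_n with E_n ~ Exp(1) and V(x) = x — for which E[V(X_{n+1}) | X_n = x] = x/2 + 1, so (20) holds with β = 1/2 and b = 1.

```python
import random

# Monte Carlo check of the drift condition (20) for the illustrative chain
# X_{n+1} = X_n / 2 + E_n, E_n ~ Exp(1), with Lyapunov function V(x) = x:
#   E[V(X_{n+1}) | X_n = x] = x/2 + 1,  i.e.  beta = 1/2, b = 1.

beta, b = 0.5, 1.0

def expected_V(x, n_samples=200_000, seed=1):
    rng = random.Random(seed)
    total = sum(x / 2 + rng.expovariate(1.0) for _ in range(n_samples))
    return total / n_samples
```

Evaluating expected_V at several starting points x confirms (20), here with equality, up to Monte Carlo error.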

3.3. Bounds on the convergence rates for bounded state space. In this section we present a proof of Theorem 2.1 for a semistochastic process with a bounded state space.
To discretize the continuous-time process {X t }, we fix a value ∆t > 0 and define the Markov transition kernel Q of the discretization {X n ∆t } via Q := U ∆t .
To establish a uniform minorization, we first note that, for any nonnegative observable f, Lemma 3.1 (with t = ∆t) provides a lower bound on [Qf](x). Using the assumption that 0 < λ_* ≤ Λ(x) ≤ λ^*, the survival function satisfies the two-sided bound

e^{−λ^* t} ≤ exp( − ∫_0^t Λ(φ_s(x)) ds ) ≤ e^{−λ_* t}

for all x ∈ [0, k].
Combining these inequalities with the bounds on ρ(x, α) and Λ(x) assumed in Theorem 2.1, we arrive at a lower bound for [Qf](x) in the form of a double integral over the jump time s ∈ [0, ∆t] and the multiplier α ∈ [0, 1]. Changing the variable α to z = α φ_s(x), interchanging the order of integration, and making use of the monotonicity of φ_t(x), the boundedness of the state space X = [0, k] (for finite k), and the fact that v(0) > 0, we arrive at a positive lower bound for [Qf](x) that is uniform in x ∈ X. Multiplying and dividing by Φ of (9), we obtain the uniform minorization (19) with ε = ε_∆t given by (8) and a minorizing measure η expressed in terms of the data in (6). An application of Theorem 3.3 together with Lemma 3.2 completes the proof of Theorem 2.1.

3.4. Bounds on the convergence rates for unbounded state space. We now turn to the proof of Theorem 2.2, and first establish the drift condition. Applying the generator (6) to the identity observable I(x) = x gives

[LI](x) = v(x) − Λ(x) ζ(x) x,    (22)

where ζ is defined by (5). From the conditions on v (10) and Λ, we thus have

[LI](x) ≤ v^* − λ_* ζ_* x.    (23)

Recall that, for any observable f, the quantity

f(X_t) − ∫_0^t [Lf](X_s) ds

is a martingale. Applying this with f = I, we obtain that, for any t ≥ 0,

E[X_t | X_0 = x] = x + E[ ∫_0^t [LI](X_s) ds | X_0 = x ].    (24)

Setting u(t) = E[I(X_t) | X_0 = x] = E[X_t | X_0 = x] and writing [LI](X_s) explicitly from (22), we can rewrite (24) as an integral equation

u(t) = x + ∫_0^t E[ v(X_s) − Λ(X_s) ζ(X_s) X_s | X_0 = x ] ds.    (25)

The sample paths are right-continuous, so the right-hand side of (25) can be differentiated with respect to t. Differentiating (25) and using (23), we have

u'(t) ≤ v^* − λ_* ζ_* u(t).

Rearranging this inequality and multiplying by the integrating factor e^{λ_* ζ_* t} gives

d/dt [ e^{λ_* ζ_* t} u(t) ] ≤ v^* e^{λ_* ζ_* t},

that is,

d/dt [ e^{λ_* ζ_* t} ( u(t) − v^*/(λ_* ζ_*) ) ] ≤ 0.

Therefore the expression in the parentheses is decreasing in t, so it attains its maximum on [0, ∞) at t = 0; recalling that u(0) = x, we have

u(t) ≤ x e^{−λ_* ζ_* t} + (v^*/(λ_* ζ_*)) (1 − e^{−λ_* ζ_* t}).

Setting t = ∆t produces the desired drift condition for the discretized process {X_{n∆t}}, as in (20) with V = I,

β = e^{−λ_* ζ_* ∆t},    b = (v^*/(λ_* ζ_*)) (1 − e^{−λ_* ζ_* ∆t}).

Having established the drift condition, we can minorize Q on [0, κ] for any κ < ∞ by the same argument as in the proof of Theorem 2.1. To be able to apply Theorem 3.4, we additionally require that κ satisfy (14). To complete the proof of Theorem 2.2, we choose the value of r so that the two terms on the right-hand side of (21) balance each other, which for large n gives (1 − ε)^r = θ^{1−r} Θ^r; this yields the expression (13) for r.
In particular, with this choice of r, both terms on the right-hand side of (21) decay at the same exponential rate.

4. Examples. We first consider an example with bounded state space. In this case (cf. (3)) the flow φ_t and the transit time ψ can be computed explicitly. From Theorem 2.1, for fixed ∆t and an arbitrary initial distribution µ_0, the bound

d_TV(µ_0 U^t, π) ≤ (1 − ε_∆t)^{⌊t/∆t⌋}

holds, where π is the unique stationary distribution. For convergence rates, the quantity of interest is (1 − ε_∆t)^{1/∆t} (cf. (7)). For concreteness, take λ = 1. In Figure 3 we plot (1 − ε_∆t)^{1/∆t} as a function of ∆t and observe that it attains a minimum at ∆t ≈ 0.82, for which ε_∆t ≈ 0.115. The intuitive reason for the existence of such an optimal value of ∆t was discussed in Remark 1. Setting ∆t = 0.82, we obtain that, for any initial distribution µ_0, the total variation distance between the time-evolved distribution µ_t and the stationary distribution π satisfies the resulting exponential bound.

Figure 3. Plot of (1 − ε_∆t)^{1/∆t} vs. ∆t.

It is worth noting that in this example the bounds do not depend on the initial distribution µ_0. To illustrate the influence of the choice of ∆t on the convergence bounds, we plot (1 − ε_∆t)^{t/∆t} as a function of t for several values of ∆t in Figure 4.

We now turn to an example with unbounded state space, for which the drift parameters are

β = e^{−λ∆t/2},    b = (2v/λ)(1 − e^{−λ∆t/2}).

To compute explicit bounds on the convergence rates, we need to select a size of the time interval ∆t as well as a value κ > 2b/(1 − β) = 4v/λ for which we will minorize the process on [0, κ]. In order to optimize our bounds, we select ∆t and κ so as to minimize the right-hand side of (11). One easily computes

Φ = v(∆t)²,    ε_{∆t,κ} = v(∆t)² λ e^{−λ∆t} / κ.

For θ and Θ of (12) we obtain

θ = [1 + 4v/λ + (κ − 4v/λ) e^{−λ∆t/2}] / (1 + κ),    Θ = 1 + 4v/λ + (2κ − 4v/λ) e^{−λ∆t/2};

in the expression for θ, the restriction on κ ensures the positivity of the coefficient of the exponential term in the numerator. For concreteness, we continue the example with the specific values v = 1 and λ = 2, and obtain β ≈ 0.405 and b ≈ 0.595.
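The drift parameters quoted above are straightforward to reproduce numerically; the following sketch evaluates β = e^{−λ∆t/2} and b = (2v/λ)(1 − e^{−λ∆t/2}) at the stated values ∆t = 0.904, v = 1, λ = 2, and checks the threshold 2b/(1 − β) = 4v/λ for κ.

```python
import math

# Reproduce the drift parameters of the unbounded-state-space example:
# beta = exp(-lam*dt/2), b = (2v/lam) * (1 - exp(-lam*dt/2)).

v, lam, dt = 1.0, 2.0, 0.904
beta = math.exp(-lam * dt / 2)
b = (2 * v / lam) * (1 - beta)
kappa_min = 2 * b / (1 - beta)    # minorization requires kappa > 4v/lam

print(round(beta, 3), round(b, 3), kappa_min)   # beta ≈ 0.405, b ≈ 0.595
```

The threshold kappa_min evaluates to 4v/λ = 2, consistent with the restriction on κ stated above.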
We can then make appropriate choices of ∆t and κ by minimizing the expression (1 − ε_{∆t,κ})^{r(∆t,κ)/∆t}, as illustrated in Figures 5 and 6. The dependence of this expression on ∆t and κ is in accordance with our reasoning in Remarks 1 and 3. Consequently, we choose ∆t = 0.904, κ = 3.83, and r as in (13) to obtain an explicit bound on the total variation distance between the time-evolved distribution µ_t and the stationary distribution π:

d_TV(µ_t, π) ≤ C (1 − 0.070)^{r ⌊t/0.904⌋} ≤ 1.02 C e^{−0.014 t},

with C = 3 + E_{µ_0}[X_0]. Unlike in the bounded-state-space example, the bounds do depend on the initial distribution µ_0, through the multiplicative factor C.

Figure 6. Plots of (1 − ε_{∆t,κ})^{r/∆t} vs. κ for selected ∆t.