Limit value for optimal control with general means

We consider an optimal control problem whose integral cost is a mean of a given running cost; the Cesàro average is a particular case. The limit of the value with the Cesàro mean as the horizon tends to infinity has been widely studied in the literature. We address the more general question of the existence of a limit when the averaging parameter converges, for values defined through means of general type. We consider a given function and a family of costs defined as the means of that function with respect to a family of probability measures, called evaluations, on R_+. We provide conditions on the evaluations that yield uniform convergence of the associated value functions as the parameter of the family converges. Our main result gives a necessary and sufficient condition in terms of the total variation of the family of probability measures on R_+. As a byproduct, we obtain the existence of a limit value (for general means) for control systems that have a compact invariant set and satisfy a suitable non-expansiveness property.


Introduction
We consider a control system on R^d whose dynamics are given by

y′(t) = f(y(t), u(t)),    (1.1)

where f : R^d × U → R^d and u(·), called the control, is a measurable function from R_+ to a fixed metric space U. We will later make assumptions on (1.1) ensuring that for any initial condition y(0) = y_0 and any measurable control u(·), equation (1.1) has a unique solution t ↦ y(t, u, y_0) defined on R_+.
To any pair (y_0, u(·)), we associate the cost

∫_0^{+∞} g(y(t, u, y_0), u(t)) dθ(t),    (1.2)

where g : R^d × U → R is Borel measurable and bounded, and θ ∈ Δ(R_+) is a Borel probability measure on R_+. Throughout the article, we call θ an evaluation.
When θ ∈ Δ(R_+) is given, the contribution of the interval [T, +∞) to the mean (1.2) becomes less and less significant as T grows. Thus the control problem is essentially interesting only on [0, T_0] for some T_0, which we loosely call the "duration" of the problem. In this article, we are interested in the long-run properties of J, i.e., in the asymptotic behavior of the value function θ ↦ V_θ as the "duration" of θ tends to infinity. In the particular examples of the Cesàro mean and the Abel mean, one studies the uniform convergence of V_{θ_t} as t tends to infinity and of V_{θ_λ} as λ tends to 0. It is a priori unclear how to define the "duration" of a general evaluation θ on R_+. Merely requiring the expectation of θ to be large can produce very different value functions, as the following example shows.

Example 1.1 Consider the uncontrolled dynamics y(t) = t, the running cost t ↦ g(t) = 1_{∪_{m=1}^∞ [2m−1, 2m]}(t), and two sequences of evaluations (μ_k)_{k≥1} and (ν_k)_{k≥1} with densities

f_{μ_k} = (1/k) 1_{∪_{m=1}^k [2m−1, 2m]}  and  f_{ν_k} = (1/k) 1_{∪_{m=1}^k [2m−2, 2m−1]}.

Clearly, V_{μ_k} = 1 and V_{ν_k} = 0 for all k ≥ 1.
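The computation in Example 1.1 can be checked numerically. The sketch below (a midpoint-rule integration in Python; the helper names `g` and `mean_cost` are ours, not from the text) integrates the running cost against the two uniform densities and recovers V_{μ_k} = 1 and V_{ν_k} = 0.

```python
# Numerical check of Example 1.1 (helper names are ours).
# Running cost: g(t) = 1 on [2m-1, 2m] for every m >= 1, else 0.
def g(t):
    return 1.0 if (t % 2.0) >= 1.0 else 0.0

def mean_cost(intervals, n=2000):
    """Midpoint-rule integral of g against the uniform probability
    density on a union of len(intervals) disjoint unit intervals."""
    k = len(intervals)
    total = 0.0
    for a, b in intervals:
        h = (b - a) / n
        total += sum(g(a + (j + 0.5) * h) for j in range(n)) * h
    return total / k  # the density equals 1/k on each interval

k = 5
V_mu = mean_cost([(2*m - 1, 2*m) for m in range(1, k + 1)])      # close to 1
V_nu = mean_cost([(2*m - 2, 2*m - 1) for m in range(1, k + 1)])  # close to 0
```

Both μ_k and ν_k have expectation of order k, so a large expectation alone does not pin down the limit behavior of the value.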
For this reason, we introduce an asymptotic regularity condition for evaluations, called the long-term condition (LTC for short), to express the "large duration" and the "asymptotic uniformity of distributions over R + ", and we will study the convergence of the value functions along a sequence of evaluations satisfying the LTC.
More precisely, for any s ≥ 0, we define the s-total variation of an evaluation θ as the total variation between the measure θ and its s-shift along R_+:

TV_s(θ) := ‖θ − T_s♯θ‖_TV,

where T_s♯θ denotes the image of θ under the shift map t ↦ t + s. We say that a sequence of evaluations (θ_k)_{k≥1} satisfies the LTC if TV_s(θ_k) → 0 as k tends to infinity, for every s ≥ 0. The optimal control problem J = (U, g, f) has a general limit value given by some function V* defined on R^d if, for any sequence (θ_k)_k satisfying the LTC, (V_{θ_k}(y_0))_k converges to V*(y_0) as k tends to infinity, uniformly in y_0.
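To see how the sequence (μ_k) of Example 1.1 violates such a condition, note that shifting μ_k by s = 1 moves its support from ∪ [2m−1, 2m] to the disjoint set ∪ [2m, 2m+1], so the total variation between μ_k and its 1-shift equals 2 for every k. Below is a small numerical sketch (the grid discretization and parameter choices are ours), computing the s-total variation of an absolutely continuous evaluation as the L¹ distance between its density and the shifted density:

```python
def tv_shift(f, s, T, n):
    """Grid approximation of the s-total variation of an absolutely
    continuous evaluation with density f: the L1 distance between
    f and the density t -> f(t - s), t >= s, of its s-shift."""
    h = T / n
    acc = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        shifted = f(t - s) if t >= s else 0.0
        acc += abs(f(t) - shifted) * h
    return acc

def f_mu(k):
    # density of mu_k: 1/k on the union of [2m-1, 2m], m = 1..k
    return lambda t: 1.0 / k if 0.0 <= t <= 2 * k and (t % 2.0) >= 1.0 else 0.0

tvs = {k: tv_shift(f_mu(k), 1.0, T=2 * k + 2.0, n=(2 * k + 2) * 1000)
       for k in (5, 50)}
# both values are close to 2: the 1-shift of mu_k is singular w.r.t. mu_k,
# however large k is, so (mu_k) cannot satisfy a vanishing-TV condition
```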
Our main result (Theorem 4.1) states that for any (θ_k)_k satisfying the LTC, (V_{θ_k})_k converges uniformly if and only if the family {V_{θ_k}} is totally bounded with respect to the uniform norm. Moreover, in this case the limit is characterized by

V*(y_0) = sup_{θ∈Δ(R_+)} inf_{t≥0} V_{T_t♯θ}(y_0).    (1.3)

The function V* naturally appears as the unique possible long-term value function of the control problem.
As a byproduct of our main result, we obtain the existence of the general limit value for any control problem J = (U, g, f) whose running cost g does not depend on u and whose control dynamics (1.1) are non-expansive and admit a compact invariant set. This can be viewed as a generalization of results previously obtained in [8] for optimal control with the Cesàro mean.
Existing results in the ergodic control literature are mainly concerned with the convergence of the t-horizon Cesàro mean values or of the λ-discounted Abel mean values. To the best of the authors' knowledge, this paper is the first to consider general long-term evaluations for optimal control problems.
It is also worth pointing out that while many works (including [1], [2], [3], [4], [6], [7]) assume controllability or ergodicity conditions, the present approach does not rely on such conditions. This can be understood from the fact that the limit value V* may depend on the initial state y_0 (which does not occur under ergodicity or controllability assumptions).
We also make a link to the discrete-time framework, in which an evaluation θ = (θ_m)_{m≥1} is a probability measure over the positive integers N* = N\{0}, and θ_m is the weight of the stage-m payoff. The analogous notion of total variation is defined for any θ ∈ Δ(N*) by TV(θ) = Σ_{m=1}^∞ |θ_{m+1} − θ_m| (cf. [12] and [9]). Recently, the existence of the general limit value of dynamic optimization problems in several discrete-time frameworks has been obtained in [9], [11] and [13]. Our work is partially inspired by [9], and a similar tool appears in the proof in [10].
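For intuition, the discrete quantity TV(θ) = Σ_{m≥1} |θ_{m+1} − θ_m| is easy to evaluate in the two classical cases: for the Cesàro evaluation, uniform on {1, ..., n}, it equals 1/n (a single jump at stage n), and for the Abel evaluation θ_m = λ(1 − λ)^{m−1} the sum telescopes to λ. A short Python sketch (the helper is ours, and the Abel sequence is truncated for the computation):

```python
def tv_discrete(theta):
    """TV(theta) = sum over m of |theta_{m+1} - theta_m| for a
    discrete evaluation given as a finite list (zero beyond it)."""
    ext = list(theta) + [0.0]
    return sum(abs(ext[m + 1] - ext[m]) for m in range(len(ext) - 1))

n = 100
cesaro = [1.0 / n] * n                    # uniform on {1, ..., n}
lam = 0.05
abel = [lam * (1 - lam) ** (m - 1) for m in range(1, 2000)]  # truncated

tv_cesaro = tv_discrete(cesaro)   # one jump of size 1/n, at stage n
tv_abel = tv_discrete(abel)       # telescopes to theta_1 = lam
```

In both cases TV(θ) tends to 0 as the "duration" grows (n → ∞, λ → 0), matching the role of vanishing total variation in the LTC.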
The article is organized as follows. Section 2 contains some preliminary notation and basic examples. The long-term condition is introduced and studied in Section 3. Section 4 contains our main result and its consequences, and ends with two (counter)examples. Section 5 is devoted to the proof of the main result. A weaker notion of the LTC is discussed in Section 6.

Preliminaries
Consider the optimal control problem J = (U, g, f) described by (1.1)-(1.2). We make the following assumptions on g and f. Under these hypotheses, for any control u(·) in U and any initial state y_0 ∈ R^d, (1.1) has a unique absolutely continuous solution t ↦ y(t, u, y_0) defined on [0, +∞). As the running cost g : R^d × U → R is bounded, after an affine transformation we may assume that g takes values in [0, 1].
Reachable map R_t. For any y_0 ∈ R^d, the reachable map t ↦ R_t(y_0) is defined on R_+ by

R_t(y_0) = {y(t, u, y_0) : u(·) ∈ U},

the set of states that the dynamics can reach at time t, via some control, starting from the initial state y_0 at time 0. We write R^t(y_0) = ∪_{s∈[0,t]} R_s(y_0) and R(y_0) = ∪_{s≥0} R_s(y_0); thus R(y_0) is the set of states that can be reached in finite time starting from y_0.
Image measure T_t♯θ and the auxiliary value function V_{T_t♯θ}. Given t ∈ R and θ in Δ(R_+), we write T_t♯θ for the image (push-forward) measure of θ under the map T_t : s ↦ s + t, i.e., T_t♯θ(B) = θ(T_t^{-1}(B)) for every B ∈ B(R_+), where B(R_+) denotes the set of Borel subsets of R_+. Replacing θ by T_t♯θ in (1.2) yields the t-shift θ-evaluated cost (2.3) induced by a control u. Taking the infimum over u(·) ∈ U on both sides of (2.3), and using the reachable map R_t, we obtain the t-shift θ-value function V_{T_t♯θ}.

s-total variation. Given an evaluation θ, its s-total variation, for each s ≥ 0, is TV_s(θ) = ‖θ − T_s♯θ‖_TV.

Long-term condition (LTC). A sequence of evaluations (θ_k)_{k≥1} satisfies the LTC if TV_s(θ_k) → 0 as k tends to infinity, for every s ≥ 0.

In this article, we are concerned with the following notion of limit value for optimal control problems with general means.
Definition 2.1 Let V be a function defined on R d . The optimal control problem J admits V as the general limit value if: for any sequence of evaluations (θ k ) k≥1 satisfying the LTC, for all y 0 in R d , (V θ k (y 0 )) converges to V (y 0 ) as k tends to infinity, and moreover the convergence is uniform in y 0 .
Below are some basic examples of optimal control problems in which the general limit value exists.
Example 2.2 y lies in R², seen as the complex plane; there is no control, and the dynamics are given by f(y) = iy, where i² = −1. One checks easily that the general limit value exists for any sequence of evaluations (θ_k)_k satisfying the LTC. Example 2.3 y lies in the complex plane again, with f(y, u) = iyu, where u ∈ U, U a given bounded subset of R, and g is any continuous function of y (which thus does not depend on u).
Example 2.4 f(y, u) = −y + u, where u ∈ U, U a given bounded subset of R^d, and g is any continuous function of y (which thus does not depend on u).
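For Example 2.4, the dynamics y′ = −y + u are non-expansive in a strong sense: two solutions driven by the same control satisfy |y_1(t) − y_2(t)| = e^{−t}|y_1(0) − y_2(0)|, so the gap between trajectories never increases. A minimal Euler-scheme sketch in dimension 1 (the discretization step and the particular control below are our own choices):

```python
import math

def euler_traj(y0, control, dt=0.01, steps=1000):
    """Explicit Euler scheme for y' = -y + u(t) (Example 2.4, d = 1)."""
    y, traj = y0, [y0]
    for i in range(steps):
        y = y + dt * (-y + control(i * dt))
        traj.append(y)
    return traj

u = lambda t: math.sin(t)          # an arbitrary (measurable) control
t1 = euler_traj(3.0, u)
t2 = euler_traj(-1.0, u)
gaps = [abs(a - b) for a, b in zip(t1, t2)]
# with the same control, each Euler step multiplies the gap by (1 - dt),
# so the distance between the two trajectories is nonincreasing
```

Since g does not depend on u, this contraction of trajectories is what yields the uniform equicontinuity of the value functions exploited later (Corollary 4.7).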
We will show later (using Corollary 4.7) that the general limit value exists in Examples 2.3 and 2.4.

On the long-term condition (LTC)
In this section, we discuss the LTC. We first give some remarks; in particular, Remark 3.1 provides an equivalent formulation of the condition.

Remark 3.2 Let θ be an evaluation that is absolutely continuous with respect to the Lebesgue measure on R_+, and let f_θ be its density. Scheffé's theorem (cf. [5], Theorem 1, p. 2) implies that TV_s(θ) equals the L¹-distance between f_θ and the density t ↦ f_θ(t − s)1_{t≥s} of T_s♯θ. We now discuss several cases in which the LTC is satisfied.
Example 3.4 (Abel averages) Assume that for each k, θ_k has density s ↦ f_{θ_k}(s) = λ_k e^{−λ_k s} 1_{R_+}(s), with λ_k > 0. Since each s ↦ f_{θ_k}(s) is nonincreasing, Remark 3.2 (a) implies that (θ_k)_k satisfies the LTC if and only if λ_k → 0 as k tends to infinity.

Example 3.5 (Folded normal distributions) Assume that for each k, θ_k is the distribution of a random variable |X_k|, where X_k follows a normal law N(m_k, σ_k²), so that the density of θ_k is

f_{θ_k}(s) = (1/(σ_k √(2π))) (e^{−(s−m_k)²/(2σ_k²)} + e^{−(s+m_k)²/(2σ_k²)}) 1_{R_+}(s).

Without loss of generality, we may assume that each m_k is non-negative. Our argument relies on a lemma whose proof is given in the Appendix.
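For the Abel evaluations of Example 3.4, the s-total variation admits a closed form: since the density is nonincreasing, the L¹ distance between the density and its s-shift equals 2 θ_k([0, s]) = 2(1 − e^{−λ_k s}), which vanishes for every s exactly when λ_k → 0. A numerical sketch of this identity (the grid parameters are ours):

```python
import math

def tv_shift_exp(lam, s, T=200.0, n=200000):
    """Grid approximation of the s-total variation of the Abel
    evaluation with density f(t) = lam * exp(-lam * t) on R_+."""
    h = T / n
    acc = 0.0
    for j in range(n):
        t = (j + 0.5) * h
        f_t = lam * math.exp(-lam * t)
        f_shift = lam * math.exp(-lam * (t - s)) if t >= s else 0.0
        acc += abs(f_t - f_shift) * h
    return acc

lam, s = 0.1, 2.0
num = tv_shift_exp(lam, s)
closed = 2.0 * (1.0 - math.exp(-lam * s))  # 2 * theta([0, s])
```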
When the evaluations in continuous time admit step functions as densities, the link to the discrete-time framework is much clearer, as seen in the following proposition.

Proposition 3.6 Let (θ_k)_k be a sequence of absolutely continuous evaluations in Δ(R_+) whose densities are given as step functions.

Proof: Fix s ∈ [0, 1]. For each k, we write the density accordingly; bounding the corresponding term for each m = 1, 2, ..., the result follows by summation.

We end this section with a preliminary lemma, which will be useful in later results.
Lemma 3.7 For any θ ∈ Δ(R_+) and any t ∈ R_+, we have the following bound.

Proof: Fix θ ∈ Δ(R_+) and t ∈ R_+. By the definition of T_s♯θ, for any h(·) ∈ M(R_+, [0, 1]) the integral of h against T_s♯θ equals the integral of s′ ↦ h(s′ + s) against θ. Since T_{−t}♯θ and T_t♯θ are both Borel measures on R_+, θ − T_{−t}♯θ and θ − T_t♯θ are both signed measures, and Hahn's decomposition theorem yields the two estimates (3.1) and (3.2). (The first author thanks Eilon Solan for discussions on the use of Hahn's decomposition theorem.)
Combining this with (3.1)-(3.2), we obtain the stated bound. The proof of the lemma is complete.
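The change-of-variables identity used at the start of the proof, ∫ h d(T_t♯θ) = ∫ h(s + t) dθ(s), can be sanity-checked on a toy evaluation supported on finitely many atoms (the atoms and the test function below are arbitrary choices of ours):

```python
import math

# A toy evaluation given by finitely many weighted atoms (weight, location).
atoms = [(0.5, 0.2), (0.3, 1.7), (0.2, 4.0)]  # weights sum to 1

def integrate(h, measure):
    # integral of h against a finitely supported measure
    return sum(w * h(s) for w, s in measure)

def push_forward(measure, t):
    # image measure T_t # theta: each atom at s is moved to s + t
    return [(w, s + t) for w, s in measure]

h = lambda s: math.exp(-s)
t = 2.0
lhs = integrate(h, push_forward(atoms, t))    # integral of h d(T_t # theta)
rhs = integrate(lambda s: h(s + t), atoms)    # integral of h(s + t) d theta(s)
```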

Main Result
As will be shown in our main result, the function V*(y_0) defined in (1.3) characterizes the general limit value of the optimal control problem in case of convergence. We first rewrite it in an equivalent sup-inf form, which suggests the following interpretation: consider the auxiliary optimal control problem (a game) in which an adversary of the controller chooses an evaluation μ and then, μ being known, the controller chooses an initial state in the reachable set R(y_0). The running cost from time t is evaluated by μ, and V*(y_0) is the value of this problem starting from y_0.
Recall that a metric space X is totally bounded if, for each ε > 0, X can be covered by finitely many balls of radius ε.

A more precise convergence result is obtained if we suppose that there exists a compact set Y ⊆ R^d which is invariant for the dynamics (1.1), i.e., such that y(t, u, y_0) ∈ Y for all u(·) ∈ U, t ≥ 0 and y_0 in Y.

Proof: The general uniform convergence of the value functions {V_θ} to V implies the existence of the general limit value, given by V. Next, we show that the existence of the general limit value V is sufficient to deduce the general uniform convergence of {V_θ} to V. Suppose, by contradiction, that there is no general uniform convergence of {V_θ} to V, and let ε_0 > 0 be fixed as above. Taking a vanishing positive sequence (η_k)_k and some S_0 > 0, we obtain a sequence of evaluations as required. According to Remark 3.1 (a), such a sequence (θ_k)_k satisfies the LTC, while (V_{θ_k}) does not converge uniformly to V*. This is a contradiction.
Corollary 4.5 Assume (2.1) for the optimal control problem J = U, g, f . Suppose that there is a compact set Y ⊆ R d which is invariant for the dynamic (1.1), and that the family {V θ : θ ∈ ∆(R + )} is uniformly equicontinuous on Y . Then there is the general uniform convergence of the value functions {V θ } to V * .
Proof: By assumption, the family of value functions {V_θ : θ ∈ Δ(R_+)} is both uniformly bounded and uniformly equicontinuous on the compact invariant set Y, so Ascoli's theorem yields the total boundedness of the space ({V_θ}, ‖·‖_∞). Theorem 4.1 then implies that for any (θ_k)_k satisfying the LTC, the corresponding sequence of value functions (V_{θ_k}) converges uniformly to V* as k tends to infinity. Thus J has a general limit value given by V*, and according to Lemma 4.4, the value functions {V_θ} converge uniformly to V*.
We now give an existence result for the general limit value under sufficient conditions expressed directly in terms of the control dynamics (1.1) and of the running cost g.
Let us introduce the following non-expansiveness condition (cf. [8]). The control dynamics (1.1) are non-expansive if for all y_1, y_2 in R^d and every u ∈ U, there exists v ∈ U such that ⟨y_1 − y_2, f(y_1, u) − f(y_2, v)⟩ ≤ 0.

Corollary 4.7 Assume (2.1) for the optimal control problem J = (U, g, f). Suppose that J is compact non-expansive. Then the general limit value exists in J and is given by V*.
Interchanging y_1 and y_2 and taking into account that ε > 0 is arbitrary, we deduce that (V_θ)_{θ∈Δ(R_+)} is uniformly equicontinuous on the invariant set Y. This finishes the proof.

The first example is an uncontrolled dynamic. We show that if (θ_k)_k contains no subsequence satisfying the LTC, then the conclusion of Part (i) of Theorem 4.1 need not hold, i.e., sup_{k≥1} inf_{t≥0} V_{T_t♯θ_k}(y_0) < sup_{θ∈Δ(R_+)} inf_{t≥0} V_{T_t♯θ}(y_0) for some y_0 (cf. Remark 4.2).
In the second example, we study the convergence of the value functions of a control problem along two different sequences of evaluations satisfying the LTC. Along the first sequence, the value functions converge uniformly to V*; along the second, the value functions converge pointwise, but not uniformly (thus the family of value functions is not totally bounded for the uniform norm), to a limit function different from V*.
Counter-example 4.11 Consider the control problem on the state space R = (−∞, +∞), with control set U = {+1, −1}, where R*_− = R_− \ {0}. Suppose K > 1 is large enough, so that the cost on R_− is positive and high. Whenever the state reaches y = 0, it is optimal to choose the control u = +1, which drives the state back into R_+; on R*_−, the dynamics are f = −1, independent of control and state. Since V_θ(y_0) = K for all y_0 in R*_− and all θ in Δ(R_+), the state space reduces to R_+, and we consider value functions defined on it.

Proof of main result: Theorem 4.1
Consider in this section a sequence of evaluations (θ_k)_k that satisfies the LTC. As the proof is rather long, we divide it into two main parts. • In Subsection 5.1, we present the first preliminary result, Proposition 5.1. It is used in two ways: first, an immediate consequence of it, recorded for later use, bounds lim inf_k V_{θ_k} from below in terms of the auxiliary value functions {V_{T_t♯θ_k} : k ∈ N*, t ∈ R_+}; second, we deduce from it, in Corollary 5.2, the proof of Part (i) of Theorem 4.1.
• In Subsection 5.2, we prove Parts (ii)-(iii) of Theorem 4.1. Lemma 5.4 gives an upper bound of lim sup k V θ k in terms of the auxiliary value functions {V Tt♯θ k : k ∈ N * , t ∈ R + }, which is, together with the result from Proposition 5.1, used to end the proof.
We end the proof for Part (i) of Theorem 4.1 by the following corollary of Proposition 5.1.

Corollary 5.2 [Proof for Part (i) of Theorem 4.1]
sup_{k≥1} inf_{t≥0} V_{T_t♯θ_k}(y_0) = sup_{μ∈Δ(R_+)} inf_{t≥0} V_{T_t♯μ}(y_0) for all y_0 ∈ R^d.

Proof: Fix y_0 ∈ R^d and set ϱ = sup_{k≥1} inf_{t≥0} V_{T_t♯θ_k}(y_0). It is clear that ϱ ≤ sup_{μ∈Δ(R_+)} inf_{t≥0} V_{T_t♯μ}(y_0). For the converse inequality, note that for each k ≥ 1 there exists m(k) ∈ R_+ such that V_{T_{m(k)}♯θ_k}(y_0) ≤ ϱ + 1/k. Since T_{m(k)}♯θ_k, the image measure of θ_k under s ↦ s + m(k), is itself an evaluation on R_+, we have TV_s(T_{m(k)}♯θ_k) ≤ TV_s(θ_k) for every s ≥ 0. We deduce that (T_{m(k)}♯θ_k)_k satisfies the LTC whenever (θ_k)_k does. The converse inequality then follows from Proposition 5.1, and the proof is complete.

Proof for Parts (ii)-(iii)
In this subsection, we give the proof for Parts (ii)-(iii) of Theorem 4.1. We begin with the following result, which compares the values under evaluation µ and its t-"shifted" evaluation T t ♯µ for any t > 0.
The following result gives an upper bound on lim sup_k V_{θ_k} in terms of the auxiliary value functions {V_{T_t♯θ_k} : k ∈ N*, t ∈ R_+}. In particular, for all T_0 ≥ 0 and any y_0 in R^d, lim sup_k inf_{t≤T_0} V_{T_t♯θ_k}(y_0) = lim sup_k V_{θ_k}(y_0).

Proof: Fix T_0 ≥ 0 and y_0 ∈ R^d. The inequality lim sup_k inf_{t≤T_0} V_{T_t♯θ_k} ≤ lim sup_k V_{θ_k} is clear by taking t = 0 for each k. For the converse inequality lim sup_k inf_{t≤T_0} V_{T_t♯θ_k} ≥ lim sup_k V_{θ_k}: according to Proposition 5.3, for all k and all t ≤ T_0, the difference V_{θ_k}(y_0) − V_{T_t♯θ_k}(y_0) is bounded above by TV_{T_0}(θ_k), which gives inf_{t≤T_0} V_{T_t♯θ_k}(y_0) ≥ V_{θ_k}(y_0) − TV_{T_0}(θ_k). Since (θ_k)_k satisfies the LTC, TV_{T_0}(θ_k) vanishes as k tends to infinity. Taking lim sup_k on both sides of the above inequality completes the proof of the lemma.
We now end the proof of Theorem 4.1. To do this, we first summarize the results of Proposition 5.1 and Lemma 5.4 in the following chain of inequalities, which is then used to study the convergence of (V_{θ_k})_k.
Remark 5.6 Corollary 5.5 states that the uniform convergence of sup_{k≥1} inf_{t≤T_0} V_{T_t♯θ_k} to sup_{k≥1} inf_{t≥0} V_{T_t♯θ_k} as T_0 tends to infinity implies the uniform convergence of (V_{θ_k})_k as k tends to infinity. Moreover, according to Corollary 5.2, in case of uniform convergence the limit function is V*.
For any states y and ȳ in R^d, define d̃(y, ȳ) = sup_{k≥1} |V_{θ_k}(y) − V_{θ_k}(ȳ)|. The space (R^d, d̃) is then a pseudometric space (it may not be Hausdorff).
The following argument is similar to the proof of Theorem 2.5 in [9], and also to the proof of Theorem 3.10 in [10]; we rewrite it here for the sake of completeness. Roughly speaking, we use the total boundedness of the space ({V_{θ_k}}, ‖·‖_∞) to deduce that the state space (R^d, d̃) is totally bounded for the pseudometric d̃. This allows us to prove the convergence, for d̃, of the reachable sets R^T to R in bounded time. We are then able to prove the uniform convergence of sup_{k≥1} inf_{t≤T_0} V_{T_t♯θ_k} to sup_{k≥1} inf_{t≥0} V_{T_t♯θ_k} as T_0 tends to infinity.
Let us prove the converse. Suppose that ({V_{θ_k}}, ‖·‖_∞) is totally bounded. Fix any ε > 0; there exists a finite set of indices I such that for all k ≥ 1, there exists i ∈ I with ‖V_{θ_k} − V_{θ_i}‖_∞ ≤ ε/3. The set {(V_{θ_i}(y))_{i∈I} : y ∈ R^d} is a subset of the compact metric space ([0, 1]^I, ‖·‖_∞), thus it is itself totally bounded, so there exists a finite subset X of R^d such that for every y ∈ R^d there is x ∈ X with max_{i∈I} |V_{θ_i}(y) − V_{θ_i}(x)| ≤ ε/3. We have obtained that for each ε > 0, there exists a finite subset X of R^d such that for every y ∈ R^d, there is x ∈ X satisfying: for any k ≥ 1 there is some i ∈ I with

|V_{θ_k}(y) − V_{θ_k}(x)| ≤ |V_{θ_k}(y) − V_{θ_i}(y)| + |V_{θ_i}(y) − V_{θ_i}(x)| + |V_{θ_i}(x) − V_{θ_k}(x)| ≤ ε,

thus d̃(y, x) ≤ ε. This implies that the pseudometric space (R^d, d̃) is itself totally bounded.
Fix now y_0 in R^d. By definition, for all T, S ∈ R_+ with S ≥ T, we have R^T(y_0) ⊂ R^S(y_0) ⊂ R(y_0), and for every ȳ ∈ R(y_0) there exists T̄ > 0 with ȳ ∈ R^{T̄}(y_0).
Next, Part (ii) can be deduced from the proof of Part (iii). Let (θ_{φ(k)}) be any subsequence of (θ_k) such that (V_{θ_{φ(k)}}) converges uniformly to some function V. This implies that ({V_{θ_{φ(k)}}}, ‖·‖_∞) is totally bounded. As we have shown in the proof of Part (iii), if ({V_{θ_{φ(k)}}}, ‖·‖_∞) is totally bounded, then (V_{θ_{φ(k)}}) converges uniformly to V = V*. This proves Part (ii): V* is the unique accumulation point (for uniform convergence) of the sequence (V_{θ_k})_k.

Discussion on a weaker long-term condition
Below is a weaker form of the long-term condition (LTC).

Long-term condition′ (LTC′) A sequence of evaluations (θ_k)_{k≥1} satisfies the LTC′ if:

Appendix

We now prove the lemma used in Example 3.5 (folded normal distributions). The relevant expression is positive (resp. negative) according to whether (m − t)/σ² > 0 (resp. < 0).
From the above analysis, we deduce that the proof of the lemma reduces to the proof of the following claim.

Claim There is some t* ∈ [0, m) such that H(t) < 0 for t ∈ (0, t*) and H(t) > 0 for t ∈ (t*, m). Moreover, such a t* satisfies (t*)² ≥ m² − σ².
To prove the claim, we compute:
• the values at the endpoints: H(0) = 0 and lim_{t→m⁻} H(t) = −∞;
• the first-order derivative at any t ∈ [0, m), which, substituted back into (7.1), yields

H′(t_e) > 0 (resp. H′(t_e) < 0) ⟺ (t_e)² < m² − σ² (resp. (t_e)² > m² − σ²).    (7.2)

Next, it is easy to prove the following result: let t_{e,1} ∈ [0, m) be a rest point of H(·), and suppose that t_{e,2} ∈ (t_{e,1}, m) is the smallest rest point after t_{e,1}. Then H′(t_{e,1})H′(t_{e,2}) ≤ 0, and if H′(t_{e,1}) ≤ 0, such a t_{e,2} does not exist.
To conclude, we see that in both cases such a t* exists and satisfies (t*)² ≥ m² − σ², which proves the claim and finishes the proof of the lemma.