Necessary Optimality Conditions For Average Cost Minimization Problems

Control systems involving unknown parameters appear to be a natural framework for applications in which the model design has to take into account various uncertainties. In these circumstances the performance criterion can be given in terms of an average cost, providing a paradigm which differs from the more traditional minimax or robust optimization criteria. In this paper, we provide necessary optimality conditions for a nonrestrictive class of optimal control problems in which unknown parameters intervene in the dynamics, the cost function and the right end-point constraint. An important feature of our results is that we allow the unknown parameters to belong to a mere complete separable metric space (not necessarily compact).


Introduction
In this paper we consider a class of optimal control problems in which uncertainties appear in the data in terms of unknown parameters belonging to a given metric space. Though the state evolution is governed by a deterministic control system and the initial datum is fixed (and well-known), the description of the dynamics depends on uncertain parameters which also intervene in the cost function and the right end-point constraint. Taking into consideration an average cost criterion, a crucial issue is clearly to be able to characterize optimal controls independently of the unknown parameter action: this allows one to find a sort of 'best trade-off' among all the possible realizations of the control system as the parameter varies. In this context we provide, under non-restrictive assumptions, necessary optimality conditions. More precisely, we consider the following average cost minimization problem: (P) minimize ∫_Ω g(x(T, ω); ω) dµ(ω) over controls u(.) ∈ U subject to, for each ω ∈ Ω, ẋ(t, ω) = f(t, x(t, ω), u(t), ω) a.e. t ∈ [0, T], x(0, ω) = x_0, and ∫_Ω d_{C(ω)}(x(T, ω)) dµ(ω) = 0.
If the integral cost term in (P) does not exist for a feasible process (u, {x(., ω) : ω ∈ Ω}), then we set J_Ω(u(.), {x(., ω)}) = +∞. To underline the dependence on a given control u(.) ∈ U, sometimes we shall employ the notation x(., u, ω) for the feasible arc belonging to the family of trajectories {x(., ω) : ω ∈ Ω}, associated with the control u(.) and the element ω ∈ Ω. A feasible process (ū, {x̄(., ω) : ω ∈ Ω}) is said to be a W^{1,1}-local minimizer for (P) if there exists ε > 0 such that ∫_Ω g(x̄(T, ω); ω) dµ(ω) ≤ ∫_Ω g(x(T, ω); ω) dµ(ω) for all feasible processes (u, {x(., ω) : ω ∈ Ω}) such that ‖x(., ω) − x̄(., ω)‖_{W^{1,1}} ≤ ε for all ω ∈ supp(µ). (1.1)
Control systems involving unknown parameters have been well studied in the literature, finding widespread applications particularly from the point of view of robust (worst-case) control; see for instance the monographs [1], [20] and [6] (and the references therein), and the paper [18] on minimax optimal control. In the introductory section of [20, Chapter IX], control problems with uncertainties are considered, comparing the conservative approach (minimax) with an alternative approach in which one might minimize, for instance, an "expected value" (which corresponds to the average cost problem studied in our paper). Then, in [20, Chapters IX and X] Warga investigates the so-called "conflicting/adverse control problems", providing necessary conditions for this broad class of problems, which covers minimax problems (under some regularity assumptions) but does not cover optimal control problems having the average cost criterion studied in our paper. (See [21] for further developments on adverse control problems in the nonsmooth context; cf. the recent papers [13] on adverse control problems and [11] on state-constrained minimax problems.)
A growing interest has recently emerged in considering an 'averaged' (or 'expected' with respect to a given measure) approach, exploring various issues, directions and applications: see for instance a recent series of papers on aerospace systems [15], [16], [7], and the articles [2] and [22] on averaged controllability (from different viewpoints); see also [17] for results on heterogeneous systems. Therefore, motivated not only by theoretical reasons but also by a recent growing interest in applications (such as aerospace engineering, see in particular [15] and [16]), in our paper we consider the 'average cost' paradigm rather than the more 'classical' criteria employed in the minimax/robust or adverse optimization framework. For the general (nonsmooth) case we derive necessary optimality conditions ensuring the existence of a costate function p(., .) : [0, T] × Ω → R^n which satisfies an averaged (on Ω) maximality condition. Moreover, the costate arcs p(., ω) also satisfy the somewhat expected adjoint system and transversality condition, when ω belongs at least to a countable dense subset Ω̂ of supp(µ). We show that these last two necessary conditions extend to the whole supp(µ) for free right end-point problems, if we impose (suitable) regularity assumptions on the dynamics and the cost function. We also prove that a further (non-trivial) case, in which the conditions of the maximum principle extend to the whole supp(µ), is when the measure µ is purely atomic (not necessarily with finite support).
This paper is organized as follows. We first study the simpler case in which the measure µ has a finite support (Section 2), which constitutes a discretization model for the general case of an arbitrary measure on a complete separable metric space (which is investigated subsequently). The main results are displayed in Section 3, and their proofs are given in Section 5.
Section 4 is devoted to recalling some fundamental theorems in measure theory and to providing a limit-taking lemma which plays a crucial role in our analysis. The approach that we suggest in our paper consists in approximating the measure µ by measures with finite support (convex combinations of Dirac measures). Owing to Ekeland's variational principle, we construct a suitable family of auxiliary optimal control problems, the solutions of which approximate the reference problem (P). Invoking the maximum principle (applicable in a more traditional version) for the approximating minimizers, we obtain properties which, taking the limit (in a suitable sense), allow us to derive the desired necessary conditions. The most difficult part of our proof is to show the maximality condition: this requires non-trivial consideration of multifunction representation and selection theorems. This part becomes simpler for the 'purely atomic' case and the 'smooth' case. An important source of inspiration for the techniques employed here is Vinter's paper [18] (which is devoted to minimax optimal control but, in fact, contains flexible and effective analytical tools that can be extended or adapted to our case). As one may expect, the necessary conditions that we obtain differ from those in the minimax context (in particular for the general nonsmooth case and the purely atomic case), for the nature of the minimization criterion is different. For instance, for the general (nonsmooth) case the most evident difference with respect to the costate arcs characterization given in [18] is that (avoiding a formulation which might involve somewhat complicated sets) we show that the 'expected' adjoint system and transversality conditions are satisfied by a family of costate arcs p(., ω), at least when the parameter ω belongs to a countable dense set Ω̂ ⊂ supp(µ).
We highlight that an important feature of our paper is the unrestrictive nature of our assumptions: indeed, we allow not only nonsmooth data (on the dynamics, the cost function and the averaged right end-point constraint), but we also provide results for unknown parameters belonging to a mere complete separable metric space Ω. This aspect is particularly relevant for applications (cf. [15]) where Ω (and the support of the reference measure µ) need not be compact. Our techniques could be used to generalize the conditions in [18] and might provide some insights into dealing with adverse/conflicting control problems with non-compact parameter sets (in [20] and [21] parameter sets are assumed to be compact).

Notation
Let (Ω, ρ_Ω) be a metric space. Denote by B_Ω the σ-algebra of Borel sets in Ω. A probability measure µ on the measurable space (Ω, B_Ω) takes non-negative values, verifies the σ-additivity property and is such that µ(Ω) = 1. The family of all probability measures on (Ω, B_Ω) is denoted by M(Ω). Recall that a sequence {µ_i} of measures in M(Ω) is said to converge weakly* to a measure µ ∈ M(Ω) (in symbols µ_i ⇀* µ), if ∫_Ω h dµ_i → ∫_Ω h dµ for every bounded continuous function h on Ω. The support of a measure µ defined on Ω is written supp(µ). L denotes the Lebesgue subsets of [0, T], while B^m are the Borel subsets of R^m. L × B^m (respectively L × B^m × B_Ω) is the product σ-algebra of L and B^m (respectively L, B^m and B_Ω). The Euclidean norm is written |.|. We shall employ the following norm on W^{1,1}([0, T]; R^n): ‖x(.)‖_{W^{1,1}} := |x(0)| + ‖ẋ(.)‖_{L^1(0,T)}. We write ∂ϕ(x) for the limiting subdifferential of the (possibly extended valued) function ϕ : R^n → R ∪ {+∞} at x ∈ dom ϕ. If ϕ = ϕ(x, y), then ∂_x ϕ(x, y) is the partial limiting subdifferential with respect to the variable x. B is the closed unit ball in Euclidean space.
N_C(x) is the limiting normal cone of a closed set C at a point x ∈ C, and d_C(x) denotes the Euclidean distance of a point x from C. (We refer the reader to [4], [9], [10], and [19] and the references therein for these nonsmooth analytical tools.)

Average on measures with finite support
We start by considering the particular and simple case of optimal control problems of the form (P), where the probability measure µ of the integral functional has a finite support: it is a convex combination of unit Dirac measures. This also constitutes a preliminary step to derive necessary conditions for the general case. The following assumptions will be needed throughout this section. For a given W^{1,1}-local minimizer (ū, {x̄(., ω) : ω ∈ Ω}) and for some δ > 0, we shall suppose: (ii) The multifunction t ⇝ U(t) has nonempty values, and Gr U(.) is an L × B^m measurable set.
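For orientation, the finite-support case can be written out explicitly. The block below is only a sketch: the labels ω_j, α_j and N for the atoms, weights and number of atoms are chosen here for illustration, and the sets C(ω_j) are assumed closed (so that a zero distance means membership).

```latex
% Sketch: problem (P) when \mu is a convex combination of Dirac measures.
% The labels \omega_j, \alpha_j, N are illustrative assumptions.
\[
  \mu = \sum_{j=0}^{N} \alpha_j\, \delta_{\omega_j},
  \qquad \alpha_j > 0, \quad \sum_{j=0}^{N} \alpha_j = 1 .
\]
% The averaged cost becomes a weighted finite sum:
\[
  \int_\Omega g(x(T,\omega);\omega)\, d\mu(\omega)
  = \sum_{j=0}^{N} \alpha_j\, g(x(T,\omega_j);\omega_j).
\]
% Since every \alpha_j > 0 and distances are non-negative, the averaged
% end-point constraint splits into N+1 pointwise constraints:
\[
  \int_\Omega d_{C(\omega)}(x(T,\omega))\, d\mu(\omega) = 0
  \iff
  x(T,\omega_j) \in C(\omega_j) \quad \text{for all } j = 0,\dots,N .
\]
```

Read this way, the finite-support problem is an optimal control problem for the stacked state (x(., ω_0), . . . , x(., ω_N)) driven by one common control u(.), which is presumably why a more traditional maximum principle can be invoked for it.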
We deduce, therefore, conditions (a)-(d) of the proposition statement. This concludes the proof.

Comments
Condition (iii) of Theorem 3.1 is interpreted in the following sense: for each ω ∈ Ω, one considers functions q(., ω) ∈ W^{1,1}([0, T], R^n) (such that ‖q(., .)‖_{L^∞} is uniformly bounded by a constant) satisfying the adjoint system and the transversality condition Then, from this set of functions, one takes into account only the q(., .)'s such that to generate the family of sets of arcs {P(ω)}_{ω∈Ω̂}.
In optimal control theory, necessary optimality conditions are usually provided avoiding the 'trivial' case, which is given by the couple (λ, p(., .)) = (0, 0), where λ is the multiplier associated with the cost. However, in the literature dealing with optimal control problems with unknown parameters in the non-smooth context, results are often written including possible trivial cases which are not considered so relevant for the general properties expressed in the statements of the results; cf. [18] on nonsmooth minimax problems and [21] on nonsmooth adverse problems, in which the operator 'co' (convexifying over sets of costate arcs) is considered, possibly bringing trivial cases. (The fact that in [18] and [21] the multiplier λ associated with the cost does not appear in the necessary conditions should not be so surprising: this multiplier is somewhat hidden in the analysis and, in these contexts, the situation 'p ≡ 0' alone might be considered as 'trivial'.) In our case, we might have a trivial couple (λ = 0, p(., .) = 0) which satisfies the conditions of Theorem 3.1. Indeed, employing the convexification operator 'co' on the set of costate arcs, it may happen that, taking λ = 0, even if p(., ω̂) ≠ 0, with ω̂ ∈ Ω̂, also −p(., ω̂) is an admissible costate arc; convexifying, p ≡ 0 ∈ co P(ω̂). We decided to be consistent with part of the previous (nonsmooth) literature on problems with unknown parameters and provide a general nonsmooth result (Theorem 3.1), which allows (in some particular circumstances) a trivial case, but at the same time covers a number of non-restrictive non-trivial cases.
For instance, (iii) of Theorem 3.1 immediately implies a non-triviality condition for the pair (λ, p(., .)) when (a) the right end-point constraints are absent (C(ω) ≡ R^n); (b) the given measure µ has a nonatomic component, the averaged right end-point constraints are imposed but the normal cone to the end-point constraint co N_{C(ω)}(x̄(T, ω)) is pointed for all ω ∈ Ω (or even for ω belonging to a countable dense subset of the support of the nonatomic component of µ). We recall that a convex cone K ⊂ R^n is said to be 'pointed' if for any nonzero d_1, d_2 ∈ K we have d_1 + d_2 ≠ 0. Concerning (b), the abnormal situation (i.e. λ = 0) is admissible, but the fact that co N_{C(ω)}(x̄(T, ω)) is pointed ensures that p ≡ 0 ∉ co P(ω̂) for all ω̂ ∈ Ω̂.
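To make the pointedness requirement concrete, the following block restates the standard definition and contrasts two simple end-point sets. The two examples (a half-space and a fixed end-point) are illustrative assumptions and are not taken from the paper.

```latex
% Pointedness of a convex cone K \subset \mathbb{R}^n (standard definition).
\[
  K \text{ is pointed}
  \iff
  d_1 + d_2 \neq 0 \ \text{ for all nonzero } d_1, d_2 \in K
  \iff
  K \cap (-K) \subset \{0\}.
\]
% Illustrative example 1: half-space C = \{x : a \cdot x \le 0\}, a \neq 0,
% with end-point \bar{x} = 0 on its boundary. The normal cone is a ray.
\[
  N_C(0) = \{\, s\,a : s \ge 0 \,\} \quad \text{(pointed)}.
\]
% Illustrative example 2: fixed end-point C = \{\bar{x}\}.
% The normal cone is the whole space, so it contains \nu and -\nu.
\[
  N_C(\bar{x}) = \mathbb{R}^n \quad \text{(not pointed)}.
\]
```

In the second example a costate end-point value and its negative are both admissible, which is exactly the mechanism by which convexification can produce the trivial arc p ≡ 0.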
The 'degeneracy issue' (i.e. the necessary conditions are satisfied by any control) is a longstanding issue which has been widely investigated in optimal control. It is well-known that this issue may arise, for instance, in the presence of state constraints for 'standard' (in the sense that parameters are absent) optimal control problems (cf. [19, Chapter X] and the references therein). Rather less is known for the case when unknown parameters intervene in the dynamics and the cost: minimax, adverse, and average optimal control problems. (See [11] for a non-degeneracy result on state constrained minimax problems avoiding the degeneracy caused by the state constraint; see also [18] for a link between minimax and state-constrained problems.) In our context degeneracy might occur for the general nonsmooth case (Theorem 3.1) when the given measure µ has a nonatomic component. Indeed, our construction of the costate arcs p(t, ω) for ω ∈ Ω is based on a limit-taking procedure starting from the information provided by (non-trivial) costate arcs p(t, ω̂) for ω̂ ∈ Ω̂ (cf. (5.21) below). If µ has a nonatomic component, we have no reason to expect (under the general assumptions considered in Theorem 3.1) that the non-degenerate property of the costate arcs p(t, ω̂) (ω̂ ∈ Ω̂) always propagates on Ω as desired: there might be some degenerate situations in which for a full-measure subset of Ω the limit we take in the proof of Theorem 3.1 does not exist, and p(., .) extends with the value zero on Ω \ Ω̂, giving rise to a degeneracy issue. However, under some circumstances, the information provided on the set Ω̂ does propagate: if there is no right end-point constraint and, in addition, we impose regularity assumptions on the dynamics and the terminal cost function, properties (i) and (iii) of Theorem 3.1 extend to the whole parameter set Ω, as stated in Theorem 3.3. Theorems 3.2 and 3.3 do provide non-degenerate results.
Nonsmooth results on optimal control problems with unknown parameters, such as adverse and minimax problems (see [21] and [18]), are affected by a 'degeneracy issue' which is not far from the one of our nonsmooth result Theorem 3.1, perhaps in a more 'dramatic' way, for the measure (appearing there as a multiplier in the necessary conditions) is not uniquely determined, and may have a support with degenerate effects on the necessary conditions. Consider for instance the simple example [18, Example 4.1] in the context of minimax problems: A minimax minimizer is: (x̄ ≡ 0, ū ≡ 0). In [18] there is a detailed discussion comparing [
At first glance our results might look similar to some statements on necessary conditions appearing in [20] and [21]. Not only do these results not cover the class of average cost minimization problems (in the sense of our paper), but we also highlight a crucial aspect concerning the completely different role of the measures entering the picture of the necessary conditions: in Warga's framework the existence of a positive Radon measure (on the set of adverse relaxed controls) is a necessary condition, and this measure plays the role of a 'multiplier'. In our context (of average control problems) the probability measure µ is a given datum, and we underline the fact that our objective is to give necessary conditions w.r.t. the given measure µ.
We finally observe that the construction of the countable set Ω̂ proposed in this paper could be useful for applications: it provides a constructive way to approximate the reference measure µ by means of a sequence of convex combinations of Dirac measures concentrated at points of Ω̂. Therefore the set Ω̂ can be considered as a reference set of parameters ω for which one starts computing the costate arcs p(., ω) and, eventually, derives conditions for optimal controls.
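The approximation mentioned here can be spelled out as follows. This is a sketch that borrows the indexing ω^i_j, α^i_j, N_i used later in the proofs and writes \widehat{\Omega} for the countable dense set; the way the weights α^i_j are actually chosen is not shown in this section and is left unspecified.

```latex
% Finite-support approximations of \mu concentrated on the countable set.
% How the weights \alpha^i_j are constructed is not specified here.
\[
  \mu_i := \sum_{j=0}^{N_i} \alpha^i_j\, \delta_{\omega^i_j},
  \qquad \omega^i_j \in \widehat{\Omega}, \quad \alpha^i_j > 0,
  \quad \sum_{j=0}^{N_i} \alpha^i_j = 1 .
\]
% Weak* convergence, written out for measures of this form:
\[
  \mu_i \overset{*}{\rightharpoonup} \mu
  \iff
  \sum_{j=0}^{N_i} \alpha^i_j\, h(\omega^i_j)
  \longrightarrow \int_\Omega h\, d\mu
  \quad \text{for every bounded continuous } h : \Omega \to \mathbb{R}.
\]
```

In practical terms, integrals against µ are replaced by weighted sums over finitely many parameter values, and the quality of the approximation is measured only through bounded continuous test functions h.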

Preliminary results in measure theory
This section is devoted to displaying results which will be relevant for the proofs of Theorems 3.1, 3.2 and 3.3. We shall repeatedly make use of the following theorem (also referred to as the Portmanteau Theorem, cf.  (c) lim µ_i(B) = µ(B) for every Borel set B whose boundary has µ-measure zero. (Such sets are also referred to as µ-continuity sets); We consider now subsets D and D_i, for i = 1, 2, . . ., of Ω × R^K. We denote respectively by D(.), D_i(.) : Ω ⇝ R^K the multifunctions defined as D(ω) := {z ∈ R^K : (ω, z) ∈ D} and D_i(ω) := {z ∈ R^K : (ω, z) ∈ D_i}. Let {µ_i} be a weak* convergent sequence of measures in M(Ω). Our aim is to justify the limit-taking of sequences like The required convergence result is provided by Lemma 4.3 below, which represents an extension of [ (ii) the multifunctions ω ⇝ D(ω) and ω ⇝ D_i(ω), for all i, are uniformly bounded; Define, for each i, the vector of signed measures η_i := γ_i µ_i. Then, along a subsequence, we have where η is a vector-valued Borel measure on Ω such that for some Borel measurable function γ : Ω → R^K satisfying γ(ω) ∈ D(ω) µ-a.e. ω ∈ Ω.
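The conclusion of Lemma 4.3 can be summarised as below, as far as it can be read off from the surrounding proof. This is a hedged reading: the mode of convergence of η_i is assumed to be weak*, the density relation is taken from the Radon-Nikodym step of the proof, and the hypotheses other than (ii) are not recoverable here.

```latex
% Hedged reading of Lemma 4.3. Assumed: \eta_i converges weakly* and
% d\eta = \gamma\, d\mu. Hypotheses other than (ii) are left unspecified.
\[
  \gamma_i(\omega) \in D_i(\omega) \ \ \mu_i\text{-a.e.}, \qquad
  \eta_i := \gamma_i\, \mu_i , \qquad
  \mu_i \overset{*}{\rightharpoonup} \mu .
\]
% Along a subsequence, tested against bounded continuous h : \Omega \to \mathbb{R}:
\[
  \int_\Omega h(\omega)\, \gamma_i(\omega)\, d\mu_i(\omega)
  \longrightarrow
  \int_\Omega h(\omega)\, \gamma(\omega)\, d\mu(\omega),
\]
% where the limit measure has a density with respect to \mu:
\[
  \eta(B) = \int_B \gamma\, d\mu \ \text{ for every Borel } B \subset \Omega,
  \qquad
  \gamma(\omega) \in D(\omega) \ \ \mu\text{-a.e.}
\]
```

The point of the lemma is that both the integrands γ_i and the measures µ_i vary with i, so neither dominated convergence nor weak* convergence of µ_i alone would justify the limit.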
Since Ω is a complete separable metric space, the sequence {µ_i} turns out to be uniformly tight as a result of Theorem 4.2. We also know that γ_i(ω) ∈ D_i(ω) µ_i-a.e. and D_i(ω) is uniformly bounded for all i. It follows that there exists a constant M > 0 such that But since B_{η,µ} generates all the Borel sets of Ω (cf. [12, Chapter 7, Appendix]), it follows that η is absolutely continuous with respect to µ. Therefore, by the Radon-Nikodym Theorem, there exists an R^K-valued, Borel measurable and µ-integrable function γ on Ω such that for any Borel subset B of Ω we have It remains to show that γ(ω) ∈ D(ω) µ-a.e. ω ∈ Ω. For all j ∈ N fixed, following the approach suggested in [19, Proposition 9.2.1], we define D^j(ω) := D(ω) + (1/j)B ⊂ R^K. We fix q ∈ R^K. Since D(ω) is uniformly bounded and D is closed, the multifunction D^j(.) is upper semicontinuous. Then, for R > 0 large enough, the marginal function defined by Recalling that the sets D(ω) and D_i(ω) for i = 1, 2, . . . , are uniformly bounded, and owing to (4.1), we have that, for all j ∈ N, there exists i_j such that for all i ≥ i_j, D_i(ω) ⊂ D^j(ω). Then for q ∈ R^K and for any Borel subset B of Ω, for all i ≥ i_j, we have The last inequality is a consequence of (4.3). Before passing to the limit, we observe that supp(η) ⊂ dom D^j(.).

Proofs of Theorem 3.1, Theorem 3.2 and Theorem 3.3
We first employ a standard hypotheses reduction argument establishing that we can, without loss of generality, replace assumptions (A3)-(A5) by the stronger conditions in which δ = +∞ (i.e. the conditions are satisfied globally).
(A3)′ There exist a constant c > 0 and an integrable function
(A4)′ (i) There exist positive constants k_g ≥ 1 and M such that for all ω ∈ Ω |g(x, ω)| ≤ M and d_{C(ω)}(x) ≤ M for all x ∈ R^n, (ii) There exists a modulus of continuity θ(.) such that we have and for all ω_1, ω_2 ∈ Ω and x ∈ R^n.
(A5)′ There exists a modulus of continuity θ_f(.) such that for all ω_1, ω_2 ∈ Ω,
This is possible if we consider the "truncation" function tr_{y,δ} : R^n → R^n, defined to be and we replace f, g and d above by their local expressions f̃, g̃ and d̃ defined as follows Indeed, the problems involving the functions (f, g, d) and (f̃, g̃, d̃) do coincide in a neighbourhood of the W^{1,1}-local minimizer (ū, {x̄(., ω) | ω ∈ Ω}) for (P). Therefore, (ū, {x̄(., ω) | ω ∈ Ω}) does remain a W^{1,1}-local minimizer for the problem (P) when we substitute the triple (f, g, d) with (f̃, g̃, d̃). Furthermore, the assertions of the theorem are unaffected by changing the data in this way. We provide two technical lemmas which will be employed in the approximation techniques used in the proofs of the theorems. These preliminary results establish the uniform continuity of trajectories with respect to ω and the existence of a sequence of suitable finite support measures approximating the reference measure µ. Throughout this section, d_E(., .) denotes the Ekeland metric defined on the control set U as We recall that, given a control u(.), to make clearer which control is used we shall employ the alternative notation x(., u, ω) for the feasible arc belonging to the family of trajectories {x(., ω) : ω ∈ Ω} associated with the control u(.).
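For readers who want explicit formulas for the two objects named above, the block below gives the definitions that are standard in the literature on the maximum principle (cf. [19]). They are assumed, not confirmed, to match the displays intended in this passage.

```latex
% Standard definitions, assumed (not confirmed) to match the paper's displays.
% Ekeland metric on the set of controls: measure of the disagreement set.
\[
  d_E(u_1, u_2) := \operatorname{meas}\{\, t \in [0,T] : u_1(t) \neq u_2(t) \,\}.
\]
% Truncation: projection of x onto the closed ball of radius \delta around y.
\[
  \mathrm{tr}_{y,\delta}(x) :=
  \begin{cases}
    x, & |x - y| \le \delta,\\[2pt]
    y + \delta\, \dfrac{x - y}{|x - y|}, & |x - y| > \delta .
  \end{cases}
\]
```

With this form, tr_{y,δ} is the identity on the δ-ball around y and is nonexpansive (it is the projection onto a closed convex set), so composing the data with it leaves them unchanged near the reference trajectory while turning local hypotheses into global ones.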

This confirms property (iii).
Finally, if the measure µ has a purely atomic component such that each atom is a singleton, then at each step of the iterative argument employed in (i), the compact set K_ℓ ⊂ Ω, for all ℓ ≥ 1, contains a finite number of atoms of µ, which will be included in Ω_ℓ.
Proof of Theorem 3.1. The proof is built up in four parts. The first part consists in approximating the reference problem with a given probability measure by an auxiliary problem which involves measures with finite support. This is possible by invoking the result on weak* convergence established in Lemma 5.2 and Ekeland's Variational Principle. In the second part, we apply necessary optimality conditions (cf. Proposition 2.1 previously obtained) for the auxiliary problem. In the third part, we pass to the limit a first time to obtain optimality conditions on a countable dense subset of supp(µ). The last part of the proof is devoted to deriving, via a second limit-taking process, all the desired necessary conditions of the theorem statement. Since it is not restrictive to assume that supp(µ) = Ω, we shall make this assumption throughout the proof.
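For reference, the form of Ekeland's variational principle that is typically used in arguments of this kind is recalled below. This is the standard statement with the usual choice of radius √ε; the specific functional and the specific ε used in the paper's first step are not shown in this section, so X, J and ε are generic placeholders (in the present setting X would presumably be the control set with the Ekeland metric).

```latex
% Ekeland's variational principle (standard form, radius \sqrt{\varepsilon}).
% (X, d): complete metric space;
% J : X \to \mathbb{R} \cup \{+\infty\} lower semicontinuous,
% bounded below, not identically +\infty;  \varepsilon > 0.
\[
  J(u_0) \le \inf_{X} J + \varepsilon
  \ \Longrightarrow\
  \exists\, \bar{u} \in X:\quad
  \begin{cases}
    d(\bar{u}, u_0) \le \sqrt{\varepsilon},\\[2pt]
    J(\bar{u}) \le J(u_0),\\[2pt]
    J(\bar{u}) \le J(u) + \sqrt{\varepsilon}\, d(u, \bar{u})
      \quad \text{for all } u \in X .
  \end{cases}
\]
```

The last line says that ū is an exact minimizer of the perturbed functional u ↦ J(u) + √ε d(u, ū), which is what allows necessary conditions for a genuine minimizer to be applied to the auxiliary problems.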
Following the idea of Proposition 2.1, and dividing each term of the family of the costate arcs by the corresponding coefficient α^i_j (> 0) (without relabelling), we obtain that for each i large enough and µ_i-a.e. ω ∈ Ω, for all t ∈ A ρ ′ i and for any u ∈ U(t).

3.
We now derive consequences of the limit-taking for conditions (a1)′-(a3)′ of the previous step. Recall that from Lemma 5.2, we have a countable dense subset Ω̂ of Ω, such that Ω̂ = ∪_{i≥1} Ω_i, where Ω_i = {ω^i_j : j = 0, . . . , N_i} provides an increasing sequence of finite subsets of Ω. Since Ω̂ is a countable set, we can write it as the collection of the elements of a sequence {ω_k}_{k≥1} such that Ω̂ = {ω_k}_{k≥1}.
Fix i ∈ N. When we take ω_k ∈ Ω̂, two possible cases may occur: either ω_k ∈ Ω_i for the fixed i ∈ N; or ω_k ∈ Ω̂ \ Ω_i. In the first case, it means that there exists j ∈ {0, . . . , N_i} such that ω_k = ω^i_j and the corresponding adjoint arc p_i(., ω^i_j) satisfies conditions (a1)′-(a4)′. So, we can define the arc p_i(., ω_k) as follows: Therefore, by iterating on i, associated with each ω_k ∈ Ω̂, we can construct a sequence of families of arcs {p_i(., ω_k) : ω_k ∈ Ω̂}_{i≥1}. Observe that there always exists i_k ∈ N such that, for all i ≥ i_k, p_i(., ω_k) is an adjoint arc for which (a1)′-(a4)′ hold true. From (a3)′ and (A4)′ it immediately follows that the sequence {p_i(T, ω_k)} is uniformly bounded by k_g + 1. On the other hand (a2)′ and (A3)′ imply that {ṗ_i(., ω_k)} are uniformly integrably bounded. Then, the hypotheses are satisfied under which the Compactness Theorem [19, Theorem 2.5.3] is applicable to , ω_k)] for all t ∈ A ρ ′ i . We conclude that, along some subsequence (we do not relabel), for some p(., ω_k) ∈ W^{1,1} which satisfies (for the fixed k) We can also take the subsequence in such a manner that {λ_i} converges to some λ ∈ [0, 1]. Moreover, from the closure of the graph of the limiting subdifferential and the normal cone (seen as multifunctions), we have that −p(T, ω_k) ∈ λ∂_x g(x̄(T, ω_k); ω_k) + (1 − λ)∂d_{C(ω_k)}(x̄(T, ω_k)).
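The uniform bound k_g + 1 on the end-point values can be traced as follows. The computation is a sketch under the assumption that condition (a3)′ has the same shape as the limiting transversality condition above, with λ_i ∈ [0, 1] and with x_i denoting the trajectory of the i-th approximating problem (a label introduced here for illustration).

```latex
% Sketch, assuming (a3)' reads
%   -p_i(T,\omega_k) \in \lambda_i\, \partial_x g(x_i(T,\omega_k);\omega_k)
%                      + (1-\lambda_i)\, \partial d_{C(\omega_k)}(x_i(T,\omega_k)),
% with \lambda_i \in [0,1]; x_i is an illustrative label.
% g(\cdot,\omega) is Lipschitz with constant k_g, and a distance function
% is Lipschitz with constant 1, so (B = closed unit ball):
\[
  \partial_x g(x;\omega) \subset k_g\, B,
  \qquad
  \partial d_{C(\omega)}(x) \subset B .
\]
% Combining the two inclusions:
\[
  |p_i(T,\omega_k)|
  \le \lambda_i\, k_g + (1 - \lambda_i)
  \le k_g + 1 .
\]
```

Together with the integrable bound on the derivatives coming from (a2)′ and (A3)′, this is the pair of estimates that a compactness argument in W^{1,1} needs.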
Then, we obtain the desired extension setting p_j = p_j^+ − p_j^− and p(., .) = (p_1(., .), . . . , p_n(., .)). Clearly we have the following transversality condition: Finally, we derive a non-triviality condition for {p(., ω) : ω ∈ Ω}. This is immediate if λ = lim λ_i > 0, so we continue by examining the case in which λ = 0. Choose i_0 ∈ N such that for all i ≥ i_0, (k_g + 1)λ_i < 1/2. In particular, for all i ≥ i_0, from the Max Rule we have 1 − λ_i > 0, and using the fact that J_i(u_i) > 0, it follows that Then there exist j_i ∈ {0, 1, . . . , N_i} and ν ∈ R^n such that |ν| = 1 and Recalling that k_g > 0 is the Lipschitz constant of g(., ω), we have And from the choice of i_0 ∈ N, we obtain that and so In any case, we obtain the non-triviality condition
Owing to assumption (A2) and the Lipschitz continuity of f(t, ., u, ω), we obtain that (t, ω) ⇝ F(t, ω) is an L × B_Ω measurable multifunction with closed values. Using Castaing's Representation Theorem, we know that there exists a countable family of L × B_Ω measurable functions {f_j(t, ω)}_{j≥0}, such that in which E ⊂ [0, T] × Ω is a set of full measure. We can also assume that f_0(t, ω) = f(t, x̄(t, ω), ū(t), ω). For all j ≥ 1, define the multifunction which is the union of two L × B_Ω × B^m measurable sets. Now invoking Aumann's Measurable Selection Theorem, we deduce that U_j(., .) has a measurable selection v_j(t, ω) ∈ U_j(t, ω).
Proof of Theorem 3.2. A purely atomic measure necessarily has at most a countable support. We can therefore choose Ω̂ in such a manner that Ω̂ = supp(µ). The properties (i) and (iii) follow immediately by considering Steps 1, 2 and 3 of the proof of Theorem 3.1 and the costate arc p(., .) obtained there. On the other hand, the maximality condition (ii) can be deduced by contradiction, avoiding any use of the technical procedure of Step 4 of the proof of Theorem 3.1, which requires the construction of appropriate multifunctions and the use of selection theorems. We provide here the details of this 'new step 4', which allows us to obtain (ii). Consider the function (t, u) → Ψ(t, u) := Σ_{k≥0} µ(ω_k) p(t, ω_k) · [f(t, x̄(t, ω_k), u, ω_k) − f(t, x̄(t, ω_k), ū(t), ω_k)].
Using a standard argument, one can easily show that (t, u) → Ψ(t, u) is L × B^m-measurable.
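The role of Ψ can be made explicit. The block below is a sketch which assumes that the maximality condition (ii) of Theorem 3.2 is the averaged maximum condition with the integral over Ω replaced by a sum over the atoms ω_k; the statement of (ii) itself is not reproduced in this section.

```latex
% Sketch, assuming (ii) is the averaged maximality condition written as a
% sum over the atoms \omega_k with weights \mu(\{\omega_k\}).
% Note that \Psi(t,\bar{u}(t)) = 0 by construction, and \bar{u}(t) \in U(t).
\[
  \Psi(t,u) \le 0 \ \text{ for all } u \in U(t)
\]
\[
  \iff\quad
  \sum_{k \ge 0} \mu(\{\omega_k\})\, p(t,\omega_k)\cdot
     f(t,\bar{x}(t,\omega_k),\bar{u}(t),\omega_k)
  = \max_{u \in U(t)}
    \sum_{k \ge 0} \mu(\{\omega_k\})\, p(t,\omega_k)\cdot
     f(t,\bar{x}(t,\omega_k),u,\omega_k).
\]
```

Under this reading, proving (ii) amounts to showing that Ψ(t, u) ≤ 0 for all u ∈ U(t) and almost every t, and the measurability of Ψ is what makes the exceptional set of times a legitimate object for a contradiction argument.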