Optimal sampled-data control, and generalizations on time scales

In this paper, we derive a version of the Pontryagin maximum principle for general finite-dimensional nonlinear optimal sampled-data control problems. Our framework is actually much more general: we treat optimal control problems for which the state variable evolves on a given time scale (an arbitrary non-empty closed subset of R), and the control variable evolves on a smaller time scale. Sampled-data systems are then a particular case. Our proof is based on the construction of appropriate needle-like variations and on the Ekeland variational principle.


Introduction
Optimal control theory is concerned with the analysis of controlled dynamical systems, where one aims at steering such a system from a given configuration to some desired target by minimizing some criterion. The Pontryagin maximum principle (in short, PMP), established at the end of the 50's for general finite-dimensional nonlinear continuous-time dynamics (see [46], and see [30] for the history of this discovery), is certainly the milestone of classical optimal control theory. It provides a first-order necessary condition for optimality, by asserting that any optimal trajectory must be the projection of an extremal. The PMP then reduces the search of optimal trajectories to a boundary value problem posed on extremals. Optimal control theory, and in particular the PMP, has an immense field of applications in various domains, and it is not our aim here to list them.
We speak of a purely continuous-time optimal control problem when both the state q and the control u evolve continuously in time, and the control system under consideration has the form q̇(t) = f(t, q(t), u(t)), for a.e. t ∈ R_+, where q(t) ∈ R^n and u(t) ∈ Ω ⊂ R^m. Such models assume that the control is permanent, that is, the value of u(t) can be chosen at each time t ∈ R_+. We refer the reader to textbooks on continuous optimal control theory such as [4,13,14,18,20,21,33,42,43,46,47,49,50] for many examples of theoretical or practical applications.
We speak of a purely discrete-time optimal control problem when both the state q and the control u evolve in a discrete way in time, and the control system under consideration has the form q_{k+1} = f(k, q_k, u_k), k ∈ N, where q_k ∈ R^n and u_k ∈ Ω ⊂ R^m. As in the continuous case, such models assume that the control is permanent, that is, the value of u_k can be chosen at each time k ∈ N. A version of the PMP for such discrete-time control systems has been established in [32,39,41] under appropriate convexity assumptions. The considerable development of discrete-time control theory was in particular motivated by the need to consider digital systems or discrete approximations in numerical simulations of differential control systems (see the textbooks [12,24,45,49]). It can be noted that some early works devoted to the discrete-time PMP (like [27]) are mathematically incorrect. Some counterexamples were provided in [12] (see also [45]), showing that, as is now well known, the exact analogue of the continuous-time PMP does not hold at the discrete level. More precisely, the maximization condition of the continuous-time PMP cannot be expected to hold in general in the discrete-time case. Nevertheless, a weaker condition can be derived, in terms of a nonpositive gradient condition (see [12, Theorem 42.1]).
We speak of an optimal sampled-data control problem when the state q evolves continuously in time, whereas the control u evolves in a discrete way in time. This hybrid situation is often considered in practice for problems in which the evolution of the state is very quick (and thus can be considered continuous) with respect to that of the control. We often speak, in that case, of digital control. This refers to a situation where, due for instance to hardware limitations or to technical difficulties, the value u(t) of the control can be chosen only at times t = kT, where T > 0 is fixed and k ∈ N. This means that, once the value u(kT) is fixed, u(t) remains constant over the time interval [kT, (k + 1)T). Hence the trajectory q evolves according to q̇(t) = f(t, q(t), u(kT)), for a.e. t ∈ [kT, (k + 1)T), k ∈ N.
In other words, this sample-and-hold procedure consists of "freezing" the value of u at each controlling time t = kT on the corresponding sampling time interval [kT, (k + 1)T ), where T is called the sampling period.In this situation, the control of the system is clearly nonpermanent.
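The sample-and-hold mechanism described above is easy to illustrate numerically. The following minimal sketch (hypothetical scalar dynamics and sampling values, explicit Euler integration) freezes the control value u(kT) on each sampling interval [kT, (k + 1)T):

```python
def simulate_sampled(f, q0, T, t_final, u_samples, dt=1e-3):
    """Euler integration of q'(t) = f(t, q, u(kT)) with a zero-order hold:
    u_samples[k] is frozen on the sampling interval [kT, (k+1)T)."""
    t, q = 0.0, q0
    traj = [(t, q)]
    while t < t_final:
        k = int(t // T)          # k = E(t/T), the current sampling index
        u = u_samples[k]         # control frozen on the sampling interval
        q = q + dt * f(t, q, u)  # explicit Euler step for the state
        t += dt
        traj.append((t, q))
    return traj

# Hypothetical scalar example: q' = -q + u, sampling period T = 0.5.
traj = simulate_sampled(lambda t, q, u: -q + u, q0=1.0, T=0.5,
                        t_final=2.0, u_samples=[0.0, 1.0, 0.0, 1.0, 1.0])
```

Here `u_samples[k]` plays the role of u(kT); refining `dt` only refines the integration of the state, not the sampling of the control.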
To the best of our knowledge, classical optimal control theory does not treat general nonlinear optimal sampled-data control problems, but concerns either purely continuous-time or purely discrete-time optimal control problems. It is one of our objectives to derive, in this paper, a PMP which can be applied to general nonlinear optimal sampled-data control problems.
Actually, we will be able to establish a PMP in the much more general framework of time scales, which unifies and extends continuous-time and discrete-time issues.But, before coming to that point, we feel that it is of interest to enunciate a PMP in the particular case of sampled-data systems.
PMP for optimal sampled-data control problems. Let n, m and j be nonzero integers. Let T > 0 be an arbitrary sampling period. In what follows, for any real number t, we denote by E(t) the integer part of t, defined as the unique integer such that E(t) ≤ t < E(t) + 1. Note that k = E(t/T) whenever kT ≤ t < (k + 1)T. We consider the general nonlinear optimal sampled-data control problem

(OSDCP)_NT: minimize ∫_0^{t_f} f^0(τ, q(τ), u(kT)) dτ, with k = E(τ/T),
subject to q̇(t) = f(t, q(t), u(kT)), with k = E(t/T),
u(kT) ∈ Ω, g(q(0), q(t_f)) ∈ S.
Here, f : R × R^n × R^m → R^n and f^0 : R × R^n × R^m → R are continuous, and of class C^1 in (q, u), g : R^n × R^n → R^j is of class C^1, and Ω (resp., S) is a non-empty closed convex subset of R^m (resp., of R^j). The final time t_f ≥ 0 can be fixed or not.
Note that, under appropriate (usual) compactness and convexity assumptions, the optimal control problem (OSDCP)_NT has at least one solution (see Theorem 2 in Section 2.2). Recall that g is said to be submersive at a point (q_1, q_2) ∈ R^n × R^n if the differential of g at this point is surjective. We define the Hamiltonian H : R × R^n × R^n × R × R^m → R, as usual, by H(t, q, p, p^0, u) = ⟨p, f(t, q, u)⟩_{R^n} + p^0 f^0(t, q, u).
In the case where kT ∈ [0, t f ) with (k + 1)T > t f , the above maximization condition is still valid provided (k + 1)T is replaced with t f .
• Transversality condition on the final time: If the final time is left free in the optimal control problem (OSDCP)_NT, if t_f > 0 and if f and f^0 are of class C^1 with respect to t in a neighborhood of t_f, then the nontrivial couple (p, p^0) can moreover be selected to satisfy H(t_f, q(t_f), p(t_f), p^0, u(kT)) = 0, where k = E(t_f/T) whenever t_f ∉ NT, and k = E(t_f/T) − 1 whenever t_f ∈ NT.
Note that the only difference with the usual statement of the PMP for purely continuous-time optimal control problems is in the maximization condition. Here, for sampled-data control systems, the usual pointwise maximization condition of the Hamiltonian is replaced with the more complicated inequality (1). This is not a surprise, because already in the purely discrete case, as mentioned earlier, the pointwise maximization condition fails to be true in general, and must be replaced with a weaker condition.
The terms h_y((k + 1)T) and h^0_y in the maximization condition (1) depend on the value of u only at the controlling time t = kT (see (2) and (3)). As a consequence, the condition (1), which is satisfied for every y ∈ Ω, gives a necessary condition allowing one to compute u(kT) in general, and this for all controlling times kT ∈ [0, t_f). It follows that, despite appearances, the conditions (1)-(2)-(3) are not difficult to exploit. We will provide in Section 3.1 a simple optimal consumption problem with sampled-data control, and show how these computations can be carried out in a simple way.
Note that the optimal sampled-data control problem (OSDCP)_NT can of course be seen as a finite-dimensional optimization problem where the unknowns are the values u(kT), with k ∈ N such that kT ∈ [0, t_f). The same remark holds, by the way, for purely discrete-time optimal control problems. One could then apply classical Lagrange multiplier (or KKT) rules to such optimization problems with constraints (numerically, this leads to direct methods). The Pontryagin maximum principle is a far-reaching version of the Lagrange multiplier rule, yielding more precise information and reducing the initial optimal control problem to a shooting problem (see, e.g., [51] for such a discussion).
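To illustrate this finite-dimensional viewpoint, here is a minimal sketch (a hypothetical toy problem, not the one studied in this paper) in which the only unknowns are the two sampled control values u(0) and u(T); they are optimized by brute-force grid search over a discretization of Ω = [−1, 1], which is the crudest possible direct method:

```python
import itertools

def cost(u_samples, T=1.0, dt=1e-2, q0=1.0, t_final=2.0):
    """Cost of a toy sampled problem  min ∫ (q^2 + u^2) dτ  with  q' = u(kT),
    q(0) = 1, evaluated by explicit Euler integration (illustrative only)."""
    t, q, J = 0.0, q0, 0.0
    while t < t_final - 1e-12:
        u = u_samples[int(t // T)]   # zero-order hold on the sampled control
        J += dt * (q * q + u * u)    # accumulate the running cost
        q += dt * u                  # Euler step for the state
        t += dt
    return J

# The unknowns are just the finitely many values u(0), u(T): grid search.
grid = [i / 20 - 1.0 for i in range(41)]            # Ω = [-1, 1] discretized
best = min(itertools.product(grid, repeat=2), key=cost)
```

The optimal first value is negative (it pulls q toward 0), and any reasonable candidate beats the do-nothing control; a real direct method would of course use a gradient-based NLP solver instead of a grid.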
Extension to the time scale framework. In this paper, we actually establish a version of Theorem 1 in a much more general framework, allowing one, for example, to study sampled-data control systems where the control can be permanent on a first time interval, then sampled on a finite set, then permanent again, etc. More precisely, Theorem 1 can be extended to a general framework in which the set of controlling times is not NT but some arbitrary non-empty closed subset of R (i.e., a time scale), and in which the state may also evolve on another time scale. We will state a PMP for such general systems in Section 2.3 (see Theorem 3). Since such systems, where we have two time scales (one for the state and one for the control), can be viewed as a generalization of sampled-data control systems, we will refer to them as sampled-data control systems on time scales.
Let us first recall and motivate the notion of time scale. The time scale theory was introduced in [34] in order to unify discrete and continuous analysis. By definition, a time scale T is an arbitrary non-empty closed subset of R, and a dynamical system is said to be posed on the time scale T whenever the time variable evolves along this set T. The time scale theory aims at closing the gap between continuous and discrete cases, and allows one to treat general processes involving both continuous-time and discrete-time variables. The purely continuous-time case corresponds to T = R_+ and the purely discrete-time case corresponds to T = N. But a time scale can be much more general (see, e.g., [29,44] for a study of a seasonally breeding population whose generations do not overlap, and see [6] for applications to economics), and can even be a Cantor set. Many notions of standard calculus have been extended to the time scale framework, and we refer the reader to [1,2,10,11] for details on that theory.
The theory of the calculus of variations on time scales, initiated in [8], has been well studied in the existing literature (see, e.g., [7,9,17,28,35,38]). In [36,37], the authors establish a weak version of the PMP (with a nonpositive gradient condition) for control systems defined on general time scales. In [16], we derived a strong version of the PMP, in a very general time scale setting, encompassing both the purely continuous-time PMP (with a maximization condition) and the purely discrete-time PMP (with a nonpositive gradient condition).
All these works are concerned with control systems defined on general time scales with permanent control. The main objective of the present paper is to handle control systems defined on general time scales with nonpermanent control, which we refer to as sampled-data control systems on time scales, and for which we assume that the state and the control are allowed to evolve on different time scales (the time scale of the control being a subset of the time scale of the state). This framework is the natural extension of the classical sampled-data setting, and allows us to treat simultaneously many sampled-data control situations.
Our main result is a PMP for general finite-dimensional nonlinear optimal sampled-data control problems on time scales. Our proof is based on the construction of appropriate needle-like variations and on the Ekeland variational principle. In the case of a permanent control, our statement encompasses the time scale version of the PMP obtained in [16], and a fortiori it also encompasses the classical continuous and discrete versions of the PMP.
Organization of the paper. In Section 2, after having recalled several basic facts of time scale calculus, we define a general nonlinear optimal sampled-data control problem on time scales, and we state a Pontryagin maximum principle (Theorem 3) for such problems. Section 3 is devoted to some applications of Theorem 3 and further comments. Section 4 is devoted to the proof of Theorem 3.

Main result
Let T be a time scale, that is, an arbitrary non-empty closed subset of R. Without loss of generality, we assume that T is bounded below, with a = min T, and unbounded above. Throughout the paper, T will be the time scale on which the state of the control system evolves.
We start the section by recalling some useful notations and basic results of time scale calculus, in particular the notion of Lebesgue ∆-measure and of absolutely continuous function within the time scale setting.The reader already acquainted with time scale calculus may jump directly to Section 2.2.

Preliminaries on time scale calculus
The forward jump operator σ : T → T is defined by σ(t) = inf{s ∈ T | s > t} for every t ∈ T. A point t ∈ T is said to be right-scattered whenever σ(t) > t, and right-dense whenever σ(t) = t. We denote by RS the set of all right-scattered points of T, and by RD the set of all right-dense points of T. Note that RS is at most countable (see [23, Lemma 3.1]) and that RD is the complement of RS in T. The graininess function µ : T → R_+ is defined by µ(t) = σ(t) − t for every t ∈ T.
For every subset A of R, we denote A_T = A ∩ T. An interval of T is a set of the form I_T where I is an interval of R. For every b ∈ T\{a} and every s ∈ [a, b)_T ∩ RD, we set V^{s,b} = {β ≥ 0 | s + β ∈ [s, b]_T}. (4) Note that 0 is not isolated in V^{s,b}.

∆-differentiability. Let n ∈ N*. The notations ‖·‖_{R^n} and ⟨·, ·⟩_{R^n} respectively stand for the usual Euclidean norm and scalar product of R^n. A function q : T → R^n is said to be ∆-differentiable at t ∈ T if the limit q^∆(t) = lim_{s→t, s∈T} (q^σ(t) − q(s))/(σ(t) − s) exists, where q^σ = q ∘ σ. Recall that, if t ∈ RD, then q is ∆-differentiable at t if and only if the limit of (q(t) − q(s))/(t − s) as s → t, s ∈ T, exists; in that case it is equal to q^∆(t). If t ∈ RS and if q is continuous at t, then q is ∆-differentiable at t, and q^∆(t) = (q^σ(t) − q(t))/µ(t) (see [10]).
If q, q′ : T → R^n are both ∆-differentiable at t ∈ T, then the scalar product ⟨q, q′⟩_{R^n} is ∆-differentiable at t and ⟨q, q′⟩_{R^n}^∆(t) = ⟨q^∆(t), q′^σ(t)⟩_{R^n} + ⟨q(t), q′^∆(t)⟩_{R^n} = ⟨q^σ(t), q′^∆(t)⟩_{R^n} + ⟨q^∆(t), q′(t)⟩_{R^n}. (5) These equalities are usually called Leibniz formulas (see [10, Theorem 1.20]).
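For readers who prefer a computational picture, the operators σ and µ, and the ∆-derivative at a right-scattered point, can be sketched as follows on a finite list of points (a finite truncation, since an actual time scale here is unbounded above; the time scale and the function are purely illustrative):

```python
def sigma(T_scale, t):
    """Forward jump operator on a finite, sorted time scale:
    sigma(t) = inf{s in T : s > t} (with sigma(max T) = max T)."""
    later = [s for s in T_scale if s > t]
    return min(later) if later else t

def mu(T_scale, t):
    """Graininess mu(t) = sigma(t) - t; positive at right-scattered points."""
    return sigma(T_scale, t) - t

def delta_derivative(q, T_scale, t):
    """Delta-derivative at a right-scattered point:
    q^Delta(t) = (q(sigma(t)) - q(t)) / mu(t)."""
    assert mu(T_scale, t) > 0, "formula valid at right-scattered points only"
    return (q(sigma(T_scale, t)) - q(t)) / mu(T_scale, t)

# A hybrid time scale: [0, 1] sampled finely, then isolated points 2 and 3.
T_scale = [i / 10 for i in range(11)] + [2.0, 3.0]
# For q(t) = t^2 at the right-scattered point t = 2: (9 - 4) / 1 = 5.
```

At right-dense points the ∆-derivative reduces to the usual limit of difference quotients, which a finite list cannot capture; this sketch only illustrates the right-scattered formula q^∆(t) = (q^σ(t) − q(t))/µ(t).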
Lebesgue ∆-measure and Lebesgue ∆-integrability. Let µ_∆ be the Lebesgue ∆-measure on T defined in terms of the Carathéodory extension in [11, Chapter 5]. We also refer the reader to [3,5,23,31] for more details on the µ_∆-measure theory. For all (c, d) ∈ T^2 with c ≤ d, one has µ_∆([c, d)_T) = d − c. Let A ⊂ T. A property is said to hold ∆-almost everywhere (in short, ∆-a.e.) on A if it holds for every t ∈ A\A′, where A′ ⊂ A is some µ_∆-measurable subset of T satisfying µ_∆(A′) = 0. In particular, since µ_∆({r}) = µ(r) > 0 for every r ∈ RS, we conclude that, if a property holds ∆-a.e. on A, then it holds at every r ∈ A ∩ RS. Let n ∈ N* and let A ⊂ T be a µ_∆-measurable subset of T. Consider a function q defined ∆-a.e. on A with values in R^n. Let Ã = A ∪ ⋃_{r∈A∩RS} (r, σ(r)), and let q̃ be the extension of q defined µ_L-a.e. on Ã by q̃(t) = q(t) whenever t ∈ A, and by q̃(t) = q(r) whenever t ∈ (r, σ(r)), for every r ∈ A ∩ RS. Recall that q is µ_∆-measurable on A if and only if q̃ is µ_L-measurable on Ã (see [23, Proposition 4.1]).
Let n ∈ N* and let A ⊂ T be a µ_∆-measurable subset of T. The functional space L^∞_T(A, R^n) is the set of all functions q defined ∆-a.e. on A, with values in R^n, that are µ_∆-measurable on A and bounded ∆-almost everywhere. Endowed with the norm ‖q‖_{L^∞_T(A,R^n)} = ess sup_{τ∈A} ‖q(τ)‖_{R^n}, it is a Banach space (see [3, Theorem 2.5]). The functional space L^1_T(A, R^n) is the set of all functions q defined ∆-a.e. on A, with values in R^n, that are µ_∆-measurable on A and such that ∫_A ‖q(τ)‖_{R^n} ∆τ < +∞. Endowed with the norm ‖q‖_{L^1_T(A,R^n)} = ∫_A ‖q(τ)‖_{R^n} ∆τ, it is a Banach space (see [3, Theorem 2.5]). We recall here that, if q ∈ L^1_T(A, R^n), then ∫_A q(τ) ∆τ = ∫_Ã q̃(τ) dτ.

Absolutely continuous functions.
Let c, d ∈ T with c < d and let t_0 ∈ [c, d]_T. Assume that q ∈ L^1_T([c, d)_T, R^n), and let Q be the function defined on [c, d]_T by Q(t) = ∫_{[t_0,t)_T} q(τ) ∆τ whenever t ≥ t_0, and by Q(t) = −∫_{[t,t_0)_T} q(τ) ∆τ whenever t ≤ t_0. Then Q is absolutely continuous on [c, d]_T and Q^∆ = q ∆-a.e. on [c, d)_T. Recall that, if q ∈ AC([c, d]_T, R^n) satisfies q^∆ = 0 ∆-a.e. on [c, d)_T, then q is constant on [c, d]_T, and that, if q, q′ ∈ AC([c, d]_T, R^n), then ⟨q, q′⟩_{R^n} ∈ AC([c, d]_T, R) and the Leibniz formula (5) is available ∆-a.e. on [c, d)_T.
For every q ∈ L^1_T([c, d)_T, R^n), we denote by L_{[c,d)_T}(q) the set of ∆-Lebesgue points of q in [c, d)_T. It satisfies µ_∆(L_{[c,d)_T}(q)) = µ_∆([c, d)_T), and lim_{β→0, β∈V^{s,d}} (1/β) ∫_{[s,s+β)_T} q(τ) ∆τ = q(s) for every s ∈ L_{[c,d)_T}(q) ∩ RD, where V^{s,d} is defined by (4).

Optimal sampled-data control problems on time scales
Let T_1 be another time scale. Throughout the paper, T_1 will be the time scale on which the control evolves. We assume that T_1 ⊂ T.^2 Similarly to T, we assume that min T_1 = a and that T_1 is unbounded above. As in the previous paragraph, we introduce the notations σ_1, RS_1, RD_1, V^{s,b}_1, ∆_1, etc., associated with the time scale T_1. Since T_1 ⊂ T, note that RS ⊂ RS_1 and RD_1 ⊂ RD. We define the map Φ : T → T_1 by Φ(t) = max{s ∈ T_1 | s ≤ t}. For every t ∈ T_1, we have Φ(t) = t. For every t ∈ T\T_1, we have Φ(t) ∈ RS_1 and Φ(t) < t < σ_1(Φ(t)). In what follows, given a function u : T_1 → R, we denote by u^Φ the composition u ∘ Φ : T → R. Of course, when dealing with functions having multiple components, this composition is applied to each component. Let us mention, at this step, that if u ∈ L^∞_{T_1}(T_1, R) then u^Φ ∈ L^∞_T(T, R) (see Proposition 1 and more properties in Section 4.1.1).
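The map Φ and the sampled control u^Φ can be sketched numerically as follows (a finite truncation of the controlling times T_1 = NT with T = 0.5, and a hypothetical control u, are used purely for illustration):

```python
def Phi(T1, t):
    """Phi(t) = max{s in T1 : s <= t} -- the latest controlling time not
    exceeding t (T1 is a sorted list here; assumes min(T1) <= t)."""
    return max(s for s in T1 if s <= t)

def u_Phi(u, T1):
    """Lift a control u : T1 -> R to the state time scale via u o Phi."""
    return lambda t: u(Phi(T1, t))

# Sampled case T = R_+, T1 = NT with T = 0.5: Phi freezes the control.
T1 = [0.5 * k for k in range(10)]
u = lambda s: s * s            # hypothetical control defined on T1
uP = u_Phi(u, T1)
# uP(0.7) = u(0.5), since Phi(0.7) = 0.5
```

On T_1 itself Φ is the identity, so `uP` agrees with `u` at every controlling time, and between two controlling times it holds the last chosen value, exactly the sample-and-hold behavior of the introduction.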
Let n, m and j be nonzero integers. We consider the general nonlinear optimal sampled-data control problem on time scales

(OSDCP)^T_{T_1}: minimize ∫_{[a,b)_T} f^0(τ, q(τ), u^Φ(τ)) ∆τ,
subject to q^∆(t) = f(t, q(t), u^Φ(t)), ∆-a.e. t ∈ [a, b)_T, (6)
u(t) ∈ Ω, ∆_1-a.e. t ∈ [a, b)_{T_1}, g(q(a), q(b)) ∈ S.

Here, the trajectory of the system is q : T → R^n, the mappings f : T × R^n × R^m → R^n and f^0 : T × R^n × R^m → R are continuous and of class C^1 in (q, u), g : R^n × R^n → R^j is of class C^1, and Ω (resp., S) is a non-empty closed convex subset of R^m (resp., of R^j). The final time b ∈ T can be fixed or not.
Remark 1. We recall that, given u ∈ L^∞_{T_1}(T_1, R^m), we say that q is a solution of (6) on I_T if: 1. I_T is an interval of T satisfying a ∈ I_T and I_T\{a} ≠ ∅; 2. q is absolutely continuous on I_T and satisfies (6) ∆-a.e. on I_T.

2 Indeed, it is not natural to consider controlling times t ∈ T_1 at which the dynamics does not evolve, that is, at which t ∉ T. The value of the control at such times t ∈ T_1\T would not influence the dynamics, or, maybe, only on [t*, +∞)_T where t* = inf{s ∈ T | s ≥ t}. In this last case, note that t* ∈ T and we can replace T_1 by (T_1 ∪ {t*})\{t} without loss of generality.
Existence and uniqueness of solutions (Cauchy-Lipschitz theorem on time scales) have been established in [15], and useful results are recalled in Section 4.1.2.
Remark 2. The time scale T_1 stands for the set of controlling times of the control system (6). If T = T_1, then the control is permanent. The case T = T_1 = R_+ corresponds to the classical continuous case, whereas T = T_1 = N coincides with the classical discrete case. If T_1 ⊊ T, the control is nonpermanent and sampled. In that case, the sampling times are the t ∈ RS_1 such that σ(t) < σ_1(t), and the corresponding sampling time intervals are given by [t, σ_1(t))_T. The classical optimal sampled-data control problem (OSDCP)_NT investigated in Theorem 1 corresponds to T = R_+ and T_1 = NT, with T > 0.
Remark 3. Let us consider two optimal control problems (OSDCP)^T_{T_1} and (OSDCP)^T_{T_2}, posed on the same general time scale T for the trajectories, but with two different sets of controlling times T_1 and T_2, and let us assume that T_2 ⊂ T_1. We denote by Φ_1 and Φ_2 the corresponding mappings from T to T_1 and T_2 respectively. If u_1 ∈ L^∞_{T_1}(T_1, Ω) is an optimal control for (OSDCP)^T_{T_1} and if there exists u_2 ∈ L^∞_{T_2}(T_2, Ω) such that u_2^{Φ_2}(t) = u_1^{Φ_1}(t) for ∆-a.e. t ∈ T, it is clear that u_2 is an optimal control for (OSDCP)^T_{T_2}. We refer to Section 3.1 for examples.

Remark 4. The framework of (OSDCP)^T_{T_1} encompasses optimal parameter problems. Indeed, let us consider the parametrized dynamical system q^∆(t) = f(t, q(t), λ), ∆-a.e. t ∈ T, (7) with λ ∈ Ω. Then, choosing a set of controlling times T_1 with no element in (a, b), the control system (6) coincides with (7), where u(a) plays the role of λ. In this situation, Theorem 3 (stated in Section 2.3) provides necessary conditions on optimal parameters λ. We refer to Section 3.1 for examples.
Remark 5. A possible extension is to study dynamical systems on time scales with several sampled-data controls, each with its own set of controlling times: q^∆(t) = f(t, q(t), u_1^{Φ_1}(t), u_2^{Φ_2}(t)), ∆-a.e. t ∈ T, where T_1 and T_2 are general time scales contained in T, and Φ_1 and Φ_2 are the corresponding mappings from T to T_1 and T_2. Our main result (Theorem 3) can easily be extended to this framework. Actually, this multiscale version will be useful in order to derive the transversality condition on the final time (see Remark 30).

Remark 6. Another possible extension is to study dynamical systems on time scales with sampled-data control where the state q and the constraint function f are also sampled, that is, q^∆(t) = f(Φ_3(t), q^{Φ_2}(t), u^{Φ_1}(t)), ∆-a.e. t ∈ T, where T_1, T_2 and T_3 are general time scales contained in T, and Φ_1, Φ_2 and Φ_3 are the corresponding mappings from T to T_1, T_2 and T_3 respectively. In particular, the setting of [16] corresponds to the above framework with T = R_+ and T_1 = T_2 = T_3 a general time scale.
Although this is not the main objective of our paper, we provide hereafter a result stating the existence of optimal solutions for (OSDCP)^T_{T_1} under appropriate compactness and convexity assumptions. Moreover, once the existence of a solution is established, the necessary conditions provided in Theorem 3, which allow one to compute explicitly optimal sampled-data controls, may prove the uniqueness of the optimal solution. We refer to Section 3.1 for examples.
Let M stand for the set of trajectories q, associated with b ∈ T and with a sampled-data control u ∈ L^∞_{T_1}(T_1, Ω), satisfying (6) ∆-a.e. on [a, b)_T and g(q(a), q(b)) ∈ S. We define the set of extended velocities W(t, q) = {(f(t, q, u), f^0(t, q, u)) | u ∈ Ω} for every (t, q) ∈ T × R^n.

Theorem 2. If Ω is compact, M is non-empty, ‖q‖_∞ ≤ M for every q ∈ M for some M ≥ 0, and if W(t, q) is convex for every (t, q) ∈ T × R^n, then (OSDCP)^T_{T_1} has at least one optimal solution.

The proof of Theorem 2 is done in Section 4.4. Note that, in this theorem, it suffices to assume that g is continuous. Besides, the assumption on the boundedness of trajectories can be weakened, by assuming, for instance, that the extended dynamics have a sublinear growth at infinity (see, e.g., [25]; many other easy and standard extensions are possible).

Preliminaries on convexity
The orthogonal of the closed convex set S at a point x ∈ S is defined by O_S[x] = {x′ ∈ R^j | ⟨x′, x″ − x⟩_{R^j} ≤ 0 for every x″ ∈ S}. It is a closed convex cone containing 0.
We denote by d_S the distance function to S, defined by d_S(x) = inf_{x′∈S} ‖x − x′‖_{R^j} for every x ∈ R^j. Recall that, for every x ∈ R^j, there exists a unique element P_S(x) ∈ S (the projection of x onto S) such that d_S(x) = ‖x − P_S(x)‖_{R^j}. It is characterized by the property ⟨x − P_S(x), x′ − P_S(x)⟩_{R^j} ≤ 0 for every x′ ∈ S. In particular, x − P_S(x) ∈ O_S[P_S(x)]. The function P_S is 1-Lipschitz continuous. We recall the following obvious lemmas.
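As a concrete illustration, for a box S = ∏_i [lo_i, hi_i] the projection P_S is componentwise clipping, and the variational characterization above can be checked numerically (the point x, the bounds and the sample points of S below are purely illustrative):

```python
def project_box(x, lo, hi):
    """Projection P_S onto the box S = prod_i [lo_i, hi_i]: clip each
    coordinate to its interval (this is the unique closest point of S)."""
    return [min(max(xi, l), h) for xi, l, h in zip(x, lo, hi)]

def check_characterization(x, px, pts_in_S):
    """Variational characterization of the projection:
    <x - P_S(x), y - P_S(x)> <= 0 for every y in S (checked on samples)."""
    return all(
        sum((xi - pi) * (yi - pi) for xi, pi, yi in zip(x, px, y)) <= 1e-12
        for y in pts_in_S)

x = [2.0, -3.0]
px = project_box(x, lo=[0.0, 0.0], hi=[1.0, 1.0])    # -> [1.0, 0.0]
sample_pts = [[0.0, 0.0], [1.0, 1.0], [0.5, 0.3]]    # some points of S
```

The vector x − P_S(x) then lies in the cone O_S[P_S(x)] of the previous paragraph: it makes an obtuse (or right) angle with every direction pointing from P_S(x) into S.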

Main result
Recall that g is said to be submersive at a point (q_1, q_2) ∈ R^n × R^n if the differential of g at this point is surjective. We define the Hamiltonian H : T × R^n × R^n × R × R^m → R by H(t, q, p, p^0, u) = ⟨p, f(t, q, u)⟩_{R^n} + p^0 f^0(t, q, u).

Theorem 3 (Pontryagin maximum principle for (OSDCP)^T_{T_1}). If a trajectory q, defined on [a, b]_T and associated with a sampled-data control u ∈ L^∞_{T_1}(T_1, Ω), is an optimal solution of (OSDCP)^T_{T_1}, then there exists a nontrivial couple (p, p^0), where p ∈ AC([a, b]_T, R^n) (called the adjoint vector) and p^0 ≤ 0, such that the following conditions hold:

• Extremal equations: q^∆(t) = ∂H/∂p (t, q(t), p^σ(t), p^0, u^Φ(t)), p^∆(t) = −∂H/∂q (t, q(t), p^σ(t), p^0, u^Φ(t)), (8) for ∆-a.e. t ∈ [a, b)_T.
• Maximization condition: for every r ∈ [a, b)_{T_1} ∩ RS_1 and every y ∈ Ω, the inequality (9) holds, where h_y is the unique solution on [r, σ_1(r)]_T of the corresponding linear ∆-Cauchy problem. In the case where r ∈ [a, b)_{T_1} ∩ RS_1 with σ_1(r) > b, the above maximization condition is still valid provided σ_1(r) is replaced with b.
• Transversality conditions on the adjoint vector: If g is submersive at (q(a), q(b)), then the nontrivial couple (p, p^0) can be selected to satisfy p(a) = −(∂g/∂q_1 (q(a), q(b)))^⊤ ψ and p(b) = (∂g/∂q_2 (q(a), q(b)))^⊤ ψ, where ψ ∈ R^j is such that −ψ ∈ O_S[g(q(a), q(b))].

• Transversality condition on the final time: If the final time is left free in the optimal control problem (OSDCP)^T_{T_1}, if b belongs to the interior of T (for the topology of R), and if f and f^0 are of class C^1 with respect to t in a neighborhood of b, then the nontrivial couple (p, p^0) can moreover be selected such that the Hamiltonian function t ↦ H(t, q(t), p(t), p^0, u^Φ(t)) coincides almost everywhere, in some neighborhood of b, with a continuous function vanishing at t = b.
In particular, if u^Φ has a left-limit at b (denoted by u^Φ(b−)), then the transversality condition can be written as H(b, q(b), p(b), p^0, u^Φ(b−)) = 0.

Theorem 3 is proved in Section 4. Several remarks are in order.
Remark 7. As is well known, the nontrivial couple (p, p^0) of Theorem 3, which is a Lagrange multiplier, is defined up to a multiplicative scalar. Defining as usual an extremal as a quadruple (q, p, p^0, u) solution of the extremal equations (8), an extremal is said to be normal whenever p^0 ≠ 0 and abnormal whenever p^0 = 0. In the normal case p^0 ≠ 0, it is usual to normalize the Lagrange multiplier so that p^0 = −1.
Remark 8. Theorem 3 encompasses the time scale version of the PMP derived in [16] when the control is permanent, that is, when T_1 = T. In that case u^Φ = u, and the condition (9) can be written as the nonpositive gradient condition ∂H/∂u (r, q(r), p(σ(r)), p^0, u(r)) (y − u(r)) ≤ 0, for every y ∈ Ω. Moreover, in the case of a free final time, under the assumptions made in the fourth item of Theorem 3, b also belongs to the interior of T_1 = T, and then in that case we recover the classical condition max_{z∈Ω} H(b, q(b), p(b), p^0, z) = 0.
A fortiori, Theorem 3 encompasses both the classical continuous-time and discrete-time versions of the PMP, that is respectively, when T = T 1 = R + and T = T 1 = N.
Remark 9. If u^Φ is piecewise constant in a neighborhood of b (and thus, in particular, u^Φ has a left-limit at b), then H(b, q(b), p(b), p^0, u(ρ_1(b))) = 0, where ρ_1(b) denotes the largest controlling time less than b. This is similar to the situation of Theorem 1.
Remark 10. Let us describe some typical situations of terminal conditions g(q(a), q(b)) ∈ S in (OSDCP)^T_{T_1}, and the corresponding transversality conditions on the adjoint vector.
• If the initial and final points are fixed in (OSDCP)^T_{T_1}, that is, if we impose q(a) = q_a and q(b) = q_b, then j = 2n, g(q_1, q_2) = (q_1, q_2) and S = {q_a} × {q_b}. In that case, the transversality conditions on the adjoint vector give no additional information.
• If the initial point is fixed, that is, if we impose q(a) = q_a, and if the final point is left free in (OSDCP)^T_{T_1}, then j = n, g(q_1, q_2) = q_1 and S = {q_a}. In that case, the transversality conditions on the adjoint vector imply that p(b) = 0. Moreover, we have p^0 ≠ 0, and we can normalize the Lagrange multiplier so that p^0 = −1 (see Remark 7).
• If the initial point is fixed, that is, if we impose q(a) = q_a, and if the final point is subject to the constraint G(q(b)) ∈ (R_+)^k, where G : R^n → R^k is of class C^1, then j = n + k, g(q_1, q_2) = (q_1, G(q_2)) and S = {q_a} × (R_+)^k. In that case, the transversality conditions on the adjoint vector imply sign and complementary slackness conditions on the corresponding multiplier.

• If the periodic condition q(a) = q(b) is imposed in (OSDCP)^T_{T_1}, then j = n, g(q_1, q_2) = q_1 − q_2 and S = {0}. In that case, the transversality conditions on the adjoint vector yield that p(a) = p(b).
We stress that, in all examples above, the function g is indeed a submersion.
Remark 11. In the case where g is not submersive at (q(a), q(b)), to obtain transversality conditions on the adjoint vector, Theorem 3 can be reformulated as follows: If a trajectory q, defined on [a, b]_T and associated with a sampled-data control u ∈ L^∞_{T_1}(T_1, Ω), is an optimal solution of (OSDCP)^T_{T_1}, then there exists a nontrivial couple (ψ, p^0) ∈ R^j × R, with −ψ ∈ O_S[g(q(a), q(b))] and p^0 ≤ 0, and there exists p ∈ AC([a, b]_T, R^n), such that the extremal equations, the maximization condition and the transversality conditions are satisfied.
However, with this formulation, the couple (p, p^0) may be trivial and, as a consequence, the result may not provide any information. We refer to Sections 4.3.3 and 4.3.4 for more details.
Remark 12. In this paper, Ω is assumed to be convex only for technical reasons and in order to simplify notations. The convexity assumption can actually be removed by using the concept of stable Ω-dense directions (see [16, Section 2.2] for details).
On the other hand, the closedness of Ω is used in a crucial way in our proof of the PMP. Indeed, it allows us to define the Ekeland functional on a complete metric space (see Section 4.3.2). However, if the initial point is fixed, that is, if we impose q(a) = q_a, and if the final point is left free in (OSDCP)^T_{T_1}, then Theorem 3 can be proved with a simple calculus of variations, without using the Ekeland variational principle. In this particular case, the closedness assumption can be removed.
Remark 13. If the cost functional to be minimized in (OSDCP)^T_{T_1} comprises an additional terminal cost term of the form ℓ(b, q(a), q(b)), with ℓ of class C^1, then the transversality conditions on the adjoint vector become p(a) = −(∂g/∂q_1)^⊤ ψ + p^0 ∂ℓ/∂q_1 and p(b) = (∂g/∂q_2)^⊤ ψ + p^0 ∂ℓ/∂q_2, the partial derivatives of ℓ being evaluated at (b, q(a), q(b)). Moreover, in the fourth item of Theorem 3, if ℓ is of class C^1 in a neighborhood of b, the transversality condition on the final time must be replaced by: the nontrivial couple (p, p^0) can be selected such that the Hamiltonian function t ↦ H(t, q(t), p(t), p^0, u^Φ(t)) coincides almost everywhere, in some neighborhood of b, with a continuous function whose value at t = b is determined by the partial derivative of ℓ with respect to its first variable. To prove this claim, it suffices to modify accordingly the Ekeland functional in the proof of Theorem 3 (see Section 4.3.2).

Applications and further comments
In this section, we first give, in Section 3.1, a very simple example of an optimal control problem on time scales with sampled-data control, which we treat in detail and on which all computations are explicit. This example also provides a simple situation in which some of the properties that are valid in the classical continuous-time PMP evidently fail to hold in the time scale context. We gather these remarks in Section 3.2.

A model for optimal consumption with sampled-data control
Throughout this subsection, T and T_1 are two time scales, unbounded above, satisfying T_1 ⊂ T, min T = min T_1 = 0 and 12 ∈ T. In the sequel, we study the following one-dimensional dynamical system with sampled-data control on time scales: q^∆(t) = u^Φ(t) q(t), ∆-a.e. t ∈ [0, 12)_T, (11) with the initial condition q(0) = 1, and subject to the constraint u(t) ∈ [0, 1] for ∆_1-a.e. t ∈ [0, 12)_{T_1}. Since the final time b = 12 is fixed, we can assume that 12 ∈ T_1 without loss of generality.
The above model is a classical model for the evolution of a controlled output of a factory during the time interval [0, 12]_T (corresponding to the twelve months of a year). Precisely, q(t) ∈ R stands for the output at time t ∈ T and u(t) ∈ [0, 1] stands for the fraction of the output reinvested at each controlling time t ∈ T_1. This fraction is sampled at each sampling time t ∈ T_1 such that t ∈ RS_1 and σ(t) < σ_1(t), on the corresponding sampling interval [t, σ_1(t))_T. In the sequel, our goal is to maximize the total consumption C(u) = ∫_{[0,12)_T} (1 − u^Φ(τ)) q(τ) ∆τ. (12) In other words, our aim is to maximize the quantity of the output that we do not reinvest.
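Assuming (as the text suggests) the classical consumption model q̇ = u q on T = R_+ with cost C(u) = ∫ (1 − u) q dt, the total consumption associated with a given sampled control can be evaluated by a simple Euler scheme. The strategy below (reinvest fully early, consume fully late, with sampling times T_1 = 3N) is only an illustrative candidate, not the computed optimum:

```python
def consumption(u_samples, T1_times, t_final=12.0, dt=1e-3, q0=1.0):
    """Total consumption C(u) = int_0^{t_f} (1 - u) q dt for the (assumed)
    model q' = u q, with u held constant on each sampling interval."""
    t, q, C = 0.0, q0, 0.0
    idx = 0
    while t < t_final - 1e-12:
        # advance the sampling index once the next controlling time is reached
        while idx + 1 < len(T1_times) and T1_times[idx + 1] <= t:
            idx += 1
        u = u_samples[idx]           # zero-order hold of the sampled control
        C += dt * (1.0 - u) * q      # accumulate consumption
        q += dt * u * q              # Euler step for the reinvested output
        t += dt
    return C

# Controlling times T1 = 3N truncated to [0, 12): reinvest, then consume.
times = [0.0, 3.0, 6.0, 9.0]
C = consumption([1.0, 1.0, 1.0, 0.0], times)
```

As a sanity check, never reinvesting (u ≡ 0) keeps q ≡ 1 and yields C = 12, and the reinvest-then-consume strategy does much better, which is the qualitative behavior the PMP computation below makes precise.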
Remark 14. In the continuous case and with a permanent control (that is, with T = T_1 = R_+), the above optimal control problem is a very well-known application of the classical Pontryagin maximum principle. We refer for example to [48, Exercise 2.3.3, p. 82] or [40, p. 92]. In this section, our aim is to solve this optimal control problem in cases where the control is nonpermanent and sampled. We first treat some examples in the continuous-time setting T = R_+ and then in the discrete-time setting T = N.
Since f and f^0 are affine in u and since Ω = [0, 1] is compact, Theorem 2 asserts that (OSDCP)^T_{T_1} admits an optimal solution q, defined on [0, 12]_T and associated with a sampled-data control u ∈ L^∞_{T_1}(T_1, [0, 1]). We now apply Theorem 3 in order to compute explicitly the values of u at each controlling time t ∈ [0, 12)_{T_1}. The nontrivial couple (p, p^0) satisfies p^0 = −1 and p(12) = 0 (see Remark 10). The adjoint vector p ∈ AC([0, 12]_T, R) is the solution of the adjoint equation (13). Moreover, one has the following maximization conditions: 1. for ∆_1-a.e. r ∈ [0, 12)_{T_1} ∩ RD_1, u(r) maximizes z ↦ H(r, q(r), p(r), −1, z) over [0, 1]; 2. for every r ∈ [0, 12)_{T_1} ∩ RS_1 and every y ∈ [0, 1], the inequality (9) holds, where h_y is the unique solution of the corresponding linear ∆-Cauchy problem on [r, σ_1(r)]_T and where Γ_r is the resulting threshold quantity. Since q is a solution of (11) and satisfies q(0) = 1, one can easily see that q is monotonically increasing on [0, 12]_T, so q has positive values. From (13) and since p(12) = 0, one can easily obtain that p is monotonically decreasing on [0, 12]_T, so p has nonnegative values.
Note that Γ r depends only on µ 1 (r) and p(σ 1 (r)). As a consequence, the knowledge of the value p(12) = 0 and the above properties allow us to compute u(r 0 ), where r 0 is the element of [0, 12) T1 such that σ 1 (r 0 ) = 12 (and µ 1 (r 0 ) = 12 − r 0 ). Then the knowledge of u(r 0 ) allows us to compute p(r 0 ) from (13), the knowledge of p(r 0 ) and the above properties allow us to compute u(r 1 ), where r 1 is the element of [0, 12) T1 such that σ 1 (r 1 ) = r 0 (and µ 1 (r 1 ) = r 0 − r 1 ), and so on. This backward recursive procedure allows us to compute u(r) for every r ∈ [0, 12) T1 . Numerically, we obtain total consumptions C(u) of about 28299.767 in the cases considered.

Remark 15. In this example, Theorem 2 states the existence of an optimal solution. In all the cases studied above, the Pontryagin maximum principle moreover proves that the optimal solution is unique.
Remark 16. The case T 1 = N can easily be deduced from the permanent case T 1 = R + (see Remark 3). Similarly, the case T 1 = 9N can be deduced from the case T 1 = 3N.
Remark 17. The case T 1 = 12N corresponds to an optimal parameter problem (see Remark 4).
Remark 21. The case T 1 = 2N can be seen as a consequence of the permanent case T 1 = N (see Remarks 3 and 20).
Remark 22. The case T 1 = 12N corresponds to an optimal parameter problem (see Remark 4).

Non-extension of several classical properties
In this section, we recall some basic properties that hold in classical optimal control theory in the continuous-time setting and with a permanent control, that is, with T = T 1 = R + . Our aim is to discuss their extension (or their failure) in the general time scale setting and in the nonpermanent control case. We provide several counterexamples in the discrete-time setting with a permanent control (T = T 1 = N) and in the continuous-time setting with a nonpermanent control (T 1 ⊊ T = R + ). In the following paragraphs, except the last one, the final time b ∈ T can be fixed or not.
Pointwise maximization condition of the Hamiltonian. In the case T = T 1 = R + , an optimal (permanent) control satisfies the maximization condition u(t) ∈ arg max z∈Ω H(t, q(t), p(t), p 0 , z) for a.e. t ∈ [0, b). We refer to [16, Example 7] for a counterexample showing the failure of this maximization condition in the case T = T 1 = N, and to Remark 18 for a counterexample in the case T 1 ⊊ T = R + .

Continuity of the Hamiltonian. In the case T = T 1 = R + , it is well known that the Hamiltonian function t → H(t, q(t), p(t), p 0 , u(t)) coincides almost everywhere on [0, b] with the continuous function t → max z∈Ω H(t, q(t), p(t), p 0 , z). Remark 18 provides a counterexample showing the failure of this regularity property in the case T 1 ⊊ T = R + .

Remark 23. Nevertheless, in the case of a free final time, under the assumptions of the fourth item of Theorem 3, the Hamiltonian function t → H(t, q(t), p(t), p 0 , u Φ (t)) coincides almost everywhere, in some neighborhood of b, with a continuous function.
The autonomous case. In the case T = T 1 = R + , if the Hamiltonian H is autonomous (that is, does not depend on t), it is well known that the function t → H(q(t), p(t), p 0 , u(t)) is almost everywhere constant on [0, b], this constant being equal to the maximized Hamiltonian. We refer to [16, Example 8] for a counterexample showing the failure of this constancy property in the case T = T 1 = N, and to Remark 18 for a counterexample in the case T 1 ⊊ T = R + (there, even the maximized autonomous Hamiltonian fails to be constant).
Saturated constraint set Ω for a Hamiltonian affine in u. In the case T = T 1 = R + , if the Hamiltonian is affine in u, that is, if it can be written as H(t, q(t), p(t), p 0 , u(t)) = ⟨H 1 (t, q(t), p(t), p 0 ), u(t)⟩ R m + H 2 (t, q(t), p(t), p 0 ), one easily proves that H 1 (t, q(t), p(t), p 0 ) ∈ O Ω [u(t)] for almost every t ∈ [0, b). It follows that an optimal (permanent) control u must take its values at the boundary of Ω for almost every t ∈ [0, b) such that H 1 (t, q(t), p(t), p 0 ) ≠ 0 (saturation of the constraints).
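The mechanism at play can be illustrated with an elementary sketch: maximizing an affine function over the box Ω = [0, 1] m pins the maximizer to the boundary whenever the linear part is nonzero. The vector h1 below is a hypothetical stand-in for H 1 (t, q(t), p(t), p 0 ); the function names are illustrative.

```python
# Minimal sketch of the saturation property for a Hamiltonian affine in u:
# maximize u -> <h1, u> + h2 over the box Omega = [0, 1]^m.  The vector h1
# is a stand-in for H_1(t, q(t), p(t), p^0); h2 plays no role in the argmax.

def argmax_affine_on_box(h1):
    """A maximizer of u -> sum_i h1[i] * u[i] over [0, 1]^m.
    In a coordinate where h1[i] = 0 any value is optimal; we pick 0."""
    return [1.0 if c > 0.0 else 0.0 for c in h1]

h1 = [2.0, -1.0, 0.5]
u_star = argmax_affine_on_box(h1)  # -> [1.0, 0.0, 1.0]
# Whenever h1 != 0, this maximizer lies on the boundary of the box: at
# least one coordinate is pinned at 0 or 1 (here, all of them are).
saturated = all(ui in (0.0, 1.0) for ui in u_star)
```

This is exactly why non-saturation at some time forces H 1 to vanish there, as in Remark 25 below.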
Remark 24. This classical property can be extended to the case T = T 1 = N. Indeed, in that case, the nonpositive gradient condition is given by

Remark 25. Remark 20 provides an interesting example in the case T = T 1 = N. Indeed, in that case, the control defined by u(t) = 0 for every t ∈ {0, . . . , 9}, u(10) = 1/2 and u(11) = 1 is an optimal (permanent) control. However, it does not saturate the constraint set Ω at t = 10. This is not a surprise since, in that case, H 1 (t, q(t), p(t + 1), p 0 ) = 0 at t = 10.
Note that Section 3.1.2, in the case T 1 = 4N, provides a counterexample showing the failure of this classical property in the case T 1 ⊊ T = N. Similarly, Section 3.1.1, in the case T 1 = 3N, provides a counterexample in the case T 1 ⊊ T = R + .

Remark 26. Figure 1 represents the values of the optimal sampled-data control u of Section 3.1.1, in the case T 1 = 12N ∪ {λ}, with λ ∈ (0, 12). In that case, u(0) (resp., u(λ)) saturates the constraint set Ω = [0, 1] for λ ∈ (0, 11.9245) (resp., for λ ∈ (9.9866, 12)).

Vanishing of the maximized Hamiltonian at the final time. We assume here that the final time is left free. In the case T = T 1 = R + , under the assumptions of the fourth item of Theorem 3, it is well known that the maximized Hamiltonian vanishes at t = b (see Remark 8). In Theorem 3, we have established that this property remains valid in the time scale setting under appropriate conditions, the main one being that b must belong to the interior of T. In the discrete case T = N, the interior of T is empty and the latter assumption is thus never satisfied. The Hamiltonian at the final time may then not vanish; we refer to [16, Example 8] for a counterexample with T = T 1 = N.

Proofs
The section is structured as follows. Subsections 4.1, 4.2 and 4.3 are devoted to the proof of Theorem 3. In Subsection 4.1, we recall some known Cauchy-Lipschitz results on time scales and we establish some preliminary results on the relations between u and u Φ . In Subsection 4.2, we introduce appropriate needle-like variations of the control. Finally, in Subsection 4.3, we apply the Ekeland variational principle to an adequate functional in an appropriate complete metric space, and then we prove the PMP. In Subsection 4.4 (which the reader can read independently of the rest of Section 4), we detail the proof of Theorem 2.

Relations between u and u Φ
We start with a lemma whose arguments of proof will be used several times.

Proof. Without loss of generality, we can assume that v is constant equal to 0 R m . Let us define Since Φ(t) = t for every t ∈ T 1 , the inclusion A ⊂ B holds. Firstly, let us assume that µ ∆1 (A) = 0. Hence A ⊂ RD 1 and µ ∆1 (A) = µ L (A) = 0. Since A ⊂ RD 1 ⊂ RD, we deduce that µ ∆ (A) = µ L (A) = 0. On the other hand, for every t ∈ B, Φ(t) ∈ A ⊂ RD 1 , hence t ∈ T 1 and t = Φ(t) ∈ A. We conclude that A = B and µ ∆ (B) = 0. Secondly, let us assume that µ ∆ (B) = 0 and µ ∆1 (A) > 0. Since A ⊂ B, we deduce that µ ∆ (A) = 0, A ⊂ RD and µ ∆ (A) = µ L (A) = 0. Since µ L (A) = 0 and µ ∆1 (A) > 0, we conclude that there exists

Proposition 1. Let m ∈ N * and let c < d be two elements of T with c ∈ T 1 .

For every
We first treat the µ ∆ -measurability of u Φ . From Lemma 3, we can consider that u is defined everywhere on [c, d) T1 . Hence, it is sufficient to see that u Φ = ũ| [c,d) . This is true since, for any t ∈ [c, d), we have: • either t ∈ T 1 , and then t ∈ T and thus u Φ (t) = u(Φ(t)) = u(t) = ũ(t).
Considering ∥u∥ R m instead of u, we can assume that m = 1 and u ∈ L 1 T1 ([c, d) T1 , R + ). From [23], we have Noting that (d − Φ(d)) u(Φ(d)) ≥ 0 concludes the proof of the first point.
Let us prove the second point.
Let M ≥ 0 be a constant. With the same arguments as in the proof of Lemma 3, we can prove that u

Remark 27. From the proof above, we see that the inequality (18) is an equality if and only if d ∈ T 1 or u(Φ(d)) = 0. Then, considering T = N, T 1 = 2N, m = 1, c = 0, d = 1 and u the constant function equal to 1 provides a counterexample. Indeed, in that case, we have
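The computation behind this counterexample can be checked numerically. The sketch below is hedged: it assumes that (18) compares the ∆-integral of u Φ over [c, d) T with the ∆ 1 -integral of u over [c, d) T1 , and that Φ(t) denotes the latest controlling time not exceeding t (consistent with Φ(t) = t for t ∈ T 1 , as used in Lemma 3).

```python
# Numerical check of the counterexample of Remark 27: T = N, T1 = 2N,
# c = 0, d = 1, u constant equal to 1.  The graininess of T is mu(t) = 1,
# that of T1 is mu1(t) = 2; Phi(t) is the latest controlling time <= t.

def phi(t, t1_step=2):
    """Largest element of T1 = t1_step * N not exceeding t (assumption
    on the definition of Phi, consistent with Phi(t) = t on T1)."""
    return (int(t) // t1_step) * t1_step

def delta_integral(points, graininess, values):
    """Delta-integral over a finite set of right-scattered points:
    a sum of graininess(t) * values(t)."""
    return sum(graininess(t) * values(t) for t in points)

u = lambda t: 1.0              # the constant control on T1
u_phi = lambda t: u(phi(t))    # its sample-and-hold extension to T

c, d = 0, 1
pts_T = range(c, d)            # [0, 1) in T  = N  : {0}
pts_T1 = range(c, d, 2)        # [0, 1) in T1 = 2N : {0}

int_T = delta_integral(pts_T, lambda t: 1.0, u_phi)    # = 1
int_T1 = delta_integral(pts_T1, lambda t: 2.0, u)      # = 2
defect = (d - phi(d)) * u(phi(d))                      # = 1
```

Here d = 1 does not belong to T 1 and u(Φ(d)) = 1 ≠ 0, so the two integrals differ (by exactly the nonnegative term (d − Φ(d)) u(Φ(d))), in accordance with the equality characterization of Remark 27.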

Recalls on ∆-Cauchy-Lipschitz results
According to [15, Theorem 1], for every control u ∈ L ∞ T1 (T 1 , R m ) and every initial condition q a ∈ R n , there exists a unique maximal solution of (6) such that q(a) = q a , denoted by q(•, u, q a ) and defined on a maximal interval denoted by I T (u, q a ). The word maximal means that q(•, u, q a ) is an extension of any other solution. Moreover, we recall that (see [15, Lemma 1]) ∀t ∈ I T (u, q a ), q(t, u, q a ) = q a + ∫ [a,t) T f (τ, q(τ, u, q a ), u Φ (τ )) ∆τ.
Finally, either I T (u, q a ) = T, that is, q(•, u, q a ) is a global solution of (6), or I T (u, q a ) = [a, c) T where c is a left-dense point of T, and in that case q(•, u, q a ) is unbounded on I T (u, q a ) (see [15, Theorem 2]).
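A continuous-time illustration (T = R + , where the ∆-derivative is the usual derivative) of this alternative is the scalar equation q' = q², q(0) = 1, whose maximal solution q(t) = 1/(1 − t) is defined on [0, 1) only and is unbounded near the left-dense endpoint t = 1. The sketch below exhibits the blow-up numerically; the threshold is an illustrative proxy for unboundedness.

```python
# Illustration of the blow-up alternative for q' = q^2, q(0) = 1, whose
# exact maximal solution q(t) = 1/(1 - t) escapes to +infinity as t -> 1.

def escape_time(q0=1.0, dt=1e-5, threshold=1e6, t_max=2.0):
    """Euler integration of q' = q^2 until q exceeds `threshold` (a
    numerical proxy for blow-up) or t reaches the safeguard `t_max`."""
    q, t = q0, 0.0
    while q < threshold and t < t_max:
        q += dt * q * q
        t += dt
    return t

t_blowup = escape_time()  # close to 1, the blow-up time of 1/(1 - t)
```

The numerically observed escape time is close to 1, matching the right endpoint of the maximal interval [0, 1).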
For a given b ∈ T, we denote by UQ b the set of all admissible couples (u, q a ) on [a, b] T .It is endowed with the norm (u,

Needle-like variations of the control, and variation of the initial condition
Throughout this section, we consider b ∈ T and (u, q a ) ∈ UQ b . We are going, in particular, to define appropriate needle-like variations. As in [16], we have to distinguish between right-dense and right-scattered times along the time scale T 1 . We will also define appropriate variations of the initial condition.

General variation of (u, q a )
In the first lemma below, we prove that UQ b is open. Actually, we prove a stronger result, by showing that UQ b contains a neighborhood of any of its points in the L 1 T1 × R n topology, which will be useful in order to define needle-like variations.

Lemma 5 ([10]). Let c < d be two elements of T, let L 1 and L 2 be two nonnegative real numbers,

Proof of Lemma 4. By continuity of q(•, u, q a ) on [a, b] T , the set Therefore ∂f /∂q and ∂f /∂u are bounded on K R by some L R ≥ 0 and then, from convexity and from the mean value inequality, it holds that, for all (t, x 1 , v 1 ), (t, Let (u ′ , q ′ a ) ∈ E R (u, q a ). Our aim is to prove that b ∈ I T (u ′ , q ′ a ). By contradiction, assume that the set is not empty and let If t 0 is a minimum, then ∥q(t 0 , u ′ , q ′ a ) − q(t 0 , u, q a )∥ R n > 1. If t 0 is not a minimum, then t 0 ∈ RD and by continuity we have ∥q(t 0 , u ′ , q ′ a ) − q(t 0 , u, q a )∥ R n ≥ 1. Moreover, one has t 0 > a since ∥q(a, u ′ , q ′ a ) − q(a, u, q a )∥ R n = ∥q ′ a − q a ∥ R n ≤ η R < 1. Hence ∥q(τ, u ′ , q ′ a ) − q(τ, u, q a )∥ R n ≤ 1 for every τ ∈ [a, t 0 ) T . Therefore (q(τ, u ′ , q ′ a ), u ′Φ (τ ), τ ) and (q(τ, u, q a ), u Φ (τ ), τ ) are elements of K R for ∆-a.e. τ ∈ [a, t 0 ) T . Since one has q(t, u ′ , q ′ a ) − q(t, u, q a ) = q ′ a − q a + ∫ [a,t) T ( f (τ, q(τ, u ′ , q ′ a ), u ′Φ (τ )) − f (τ, q(τ, u, q a ), u Φ (τ )) ) ∆τ for every t ∈ I T (u ′ , q ′ a ) ∩ [a, b] T , it follows from (19), from Lemma 5 and from Proposition 1 that, for every t ∈ [a, t 0 ] T , This raises a contradiction at t = t 0 . Therefore A is empty and thus q(•, u ′ , q ′ a ) is bounded on and (u ′ , q ′ a ) ∈ E R (u, q a ). With the notations of the above proof, since [a, b] T ⊂ I T (u ′ , q ′ a ) and A is empty, we infer that ∥q(t, u ′ , q ′ a ) − q(t, u, q a )∥ ≤ 1 for every t ∈ [a, b] T . Therefore (τ, q(τ, u ′ , q ′ a ), u ′Φ (τ )) ∈ K R for every (u ′ , q ′ a ) ∈ E R (u, q a ) and for ∆-a.e.
is Lipschitzian. In particular, for every (u ′ , q ′ a ) ∈ E R (u, q a ), q(•, u ′ , q ′ a ) converges uniformly to q(•, u, q a ) on [a, b] T when u ′ tends to u in L 1 T1 ([a, b) T1 , R m ) and q ′ a tends to q a in R n .
Proof. Let (u ′ , q ′ a ) and (u ′′ , q ′′ a ) be two elements of E R (u, q a ) ⊂ UQ b . It follows from Remark 28 that (τ, q(τ, u ′ , q ′ a ), u ′Φ (τ )) and (τ, q(τ, u ′′ , q ′′ a ), u ′′Φ (τ )) are elements of K R for ∆-a.e. τ ∈ [a, b) T . Following the same arguments as in the previous proof, it follows from (19), from Lemma 5 and from Proposition 1 that, for every t ∈ [a, b] T , The lemma follows.

Needle-like variation of u at a point r ∈ RS 1
Let r ∈ [a, b) T1 ∩ RS 1 and let y ∈ R m . We define the needle-like variation Π = (r, y) of u at r by

Lemma 7. There exists

Proof. We use the notations K R , L R , ν R and η R defined in Lemma 4 and in its proof. One has u Hence, there exists α 0 > 0 such that, for every α ∈ [0, α 0 ], (u Π (•, α), q a ) ∈ E R (u, q a ). The claim then follows from Lemma 4.
Proof. We use the notations of the proof of Lemma 7. It follows from Lemma 6 that there exists C ≥ 0 (the Lipschitz constant of F R (u, q a )) such that for all α 1 and α 2 in [0, α 0 ]. The lemma follows.
We define the so-called first variation vector h Π (•, u, q a ) associated with the needle-like variation Π = (r, y) as the unique solution on [r, σ * 1 (r)] T of the linear ∆-Cauchy problem The existence and uniqueness of h Π (•, u, q a ) are ensured by [15, Theorem 3].

Then, we define the so-called second variation vector w Π (•, u, q a ) associated with the needle-like variation Π = (r, y) as the unique solution on [σ * 1 (r), b] T of the linear ∆-Cauchy problem The existence and uniqueness of w Π (•, u, q a ) are ensured by [15, Theorem 3].
and let (u k , q a,k ) k∈N be a sequence of elements of E R (u, q a ). If u k converges to u ∆ 1 -a.e. on [a, b) T1 and q a,k converges to q a in R n as k tends to +∞, then h Π (•, u k , q a,k ) converges uniformly to h Π (•, u, q a ) on [r, σ * 1 (r)] T as k tends to +∞.

Proof. We use the notations K R , L R , ν R and η R defined in Lemma 4 and in its proof.
Let us consider the absolutely continuous function defined by for every t ∈ [r, σ * 1 (r)] T and every k ∈ N. Since (u k , q a,k ) ∈ E R (u, q a ) for every k ∈ N, it follows from Remark 28 that (τ, q(τ, u k , q a,k ), Since µ ∆1 ({r}) = µ 1 (r) > 0, u k (r) converges to u(r) as k tends to +∞. Moreover, from the Lebesgue dominated convergence theorem, (u k , q a,k ) converges to (u, q a ) in (E R (u, q a ), and, from Lemma 6, q(•, u k , q a,k ) converges uniformly to q(•, u, q a ) on [a, b] T as k tends to +∞. Since ∂f /∂q and ∂f /∂u are uniformly continuous on K R , we conclude that Υ k converges to 0 as k tends to +∞. The lemma follows.
and let (u k , q a,k ) k∈N be a sequence of elements of E R (u, q a ). If u k converges to u ∆ 1 -a.e. on [a, b) T1 and q a,k converges to q a in R n as k tends to +∞, then w Π (•, u k , q a,k ) converges uniformly to w Π (•, u, q a ) on [σ * 1 (r), b] T as k tends to +∞.
Remark 29. With the same arguments as in the proof of Lemma 3, the hypotheses of Lemma 10 give the convergence of u Φ k to u Φ ∆-a.e. on [a, b) T and, from the Lebesgue dominated convergence theorem, of (u k , q a,k ) to (u, q a ) in (E R (u, q a ),

Proof of Lemma 10. We use the notations K R , L R , ν R and η R defined in Lemma 4 and in its proof. From Lemma 9, the case σ * 1 (r) = b is already proved. As a consequence, we only focus here on the case σ * 1 (r) = σ 1 (r) < b. Let us consider the absolutely continuous function defined by Let us prove that ε k converges uniformly to 0 on [σ 1 (r), b] T as k tends to +∞. One has and, from Lemma 6, q(•, u k , q a,k ) converges uniformly to q(•, u, q a ) on [a, b] T as k tends to +∞. Since ∂f /∂q is continuous and bounded on K R and since u Φ k converges to u Φ ∆-a.e. on [a, b) T , the Lebesgue dominated convergence theorem shows that Υ k converges to 0 as k tends to +∞. Finally, from Lemma 9, ε k (σ 1 (r)) = w Π (σ 1 (r), u k , q a,k ) − w Π (σ 1 (r), u, q a ) = h Π (σ 1 (r), u k , q a,k ) − h Π (σ 1 (r), u, q a ), which converges to 0 as k tends to +∞. The lemma follows.

Needle-like variation of u at a point
Note that s ∈ T 1 and then Φ(s) = s. We define the needle-like variation = (s, z) of u at s by

Lemma 11. There exists Hence, there exists β 0 > 0 such that for every ≤ ν R and thus (u (•, β), q a ) ∈ E R (u, q a ). The conclusion then follows from Lemma 4.
Proof. We use the notations of the proof of Lemma 11. From Lemma 6, there exists C ≥ 0 (the Lipschitz constant of F R (u, q a )) such that According to [15, Theorem 3], we define the variation vector w (•, u, q a ) associated with the needle-like variation = (s, z) as the unique solution on [s, b] T of the linear ∆-Cauchy problem

Proposition 4. For every δ ∈ V s,b \ {0}, the mapping is differentiable at 0, and one has DF (u, q a )(0) = w (•, u, q a ).
and let (u k , q a,k ) k∈N be a sequence of elements of E R (u, q a ). If u k converges to u ∆ 1 -a.e. on [a, b) T1 , u k (s) converges to u(s) and q a,k converges to q a as k tends to +∞, then w (•, u k , q a,k ) converges uniformly to w (•, u, q a ) on [s, b] T as k tends to +∞.

Proof. The proof is similar to that of Lemma 10, replacing σ 1 (r) with s.

Variation of the initial condition q a
Let q a ∈ R n .

Lemma 14. There exists γ 0 > 0 such that (u, q a + γq a ) ∈ UQ b for every γ ∈ [0, γ 0 ].

Proof. We use the notations K R , L R , ν R and η R defined in the proof of Lemma 4. There exists γ 0 > 0 such that ∥q a + γq a − q a ∥ R n = γ∥q a ∥ R n ≤ η R for every γ ∈ [0, γ 0 ], and hence (u, q a + γq a ) ∈ E R (u, q a ). Then the claim follows from Lemma 4.

Proof. We use the notations of the proof of Lemma 14. From Lemma 6, there exists C ≥ 0 (the Lipschitz constant of F R (u, q a )) such that for all γ 1 and γ 2 in [0, γ 0 ].
According to [15, Theorem 3], we define the variation vector w q a (•, u, q a ) associated with the perturbation q a as the unique solution on [a, b] T of the linear ∆-Cauchy problem

Proposition 5. The mapping is differentiable at 0, and one has DF q a (u, q a )(0) = w q a (•, u, q a ).

To conclude, it remains to prove that Υ q a (γ) converges to 0 as γ tends to 0. Since q(•, u, q a + γq a ) converges uniformly to q(•, u, q a ) on [a, b] T as γ tends to 0 (see Lemma 15) and since ∂f /∂q is uniformly continuous on K R , the conclusion follows.
and let (u k , q a,k ) k∈N be a sequence of elements of E R (u, q a ). If u k converges to u ∆ 1 -a.e. on [a, b) T1 and q a,k converges to q a in R n as k tends to +∞, then w q a (•, u k , q a,k ) converges uniformly to w q a (•, u, q a ) on [a, b] T as k tends to +∞.

Proof. The proof is similar to that of Lemma 10, replacing σ 1 (r) with a.

Proof of Theorem 3
We are now in a position to prove the PMP. In the sequel, in order to avoid any confusion of notations, we denote by q * the optimal trajectory, associated with the optimal sampled-data control u * and with b * ∈ T, where b * = b if the final time is fixed. The upper star designates the optimal solution. We set q * a = q * (a).

The augmented system
As in [43, 46], we consider the augmented system in R n+1 with q = (q, q 0 ) the augmented state, with values in R n × R, and f the augmented dynamics, defined by f (t, q, u) = (f (t, q, u), f 0 (t, q, u)). Note that f does not depend on q 0 . We will always impose as an initial condition q 0 (a) = 0, so that q 0 (b) = ∫ [a,b) T f 0 (τ, q(τ ), u Φ (τ )) ∆τ . Hence, the additional coordinate q 0 stands for the cost.

Application of the Ekeland variational principle
For completeness, we recall a simplified (but sufficient) version of the Ekeland variational principle.
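A standard simplified statement, sufficient for the use made of it below, is the following (with the classical square-root scaling of the penalty):

```latex
% A standard simplified statement of the Ekeland variational principle.
Let $(E, d_E)$ be a complete metric space and let
$J \colon E \to \mathbb{R}^+$ be a continuous nonnegative function.
Let $\varepsilon > 0$ and let $u^\ast \in E$ be such that
$J(u^\ast) \leq \inf_{u \in E} J(u) + \varepsilon$.
Then there exists $u_\varepsilon \in E$ such that
\[
  d_E(u_\varepsilon, u^\ast) \leq \sqrt{\varepsilon}
  \qquad \text{and} \qquad
  J(u) \geq J(u_\varepsilon) - \sqrt{\varepsilon}\, d_E(u, u_\varepsilon)
  \quad \text{for every } u \in E .
\]
```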
Note that (u * , q * a ) ∈ E Ω,0 R (u * , q * a ). Since Ω is closed, it follows from the (partial) converse of the Lebesgue dominated convergence theorem that (E Ω,0 R (u * , q * a ), and then is a complete metric space. For every ε > 0, we define the functional J R ε : (E Ω,0 R (u * , q * a ), + d 2 S (g (q a , q(b * , u, qa ))) Since g and d 2 S are continuous and so is F R (u * , q * a ) (see Lemma 6), it follows that J R ε is continuous on (E Ω,0 R (u * , q * a ), • U Qb * ). Moreover, one has J R ε (u * , q * a ) = ε and, from the optimality of q 0 * (b * ), J R ε (u, qa ) > 0 for every (u, qa ) ∈ E Ω,0 R (u * , q * a ). It follows from the Ekeland variational principle that, for every ε > 0, there exists for every (u, qa ) ∈ E Ω,0 R (u * , q * a ). In particular, u R ε converges to u * in L 1 T1 ([a, b * ) T1 , R m ) and qR a,ε converges to q * a as ε tends to 0. Besides, setting and . Using a compactness argument, the continuity of F R (u * , q * a ) (see Lemma 6), the C 1 -regularity of g and the (partial) converse of the Lebesgue dominated convergence theorem, we infer that there exists a sequence (ε k ) k∈N of positive real numbers converging to 0 such that u R ε k converges to u * ∆ 1 -a.e. on [a, b * ) T1 , qR a,ε k converges to q * a , g(q R a,ε k , q(b * , u R ε k , qR a,ε k )) converges to g(q * a , q * (b * )) ∈ S, dg(q R a,ε k , q(b * , u R ε k , qR a,ε k )) converges to dg(q * a , q * (b * )), ψ 0R ε k converges to some ψ 0R ≤ 0, and ψ R ε k converges to some ψ R ∈ R j as k tends to +∞, with a , q * (b * ))] (see Lemma 1).

In the next lemmas, we use the inequality (25), respectively with needle-like variations of u R ε k at right-scattered points of T 1 and at right-dense points of T 1 , and then with variations of qR a,ε k . Hence, we infer some important inequalities by taking the limit in k. Note that these variations were defined in Section 4.2 for any dynamics f , and that we apply them here to the augmented system (24), associated with the augmented dynamics f .

Lemma 17. For every r ∈ [a, b * ) T1 ∩ RS 1 and every y ∈ Ω, considering the needle-like variation Π = (r, y) as defined in Section 4.2.2, one has where the variation vector wΠ = (w Π , w 0 Π ) is defined by (21).

Proof. One has for every α ∈ [0, 1] and every k ∈ N. Since u R ε k converges to u * ∆ 1 -a.e. on [a, b * ) T1 , it follows that u R ε k (r) converges to u * (r) as k tends to +∞, where ∥u * (r)∥ R m < R.
Moreover, one has for every α sufficiently small and every k sufficiently large. It then follows from (25) that and thus
Using Proposition 3, we infer that, as α tends to 0, using (26) and (27), it follows that
By letting k tend to +∞ and using Lemma 10, the lemma follows.
We define the sets

Lemma 18. We have µ ∆1 (A ∪ ⋃ k∈N A k ) = 0.
We define the set of Lebesgue times

Lemma 19. For every s ∈ L R,1 [a,b * ) T ∩ RD 1 and for every z ∈ Ω ∩ B R m (0, R), considering the needle-like variation = (s, z) as defined in Section 4.2.3, one has where the variation vector w = (w , w 0 ) is defined by (22) (replacing f with f ).
Proof. For every k ∈ N and any for every β sufficiently small and every k sufficiently large. It then follows from (25) that and thus
Using Proposition 4, we infer that, as β tends to 0, using (26) and (27), it follows that
By letting k tend to +∞ and using Lemma 13, the lemma follows.
Lemma 20.For every qa ∈ R n × {0}, considering the variation of initial point as defined in Section 4.2.4,one has where the variation vector wqa = (w qa , w 0 qa ) is defined by (23) (replacing f with f ).
Proof. For every k ∈ N and every γ ≥ 0, one has for every γ sufficiently small and every k sufficiently large. It then follows from (25) that and thus
Using Proposition 5, we infer that, as γ tends to 0, using (26) and (27), it follows that
By letting k tend to +∞ and using Lemma 16, the lemma follows.
At this step, we have obtained in the previous lemmas the three inequalities (28), (29) and (30), valid for any Then, considering a sequence of real numbers R converging to +∞, we infer that there exist ψ 0 ≤ 0 and ψ ∈ R j such that ψ 0R converges to ψ 0 and ψ R converges to ψ along this sequence, and moreover . Taking the limit in (28), (29) and (30), we get the following lemma.
where the variation vector wΠ = (w Π , w 0 Π ) associated with the needle-like variation Π = (r, y) of u is defined by (21) (replacing f with f ).
For every s ∈ L 1 [a,b * ) T ∩ RD 1 and every z ∈ Ω, one has where the variation vector w = (w , w 0 ) associated with the needle-like variation = (s, z) of u is defined by (22) (replacing f with f ); For every qa ∈ R n × {0}, one has where the variation vector wqa = (w qa , w 0 qa ) associated with the variation qa of the initial point q * a is defined by (23) (replacing f with f ).
This result concludes the application of the Ekeland variational principle. The last step of the proof consists in deriving the PMP from these inequalities.

Proof of Remark 11
In this subsection, we prove the formulation of the PMP mentioned in Remark 11. Note that, at this step, we do not prove the transversality condition on the final time.
We define p = (p, p 0 ) as the unique solution on [a, b * ] T of the backward shifted linear ∆-Cauchy problem p∆ The existence and uniqueness of p are ensured by [15, Theorem 6]. Since f does not depend on q 0 , it is clear that p 0 is constant, with p 0 = ψ 0 ≤ 0.
Since this inequality holds for every qa ∈ R n × {0}, the left-hand equality of (10) follows.

End of the proof
In this subsection, we conclude the proof of Theorem 3. Note that we prove the fourth item of Theorem 3 in Remark 30 below. To conclude the proof of Theorem 3, we use the result claimed in Remark 11. Let us separate two cases. Firstly, let us assume that g is submersive at (q * a , q * (b * )). In that case, it only remains to prove that the couple (p, p 0 ) is not trivial. Assume that p is trivial. Then p(a) = p(b * ) = 0 and, from the transversality conditions on the adjoint vector, ψ belongs to the kernels of (∂ q1 g(q * a , q * (b * ))) and (∂ q2 g(q * a , q * (b * ))) . It follows that ψ belongs to the orthogonal of the image of the differential of g at (q * a , q * (b * )). Since g is submersive at (q * a , q * (b * )), this implies that ψ = 0. Since the couple (ψ, p 0 ) is not trivial, we conclude that p 0 ≠ 0 and thus (p, p 0 ) is not trivial. This concludes the proof of Theorem 3 in this first case.
Secondly, let us assume that g is not necessarily submersive at (q * a , q * (b * )). In that case, one has to note that, if q is an optimal solution of (OSDCP) T T1 associated with the function g and with the closed convex set S, then q is also an optimal solution of (OSDCP) T T1 associated with the function g defined by g(q 1 , q 2 ) = (q 1 , q 2 ) (which is submersive at any point) and with the closed convex set S = {q * a } × {q * (b * )}. Then, we get back to the above first case, but with a different function g and a different closed convex set S, which would impact only the third item of Theorem 3.

f 0 (τ, q(τ ), u Φ (τ )) dτ, q̇(t) = f (t, q(t), u Φ (t)) for a.e. t ∈ [b * − δ, b), u ∈ L ∞ T1 (T 1 , Ω), (q(b * − δ), g(q * (a), q(b))) ∈ {q * (b * − δ)} × S.
We set ρ * (s) = δ(s − 1) + b * , x * (s) = q * (ρ * (s)) and w * (s) = δ for every s ∈ [0, 1]. With the change of variable x = q ◦ ρ, and with ρ̇ = w, it is clear that the augmented trajectory (ρ * , x * ), associated with the augmented control (w * , u * ), is an optimal solution of the optimal sampled-data control problem, with fixed final time s = 1, defined by In this new optimal sampled-data control problem, the sampled-data controls w and u are defined on different time scales. Its Hamiltonian H is given by H(ρ, x, p ρ , p x , p 0 , w, u) = p ρ w + ⟨p x , wf (ρ, x, u)⟩ + p 0 wf 0 (ρ, x, u). Applying the multiscale version (without the fourth item) of Theorem 3, since w * takes its values in the interior of the constraint set [δ/2, +∞), it follows in particular that
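The chain rule behind this change of variable can be written out explicitly (a formal computation, with all quantities as above):

```latex
% Chain rule behind the change of variable x = q \circ \rho.
\[
  \rho^\ast(s) = \delta(s-1) + b^\ast , \qquad
  \dot{\rho}^\ast(s) = \delta = w^\ast(s), \qquad
  \rho^\ast(0) = b^\ast - \delta, \quad \rho^\ast(1) = b^\ast ,
\]
\[
  \dot{x}^\ast(s)
  = \dot{\rho}^\ast(s)\, \dot{q}^\ast\bigl(\rho^\ast(s)\bigr)
  = w^\ast(s)\, f\bigl(\rho^\ast(s), x^\ast(s), u^\ast(\rho^\ast(s))\bigr)
  \quad \text{for a.e. } s \in [0,1].
\]
```

This explains why the free final time reappears as the extra scalar control w multiplying both f and f 0 in the Hamiltonian of the new problem.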

Proof of Theorem 2
The following proof can be read independently of the rest of Section 4. It is inspired by [50, Chap. 6]. We only treat the case where the final time b ∈ T is fixed; the proof can easily be adapted to the case of a free final time.
Let us consider a sequence (q k ) k∈N of elements of M, associated with sampled-data controls u k ∈ L ∞ T1 (T 1 , Ω), minimizing the cost considered in (OSDCP) T T1 . It follows from the assumptions that the sequence (f (•, q k , u Φ k ), f 0 (•, q k , u Φ k )) k∈N is bounded in L ∞ T ([a, b) T , R n+1 ). Hence a subsequence (that we do not relabel) converges in the weak-star topology of L ∞ T ([a, b) T , R n+1 ) to a function (F, F 0 ). Moreover, a subsequence of (q k (a)) k∈N (that we do not relabel) converges in R n , and we denote its limit by x(a). We define the absolutely continuous functions x(t) = x(a) + ∫ [a,t) T F (τ )∆τ and x 0 (t) = ∫ [a,t) T F 0 (τ )∆τ . In particular, note the pointwise convergence of (q k ) k∈N to x on [a, b] T . Since g is continuous and since S is closed, we have g(x(a), x(b)) ∈ S. Note that x 0 (b) is equal to the infimum of admissible costs. To conclude, it remains to prove the existence of u ∈ L ∞ T1 (T 1 , Ω) such that x(t) = x(a) + ∫ [a,t) T f (τ, x(τ ), u Φ (τ ))∆τ and x 0 (t) = ∫ [a,t) T f 0 (τ, x(τ ), u Φ (τ ))∆τ .
Since RS 1 ∩ [a, b) T is at most countable, using a diagonal argument, there exists a subsequence of (u k ) k∈N (that we do not relabel) such that u k (r) converges to some u r ∈ Ω for every r ∈ RS 1 ∩ [a, b) T . It follows from the Lebesgue dominated convergence theorem that x(t) = x(a) + ∫ [a,t) T f (τ, x(τ ), u Φ (τ ))∆τ and x 0 (t) = ∫ [a,t) T f 0 (τ, x(τ ), u Φ (τ ))∆τ , where u is defined by u(t) = u r if t = r ∈ RS 1

Remark 9.
If the final time is free, under the assumptions made in the fourth item of Theorem 3, and if moreover b

Lemma 3.
Let m ∈ N * and let c < d be two elements of T with c ∈ T 1 . Let u, v : [c, d) T1 → R m be two functions. Then u = v ∆ 1 -a.e. on [c, d) T1 if and only if u Φ = v Φ ∆-a.e. on [c, d) T .

Remark 30.
Throughout this remark, we assume that all assumptions of the fourth item of Theorem 3 are satisfied. As mentioned in Remark 5, Theorem 3 can easily be extended (without the fourth item, for now) to the framework of dynamical systems on time scales with several sampled-data controls with different sets of controlling times. To prove the transversality condition on the final time, we use this multiscale version (without the fourth item) of Theorem 3. Let δ > 0 be such that [b * − δ, b * + δ] ⊂ T and such that (f, f 0 ) is of class C 1 on [b * − δ, b * + δ] × R n × R m . Obviously, the trajectory q * , associated with b * and u * , is an optimal solution of the optimal sampled-data control problem, with free final time b ∈ (b * − δ, b * + δ), defined by (O) [b * −δ,b * +δ]