Value function for regional control problems via dynamic programming and Pontryagin maximum principle

In this paper we focus on regional deterministic optimal control problems.


Introduction
In this article, we consider regional optimal control problems in finite dimension, the word "regional" meaning that the dynamics and the cost functional may depend on the region of the state space and therefore present discontinuities at the interface between these different regions. Our objective is to provide a description of the optimal trajectories by exploiting the Pontryagin maximum principle and the dynamic programming approach (the value function being the viscosity solution of the corresponding Hamilton-Jacobi equation). We establish a relationship between these two approaches, which is new for regional control problems.
There is a wide existing literature on regional optimal control problems, which have been studied with different approaches and within various related contexts: stratified optimal control problems in [9,11,23], optimal multiprocesses in [17,18]; they also enter into the wider class of hybrid optimal control problems (see [10,28,33]). Necessary optimality conditions have been developed in [20,21,35] in the form of a Pontryagin maximum principle. For regional optimal control problems, the main feature is the jump of the adjoint vector at the interface between two regions (see [21]). An alternative approach is the Bellman one, developed in [7,8,30] in terms of an appropriate Hamilton-Jacobi equation whose solutions are studied in the viscosity sense (see also [24,26,29] for transmission conditions at the interface).
In this paper we exploit both the dynamic programming approach and the Pontryagin maximum principle in order to describe the optimal trajectories of regional control problems. Although the techniques are not new, we believe that the approach is interesting and helpful. We use in an instrumental way the lifting (duplication) technique, nicely used in [19] to prove that the hybrid version of the Pontryagin maximum principle can be derived from the classical version (i.e., for classical, non-hybrid problems) under the assumption that optimal trajectories are regular enough. More precisely, we assume that optimal trajectories have a locally finite number of switchings; in other words, we assume that wild oscillation phenomena (known as Fuller, Robbins or Zeno phenomena in the existing literature, see [13] for a survey) do not occur, or at least, if they do, then we deliberately ignore the corresponding wildly oscillating optimal trajectories and restrict our search to optimal trajectories that have a regular enough structure, i.e., a locally finite number of switchings. Under this assumption, the duplication technique developed in [19] can be carried out and shows that the regional optimal control problem can be lifted to a higher-dimensional optimal control problem that is "classical", i.e., non-regional. As we are going to see, this construction has a number of nice applications.
In order to point out the main ideas, we consider the following simplified framework with only two different regions. Let N ∈ N*. We assume that R N = Ω 1 ∪ H ∪ Ω 2 , where Ω 1 and Ω 2 are disjoint open sets and H is their common interface, and we consider a nonlinear optimal control problem in R N , stratified according to the above partition. We write this regional optimal control problem as Ẋ(t) = f (X(t), a(t)), where the dynamics f and the running cost ℓ are defined as follows. If x ∈ Ω i for i = 1 or 2 then f (x, a) = f i (x, a), ℓ(x, a) = l i (x, a), where f i : R N × R m → R N and l i : R N × R m → R are C 1 -mappings. The set H is called the interface between the two open regions Ω 1 and Ω 2 (see Figures 1 and 2). The class of controls that we consider also depends on the region. As long as X(t) ∈ Ω i , we assume that a ∈ L ∞ ((t 0 , t f ), A i ), where A i is a measurable subset of R m . Accordingly, as long as X(t) ∈ H, the control takes its values in a measurable subset of R m associated with the interface. The terminal times t 0 and t f and the terminal points x 0 and x f may be fixed or free according to the problem under consideration. For instance, if we fix x 0 , t 0 , x f , t f , we define the value function S(x 0 , t 0 , x f , t f ) of the regional optimal control problem (1.1) as the infimum of the cost functional over all admissible trajectories steering the control system from (x 0 , t 0 ) to (x f , t f ).
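As an illustration of such region-dependent dynamics, here is a minimal sketch in which the state space R 2 is split by the interface H = {y = 0}; the particular vector fields and the function name f are illustrative choices of ours, not taken from the problem data above.

```python
import math

def f(x, a):
    """Region-dependent right-hand side f(X, a), discontinuous across H.

    The state space R^2 is split by H = {y = 0} into Omega_1 = {y < 0}
    and Omega_2 = {y > 0}.  The specific fields below are toy examples."""
    px, py = x
    if py < 0:
        # x in Omega_1: unit-speed motion in the direction of the control a
        return (math.cos(a), math.sin(a))
    elif py > 0:
        # x in Omega_2: same steering, but at half speed (a different f_2)
        return (0.5 * math.cos(a), 0.5 * math.sin(a))
    else:
        # x on the interface H: dedicated dynamics f_H (fast horizontal motion)
        return (3.0, 0.0)
```

The point of the sketch is only that f is defined piecewise on Ω 1 , H and Ω 2 , so the right-hand side jumps when a trajectory reaches the interface.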
Our objective is to show that the value function S of the regional optimal control problem (1.1) can be recovered from the study of a classical (i.e., non-hybrid) optimal control problem settled in higher dimension, under the assumption of finiteness of switchings. To this aim, we list all possible structures of optimal trajectories of (1.1). We recall that, for regional optimal control problems, existence of an optimal control and Cauchy uniqueness results are derived using Filippov-like arguments, allowing one to tackle the discontinuities of the dynamics and of the cost functional (see, e.g., [9,11,23]).
In what follows, we assume that the regional optimal control problem under consideration admits at least one optimal solution. We consider such an optimal trajectory X(•) associated with a control a(•) on [t 0 , t f ]. Assuming that x 0 ∈ Ω 1 and x f ∈ Ω 2 , we consider various structures.
The simplest case is when the trajectory X(•) consists of two arcs, denoted by ([t 0 , t 1 ], X 1 (•)) and ([t 1 , t f ], X 2 (•)), lying respectively in Ω 1 for the first part, and then in Ω 2 for the second part of the trajectory, with X 1 (t 1 ) = X 2 (t 1 ) ∈ H. Such optimal trajectories are studied in [21] under the assumption of a transversal crossing, and an explicit jump condition is given for the adjoint vector obtained by applying the Pontryagin maximum principle. This is the simplest possible trajectory structure, and we denote it by 1-2 (see Figure 1). It has only one switching.
The second structure is when the trajectory X(•) consists of three arcs, denoted by ([t 0 , t 1 ], X 1 (•)), ([t 1 , t 2 ], X H (•)) and ([t 2 , t f ], X 2 (•)), lying respectively in Ω 1 for the first arc, in H for the second arc and in Ω 2 for the third arc. The middle arc X H lies along the interface. Such a structure is denoted by 1-H-2 (see Figure 2). The trajectory has two switchings. Accordingly, we consider all possible structures 1-2-H-1, 1-H-1-2, 1-2-H-2, etc., made of a finite number of successive arcs. Restricting ourselves to any such fixed structure, we can define a specific optimal control problem consisting of finding an optimal trajectory steering the system from the initial point to the desired target point and minimizing the cost functional over all admissible trajectories having exactly this structure. Denoting by S 12 , S 1H2 , etc., the corresponding value functions, we have S = inf{S 12 , S 1H2 , . . .}, provided all optimal trajectories of the regional optimal control problem have a locally finite number of switchings (and thus the infimum above runs over a finite number of possibilities).
Using the duplication argument of [19], we show that each of the above value functions (restricted to some fixed structure) can be written as the projection/restriction of the value function of a classical optimal control problem in higher dimension (say p, equal to twice the number of switchings of the corresponding structure), the projection being taken along some coordinates, and the restriction being made to some submanifolds of the higher-dimensional space R p . The word "duplication" reflects the fact that each arc of the trajectory gives two components of the dynamics of the problem in higher dimension.
Thanks to this technique, we characterize the value function as a viscosity solution of a Hamilton-Jacobi equation and we apply the classical Pontryagin maximum principle. We thus provide an explicit relationship between the gradient of the value function of the regional control problem, evaluated along the optimal trajectory, and the adjoint vector. This sensitivity relation extends to the framework of regional optimal control problems the corresponding relation known in the classical framework. This allows us to derive conditions at the interface: continuity of the Hamiltonian and a jump condition for the adjoint vector.
In Section 2 we provide the details of the procedure for the structures 1-2 and 1-H-2. The procedure goes similarly for other structures and consists of designing a duplicated problem of dimension twice the number of arcs of the structure.
The value function S is then the infimum of the value functions associated with all possible structures, provided optimal trajectories have a locally finite number of switchings. The latter assumption is required to apply the duplication technique. However, in general the set of switchings may have a complex, even fractal, structure, and thus the set of switching points may be countably or even uncountably infinite. In the context of hybrid optimal control problems, the Zeno phenomenon is a well-known chattering phenomenon, meaning that the control switches an infinite number of times over a compact interval of time. It is analyzed for instance in [25,39], and necessary and/or sufficient conditions for the occurrence of the Zeno phenomenon are provided in [3,22]. However, we are not aware of any existing result providing sufficient conditions for hybrid optimal control problems under which the number of switchings of optimal trajectories is locally finite or even only countable. Anyway, although the Zeno phenomenon may occur in general, restricting the search of optimal strategies to trajectories having only a locally finite number of switchings is a reasonable assumption in practice, in particular in view of numerical implementation (see [13,38]).
Under this local finiteness assumption, it follows from our analysis that the regularity of the value function S of the regional optimal control problem is the same as (i.e., not more degenerate than) that of the higher-dimensional classical optimal control problem lifting the problem. More precisely, we prove that each value function S 12 , S 1H2 , . . ., for each fixed structure, is the restriction to a submanifold of the value function of a classical optimal control problem in higher dimension. Our main result, Theorem 2.6, gives a precise representation of the value function and of the corresponding sensitivity relations, in relation with the adjoint vector coming from the Pontryagin maximum principle. In particular, if all the classical value functions above are Lipschitz, then the value function of the regional optimal control problem is Lipschitz as well. This regularity result is new in the framework of hybrid or regional optimal control problems.
The paper is organized as follows.
In Section 2 we define the regional optimal control problem and we state the complete set of assumptions that we consider throughout. We analyze in detail the structures 1-2 and 1-H-2 (the other cases being similar), by providing an explicit construction of the duplicated problem. As a result, we obtain the above-mentioned representation of the value function of the regional optimal control problem and the consequences for its regularity.
In Section 3 we provide a simple regional optimal control problem, having the structure 1-H-2, modelling for instance the motion of a pedestrian walking in Ω 1 and Ω 2 and having the possibility of taking a tramway along H at any point of the interface.
Section 4 gathers the proofs of all results stated in Section 2.
2 Value function for regional optimal control problems

Problem and main assumptions
We consider the problem of minimizing the cost of trajectories going from x 0 to x f in time t f − t 0 . These trajectories follow the respective dynamics f i , f H when they are respectively in Ω i , H, and pay the corresponding running costs. For x ∈ H, we denote by ∇ H φ(x) the gradient of φ at x, which belongs to T x H. The scalar product in T z H is denoted by ⟨u, v⟩ H . This definition makes sense if both vectors u, v belong to T z H and, without ambiguity, we use the same notation when one of the vectors u, v is in R N . The notation ⟨u, v⟩ refers to the usual Euclidean scalar product in R N .
We make the following assumptions: (Hg) Let M be a submanifold of R N and A a measurable subset of R m ; the function g : M × A → R N is continuous and bounded, of class C 1 with Lipschitz continuous derivative with respect to the first variable. More precisely, there exists M > 0 such that |g(x, α)| ≤ M for any x ∈ M and α ∈ A, and there exist L, L 1 > 0 such that the corresponding Lipschitz estimates hold for any z, z′ ∈ M and α ∈ A. We assume that the dynamics f i , f H and the costs l i , l H satisfy Assumption (Hg) for a suitable choice of positive constants M, L and L 1 .
In this paper we consider optimal trajectories that are decomposed into arcs staying only in Ω 1 , Ω 2 or H, and touching the boundary of Ω 1 or Ω 2 only at the initial or final time.
The problem in the region Ω i (for i = 1 or 2). The trajectories, the value function S i and the corresponding Hamiltonian are defined accordingly.

The problem along the interface H. The trajectories X H : R + → H are solutions of the interface dynamics; the value function S H and the Hamiltonian H H are defined accordingly.

Analysis of the structure 1-2
We describe here the simplest possible structure: trajectories consisting of two arcs living successively in Ω 1 and Ω 2 , crossing the interface H at a given time (see Figure 1). This case has already been studied in the literature. As explained in [17], the jump condition (2.9) hereafter is a rather straightforward generalization of the problem solved by Snell's law. Besides, the Pontryagin maximum principle is also well established in this case; we recall it hereafter in detail because it is interesting to compare this result with the one obtained for more general structures (see Theorem 2.6 and Remark 2.7). We make the following transversal crossing assumption: (H 1-2) There exist a time t c ∈ (t 0 , t f ) and an optimal trajectory that starts from Ω 1 , stays in Ω 1 on the interval [t 0 , t c ), does not reach H tangentially at time t c , and stays in Ω 2 on the interval (t c , t f ].
Such trajectories are described as follows: for each initial and final data, the trajectory is given by a vector completed with the mixed conditions, the non-tangential conditions and the state constraints; the cost of such a trajectory defines the value function S 1,2 . Under the assumptions (HH), (Hfl i ), (Hfl H ), the results of [20,21] apply for any (x 0 , t 0 ; x f , t f ), where we recall that S i is the value function of the problem restricted to the region Ω i . Moreover, if X(•) is an optimal trajectory for the value function S 1,2 (x 0 , t 0 ; x f , t f ) and P (•) is the corresponding adjoint vector given by the Pontryagin maximum principle, then we have the continuity condition at the crossing time.

Analysis of the structure 1-H-2

In this section we analyze the structure with three arcs described in Figure 2. Precisely, we make the following assumption: (H 1H2) There exist t 0 < t 1 < t 2 < t f and an optimal trajectory that starts from Ω 1 , stays in Ω 1 on the interval [t 0 , t 1 ), stays on H on the time interval [t 1 , t 2 ], and stays in Ω 2 on the interval (t 2 , t f ].
Such trajectories are described as follows: for each initial and final data, the trajectory is given by a vector with the mixed conditions (2.11) and the state constraints (2.12). The cost of such a trajectory defines the value function S 1,H,2 , which we aim to characterize. Remark 2.1. This definition does not include the cases where x 0 ∈ H and/or x f ∈ H. However, it can be modified so as to involve only the vectors X 1 , X 2 or X H . Hereafter, we use the following notations.
Notations.Let u = u(x 0 , t 0 ; x f , t f ) : Ω 1 ×R + ×Ω 2 ×R + * → R be a generic function.We denote by ∇ x 0 u, ∇ x f u the gradients with respect to the first and the second state variable respectively, so ∇ x 0 u and ∇ x f u take values in R N .We denote by u t 0 and u t f the partial derivatives with respect to the first and the second time variable respectively, so u t 0 and u t f take values in R.
Definition of the duplicated problem. The main ingredient of our analysis is the construction of the duplicated problem (following [19]), the advantage being that the latter is a classical (non-regional) problem in higher dimension. The idea is to change the time variable to let the possible optimal trajectories evolve "at the same time" on the three arcs: the one in Ω 1 , the one on H and the one in Ω 2 . In this duplicated optimal control problem we do not need to impose the mixed conditions (2.11) and the state constraints (2.12). Therefore we are able to characterize the value function by a Hamilton-Jacobi equation, apply the usual Pontryagin maximum principle, and exploit the classical link (sensitivity relations) between them.
We set the duplicated state accordingly. The admissible trajectories are Lipschitz continuous vector functions, solutions of the so-called duplicated system, with initial and final conditions. Note that, to take into account the mixed conditions of the original problem, we allow the initial and final states to vary in a suitable submanifold.

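For orientation, a minimal sketch of the duplicated system for the structure 1-H-2, in the spirit of [19] (the time-dilation notation v 1 , v H , v 2 is ours; the paper's exact formulation is not reproduced here): all three arcs are reparametrized to run on the common interval [0, 1],

```latex
% Hedged sketch: v_1, v_H, v_2 \ge 0 are the durations of the three arcs,
% regarded as additional (constant) controls of the lifted problem.
\begin{aligned}
Z_1'(s) &= v_1\, f_1\bigl(Z_1(s), a_1(s)\bigr),\\
Z_H'(s) &= v_H\, f_H\bigl(Z_H(s), a_H(s)\bigr),\\
Z_2'(s) &= v_2\, f_2\bigl(Z_2(s), a_2(s)\bigr), \qquad s \in [0,1],
\end{aligned}
\qquad v_1 + v_H + v_2 = t_f - t_0 .
```

Under this reparametrization, the matching conditions Z 1 (1) = Z H (0) ∈ H and Z H (1) = Z 2 (0) ∈ H are not imposed in the dynamics but encoded in the submanifold of terminal points, which is exactly what makes the lifted problem classical.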
For each admissible trajectory Z we consider the cost functional and hence the value function Σ(Z 0 , T 0 ; Z 1 , T 1 ) (2.16).

Link between the regional optimal control problem and the duplicated problem. To establish the link between the original and the duplicated problem, given (x 0 , t 0 ; x f , t f ), we consider a submanifold M(x 0 , t 0 ; x f , t f ) encoding the junction conditions. The following result says that the original value function is the minimum of the value function Σ(Z 0 , T 0 ; Z 1 , T 1 ) restricted to the submanifold M(x 0 , t 0 ; x f , t f ).
Application of the usual Pontryagin maximum principle to the duplicated problem.
Let us introduce several further notations.
Sensitivity relations. In order to establish the link between the adjoint vector and the gradient of the value function Σ, we assume the uniqueness of the extremal lift: (Hu) We assume that the optimal trajectory Z(•) in Lemma 2.3 admits a unique extremal lift (Z(•), P Z (•), p 0 , V(•)), which is moreover normal, i.e., p 0 = −1.
The assumption of uniqueness of the solution of the optimal control problem and of uniqueness of its extremal lift (which is then moreover normal) is closely related to the differentiability properties of the value function. We refer to [4,16] for precise results on differentiability properties of the value function and to [12,31,32,34] for results on the size of the set where the value function is differentiable. For instance, for control-affine systems the singular set of the value function has zero Hausdorff (N − 1)-measure whenever there is no optimal singular trajectory (see [32]), and is a stratified submanifold of R N of positive codimension in an analytic context (see [37]). These results essentially say that, if the dynamics and cost function are C 1 , then the value function is of class C 1 at "generic" points. Moreover, note that the property of having a unique extremal lift, which is moreover normal, is generic in the sense of the Whitney topology for control-affine systems (see [14,15] for precise statements).
We have the following result.

(ii) For any time τ in the closed interval [T 0 , T 1 ], either D − Z1 Σ(Z 0 , T 0 ; Z(τ ), τ ) is empty or the function τ → Σ(Z 0 , T 0 ; Z(τ ), τ ) is differentiable, and then D − Z1 Σ = D + Z1 Σ at this point. Moreover, when assumption (Hu) holds, the function τ → Σ(Z 0 , T 0 ; Z(τ ), τ ) is differentiable for every time in [T 0 , T 1 ]. Note that at times T 0 and T 1 the gradients are naturally defined as the limits of the gradients in the open interval (T 0 , T 1 ).
Application to the regional optimal control problem: main result. We now establish a result analogous to the one obtained for the structure 1-2. We first remark that for this structure one cannot directly define a global adjoint vector; its role is therefore played by the limits of the gradient of the value function. The main result is the following.
Theorem 2.6. Under the assumptions (HH), (Hfl i ), (Hfl H ) and (Hu), for any (x 0 , t 0 ; x f , t f ), let X(•) be an optimal trajectory for the value function S 1,H,2 (x 0 , t 0 ; x f , t f ) defined by (2.13). We have the continuity conditions at the junction times and, moreover, there exist ν 1 , ν 2 ∈ R such that the jump conditions (2.36)-(2.38) hold. Theorem 2.6 is proved in Section 4.4.
Remark 2.7. Note the similarity between the jump conditions (2.36)-(2.38) and the ones in the transversal case (2.9): the difference is due to the fact that H is of codimension 1.
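For orientation only, the transversal jump condition takes, in the standard hybrid form and up to the normalization chosen in [20,21] (which we do not reproduce exactly), the following shape:

```latex
% Hedged sketch: n(x) denotes a unit normal to H at x, \nu \in \mathbb{R} a multiplier.
P(t_c^+) = P(t_c^-) + \nu\, n\bigl(X(t_c)\bigr),
\qquad
H_1\bigl(X(t_c), P(t_c^-)\bigr) = H_2\bigl(X(t_c), P(t_c^+)\bigr),
```

that is, the adjoint vector may jump only in the direction normal to the interface, the intensity ν being fixed by the continuity of the Hamiltonian. Along an interface arc, two multipliers ν 1 , ν 2 appear instead, one at each junction, which is the codimension-one difference pointed out above.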

More general structures
Proceeding as in Section 2.3, the analogue of Proposition 2.2 is obtained for any other structure 1-2-H-1, 1-H-1-2, 1-2-H-2, etc., in a similar way. For each such structure, the duplication technique makes it possible to lift the corresponding regional control problem to a classical (i.e., non-regional) optimal control problem in higher dimension; the value function of the regional optimal control problem is then written as the minimum of the value function of the high-dimensional classical optimal control problem over a submanifold, this submanifold representing the junction conditions of the regional problem (continuity conditions on the state and jump conditions on the adjoint vector).
For example, consider optimal trajectories with the structure 2-H-2-1, i.e., trajectories starting in Ω 2 , staying in Ω 2 along the time interval [t 0 , t 1 ), then lying in H on [t 1 , t 2 ], then going back to Ω 2 on (t 2 , t 3 ] and finally staying in Ω 1 in the time interval (t 3 , t f ].Then, the duplicated problem has four arcs and is settled in dimension 8.The whole approach developed previously can be applied as well and we obtain the corresponding analogues of Proposition 2.2 and then of Theorem 2.6.
In such a way, all possible structures can be described as composed of a finite succession of arcs, and analyzed thanks to the duplication technique. If the structure has p arcs then the duplicated problem is settled in dimension 2p.
As already said, from a practical point of view it is reasonable to restrict the search of optimal trajectories to trajectories having only a finite number of switchings. This is what is always done in practice because, numerically and in real-life implementation, the Zeno phenomenon is not desirable. Under such an assumption, the approach developed above shows that the value function of the regional optimal control problem can be written as a minimum over structures, where each of the value functions S is itself the minimum of the value function of a classical optimal control problem (in dimension twice the number of switchings of the corresponding structure) over terminal points running in some submanifold. An interesting consequence is the following: the regularity of the value function U of the regional optimal control problem is the same as (i.e., not more degenerate than) that of the higher-dimensional classical optimal control problems that lift the problem.
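The representation just described can be sketched as follows (the notations σ, M σ and Σ σ are ours, consistently with the identity S = inf{S 12 , S 1H2 , . . .} of the introduction):

```latex
U(x_0,t_0;x_f,t_f) \;=\; \min_{\sigma}\; S_\sigma(x_0,t_0;x_f,t_f),
\qquad
S_\sigma(x_0,t_0;x_f,t_f) \;=\;
  \min_{(Z_0,T_0;Z_1,T_1)\,\in\,\mathcal M_\sigma(x_0,t_0;x_f,t_f)}
  \Sigma_\sigma(Z_0,T_0;Z_1,T_1),
```

where σ runs over the finitely many admissible structures 1-2, 1-H-2, 1-2-H-1, . . ., Σ σ is the value function of the classical lifted problem associated with the structure σ, and M σ is the submanifold encoding the corresponding junction conditions.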
The lifting duplication technique may thus be seen as a kind of desingularization, showing that the value function of the regional optimal control problem is the minimum over all possible structures of value functions associated with classical optimal control problems settled over fixed structures, each of them being the restriction to some submanifold of the value function of a classical optimal control problem in higher dimension.
In particular, if for instance all value functions above are Lipschitz then the value function of the regional optimal control problem is Lipschitz as well.Note that Lipschitz regularity is ensured if there is no abnormal minimizer (see [38]), and this sufficient condition is generic in some sense (see [14,15]).
Such a regularity result is new in the context of regional optimal control problems.
Remark 2.8. In this paper, for the sake of simplicity, we have analyzed regional problems in R N . Since all arguments are local, the same procedure can be applied to regional problems settled on a smooth manifold M , stratified as M = M 0 ∪ M 1 ∪ . . . ∪ M N (disjoint union), where M j is a j-dimensional embedded submanifold of M .
Remark 2.9.Our results can also be straightforwardly extended to time-dependent dynamics and running costs, and to regions Ω i (t) depending on time, always assuming at least a C 1 -dependence.

What happens in case of Zeno phenomenon?
In case the Zeno phenomenon occurs, optimal trajectories oscillate for instance between two regions Ω 1 and Ω 2 an infinite number of times over a compact time interval.
If the number of switchings is countably infinite, then the above procedure can, at least formally, be carried out, but the duplicated (lifted) problem is then settled in infinite (countable) dimension. Settling it rigorously would require much more functional-analytic work. Anyway, the value function is then formally written as an infimum of countably many value functions of classical optimal control problems, but even if the latter are regular enough (for instance, Lipschitz), taking the infimum may break this regularity and create some degeneracy.
If the number of switchings is uncountably infinite, the situation may be even worse. The duplication technique cannot be performed, at least in the form presented here, and we do not know whether there would exist a related approach able to capture any information. The situation is widely open. We are not aware of any example of a regional (or, more generally, hybrid) optimal control problem for which the set of switching points of the optimal trajectory has a fractal structure. Notice the related result stated in [2], according to which, for smooth bracket-generating single-input control-affine systems with bounded scalar controls, the set of switching points of optimal bang-bang controls cannot be a Cantor set.

Example
As an example we consider here a simple regional optimal control problem for which it is easy to see that a trajectory of the form 1-H-2 is the best possible choice. The idea is to model situations where it is optimal to move along the interface H as long as possible. One can think, for example, of a pedestrian walking in Ω 1 and Ω 2 with the possibility of taking a tramway along H at any point of the interface. More generally, this example models any problem where moving along one direction is much faster and/or cheaper than along the others.
In R 2 we set Ω 1 = {(x, y) : y < 0}, Ω 2 = {(x, y) : y > 0} and H = {(x, y) : y = 0}. We choose dynamics where the controls α i take their values in [−π, π]. We consider the minimal time problem; our aim is therefore to compute the value function of the problem with the dynamics f defined above. We analyze the case where we start from a point (x 0 , y 0 ) ∈ Ω 1 and we aim to reach a point (x 1 , y 1 ) ∈ Ω 2 with x 1 > x 0 . In Ω 1 , the dynamics f 1 allow one to move with constant velocity equal to one in any direction; therefore it is clear that the best choice is to go "towards H but also in the direction of x 1 ". Indeed, comparing on Figure 3 the dotted trajectory and the black one, they spend the same time in Ω 1 , but along H the dotted one is not time-minimal. Therefore the black one is a better choice.
For this reason, and since the problem is symmetric, it is not restrictive to assume that y 1 = −y 0 and that trajectories with the structure 1-H-2 are like the ones described on Figure 4, with 0 ≤ a ≤ (x 1 − x 0 )/2.
For each trajectory steering (x 0 , y 0 ) to (x 1 , −y 0 ), a simple computation gives the cost as a function of a. Minimizing over a yields the value function, and we obtain the following dichotomy: in the first case, the optimal trajectory has the structure 1-H-2 with a = |y 0 |/(3 √ 11) and the corresponding optimal final time; in the second case, the optimal trajectory has the structure 1-2 with a = (x 1 − x 0 )/2 and the corresponding optimal final time (see Figure 5).
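The computation behind this dichotomy can be sketched numerically. The sketch below assumes unit speed in Ω 1 , Ω 2 and a tramway speed c > 1 along H (the tramway speed is not stated unambiguously in the text, so it is kept as a parameter); the names travel_time and optimal_crossing are ours. Snell-type stationarity of the candidate cost gives a = |y 0 |/ √(c 2 − 1), capped at (x 1 − x 0 )/2.

```python
import math

def travel_time(a, x0, y0, x1, c):
    """Cost of the 1-H-2 candidate: walk from (x0, y0) to (x0 + a, 0) at unit
    speed, ride along H at speed c to (x1 - a, 0), then walk to (x1, -y0).
    For a = (x1 - x0)/2 the interface segment vanishes (structure 1-2)."""
    walk = 2.0 * math.sqrt(a * a + y0 * y0)   # the two symmetric walking arcs
    tram = (x1 - x0 - 2.0 * a) / c            # the segment along the interface H
    return walk + tram

def optimal_crossing(x0, y0, x1, c):
    """Minimise travel_time over 0 <= a <= (x1 - x0)/2.
    Stationarity gives a* = |y0| / sqrt(c^2 - 1); if that exceeds the midpoint,
    the constrained minimiser is a = (x1 - x0)/2 (structure 1-2)."""
    a_star = min(abs(y0) / math.sqrt(c * c - 1.0), (x1 - x0) / 2.0)
    return travel_time(a_star, x0, y0, x1, c), a_star
```

Note that with c = 10 the stationary point is |y 0 |/ √99 = |y 0 |/(3 √11), which would be consistent with the expression for a appearing above.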
We finally remark that, although this example is very simple, it is paradigmatic and illustrates many situations where one has two regions of the space (with specific dynamics) separated by an interface along which the dynamics are faster than in the two regions. In this sense, the above example can be adapted and made more complex to represent more realistic situations.
We first remark that putting together (2.