ON OPTIMAL CONTROL OF A SWEEPING PROCESS COUPLED WITH AN ORDINARY DIFFERENTIAL EQUATION

We study a special case of an optimal control problem governed by a differential equation and a differential rate–independent variational inequality, both with given initial conditions. Under certain conditions, the variational inequality can be reformulated as a differential inclusion with discontinuous right-hand side. This inclusion is known as sweeping process. We perform a discretization scheme and prove the convergence of optimal solutions of the discretized problems to the optimal solution of the original problem. For the discretized problems we study the properties of the solution map and compute its coderivative. Employing an appropriate chain rule, this enables us to compute the subdifferential of the objective function and to apply a suitable optimization technique to solve the discretized problems. The investigated problem is used to model a situation arising in the area of queuing theory.


Lukáš Adam and Jiří Outrata
UTIA, Czech Academy of Sciences, Pod Vodárenskou věží 4, Prague 8, Czech Republic
1. Introduction. In this paper, we study an optimal control problem with a variational inequality in the constraint system. If we write this variational inequality in the equivalent form of a generalized equation, it becomes

−ż(t) + Rẏ(t) ∈ N_{Z(t)}(z(t)),  t ∈ [0, T] a.e.,  (1)

where Z(t) is a closed convex set, N_{Z(t)}(z(t)) denotes the normal cone to Z(t) at z(t), ẏ(t) and ż(t) stand for the time derivatives and R is a matrix of appropriate dimensions. Differential inclusion (1) can be understood as a special form of the maximal dissipation principle in evolution systems with convex constraints. Models of this type play a central role in modeling nonequilibrium processes with rate-independent memory in the mechanics of elastoplastic and thermoelastoplastic materials as well as in ferromagnetism, piezoelectricity or phase transitions; see [1], [6], [20] or [39]. Furthermore, differential inclusion (1) is a special case of a sweeping process, which was introduced (in its basic form) in [29] and then gradually generalized in a number of papers, e.g. [15] or [25]. In most of these works, however, the authors do not consider any control and concentrate solely on the existence and regularity of solutions. More references to and applications of (1) can be found in [19] or [37].
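For the reader's convenience we recall the normal cone of convex analysis used in (1); for a closed convex set Z ⊂ R^m it reads

```latex
N_{Z}(z) =
\begin{cases}
\{\, v \in \mathbb{R}^m \mid \langle v, x - z \rangle \le 0 \ \ \forall x \in Z \,\}, & z \in Z,\\
\emptyset, & z \notin Z.
\end{cases}
```

In particular, N_Z(z) = {0} whenever z lies in the interior of Z, so (1) reduces there to ż(t) = Rẏ(t).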
Another view of this model is provided by [31] where, under certain conditions, inclusion (1) is reformulated as a differential variational inequality, for which some theoretical results and numerous applications exist. For applications to game theory and Nash equilibria see [7], [26] or [36]. If the aforementioned conditions are not fulfilled, inclusion (1) can be reformulated as a mixture of a differential variational inequality and a differential algebraic equation [3]. For these reformulations see again [31].
By adding a differential equation

ẏ(t) = f(t, y(t), z(t)) + Bu(t),  t ∈ [0, T] a.e.,  (2)

and an objective function in the variables u, y and z, we obtain a system introduced in [5], where necessary optimality conditions for this problem have been derived. The authors considered only a fixed set Z and made use of the rate-independence of the solution map y → z defined by (1). We utilize some of their results and demonstrate the usefulness of this model by means of a simple example from the area of queuing theory.
The resulting optimization problem is a particularly difficult optimal control problem which cannot be tackled by standard optimal control theory via the maximum principle or dynamic programming. Indeed, when dealing with a differential inclusion, the right-hand side is usually required to be Lipschitz continuous, as in [8] or [28], or this requirement is somehow relaxed, as with the one-sided Lipschitzian property from [13]. Unfortunately, neither of these assumptions is satisfied in our setting.
From another point of view, system (1)-(2) models an evolutionary equilibrium, and hence its optimization is a special type of mathematical program with evolutionary equilibrium constraints (MPEEC), which has been defined and studied in [18].
In [10] the authors also investigated an optimization problem with a sweeping process among the constraints. They use a similar technique, based on time discretization and coderivative calculus, to derive necessary optimality conditions for the original problem. However, their control enters the system via the sweeping set, and so the results of [10] cannot be compared with those presented in this paper in a straightforward way.
The aim of this paper is twofold. First, we perform a simple discretization of the problem and show that from any sequence of solutions to the discretized problems one can select a subsequence converging in a certain sense to a solution of the original infinite-dimensional problem. This part depends on some results from [5]. Then, in the second part of the paper, we apply the so-called implicit programming approach, developed for the numerical solution of non-evolutionary MPECs, to the numerical solution of the individual discretized problems. Thereby we make use of some advanced tools of modern variational analysis, which enable us to compute the so-called "subgradient information" required by most numerical methods of nonsmooth optimization. This part essentially follows the scheme developed in [30]. In fact, this approach has already been used successfully in a different MPEEC describing the magnetization of a piece of ferromagnet [18], where, however, the underlying evolutionary equilibrium has been governed by a completely different infinite-dimensional model.
Hence, our technique differs substantially from the smoothing approach which is usually applied to differential inclusions with hysteresis operators [4].
The organization of the paper is as follows: in Section 2 we present the problem and state the assumptions used throughout the paper. The discretization of the time interval is performed and the discretized optimization problems are stated. Furthermore, the convergence of optimal solutions of the discretized problems is discussed.
In Section 3 we present some basic concepts of variational analysis which are extensively used in Section 4. There we first ensure the existence of optimal solutions, then compute an upper estimate of the coderivative of the solution map and, on its basis, an upper estimate of the subdifferential of the composite objective function. Moreover, several cases in which this estimate is exact are discussed.
Section 5 concludes this paper by the already-mentioned example from the area of queuing theory. Finally, some auxiliary lemmas are stated in the Appendix.
Our notation is basically standard. As we work only with time-dependent functions, by L^p we understand L^p([0, T], R^n) and W^{1,p} denotes the Sobolev space of functions with derivatives in L^p. This derivative will be denoted by a dot and the norms in the above spaces by ‖·‖_p and ‖·‖_{1,p}, respectively. For weak convergence we use the notation x_n ⇀ x, for uniform convergence x_n ⇒ x, and x_n →_A x signifies the usual convergence x_n → x with the additional condition that x_n ∈ A. The inverse of a function is considered as a multifunction, so that M^{−1}(y) = {x | y ∈ M(x)}.

2. Problem statement and its approximation. The main goal of this paper is to propose a way of numerically solving the optimization problem which will be referred to as (3). In it, E, R and B are fixed matrices of corresponding dimensions, Z(t) ⊂ R^m is a moving convex set and N_{Z(t)}(z(t)) denotes the normal cone to this set at the point z(t). The exact assumptions will be given below. The constraint system of problem (3) was taken from the article [5], in which necessary optimality conditions were derived. In this paper we consider a more general objective function.
Although the main result of the article [5] was the development of necessary optimality conditions, several of the auxiliary results proven there can also be used to derive their discretized counterparts; modifications of the following facts will be used in this paper. In particular, we learn that under some assumptions, for any u ∈ L^p there exists exactly one pair (y, z) ∈ W^{1,p} satisfying the constraints of (3). For this reason u will be referred to as the control variable, while y and z will be called the state variables.

LUKÁŠ ADAM AND JIŘÍ OUTRATA
Moreover, an a priori estimate of the form ‖y‖_{1,p} + ‖z‖_{1,p} ≤ C(1 + ‖u‖_p) was derived for every p ∈ [1, ∞], with the constant C independent of the choice of u. Furthermore, u_k ⇀ u in L^2 implies y(u_k) ⇒ y(u) and z(u_k) ⇒ z(u), which under additional assumptions implies the existence of an optimal solution to (3). In this section, we show similar statements for the solutions of the discretized problems and use these results to prove the convergence of solutions of the discretized problems to a solution of (3).
For any k, we consider a rather general discretization scheme 0 = t^k_0 < t^k_1 < · · · < t^k_k = T; hence we do not require the division to be equidistant. Denoting the distance between two consecutive time instants by h^k_j = t^k_{j+1} − t^k_j, problem (3) may be discretized as problem (4). To simplify the notation, we will use the abbreviations f^k_j := f(t^k_j, y^k_j, z^k_j) and similarly Z^k_j := Z(t^k_j). Usually the upper discretization index k will be fixed, and hence it will often be omitted. However, in some cases, especially in the convergence analysis, it will be helpful to emphasize the discretization level by keeping it. If all the values u_j, y_j and z_j are known, y^k and z^k denote the piecewise linear functions obtained by connecting the known points. The function u^k will be constructed in a similar way, with the difference that it will be piecewise constant. For the sake of simplicity, the discretization of the differential equation was performed in a forward way.
Since the normal cone is a cone by definition, it is not necessary to include the term h_j in the inclusion in (4). The reason for considering N_{Z_{j+1}}(z_{j+1}) instead of N_{Z_j}(z_j) in (4) is that for convex sets Z_{j+1} the inclusion may be equivalently rewritten as the projection equation

z_{j+1} = P_{Z_{j+1}}(z_j + R(y_{j+1} − y_j)).

This immediately implies that if we know all the values u_j and the initial points a and b, we are able to unambiguously compute the values y_j and z_j for all j = 0, . . . , k. Similarly as in the continuous case, u^k can be seen as the control variable and it is sufficient to optimize only with respect to this variable. Because of this, y^k and z^k will be referred to as state variables.
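The forward scheme just described can be sketched numerically. Below is a minimal scalar illustration (n = m = 1) with purely hypothetical data — a moving interval Z(t), f(t, y, z) = −y + z and R = B = 1; the projected update is the equation form of the discretized inclusion when Z_{j+1} is closed and convex:

```python
import numpy as np

def project_interval(x, lo, hi):
    """Projection onto the convex set Z = [lo, hi]."""
    return min(max(x, lo), hi)

def solve_discretized(t, u, a, b, f, Z, R=1.0, B=1.0):
    """Forward scheme for the discretized system, scalar case:
    y_{j+1} = y_j + h_j * (f(t_j, y_j, z_j) + B u_j),
    z_{j+1} = P_{Z_{j+1}}(z_j + R (y_{j+1} - y_j))."""
    k = len(t) - 1
    y = np.empty(k + 1)
    z = np.empty(k + 1)
    y[0], z[0] = a, b
    for j in range(k):
        h = t[j + 1] - t[j]
        y[j + 1] = y[j] + h * (f(t[j], y[j], z[j]) + B * u[j])
        lo, hi = Z(t[j + 1])
        z[j + 1] = project_interval(z[j] + R * (y[j + 1] - y[j]), lo, hi)
    return y, z

# toy data (purely illustrative): translating interval Z(t) = [-1 + t, 1 + t]
t = np.linspace(0.0, 1.0, 11)
u = np.zeros(10)
y, z = solve_discretized(t, u, a=0.0, b=0.0,
                         f=lambda t, y, z: -y + z,
                         Z=lambda t: (-1.0 + t, 1.0 + t))
```

Feasibility z_j ∈ Z_j holds by construction, since each z_{j+1} is defined through the projection onto Z_{j+1}.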
With this basic knowledge of the model, we now state the assumptions which will be used later in the text. Unfortunately, we do not work with Carathéodory functions; on the other hand, we admit some discontinuities in the time variable. First define the set

Γ := {t ∈ [0, T] | ∃ y, z : f(·, y, z) or L_1(·, y, z) is not continuous at t}.

In the rest of this paper, we require that Γ is a set of zero measure and that for all discretization levels k we have t^k_j ∉ Γ for all j. This also implies that we have to consider nonequidistant divisions. So we suppose that

(H1): Z(t) is closed and convex for all t;
(H2): Z(t) moves in a Lipschitz continuous way with constant K_1 with respect to the Hausdorff distance, hence d_H(Z(t), Z(s)) ≤ K_1|t − s|, where d_H denotes the Hausdorff distance between two sets;
(H3): Ω is closed and convex;
(A1): f is continuous on ([0, T] \ Γ) × R^n × R^m;
(A2): |f(t, y, z)| ≤ α_0 + α_1(|y| + |z|) for all t ∈ [0, T], y ∈ R^n and z ∈ R^m;
(A3): L_1 is continuous on ([0, T] \ Γ) × R^n × R^m and bounded below;
(A4): there exists a convex L̃_2 …;
(A5): … E_2 being positive definite;
(A6): L_3 is continuous and bounded below.

The imposed assumptions are followed by two remarks. In the first one, we discuss the differences of the model and assumptions from the core paper, and in the second one possible relaxations of the assumptions are considered.
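Assumption (H2) is easy to verify for concrete moving sets. For nonempty closed intervals the Hausdorff distance reduces to the larger of the two endpoint distances; the translating interval below (toy data, not from the paper) satisfies (H2) with K_1 = 1:

```python
def hausdorff_interval(A, B):
    """Hausdorff distance between closed intervals A = [a1, a2], B = [b1, b2];
    for intervals it reduces to max(|a1 - b1|, |a2 - b2|)."""
    (a1, a2), (b1, b2) = A, B
    return max(abs(a1 - b1), abs(a2 - b2))

def Z(t):
    # illustrative moving set: translates with unit speed, so K_1 = 1
    return (-1.0 + t, 1.0 + t)

# here d_H(Z(t), Z(s)) = |t - s|, i.e. (H2) holds with K_1 = 1
```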
Remark 1. We have made several extensions of the model considered in [5]. To the objective function we have added a term depending on the terminal state via L_3, and we consider a more general L_2, which was restricted to a quadratic function in [5]. As seen from (A5), L_2 still possesses some features of a quadratic function; however, if Ω is bounded, then L_2 can be any bounded function.
For the sake of simplicity, we still insist on the separability of the objective function in the state and control variables. It is certainly possible to overcome this restriction by strengthening several assumptions so that they hold uniformly in the control or state variables. Furthermore, we allow Z(·) to depend on time. On the other hand, we require several estimates to hold uniformly in t. Specifically, α_0 should be time-independent and L_1 should be bounded on compact sets in all three variables, not only in the state ones. This time uniformity is used only twice in this text, namely in (9) and (21). In both cases a Lebesgue integral has to be approximated by a limit of Riemann sums. Although this is not possible for an arbitrary choice of time instants, according to [12, Lemma 4.12] it is possible to choose the time instants in such a way that the Riemann sums converge to the Lebesgue integral.
It is certainly possible to consider α_0 ∈ L^1 instead of the assumed α_0 ∈ L^∞, but in this case Γ would have to be restricted to admit only the time divisions mentioned above. The same consideration applies to L_1.

Remark 2. In the assumptions we require that the discontinuities in the data may occur only on a subset of time instants of zero measure; moreover, these instants should not depend on the state variables y and z. The reason for this requirement is simple. For the convergence analysis, it could possibly suffice to choose any point t in the interval [t_j, t_{j+1}) at which f(·, y_j, z_j) is continuous, or a Lebesgue point. On the other hand, for the optimization and the computation of the coderivative of the solution map, we need a fixed time division that does not change every time the state variables y_j and z_j change.
We will discuss only the case of f, but of course this remark is valid for L_1 as well. It is certainly possible to weaken assumption (A1) to f being only a Carathéodory function. On the other hand, the discretized equation would then have to employ an integral average of f over each discretization interval, as in (7). By considering (7) it is possible to prove the convergence. However, this would cause trouble in the numerical computation since, instead of evaluating f at one point, an integral would have to be computed.
It is also possible to weaken the convexity assumption (H1) by assuming that Z(t) is a prox-regular set with a specific structure. This follows from the discretization scheme used in [38]. The main argument is that the points z_j will not be far away from the set Z_j, and thus the projection retains its single-valued Lipschitzian character even though the set may be nonconvex.
Having posed and discussed the assumptions, we start now developing a chain of lemmas analyzing the properties of solutions to the discretized problems. First of all, we show the uniform boundedness of the state variables.
Lemma 2.1. Assume that (H1), (H2) and (A2) are valid. Then for any feasible solution (u^k, y^k, z^k) of the discretized problems the following estimate holds true for all j = 0, . . . , k − 1.

Proof. First we derive an estimate for the derivative of z^k. Due to the construction of z^k we know that z_j ∈ Z_j, and thus P_{Z_j}(z_j) = z_j. Dividing the resulting inequality by h_j and putting the estimates together, we obtain the assertion.

In the previous lemma we derived the estimate needed in the discrete version of Gronwall's lemma, which allows us to bound the Sobolev norm of the state variables by the Lebesgue norm of the control variable.

Lemma 2.2. Let assumptions (H1), (H2) and (A2) be fulfilled. Then for any p ∈ [1, ∞] there exists a constant C such that for any k and any feasible solution of the discretized problem with u^k ∈ L^p and the corresponding y^k and z^k, the following estimate is valid.

Proof. The Corollary to the discrete Gronwall's Lemma A.2 can be applied to a_j := |y_j| + |z_j|; the last equality in the resulting estimate holds true since u^k is a piecewise constant function. This yields the first part of the estimate. For the estimate of the derivatives, combine Lemma 2.1 with (9). Temporarily ignoring one of the terms on the left-hand side of the resulting estimate, integrating the p-th power of the rest and using the well-known equivalence of norms on any finite-dimensional space, we arrive at the desired bound. To finish the proof it is sufficient to take the p-th root and again use the equivalence of norms on R^2. The estimate for ‖ż^k‖_p follows from the symmetry of (10).
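Lemma A.2 itself is stated in the Appendix. Purely as an illustration of the mechanism, the following sketch checks numerically one standard form of the discrete Gronwall bound (the recursion, constants and step sizes are toy data): if a_{j+1} ≤ (1 + C h_j) a_j + b_j with a_0, b_j ≥ 0, then a_k ≤ e^{CT}(a_0 + Σ_j b_j), where T = Σ_j h_j.

```python
import math
import random

def gronwall_bound(a0, h, C, b):
    """Explicit bound from one standard form of the discrete Gronwall lemma:
    a_{j+1} <= (1 + C h_j) a_j + b_j  implies
    a_k <= e^{C T} (a_0 + sum_j b_j), with T = sum_j h_j."""
    T = sum(h)
    return math.exp(C * T) * (a0 + sum(b))

random.seed(0)
k, C = 50, 2.0
h = [random.uniform(0.01, 0.03) for _ in range(k)]
b = [random.uniform(0.0, 0.05) for _ in range(k)]
a = [1.0]
for j in range(k):
    # run the recursion with equality, i.e. the worst admissible case
    a.append((1 + C * h[j]) * a[j] + b[j])
```

The bound follows from unrolling the recursion and using 1 + x ≤ e^x for each factor.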
So far, we have been working only with the constraint system and not with the objective function. In the next theorem, we prove that an arbitrary bounded sequence of feasible solutions to the discretized problems contains a convergent subsequence. For notational simplicity, any subsequence will be denoted by the same indices as the original sequence. Denote further the composite objective function of the original problem (3) by J(u) and the composite objective function of the discretized problems (4) by J_k(u^k).

Theorem 2.3. Let assumptions (H1)-(H3), (A1)-(A4) and (A6) hold true and let u^k be a sequence of feasible solutions to the discretized problems which is bounded in L^2. Then there exist a subsequence, denoted without relabeling, and some u ∈ L^2 such that u^k ⇀ u in L^2. Moreover, for the corresponding y^k and z^k there are Lipschitz continuous functions y and z such that y^k ⇒ y and z^k ⇒ z. Finally, the triple (u, y, z) is a feasible solution to the original problem (3).
Furthermore, we obtain

J(u) ≤ lim inf_{k→∞} J_k(u^k).  (11)

If u^k converges to u strongly in L^2 and assumption (A5) is valid, then

lim_{k→∞} J_k(u^k) = J(u).  (12)

Proof. Since u^k is bounded in L^2, a weakly convergent subsequence may be chosen. The convergent subsequences of y^k and z^k may be chosen due to their boundedness in W^{1,2} from Lemma 2.2 and the compact embedding of W^{1,2} into C(0, T). As a by-product we obtain the weak convergence of the derivatives ẏ^k ⇀ ẏ and ż^k ⇀ ż in L^2.
The feasibility of (u, y, z) will be proven in several steps. Since the proofs that the triple (u, y, z) satisfies the differential inclusion and the differential equation are not straightforward, they are postponed to the subsequent Lemmas 2.4 and 2.5. Here we prove only that u(t) ∈ Ω. As u^k ⇀ u, due to Mazur's lemma we may choose integers n(k) and convex combinations of {u^i | i = k, . . . , n(k)} to define new functions, say ũ^k, converging to u strongly in L^2. This implies pointwise convergence of a subsequence.
As Ω is convex and does not depend on t, the relation u^k(t) ∈ Ω implies ũ^k(t) ∈ Ω, and the closedness of Ω implies u(t) ∈ Ω.
To show inequality (11), we consider first the last term of the objective function. Due to the uniform convergence of the state variables and the continuity of L_3 from (A6), we obtain L_3(y^k(T), z^k(T)) → L_3(y(T), z(T)). Concerning the first term, define the piecewise constant function L̃^k_1(t) := L_1(t^k_j, y^k_j, z^k_j) for t ∈ [t^k_j, t^k_{j+1}). By virtue of the uniform convergence y^k ⇒ y, z^k ⇒ z and the continuity of L_1, we get for all t ∉ Γ that L̃^k_1(t) → L_1(t, y(t), z(t)). Since y^k and z^k are uniformly bounded in L^∞ due to Lemma 2.2, assumption (A3) implies that the functions L̃^k_1 are uniformly bounded in the same space as well, and hence the dominated convergence theorem may be used. Thus the first part of the objective function converges.
As to the second term, define J_2(u) := ∫_0^T L_2(u(t)) dt. According to [11, Proposition 2.32], assumption (A5) implies that there exists a constant β such that L_2 is Lipschitz with modulus β on Ω. Integrating the corresponding inequality for ũ_1, ũ_2 ∈ Ω and applying the Hölder inequality, we obtain that J_2 : L^2 → R is continuous, which together with the previous parts implies (12).
To prove (11) it is sufficient to show that J 2 is weakly lower semicontinuous. But this follows from the fact that for convex functions the lower semicontinuity and the weak lower semicontinuity coincide.
Functions y and z are Lipschitz continuous because they are uniform limits of sequences of functions with uniform Lipschitz moduli.
To finish the proof of Theorem 2.3 it remains to show that (u, y, z) satisfies the differential inclusion and the differential equation. This will be conducted in the following two lemmas.
Lemma 2.4. Under the assumptions of Theorem 2.3, the pair (y, z) satisfies almost everywhere the differential inclusion −ż(t) + Rẏ(t) ∈ N_{Z(t)}(z(t)).

Proof. This proof is a modification of a similar proof presented in [38]; the first part proceeds in the same way, yet in the second part we depart from the finite-dimensional setting into the infinite-dimensional one and then return back.
Define first

S := {x ∈ L^2 | x(t) ∈ Z(t) for a.e. t ∈ [0, T]}.

Since Z(t) is convex for every t, it is easy to see that S is convex as well. Further, for fixed k define θ_k : R → R as the function which assigns to each t the right endpoint of the discretization interval containing it. Consequently, from (14) and from the equivalence between statements 1 and 2 in Lemma A.3 in the Appendix we obtain (15) for every ξ(t). Since ξ(t) may be chosen in an arbitrary way, we can also restrict our attention to ξ ∈ L^2. From Lemma 2.2 and the imposed assumptions it follows that the relevant quantities are bounded. Integrating (15) over [0, T] and using the Hölder inequality implies the corresponding integral inequality. Passing to the limit, its left-hand side converges due to the weak convergence of the time derivatives in L^2. For the right-hand side we intend to use the Lebesgue dominated convergence theorem; to find the dominating function, we employ the relation preceding (17). Since ξ ∈ L^2, the Lebesgue dominated convergence theorem may be used to obtain (17). Since Z(t) moves in a Lipschitzian way and z^k(θ_k(t)) → z(t) due to z^k ⇒ z, for any fixed t we obtain the pointwise limit of the right-hand side. Putting all the previous results together, we conclude that (18) holds for all ξ ∈ L^2. Next we claim that this relation implies relation (19) in L^2 (compare with the implication 3 ⇒ 1 in Lemma A.3). Assume that this is not the case. Since S is convex, by the definition of the normal cone there exists some x ∈ S such that (20) holds.
As x(t) ∈ Z(t) by x ∈ S, the right-hand side is equal to zero, which contradicts (20), and hence the validity of (19) is proven. But (19) is precisely the almost-everywhere form of the desired differential inclusion.

Lemma 2.5. Under the assumptions of Theorem 2.3, the triple (u, y, z) satisfies almost everywhere the differential equation (2).

Proof. Similarly as in the previous lemma, define for fixed k the function ν_k assigning to each t the left endpoint of the discretization interval containing it. In the last term it is not necessary to insert ν_k, because u^k is piecewise constant. Integrating the previous equation, we obtain an integral identity.

To obtain the desired result we intend to pass to the limit and compute the derivative afterwards. The left-hand side converges to y(t), and the convergence of the control term is evident from the definition of weak convergence.
For the missing term, we use again the Lebesgue dominated convergence theorem. Observe that a large enough constant C can be chosen to dominate the integrand. Passing to the limit, in the last equality we employ the continuity of f on ([0, T] \ Γ) × R^n × R^m and the fact that Γ has zero measure. All in all, we obtain an integral identity which almost everywhere amounts to

ẏ(t) = f(t, y(t), z(t)) + Bu(t)

with y(0) = a.
In Theorem 2.3 we have shown the convergence of a sequence of arbitrary feasible solutions. In the next theorem, we make use of this and prove a similar result for optimal solutions. This theorem is further utilized at the end of the paper, where a numerical solution scheme is presented.
Proof. Since E_2 from (A5) induces an equivalent norm on R^d, we obtain for some C̃ the corresponding estimate. We claim now that ‖ū^k‖_2 is bounded. If this were not the case, then since L_1 and L_3 are bounded below, we would obtain J_k(ū^k) → ∞. But taking any u^k constant and feasible, we obtain a contradiction with the optimality of ū^k.
This implies that the assumptions of Theorem 2.3 are fulfilled, and there exists a subsequence converging to some (ū, ȳ, z̄), which is a feasible solution to the original problem (3). To finish the proof it is necessary to show that (ū, ȳ, z̄) is indeed an optimal solution to (3). Consider an arbitrary feasible solution (u, y, z) of (3). From Lemma A.1 we obtain that u can be approximated by simple functions u^k such that the intervals on which ū^k and u^k are constant coincide. To u^k we find the corresponding y^k and z^k.
Due to Theorem 2.3, we obtain (23), where in the second inequality we used the optimality of ū^k and the property that u^k is constant on the same intervals as ū^k. This implies that ū is indeed an optimal solution to (3). As (23) holds true with lim sup instead of lim inf as well, choosing u = ū implies J_k(ū^k) → J(ū).

3. Basic concepts of variational analysis.
In this section we present the basic notions of variational analysis which are essential for our paper. More information can be found in [33] for the finite-dimensional setting or in [28] and [9] for the infinite-dimensional one. In the first part we define the basic concepts such as the normal cone, subdifferential and coderivative, and in the second part we concentrate on several selected properties of set-valued mappings which will later be used as constraint qualifications. All objects in this section are finite-dimensional.

For a sequence of sets A_n we define the Painlevé-Kuratowski upper limit as

Lim sup_{n→∞} A_n := {x | ∃ x_{n_l} → x with x_{n_l} ∈ A_{n_l}}.

Having this limit construction at hand, we can define the concept of normal cones. There is a variety of possible approaches resulting in different cones. For x̄ ∈ A we define the Fréchet (known also as regular), limiting (Mordukhovich) and Clarke (convexified) normal cones as

N̂_A(x̄) := {v | lim sup_{x →_A x̄} ⟨v, x − x̄⟩/‖x − x̄‖ ≤ 0},
N_A(x̄) := Lim sup_{x →_A x̄} N̂_A(x),
N̄_A(x̄) := cl conv N_A(x̄).

To every type of normal cone corresponds a subdifferential of an extended real-valued function f, defined via the epigraph of f; in the limiting case,

∂f(x̄) := {v | (v, −1) ∈ N_{epi f}(x̄, f(x̄))}.

For f lower semicontinuous we define the singular subdifferential by ∂^∞f(x̄) := {v | (v, 0) ∈ N_{epi f}(x̄, f(x̄))}. Single elements of a subdifferential are called subgradients.
If A happens to be convex, all the above cones coincide and are equal to the normal cone of convex analysis,

N_A(x̄) = {v | ⟨v, x − x̄⟩ ≤ 0 for all x ∈ A}.

If f is multi- or vector-valued, one usually works with the graph of the mapping instead of its epigraph. Hence, for any (x̄, ȳ) ∈ gph M we define the Fréchet (regular) and limiting (Mordukhovich) coderivatives by

D̂*M(x̄, ȳ)(y*) := {x* | (x*, −y*) ∈ N̂_{gph M}(x̄, ȳ)},
D*M(x̄, ȳ)(y*) := {x* | (x*, −y*) ∈ N_{gph M}(x̄, ȳ)}.

If M is single-valued and smooth, then its coderivative amounts to the adjoint Jacobian, D*M(x̄)(y*) = (∇M(x̄))^T y*. Further, we say that M is outer semicontinuous if its graph is closed.
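Since all the cones coincide for convex sets, the normal cone to a box is computable componentwise. The following membership test is a small illustration (a hypothetical helper, not part of the paper's development):

```python
def in_normal_cone_box(v, x, lo, hi, tol=1e-12):
    """Test whether v belongs to the normal cone of convex analysis to the
    box [lo, hi] at x (componentwise): at the lower face only v_i <= 0, at
    the upper face only v_i >= 0, in the interior only v_i = 0."""
    for vi, xi, li, ui in zip(v, x, lo, hi):
        if not (li - tol <= xi <= ui + tol):
            return False                 # cone is empty outside the set
        at_lo = abs(xi - li) <= tol
        at_hi = abs(xi - ui) <= tol
        if at_lo and at_hi:
            continue                     # degenerate interval: whole line
        if at_lo and vi > tol:
            return False
        if at_hi and vi < -tol:
            return False
        if not at_lo and not at_hi and abs(vi) > tol:
            return False
    return True
```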
We briefly note what can be expected from the Fréchet and limiting subdifferentials. Having f = g ∘ F, where g : R^m → R is proper and lower semicontinuous and F : R^n → R^m is locally Lipschitz, we have the following chain of inclusions, in which all but the bottommost one hold automatically, while for the last one a mild constraint qualification is needed:

D̂*F(x̄)(∂̂g(F(x̄))) ⊂ ∂̂f(x̄) ⊂ ∂f(x̄) ⊂ D*F(x̄)(∂g(F(x̄))).

The precise statement of this chain rule can be found in the monographs cited above. Instead of inclusions we would very often like to have equalities. As D̂*F(x̄)(∂̂g(F(x̄))) is the smallest object and D*F(x̄)(∂g(F(x̄))) is the largest one, if they are equal to each other, all the inclusions become equalities. This leads to the concept of regularity. We say that a set S is regular at x̄ ∈ S if N̂_S(x̄) = N_S(x̄). Similarly, a function f is regular at x̄ if its epigraph is regular at (x̄, f(x̄)). Hence, under regularity and an additional assumption in the form of a constraint qualification, equalities in the above calculus rule can be obtained. This section is concluded by some selected properties of multifunctions which will later be used in the constraint qualifications.
A multifunction M has the Aubin (Lipschitz-like or pseudo-Lipschitz) property around (x̄, ȳ) ∈ gph M if there exist a nonnegative modulus L and neighborhoods U of x̄ and V of ȳ such that for all x, x′ ∈ U the following inclusion holds true:

M(x) ∩ V ⊂ M(x′) + L‖x − x′‖ B(0, 1),

where B(0, 1) is the unit ball in the corresponding space. This resembles the local Lipschitzian property, with the difference that the localization is performed not only in the domain but also in the range.
A multifunction M is calm at (x̄, ȳ) ∈ gph M if there exist a nonnegative calmness modulus L and neighborhoods U of x̄ and V of ȳ such that for all x ∈ U the following inclusion holds true:

M(x) ∩ V ⊂ M(x̄) + L‖x − x̄‖ B(0, 1).

As the point x̄ is fixed in this inclusion, calmness is significantly weaker than the Aubin property. For example, any polyhedral multifunction is calm at every point of its graph, while the Aubin property may fail there.

For more information on and properties of the above-mentioned objects, we refer the reader again to [28] and [33].

4. Numerical solution of discretized problems. In Section 2 we performed the convergence analysis. This section is devoted to the numerical solution of the discretized problems. Hence, we fix the index k and the discretization time instants t^k_j and suggest a solution method for the optimization problem (26), which is clearly equivalent to (4). Often the abbreviated notation u = (u_0, . . . , u_{k−1}) = (u^k_0, . . . , u^k_{k−1}) will be used. Even though problem (26) is a finite-dimensional optimization problem and the only nonsmoothness may appear in the objective function, in the projection operator and possibly in f, the number of equality constraints may be large and the computation may not be simple. It is also not entirely clear how to handle the fact that the optimization is performed only over u and not over y and z.

The main idea is to rewrite problem (26) using the solution map S defined by (27), which assigns to a control u the unique pair of discretized states (y, z) generated by the forward recursion described above. If we introduce the function L̃(y_0, . . . , y_k, z_0, . . . , z_k) collecting the state-dependent part of the objective, the discretized problem (4) attains the composite form (28). This is the basis for the so-called implicit programming approach, see [24] or [30].
For the numerical solution of (28), we intend to use a numerical method which employs so-called subgradient information. To do so, we need to be able to evaluate the values of L̃ ∘ S and to compute at least one element of its Clarke subdifferential ∂(L̃ ∘ S). The evaluation of the values is simple but possibly time-consuming for large k: it suffices to evaluate S(u) and then plug the result into L̃. However, the computation of the subdifferential is not so straightforward.
It is not advisable to compute a lower estimate of the Fréchet subdifferential, because this subdifferential may be empty even for Lipschitz continuous functions. As the limiting subdifferential is smaller and enjoys a more developed calculus than the Clarke one, we will work with it. Of course, every limiting subgradient is also a Clarke subgradient.
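A generic method of this type can be sketched as follows. The iteration below assumes an oracle returning the objective value and one subgradient — the role played in the text by a Clarke subgradient of the composite objective — together with a projection onto Ω; the concrete instance (the nonsmooth function and the box) is purely illustrative toy data:

```python
import numpy as np

def projected_subgradient(oracle, project, u0, iters=200):
    """Projected subgradient iteration for min J(u) s.t. u in Omega.
    oracle(u) returns (J(u), one subgradient of J at u); project maps
    onto Omega.  The best iterate is kept, since the objective need not
    decrease monotonically along subgradient steps."""
    u = np.asarray(u0, dtype=float)
    best_u, best_J = u.copy(), oracle(u)[0]
    for i in range(1, iters + 1):
        J, g = oracle(u)
        if J < best_J:
            best_J, best_u = J, u.copy()
        u = project(u - (1.0 / i) * g)   # diminishing step 1/i
    return best_u, best_J

# toy instance: J(u) = |u_1 - 0.3| + |u_2 + 0.2|, Omega = [0, 1]^2
def oracle(u):
    c = np.array([0.3, -0.2])
    return np.abs(u - c).sum(), np.sign(u - c)

project = lambda u: np.clip(u, 0.0, 1.0)
u_star, J_star = projected_subgradient(oracle, project, [1.0, 1.0])
```

Here the constrained minimum is attained at u = (0.3, 0), with value 0.2; the iteration approaches it up to the accuracy dictated by the diminishing steps.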
Since we are able to compute only an upper estimate of the limiting subdifferential, one needs to keep in mind that one could theoretically arrive at an incorrect subgradient. On the other hand, we will give verifiable conditions under which this undesirable behavior is excluded.
We now give additional assumptions strengthening the previous ones:
(A7): f(t_j, ·, ·) is C^1 for every j;
(A8): L_1(t_j, ·, ·) is locally Lipschitz for every j and L_3 is locally Lipschitz;
(A8′): L_1(t_j, ·, ·) is C^1 for every j and L_3 is C^1.
First, we prove that S is single-valued and locally Lipschitz continuous, and then we state a result about the existence of optimal solutions to the discretized problems.

Proof. The single-valuedness of S is evident from (27). To prove the local Lipschitzian property, we make use of the well-known fact that a function is locally Lipschitz on an open set if and only if it is globally Lipschitz on any compact subset of it. Choosing two control variables u and ũ with u_j, ũ_j ∈ Ω ∩ B(0, r), we obtain from Lemma 2.2 that all y_j, ỹ_j, z_j and z̃_j are contained in a compact set. Due to (A7), f(t_j, ·, ·) is globally Lipschitz on this set with, say, constant K_2.
Let us make the following estimates:
From the previous relations we obtain the existence of a constant C, dependent only on K_2, R, B and the fixed time division, which satisfies
Summing these inequalities and chaining them j times, we obtain for some constant C that
The Lipschitzian property of S now follows directly from the previous estimate.
Proof. A sufficient condition for the equivalence of (5) and (6) is the convexity and closedness of Z_j. Due to the assumptions and Lemma 4.1, the objective function of (28) is continuous, coercive and bounded below. Since Ω is closed, the statement follows.
Having proved that S is locally Lipschitz, we compute its coderivative in the following lemma and then discuss the constraint qualification employed there. We will learn later from Lemma 4.4 that this constraint qualification is always satisfied. However, we keep it in the lemma statement and omit it only in the final Theorem 4.6. Before stating the lemma, we need yet another representation of the solution map.
Define the function g:
Then the solution map can be equivalently written as
On the basis of this reformulation we are able to compute D*S, provided the mapping Ξ (cf. Lemma 4.4 below) is calm at (0, ū, ȳ, z̄). Then for μ ∈ R^{(k+1)n}, ν ∈ R^{(k+1)m} and for each element one has the representation
where p_1, . . . , p_k are solutions of the adjoint equations
with j = 1, . . . , k − 1, with the terminal conditions (33), and the multipliers r_j and q_j satisfy for j = 1, . . . , k − 1 the relations
Proof. Since the continuous differentiability of f implies the continuous differentiability of g, [17, Proposition 3.8] together with the imposed constraint qualification implies
N_{gph S}(ū, ȳ, z̄) ⊂ −(∇g(ū, ȳ, z̄))^T N_Q(−g(ū, ȳ, z̄)). (35)
Observe that for p̄ ∈ R^{(k+1)n}, q ∈ R^{(k+1)m} and r ∈ R^{km} the relation (p̄, q, r) ∈ N_Q(−g(ū, ȳ, z̄)) amounts to
with the variables p̄_j and q_0 being unconstrained. From (35) we obtain
As the matrix ∇g(ū, ȳ, z̄) is rather large, we do not write it down. However, it can be computed retrospectively from the following relation:
Now observe that p̄_0 and q_0 each appear in only one equation. Due to this fact, these two equations need not be considered, because a suitable choice of p̄_0 and q_0 makes them fulfilled under any circumstances. The lower two thirds of the system can be rewritten in a compact form as

and the terminal conditions attain the form
To simplify the formulas, it remains to set p = −p̄, and formula (31) follows.
In the next lemma we show that the mapping Ξ even has the Aubin property around any feasible (u, y, z) and is thus also calm at that point.
Proof. The mapping S, defined in (27), is single-valued and locally Lipschitz by Lemma 4.1 and admits the representation (29). Assume for a while that the associated partially linearized mapping ∆ from (38) has the same properties. Since
D*∆(0, ȳ, z̄)(0, 0) = {η* ∈ N_Q(−g(ū, ȳ, z̄)) | [∇_y g(ū, ȳ, z̄)^T  ∇_z g(ū, ȳ, z̄)^T] η* = 0},
the Mordukhovich criterion [33, Theorem 9.40] would imply that
The last condition, however, directly ensures the Aubin property of Ξ around (0, ū, ȳ, z̄) and hence its calmness at this point. Since (ū, ȳ, z̄) was an arbitrary point of gph S, the statement holds provided we verify the required properties of ∆.
Setting η = (ξ, ζ, χ), from (38) we obtain for j = 0, . . . , k − 1
Using again the equivalent representation of the normal cone via the projection operator, this system can be rewritten as
From this representation the single-valuedness of ∆ follows immediately.
To prove the Lipschitz continuity of ∆, choose any η, η̃ and the corresponding variables y, z, ỹ, z̃ and compute
Since ∇_y f_j and ∇_z f_j are fixed matrices, the Lipschitz continuity follows in the same way as in the proof of Lemma 4.1.
Under the surjectivity of ∇g, we obtain equality in the coderivative estimate from Lemma 4.3. As we will see later, this surjectivity is ensured if the matrices B and R have full row rank. Hence, a necessary condition for the fulfillment of this requirement is that B and R have at least as many columns as rows. This implies in particular that the dimension of the control variable u must be greater than or equal to the dimensions of both state variables y and z.
Proof. If ∇g is surjective, we can use [33, Theorem 6.7] to obtain
N_{gph S}(ū, ȳ, z̄) = −(∇g(ū, ȳ, z̄))^T N_Q(−g(ū, ȳ, z̄))
instead of (35). So, it remains to verify that the matrix ∇g computed in the proof of Lemma 4.3 has full row rank. By removing the rows whose only nonzero cell is occupied by an identity matrix, together with the corresponding columns, one easily sees that the full-row-rank requirement on ∇g is equivalent to the following matrix having full row rank:
However, the full row rank of this matrix is equivalent to the full row rank of the matrices B and R.
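The full-row-rank condition on B and R mentioned above is straightforward to verify numerically; a small sketch (the example matrices in the test are illustrative only, not taken from the paper):

```python
import numpy as np

def has_full_row_rank(M, tol=1e-10):
    """A matrix has full row rank iff its numerical rank equals its number
    of rows; the rank is computed from the singular values with tolerance tol."""
    return np.linalg.matrix_rank(np.asarray(M, dtype=float), tol) == np.shape(M)[0]
```

In particular, a matrix with more rows than columns can never pass this test, which reflects the dimension requirement on the control variable u stated above.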
Remark 3. In Lemma 4.3 we have worked with the limiting normal cone. It is not difficult to obtain similar results for the Fréchet normal cone. As in the proof of Lemma 4.5, the only change would concern inclusion (35). This change would be based on [33, Theorem 6.14], which gives the opposite estimate
N̂_{gph S}(ū, ȳ, z̄) ⊃ −(∇g(ū, ȳ, z̄))^T N̂_Q(−g(ū, ȳ, z̄)).
In the lemma statement, two changes would occur: firstly, the upper estimate would be replaced by a lower one and, secondly, relation (36) would become
However, it may happen that the adjoint system (32) has no solution when coupled with (39) instead of (34).
After the preparatory work, we can now provide a workable description of ∂(L•S) in the following form.
Similarly, under assumption (A8') one obtains
Proof. From Lemma 4.1 it follows that S is single-valued and Lipschitz. As the Lipschitz continuity of L_1(t_j, ·, ·) implies that ∂^∞L_1(t_j, ȳ_j, z̄_j) = {0}, the assumptions of [33, Theorem 10.49] are satisfied, which implies
is equivalent to (41)
According to Remark 3 and a comparison of both estimates, we conclude that equality is ensured by the regularity of the set gph N_{Z_j} and of the functions L_1(t_j, ·, ·) and L_3(·, ·) at the points in question.
For the final part of the theorem, note that
∂(L • S)(ū) = ∂S(ū)^T ∇L(S(ū)) = co D*S(ū)(∇L(S(ū))) ⊃ D*S(ū)(∇L(S(ū))),
where the first equality comes from [8, Theorem 2.6.6] and the second from [27, Proposition 2.11]. As the full row ranks of B and R imply, by virtue of Lemma 4.5, that we are able to compute D*S exactly, the proof is finished.
When solving problem (4), or equivalently (28), numerically, we have to be able to compute the function value and one subgradient from ∂(L • S). The function value can easily be computed from (27), and for the computation of a subgradient, Theorem 4.6 can be used. We are able to compute such a subgradient under full row ranks of the matrices B and R or under the regularity assumptions of the previous theorem. We are fully aware that these regularity assumptions cannot simply be checked on the basis of the given data. However, computational experience indicates that the points evaluated during the solution process of (28) are almost exclusively regular, and nonregularity typically occurs only in the last few iterations, without disturbing the convergence.
In the rest of the paper we address two issues. The first is the choice of r_j and q_j satisfying both (34) and (32); the second concerns more precise conditions under which the regularity assumptions needed in the second part of Theorem 4.6 are satisfied.
If Z_j is a polyhedral set, then from the proof of Theorem 2 of [14] we have the following characterization of the Fréchet and limiting normal cones to the graph of the normal cone mapping:
where U is some sufficiently small neighborhood of (z_j, v_j),
is the critical cone to Z_j at z_j with respect to v_j, and T_{Z_j}(z_j) is the (Bouligand) tangent cone to Z_j at z_j. To be able to exploit this result, we will assume that the Z_j are polyhedral sets. The next lemma provides the basis on which an algorithm for choosing the variables r_j and q_j will be derived. Moreover, it identifies a case in which the graph of the normal cone mapping is regular. One part of the proof is a special case of [32, Theorem 3.5]. However, to use this result it would be necessary to define new objects, and thus we decided to provide a more technical but self-contained proof.
Lemma 4.7. Assume that Z_j is a polyhedral set and choose any v̄_j ∈ rint N_{Z_j}(z̄_j). Then the critical cone K(z̄_j, v̄_j) is a linear subspace and gph N_{Z_j} is regular at (z̄_j, v̄_j). Moreover, the cones K(z̄_j, v_j) coincide for all v_j ∈ rint N_{Z_j}(z̄_j).
Proof. Since the critical cone is the intersection of a convex cone and a linear subspace, it is itself a convex cone. To prove that K(z̄_j, v̄_j) is a linear subspace, it suffices to show that s ∈ K(z̄_j, v̄_j) implies −s ∈ K(z̄_j, v̄_j), or equivalently −s ∈ T_{Z_j}(z̄_j). Because Z_j is convex, we have full duality between the normal and tangent cones; hence not only the standard relation N_{Z_j}(z̄_j) = T*_{Z_j}(z̄_j) but also N*_{Z_j}(z̄_j) = T_{Z_j}(z̄_j). For contradiction, assume that −s ∉ T_{Z_j}(z̄_j), which implies that there exists some r ∈ N_{Z_j}(z̄_j) such that ⟨−s, r⟩ > 0.
As s ∈ K(z̄_j, v̄_j), from the definition of the critical cone we know that ⟨s, v̄_j⟩ = 0. Define s̄ := v̄_j + λ(v̄_j − r) with a positive λ. Since v̄_j ∈ rint N_{Z_j}(z̄_j) and r ∈ N_{Z_j}(z̄_j), for small λ > 0 we have s̄ ∈ N_{Z_j}(z̄_j) and
⟨s, s̄⟩ = (1 + λ)⟨s, v̄_j⟩ − λ⟨s, r⟩ = λ⟨−s, r⟩ > 0,
which contradicts s̄ ∈ N_{Z_j}(z̄_j), because s ∈ T_{Z_j}(z̄_j).
To check the regularity of gph N_{Z_j}, we need to prove that
which is, by virtue of (44), equivalent to
where U is some small neighborhood of (z̄_j, v̄_j). We make use of the concept of the normal fan N of a set A, which is defined as the collection of all normal cones N_A(x) over x ∈ A. By [23, Corollary 1] we know that if A is polyhedral, then for any two normal cones N_1, N_2 ∈ N one has
Take any z_j ∈ Z_j sufficiently close to z̄_j. Choosing N_1 = N_{Z_j}(z_j) and N_2 = N_{Z_j}(z̄_j) in (47), we obtain either
or N_{Z_j}(z_j) ⊂ N_{Z_j}(z̄_j), which, due to the polyhedrality of Z_j, implies that
Because v_j → v̄_j ∈ rint N_{Z_j}(z̄_j), the case (48) need not be considered in the computation of the right-hand side of (46). Then (49) implies that in the union on the right-hand side of (46) it suffices to set z_j = z̄_j. To finish the proof, it remains to show that the critical cone does not depend on the choice of v_j. Choose any v_j, ṽ_j ∈ rint N_{Z_j}(z̄_j). We need to prove that s ∈ K(z̄_j, v_j) implies s ∈ K(z̄_j, ṽ_j), or equivalently ⟨s, ṽ_j⟩ = 0. As K(z̄_j, v_j) is a linear subspace by the previous part, we also have −s ∈ K(z̄_j, v_j) ⊂ T_{Z_j}(z̄_j), which implies ⟨−s, ṽ_j⟩ ≤ 0. This, together with ⟨s, ṽ_j⟩ ≤ 0, yields the desired equality ⟨s, ṽ_j⟩ = 0.
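Lemma 4.7 can be illustrated numerically for the simplest polyhedral set Z_j = [0, ∞). The sketch below (all names are ours, not the paper's) samples the critical cone K(z̄_j, v̄_j) = T_{Z_j}(z̄_j) ∩ {s : ⟨s, v̄_j⟩ = 0} on a symmetric grid and checks whether the sample is symmetric under negation, i.e. whether the cone is a linear subspace:

```python
def critical_cone_points(z, v, grid):
    """Grid points of K(z, v) = T_Z(z) ∩ {s : s*v = 0} for Z = [0, inf).
    The tangent cone T_Z(z) is all of R when z > 0 and [0, inf) when z = 0."""
    in_tangent = (lambda s: True) if z > 0 else (lambda s: s >= 0)
    return {s for s in grid if in_tangent(s) and abs(s * v) < 1e-12}

def is_symmetric(points):
    """A cone sampled on a symmetric grid is a linear subspace iff the
    sample is symmetric under negation."""
    return all(-s in points for s in points)

grid = [x / 2.0 for x in range(-10, 11)]        # symmetric grid on [-5, 5]

# v = -1 lies in rint N_Z(0) = (-inf, 0): the critical cone is {0}, a subspace.
K_interior = critical_cone_points(0.0, -1.0, grid)
# v = 0 lies on the boundary of N_Z(0): the critical cone is [0, inf), no subspace.
K_boundary = critical_cone_points(0.0, 0.0, grid)
```

The contrast between the two choices of v shows why the relative interior assumption in the lemma cannot be dropped.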
The previous lemma tells us that for any v̄_j ∈ rint N_{Z_j}(z̄_j) we have
This allows us to find q_j and r_j which satisfy conditions (34), the second adjoint equation (32) and the second terminal condition (33). Indeed, the adjoint and terminal equations may be written in the compact form
for some s_j. Due to Lemma 4.7 we can choose exactly one q_j ∈ K(z̄_j, v̄_j) and exactly one −r_j ∈ −K(z̄_j, v̄_j)* = K(z̄_j, v̄_j)* such that (34) and (50) are satisfied. We may summarize the algorithm for computing an element of the upper estimate of the subdifferential ∂(L • S) as follows:
1. Given ū, compute the values of ȳ and z̄. Set s_k = ν_k, j = k, find any element
and continue to step 2.
2. If z̄_j ∈ int Z_j, then set q_j = s_j. Otherwise, find any v̄_j ∈ rint N_{Z_j}(z̄_j), compute K(z̄_j, v̄_j) and set q_j to be the projection of s_j onto K(z̄_j, v̄_j). In both cases proceed to step 3.

3. Compute
with the convention that p_{k+1} = q_{k+1} = 0. If j = 1, go to step 5; otherwise decrease j by one and proceed to step 4.
4. Find any element (μ_j, ν_j) ∈ h_j ∂L_1(t_j, ȳ_j, z̄_j), set s_j = q_{j+1} + h_j (∇_z f_j)^T p_{j+1} + ν_j and return to step 2.
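Step 2 of the algorithm simplifies considerably for Z_j = [0, ∞): the critical cone K(z̄_j, v̄_j) is the whole line when z̄_j is interior, and equals {0} when z̄_j = 0 and v̄_j ∈ rint N_{Z_j}(0) = (−∞, 0). A hedged one-dimensional sketch of this projection step (the adjoint recursion of steps 3–4 is omitted, since equations (32)–(33) are not reproduced above):

```python
def project_step(s, z, tol=1e-12):
    """Step 2 for Z_j = [0, inf): if z_j > 0 the critical cone is all of R
    and q_j = s_j; if z_j = 0 the critical cone is {0} and so q_j = 0."""
    return s if z > tol else 0.0
```

This also illustrates the rounding-error hazard discussed in Remark 4: the output jumps discontinuously as z crosses the tolerance.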
On the basis of Theorem 4.6 and the subsequent discussion, we can now provide a final statement about the computation of subgradients from ∂(L • S). Since the proof has essentially been completed in the preceding text, we omit it. Moreover, if for all j = 1, . . . , k − 1 the functions L_1(t_j, ·, ·) are regular at (ȳ_j, z̄_j) and L_3(·, ·) is regular at (ȳ_k, z̄_k), then the computed point is actually an element of the subdifferential ∂(L • S)(ū).
Similarly, if (A8') is fulfilled and the matrices B and R have full row rank, then the computed point is an element of ∂(L • S)(ū) without the regularity requirements of the previous part.

Numerical examples.
We have already mentioned in the Introduction that most applications of the investigated model lie in various fields of physics. However, in this section we investigate a different type of application, namely optimization of the efficiency of a service point handling a queue. A similar model was studied in [21] where, however, no optimization was considered.
To solve the respective problem (28), we used two algorithms, both employing the subgradient information computed in Theorem 4.6. The first was the BT (Bundle Trust) algorithm proposed in [34], the other a version of the BFGS (Broyden–Fletcher–Goldfarb–Shanno) algorithm. The BT algorithm was designed especially for nonsmooth optimization, while for BFGS we used its modification from [22], which, under a weakening of the Wolfe condition, works well even for larger nonsmooth optimization problems. While BT exhibited slightly better results for small problems, it often broke down for larger ones, in which case BFGS was used. Furthermore, since BT solves a quadratic optimization problem in every iteration, it was more time consuming. For these reasons we present here only the results computed by the BFGS algorithm. The implementation of BFGS was taken from the Master's thesis [35].
Example 1. Let us consider a service point handling a deterministic queue. Assuming that we know the customer inflow, we look for an optimal rate at which the service point should operate. Both the arrival and service rates depend on time and are considered as continuous-valued processes. This means that even though it is possible to use this model for the behavior of people and round off the optimal values, we prefer to think of it as a model for product processing. Since agreements for such processing are often negotiated beforehand, even the deterministic inflow and outflow make sense in this case. This model can also be applied to the control of a water reservoir, where the water inflow and outflow form continuous-valued processes as well.
We denote by v(t) the known number of customers who entered the system prior to time t, by u(t) the rate at which the service point operates, and by z(t) the queue length. We assume that customers may leave the queue, with the rate of leaving equal to g(z(t)), depending purely on the length of the queue. Further, we assume that the maximum waiting capacity of the service point is unbounded, which leads to the definition Z := [0, ∞).
Under the above assumptions, the service point operates according to the model
Due to the definition of the normal cone mapping, the condition z(t) ∈ Z is always satisfied. Moreover, if z(t) > 0, then the change in the queue length equals the difference between the arrival and departure rates, hence
ż(t) = v̇(t) − g(z(t)) − u(t). (51)
However, if the queue is empty, or equivalently z(t) = 0, then instead of (51) we obtain
ż(t) ≥ v̇(t) − g(z(t)) − u(t).
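A minimal time-stepping sketch of the queue dynamics, assuming the catching-up discretization z_{j+1} = proj_Z(z_j + h(v̇_j − g(z_j) − u_j)) with Z = [0, ∞); the function names are ours:

```python
def simulate_queue(u, vdot, g, h, z0=0.0):
    """Catching-up discretization of the queue model (hedged sketch):
    each step moves z by the net inflow vdot - g(z) - u and then projects
    onto [0, inf), which realizes the normal-cone term when the queue
    would otherwise become negative."""
    z = [z0]
    for j in range(len(u)):
        z.append(max(0.0, z[-1] + h * (vdot[j] - g(z[-1]) - u[j])))
    return z
```

When the service rate dominates the inflow, the projection is active and the queue stays empty, matching the inequality case above.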
As the penalization of large queues we have chosen the function
This choice reflects the fact that if there are more than five customers in the queue, some of them start leaving.
Setting ẏ(t) = v̇(t) − g(z(t)) − u(t), we obtain the constraint system of problem (3):
The function L_1 penalizes the number of customers who are in the queue or have left it prematurely. The definition of L_2 models the situation where only a certain service rate is attainable by simple means, and exceeding this rate requires additional costs. This rate was chosen as 30, which is slightly lower than the average customer arrival rate. The initial conditions naturally read y(0) = z(0) = 0 and the time interval was chosen as [0, 10].
We decided not to penalize short queues and set λ_1 = 0.01. This parameter is not equal to zero because otherwise multiple global optima would arise. To ensure that the queue is cleared at the end of the time interval, we chose λ_3 = 1000. With these two parameters fixed, the problem can be understood as a multiobjective problem in which one minimizes a tradeoff between the running costs and the queue length. The remaining two parameters were set to λ_2 = 50 and λ_4 = 0.2. Figure 1 summarizes the results. In the graph on the left-hand side, the full line represents the arrival rate v̇(t) and the dotted line the optimal service rate u(t). For the sake of completeness, we give the formula for the arrival rate:
We see that during the customer arrival peak, the optimal service rate does not cover all arrivals and the queue begins to grow. As soon as the arrival rate decreases below 30, the maximum rate at which the service is not further penalized, the service rate exceeds the arrival rate and the queue starts to shrink until it is cleared completely.
In the graph on the right-hand side of Figure 1, the evolution of the queue length is shown. We see that it stays below or only slightly above the penalization threshold, which is in accordance with expectations.
Remark 4. When testing the convergence, we created numerous academic test problems. In some of them we achieved good convergence, some caused minor faults, and some even caused big difficulties or breakdowns during the optimization. In this short remark we list several possible reasons for this unpleasant behavior:
1. Rounding errors resulting in a wrong subgradient. Even though S is Lipschitz, the algorithm presented after Lemma 4.7 still depends heavily on the non-Lipschitzian behavior of the normal cone mapping. If z_{j+1} is close to the boundary of Z_{j+1}, the algorithm may fail to compute q_j correctly.
Moreover, if z_j + Ry_{j+1} − Ry_j is close to zero, then the current iterate may be a nonregular point, and either the computed subgradient need not lie in the Clarke subdifferential or, if it does, its opposite direction need not be a descent direction, as described in [2].
2. Error accumulation due to iterative computation in time-dependent problems.
3. Ill-conditioned approximation of the inverse Hessian matrix. During the optimization we encountered cases in which the condition number of the approximation was of order 10^13. As most scripting languages work by default with only 16 significant digits, this left a very small margin for error.
4. Ill-posedness of the data, which often happens when the penalization of the control variable is too low compared to the penalization of the state variables. This occurs especially when the time mesh is too fine because, in this case, large changes of the control variable cause only small changes of the state variables.
Appendix A. Auxiliary lemmas. The first lemma is a slight modification of the well-known fact that simple functions are dense in L p for any p ∈ [1, ∞).
Lemma A.1. Let u ∈ L^p([0, T]) for some p ∈ [1, ∞) and let {0 = t^k_0 ≤ t^k_1 ≤ · · · ≤ t^k_k = T} be a collection of divisions of the interval [0, T] whose norms converge to zero. Then there exists a sequence {u_k} of simple functions, constant on the intervals [t^k_j, t^k_{j+1}) for every j = 0, . . . , k − 1, which converges to u in L^p.
Proof. Since continuous functions with compact support are dense in L^p, we can construct a sequence of continuous functions y_k such that y_k → u in L^p. Now define the function u_k by
for t ∈ [t^k_j, t^k_{j+1}). Since any continuous function on a compact set is uniformly continuous by the Heine–Cantor theorem, we immediately obtain that ‖u_k − y_k‖_∞ → 0, which yields the statement of the lemma.
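The construction in the proof is easy to test numerically: approximate a continuous u by the simple function taking the value u(t_j) on each cell of a uniform division and measure the L^p error (a sketch with quadrature on a fine grid; the names are ours):

```python
import numpy as np

def lp_error_of_piecewise_constant(u, k, T=1.0, p=2):
    """L^p([0, T]) distance between u and the simple function that equals
    u(t_j) on each cell [t_j, t_{j+1}) of the uniform division with k cells,
    approximated by quadrature on a fine grid."""
    n = 20000                                  # fine quadrature grid
    t = np.linspace(0.0, T, n, endpoint=False)
    left = np.floor(t * k / T) * (T / k)       # left endpoint of the cell of t
    return (np.mean(np.abs(u(t) - u(left)) ** p) * T) ** (1.0 / p)
```

For a continuous u the error tends to zero as the division is refined, in line with Lemma A.1.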
The following lemma is a discrete version of Gronwall's lemma. We have taken the full version from the original source [16] and simplified it to the form needed here, stated in the corollary below. Note that θ = 0 corresponds to the forward Euler discretization scheme and θ = 1 to the backward one.

Corollary 1. Suppose that for j = 0, . . . , k − 1 one has
(a_{j+1} − a_j)/h_j ≤ g_j + λ a_j
with λ > 0, h_j > 0 and Σ_{j=0}^{k−1} h_j = T. Then there exist constants C_1 and C_2, depending only on λ, a_0 and T, such that for all j = 1, . . . , k the following estimate holds true:
Proof. The imposed assumptions correspond to Lemma A.2 with θ = 0 and λ_j = λ. The estimate then takes the form
a_j ≤ a_0 Π_{n=0}^{j−1}(1 + λh_n) + Σ_{n=0}^{j−1} h_n g_n Π_{m=n+1}^{j−1}(1 + λh_m).
Plugging this into (53), we obtain
a_j ≤ a_0 e^{λT} + Σ_{n=0}^{j−1} h_n g_n e^{λT},
which was to be proven.
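The estimate of Corollary 1 can be checked numerically by running the recursion with equality and comparing against the bound (a_0 + Σ h_n g_n) e^{λT}, our simplification of the constants C_1 and C_2:

```python
import numpy as np

def gronwall_bound_holds(a0, g, h, lam):
    """Numerically check the discrete Gronwall estimate of Corollary 1:
    generate a_j from (a_{j+1} - a_j)/h_j = g_j + lam * a_j (the equality
    case of the assumption) and verify a_j <= (a_0 + sum h_n g_n) e^{lam T}."""
    T = sum(h)
    a = [a0]
    for j in range(len(h)):
        a.append(a[-1] + h[j] * (g[j] + lam * a[-1]))   # equality case
    bound = (a0 + sum(hn * gn for hn, gn in zip(h, g))) * np.exp(lam * T)
    return all(aj <= bound + 1e-12 for aj in a)
```

The equality case is the worst case of the assumption, so the check exercises the sharpest instance of the estimate.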