Differentiability in perturbation parameter of measure solutions to perturbed transport equation

We consider a linear perturbation in the velocity field of the transport equation. We investigate solutions in the space of bounded Radon measures and show that they are differentiable with respect to the perturbation parameter in a suitable Banach space, which is a predual of the Hölder space $\mathcal{C}^{1+\alpha}(\mathbb{R}^d)$. This differentiability result is needed for applications in optimal control theory, which we also discuss.


INTRODUCTION
Analysis of perturbations in systems of partial differential equations is an important issue. Structured population models ([GLM10], [GM10], [CCGU12], [CGR18]) and vehicular traffic flow models ([EHM16]) have been investigated with respect to Lipschitz dependence on initial conditions in the space of measures. However, differentiability, and not only Lipschitz dependence, is necessary for applications in optimal control theory or in linearised stability. Previous work on the transport equation in the space of measures did not allow one to analyse the differentiability of solutions with respect to a perturbation of the system ([PF14]).
In this paper we consider solutions to a perturbed transport equation in the space of bounded Radon measures, denoted by $\mathcal{M}(\mathbb{R}^d)$, where the perturbation is linear in the velocity field.
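Consider the initial value problem for the transport equation in conservative form
$$\partial_t \mu_t + \mathrm{div}_x\big(b(t,x)\,\mu_t\big) = w(t,x)\,\mu_t \quad \text{in } \big(C^1_c([0,\infty)\times\mathbb{R}^d)\big)^*, \qquad \mu_{t=0} = \mu_0 \in \mathcal{P}(\mathbb{R}^d), \qquad (1.1)$$
where the velocity field satisfies $(t \mapsto b(t,\cdot)) \in C^0([0,+\infty); C^{1+\alpha}(\mathbb{R}^d))$, the initial condition $\mu_0$ is a probability measure and $w \in C^{1+\alpha}([0,\infty)\times\mathbb{R}^d)$. By $(\cdot)^*$ we denote the topological dual of $(\cdot)$, when the latter is equipped with a suitable locally convex or norm topology. $C^1_c$ is the space of continuously differentiable functions with compact support, and $C^{1+\alpha}$ is the space of functions whose first-order partial derivatives are Hölder continuous with exponent $\alpha$, where $0 < \alpha \le 1$.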
Existence and uniqueness of solutions to equation (1.1) were proved in [Man07]; see Lemma 2.1. The solution $t \mapsto \mu_t$, $[0,\infty) \to \mathcal{P}(\mathbb{R}^d)$, is a narrowly continuous curve (by [Man07], Lemma 3.2). Recall that a mapping $[0,\infty) \ni t \mapsto \mu_t \in \mathcal{P}(\mathbb{R}^d)$ is narrowly continuous if $t \mapsto \int_{\mathbb{R}^d} \eta\, d\mu_t$ is continuous for every continuous and bounded function $\eta$ on $\mathbb{R}^d$. We start by defining a weak solution to equation (1.1).
We introduce a perturbation of the velocity field $b$, linear in a parameter $h \in \mathbb{R}$ close to 0, as follows:
The perturbed problem corresponding to (1.1) is denoted by (1.4). It has the same form as (1.1), with $b$ replaced by the perturbed velocity field. Notice that the initial conditions in (1.1) and (1.4) are the same ($\mu_{t=0} = \mu^h_{t=0} = \mu_0$). For the purpose of further considerations we may assume, without loss of generality, that $h \in [-\tfrac12, \tfrac12]$. Before stating the main result we need to define an appropriate Banach space. First recall that the Hölder space $C^{1+\alpha}(\mathbb{R}^d)$ is a Banach space with the norm $\|\cdot\|_{C^{1+\alpha}(\mathbb{R}^d)}$. The space of Radon measures $\mathcal{M}(\mathbb{R}^d)$ inherits the dual norm of $(C^{1+\alpha}(\mathbb{R}^d))^*$ through the embedding of the former into the latter, where a measure is identified with the functional defined by integration against it. Throughout we identify $\mathcal{M}(\mathbb{R}^d)$ with a subspace of $(C^{1+\alpha}(\mathbb{R}^d))^*$. Let then
$$Z := \overline{\mathcal{M}(\mathbb{R}^d)}^{\,\|\cdot\|_{(C^{1+\alpha}(\mathbb{R}^d))^*}}, \qquad (1.6)$$
which is a Banach space equipped with the dual norm $\|\cdot\|_{(C^{1+\alpha}(\mathbb{R}^d))^*}$.
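A short worked example shows why the closure of the measures in the weaker norm $(C^{1+\alpha})^*$ is the natural setting for differentiating in a parameter, rather than $\mathcal{M}$ with the total variation norm. This is a sketch assuming $d = 1$ and that $\|f\|_{C^{1+\alpha}}$ dominates the $\alpha$-Hölder seminorm $[f']_\alpha$ of $f'$, as it does for the usual choice of norm. The difference quotients $q_h$ of a Dirac measure in its position are our illustrative notation.

```latex
\begin{aligned}
&q_h := \frac{\delta_{x+h} - \delta_x}{h} \in \mathcal{M}(\mathbb{R}), \qquad
  \|q_h\|_{TV} = \frac{2}{|h|} \xrightarrow[h\to 0]{} \infty,\\
&\langle f, q_h - q_k\rangle
  = \frac{f(x+h)-f(x)}{h} - \frac{f(x+k)-f(x)}{k}
  = \int_0^1 \big(f'(x+sh) - f'(x+sk)\big)\,ds,\\
&|\langle f, q_h - q_k\rangle|
  \le [f']_\alpha \int_0^1 (s|h-k|)^\alpha\,ds = \frac{[f']_\alpha}{1+\alpha}\,|h-k|^\alpha,\\
&\|q_h - q_k\|_{(C^{1+\alpha}(\mathbb{R}))^*}
  = \sup_{\|f\|_{C^{1+\alpha}} \le 1} |\langle f, q_h - q_k\rangle|
  \le \frac{|h-k|^\alpha}{1+\alpha}.
\end{aligned}
```

Thus $(q_h)$ is a Cauchy family in $(C^{1+\alpha}(\mathbb{R}))^*$ as $h \to 0$, although it is unbounded in total variation. Its limit is the functional $f \mapsto f'(x)$. This functional belongs to the closed space $Z$ but is not a measure, because it is not bounded with respect to the supremum norm.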
We show in Proposition 5.3 that the space $Z$ so defined is a predual of $C^{1+\alpha}(\mathbb{R}^d)$: $Z^*$ is linearly isomorphic to $C^{1+\alpha}(\mathbb{R}^d)$.
The following theorem is the main result of this paper.
This result is required for various applications. The one we discuss here is the application to optimal control theory.
As additional results, Section 5 presents further characterizations of the Banach space $Z$. First, $Z$ is separable: the rational span of Dirac measures at rational points is a countable dense subset. Moreover, $Z^*$ is linearly isomorphic to $C^{1+\alpha}(\mathbb{R}^d)$.
The outline of the paper is as follows. Section 2 provides the necessary background in functional analysis. Theorem 1.1 is proved in Section 3. In Section 4 we discuss possible applications of the result. The characterization of the space $Z$ is presented in Section 5.
A solution $X_b$ to (2.7), i.e. to $\partial_t X_b(t,y) = b(t, X_b(t,y))$ with $X_b(0,y) = y$, is called a flow map. Note that flow maps are defined for all $t \in \mathbb{R}$, and thus $y \mapsto X_b(t,y)$ is a family of diffeomorphisms of $\mathbb{R}^d$ (depending on $b$).
Remark. The requirement $(t \mapsto b(t,\cdot)) \in C^0([0,+\infty); C^1(\mathbb{R}^d))$ is sufficient to conclude that $y \mapsto X_b(t,y)$ is a diffeomorphism. Higher regularity is needed when we estimate the remainder terms of a Taylor expansion in the final proof of Theorem 1.1 (see e.g. equation (3.10)).
Now we define the push-forward operator [AGS08]. If $Y_1, Y_2$ are separable metric spaces, $\mu \in \mathcal{P}(Y_1)$ and $r: Y_1 \to Y_2$ is a $\mu$-measurable map, we denote by $\mu \mapsto r_\#\mu \in \mathcal{P}(Y_2)$ the push-forward of $\mu$ through $r$, defined by $r_\#\mu(B) := \mu(r^{-1}(B))$ for every Borel set $B \subset Y_2$. The following lemma guarantees that a weak solution $\mu_t$ is a probability measure.
Lemma 2.1 (A representation formula for the non-homogeneous continuity equation [Man07]).
Remark. Since in our case $(t \mapsto b(t,\cdot)) \in C^0([0,+\infty); C^{1+\alpha}(\mathbb{R}^d))$, $b$ is globally Lipschitz in $x$, and thus the flow map $X_b$ is global. Also $w$ satisfies the assumptions of Lemma 2.1. Thus we conclude that (1.1) has a unique weak solution $t \mapsto \mu_t$, defined for all $t$.
In fact, the representation formula can be generalized to the case where $\mu_0$ is a non-negative measure, $\mu_0 \in \mathcal{M}^+(\mathbb{R}^d)$. Knowing that, and using the linearity of the equation, we can also consider non-positive measures as initial conditions.
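The push-forward and the representation formula can be illustrated numerically for a discrete initial measure $\mu_0 = \sum_i a_i \delta_{y_i}$ in $d = 1$. The Python sketch below makes several assumptions. First, it assumes that the representation formula of Lemma 2.1 has the form $\mu_t = X_b(t,\cdot)_\#\big(e^{\int_0^t w(s, X_b(s,\cdot))\,ds}\,\mu_0\big)$, which is consistent with the exponential factors used in Section 3. Second, the fields $b \equiv 1$ and $w(t,x) = \cos x$ and all function names are hypothetical choices for illustration. The flow map and the exponent are integrated together with the classical Runge-Kutta method (RK4).

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Tuple

Field = Callable[[float, float], float]   # (t, x) -> value; here d = 1
Measure = Dict[float, float]              # discrete measure {atom y_i: weight a_i}


def push_forward(mu: Measure, r: Callable[[float], float]) -> Measure:
    """r#mu(B) = mu(r^{-1}(B)): each atom y moves to r(y); atoms with equal image merge."""
    out: Dict[float, float] = defaultdict(float)
    for y, a in mu.items():
        out[r(y)] += a
    return dict(out)


def integrate(eta: Callable[[float], float], mu: Measure) -> float:
    """Integral of eta against a discrete measure."""
    return sum(a * eta(y) for y, a in mu.items())


def flow_and_weight(b: Field, w: Field, y: float, t: float, n: int = 1000) -> Tuple[float, float]:
    """Classical RK4 for X' = b(s, X), L' = w(s, X), X(0) = y, L(0) = 0.
    Returns X_b(t, y) and exp(L(t)) = exp(int_0^t w(s, X_b(s, y)) ds)."""
    dt, x, L, s = t / n, y, 0.0, 0.0
    for _ in range(n):
        k1, l1 = b(s, x), w(s, x)
        x2 = x + dt / 2 * k1
        k2, l2 = b(s + dt / 2, x2), w(s + dt / 2, x2)
        x3 = x + dt / 2 * k2
        k3, l3 = b(s + dt / 2, x3), w(s + dt / 2, x3)
        x4 = x + dt * k3
        k4, l4 = b(s + dt, x4), w(s + dt, x4)
        x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        L += dt / 6 * (l1 + 2 * l2 + 2 * l3 + l4)
        s += dt
    return x, math.exp(L)


def solve(b: Field, w: Field, mu0: Measure, t: float) -> Measure:
    """Assumed representation: mu_t = X_b(t, .)#( exp(int_0^t w(s, X_b(s, .)) ds) * mu_0 )."""
    image: Dict[float, float] = {}
    reweighted: Measure = {}
    for y, a in mu0.items():
        x_t, m = flow_and_weight(b, w, y, t)
        image[y] = x_t
        reweighted[y] = a * m                              # weight mu_0 by the exponential factor
    return push_forward(reweighted, lambda y: image[y])    # then transport the atoms


b = lambda t, x: 1.0                 # hypothetical velocity field
w = lambda t, x: math.cos(x)         # hypothetical source rate
mu0 = {0.0: 0.5, 1.0: 0.5}
mu_t = solve(b, w, mu0, 2.0)
# Closed form for this choice: X_b(t, y) = y + t, weight exp(sin(y + t) - sin(y)).
print(mu_t)
```

The constant field $b$ is chosen only because it gives a closed-form flow to compare against. Any field with the regularity required in (1.1) can be passed to `solve`.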
3. PROOF OF THE MAIN RESULT (THEOREM 1.1)
Recall that the space $Z$ defined in (1.6) is a subspace of $(C^{1+\alpha}(\mathbb{R}^d))^*$ and inherits its norm. Since $Z$ is complete, it suffices to show that a suitable sequence of difference quotients is a Cauchy sequence.
The analogue of (2.7) for the perturbed equation (1.4) is the flow equation with $b$ replaced by the perturbed velocity field; we denote its solution by $X_h$. As before, $y \mapsto X_h(t,y)$ is a diffeomorphism. To emphasize the dependence of $X_h(t,y)$ on the parameter $h$, from now on we use the notation $X(t,y;h) := X_h(t,y)$.
See the Appendix for the proof. We are now in a position to prove the main result.
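First, we show differentiability at $h = 0$. Notice that for every $\lambda \in \mathbb{R}$:
For the first part it suffices to show that
$$I_{h_1,h_2} := \left\| \frac{\mu^{h_1}_t - \mu_t}{h_1} - \frac{\mu^{h_2}_t - \mu_t}{h_2} \right\|_{(C^{1+\alpha}(\mathbb{R}^d))^*}$$
can be made arbitrarily small when $h_1$ and $h_2$ are sufficiently close to 0. Then for any sequence $h_n \to 0$, the sequence $\big((\mu^{h_n}_t - \mu_t)/h_n\big)_{n\in\mathbb{N}}$ is a Cauchy sequence in $(C^{1+\alpha}(\mathbb{R}^d))^*$. Hence it converges to a limit, which is the same for every sequence $(h_n)$ with $h_n \to 0$.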
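A numerical sanity check of this Cauchy property can be sketched in Python for a hypothetical one-dimensional example. In it, $b(t,x) = \sin x$ is perturbed to $\sin x + h$ (the constant perturbation direction $1$ is our assumption), $w = 0$, $\mu_0$ is a discrete measure, and the pairing uses the single test function $K = \cos$. Pairing with one test function only probes one direction of the dual norm, so this illustrates the smallness of the difference between difference quotients rather than verifying it.

```python
import math


def flow(y: float, t: float, h: float, n: int = 2000) -> float:
    """Classical RK4 for X' = sin(X) + h, X(0) = y (hypothetical b(t, x) = sin x,
    perturbed by h times the constant direction 1); returns X(t, y; h)."""
    f = lambda x: math.sin(x) + h
    dt, x = t / n, y
    for _ in range(n):
        k1 = f(x)
        k2 = f(x + dt / 2 * k1)
        k3 = f(x + dt / 2 * k2)
        k4 = f(x + dt * k3)
        x += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x


def pairing(K, mu0, t: float, h: float) -> float:
    """<K, mu^h_t> = sum_i a_i K(X(t, y_i; h)); with w = 0 the weights stay constant."""
    return sum(a * K(flow(y, t, h)) for a, y in mu0)


mu0 = [(0.5, -1.0), (0.3, 0.5), (0.2, 2.0)]   # mu_0 = sum_i a_i delta_{y_i}
t = 1.0
P0 = pairing(math.cos, mu0, t, 0.0)
hs = [10.0 ** (-k) for k in range(1, 6)]       # h_n = 10^{-n}
quotients = [(pairing(math.cos, mu0, t, h) - P0) / h for h in hs]
gaps = [abs(q1 - q2) for q1, q2 in zip(quotients, quotients[1:])]
for h, g in zip(hs[1:], gaps):
    print(f"h = {h:.0e}: |q(previous h) - q(h)| = {g:.2e}")
```

The gaps shrink roughly in proportion to $h$, as expected for a parameter dependence that is smooth in $h$.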
First we use the representation formula (Lemma 2.1) and the fact that $y \mapsto X(t,y;h)$ is a diffeomorphism. For convenience, introduce $w(s,y;h) := w(s, X(s,y;h))$.
In $I^{(2)}_{h_1,h_2}$ we expand $e^{\int_0^t w(s,y;h_1)\,ds}$ and $e^{\int_0^t w(s,y;h_2)\,ds}$ into Taylor series around $h = 0$.
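The elementary mechanism behind this expansion can be written out as follows. This is a sketch under stated assumptions, not the paper's full estimate. For fixed $t$ and $y$ we write $a(h) := \int_0^t w(s,y;h)\,ds$ (our notation) and assume that $h \mapsto a(h)$ is twice continuously differentiable on $[-\tfrac12,\tfrac12]$. This is the kind of place where the extra regularity mentioned in the Remark of Section 2 enters. Taylor's theorem with Lagrange remainder then gives:

```latex
\begin{aligned}
&\frac{d^2}{dh^2}\,e^{a(h)} = \big(a''(h) + a'(h)^2\big)\,e^{a(h)} =: R(h),\\
&e^{a(h)} = e^{a(0)} + h\,a'(0)\,e^{a(0)} + \tfrac{h^2}{2}\,R(\xi_h)
  \quad \text{for some } \xi_h \text{ between } 0 \text{ and } h,\\
&\left|\frac{e^{a(h_1)} - e^{a(0)}}{h_1} - \frac{e^{a(h_2)} - e^{a(0)}}{h_2}\right|
  = \left|\tfrac{h_1}{2}R(\xi_{h_1}) - \tfrac{h_2}{2}R(\xi_{h_2})\right|
  \le \frac{|h_1| + |h_2|}{2}\,\sup_{|\xi|\le 1/2}|R(\xi)|.
\end{aligned}
```

If the supremum is bounded uniformly in $y$, integrating against $\mu_0$ and a test function yields a bound of order $|h_1| + |h_2|$ for the corresponding part of the difference of quotients.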

To summarize the estimates of $|I^{(1)}_{h_1,h_2}|$: the quantity bounded in Lemma 3.1 is finite. Thus $I_{h_1,h_2}$ can be made arbitrarily small when $h_1$ and $h_2$ are sufficiently close to 0. Therefore we have shown that $\big((\mu^{\lambda_n}_t - \mu_t)/\lambda_n\big)_{n\in\mathbb{N}}$ is a Cauchy sequence in $(C^{1+\alpha}(\mathbb{R}^d))^*$ for every sequence $\lambda_n \to 0$, with the same limit. Hence $\mu^h_t$ is differentiable with respect to the parameter $h$ at $h = 0$.
The same argument works for $h \neq 0$. Let us consider a sequence

APPLICATION TO OPTIMAL CONTROL
The results discussed above can be applied in optimal control theory. Differentiability of solutions with respect to parameters is necessary for the method of steepest descent and for other gradient-based optimization methods (such as Newton-type methods).
In control theory, the control is based on observing the state of the system at each point or at finitely many points. Thus, a reasonable class of differentiable observation functions $\varphi$ is given by compositions of a continuous linear functional on $Z$ with a function $f \in C^1(\mathbb{R},\mathbb{R})$. In Proposition 5.3 we show that every continuous linear functional on $Z$ is essentially represented by integration against a $C^{1+\alpha}(\mathbb{R}^d)$-function; we denote it here by $K$.
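Thus, aiming at optimal control of the solution to (1.4), where $h$ is a control parameter taking values in $\mathbb{R}$, we start by considering functionals of the form
$$\mu \mapsto \gamma\Big(\int_{\mathbb{R}^d} K(x)\,d\mu(x)\Big), \qquad (4.11)$$
where $\gamma$ is a $C^1$-function and $K \in C^{1+\alpha}(\mathbb{R}^d)$.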
The point is the following: the integral $\int_{\mathbb{R}^d} K(x)\,d\mu(x)$ is well defined when $\mu$ is a measure, but not every element of the space $Z$ is necessarily a measure. The following lemma provides an extension of the domain to the whole space $Z$.
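Lemma 4.1 (Extension Theorem) [AE08, Theorem 2.1]. Suppose $X$ and $Y$ are metric spaces and $Y$ is complete. Suppose further that $X_1$ is a dense subset of $X$ and that $f: X_1 \to Y$ is uniformly continuous. Then $f$ has a uniquely determined extension $\bar f: X \to Y$ given by $\bar f(x) := \lim_{n\to\infty} f(x_n)$, where $(x_n) \subset X_1$ is any sequence with $x_n \to x$, and $\bar f$ is also uniformly continuous.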
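In our case the operator $\mu \mapsto \int_{\mathbb{R}^d} K(x)\,d\mu(x)$ is well defined for every $\mu \in \mathcal{M}(\mathbb{R}^d)$. Since $\big|\int_{\mathbb{R}^d} K\,d\mu\big| \le \|K\|_{C^{1+\alpha}(\mathbb{R}^d)}\,\|\mu\|_{(C^{1+\alpha}(\mathbb{R}^d))^*}$, the operator is Lipschitz and hence uniformly continuous, so it can be uniquely extended to $Z = \overline{\mathcal{M}(\mathbb{R}^d)}$ ($\mathrm{span}\{\delta_x : x \in \mathbb{R}^d\}$ is a dense subset of $Z$, Proposition 5.1). Denote this uniquely determined extension by $\mu \mapsto \langle K, \mu\rangle_{C^{1+\alpha}(\mathbb{R}^d),Z}$, where $\langle\cdot,\cdot\rangle$ is the dual pairing. Thus the functional corresponding to (4.11) has the form $\mu \mapsto \gamma\big(\langle K, \mu\rangle_{C^{1+\alpha}(\mathbb{R}^d),Z}\big)$, which we abbreviate as $\gamma(\mu)$. Now consider the problem of minimizing $h \mapsto \gamma(\mu^h)$. That is, we wish to find $h^* \in \mathbb{R}$ such that $\gamma(\mu^{h^*}) \le \gamma(\mu^h)$ for all $h \in \mathbb{R}$.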
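A necessary condition for $h^*$ to realize a minimum is that the derivative of $h \mapsto \gamma(\mu^h)$ vanishes at $h^*$:
$$\frac{d}{dh}\,\gamma(\mu^h)\Big|_{h=h^*} = 0. \qquad (4.13)$$
For this condition to make sense, the functional $\mu \mapsto \gamma\big(\langle K,\mu\rangle_{C^{1+\alpha}(\mathbb{R}^d),Z}\big)$ should belong to $C^1(Z,\mathbb{R})$. This is guaranteed by the following lemma. Combined with our main result, Theorem 1.1, the lemma yields differentiability of $h \mapsto \gamma(\mu^h)$.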
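Proof. We want to show that if $K \in C^{1+\alpha}(\mathbb{R}^d)$, then the functional $\mu \mapsto \langle K, \mu\rangle_{C^{1+\alpha}(\mathbb{R}^d),Z}$ is linear and bounded on $Z$. Then $\mu \mapsto \gamma\big(\langle K, \mu\rangle_{C^{1+\alpha}(\mathbb{R}^d),Z}\big)$ belongs to $C^1(Z,\mathbb{R})$, as the composition of a $C^1$-function with a bounded linear functional.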
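The chain rule behind this argument can be written out explicitly. In the sketch below, $\ell_K$ is our shorthand for the extended pairing, and $\partial_h \mu^h_t \in Z$ is our notation for the derivative whose existence Theorem 1.1 asserts. The bound on $\ell_K$ uses that $Z$ carries the norm of $(C^{1+\alpha}(\mathbb{R}^d))^*$.

```latex
\begin{aligned}
&\ell_K(\mu) := \langle K,\mu\rangle_{C^{1+\alpha}(\mathbb{R}^d),Z}, \qquad
  |\ell_K(\mu)| \le \|K\|_{C^{1+\alpha}(\mathbb{R}^d)}\,\|\mu\|_Z,\\
&D(\gamma\circ\ell_K)(\mu)[\nu] = \gamma'\big(\ell_K(\mu)\big)\,\ell_K(\nu), \qquad \mu,\nu \in Z,\\
&\frac{d}{dh}\,\gamma\big(\langle K,\mu^h_t\rangle\big)
  = \gamma'\big(\langle K,\mu^h_t\rangle\big)\,\big\langle K, \partial_h\mu^h_t\big\rangle .
\end{aligned}
```

The derivative $\mu \mapsto \gamma'(\ell_K(\mu))\,\ell_K$ depends continuously on $\mu$ because $\gamma'$ and $\ell_K$ are continuous, which is the $C^1$ property used above.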
Of course, to investigate condition (4.13) we also need differentiability of $h \mapsto \mu^h$ with respect to the parameter $h$. This is provided by Theorem 1.1.
Remark. If the function $h \mapsto \gamma(\mu^h)$ is convex, then condition (4.13) is not only necessary but also sufficient for $h^*$ to realize a minimum.
Of course, there are many optimization methods for non-convex functions (such as steepest descent) that do not rely on computing the derivative analytically and setting it to zero. The important point is that the derivative exists. The step of each iteration depends on the gradient (or on an approximation of it), and the existence of the derivative is what makes us confident that such a method converges in reasonable time.
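A minimal steepest-descent sketch for a cost of the form $h \mapsto \gamma(\mu^h)$ can be built on a hypothetical toy instance. Take $d = 1$, $b = 0$, a constant perturbation direction $1$ (so $X(t,y;h) = y + ht$), $w = 0$, $\mu_0 = \delta_0$, observation time $T = 2$, $K(x) = -e^{-(x-1)^2}$ and $\gamma$ the identity. Then $\mu^h_T = \delta_{2h}$, and the cost $h \mapsto K(2h)$ is minimized at $h^* = 1/2$. The gradient is approximated by central finite differences, in the spirit of the "approximate value of gradient" mentioned above. All names and parameter values are illustrative assumptions.

```python
import math
from typing import Callable

T = 2.0  # hypothetical final time at which the state is observed


def J(h: float) -> float:
    """Toy cost h -> gamma(<K, mu^h_T>) with b = 0, perturbation direction 1, w = 0,
    mu_0 = delta_0, so that mu^h_T = delta_{h*T}; K(x) = -exp(-(x - 1)^2), gamma = id."""
    return -math.exp(-(h * T - 1.0) ** 2)


def grad(f: Callable[[float], float], h: float, eps: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(h)."""
    return (f(h + eps) - f(h - eps)) / (2.0 * eps)


def steepest_descent(f: Callable[[float], float], h0: float, step: float = 0.1,
                     tol: float = 1e-8, max_iter: int = 1000) -> float:
    """Fixed-step steepest descent h <- h - step * f'(h) until the gradient is small."""
    h = h0
    for _ in range(max_iter):
        g = grad(f, h)
        if abs(g) < tol:
            break
        h -= step * g
    return h


h_star = steepest_descent(J, 0.0)
print(h_star)   # close to the exact minimiser h* = 1/T = 0.5
```

The fixed step $0.1$ is below $2/J''(h^*) = 2/(2T^2) = 1/4$ for this toy cost, which is why the iteration contracts near the minimiser. For other choices of $K$ and $\gamma$ the step would have to be adapted.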

Further application.
In [CGR18] the authors consider optimization in the structured population model defined by (4.14), where $t \in [0,\infty)$ and $x \in \mathbb{R}_+$ is a biological parameter, typically age or size. The unknown $\mu_t$ is a time-dependent, non-negative and finite Radon measure. The growth function $b$ and the mortality rate $w$ are strictly positive, while the birth function $\beta$ is non-negative; $b$, $w$ and $\beta$ are Nemytskii operators. By $D_\lambda \mu_t$ we denote the Radon-Nikodym derivative of $\mu_t$ with respect to the Lebesgue measure $\lambda$, computed at 0. The initial datum $\mu_0$ is a non-negative Radon measure.
Remark. Solutions to structured population models are analyzed in the space of measures for the following reason. Typical experimental data are not continuous; they provide information on percentiles, i.e., the number of individuals in certain intervals of the structural variable (such as age). In demography and epidemiology, for instance, numbers of births are typically reported per year.
Aiming at the optimal control of the solution to (4.14), a control parameter $h$ is introduced (possibly time- and/or state-dependent), taking values in a given set $H$. This leads to a controlled version of (4.14). The goal is to find the minimum of a given functional $J$ within a suitable function space, i.e. to find $h^* \in H$ such that $J(\mu^{h^*}) \le J(\mu^h)$ for all $h \in H$.
For this optimal control problem, [CGR18] adapt the Escalator Boxcar Train (EBT) algorithm (defined in [GJMU14]), i.e. they use an appropriate ODE system approximating the original PDE model. The authors note that solutions to conservation or balance laws typically depend in a Lipschitz continuous way on the initial datum as well as on the functions defining the equation. This does not allow the use of differential tools in the search for the optimal control.
Thus, since the solution to the transport equation is differentiable with respect to the parameter, the mathematical tools applied to (4.16) can be complemented by, e.g., gradient methods.

CHARACTERIZATION OF THE SPACE Z
In this section we establish some further properties of the space $Z$ defined by (1.6). The identification of the dual space $Z^*$ in Proposition 5.3 is of particular interest, e.g. in view of the application to control theory discussed in Section 4. By $\delta_x$ we denote the Dirac measure concentrated at $x$.
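Proposition 5.1. Let $Z$ be given by (1.6). Then the set $\mathrm{span}\{\delta_x : x \in \mathbb{Q}^d\}$ is dense in $Z$ with respect to the $(C^{1+\alpha}(\mathbb{R}^d))^*$-topology, i.e. $\overline{\mathrm{span}\{\delta_x : x \in \mathbb{Q}^d\}}^{\,\|\cdot\|_{(C^{1+\alpha}(\mathbb{R}^d))^*}} = Z$.
Consequently, $Z$ is a separable space.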
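The mechanism behind this density can be illustrated in one dimension with a deliberately simple Python sketch. It does not reproduce the construction of the proof, which uses balls around rational points and a tail estimate. Here the Lebesgue measure $\mu$ on $[0,1]$ is replaced by $\mu_\varepsilon = \sum_i \mu(U_i)\,\delta_{g_i}$ with cells $U_i = [i/n, (i+1)/n)$ and rational nodes $g_i = i/n$. For every test function $K$ the error is at most $\|K'\|_\infty/(2n)$. Assuming the $C^{1+\alpha}$-norm dominates $\|K'\|_\infty$, this gives $\|\mu - \mu_\varepsilon\|_{(C^{1+\alpha})^*} \le 1/(2n)$, whereas $\|\mu - \mu_\varepsilon\|_{TV} = 2$ for every $n$ (an absolutely continuous versus an atomic measure). The function names and the test function $K = \sin$ are illustrative choices.

```python
import math
from fractions import Fraction


def dirac_approximation(n: int):
    """mu_eps = sum_i mu(U_i) delta_{g_i} for mu = Lebesgue measure on [0, 1],
    with cells U_i = [i/n, (i+1)/n) and rational nodes g_i = i/n."""
    return [(Fraction(1, n), Fraction(i, n)) for i in range(n)]


def pair(K, atoms) -> float:
    """<K, sum_i a_i delta_{g_i}> = sum_i a_i K(g_i)."""
    return sum(float(a) * K(float(g)) for a, g in atoms)


exact = 1.0 - math.cos(1.0)   # int_0^1 sin(x) dx = <sin, mu>
errors = {n: abs(exact - pair(math.sin, dirac_approximation(n))) for n in (10, 100, 1000)}
for n, err in errors.items():
    # For any K: |<K, mu - mu_eps>| <= ||K'||_inf * sum_i int_{U_i} |x - g_i| dx = ||K'||_inf / (2n)
    print(n, err, err <= 1.0 / (2 * n))
```
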
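We consider bounded Radon measures, so for any $\mu \in \mathcal{M}(\mathbb{R}^d)$ and any $\varepsilon > 0$ there exists $R_\varepsilon$ such that $|\mu|(\mathbb{R}^d \setminus B(0,R_\varepsilon)) \le \frac{\varepsilon}{2}$. The closure of the ball $B(0,R_\varepsilon)$ in $\mathbb{R}^d$ is compact and therefore has a finite cover by balls centred at points $g_i \in \mathbb{Q}^d$, $i = 1,\dots,n(\varepsilon)$. Denote these balls by $B_i := B\big(g_i, \frac{\varepsilon}{4\|\mu\|_{TV}}\big)$. From the balls $B_i$ we then define the sets $U_{i,\varepsilon}$. Notice that $g_i$ (the centre of $B_i$) is not necessarily contained in $U_{i,\varepsilon}$. In that case we take any other point of the ball $B_i$ contained in $U_{i,\varepsilon}$ and, with a slight abuse of notation, denote it in the same way.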
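For any $\mu \in \mathcal{M}(\mathbb{R}^d)$ and any $\varepsilon > 0$ we consider $\mu_\varepsilon = \sum_{i=1}^{n(\varepsilon)} \mu(U_{i,\varepsilon})\,\delta_{g_i}$, a linear combination of Dirac deltas concentrated at points $g_i \in \mathbb{Q}^d$. Denote by $\mu|_{B(0,R_\varepsilon)}$ the restriction of $\mu$ to $B(0,R_\varepsilon)$. We then estimate $\|\mu - \mu_\varepsilon\|_{(C^{1+\alpha}(\mathbb{R}^d))^*}$ by comparing $\mu_\varepsilon$ with $\mu|_{B(0,R_\varepsilon)}$ and using the tail bound $|\mu|(\mathbb{R}^d \setminus B(0,R_\varepsilon)) \le \frac{\varepsilon}{2}$. Hence for any $\mu \in \mathcal{M}(\mathbb{R}^d)$ there exists an element $\mu_\varepsilon \in \mathrm{span}\{\delta_x : x \in \mathbb{Q}^d\}$ arbitrarily close to $\mu$ in $(C^{1+\alpha}(\mathbb{R}^d))^*$. Since $\mathcal{M}(\mathbb{R}^d)$ is dense in $Z$, the claim follows.
Before giving and proving this characterization, we need the following lemma.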
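Proof. For $f \in C^{1+\alpha}(\mathbb{R}^d)$, $\lambda \in \mathbb{R}^d$ and $x \in \mathbb{R}^d$ define $D\delta(x) \in L(\mathbb{R}^d, Z)$ by means of $\langle f, D\delta(x)\lambda\rangle := \lambda \bullet \nabla f(x)$. By $\bullet$ we denote the inner product on $\mathbb{R}^d$. Thus $\lambda \bullet \nabla f(x)$ is the derivative of $f$ at $x$ in the direction $\lambda$.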
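The directional derivative $\lambda \bullet \nabla f(x)$ is the limit of pairings of $f$ with difference quotients of Dirac measures, since $\langle f, (\delta_{x+h\lambda} - \delta_x)/h\rangle = (f(x+h\lambda) - f(x))/h$. The following Python sketch illustrates this numerically for a hypothetical smooth test function on $\mathbb{R}^2$. The function $f(x) = \sin x_1 \cos x_2$, the point $x$ and the direction $\lambda$ are arbitrary illustrative choices.

```python
import math


def f(x1: float, x2: float) -> float:
    """Hypothetical smooth test function f(x) = sin(x1) * cos(x2)."""
    return math.sin(x1) * math.cos(x2)


def grad_f(x1: float, x2: float):
    """Exact gradient of f."""
    return (math.cos(x1) * math.cos(x2), -math.sin(x1) * math.sin(x2))


def dirac_quotient_pairing(f, x, lam, h: float) -> float:
    """<f, (delta_{x + h*lam} - delta_x) / h> = (f(x + h*lam) - f(x)) / h."""
    xh = (x[0] + h * lam[0], x[1] + h * lam[1])
    return (f(*xh) - f(*x)) / h


x, lam = (0.3, -0.7), (1.0, 2.0)
exact = sum(l * g for l, g in zip(lam, grad_f(*x)))   # lambda . grad f(x)
errors = [abs(dirac_quotient_pairing(f, x, lam, 10.0 ** -k) - exact) for k in range(1, 7)]
print(errors)   # decreases roughly linearly in h for this smooth f
```
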
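Proof. We need to show that the map $T$, defined by $T\varphi(x) := \varphi(\delta_x)$, is a bijection from $(Z^*, \|\cdot\|_{Z^*})$ onto $(C^{1+\alpha}(\mathbb{R}^d), \|\cdot\|_{C^{1+\alpha}})$ and that $T$ is bounded. By the Banach isomorphism theorem, $T^{-1}$ is then bounded as well.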
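Step 1. Obviously the mapping defined by $T\varphi(x) = \varphi(\delta_x)$ maps $Z^*$ into $\mathbb{R}^{\mathbb{R}^d}$, where $\mathbb{R}^{\mathbb{R}^d}$ denotes the space of all functions from $\mathbb{R}^d$ to $\mathbb{R}$. The mapping $T$ is injective. Indeed, if $z^*_1 \neq z^*_2$, there exists $z \in Z$ with $z^*_1(z) \neq z^*_2(z)$. Since $\mathrm{span}\{\delta_x : x \in \mathbb{R}^d\}$ is dense in $Z$ (Proposition 5.1), there exists $\{z_n\}_{n\in\mathbb{N}} \subset \mathrm{span}\{\delta_x : x \in \mathbb{R}^d\}$ such that $z_n \to z$. The functionals $z^*_1, z^*_2$ are continuous, and thus there exists $n$ such that $z^*_1(z_n) \neq z^*_2(z_n)$. By linearity, there exists $x \in \mathbb{R}^d$ such that $z^*_1(\delta_x) \neq z^*_2(\delta_x)$, i.e. $Tz^*_1 \neq Tz^*_2$.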
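Step 3. To prove the opposite inclusion $C^{1+\alpha}(\mathbb{R}^d) \subseteq T(Z^*)$, consider an arbitrary $y \in C^{1+\alpha}(\mathbb{R}^d)$. We want to show that there exists $z^*_y$ such that $y = Tz^*_y$. Define a functional by $z^*_y(\delta_x) := y(x)$, extended linearly to $\mathrm{span}\{\delta_x : x \in \mathbb{R}^d\}$. Our goal is to show that $z^*_y \in Z^*$. It is enough to consider only $z = \sum_{i=1}^n \alpha_i \delta_{x_i} \in \mathrm{span}\{\delta_x : x \in \mathbb{R}^d\}$. Since $z^*_y$ is linear, $|z^*_y(z)| = \big|\sum_{i=1}^n \alpha_i\, z^*_y(\delta_{x_i})\big|$. Using the definition of $z^*_y$, the following holds:
$$|z^*_y(z)| = \Big|\sum_{i=1}^n \alpha_i\, y(x_i)\Big| = \Big|\int_{\mathbb{R}^d} y\,dz\Big| \le \|y\|_{C^{1+\alpha}(\mathbb{R}^d)}\,\|z\|_Z.$$
Thus $\|z^*_y\|_{Z^*} = \sup\{|z^*_y(z)| : \|z\|_Z \le 1\} \le \|y\|_{C^{1+\alpha}(\mathbb{R}^d)}$, and by density $z^*_y$ extends uniquely to an element of $Z^*$.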