Optimal transport over nonlinear systems via infinitesimal generators on graphs

We present a set-oriented graph-based framework for continuous-time optimal transport over nonlinear dynamical systems. Our approach allows us to recover provably optimal control laws for steering a given initial distribution in phase space to a final distribution in prescribed finite time for the case of nonlinear control-affine systems. The action of the controlled vector fields is approximated by a continuous-time flow on a graph obtained by discretizing the phase space. The edge weights of this graph are obtained by computing infinitesimal generators of the Perron-Frobenius operator of the control vector fields. The problem is reduced to a modified Monge-Kantorovich optimal transport on this graph, motivated by the Brenier-Benamou fluid dynamics formulation. The well-posedness of the optimal transport problem on graph is related to controllability of the underlying dynamical system. The resulting convex problem is solved via state-of-the-art solvers to global optimality. Using our computational framework, we study optimal transport in dynamical systems arising in chaotic fluid dynamics and non-holonomic vehicle dynamics. The solutions to the optimal transport problem elucidate the role played by invariant manifolds, lobe-dynamics and almost-invariant sets in efficient transport of distributions. Our work connects set-oriented operator-theoretic methods in dynamical systems with optimal mass transportation theory, and also opens up new directions in design of efficient feedback control strategies for nonlinear multi-agent and swarm systems operating in nonlinear ambient flow fields.


Introduction
Understanding, computing and controlling phase space transport is of utmost importance in the study of nonlinear dynamical systems. For computation of phase space transport in dynamical systems, the available techniques can be divided into roughly three classes: geometric, topological and statistical (or operator theoretic) methods.
The geometric methods, originating in Poincaré's [1] work in celestial mechanics, aim at extracting structures in phase space that organize transport. In recent years, the focus in this field has been on extracting the Lagrangian coherent structures in autonomous and non-autonomous systems, which are often the stable and unstable manifolds [2] of fixed points or periodic orbits, or their time-dependent analogues [3,4]. Related techniques based on lobe-dynamics [2] allow for quantifying transport between different weakly mixing regions in the phase space. Application to low-dimensional systems arising in fluid kinematics [5,6], celestial mechanics [7,8], and plasma physics [9] have been developed over the years. Once these Lagrangian structures have been identified, intelligent control strategies can be formulated to obtain efficient phase space transport between the desired regions in the phase space; see Refs. [10,7,11,12,13,14] for some recent work in this area. Furthermore, methods for computing impulsive perturbations to differential equations for purpose of optimal enhancement of mixing in phase space have also been developed recently [15,16].
The topological methods, based on the idea of 'topological forcing' [17], provide rigorous bounds and sharp estimates of certain transport related quantities. For example, such methods have been applied in the the study of passive scalar mixing in laminar fluid flows [18,19]. Some topological optimal control problems have also been studied [20].
Operator-theoretic statistical techniques [21] are based on lifting the evolution from the state space to the space of measures, in case of the Perron-Frobenius (or transfer) operator, and to the space of observables, in case of the Koopman operator. In both cases, the resulting (uncontrolled) dynamical system is linear, albeit in infinite dimensions. The linearity allows for immediate application of techniques from linear algebra, and the toolbox of linear dynamical systems. Numerical methods based on operator theory have been developed, and applied to several problems of contemporary interest [22,23,24,25,26]. Recent work has also shown that combining the statistical methods with geometric [27,28,29], or topological methods [30,31] can give further qualitative and quantitative information about phase space transport.
The set-oriented numerical methods for computing transfer operators [23,25] are especially promising. Using efficient phase space discretization techniques, these methods enable discovery of 'coherent sets' in autonomous [32] and non-autonomous [33] dynamical systems. Furthermore, the setting of set-oriented methods is especially well-suited for development of rigorous methods for controlling phase space transport in various settings. For example, optimal control algorithms using set-oriented methods have been developed [34,35,36].
In this paper, we are interested in the problem of continuous-time 'optimal transport' [37] of phase space measure under controlled nonlinear dynamics. This problem involves optimally steering an initial measure on a phase space X, to a final measure in given finite time. Specifically, we consider nonlinear control-affine systems of the form,ẋ The aim of the problem is to compute controls u such that the cost of transporting a measure µ t0 to µ t f over the time-horizon [t 0 , t f ] is minimized. This cost is given by the integral over phase-space and time, |u(x, t)| 2 dt dµ(x). ( This problem arises naturally in the realm of nonlinear control systems, and is especially relevant in systems which exhibit sensitivity to initial conditions. In most applications, there is uncertainty associated with initial and final states. Hence, the initial and final states can only be specified as probability distributions. In this case, the measures involved are probability measures, and hence, the optimal transport cost C is the expectation of control cost over all possible initial and final states. Another motivation for studying this problem comes from the field of multi-agent systems or swarm control. The problem of path planning and control of a swarm of homogenous agents in an ambient nonlinear flow field can be formulated as optimal transport problem under nonlinear dynamics. For instance, the control of magnetic particles in blood stream [38,39,40], robotic bees in air [41,42], and swarms of autonomous underwater vehicles (AUVs) in the ocean [43] can all be studied in this setting. Here, one represents the distribution of swarms in phase space by measures.
Related work was pursued in Refs. [44,45,46], where an optimal control framework for asymptotic feedback stabilization of arbitrary initial measure to an attractor is presented. This framework is based on computing a (control) 'Lyapunov measure', which is a measure-theoretic analogue of control Lyapunov function. Also relevant is the work in the area of occupation measures, see Ref. [47]. In Ref. [48], the problem of computing optimal local perturbations for enhancing mixing is addressed using statistical methods. A set-oriented transfer operator approach is used to propagate the dynamics, and perturbations are modeled via a stochastic kernel. The resulting convex optimization problem is then solved for perturbations that lead to minimum difference in L 2 norm from the desired density at each time step. In Ref. [49], infinitesimal generators are employed to solve the same problem in the continuous-time setting.
The field of optimal mass transportation [37] is concerned with optimal mapping of measures in different settings, and has deep connections with phase space transport in dynamical systems [50,51,52]. Hence, it forms a natural setting in which to study the problem of interest. The problem of optimal transport in linear dynamical systems has been studied recently [53,54,55], resulting in several theoretical and computational advances. In previous related work of the authors [56], a computational method to obtain optimal discretetime perturbations which result in desired measure transport under nonlinear dynamics in finite time was presented. The discrete-time perturbations were obtained by solving a Monge-Kantorovich optimal transport problem on graphs in pseudo-time. The perturbations were modeled as instantaneous, and full controllability was assumed.
In the current work, we develop a set-oriented graph-based framework for continuous-time optimal transport over nonlinear dynamical systems of the form given in Eq. (1). Since set-oriented methods have been very successful in the study of complex dynamical systems, it is desirable that the optimal transport be computed in such a setting. The action of the controlled vector fields is approximated by a continuous-time flow on a graph obtained by discretizing the phase space. The edge weights of this graph are obtained by computing the infinitesimal generator of the Perron-Frobenius operator of the control vector fields. The problem is reduced to a modified Monge-Kantorovich optimal transport on this graph. The well-posedness of the optimal transport problem on graph is related to controllability of the underlying dynamical system. Application of this method to systems related to chaotic fluid dynamics and non-holonomic vehicle dynamics elucidates further the role that invariant manifolds, lobe-dynamics and almost-invariants sets play in efficient transport.
The rest of the paper is organized as follows. In Section 2, we give briefly review the theory of optimal mass transport, and a continuous time approach to solving the canonical optimal transport problem. Previous work on optimal transport problem with cost coming from control with linear and nonlinear dynamics is discussed next. An overview of set-oriented approach in dynamical systems via transfer operators and infinitesimal generators is given in Section 3. We also discuss the concept of advection on graphs, and graphbased formulation of the canonical optimal transport problem. In Section 4, we develop our framework of set-oriented graph-based optimal transport which generalizes this canonical formulation to nonlinear systems with control-affine structure. In Section 5, we present the application of our method to optimal transport in three problems: Grushin plane, time-periodic double-gyre system, and the unicycle model. Finally, in Section 6, we provide conclusions and discuss some avenues of future research.

Monge-Kantorovich problem
The Monge-Kantorovich optimal transport (OT) problem [37] is concerned with mapping of an initial measure µ 0 on a space X to a final measure µ 1 on a space Y . In the original formulation, it involves solving for a measurable transport map T : X → Y , which pushes forward µ 0 to µ 1 in an optimal manner. The cost of transport per unit mass is prescribed by a function c(x, T (x)). Hence, the optimization problem is The pushforward constraint in this problem makes the optimization problem highly nonlinear and nonconvex. The existence and uniqueness of optimal solutions for various settings has been major topic of research. In a 'relaxed' version of this problem, due to Kantorovich, the optimization problem is to obtain an optimal joint distribution π(X × Y ) on the product space X × Y , where the marginal of π on X is µ 0 and on Y is µ 1 . We denote by (µ 0 , µ 1 ) the set of all measures on product space with the marginals µ 0 and µ 1 on X and Y respectively. Hence, the relaxed problem is min π(X×Y )∈ (µ0,µ1) c(x, y)dπ(x, y) For the case of quadratic costs, i.e., c(x, y) = x − y 2 , the support of the optimal distribution π(X × Y ) is the graph of the optimal map T obtained from the solution of problem 3. The optimal cost obtained as solution of this problem is called the 2−Wasserstein distance, and we denote it by W 2 (µ 0 , µ 1 ).

Benamou-Brenier fluid dynamics approach
A major conceptual and computational breakthrough in the optimal transport theory was the development of fluid dynamical interpretation of OT problem by Brenier-Benamou [57]. This work provides the first connection of OT with dynamical systems, and evolution equations in general. The OT problem described in the previous section is concerned with only the initial and final measures. The optimal map T is known to be of the form T (x) = ∇ψ(x) for some convex function ψ(x). Then, the pushforward constraint can written as, Numerically solving the optimization problem with nonlinear constraint given in Eq. (5) is difficult. The fluid dynamical approach, first developed in Ref. [57], provides a more tractable way to the solution of this problem. In this approach, the optimization problem is formulated in terms of an advection field u(x, t), and initial and final densities (ρ 0 (x), ρ 1 (x)) of a passive scalar. The core idea is to obtain the optimal map T as a result of advection over a 'time' period (t 0 , t f ) by an optimal advection field u(x, t). It can be shown that the optimization problem given by Eq. (3) (with X = Y = R n ) with quadratic cost is equivalent to the following problem: This aim of this problem can be understood as minimizing the time integral of the total kinetic energy of the 'fluid' being advected with velocity field u(x, t), subjected to initial and final densities of the passive scalar. The motion of a passive scalar is governed by the ordinary differential equatioṅ Furthermore, the optimal advection field is a potential flow, i.e., u(x, t) = ∇φ(x, t) for some potential field φ(x, t). The Euler-Lagrange equations for this optimization problem can be written as which are pressure-less version of Euler equations. By a change of variables from (ρ, u) to (ρ, m ∆ = ρu), the optimization problem in Eq. (17) can be put into a form where its convexity can be proved easily. The transformed optimization problem is In the above equations, the constraints are now linear in the problem variables (ρ, m). The term inside the integral in the cost function, |m(x,t)| 2 ρ(x,t) , is of the 'quadratic-over-linear' form, and can be shown to be convex. Hence, by a transforming the transport problem into continuous time, and by using a change of variables, the non-convex problem in Eq. (3) has been converted into a convex problem in Eq. (10).

Optimal transport over dynamical systems
Next, we discuss the generalization of the optimal transport problem on continuous spaces to general nonlinear controlled dynamical systems, following Ref. [58,54]. Consider the following system, The problem of optimal transport for this system can posed as finding optimal control which steers an initial scalar density to a final density, where the scalar transport occurs due to combination of an ambient flow field f (x(t), 0), and control u(t). This problem can be posed by replacing the cost function c with a Lagrangian representing the optimal control cost i.e., under the constraint given by evolution Eq. (11), and where U is some set of allowable controls. While the existence and regularity of the corresponding optimal transport problem has been studied extensively in recent literature [58,50,59], computation of transport maps for general nonlinear systems is a difficult problem. In case when the underlying optimal control problem can be solved analytically, the corresponding transport problem can also be tackled.
For the special case of linear dynamical systems with quadratic cost, mirroring the optimal control case, further analytical development and computational simplification has been made [53,54]. As described in [54], consider the following setup: The generalization of Benamou-Brenier approach to the corresponding optimal transport problem can be seen to be the following: The linearity of the dynamics allows for simplification of this formulation in the following two ways: • First, the point-to-point optimal control cost for the system in Eq. (14)(15)(16), and the corresponding trajectory, can be explicitly written down in terms of its controllability Gramian. Given the state transition matrix Φ, the least energy needed to move the system state from x to y in unit time is where • Second, since the dynamics are linear, one can find a linear transformation C in which the problem can be reduced to the original Monge-Kantorovich form. This transformation is the linear map : The initial point x is first mapped to its final location under the natural (i.e. u = 0) dynamics given byẋ(t) = A(t)x(t), using the state transition matrix Φ 10 . This is followed by multiplication of the resulting statex Φ 10 x, and y by M − 1 2 10 . As a result of above properties, the optimal transport problem for linear quadratic case can been transformed into two coupled but well understood problems. The first problem involves computing point-to-point optimal control cost in the linear-quadratic setting, and the second problem involves solving the classical Monge-Kantorovich problem in the transformed coordinates, where the transformation is linear in original state variables, and captures the dynamics of the system. These simplifications even enable analytical solutions to the optimal transport problem, for case of Gaussian initial and final measures [54]. We note that the optimal transport problem given by Eq. (17) can also be interpreted as the problem of optimally steering a dynamical system from a probabilistic initial state to a probabilistic final state. Note that the dynamics of the system are still taken to be deterministic; however see Ref. [55] for connections with stochastic dynamical systems.

Set-oriented Methods and Optimal Transport on Graphs
In contrast to the linear-quadratic case considered in the previous section, an explicit linear change of variables in phase space to convert the optimal transport problem to the Monge-Kantorovich form is not available for general nonlinear dynamical systems Eq. (11), or even for systems of the form given in Eq.
(1). Hence, we lift our dynamical system into the space of measures and work with transfer operators and their generators. These objects are approximated via set-oriented methods. The corresponding advection equations can be formulated on a graph obtained by the set-oriented discretization method. We discuss these objects next.

Transfer Operator and Infinitesimal Generator
Consider the flow-map φ t0+T t0 : X → X on a d-dimensional phase space X. This map may be obtained as a time-T map of the flow of a possibly time-dependent dynamical system, The corresponding Perron-Frobenius transfer operator [21] P t0+T t0 is a linear operator which pushes forward measures in phase space according to the dynamics of the trajectories under φ t0+T t0 . Let B(X) denote σ−algebra of Borel sets in X. Then, for any measure µ, The transfer operator lifts the evolution of the dynamical systems from phase space X to the space of measures M(X). Numerical approximation of P , denoted byP , may be viewed as a transition matrix of an N -state Markov chain [25]. For computation, we partition the phase space volume of interest into N d−dimensional connected, positive volume subsets, B 1 , B 2 , . . . , B N with piecewise smooth boundaries ∂B i . Usually, these subsets are hyperrectangles. The matrixP = {p ij } is numerically computed via the Ulam-Galerkin method [60,25], as followŝ wherem is the Lebesgue measure. The action of the transfer operator over a finite time T can also be defined naturally on densities in the case of Lebesgue absolutely continuous measures. However, we are more interested in capturing the continuous-time behavior of the dynamical system in Eq. (20) in the space of densities. The continuity equation for system in Eq. (20), is given by For the numerical approach used in this paper we briefly consider the Eq. (23), in a operator theoretic framework, as an abstract ordinary differential equation in the space of measures, formally. Eq. (23) can be expressed asμ where ) ⊂ M(X) and the solution, µ(t), of Eq. (24) can be expressed using a two-parameter semigroup of operators (U(t, s) s,t∈R,t≥s as µ(t) = U(t, s)µ s . The divergence operation is to be understood in the sense of duality of M (X) with C(X) (assuming X is compact). Here C(X) refers to the space of continuous functions on X. The Perron-Frobenius operator is related to this two-parameter semigroup of operators as U(T, t0) = P t0+T t0 for given parameters t 0 and T . In general, guaranteeing the existence of a strongly continuous two-parameter semigroup based on the time-dependent generator A(t) is quite involved. See for example Refs. [61,62]. In contrast, the theory is more well-developed for the case when A(t) ≡ A, (the vector field f (x) is time independent). In this case, the solution, µ(t), can be expressed by a one-parameter semigroup of bounded operators, (T (t)) t≥0 , as µ(t) = T (t − s)µ s . Here, the generator A and T (t) are related by the formula As in the case of the Perron-Frobenius operator, one can also consider the semigroup and it's generator on a space of densities (or equivalently on a space of measures absolutely continuous with respect to a reference measure with additional regularity restrictions).
due to vector field f .
Ulam's method for approximating Perron-Frobenius operators using Markov matrices extends to numerical approximations of semigroups corresponding to the continuity equation. Analogously, one approximates the generator of the semigroup using transition rate matrices, which generate approximating semigroups on a finite state space. We recall this method as shown in [63]. The operator A(t) is approximated by defining elements of time-varying transition rate matrix {A ij (t)}, which are computed as follows, where n ij is the unit normal vector pointing out of B i into B j ifB i ∩B j is a (d − 1) dimensional face, and zero vector otherwise. See Fig. 1. Note that in [63], the authors also considered the perturbed version of the operator, This was mainly to exploit the spectral properties of the perturbed operator and the corresponding semigroup. However, in this work the perturbed operator does not offer any visible advantages. Hence, we work with approximations of the operator, −∇ · (f (x, t)·), alone. Nevertheless, we note that the discretization will introduce some numerical diffusion.

Monge-Kantorovich Transport on graphs
Now consider a directed graph G = (V, E) on X, where the set of vertices V represent the subsets B i , and the set of directed edges E are obtained from the topology of X. For each pair of neighboring vertices, two edges are constructed, one in each direction.
A continuous-time advection on such a graph can be described [64,65] as, where U (t, e) is the flow on the edge e. Here we use the notation e = (v → w) to represent the edge e directed from a vertex v to w. The notion of optimal transport has been extended to such a continuous-time discrete-space setting recently [66,67]. Following [67], one can formulate a quadratic cost optimal transport problem on G as follows. First, define an advective inner product between two flows U 1 , U 2 as Then the corresponding optimal transport distance between a set of measures (µ 0 , µ 1 ) supported on V can be written asW such that Eq (27) holds, and Here U (t, .) µ(t,.) U, U µ . This approach is motivated by the previously discussed Benamou-Brenier approach for optimal transport on continuous spaces, and results in the following advection based convex optimization problem.
where J(t, e) µ(t, v)U (t, e) for e = (v → w), and D ∈ R |E|×|V | is the linear flow operator computing µ(w) − µ(v) for each e = (v → w) ∈ E. The change of variables from U to J is analogous to the change of variables in Brenier-Benamou formulations, as discussed in Section 2.2. The convergence ofW N to W 2 , the 2-Wasserstein distance on a continuous phase space (a d-torus), as N → ∞ is studied in Ref. [66]. The convergence analysis is performed by constructing discrete measures over cubes, and discrete vector fields by integration of flow over boundaries (called 'momentum vector fields'). This process is analogous to the above mentioned construction of discrete measures on V , and representation of flow by edge-based variable J. While the aim of that paper is to show convergence of a different distance (denoted W N ) to W 2 , the distanceW N is used an intermediate measure. It is shown that for absolutely continuous measures, ∃δ > 0 s.t. for any N > 0, > 0, where z(δ, N ) = o( δ 2 N ). Here we have abused notation by using (µ 0 , µ 1 ) for measures in both continuous and discrete spaces. The constant c is independent of discretization. The convergence ofW N follows by a simple sandwiching argument. We drop the subscript N in the rest of the paper.

Problem Setup and Computational Approach
Then given the densities ρ 0 and ρ 1 on M , the corresponding optimal transport problem of interest is the following We adopt a (semi-)discretize and optimize approach to numerically solve the problem. We approximate the optimal transport problem using a sequence of optimal transport problems on graphs. A key tool is to approximate the (time-varying) generator of the semigroup corresponding to the Eq. (36) using generator approximations on a finite state space as described in [63] and briefly discussed in Section 3.1. Hence, we approximate solutions of optimal transport problems on an Euclidean space using solutions of optimal transport problems on graphs. Towards this end, we partition M into m d-dimensional connected, positive volume subsets P m = {B 1 , B 2 ..., B m }. Additionally, we assume that the boundaries ∂B i are piecewise smooth. Then we can consider the optimal transport problem on a graph G = (V, E) where the the cardinality of V is m and the connectivity of the graph is determined by the topology of M and the partition P m . More specifically, V = {1, 2.....m} and an element e = (v → w) ∈ E for v, w ∈ G and v = w ifB v ∩B w has non-zero d − 1-dimensional measure. The graph G is strongly connected, i.e., for any two vertices, v 0 , v T ∈ V there exists a directed path of n vertices, Moreover, this graph is also symmetric, that is, e = (v → w) ∈ E impliesē defined byē = (w → v) is also in E.
In order to apply the approximation procedure highlighted in [63] we express the continuity Eq. (36), as a bilinear control system,ẏ where A 0 (t) = ∇ · (g 0 (x, t) · ) for each t ∈ [0, 1],û i (t) = u i (·, t), y(t) = ρ(·, t), A i = ∇ · (g i (x) · ). Note that the right hand side of a bilinear system is traditionally expressed in the form A(t) + u(t)Bρ(t) in control theory literature [68]. The form in Eq. (37) is equivalent for systems on finite-dimensional state spaces, but not for general infinite dimensional bilinear systems ifû(t) is not a scalar for each t ∈ [t 0 , t f ]. For example, in the continuity equation, one can see that u(x, t)∇ · (ρ(x, t)) = ∇ · (u(x, t)ρ(x, t)) in general. Hence, the form Eq. (37) is more appropriate for expressing the system in Eq. (36).
In section 3, it was discussed how generators of semigroups corresponding to the continuity equation can be used to define a approximating semigroup on a graph generated by appropriately constructed transition rate matrices. This method can be generalized to the controlled continuity equation, Eq. (36). A natural extension is to consider approximations of the control operators A i using corresponding transition rate matrices, and analogously construct a controlled Markov chain on the space V. However, we note that typically for a controlled Markov chain, the control parameters are constrained to be non-negative. Hence, a direct approximation of A i using transition rate matrices and constrainingû i (t) to be positive would negate the possibility that one can flow both backward and forward along the control vector fields, which is critical for controllability of the system. Hence, to account for this in the approximation procedure, we define a bilinear control system equivalent to the one in Eq. (37), but with positivity constraints on the control: where ...N }. As in Section 3, for each of the operators A 0 , A s i , we construct the control operators on the graph, G, which are denoted by A 0 : [0, T ]×E → R + and A s i : E → R + . A 0 is the edge-based version of the generator constructed from the vector field g 0 (x, t) using the formula in Eq. (26) . For A s i , the corresponding transition rates are defined as.
where n vw is the unit normal vector pointing out of B v into B w at x. Let P(V) be the space of probability densities on the finite state space, V. Then using the above parameter definitions, we consider the following flows on the graph, G where µ(t, ·) ∈ P(V) for each t ∈ [0, T ], and U s i (t, ·) are the edge-dependent non-negative 'control' parameters that scale the transition rates, A s i (e). These controlled flows define a time-inhomogeneous continuous-time Markov chains on the finite state space, V. The evolution of the corresponding stochastic process X(t) ∈ V over an edge, e = (w → v) ∈ E, is defined by the conditional probabilities: This leads us to the approximating optimal transport problem on a graph, motivated by the formulation in Section 3.2:W such that Eq. (41) holds, and The convex formulation of the above problem is then given bỹ where A 0 (t, e)µ(t, w) −

e=(v→w)
A 0 (t, e)µ(t, v) This results in the following convex optimal transport problem, Remark 4.2. We note that the Eq. (27) discussed in Section 3.2 can be seen as the special case of Eq. (46) with g 0 ≡ 0 and g i =î (the ith unit vector). Hence, our formulation generalizes optimal transport on graphs from a single-integrator system to general nonlinear control-affine systems. While a rigorous proof of convergence ofW as defined in Eq. (43) or Eq. (47) to W 2 is not provided here, the connection to Eq. (27) provides a heuristic argument in this direction. As discussed in Section 3.2, Ref. [66] provides such a convergence proof for an advection equation on graphs. The advection is modeled using anti-symmetric discrete 'momentum vector fields' V on the edges, and the optimal transport problem minimizes a discrete action. For the driftless case, Eq. (46) satisifies those conditions due to the way the transition matrices A i (e) (which give edge-weights) are defined, and our definition ofW agrees with the one given in Ref. [66]. We also note in general when A 0 = 0, the solution of the optimization problemW does not necessarily define a metric on P(V) due to the asymmetry that is possibly induced by the drift vector fields.

Controllability of Graphs Flows
In this section, we establish that the controlled Markov chain approximations Eq. (41) preserve the controllability properties of the system in Eq. (34). This will ensure the well-posedness of the graph optimal transport problem, Eq. (43). Without loss of generality, we consider the case when t 0 = 0 & t f = 1. First, we recall a few standard notions from geometric control theory [69].     ∈ (0, 1) be such that f (s) ∈ ∂B i and there exists c > 0 small enough such that f (s − δ) ∈ B v and f (s + δ) ∈ B w for all δ ∈ (0, c). Then, clearly n vw · g r (x) = 0 for x = f (s) and some r ∈ {1, 2....n} assuming γ and are chosen appropriately (i.e. avoiding crossings of γ and f over corners of B v and B w ). From continuity of the vector field g r , there exists a small enough neighborhood, N x of x such that n vw · g r (y) = 0 for all y ∈ N x . Hence, this implies A r (e) = 0 for e = v → w. Due to continuity of the vector field g r at x, it also follows that A r (e) = A r (ē). Hence, the connectivity of the graph G c follows. Case 2 just follows from the assumption Remark 4.9. The main obstruction in extending the above result for underactuated systems (span g i (x) : i ∈ {1, 2....n} = T x M for some x ∈ M ) with drift, i.e. g 0 ≡ 0, is that usual tests for small-time local controllability of control systems with drift [71] require the initial condition to be an equilibrium point. Hence, starting at a non-equilibrium initial condition one might need to make large excursions (in our case, possibly outside the domain M ) in order to return to the initial condition. Take for example, the simplest control-affine system with drift, the double integrator:ẍ = u. Hence, given an initial and target density, the optimal transport problem on a bounded domain might not admit a solution for a system with drift if M is not taken to be large enough.
We note the following result to ensure well-posedness of problem given in Eq. (43).
Proof. The graph G c is connected. Since G 0 ⊆ G, we can chooseŨ s i (t, ·) such that the right hand side in Eq. (41) is equal to 0 for all t ∈ [0, 1]. Then, from the previous theorem, it follows that there exists a control U s i (t, ·), of the form U s i (t, ·) =Û s i (t, ·) +Ũ s i (t, ·) such that Eq. (41) satisfies µ(0, ·) = µ 0 and µ(1, ·) = µ 1 . Here, the parameterŨ i (t, ·) negates the effect of the drift field A 0 andÛ s i (t, ·) ensures the density µ 0 is transported to µ 1 as in Thm 4.10.

Construction of Approximate Feedback Control Laws
Given the solution the optimal transport problem on the graph, we need to reconstruct the corresponding approximate feedback control laws {u i (x, t)}. The feedback control laws for the control system in Eq. (34) are given by Here, N s i (v) refers to the the neighboring nodes of v in the graph G s i for each s ∈ {+, −} and i ∈ {1, 2....n}.

Numerical Implementation
Following Refs. [72,67], we use the staggered discretization scheme for pseudo-time discretization. We define where t j = (j/k)t f , j ∈ [0, 1, 2, . . . , k] is the time discretization into k intervals. We take t 0 = 0. Here J s i,j (e) represents the s ∈ {+, −} flow due to g i (x) over edge e = (v → w), from vertex v at time t j−1 to vertex w at time t j .
Hence, the optimization problem given in Eqs. (44) can discretized as, subject to the following constraints: Here ∆t = t f k . The cost function given by Eq. (55) is again of the form 'quadratic over linear', and the advection (Eq. (56)) imposes linear constraints. Hence the discretized problem is convex, and can be solved using many off-the-shelf convex solvers. The optimization problem is solved via CVX [73] modeling platform, an open-source software for converting convex optimization problems into usable format for various solvers. We use the SCS [74] solver, a first-order solver for large size convex optimization problems. This solver uses the Alternating Direction Method of Multipliers (ADMM) [75] to enable quick solution of very large convex optimization problems, with moderate accuracy.
The variables to be solved for in the optimization problem Eqs. (55)(56)(57) are vertex based quantities µ j , and edge based quantities J s i,j . The size of the optimization problem can be quantified in terms of number of time-discretization steps k, number of vertices |V | = N , and the number of edges |E|. The graph G c is always sparse, since a typical vertex is at most connected to 2(n + 1)d neighbors, and N n, N d. Hence, the variables in the optimization problem scale as O(k(N + |E|)) = O(n · d · k · N ).

Optimal Transport in the Grushin Plane
We first apply our framework to a non-holonomic control-affine system in which certain optimal transport solutions can be found analytically. We consider transport of measure in the Grushin plane, which is a subriemannian space, with base space R 2 . In Ref. [58], the structure of optimal controls in this problem was analyzed. Using this structure, optimal transport to a delta measure at (0, 0) was computed. The system is described byẋ The optimal control cost c(x, y) between initial and final states, x = (x 1 , x 2 ) , y = (y 1 , y 2 ) , is taken to be square of the subriemannian distance d(x, y) = inf C y x 1 0 u 2 1 + u 2 2 dt. Hence, the optimal control solutions are also geodesics in the subriemannian space. The solutions of optimal control problem are integral curves of the Hamiltonian H given by Here p 1 , p 2 are the co-state variables. Note that since H is independent of x 2 , H can be reduced to a Hamiltonian in (x 1 , p 1 ), and the integral curves of H can be obtained using quadratures. The geodesics reaching (0, α) at t = 1 are of the form A geodesic between a specified initial point (x 1 ,x 2 ), and (0, α) can be obtained by inverting the Eqs. (60-61) at t = 0 to solve for (a, b). For t ≤ π b , these geodesics are also global minimizers of the optimal control problem. Figure 2(a) shows some geodesics to the origin. Now consider the optimal transport problem with c(x, y) = d 2 from an initial measure µ 0 to final measure µ 1 = δ (0,0) . Clearly, the optimal map T is x → (0, 0), and the displacement interpolation is given by the geodesics between each x ∈ supp(µ 0 ) and (0, 0). Figure  2(b) shows the displacement interpolation of T for a particular case.  solution between a measure whose support is the disk Ω = {(x, y)|x 2 + (y − .8) 2 < .15 2 }, and measure at the box containing the origin.
We apply the our computational framework algorithm developed in Section 4 to compute optimal transport for a similar case as described above. We divide the X = [−1, 1] × [−1, 1] into m = 100 2 boxes, and form the corresponding graph G. The resulting solution is shown in Figure 3(a)-(e). The convergence of optimal transport costW with k and m is shown in Fig 3(f). It can be seen that the computed solution closely follows the analytical solution shown in Fig. 2. Next, we perform particle simulations with controls extracted from the optimal transport computation. We take p = 4 particles per box, and use Eq. (52) to get state-dependent control commands for each particle. The results are shown in Figure 4. About 95% of the particles get transported according the optimal transport solution shown in Fig 3, while the rest are dispersed. Note that the control laws obtained from optimal transport solution does not automatically guarantee feedback stabilization of individual particles.

Optimal Transport in time-periodic Double-Gyre system
Now we consider a measure transport problem for the time-periodic double-gyre system [76]. This chaotic dynamical system has been analyzed using several computational tools related to transport and mixing [77,78,28,29,48].
The controlled equations we consider as follows, x = −πA sin(πf (x, t)) cos(πy) + u 1 , (62a) where f (x, t) = β sin(ωt)x 2 + (1 − 2β sin(ωt))x is the time-periodic forcing in the system. The phase space is X = [0 2] × [0 1]. We first describe the dynamics of the uncontrolled (u 1 = u 2 = 0) system. For the trivial case of β = 0 (i.e. no time-dependent forcing), the phase space is divided into two invariant sets, i.e., the left and right halves of the rectangular phase space ('gyres'), by a heteroclinic connection between fixed points x 1 = (1, 1) and x 2 = (0, 1). For non-zero β, the Poincare map F of the system, obtained by integrating the dynamics over one time period τ of f , describes an autonomous discrete-time system. The heteroclinic connection is broken in this case, and results in a heteroclinic tangle. This heteroclinic tangle leads to transport between left and right sides via lobe-dynamics.
We choose parameters A = 0.25, β = 0.25, ω = 2π, such that the time-period of the flow is τ = 1. To get insight into the phase space transport due to heteroclinic tangles, the theory of lobe dynamics [76] is useful. Lobe dynamics techniques allows one to quantify the transport between sets separated by invariant manifolds, and their transversal intersections. In figure 5, the unstable manifold of x 1 ≈ (0.919, 1), U x1 , and the stable manifold of x 2 ≈ (1.081, 0), S x2 are shown in green and white respectively. The lobe labeled 'A', its pre-image F −1 (A) and image F (A) are also shown. Consider the segment L = U x1 (x 1 → F (P 1 )) ∪ S x2 (F (P 1 ) → x 2 ). Then L divides phase space X into two regions. The points that get mapped from left to right region in one iteration of F are precisely those in set A. Hence, the amount of mass transport from left to right side of L is m(A). While our algorithm for optimal transport can be applied between any arbitrary pair of measures, it is instructive to choose a pair of measures which are somehow 'distinguished' for the given dynamical system. Selecting a pair of invariant measures is an obvious choice, however the chaotic double-gyre system does not seem to possess any invariant measures except the uniform measure. However, there do exist almost-invariant sets (and associated measures with support on those sets). A set S is called almost-invariant [28] if Invariant and almost-invariant sets in this system can be identified by the sign structure of the second eigenvector of the reversibilized transfer operator where P is the transfer operator corresponding to F andP is the transfer operator for the reverse-time dynamics. In Figure 5, two almost-invariant sets, A 1 and A 2 are also shown. We choose our initial and final measures to be uniform measures supported on A 1 and A 2 , respectively. Both measures are normalized to sum to unity. We solve the the optimal transport problem for m = 60 × 30 for different time horizons t f , with k = 40. Our simulations are repeated with finer grid-sizes m = 100 × 50 to verify that our results are nearly independent of m. In Fig. 6, we show the optimal transport sequence for t f = τ = 1. In other words, the transport is constrained to be completed in one time-period of the uncontrolled flow. Due to the short time-horizon, all of the mass is pushed across the invariant manifolds separating the the two gyres. In Fig. 7, the optimal transport sequence for the more interesting case with t f = 10 is shown. The transport process in this case is completely different than the t f = 1 case. We observe that the transport occurs in a 'quantized' manner, i.e. packets of mass are transported, one at a time, via lobe-dynamics from left gyre to the right gyre. The number of such packets exactly equal the number of time-periods in the time-horizon of the transport problem, i.e. n = 10 in this case. Hence, while the global transport is being done by the natural dynamics via lobes, the role of control is to gather the mass in pre-images of the those lobes. For instance, in Figs 7((b)-(e)), the transport of one such packet via the sequence The mechanism is essentially the same for other time-horizons that we analyzed, t f = 2, 5, 8, 12.5 & 15. The increasing use of 'natural' chaotic lobe-dynamics of the uncontrolled system during optimal transport should reflect in the optimal transport cost. This cost,W , decreases rapidly as the time-horizon t f is increased, as shown in Fig. 8. Hence, the optimal transport discovers the efficient paths in this chaotic system, where 'going with the flow' is the best option. (a) t = 0

Optimal Transport for unicycle model
Finally, we consider optimal transport in a three-dimensional non-holonomic system, called the 'unicycle' model. This system is a toy model for vehicle kinematics, and is used extensively in vehicle path planning and control [79,80]. The states are cartesian coordinates (x, y) ∈ R 2 , and orientation θ ∈ S 1 of the unicycle. The system equations are given by,ẋ = u 2 cos θ, (63a) where u 1 is the steering speed, and u 2 is the translation speed. The optimal control problem has been studied for various cost functions, and endpoint conditions [81,82,83]. The techniques from geometric mechanics, specifically Lie-Poisson reduction [69], have been successfully used to reduce the optimal control problem to a three-dimensional non-canonical Hamiltonian system. For this three-dimensional system, two conserved quantities can be found, and hence, the optimal controls (u 1 (t), u 2 (t)) can be solved explicitly in terms of Jacobi elliptic functions.
To study the optimal transport problem for the unicycle model, take the control cost to be quadratic, i.e. d(z 1 , z 2 ) = inf C z 2 z 1 1 0 u 2 1 + u 2 2 dt. We compute optimal transport solutions for two scenarios. In the first case, µ 0 is chosen to be a measure supported on box containing (0, 0.5, 0), and µ 1 is chosen to be uniform measure supported on the union of boxes containing (1, 0, 0) and (1, 1, 0). In the second case, µ 0 is chosen to be a measure supported on box containing (0, 0.5, 0), and µ 1 is chosen to be a uniform measure supported on the union of boxes containing (1, 0, π 2 ) and (1, 1, 3π 2 ). The initial and final measures for the two cases are depicted in Fig 9. The optimal transport solution for the first case is shown in Fig. 10. Since the final orientation is prescribed to be along the x−axis, this leads to a splitting of the measure half-way in the transport, and steering of the two halves horizontally to their final positions. The optimal transport solution for the second case is shown in Fig. 11. The solution in this case is qualitatively different from the first case. The two halves split and then move vertically towards final positions.

Conclusions and Future Directions
We have developed a set-oriented graph-based framework for continuous-time optimal transport over nonlinear dynamical systems. We interpret the standard continuous-time optimal transport problem on graph as the graph-analogue of the optimal transport problem of single-integrator control system, and generalize that framework for general nonlinear control-affine control systems. We use set-oriented methods, especially the infinitesimal generator approach, to approximate transition rates in the generalized framework. The well-posedness of this problem is related to the graph connectivity, and is proved to be a consequence of controllability of the control system. Hence, our work connects set-oriented operator-theoretic methods in dynamical systems with optimal mass transportation theory, and also opens up new directions in design of efficient feedback control strategies for nonlinear multi-agent and swarm systems operating in nonlinear ambient flow fields. Application of our set-oriented framework to larger domains, longer time-horizons and/or higher dimensional systems will require improvement in computational efficiency. Using efficient phase space discretization techniques, such as those employed in GAIO [23], one can hope to improve the efficiency of the resulting optimal transport algorithms, and apply the framework to higher dimensional dynamical systems. Graph pruning algorithms can be employed to remove edges which are not likely to be used [84].
Solutions to the optimal transport problem in the double-gyre system elucidate the role played by invariant manifolds, lobe-dynamics and almost-invariant sets in efficient transport of phase-space distributions. While it is known that invariant manifolds and lobes act as low-energy 'channels' in the phase-space, our results give new qualitative and quantitative information about their role in problems of transport of distributions or swarms of agents. Application of this framework to more complicated arbitrary time-varying flows should provide similar insights into the role of Lagrangian coherent structures and coherent sets. This can lead to development of efficient swarm planning and control strategies for realistic applications in ocean and air-borne systems. Moreover, using our framework, the relative importance of such objects can be studied for different types of controls.
Furthermore, recent methods in obtaining Lagrangian coherent structures and coherent sets in finite-time non-autonomous systems have used variational formulations of transport under nonlinear dynamics [4] or dynamic versions of the Laplacian [85]. It would be fruitful to develop connections of these formulations with optimal mass transportation theory, extending the connections already identified in the Hamiltonian dynamics case [51]. For instance, one could define a controlled version of almost-invariant sets or coherent sets, by defining a control dynamic Laplacian, analogous to the control infinitesimal-generators as developed in the current work, or control Lyapunov measures developed in Ref. [44].
Connections with work in the closely related area of occupation measures [47] and Lyapunov measures [44] also need to be explored, especially in context of obtaining feedback control laws from the control laws obtained from optimal transport solutions. The feedback control laws constructed as solutions to the optimal transport problem guide the measure along shortest paths corresponding to solutions of the corresponding sub-Riemannian problem. Hence, it needs to be explored in what sense these laws can be used for feedback stabilization of an individual control-affine system. Moreover, the numerical approach in the current paper could also be adapted to construct time-independent feedback control laws for such systems.

Acknowledgements
We thank Matthias Kawski, Uroš Kalabic and Udit Halder for helpful discussions on geometric control theory.