From mean field games to the best reply strategy in a stochastic framework

This paper builds on the work of Degond, Herty and Liu by considering N-player stochastic differential games. The control corresponding to a Nash equilibrium of such a game is approximated through model predictive control (MPC) techniques. In the linear-quadratic running-cost case considered here, the MPC method is shown to approximate the solution of the control problem by the best reply strategy (BRS) for the running cost. We then compare the MPC approach, in the mean field limit, with the popular mean field game (MFG) strategy. We find that our MPC approach reduces the two coupled PDEs of the MFG to a single PDE, greatly increasing the simplicity and tractability of the original problem. We give two examples applying this approach to the previous literature and conclude with future perspectives for this research.


Introduction
Mean field game (MFG) models were first proposed by Lasry and Lions [37] and simultaneously by Huang, Malhamé and Caines. Their research follows from the previous work of Aumann [6] and related researchers [38,41] on systems with a continuum of agents. In the last decade, the field has grown considerably, with research taking many different directions: from applications [30,32,34], to existence, uniqueness and regularity [7,12,15], and to numerical analysis [1][2][3]. The aim of MFG models is to describe how populations of agents evolve over time due to their strategic interactions. The trajectories of agents are determined through the minimisation of a cost functional over long time horizons. This process of optimisation implicitly assumes that agents consider the future evolution of the population over long time periods and are able to change their control continuously.
For a large number of applications, such as firm behaviour, traffic and pedestrian dynamics, and other human-based optimisation processes, this type of decision-making strategy seems far from reality. In such applications it is more likely that agents fix their control over a short period of time, evolve their position and then update the control. In recent years, another model of agent interaction has been developed: that of the 'best reply strategy' (BRS). The BRS was used in [19] to describe agents whose strategies evolve on a faster time scale than their social configuration. It was applied, in [17] and [18] respectively, to the evolution of wealth in conservative and non-conservative economies. In a later paper by Degond, Herty and Liu [16], it was shown that the BRS could be related to a rescaled mean field game model in the case of deterministic dynamics. This relationship was described by a discretisation of the MFG, using a method of control known as model predictive control (MPC) or receding horizon control. The MPC method is detailed in [24] and the references therein. In this paper we extend the work in [16] to include idiosyncratic noise in the individual dynamics and a terminal cost in the optimisation functional. In a similar manner to [16], figure 1 explains how we relate the BRS to MFGs via the MPC approach. The description of the dynamics begins with an N-player stochastic differential game (top left box in figure 1). We then have a choice: either use MPC to obtain the BRS for the N-player game (top right box), or take the number of players to infinity to obtain the MFG (bottom left box). The two routes then converge to the mean field BRS (bottom right box), either by applying MPC to the MFG or by taking the limit as the number of players goes to infinity in the BRS for the N-player game.
This paper is laid out as follows. In Section 2 we describe the setup of the N-player stochastic differential game; we use two methods to explain how the MPC approach results in the BRS. We then take a mean field limit of the controlled N-player dynamics to obtain the mean field BRS. In Section 3 we develop the MFG related to the N-player stochastic differential game by taking the number of players to infinity. We then show how the MPC approach can be used to recover the mean field BRS from the MFG. In Section 4 we take a number of examples from the MFG and BRS literature and redesign them in the paradigm of this paper, both to illustrate how this approach might simplify numerical calculations and to show how the current BRS literature can be considered under the MFG paradigm. In Section 5 we summarise our results and explain future directions for research, both in terms of applications and theory. Finally, in the Appendix we define and discuss various notions of differentiability in the space of measures.


Model predictive control of large interacting systems and the best reply strategy

An N-player stochastic differential game

Consider N players labelled by i = 1, . . . , N. Each player has a state X_i ∈ R^d, which they are controlling over a time horizon [0, T]. Throughout the entire paper, N denotes the number of players in the game, d denotes the dimension of the state space, i, j ∈ {1, . . . , N} denote the ith or jth player, and finally k ∈ {1, . . . , d} denotes the kth component of a player's state. We denote by X_{j,k} ∈ R the kth component of player j's state, by X_j = (X_{j,k})_k ∈ R^d player j's state, by X = (X_j)_j = ((X_{j,k})_k)_j ∈ R^{Nd} the total state of the system and by X_{−i} = (X_j)_{j≠i} ∈ R^{(N−1)d} the state of the system without player i. This convention will be used similarly for other variables and functions, unless otherwise stated.
We assume that each player's dynamics are influenced by the state of the entire system, through a drift function f_i^{(N)} : R^{Nd} → R^d. We also assume that each player can control their dynamics with a control u_i : [0, T] → R^d. Finally, we include randomness in the dynamics, depending on time and a player's own position, through a diffusion coefficient σ. The dynamics are then given by

dX_i(t) = ( f_i^{(N)}(X(t)) + u_i(t) ) dt + σ(t, X_i(t)) dB_i(t), X_i(0) = X_i^0. (2.1)

Here, the B_i(t) are independent d-dimensional Wiener processes with t ∈ [0, T] and the initial conditions X_i^0 are given iid random variables for all i = 1, . . . , N. We assume that each player wants to minimise their own objective functional.
Definition 2.1. Given a set of admissible controls A, from which players choose their strategies, we define the objective functional J_i : A^N → R for player i as the cost of player i's trajectory:

J_i(u) = E[ ∫_0^T h_i^{(N)}(X(s)) + (α_i(s)/2) |u_i(s)|^2 ds + g_i^{(N)}(X(T)) ]. (2.2)

The objective functional's constituent parts are: the running cost of being in position x, given by h_i^{(N)}(x); the terminal cost of ending up in position x at the end of the control horizon, given by g_i^{(N)}(x); and the running cost (α_i(s)/2)|u_i(s)|^2, which penalises the size of the control function. Each player therefore chooses a strategy to minimise this objective functional; as in [37], this corresponds to a Nash equilibrium, where no player can reduce their cost any further by changing only their own strategy. We denote the optimal strategy for player i by u*_i. It is given by the minimisation problem

u*_i ∈ argmin_{u_i ∈ A} J_i(u_i, u*_{−i}). (2.3)

We assume agents can choose controls from a certain set A of admissible controls. This set usually consists of progressively measurable controls with constraints on their smoothness and integrability. In the finite-player case, we assume that we are using closed-loop controls, i.e. that for each i there is a deterministic function φ_i such that u_i(t) = φ_i(t, X(t)). For a discussion of different sets of admissible controls, as well as requirements on the various functions f_i^{(N)}, h_i^{(N)}, g_i^{(N)} and σ, see [20]. As an example of such results, we can ensure existence of solutions to the SDE/optimisation problem (2.1) and (2.3) if the following hold:
• f_i^{(N)} is Lipschitz, locally bounded and continuously differentiable, and f_i^{(N)}(X(t)) is L^2 bounded in time for any control u_i.
• σ is Lipschitz, locally bounded and continuously differentiable in x and σ(t, X i (t)) is L 2 bounded in time for any control u i .
• g (N ) i is locally bounded, continuously differentiable with a derivative that has at most linear growth and convex.
• h (N ) i is locally bounded and continuously differentiable with a derivative that has at most linear growth These requirements guarantee the convexity of each optimisation problem and the existence of solutions to the SDE when using the optimal strategy.
We also introduce the following definition of player i's value functional. This is closely related to the objective functional (2.2) and is used in the description of the optimal strategies.
Definition 2.2. We define the value functional V_i for player i as the cost of player i's trajectory from time t to T, with the system starting at position x and all agents using their optimal controls, i.e.

V_i(t, x) = E[ ∫_t^T h_i^{(N)}(X(s)) + (α_i(s)/2) |u*_i(s)|^2 ds + g_i^{(N)}(X(T)) | X(t) = x ]. (2.5)
Remark. We are interested in the approximation of controls over each small time period of size ∆t, which is why it is important that each term of the approximation is correct up to order O(∆t). From a modelling perspective this is appropriate in situations where the anticipation of agents is short relative to the length of the time horizon. Of course, this results in a sub-optimal control, since the sum of the errors is of order O(1); we therefore cannot necessarily expect the resulting BRS control to approximate the Nash equilibrium control.
As a result, we may restrict ourselves to considering the dynamics (2.1) and control problem (2.3) on the time horizon [t, t + ∆t]. In order for the cost (2.5) to make sense over such short time horizons, we scale it by 1/∆t. Therefore, under our paradigm, agents optimise the following expectation over u_i : Ω → R^d, where Ω is the underlying probability space of the SDE (2.1),
where X(t) = x and players are using controls u.
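Consistent with the objective functional (2.2) and the 1/∆t rescaling just described, the optimised quantity plausibly reads as follows; this is a reconstruction under the paper's own definitions rather than the exact display:

```latex
V_i^{\Delta t}(t,x) \;=\; \frac{1}{\Delta t}\,
\mathbb{E}\Big[ \int_t^{t+\Delta t} h_i^{(N)}(X(s)) + \frac{\alpha_i(s)}{2}\,|u_i(s)|^2 \,\mathrm{d}s
\;+\; g_i^{(N)}\big(X(t+\Delta t)\big) \,\Big|\, X(t)=x \Big],
```

to be minimised over the control u_i, held fixed on [t, t + ∆t].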

Method 1:
Using a Riemann sum, specifically the end-point quadrature rule, to approximate the integral (2.9) up to order O(∆t), we get (2.10). Using Itô's formula for h_i^{(N)} at time t, with the notation D_{x_j} := (∂_{x_{j,k}})_k and D²_{x_j} := (∂_{x_{j,k} x_{j,l}})_{k,l}, gives (2.11), and similarly for the Itô expansion of g_i^{(N)}(X(t + ∆t)). We then use an Euler-Maruyama discretisation of the dynamics (2.1) on (t, t + ∆t) with initial value X̄ = X(t); this is a simple extension of the Euler discretisation of an ODE to the setting of SDEs (for more information see [33]). We take ∆B_i(t) = B_i(t + ∆t) − B_i(t) and then take the expectation, to obtain the weak order O(∆t) approximation (2.12) of u*_i. Notice that only the final term in (2.12) depends on u_i, so this can be simplified to (2.14).

Now, suppose the function u*(ω) satisfies F(ω, u*(ω)) = min_{u∈R} F(ω, u) for all ω. Then for any other process u(ω) we necessarily have F(ω, u*(ω)) ≤ F(ω, u(ω)). (2.15) Integrating over Ω and taking the minimum with respect to u(ω) gives min_{u:Ω→R} E[F(·, u(·))] = E[min_{u∈R} F(·, u)]. Thus, applying this reasoning to (2.13), it is clear that the expectation in (2.13) will be minimised if, for every ω ∈ Ω, the corresponding expression is minimised. Now we fix ω ∈ Ω, so that u*_i = u*_i(ω) is some constant to be found, and use first-order conditions to find u*_i, approximating α_i(t + ∆t) by a Taylor expansion of order O(∆t). If we were to take the limit ∆t → 0 at this point, we would get a degenerate control u*_i(X(t)); however, in many situations we may take α_i as constant, so this control would make no sense. To rectify this problem we have to rescale α_i by ∆t, redefining V^{∆t}_i accordingly. If we do this and go through the same process as above, we get (2.20). This is known as the BRS, as described in [17][18][19]. In simulations, the dynamics given by substituting (2.20) into the discretised version of (2.1) would be computed, and the new state X(t + ∆t) would then be used to repeat the process.
If we now let ∆t → 0, we get the following dynamics (2.21)
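The simulation loop just described (compute the BRS control, advance one Euler-Maruyama step, repeat) can be sketched as follows. All modelling choices here are illustrative stand-ins rather than the paper's: d = 1, f ≡ 0, no terminal cost, constant α and σ, and a quadratic attraction cost.

```python
import numpy as np

def brs_step(x, alpha, sigma, dt, rng):
    """One Euler-Maruyama step of the discretised dynamics (2.1)
    driven by the best reply control (2.20).

    Illustrative choices (ours, not the paper's): d = 1, f = 0,
    no terminal cost, and the running cost
    h_i(X) = (1/2N) * sum_j |x_i - x_j|^2, whose gradient is
    D_{x_i} h_i = x_i - mean(x).
    """
    grad_h = x - x.mean()                  # D_{x_i} h_i for this cost
    u = -grad_h / alpha                    # best reply strategy (2.20)
    noise = sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x + u * dt + noise              # discretised dynamics (2.1)

rng = np.random.default_rng(0)
x = rng.standard_normal(500)               # N = 500 players
for _ in range(200):
    x = brs_step(x, alpha=1.0, sigma=0.1, dt=0.01, rng=rng)
```

With this attractive cost the BRS acts as a mean-reverting drift, so the empirical spread of the players contracts towards a noise-dominated stationary level.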

Method 2:
This method uses the well-known fact (see [23] or [40] for example) that V_i(t, x) solves the following HJB equation for every i = 1, . . . , N.
Using first-order conditions, we find that for every i = 1, . . . , N the optimal control is given by (2.23). Substituting this into (2.22), we obtain the following HJB equation and optimal agent dynamics (2.24)–(2.25). Note that we still have the same terminal condition for the PDE for V_i and initial condition for the SDE for X_i. Now, using the MPC approach, we actually want to consider V^{∆t}_i rather than V_i. We discretise the analogue of (2.22) for V^{∆t}_i (obtained by replacing the costs in (2.22) with their rescaled counterparts) backward in time, since we are given a terminal condition. This results in equation (2.26), which returns us to (2.20) and (2.21). Thus we can conclude that the BRS can be derived from an MPC approach for stochastic differential games either through the discretisation of the value function or through the discretisation of the corresponding HJB equation.
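For the dynamics (2.1) and cost (2.2), the HJB equation (2.22) plausibly takes the following standard form; this is a hedged sketch consistent with the surrounding derivation, not the paper's exact display:

```latex
-\partial_t V_i \;=\; \min_{u \in \mathbb{R}^d}\Big\{ \tfrac{\alpha_i(t)}{2}\,|u|^2 + u \cdot D_{x_i} V_i \Big\}
 \;+\; h_i^{(N)}(x) \;+\; \sum_{j=1}^{N} f_j^{(N)}(x) \cdot D_{x_j} V_i
 \;+\; \sum_{j \neq i} u_j^{*} \cdot D_{x_j} V_i
 \;+\; \tfrac{1}{2} \sum_{j=1}^{N} \mathrm{Tr}\!\big( \sigma\sigma^{\top}(t, x_j)\, D^2_{x_j} V_i \big),
\qquad V_i(T, x) = g_i^{(N)}(x),
```

whose inner minimisation yields the first-order condition u_i^* = −α_i(t)^{−1} D_{x_i} V_i, matching the structure of (2.23).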

Deriving the BRS dynamics for the limit N → ∞
We now look at the limiting behaviour of equation (2.21) as N → ∞. First, we shall make the following assumptions on the symmetry of f and α_i, in order to pass to the limit as N → ∞ in a coherent manner. Similar assumptions are made throughout the MFG literature, cf. [8,10,13,14,16,22,28]. The assumptions are:
C. In order to ensure we can use symmetry arguments later in this paper, we require α_i(t) = α(t) for all i = 1, . . . , N.
Using assumptions A, B and C, equation (2.21) can be rewritten in terms of the empirical measure. We also require growth assumptions in order to apply results concerning the limit of this SDE as N → ∞, as well as for existence and uniqueness later. The assumptions made are labelled D, E, F and G. Here, d_1(m_1, m_2) is the 1-Wasserstein distance; see Appendix A for further information on the Wasserstein distance and the space of probability measures. Under the assumptions D, E, F and G, trivial modifications of Theorem 1.3 in [31] and Theorem 1.4 in [42] show that, for every i = 1, . . . , N, X_i converges to the solution X_t of a limiting McKean-Vlasov SDE, where m_t is the law of X_t, B_t is a d-dimensional Wiener process and X_0 is an m_0-distributed independent random variable. Such behaviour, of particles in large systems behaving independently of one another, is known as propagation of chaos. Together, [42] and [31] effectively cover existence and uniqueness of such SDEs. In fact, they have shown that for such SDEs strong existence and uniqueness hold under the assumptions on f, D_x h, D_x g and σ already made. It is often also of interest to understand how the entire population of players develops over time. To do this, we describe the evolution of the distribution of the population by analysing the weak form of a PDE for measures. For more information on differentiability in the space of measures, see Appendix A. Applying Itô's formula to φ(X_t) for a test function φ gives (2.29). We can then take the expectation, and use the boundedness of φ together with the Lipschitz properties of the other functions to bring the expectation inside the integral via the dominated convergence theorem, giving (2.30). Finally, we can use the definition of m_s as the law of X_s and evaluate the expectations. This gives the weak form of the following PDE that m_t must satisfy, (2.31).
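Writing b(x, m) = f(x, m) + u*(x, m) for the limiting drift, with u* the BRS control, the weak derivation sketched above yields a Fokker-Planck equation of the standard form; this is a reconstruction of the type of equation that (2.31) denotes, not a quotation:

```latex
\partial_t m_t \;=\; -\,\nabla_x \cdot \big( b(x, m_t)\, m_t \big)
 \;+\; \tfrac{1}{2} \sum_{k,l=1}^{d} \partial_{x_k}\partial_{x_l}\Big( \big(\sigma\sigma^{\top}\big)_{kl}(t,x)\, m_t \Big).
```

The drift term transports the distribution along the best-reply-controlled flow, while the second-order term accounts for the idiosyncratic noise.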

Relating the BRS to MFGs
The aim of this section is to demonstrate how the BRS is linked to MFGs and to begin demonstrating the extent of the simplification made by the BRS.

Mean field limit of the HJB equation
We will now analyse the limiting behaviour of the HJB equation (2.24). To this end we require that assumptions A, B, C, D, E and F all hold. In order to describe the limiting case, we need to consider a function W(t, x_i, X_{−i}).

(3.2)
See Appendix A for more information on differentiation with respect to measures.
Proof. Let us assume W satisfies (3.1); then we can define the corresponding identities for its derivatives. Using these identities along with assumptions A and B, it is clear that W(t, x_i, X_{−i}) is a solution to (2.24). Therefore, we no longer have to concern ourselves with studying the N equations of (2.24) for i = 1, . . . , N. Instead we can look at the behaviour of (3.1), as in [16]; in particular we can look at the limiting case of (3.1) as N → ∞. Note that since f, g and h depend on Y only through its empirical distribution, the solution W of (3.1) will also depend on Y only through its empirical distribution. Thus, there exists a function W : [0, T] × R^d × P(R^d) → R such that W(t, x, Y) = W(t, x, m_Y), where m_Y is the empirical distribution of Y. The partial derivatives of W, and hence of V_i, can also be seen as partial derivatives of W. We now have two remaining terms to consider. It is possible to modify Proposition 6.1 in [11] in order to rigorously express each of these terms through derivatives of the mean field limit W. Using this proposition, the following identities hold, in which ∂²_m W := D_y (δ/δm)(∂_m W); see [11] and Appendix A for further information on differentiability in the space of measures. Placing these identities into (3.4)–(3.5) and simplifying using the definition of m_Y, we are in a position to take the limit N → ∞. First we assume that m_Y ⇀* m as N → ∞.
In this case, it is a matter of computation to show that, for any φ ∈ C_c^∞(R^d), there exists an ε > 0 such that the corresponding convergence estimate holds. Hence, m_Y^{x_k} ⇀* m as N → ∞ as well. So, finally, we find the mean field equation for W(t, x, m) to be (3.12).

Mean field limit of the player dynamics
Now that we have seen the mean field limit of the HJB equation, this needs to be married to the mean field limit of the player dynamics (2.1) with the optimal control, given by (2.23).
Here, B t is a d-dimensional Wiener process, m t is the law of X t and X 0 is an independent random variable with law L(X 0 ) = m 0 , where m 0 is the limit as the number of players goes to infinity of the law of X i,0 in the finite-agent case.
Remark. In a similar way to (2.31), it is possible to go from (3.13) to the continuity equation (3.14) for m_t.
Proof. In light of the assumptions in Lemma 3.1, and following the previous section's definition of W(t, x, m), the dynamics of the N-player game can be reformulated as (3.15). Here, m^N_{−i}(t) denotes the empirical measure of (X_j(t))_{j≠i} and, for every i = 1, . . . , N, X_{i,0} is an i.i.d. random variable with distribution m_0. Using propagation of chaos, as in Section 2.1, and assumptions A–F, we can conclude the result.

The mean field equation
Now that we have, by (3.14) and (3.2), the mean field limits of the player dynamics and the HJB equation, we want to find how the HJB equation changes along characteristics governed by the mean field player dynamics. The following theorem describes this, where m_t satisfies (3.14). Then w(t, x) satisfies the following mean field PDE (3.17). This system, together with (3.14), then fully describes the mean field limit of the dynamics of the N-player stochastic differential game.
Remark. With this theorem, we have also described the system that corresponds to ε-Nash equilibria of the N-player game (2.1) (see e.g. [10] for how solutions to the mean field game correspond to Nash equilibria of the finite-player game).
Proof. Our first step is to describe ∂_t w(t, x). Adding and subtracting W(t + h, x, m_t), and following the method of (25) in [11], we get (3.20). Using (3.14) with the test function φ(y) := (δW/δm)(t + h, x, (1 − s)m_t + s m_{t+h})(y), we see that (3.20) leads to (3.21). We now divide by h, take the limit h → 0 and, noting ∂_m W := D_y (δW/δm), we get (3.22). We can therefore substitute this into (3.2) to obtain (3.17). So, being able to solve (3.14) and (3.17) gives a solution along characteristics for W. The following corollary summarises the result.
Corollary 3.4. Provided assumptions A–F and the assumptions in Lemma 3.1 hold, the mean field limit of the HJB equation governing the stochastic differential game (2.1), along with the evolution in time of the distribution of agents, is given by

(3.23)–(3.24)
with initial and terminal conditions:
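Under the symmetry assumptions A–C, the system (3.23)–(3.24) plausibly takes the standard backward-forward form of a mean field game; this is a hedged reconstruction consistent with the derivation above, not the paper's exact displays:

```latex
\begin{aligned}
-\partial_t w &= -\frac{|D_x w|^2}{2\alpha(t)} + h(x, m_t) + f(x, m_t)\cdot D_x w
  + \tfrac{1}{2}\,\mathrm{Tr}\big(\sigma\sigma^{\top}(t,x)\, D^2_x w\big), \\
\partial_t m_t &= -\,\nabla_x\cdot\Big( \big(f(x, m_t) - \tfrac{1}{\alpha(t)} D_x w\big)\, m_t \Big)
  + \tfrac{1}{2} \sum_{k,l=1}^{d} \partial_{x_k}\partial_{x_l}\big( (\sigma\sigma^{\top})_{kl}(t,x)\, m_t \big),
\end{aligned}
```

with terminal condition w(T, x) = g(x, m_T) for the backward equation and initial condition m_0 for the forward equation.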

The mean field BRS for the MFG system of equations
Having calculated the mean field equations in the previous section, we now use the MPC approach to obtain the mean field BRS from the MFG and find that it matches (2.31). Clearly, in terms of numerical calculations, it is much simpler to use the best reply strategy dynamics (2.31) as an approximation to (3.23) and (3.24) than to try to solve these two equations directly. This section will show that the BRS is indeed a simplification of (3.23) and (3.24), using MPC methods.
where X(s) solves (2.1) with controls u and X(t) = x. Then the HJB equation associated with V^{∆t}_i has a mean field limit, the solution of which can be approximated up to an error of order O(∆t) by (3.30). The corresponding law of motion for the distribution of players in the mean field limit is therefore given (up to an error of order O(∆t)) by (3.31).
Proof. First note that the HJB equation associated with V^{∆t}_i has a mean field limit w^{∆t}, as defined by Theorem 3.3, and it satisfies (3.31). Following a method similar to Method 2 in Section 2.2, we assume we have a solution for w and m at time t and we are interested in the solution at a time t + ∆t in the near future, for small ∆t. Setting T = t + ∆t and discretising (3.23) backwards in time we get, up to an error of order O(∆t), (3.32). Then, using the terminal condition for w^{∆t}, we get, up to an error of order O(∆t), (3.33). Substituting this into the mean field equation for m gives, up to an error of order O(∆t), (3.34).
Remark. It is interesting to note that the mean field dynamics found here are the same as those for the controlled dynamics in Subsection 2.2. Thus we can conclude that the best reply strategy for the mean field stochastic differential game can be derived either by first applying the MPC method to the N-player game and then taking the mean field limit, or by first taking the mean field limit of the N-player game and then applying the MPC method.

Applications
In this section we will take some examples from the MFG and BRS literature and use the paradigm of this paper to compare the two approaches.

Wealth distribution driven by local Nash equilibria
This example is taken from [17]. Their model describes the evolution of agents' wealth and economic configuration (noted in [17] as possibly covering a diverse number of attributes, from social status to education level, depending on the situation) in time as a response to trading between agents. The trading is assumed to depend on the difference in wealth between the two agents that want to trade, and has its origins in [9], as well as later work by [21]. In this model we take d = 2, since agents are described by their wealth and their economic configuration; in our framework we have X_i = (Y_i, Z_i). Here, Y_i is agent i's economic configuration and Z_i is their wealth. It is assumed that there is no debt in this model, hence Z_i > 0 for all i. These variables are governed by the following system of equations. Notice that the first equation is deterministic and cannot be explicitly controlled, whereas the second equation has a control u_i but no other deterministic movement; v describes the speed at which an agent's economic configuration evolves. Now, we introduce the following notation, similar to that at the beginning of Section 2.2: Y_{−i} = (Y_j)_{j≠i}, and similarly for Z, Z_{−i}. The value functional is given by (4.3). In [17] it is explained that φ is the trading interaction potential, i.e. it governs the amount of trading that occurs between any two agents based on their difference in wealth. It is also explained that ξ_{i,j}(Y(s)) Ψ(|Y_i(s) − Y_j(s)|) is the trading frequency, i.e. the rate at which trades or movements of wealth take place between two agents, determined by how far apart the agents' economic configurations are. Several assumptions are made on each of the functions in (4.3). First, we assume that the function φ : R → R is C² and even. Second, we assume ξ_{ij} = ξ_{ji} and that it depends on the number of other agents in a neighbourhood of the economic configurations of agents i and j.
As in [17], we assume ξ_{ij} has the following form. Since each agent can only control their Y_i variable, the HJB equation is modified to (4.5). This is generally an extremely difficult equation to solve, and so, although the optimal control is given through the solution of (4.5), the BRS is a much more tractable and realistic suggestion for how wealth may really be moved.
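A toy best-reply simulation in the spirit of this trading model can be sketched as follows. Every ingredient is an illustrative stand-in, not the choice made in [17]: trading potential φ(z) = z²/2, trading frequency Ψ(r) = exp(−r), uniform ξ_{ij} = 1/N, no noise, and the best reply applied directly to the wealth variable for simplicity.

```python
import numpy as np

def brs_trading_step(y, z, alpha, dt):
    """One best-reply update of the wealth variables.

    Illustrative stand-ins (ours, not the model of [17]):
    phi(z) = z^2/2 (so phi'(z) = z), Psi(r) = exp(-r), xi_ij = 1/N.
    The control is u_i = -(1/alpha) sum_j xi_ij Psi(|y_i - y_j|) phi'(z_i - z_j).
    """
    N = len(z)
    config_dist = np.abs(y[:, None] - y[None, :])   # |y_i - y_j|
    kernel = np.exp(-config_dist) / N               # xi_ij * Psi(|y_i - y_j|)
    dphi = z[:, None] - z[None, :]                  # phi'(z_i - z_j)
    u = -(kernel * dphi).sum(axis=1) / alpha        # best reply control
    return z + u * dt

rng = np.random.default_rng(1)
y = rng.uniform(0.0, 1.0, 200)      # economic configurations
z = rng.uniform(1.0, 2.0, 200)      # initial wealth, positive
total0, spread0 = z.sum(), z.std()
for _ in range(100):
    z = brs_trading_step(y, z, alpha=1.0, dt=0.01)
```

Because the kernel is symmetric and φ' is odd, the pairwise transfers are antisymmetric: total wealth is conserved while the quadratic potential contracts its spread, reflecting the conservative-economy setting of [17].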

The mean field limit
Under the framework of this paper, assumptions A and C are automatically fulfilled. It is also relatively straightforward to notice that the interaction depends on the other agents only through their empirical measure Σ_{j≠i} δ_{x_j} (suitably normalised), so, with a trivial modification that doesn't affect the convergence, assumption B is also satisfied. Similarly, we can ensure assumptions D, E and F are satisfied as long as we assume Ψ and v are both Lipschitz. So, under these relatively weak assumptions, we find that in the mean field limit, with individuals using the BRS, X_i(t) → X_t for every i = 1, . . . , N, whose distribution evolves according to the following Fokker-Planck equation (4.7). This is supplemented with various boundary conditions in [17] to close the PDE problem. Equation (4.7) is also an order O(∆t) approximation of the full mean field game, where w^{∆t} is the mean field limit of V^{∆t}_i, as described in Section 3.3.
It is clear that either solving (4.7) numerically, or analytically showing that solutions do exist is a much simpler problem than solving the full mean field equations related to the fully optimal solution.

Congestion and aversion in pedestrian crowds
This second example has been taken from [34]. In the paper, the authors begin with an overview of the different methods for modelling traffic and pedestrian dynamics, followed by a description of how mean field games may be used as a bridge from microscopic traffic models to macroscopic. The paper then continues by describing an MFG model of pedestrian traffic. This model is perfectly suited to adapting to a BRS approach, firstly because the cost function implemented by [34] can be adapted to the approach taken in this paper, and secondly because it is natural to imagine that individuals in a crowd don't optimise their own behaviour based on the long-term future behaviour of other individuals around them, as described by the complex MFG framework.
Rather, an assumption that individuals look at the flow around them at an almost instantaneous moment in time and change their behaviour accordingly seems to fit more naturally to our lived experience and is best described through the BRS framework.
The paper [34] considers two populations; the analysis begins by assuming the mean field limit has been taken and that, in this limit, the distribution of each group is absolutely continuous with respect to the Lebesgue measure. We can modify our analysis from Section 2.1 to accommodate these ideas. We begin by considering two populations of individuals, with distribution functions m_1(t, x) and m_2(t, x) respectively. The respective positions in space R^d of a representative particle of each population are given by Y(t) and Z(t). They move according to the SDEs (4.10)–(4.11). In (4.10)–(4.11), α and β are the controls of the two populations, σ is a diagonal positive d × d matrix, and B_1 and B_2 are independent d-dimensional Brownian motions. As in [34], we focus on the two populations interacting on some domain Ω ⊂ R². The cost functionals being optimised by the representative players are given by (4.12)–(4.13); for the second population, for example, the running cost is Φ(Z(s), m_2(s), m_1(s)) and the terminal cost is Ψ_2(Z(T)), with (Y(t), Z(t)) ∼ (m_1(t), m_2(t)). (4.14) Using the formulation of Φ in [34], it can be consistently defined to match the framework of this paper. Here, [34] describes λ as the 'xenophobia parameter': it measures how averse each group is to the other. If λ is high then the two groups will separate as much as possible, whereas if λ is low, the groups will be at least as concerned about the distance to individuals in their own group as to those in the other group. Equations (4.10)–(4.14) are formulated as a mean field game system in [34]. Under the paradigm of this paper, we consider that in each interval [t, t + ∆t] agents minimise the following cost with respect to a control random variable that is fixed in time.
Using the best reply strategy approach, we are able to simplify this system to the following two equations, which describe the evolution of the two populations.
∂_t m_1 = (1/2) Tr(σ² D² m_1) + D · ((D m_1 + λ D m_2 + (1/T) D Ψ_1) m_1) (4.23)

This section has clearly shown some of the potential benefits of using the BRS to replace MFGs in certain situations; exactly when this is appropriate requires further investigation. However, it is intuitive from the formulation of the BRS that, in situations where short time horizons are considered and agents are unable to optimise their behaviour efficiently, there is a case for using the BRS.
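For completeness, by the symmetry between the two populations, the companion equation for m_2 presumably mirrors (4.23) with the roles of the groups exchanged; this is a reconstruction by symmetry rather than a quotation:

```latex
\partial_t m_2 \;=\; \tfrac{1}{2}\,\mathrm{Tr}\big(\sigma^2 D^2 m_2\big)
 \;+\; D\cdot\Big( \big( D m_2 + \lambda\, D m_1 + \tfrac{1}{T}\, D\Psi_2 \big)\, m_2 \Big).
```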

Conclusion and future perspectives
In conclusion, we have shown that the BRS, a sub-optimal strategy for players in a stochastic differential game, can be derived from the optimal strategy as an asymptotic limit of a revised cost functional. The BRS is an important alternative strategy to the MFG to consider when the time horizon of the optimisation problem is small because it depends only on the running and terminal costs. As a result there is no HJB equation to solve, and since the HJB equation is often intractable the BRS offers a more tractable modelling approach and at reduced computational effort. We then showed how, under certain conditions, the BRS can produce a mean field limit as the number of players tends to infinity. To close our analysis, we proved that the mean field game equations first introduced by Lasry and Lions [37], which are the mean field limit of the stochastic differential game, can also be approximated by the BRS. We concluded that regardless of whether we approximate the MFG by the mean field BRS, or approximate the N -player stochastic differential game by the BRS first and then take the mean field limit, the resulting dynamics of the distribution of players is the same.
In the final section we were able to analyse two examples from existing literature. In the first, the BRS was already used as the dynamics for the mean field behaviour, so we can now justify this use by explaining that the agents involved in the behaviour approximately minimise a related cost.
In the second example, we show how a mean field game for congestion could be approximated by using the BRS. This simplified the behaviour considerably and could allow us to computationally model the behaviour more efficiently. We have a number of future directions.
Throughout the paper we have had to renormalise the optimisation problem to obtain the BRS as an approximation to a solution to the game. We have not claimed that the resulting BRS will now approximate the MFG solution for the original optimisation problem. In fact, one can imagine situations where the BRS will be qualitatively similar to the MFG and situations where they won't. We hope to explore more direct comparisons between MFG and BRS dynamics in future work.
With this distance defined, P p (E) is a metric space. Definitions A.1 and A.2 can be equivalently reformulated in terms of random variables as follows. We now let (Ω, F, P) be an atomless probability space.
Remark. Definitions A.1 and A.2 can be restated in a complementary form using random variables. For example, P_p(E) can be defined as the set of µ ∈ P(E) such that, for any y ∈ E and any random variable X with L(X) = µ, we have E[d(X, y)^p] < ∞. Similarly, W_p(µ, ν) can be defined as the infimum of E[d(X, Y)^p]^{1/p} over pairs of random variables X and Y with L(X) = µ and L(Y) = ν. The reason for defining P_p(E) and W_p in this alternative way is that it is often easier to work with random variables than with measures directly. It is important to note that, since E is a Polish space (complete and separable) and Ω is atomless, it is always possible to find such an X with law L(X) = µ. Now that we have put a metric space structure on the set of probability measures, the next step is to define differentiability of functions with respect to a measure. There are several overlapping, but not equivalent, definitions of differentiability in the space of measures. In this appendix we will discuss L-differentiability, as defined in [20], and the notion of a functional derivative described in [11]. There is also another definition of differentiability, defined in [5], which is in some sense more intrinsic; however, it is less useful to us here and so will not be discussed further. For the rest of this appendix, as in the main body of the paper, we restrict our analysis to P_2(R^d).
We will focus on functions u : P 2 (R d ) → R and consider their liftũ : L 2 ((Ω, F, P); R d ) → R defined byũ(X) = u(L(X)). As previously discussed, it is always possible to find such a random variable X given a measure µ.
Definition A.3. Let u andũ be as defined above. u is (continuously) L-differentiable at µ ∈ P 2 (R d ) if there exists a random variable X ∈ L 2 ((Ω, F, P); R d ) such thatũ is differentiable in the usual Fréchet sense at X (or continuously differentiable in an open neighbourhood of X in the case of continuously L-differentiable). Note that in this case, we consider Dũ(X) ∈ L 2 ((Ω, F, P); R d ) by associating this space with its dual.
It is not clear straight away that the above definition of differentiability is independent of the choice of X, however the next two propositions (to be found, along with proofs, in [20]) show that this is indeed the case and, under certain circumstances the derivative can be uniquely described by the measure µ.
The importance of these two propositions, as explained in [20], is as follows. Proposition A.4 means that differentiability with respect to a measure µ depends only on µ and not on the particular random variable chosen to represent it. Proposition A.5 states that if there is some further regularity in the differentiability, then not only is the L-derivative independent of the random variable, it is of the form ξ(X) for some deterministic function ξ which is uniquely defined almost everywhere. Due to this uniqueness property, we can then define the L-derivative of u as follows.
Definition A.6. Let u and ũ be as defined previously. Suppose u is continuously L-differentiable and ξ is as in Proposition A.5. Then the L-derivative of u at µ ∈ P_2(R^d), denoted ∂_m u(µ), is defined as the equivalence class of ξ in L²((R^d, µ); R^d). This is well defined since ξ is defined uniquely almost everywhere with respect to µ.
Note that since ∂ m u(µ) is an equivalence class of functions from R d to R d , it can be identified with a function ∂ m u(µ)(·) : R d → R d without ambiguity. We shall often consider ∂ m u(µ) in such a way without explicit reference to this form. As mentioned near the beginning of this appendix, the notion of the functional derivative of a function with respect to a measure will also be a widely used notion for us in this paper. We will now define what this notion is and link it to the previous definition of the L-derivative. The following definition is attributed to [11].
Definition A.7. Let u : P_2(R^d) → R. We call u a C¹ function if there exists some function δu/δm : P_2(R^d) × R^d → R such that for all µ, ν ∈ P_2(R^d) the following holds (A.1). Here, δu/δm(µ) is defined up to a constant, so the normalisation condition ∫_{R^d} (δu/δm)(µ)(y) µ(dy) = 0 is taken. The requirement (A.1) can easily be seen to be equivalent to the requirement (A.2), which is the one used in Section 3.3. These two notions of derivative have a simple relationship to each other, as explained in Propositions A.8 and A.9 below (see [11] and [20] respectively for the original statements and proofs).
Proposition A.8. Let u : P_2(R^d) → R be C¹. Assume further that the function (δu/δm)(µ)(·) : R^d → R is continuously differentiable for any µ ∈ P_2(R^d). Then u is L-differentiable and we have ∂_m u(µ)(y) = D_y (δu/δm)(µ)(y). Proposition A.9. Let u : P_2(R^d) → R be L-differentiable. Assume further that the Fréchet derivative of ũ is Lipschitz and that for all µ ∈ P_2(R^d) there is a representative ∂_m u(µ)(·) such that ∂_m u : P_2(R^d) × R^d → R^d is continuous. Then u is C¹ and satisfies (A.2). In Section 3, and in particular Section 3.3, we rely heavily on this functional derivative notion, and implicitly use Propositions A.8 and A.9 to interchange the definitions.
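A standard worked example (ours, not the paper's) illustrating both notions of derivative: for the linear functional given by integration against a smooth, bounded function φ,

```latex
u(\mu) = \int_{\mathbb{R}^d} \varphi(y)\,\mu(\mathrm{d}y), \qquad
\frac{\delta u}{\delta m}(\mu)(y) = \varphi(y) - \int_{\mathbb{R}^d} \varphi\,\mathrm{d}\mu, \qquad
\partial_m u(\mu)(y) = D\varphi(y),
```

where the subtracted constant enforces the normalisation convention of Definition A.7, and the last identity follows from the relation of Proposition A.8, since the constant vanishes under D_y.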