A Game-Theoretic Framework for Autonomous Vehicles Velocity Control: Bridging Microscopic Differential Games and Macroscopic Mean Field Games

A transportation system with fully autonomous vehicles can be modeled as a multi-agent system, where autonomous vehicles interact with one another through coupled optimal driving strategies. However, the existing literature on autonomous vehicle longitudinal control suffers from scalability issues. In other words, it is challenging to deploy these control algorithms to a large number of autonomous vehicles. This paper aims to tackle such a challenge by employing mean field approximation and deriving a macroscopic game-theoretic framework for autonomous vehicle traffic flow from microscopic velocity control. The developed game is a "mean field game (MFG)", which is essentially the limiting differential game with an infinite number of agents. It is a micro-macro model which allows one to define individuals on a microscopic level as rational utility-optimizing agents while translating rich microscopic behaviors to macroscopic models. Despite a few studies on the application of MFG to traffic flow models, this paper offers a systematic framework to apply MFG to autonomous vehicle control from four aspects: (i) We first derive the mean field game as a continuum version of the discrete differential game; (ii) We develop a solution algorithm based on a multigrid preconditioned Newton's method to solve for a mean field equilibrium; (iii) We construct a tuple of discrete controls from the continuous mean field equilibrium and demonstrate its accuracy as an $\epsilon$-Nash equilibrium of the original discrete differential game, so that those controls can be deployed to individual autonomous vehicles in the context of discrete games; (iv) The derived mean field game can also be treated as a macroscopic traffic flow model. We show its connections to the traditional LWR model and present some examples to illustrate traffic flow characteristics of the mean field game.


Motivation
To prepare autonomous vehicles (AVs) to drive on public roads, safe and efficient controller design for autonomous driving is the first priority. The most extensively studied AV controllers in platooning are Connected Adaptive Cruise Control (CACC), which requires connectivity between predecessors and followers as well as between platoon leaders and followers. CACC contains two control policies: constant spacing (CS) (Swaroop and Hedrick, 1996; Darbha and Rajagopal, 1999; Swaroop et al., 2001) and constant time headway (CTH) (Ioannou and Chien, 1993; Rajamani and Shladover, 2001; Van Arem et al., 2006; Naus et al., 2010; VanderWerf et al., 2001; Zhou et al., 2017; Arefizadeh and Talebpour, 2018; Stern et al., 2018).
These two policies can be formulated as a linear time invariant (LTI) system (Swaroop et al., 1994) with disturbances to the system and measurement dynamics (Zhou et al., 2017). AVs' longitudinal acceleration control can also be modeled using nonlinear car following models (CFMs). The most widely used CFMs for AVs are the Intelligent Driver Model (IDM) (Treiber et al., 2000; Kesting et al., 2008, 2010; Schakel et al., 2010; Naus et al., 2010; Ploeg et al., 2011; Talebpour and Mahmassani, 2016; Cui et al., 2017; Wu et al., 2017; Talebpour et al., 2018; Wu et al., 2018; Yang and Peng, 2010; Huang et al., 2017; Zhao et al., 2017, 2018) and the Optimal Velocity Model (OVM) and its variants with heterogeneous communication delay or dynamic uncertainty (Qin and Orosz, 2013; Jin and Orosz, 2014; Qin and Orosz, 2017). Unlike OVM, IDM takes safety into consideration and is thus collision-free. All the aforementioned studies aim to develop a string-stable car-following controller in order to smooth traffic flow and prevent stop-and-go waves. But none of them considers control and physical safety constraints (Gong and Du, 2018). In other words, interactions among vehicles are not explicitly modeled (Li et al., 2018b).
To address above challenges, another school of researchers treat a transportation system with a full penetration of AVs as a multi-agent system (MAS), wherein every AV interacts among one another through physical interactions in traffic. A majority of studies that capture the interactions of vehicles assume each vehicle solves a sequence of accelerations over a finite time horizon by optimizing a common or an individual objective function. Vehicles interact with one another through the common or individual cost function as well as safety constraints. Depending on the objective functional form, these models can be further divided into two classes: cooperative control and non-cooperative game.
Cooperative control has been widely studied in multi-robotic systems. In multi-robot interaction, robots interact with one another and choose optimal policies by predicting others' behavior. Neighboring robots' trajectories are treated as hard safety constraints or boundaries for robots' motion planning. Such modeling has been critical in multi-robot collision avoidance and human-robot interaction (Tomizuka, 2015, 2016). A cooperative AV system is a multi-vehicle system that can be controlled to stabilize traffic flow and smooth traffic jams (Wang et al., 2016; Gong et al., 2016; Gong and Du, 2018), to optimize driving comfort (Wang et al., 2014b; Zhou et al., 2017), and to improve fuel efficiency (Wang et al., 2014a; Yao et al., 2018). To reduce computational burdens, a distributed algorithm is usually designed and implemented on each vehicle (Wang et al., 2016; Gong et al., 2016; Gong and Du, 2018).
Compared to cooperative AV control, the non-cooperative interactions among AVs are relatively understudied. Game theory is a natural approach to model the non-cooperative strategic interactions among AVs, which are usually regarded as intelligent agents aiming to optimize individual objective functions. In the game-theoretic framework, cars are referred to as "agents" or "players". Wang et al. (2015) formulated AVs' discrete lane-change and continuous acceleration selections as a differential game, where agents' optimal strategies are obtained from solving optimal control problems. Talebpour et al. (2015) modeled lane-changing behavior as a two-person non-zero-sum non-cooperative game under (in)complete information. Langari (2012, 2013); Kim and Langari (2014); Yu et al. (2018) developed a Stackelberg game among multiple AVs in driving or merging and a mixed-motive game in lane-changing. Dreves and Gerdts (2018) modeled multiple AVs' acceleration and steering angle velocity selection at intersections with the goal of avoiding collisions. The human-cyber-physical systems (h-CPS) community extends multi-agent systems to hybrid settings where AVs interact with human drivers. For example, Sadigh et al. (2016) designed "local interactions" between an AV and a human driver to drive efficiently and maximize road capacity, while Lazar et al. (2018) generalized the model to several AVs and HVs. Li et al. (2018a) assumed that human drivers choose driving policies using hierarchical reasoning while AVs optimize car-following and lane-changing strategies based on a Stackelberg game. The outcomes of all the aforementioned game-theoretic models are equilibrium driving strategies. The computation of an equilibrium may become mathematically intractable when the number of coupled agents becomes large. To get around this, Wang et al. (2015) applied Model Predictive Control (MPC) instead of computing an equilibrium.
Dreves and Gerdts (2018) solved a generalized Nash equilibrium by summing up all vehicles' objective functions, which is essentially cooperative control. Lazar et al. (2018) assumed that AVs can directly solve an optimization based upon their predictions of human drivers' actions rather than humans' actual strategies. None of these studies investigated quantitatively how close the approximate solutions are to those of the original differential games. Moreover, because game-based control algorithms suffer from scalability issues, all the aforementioned studies had to constrain their applications to a limited number of AVs.
The above studies focus on AVs' longitudinal or lateral controls in discrete games, which suffer from scalability issues. Thus a scalable theory and algorithm applicable to a large number of coupled AV controllers is urgently needed. One feasible framework is to translate microscopic car-following behavior to macroscopic traffic flow models. Car following models (CFMs) treat each car as a discrete entity with a constant length, whose dynamic location and velocity are computed from an underlying coupled ordinary differential equation (ODE) system (Newell, 1961; Gipps, 1981; Bando et al., 1995; Brackstone and McDonald, 1999; Zhang, 1999; Zhang and Kim, 2005; Zheng et al., 2011; Chen et al., 2012; Orosz et al., 2010; Cui et al., 2017; Yang and Peng, 2010; Huang et al., 2017; Zhao et al., 2017, 2018; Tomizuka, 2015, 2016; Gong et al., 2016; Gong and Du, 2018). CFMs assume local interactions among vehicles and local information from neighboring vehicles. Modeling each agent requires tracking and keeping records of surrounding but changing agents. Because traffic flow is dynamic and volatile, the interacting agents and their topology change quickly. This is particularly infeasible in heavy traffic scenarios, as the proposed control may not be scalable for networked control. Nor is it easy to account for global traffic information obtained from vehicle connectivity. In contrast, macroscopic traffic flow models treat each car as a particle that does not occupy any space. Traffic flow is then described by a continuum velocity field and a density distribution over location and time, and is modeled by partial differential equations (PDEs) (Newell, 1961, 1993; Lighthill and Whitham, 1955; Zhang, 1998, 2002; Jin and Zhang, 2003; Daganzo, 2005; Lebacque, 2005).
The connection between CFMs and macroscopic traffic flow models has been studied using different formulations and continuum limits. Based on different coordinates and state variables, a macroscopic model can be transformed into different formulations that are consistent with specific CFMs. Daganzo (2006a,b) showed that Newell's CFM is a discrete form of the LWR model with the Greenshields fundamental diagram in Lagrangian coordinates. Along this line, Helbing (2009) studied some higher-order models and Laval and Leclercq (2013) studied other formulations of the LWR model. Both works showed consistency between certain CFMs and certain macroscopic models. Jin (2016) then established a more general framework bridging a family of CFMs and different formulations of macroscopic traffic flow models. In the continuum limit, a macroscopic model is the limit of a CFM as the number of cars tends to infinity, which can be shown in some cases using the theory of conservation laws (Colombo and Rossi, 2014; Rossi, 2014) or measure theory (Di Francesco and Rosini, 2015).

In this paper, we aim to derive a macroscopic game-theoretic model from AVs' microscopic longitudinal control using mean field approximation. The mean field game (MFG) is a game-theoretic model used to describe complex multi-agent dynamic systems (Lasry and Lions, 2007; Huang et al., 2006). It has become increasingly popular in designing new decision-making processes for finance (Guéant et al., 2011; Lachapelle et al., 2010), engineering (Djehiche et al., 2016; Couillet et al., 2012), social science (Degond et al., 2014) and pedestrian crowd modeling (Lachapelle and Wolfram, 2011; Burger et al., 2013). MFG is a micro-macro model which allows one to define individuals on a microscopic level as rational utility-optimizing agents while translating rich microscopic behaviors to macroscopic models. The basic idea is to exploit the "smoothing" effect of large numbers of interacting individuals.
Instead of solving a long list of highly coupled equations that depict the interactions among different players, MFG assumes that each player reacts only to a "mass" resulting from the aggregate effect of all the players. Such an approach is called mean field approximation and helps to simplify complex multi-agent dynamic systems on a macroscopic level.

Contributions of this paper
This paper contributes to the state-of-the-art of AV controller design by characterizing the interplay between discrete differential games and continuous mean field games. In particular, four contributions are elaborated below (as shown in Fig. 1).
1. A mean field game is derived as the limit of the differential game as the number of AVs tends to infinity. The mean field game is a coupled forward-backward partial differential equation (PDE) system which models AVs' non-cooperative velocity selections at a macroscopic scale. The existing research on applications of the mean field game in the transportation domain has worked solely with specific objective functions (Chevalier et al., 2015; Kachroo et al., 2016). In contrast, we systematically derive the forward continuity equation and the backward Hamilton-Jacobi-Bellman (HJB) equation for a family of more general objective functions using mean field approximation.
2. An equilibrium solution, termed the mean field equilibrium (MFE), is solved from the mean field game. AVs' optimal velocity control strategies are represented by the MFE at a macroscopic level. Existing algorithms for the MFE are mainly designed for a short planning horizon (Chevalier et al., 2015) or a special cost function (Lachapelle and Wolfram, 2011). In this paper we develop a new algorithm for the MFE that works with a longer planning horizon and more general cost functions. Our algorithm is based on finite differences and a multigrid preconditioned Newton's method.
3. A tuple of AVs' discrete controls is constructed from the discretization of a continuous MFE. We test different numbers of cars and different objective functions to illustrate the accuracy of the MFE-constructed controls as an $\epsilon$-Nash equilibrium of the original differential game. The results show a consistent trend: the continuous equilibrium solution provides a good approximation to AVs' non-cooperative individual controls when the number of AVs is large. This construction method addresses the scalability issue faced by much of the existing literature (Wang et al., 2015; Dreves and Gerdts, 2018; Lazar et al., 2018).
4. The proposed mean field game can also be treated as a macroscopic traffic flow model. It models AVs' aggregate behaviors, assuming AVs are predictive and rational agents. Along this line, we first rigorously establish connections between the mean field game and the traditional LWR model. Then we present some possible AV driving objective functions whose corresponding mean field games show interesting traffic patterns.

The remainder of the paper is organized as follows. Section 2 introduces the AVs' differential game as an extension of a single AV's optimal longitudinal control problem. In Section 3, the macroscopic MFG is derived from the AVs' differential game under some assumptions. In Section 4, we illustrate the connection between the MFG and the traditional LWR model in a general framework and present two MFG examples modeling AVs' kinetic energy, driving efficiency and safety. Section 5 is devoted to a new algorithm to solve the MFG numerically based on Newton's method. In Section 6, we construct a tuple of AVs' discrete controls from the continuous MFG solution and characterize their accuracy as an approximate equilibrium of the original differential game. Conclusions and future research directions follow in Section 7.

From Optimal Control to Differential Game
We have seen a growing interest in applying optimal control theory to model AVs' predictive driving strategies in car-following and lane-changing scenarios (Wang et al., 2014a,b;Gong et al., 2016;Zhou et al., 2017). In this section, we briefly introduce how to formulate a single AV's longitudinal control as an optimal control problem and then extend it to a differential game between multiple AVs.

Optimal Longitudinal Control for One Car
Assume that there are N AVs indexed by i ∈ {1, 2, . . . , N} driving in one direction on a closed highway of length L without any entrance or exit. Denote the i-th car's position by $x_i(t)$ and speed by $v_i(t)$. Fix a finite period of time [0, T] with T > 0; the motions of the cars on [0, T] are dictated by the following dynamical system:
$$ \dot{x}_i(t) = v_i(t), \quad x_i(0) = x_{i,0}, \quad i = 1, \ldots, N, $$
where $\dot{x}_i(t)$ is the shorthand notation of $\frac{dx_i(t)}{dt}$ and $x_{i,0}$ is the i-th car's initial position at the beginning time t = 0. We use the notation $v_{-i}(\cdot)$ and $x_{-i}(\cdot)$ for the speeds and positions of all cars other than the i-th. For any i = 1, 2, . . . , N, suppose the i-th car knows the other cars' speeds $v_{-i}(\cdot)$ and positions $x_{-i}(\cdot)$. To select an optimal driving velocity profile, the i-th car solves an optimal control problem over the planning horizon [0, T]. Define the i-th car's driving cost functional as:
$$ J_i^N\big(v_i(\cdot), v_{-i}(\cdot)\big) = \int_0^T f_i^N\big(v_i(t), x_i(t), x_{-i}(t)\big)\, dt + V_T\big(x_i(T)\big), \qquad (2.4) $$
where $\int_0^T f_i^N(\cdot)\, dt$ is the running cost over the entire planning horizon; $f_i^N(\cdot)$ is the cost function that quantifies driving objectives such as efficiency and safety (here we assume it is strictly convex with respect to $v_i$; the rationale is explained in the next section); and $V_T(x_i(T))$ is the terminal cost representing the i-th car's preference on the final position at time T.
We assume all cars have the same free flow speed, denoted by $u_{max}$. It is natural to require the i-th car's velocity to be nonnegative and not to exceed $u_{max}$. Mathematically,
$$ \mathcal{V} = \big\{ v(\cdot) : 0 \le v(t) \le u_{max}, \ \forall t \in [0, T] \big\} $$
is the admissible set of the i-th car's velocity selections. The i-th car tries to obtain an optimal velocity control $v_i^*(v_{-i}(\cdot), t)$ on the planning horizon [0, T] such that:
$$ v_i^*\big(v_{-i}(\cdot), t\big) = \operatorname*{argmin}_{v_i(\cdot) \in \mathcal{V}} J_i^N\big(v_i(\cdot), v_{-i}(\cdot)\big). $$
The optimal control $v_i^*(v_{-i}(\cdot), t)$ depends on the other cars' velocities $v_{-i}(\cdot)$ through their trajectories $x_{-i}(\cdot)$. We will use the notation $v_i^*(t)$ for simplicity. When one car selects its own driving velocity over the planning horizon while everybody else does so simultaneously, a non-cooperative differential game forms.
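To make the one-car problem concrete, it can be solved numerically by backward dynamic programming on a space-time grid. The sketch below is purely illustrative: the running cost f(v) = ½(v − v_des)² (penalizing deviation from a hypothetical desired speed v_des), the terminal cost V_T(x) = −x (rewarding progress), and all parameter values are assumptions for demonstration, not the paper's calibrated objectives, and the other cars are ignored here.

```python
import numpy as np

u_max, T, L = 1.0, 1.0, 2.0
nt, nx = 50, 201
dt = T / nt
xs = np.linspace(0.0, L, nx)               # position grid
vs = np.linspace(0.0, u_max, 21)           # admissible speeds in [0, u_max]
v_des = 0.7                                 # hypothetical desired speed

def f(v):                                   # illustrative running cost
    return 0.5 * (v - v_des) ** 2

V = -xs.copy()                              # terminal cost V_T(x) = -x
policy = np.zeros((nt, nx))
for n in range(nt - 1, -1, -1):             # backward sweep in time
    # cost-to-go of each candidate speed: f(v)*dt + V(x + v*dt)
    cand = np.array([f(v) * dt + np.interp(xs + v * dt, xs, V) for v in vs])
    policy[n] = vs[cand.argmin(axis=0)]
    V = cand.min(axis=0)

# roll the optimal trajectory forward from x0 = 0
x = 0.0
for n in range(nt):
    x += np.interp(x, xs, policy[n]) * dt
```

With these choices the terminal reward dominates the running cost, so the optimal policy saturates at u_max along the trajectory and the car travels approximately u_max · T.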

N -Car Differential Game
Differential games can be regarded as an extension of non-cooperative Nash games to dynamical systems. In a differential game, a finite number of players solve their individual optimal control problems, and those problems are coupled through the dependency of each player's cost functional on the others' states and controls (Basar and Olsder, 1999). Along this line, we extend the one-car optimal control problem in Section 2.1 to an N-car differential game for AVs: N AVs indexed by i ∈ {1, 2, . . . , N} are driving in one direction on a closed highway without any entrance or exit, with initial positions $x_{1,0}, \ldots, x_{N,0}$. Each car aims to select its optimal velocity control by minimizing its driving cost functional defined in Eq. (2.4) over the predefined planning horizon [0, T]. A Nash equilibrium of the game is a tuple of controls $v_1^*(t), \ldots, v_N^*(t)$ such that no car can improve its driving cost by unilaterally switching its velocity control. It is generally difficult to solve for an equilibrium when N is large, because it involves solving N coupled optimal control problems (Cardaliaguet, 2010). The goal of this paper is to develop a scalable framework to solve approximate equilibria for a family of N-car differential games by resorting to mean field approximation.
The underlying rationale of the developed methodology is articulated as follows:
1. Rather than solving the N-car differential game directly, we turn to its limit as the number of cars N → ∞, i.e., a mean field game (Section 3).
2. A numerical algorithm is developed to solve the corresponding mean field game (Section 5).
3. The equilibrium solution of the mean field game is used to construct a tuple of discrete controls, and those controls are verified by numerical examples to be an $\epsilon$-Nash equilibrium of the original N-car differential game with small $\epsilon$ (Section 6).

From Differential Game to Mean Field Game
When the number of cars N → ∞, one goes from the N-car differential game to a mean field game (MFG). The MFG is essentially a differential game with an infinite number of agents, so that the interactions between any two individuals are negligible. Instead, any individual reacts only to the "mass" of all agents. The "mass" then evolves with the aggregate behavior of all agents' motions. Two partial differential equations are developed to describe the MFG: 1. A backward Hamilton-Jacobi-Bellman (HJB) equation: a generic car's speed selection is formulated as an optimal control problem where the generic car computes its driving cost associated with a cost function based on its prediction of the evolution of the total "mass". The HJB equation is then derived from the optimal control problem. The solution of the HJB equation provides optimal costs and optimal velocity control strategies for all cars. The HJB equation is solved backward in time from t = T to t = 0.

2. A forward continuity equation: it is derived from the conservation of cars. The solution of the continuity equation describes the "mass" evolution arising from all cars' motions. The continuity equation is solved forward in time from t = 0 to t = T.
The MFG is a coupled system of the forward continuity equation and the backward HJB equation. At the mean field equilibrium, the total "mass" evolution coincides with every car's prediction. Fig. 2 shows a simple example of four cars to provide an intuitive explanation of these two equations. Each car is a rational agent who aims to minimize a driving cost defined in Eq. (2.4), leading to a system of four coupled optimal control problems, one for each car. As N grows large, the HJB equation can be derived from these coupled problems and the continuity equation can be derived from all cars' motions.
In this section, we will formally derive the HJB and continuity equations from the N -car differential game using mean field approximation.

Mean Field Limit
The general idea of moving from the microscopic N-car differential game to a macroscopic MFG is to take a mean field limit by letting the number of cars in the system go to infinity. To take this limit, we first make two homogeneity assumptions: (A1) All cars are indistinguishable.
(A2) All cars have the same form of cost function.
It should be mentioned that the above assumptions may be relaxed. For example, (A2) can be relaxed if multi-class traffic is modeled. In this paper we mainly focus on single-class AVs and leave multi-class models for future work.
Provided that the N-car differential game satisfies the above assumptions, we derive a MFG in four steps:
1. We reformulate the driving cost functional defined in Eq. (2.4) by introducing a smooth density (Section 3.1.1);
2. We extend an individual car's optimal control problem to that of a generic car by taking the mean field limit as N → ∞ (Section 3.1.2);
3. We derive a set of HJB equations from the generic car's optimal control problem (Section 3.1.3);
4. We obtain an evolution equation and show that it is exactly the continuity equation widely used in macroscopic traffic flow models (Section 3.1.4).

Step 1: Traffic Density Smoothing
Traffic density is a crucial quantity to manifest the macroscopic aspect of traffic flows. Assumption (A1) enables us to replace states of individual cars in the driving cost functional defined in Eq. (2.4) by an aggregate traffic density.
More precisely, following Cardaliaguet (2010), we can replace the other cars' positions $x_{-i}(t)$ in the cost by the empirical density
$$ \rho^N(x, t) = \frac{1}{N} \sum_{j=1}^N \delta\big(x - x_j(t)\big), $$
where δ(·) is the Dirac mass. However, $\rho^N$ is not a smooth function, leading to non-smoothness of the new driving cost function. To resolve this issue, we first approximate $\rho^N$ using a smooth kernel.
Suppose ξ(x) is a smoothing kernel which is smooth, nonnegative and satisfies $\int_{\mathbb{R}} \xi(x)\, dx = 1$. We take a smoothing parameter σ > 0 and define the scaled kernel $\xi_\sigma(x) = \frac{1}{\sigma} \xi\big(\frac{x}{\sigma}\big)$. The physical meaning of using the scaled kernel $\xi_\sigma(x)$ is that the i-th car contributes to the density in a "window" $[x_i(t) - \sigma, x_i(t) + \sigma]$ rather than only at the point $x_i(t)$ (i = 1, . . . , N), so that the density changes smoothly with location x. The smooth density distribution is defined as:
$$ \rho^N_\sigma(x, t) = \frac{1}{N} \sum_{j=1}^N \xi_\sigma\big(x - x_j(t)\big). $$
With a smooth density, the cost function for the i-th car is rewritten as:
$$ f_i^N = f_i\big(v_i(t), \rho^N_\sigma(x_i(t), t)\big), $$
where $f_i(\cdot, \cdot)$ is a bivariate function of speed and density, i = 1, 2, . . . , N.
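As a concrete illustration of the smoothing step, the sketch below evaluates $\rho^N_\sigma$ on a grid using a Gaussian kernel (one admissible choice of ξ; the car positions are illustrative, and the 1/N normalization makes the total mass equal to one):

```python
import numpy as np

def smooth_density(positions, grid, sigma):
    """rho_sigma^N(x) = (1/N) * sum_j xi_sigma(x - x_j), with xi the standard Gaussian pdf."""
    diffs = grid[None, :] - positions[:, None]                      # shape (N, M)
    kernel = np.exp(-0.5 * (diffs / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return kernel.mean(axis=0)                                      # average over the N cars

positions = np.array([1.0, 2.0, 2.5, 4.0])    # car positions x_j(t), illustrative
grid = np.linspace(-5.0, 10.0, 3001)
rho = smooth_density(positions, grid, sigma=0.3)
mass = np.trapz(rho, grid)                     # total mass is 1 by construction
```

Each Dirac mass becomes a smooth bump of width σ; shrinking σ recovers the empirical density in the limit.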
By assumption (A2), we have $f_1 = f_2 = \cdots = f_N = f$, where $f(u, \rho)$ is a cost function shared by all cars and strictly convex with respect to u. In summary, the i-th car's driving cost becomes:
$$ J_i^N\big(v_i(\cdot), v_{-i}(\cdot)\big) = \int_0^T f\big(v_i(t), \rho^N_\sigma(x_i(t), t)\big)\, dt + V_T\big(x_i(T)\big). \qquad (3.4) $$

Definition 3.1.
1. N-car mean field type differential game [DG]: N AVs indexed by i ∈ {1, 2, . . . , N} are driving in one direction on a closed highway of length L without any entrance or exit, with initial positions $x_{1,0}, \ldots, x_{N,0}$. Each car aims to select its optimal velocity control by minimizing its driving cost functional defined in Eq. (3.4) over the predefined planning horizon [0, T].
2. N-car mean field type differential game equilibrium [DGE]: A Nash equilibrium of the N-car mean field type differential game is a tuple of controls $v_1^*(t), v_2^*(t), \ldots, v_N^*(t)$ satisfying
$$ J_i^N\big(v_i^*(\cdot), v_{-i}^*(\cdot)\big) \le J_i^N\big(v_i(\cdot), v_{-i}^*(\cdot)\big), \quad \forall \text{ admissible } v_i(\cdot), \ i = 1, \ldots, N. $$
At equilibrium, no car can improve its driving cost by unilaterally switching its velocity control.
We see from Eq. (3.4) that each car only responds to and contributes to the density distribution $\rho^N_\sigma$ of all cars through its driving cost. This property allows us to take the mean field limit of the game as N tends to infinity.

Step 2: Optimal Control for a Generic Car
We take the mean field limit in the following way: fix the ratio L/N, let N → ∞ and σ/L → 0. Intuitively, this means we fix the average space headway and shrink the "window" so that in the limiting case one car only sees a local density. In the limit, using mean field approximation we replace $\rho^N_\sigma(x, t)$, which is computed from N cars, by a continuum density distribution ρ(x, t). Since all cars are anonymous, we can drop the index i and consider a generic car starting from $x_0$ at t = 0. Denoting this car's speed control by v(t) and trajectory by x(t) for t ∈ [0, T], we rewrite Eq. (3.4) as
$$ J\big(v(\cdot)\big) = \int_0^T f\big(v(t), \rho(x(t), t)\big)\, dt + V_T\big(x(T)\big), \qquad (3.6) $$
where its dynamics are described by
$$ \dot{x}(t) = v(t), \quad x(0) = x_0, \qquad (3.7) $$
and its velocity control v(·) is constrained by
$$ 0 \le v(t) \le u_{max}, \quad \forall t \in [0, T]. \qquad (3.8) $$

Step 3: HJB Equation
The generic car solves the optimal control problem Eqs. (3.6)(3.7)(3.8) to obtain its optimal velocity control v(t). However, the generic car can start from any initial position $x_0$. Rather than solving an infinite number of optimal control problems, one for every initial position, we generalize the problem by defining V(x, t) to be the optimal cost for a generic car starting from location x at time t:
$$ V(x, t) = \min_{v(\cdot)} \int_t^T f\big(v(s), \rho(x(s), s)\big)\, ds + V_T\big(x(T)\big), \qquad (3.9a) $$
$$ \text{subject to } \dot{x}(s) = v(s), \quad x(t) = x, \quad 0 \le v(s) \le u_{max}, \qquad (3.9b) $$
and u(x, t) to be the optimal velocity field. We solve V(x, t) and u(x, t) for all x and t; the optimal cost of the original problem Eqs. (3.6)(3.7)(3.8) is then given by $V(x_0, 0)$. A set of HJB equations is derived from Eqs. (3.9a)(3.9b) for V(x, t) and u(x, t). The derivation can be given using dynamic programming or Pontryagin's minimum principle, both of which end up with the same HJB equations. We give a derivation based on dynamic programming; such an approach is widely used in optimal control theory (Bardi and Capuzzo-Dolcetta, 2008).
Suppose a generic car starts from position x at time t. Considering a small time step ∆t, we can divide the driving cost in Eq. (3.9a) into two parts, on [t, t + ∆t] and on [t + ∆t, T]:
$$ V(x, t) = \min_{v(\cdot)} \left\{ \int_t^{t+\Delta t} f\big(v(s), \rho(x(s), s)\big)\, ds + \int_{t+\Delta t}^T f\big(v(s), \rho(x(s), s)\big)\, ds + V_T\big(x(T)\big) \right\}. \qquad (3.10) $$
Correspondingly, the generic car's decision process is also divided into two stages. First it selects a velocity v(t) = α ∈ [0, u_max] on the horizon [t, t + ∆t]. Then it moves to x + α∆t at time t + ∆t and selects its velocity profile on the rest of the planning horizon [t + ∆t, T].
The running cost on [t, t + ∆t] is approximated by
$$ \int_t^{t+\Delta t} f\big(v(s), \rho(x(s), s)\big)\, ds = f\big(\alpha, \rho(x, t)\big)\Delta t + O(\Delta t^2). \qquad (3.11) $$
Note that from the new position x + α∆t, the optimal cost the car can obtain on [t + ∆t, T] is V(x + α∆t, t + ∆t). By the dynamic programming principle we have:
$$ V(x, t) = \min_{0 \le \alpha \le u_{max}} \big\{ f(\alpha, \rho(x, t))\Delta t + V(x + \alpha\Delta t, t + \Delta t) \big\} + O(\Delta t^2). \qquad (3.12) $$
Take the first-order Taylor expansion of V(x + α∆t, t + ∆t) near (x, t), and denote by $V_t$ and $V_x$ the partial derivatives $\frac{\partial V}{\partial t}$ and $\frac{\partial V}{\partial x}$. Eq. (3.12) yields:
$$ V(x, t) = \min_{0 \le \alpha \le u_{max}} \big\{ f(\alpha, \rho(x, t))\Delta t + V(x, t) + \alpha V_x \Delta t + V_t \Delta t \big\} + O(\Delta t^2). \qquad (3.13) $$
Eliminating V(x, t) from both sides, dividing both sides by ∆t and letting ∆t → 0, we get:
$$ V_t + \min_{0 \le \alpha \le u_{max}} \big\{ f(\alpha, \rho) + \alpha V_x \big\} = 0. \qquad (3.14) $$
Note that f(u, ρ) is assumed to be strictly convex with respect to u; we introduce:
$$ f^*(p, \rho) = \min_{0 \le u \le u_{max}} \big\{ p\,u + f(u, \rho) \big\}, \qquad (3.15) $$
so that $-f^*(-p, \rho)$ is the Legendre transformation of f(u, ρ) with respect to u. Then Eq. (3.14) can be written as
$$ V_t + f^*(V_x, \rho) = 0. \qquad (3.16) $$
The strict convexity of f(u, ρ) with respect to u yields the uniqueness of the minimizer in Eq. (3.15), which is given by $f^*_p(p, \rho)$ for any p ∈ ℝ; here $f^*_p$ denotes $f^*$'s derivative with respect to p. That is why we assume $f_i$ (i = 1, . . . , N) and f in the game are strictly convex with respect to u. As a result, the optimal velocity field u(x, t) is given by
$$ u(x, t) = f^*_p\big(V_x(x, t), \rho(x, t)\big), $$
and $V(x, T) = V_T(x)$ gives the terminal condition. Summarizing the above derivation, given the density distribution ρ(x, t), Eqs. (3.9a)(3.9b) lead to the following HJB equations:
$$ V_t + f^*(V_x, \rho) = 0, \quad V(x, T) = V_T(x), \qquad (3.18a) $$
$$ u(x, t) = f^*_p\big(V_x(x, t), \rho(x, t)\big). \qquad (3.18b) $$
V(x, t) and u(x, t) are solved backward from the HJB equations.
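The transform f* and the recovery of the minimizer via f*_p can be checked numerically. The sketch below adopts one consistent convention, f*(p, ρ) = min over u in [0, u_max] of {pu + f(u, ρ)}, under which the minimizer equals f*_p by the envelope theorem, together with a purely illustrative cost f(u, ρ) = ½u² + ρu (an assumption chosen only for its simple closed form):

```python
import numpy as np

u_max = 2.0
us = np.linspace(0.0, u_max, 4001)          # grid over the admissible speeds [0, u_max]

def f(u, rho):
    # illustrative strictly convex cost (assumption): congestion raises the price of speed
    return 0.5 * u**2 + rho * u

def f_star(p, rho):
    # f*(p, rho) = min_{0 <= u <= u_max} { p*u + f(u, rho) }, evaluated on the speed grid
    return np.min(p * us + f(us, rho))

def f_star_p(p, rho):
    # the unique minimizer; equals the derivative of f* in p by the envelope theorem
    return us[np.argmin(p * us + f(us, rho))]

p, rho = -1.5, 0.25
u_star = f_star_p(p, rho)                    # closed form here: clip(-(p + rho), 0, u_max) = 1.25
fd = (f_star(p + 1e-3, rho) - f_star(p - 1e-3, rho)) / 2e-3   # finite-difference df*/dp
```

The finite-difference derivative of f* matches the minimizer, which is exactly the identity the HJB step uses to read the optimal velocity field off the value function.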

Step 4: Continuity Equation
When all cars follow the optimal velocity control, the aggregate density distribution ρ(x, t) evolves according to the optimal velocity field u(x, t) obtained from the HJB equations. An evolution equation can be derived from the conservation of cars:
$$ \rho_t + (\rho u)_x = 0, \qquad (3.19) $$
describing the evolution of the density ρ(x, t) from some initial density distribution $\rho(x, 0) = \rho_0(x)$. Eq. (3.19) is exactly the continuity equation (CE) widely used in traffic flow models (Orosz et al., 2010). With the velocity field u(x, t) known, Eq. (3.19) is solved forward.
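Numerically, given a nonnegative velocity field, the continuity equation can be marched forward with a conservative upwind scheme; on a ring road with periodic boundaries the scheme preserves total mass exactly. A minimal sketch (the constant velocity field is an illustrative assumption):

```python
import numpy as np

L_road, nx, nt = 1.0, 200, 400
dx = L_road / nx
dt = 0.5 * dx                               # CFL: u_max * dt / dx = 0.5 <= 1, stable
x = (np.arange(nx) + 0.5) * dx              # cell centers

rho = 0.3 + 0.1 * np.sin(2 * np.pi * x)     # initial density rho_0(x)
u = np.full(nx, 1.0)                        # velocity field (constant, purely illustrative)

mass0 = rho.sum() * dx
for _ in range(nt):
    flux = rho * u                          # q = rho * u; upwind is valid since u >= 0
    rho = rho - (dt / dx) * (flux - np.roll(flux, 1))   # periodic (ring road) via roll
mass1 = rho.sum() * dx                      # conserved up to floating-point error
```

Because each cell's outflow is another cell's inflow, the discrete total mass is conserved step by step, mirroring the conservation of cars behind Eq. (3.19).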
Up to now we have derived both HJB and continuity equations for a MFG. Its initial and terminal conditions are provided by the initial density ρ 0 (x) and the terminal cost V T (x), respectively. The choice of boundary conditions depends on the traffic scenario. For example, when cars drive on a ring road, the periodic boundary conditions are specified as: ρ(0, t) = ρ(L, t), V (0, t) = V (L, t).

Mean Field Game System
Summarizing Eqs. (3.19), (3.18a) and (3.18b), when the HJB and continuity equations are coupled, we have the following MFG system with the cost function f(u, ρ):
$$ \text{[MFG]} \quad \begin{cases} \rho_t + (\rho u)_x = 0, & \rho(x, 0) = \rho_0(x), \qquad (3.20a) \\ V_t + f^*(V_x, \rho) = 0, & V(x, T) = V_T(x), \qquad (3.20b) \\ u = f^*_p(V_x, \rho). & \qquad (3.20c) \end{cases} $$
Denote the solution of the system by ρ*(x, t) and u*(x, t). The optimal velocity field u*(x, t) is our primary focus and will thus be referred to as the mean field equilibrium (MFE) in the subsequent analysis. Remark 3.1.
1. MFGs with some special cost functions are shown to have a unique MFE (Cardaliaguet, 2010, 2015). However, the existence and uniqueness of solutions to a general MFG system remain open.
2. The MFG system derived here is usually called a first-order or non-viscous MFG system, because we assume no stochasticity in cars' dynamics. Accordingly, [MFG] is a first-order PDE system. For the theory of first-order MFGs, see Cardaliaguet (2010, 2015).
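The forward-backward structure of the coupled system suggests a simple fixed-point (Picard) iteration: sweep the HJB equation backward given a guessed density, then sweep the continuity equation forward given the resulting velocity field, and repeat with damping. The sketch below uses this naive iteration with an illustrative efficiency-type cost f(u, ρ) = ½u² − u(1 − ρ) (an assumption); it is not the multigrid preconditioned Newton's method of Section 5, and convergence of plain Picard iteration is not guaranteed in general.

```python
import numpy as np

L_ring, T, u_max = 1.0, 0.5, 1.0
nx, nt = 100, 50
dx, dt = L_ring / nx, T / nt
x = (np.arange(nx) + 0.5) * dx
alphas = np.linspace(0.0, u_max, 21)             # candidate speeds for the HJB minimization

def f(u, rho):
    # illustrative cost (assumption): quadratic effort minus an efficiency reward
    return 0.5 * u**2 - u * (1.0 - rho)

rho0 = 0.3 + 0.1 * np.sin(2 * np.pi * x / L_ring)
rho = np.tile(rho0, (nt + 1, 1))                 # initial guess: density frozen in time

for _ in range(30):                              # naive fixed-point (Picard) loop
    # --- backward HJB sweep: V(x, T) = 0, semi-Lagrangian update ---
    V = np.zeros(nx)
    u = np.zeros((nt, nx))
    for n in range(nt - 1, -1, -1):
        cand = np.array([f(a, rho[n]) * dt +
                         np.interp(x + a * dt, x, V, period=L_ring)
                         for a in alphas])
        u[n] = alphas[cand.argmin(axis=0)]
        V = cand.min(axis=0)
    # --- forward continuity sweep with the new velocity field (conservative upwind) ---
    new_rho = np.empty_like(rho)
    new_rho[0] = rho0
    for n in range(nt):
        flux = new_rho[n] * u[n]
        new_rho[n + 1] = new_rho[n] - (dt / dx) * (flux - np.roll(flux, 1))
    rho = 0.5 * rho + 0.5 * new_rho              # damping stabilizes the iteration

mass = rho[-1].sum() * dx                        # total mass should stay at its initial value
```

Each pass couples the two equations exactly as in [MFG]: the backward sweep plays the role of (3.20b)(3.20c), the forward sweep of (3.20a).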

Mean Field Game in Traffic Flow
MFG shares the same continuity equation with traditional traffic flow models but models cars' reactions to traffic congestion in a different way. Traditional traffic flow models prescribe a relationship between traffic density and velocity or acceleration, while MFG models cars' speed selection as an optimal control problem with a prescribed cost function. Based upon such an understanding, MFG can be seen as a macroscopic traffic flow model describing AVs' predictive and rational driving behaviors. In this section we first establish connections between MFG and the traditional LWR model and then present examples of MFGs, choosing appropriate cost functions to quantify AVs' driving objectives.

Connections between MFG and LWR
The Lighthill-Whitham-Richards (LWR) model (Lighthill and Whitham, 1955; Richards, 1956) is a representative of traditional traffic flow models, so it is helpful to establish the connection between MFG and LWR. Kachroo et al. (2016) and Chevalier et al. (2015) have revealed such connections, focusing on a specific class of LWR with the Greenshields fundamental diagram. Kachroo et al. (2016) showed a cost function whose corresponding MFG takes the Greenshields LWR as a solution. Chevalier et al. (2015) claimed that the Greenshields LWR is essentially a MFG with a specific cost function when drivers minimize their driving costs by selecting their driving speeds myopically.
In the subsequent analysis we will establish connections between MFG and LWR from two perspectives, as shown in Fig. 3. (i) We coin a cost function for an arbitrary fundamental diagram and show that the LWR solution solves the corresponding MFG; (ii) the LWR model can also be seen as a myopic MFG obtained by letting the length of the planning horizon tend to zero, for a general family of cost functions.

LWR as a solution to MFG
Let us choose an arbitrary equilibrium speed function U(ρ). The corresponding LWR model is:

ρ_t + (ρu)_x = 0, (4.1a)
u = U(ρ). (4.1b)

Now we directly set the driving objective to be maintaining the LWR speed. There are infinitely many cost functions that encode this objective. Here we deliberately choose the quadratic tracking cost

f(u, ρ) = (1/2)(u − U(ρ))², (4.2)

because Eq. (4.2) relates to another cost function modeling driving efficiency and safety, which will be shown later. Eq. (4.2) corresponds to the following MFG system [MFG-LWR]:

ρ_t + (ρu)_x = 0, (4.3a)
V_t + uV_x + (1/2)(u − U(ρ))² = 0, V(x, T) = V_T(x) = 0, (4.3b)
u = U(ρ) − V_x. (4.3c)

Theorem 4.1. The solution of [LWR] is a solution of [MFG-LWR].

Proof. Denote by ρ*(x, t) and u*(x, t) the solution of [LWR]. Since Eq. (4.1a) is the same as Eq. (4.3a), it suffices to show that ρ* and u* satisfy the HJB equations (4.3b)(4.3c) for some V*. Take V* ≡ 0; then the terminal condition V*(x, T) = V_T(x) = 0 is satisfied, and Eqs. (4.3b)(4.3c) reduce to the single equation u* = U(ρ*), which holds by Eq. (4.1b). Hence ρ*, u* and V* ≡ 0 form a solution of [MFG-LWR].
Theorem 4.1 will be verified with the Greenshields equilibrium speed later in the numerical example.

LWR as a myopic MFG
To demonstrate another connection between MFG and LWR, we consider a general cost function f (u, ρ) and its corresponding MFG system [MFG].
Given the planning horizon [0, T], a generic car selects its optimal velocity control to minimize the driving cost functional defined in Eq. (3.6). If the generic car is myopic and is not concerned with the future, intuitively it will select the speed u that minimizes the instantaneous cost, i.e., u = argmin_{0≤α≤u_max} f(α, ρ) at any time t. This leads to an LWR model with the equilibrium speed

U(ρ) = f*_p(0, ρ), (4.5)

where the function f*(p, ρ) is defined in Eq. (3.15).
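The myopic selection rule above can be checked numerically. The Python sketch below assumes an illustrative convex running cost f(u, ρ) = (1/2)(u/u_max)² − u/u_max + uρ/(u_max ρ_jam) (an assumption for illustration, not necessarily the paper's exact cost) and recovers a Greenshields-type equilibrium speed by brute-force minimization:

```python
import numpy as np

u_max, rho_jam = 1.0, 1.0

def f(u, rho):
    # hypothetical convex running cost (kinetic energy - efficiency + congestion
    # coupling); an assumption for illustration only
    return 0.5 * (u / u_max) ** 2 - u / u_max + u * rho / (u_max * rho_jam)

def myopic_speed(rho, n=100001):
    # U(rho) = argmin over 0 <= alpha <= u_max of f(alpha, rho), by grid search
    alpha = np.linspace(0.0, u_max, n)
    return alpha[np.argmin(f(alpha, rho))]

for rho in (0.0, 0.3, 0.7, 1.0):
    # the myopic speed matches the Greenshields form u_max * (1 - rho/rho_jam)
    print(rho, myopic_speed(rho), u_max * (1.0 - rho / rho_jam))
```

For this particular cost the instantaneous minimizer coincides with the Greenshields speed, which is consistent with the myopic interpretation of LWR developed in this subsection.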
To give a rigorous description of the myopic behavior, we take the limit T → 0, where T is the length of the planning horizon. As the effect of future prediction becomes smaller, the car behaves more myopically. It is expected that the solution of the MFG converges to the solution of the LWR model with the equilibrium speed function U(ρ) defined in Eq. (4.5).

Theorem 4.2. Denote by (ρ^(T), u^(T), V^(T)) the solution of [MFG] with planning horizon T. Then for every x ∈ [0, L], u^(T)(x, 0) → U(ρ_0(x)) as T → 0, where U(ρ) is defined in Eq. (4.5). (4.6)

Proof. There exists a constant M > 0 bounding the running cost along the optimal trajectories. Since V(x, T) = 0 for all 0 ≤ x ≤ L, we get |V^(T)_x(x, 0)| ≤ MT for all 0 ≤ x ≤ L and 0 ≤ T ≤ T_0. Hence V^(T)_x(x, 0) → 0 as T → 0 for every x ∈ [0, L]. Since f(u, ρ) is continuously differentiable and strictly convex with respect to u, f*_p(p, ρ) is continuous with respect to p. From Eq. (3.20c) we deduce that, as T → 0, u^(T)(x, 0) → f*_p(0, ρ) = U(ρ). (4.8)

Remark 4.1. Typically an equilibrium speed U(ρ) is supposed to satisfy certain conditions, for example: (i) U′(ρ) ≤ 0; (ii) U(0) = u_max; (iii) U(ρ_jam) = 0. In Theorem 4.2, U(ρ) is computed from Eq. (4.5), so these conditions can be rewritten as conditions on f*. Here the subscripts represent partial derivatives. Using the well-known identity f_u(f*_p(p, ρ), ρ) = p between f and its Legendre transformation f* (Rockafellar, 2015), we can translate the conditions on f* into conditions on f. As a result, we require the cost function f(u, ρ) to satisfy: (i) f_uρ(U(ρ), ρ) ≥ 0; (ii) f_u(u_max, 0) = 0; (iii) f_u(0, ρ_jam) = 0. These conditions provide a way to calibrate the cost function from its myopic behavior.
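The Legendre-transform identity used in the remark above can be verified numerically. The sketch below assumes the standard Rockafellar convention f*(p, ρ) = sup_u {pu − f(u, ρ)} (the paper's Eq. (3.15) may use a different sign convention) and an illustrative quadratic cost:

```python
import numpy as np

def f(u, rho):
    # any cost strictly convex in u works; this quadratic form is an assumption
    return 0.5 * u ** 2 - u + u * rho

def f_u(u, rho):
    return u - 1.0 + rho

def conj_argmax(p, rho, n=200001):
    # f*_p(p, rho) = argmax_u { p*u - f(u, rho) } under the convention
    # f*(p, rho) = sup_u { p*u - f(u, rho) }
    u = np.linspace(-3.0, 3.0, n)
    return u[np.argmax(p * u - f(u, rho))]

p, rho = 0.4, 0.3
u_star = conj_argmax(p, rho)     # analytic argmax is p + 1 - rho = 1.1
print(f_u(u_star, rho), p)       # identity: f_u(f*_p(p, rho), rho) = p
```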
We can now interpret LWR from the perspective of MFG, which provides a richer behavioral foundation and a more general and flexible framework.

MFG Examples
As a micro-macro game-theoretic model, MFG can capture richer driving behaviors than LWR through the choice of cost function. Different cost functions relate to different driving objectives and consequently lead to different MFGs. In this subsection, we present two concrete cost functions quantifying AVs' kinetic energy, driving efficiency, and safety.

MFG-Separable
We now propose a special cost function whose corresponding MFG has nice mathematical properties. This family of cost functions is called separable (Ambrose, 2018), i.e., f(u, ρ) can be written as the sum of two univariate functions of u and ρ. Denoting by ρ_jam the jam density, we propose a new cost function which is separable and models AVs' kinetic energy, driving efficiency and safety:

f(u, ρ) = (1/2)(u/u_max)² − u/u_max + ρ/ρ_jam. (4.9)

In Eq. (4.9), the first term represents the kinetic energy; the second term quantifies driving efficiency through the speed magnitude; the last term quantifies driving safety through a traffic congestion penalty on the density ρ, meaning that AVs tend to avoid staying in high-density areas. We denote the corresponding MFG by [MFG-Separable].
Since the cost function Eq. (4.9) is separable, when there are no speed constraints the corresponding MFG is a potential game (Benamou et al., 2017). Cardaliaguet (2015) proved the existence and uniqueness results for a family of such games.
When there are speed constraints 0 ≤ u ≤ u_max, the minimum of f(u, ρ) + uV_x is attained at

u = min{max{u_max(1 − u_max V_x), 0}, u_max}. (4.10)

The resulting MFG system [MFG-Separable] then consists of the continuity equation, the HJB equation with the cost Eq. (4.9), and the optimality condition Eq. (4.10).
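The clipped optimality condition can be cross-checked against brute-force minimization. The sketch below assumes the normalized separable form f(u, ρ) = (1/2)(u/u_max)² − u/u_max + ρ/ρ_jam (an assumption; note the ρ-term drops out of the minimization over u, which is exactly the separability at work):

```python
import numpy as np

u_max, rho_jam = 1.0, 1.0

def f(u, rho):
    # assumed normalized separable cost
    return 0.5 * (u / u_max) ** 2 - u / u_max + rho / rho_jam

def u_opt_grid(rho, Vx, n=100001):
    # brute-force minimizer of f(u, rho) + u*Vx over the admissible speeds
    u = np.linspace(0.0, u_max, n)
    return u[np.argmin(f(u, rho) + u * Vx)]

def u_opt_closed(Vx):
    # candidate closed form: unconstrained minimizer u_max*(1 - u_max*Vx),
    # clipped to [0, u_max]; independent of rho by separability
    return min(max(u_max * (1.0 - u_max * Vx), 0.0), u_max)

for Vx in (-0.5, 0.0, 0.3, 2.0):
    print(Vx, u_opt_grid(0.4, Vx), u_opt_closed(Vx))
```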

MFG-NonSeparable
We propose another cost function which quantifies driving safety in a more explicit way:

f(u, ρ) = (1/2)(u/u_max)² − u/u_max + (ρ/ρ_jam)(u/u_max). (4.12)

It quantifies kinetic energy and driving efficiency in the same way as Eq. (4.9) but uses a different traffic congestion penalty, on the product of density and speed, to quantify driving safety. The new penalty term means that AVs tend to decelerate in high-density areas and accelerate in low-density areas. We denote the corresponding MFG by [MFG-NonSeparable].
With the speed constraints 0 ≤ u ≤ u_max, the minimum of f(u, ρ) + uV_x is attained at

u = min{max{u_max(1 − ρ/ρ_jam − u_max V_x), 0}, u_max}. (4.13)

[MFG-NonSeparable] is related to the Greenshields LWR from two perspectives. On one hand, letting V_x → 0 in Eq. (4.13) yields u = u_max(1 − ρ/ρ_jam), which is exactly the Greenshields equilibrium speed defined in Eq. (4.4). From Theorem 4.2 we know that the Greenshields LWR is the myopic version of [MFG-NonSeparable]. On the other hand, Eq. (4.12) can be written as

f(u, ρ) = (u − U(ρ))²/(2u_max²) − U(ρ)²/(2u_max²), (4.15)

where U(ρ) is the Greenshields equilibrium speed defined in Eq. (4.4). Comparing Eq. (4.15) with Eq. (4.2), we see that [MFG-NonSeparable] is a variant of [MFG-LWR] with the Greenshields equilibrium speed; this is why we picked the cost function in the form of Eq. (4.2) in Section 4.1.1. The proposed MFG systems are more easily solvable than the discrete differential game. We will discretize the system in space and time and then present a solution algorithm to compute the MFE.
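Both relations above can be verified numerically. The sketch below assumes the normalized non-separable form f(u, ρ) = (1/2)(u/u_max)² − u/u_max + uρ/(u_max ρ_jam) (an assumption consistent with the Greenshields limit stated above) and checks the quadratic-tracking rewriting and the V_x → 0 limit:

```python
import numpy as np

u_max, rho_jam = 1.0, 1.0

def U(rho):
    # Greenshields equilibrium speed, Eq. (4.4)
    return u_max * (1.0 - rho / rho_jam)

def f(u, rho):
    # assumed normalized non-separable cost with congestion penalty on rho*u
    return 0.5 * (u / u_max) ** 2 - u / u_max + u * rho / (u_max * rho_jam)

def f_tracking(u, rho):
    # quadratic tracking of U(rho) plus a u-independent offset, as in the
    # rewriting above
    return (u - U(rho)) ** 2 / (2 * u_max ** 2) - U(rho) ** 2 / (2 * u_max ** 2)

rng = np.random.default_rng(0)
u = rng.uniform(0.0, u_max, 1000)
rho = rng.uniform(0.0, rho_jam, 1000)
print(np.max(np.abs(f(u, rho) - f_tracking(u, rho))))   # the two forms agree

def u_opt(rho, Vx):
    # minimizer of f(u, rho) + u*Vx on [0, u_max]; Vx -> 0 recovers U(rho)
    return np.clip(u_max * (1.0 - rho / rho_jam - u_max * Vx), 0.0, u_max)

print(u_opt(0.3, 0.0), U(0.3))
```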

MFE Solution Algorithm
Because of its forward-backward structure, the MFG system can be solved in neither the forward nor the backward direction alone. Given the density profile ρ(x, t), the HJB equations (3.20b)(3.20c) can be solved backward from t = T to t = 0 with terminal cost V_T(x) for u(x, t) and V(x, t); given the velocity field u(x, t), the continuity equation (3.20a) can be solved forward from t = 0 to t = T with initial density ρ_0(x) for ρ(x, t). However, the two directions cannot be matched simultaneously, so it is challenging to compute the MFE numerically.
As far as the authors know, there are three types of numerical methods for MFG: fixed-point iteration, variational method and Newton's method.
The fixed-point iteration solves the forward and backward equations alternately. It is easy to implement once appropriate forward and backward solvers are chosen (Couillet et al., 2012; Chevalier et al., 2015), and applying relaxation tricks may improve the algorithm (Zhang et al., 2018). However, the iteration converges only when T is small, that is, for a short planning horizon. Moreover, there is no theory to estimate how small T should be to guarantee convergence.
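The alternating structure can be sketched on a toy problem. In the Python sketch below, the periodic road, the cost function, the upwind schemes, and the damping factor are all illustrative assumptions, not the paper's exact discretization:

```python
import numpy as np

# toy periodic road; the cost and the numerical schemes below are illustrative
L, T, Nx, Nt = 1.0, 0.5, 40, 40
dx, dt = L / Nx, T / Nt              # CFL number u_max*dt/dx = 0.5 <= 1
u_max, rho_jam = 1.0, 1.0
x = (np.arange(Nx) + 0.5) * dx

def cost(u, rho):
    # assumed non-separable running cost
    return 0.5 * (u / u_max) ** 2 - u / u_max + u * rho / (u_max * rho_jam)

def optimal_u(rho, Vx):
    # closed-form minimizer of cost(u, rho) + u*Vx on [0, u_max]
    return np.clip(u_max * (1.0 - rho / rho_jam - u_max * Vx), 0.0, u_max)

def solve_hjb_backward(rho):
    # given rho(x, t), march V from t = T (zero terminal cost) down to t = 0
    V = np.zeros(Nx)
    u = np.zeros((Nt, Nx))
    for n in range(Nt - 1, -1, -1):
        Vx = (np.roll(V, -1) - V) / dx       # upwind difference for u >= 0
        u[n] = optimal_u(rho[n], Vx)
        V = V + dt * (u[n] * Vx + cost(u[n], rho[n]))
    return u

def solve_continuity_forward(u, rho0):
    # conservative upwind update; mass is conserved exactly on the periodic road
    rho = np.zeros((Nt + 1, Nx))
    rho[0] = rho0
    for n in range(Nt):
        flux = rho[n] * u[n]
        rho[n + 1] = rho[n] - dt / dx * (flux - np.roll(flux, 1))
    return rho

rho0 = 0.2 + 0.6 * np.exp(-0.5 * ((x - L / 2) / 0.1) ** 2)
rho = np.tile(rho0, (Nt + 1, 1))
for _ in range(50):                          # damped fixed-point iteration
    u = solve_hjb_backward(rho[:-1])
    rho = 0.5 * rho + 0.5 * solve_continuity_forward(u, rho0)
```

The damping (averaging old and new density profiles) is one of the relaxation tricks alluded to above; without it the plain alternation is more prone to oscillation as T grows.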
The variational method deals with separable cost functions and potential MFGs. In this case, it is shown that the MFG system is equivalent to an optimization problem constrained by the continuity equation (Benamou et al., 2017). Then a variety of optimization tools can be applied (Lachapelle and Wolfram, 2011;Benamou and Carlier, 2015;Chow et al., 2018). The variational method works for any planning horizon but relies on the separability of the cost function. Lachapelle and Wolfram (2011) used the variational method to solve MFGs in pedestrian crowds modeling.
A more general approach is based on Newton's method. Such an approach was first proposed by Achdou and Capuzzo-Dolcetta (2010) and Achdou and Perez (2012) to solve a family of MFGs.
The key idea is to treat the forward and backward equations as a single nonlinear system and solve that system by Newton's method. This type of method suits our purpose because it places no requirements on the length of the planning horizon or on the cost function. However, Newton's method may fail to converge without a good initial guess to the solution, so tricks to improve convergence are needed.
This paper develops a multigrid preconditioned Newton's finite difference algorithm for MFG. It works well with different cost functions and planning horizons; the numerical examples of the MFGs proposed in Section 4 are computed using this algorithm. Discretize the spatial-temporal domain [0, L] × [0, T] into grid points x_j = j∆x (j = 0, ..., N_x) and t_n = n∆t (n = 0, ..., N_t), with spatial and temporal step sizes ∆x = L/N_x and ∆t = T/N_t. Under the periodic boundary conditions, x_0 and x_{N_x} are identified as the same location. Denote by ρ^n_j the average density and by u^n_j the average speed on the j-th cell [x_{j−1}, x_j] at time t_n (j = 1, ..., N_x; n = 0, ..., N_t), and let V^n_j = V(x_j, t_n).
Discretize the initial and terminal conditions accordingly. Collecting the unknowns {ρ^n_j}, {u^n_j} and {V^n_j} for 1 ≤ j ≤ N_x and 0 ≤ n ≤ N_t, the discretized system can be written as F(w) = 0, (5.5) where w ∈ R^{3N_xN_t+2N_x} is a long vector containing all ρ^n_j, u^n_j and V^n_j, and F : R^{3N_xN_t+2N_x} → R^{3N_xN_t+2N_x} encodes all the discretized equations.
Eq. (5.5) is a large nonlinear system. Denote by J the Jacobian matrix of F and apply Newton's method to solve Eq. (5.5): w^{n+1} = w^n − J(w^n)^{−1} F(w^n), (5.6) starting from any initial guess w^0.
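The Newton iteration itself is standard. The sketch below runs it on a tiny coupled nonlinear system standing in for the discretized MFG equations (a synthetic stand-in, not the actual F of Eq. (5.5)):

```python
import numpy as np

def F(w):
    # tiny coupled nonlinear system standing in for the discretized Eq. (5.5)
    a, b = w
    return np.array([a ** 3 + b - 1.0, a - b ** 2])

def Jac(w):
    # Jacobian of F
    a, b = w
    return np.array([[3 * a ** 2, 1.0], [1.0, -2 * b]])

def newton(w, tol=1e-12, max_iter=50):
    # plain Newton iteration w <- w - J(w)^{-1} F(w)
    for _ in range(max_iter):
        w = w - np.linalg.solve(Jac(w), F(w))
        if np.linalg.norm(F(w)) < tol:
            break
    return w

w = newton(np.array([0.5, 0.5]))   # as noted above, the initial guess matters
print(w, np.linalg.norm(F(w)))
```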
To improve the convergence of Newton's iterations, we apply multigrid to get a good initial guess and preconditioning to accelerate the linear solver. Multigrid and preconditioning are widely used tricks in numerical algorithms, see Hackbusch (2013); Golub and Van Loan (2012).
• Multigrid: Start with a coarse grid N_x^(0) × N_t^(0) on which the system is easy to solve. Then iteratively refine the grids and solve the MFG system on finer grids N_x^(k) × N_t^(k), k = 1, 2, ..., until a solution of the desired resolution is obtained. At step k, interpolate the solution w^(k−1) from the grid N_x^(k−1) × N_t^(k−1) onto the grid N_x^(k) × N_t^(k); this provides a good initial guess for Newton's method on the finer grid.
• Preconditioning: At each Newton iteration a linear system needs to be solved. We use the gmres iterative linear solver (Saad and Schultz, 1986) since J(w^n) is sparse. However, the linear system is ill-conditioned, which results in poor convergence. We therefore form an approximation J̃(w^n) of J(w^n) by ignoring the coupling parts between the forward and backward equations. Inverting J̃(w^n) is equivalent to solving a decoupled forward-backward system. We use J̃(w^n) as a preconditioner to accelerate the gmres convergence.
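The effect of this decoupling preconditioner can be illustrated on a synthetic block system. In the sketch below, the block entries are illustrative assumptions; only the block structure (diagonal forward/backward blocks plus weak coupling blocks) mirrors the Jacobian described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40

# stand-ins for the discretized backward (HJB) and forward (continuity) blocks
A = np.diag(3.0 + rng.uniform(0.0, 1.0, n)) - np.diag(np.ones(n - 1), -1)
D = np.diag(3.0 + rng.uniform(0.0, 1.0, n)) - np.diag(np.ones(n - 1), 1)
B = 0.1 * rng.standard_normal((n, n))        # forward/backward coupling blocks
C = 0.1 * rng.standard_normal((n, n))

J = np.block([[A, B], [C, D]])               # full (coupled) Jacobian
J_tilde = np.block([[A, np.zeros((n, n))],   # preconditioner: drop the coupling
                    [np.zeros((n, n)), D]])

# the preconditioned operator J_tilde^{-1} J is much closer to the identity,
# which is what accelerates an iterative solver such as gmres
I2 = np.eye(2 * n)
print(np.linalg.norm(J - I2), np.linalg.norm(np.linalg.solve(J_tilde, J) - I2))
```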
Using the algorithm, we shall compute MFE solutions and show simulations for MFGs proposed in Section 4.

Numerical Examples
Set the road length L = 1 and the planning horizon length T = 3. Set the free-flow speed u_max = 1 and the jam density ρ_jam = 1. Choose the bump-shaped initial density of Eq. (5.8) with ρ_a = 0.05, ρ_b = 0.95 and γ = 0.1. This initial density represents the scenario in which cars initially cluster near x = L/2 while the traffic is light elsewhere. We choose the terminal cost V_T = 0 and specify the periodic boundary conditions ρ(0, t) = ρ(L, t), V(0, t) = V(L, t).
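As a sketch of this set-up, the profile below assumes a Gaussian-type bump for the initial density (an assumption, since the exact formula of Eq. (5.8) is not reproduced here): height ρ_b at x = L/2, decaying to the background level ρ_a with width γ:

```python
import numpy as np

L, rho_a, rho_b, gamma = 1.0, 0.05, 0.95, 0.1

def rho0(x):
    # assumed Gaussian-type bump for Eq. (5.8); height rho_b at x = L/2,
    # background level rho_a, width gamma
    return rho_a + (rho_b - rho_a) * np.exp(-0.5 * ((x - L / 2) / gamma) ** 2)

x = np.linspace(0.0, L, 121)
print(rho0(L / 2), rho0(0.0))    # dense near mid-road, light elsewhere
```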
On a spatial-temporal grid of size N_x = 120 and N_t = 480, we compute the MFE solutions ρ*(x, t), u*(x, t) for the three MFG systems in Section 4.
For [MFG-LWR], we choose U(ρ) to be the Greenshields equilibrium speed defined in Eq. (4.4). We plot the density evolution of ρ*(x, t) in a 3D space-time-density diagram, see Fig. 4a. In addition, we collect density and flow data from the MFE solution and plot them in a fundamental diagram, see Fig. 4b. The density and flow data are collected as follows: take n_x = 24 equally spaced locations x_1, x_2, ..., x_{n_x} on [0, L] and n_t = 96 time snapshots t_k = kT/n_t, k = 0, 1, ..., n_t; collect the density and speed values ρ*(x_i, t_k) and u*(x_i, t_k) for i = 1, ..., n_x, k = 0, ..., n_t; then compute the flow q*(x_i, t_k) = ρ*(x_i, t_k) u*(x_i, t_k) and plot the data {ρ*(x_i, t_k), q*(x_i, t_k)} on the density-flow diagram. Fig. 4a shows the formation and propagation of a shock wave. The shock wave travels with decaying amplitude but does not disappear within the given time horizon [0, T]. From Fig. 4b we see that all of the collected density-flow data points fall onto the fundamental diagram induced by the Greenshields equilibrium speed of Eq. (4.4). For [MFG-NonSeparable], the density profile stays smooth and no shock wave forms; from time t = 1 the density becomes a uniform flow. For [MFG-Separable], the behavior is similar, but the high density dissipates more slowly and the density becomes nearly uniform from t = 2. The results show that AVs' anticipation behavior helps avoid the formation of shock waves and stabilizes the traffic in this set-up. Since no analytical solutions are available for these MFG systems, the errors are estimated using solutions from different grids. To check the solution ρ*(x, t), u*(x, t) on the N_x × N_t grid, we first solve the MFG on a coarse grid of size N_x/2 × N_t/2 and then interpolate the coarse solution back onto the N_x × N_t grid. Denoting the interpolated solution by (ρ̃*, ũ*), the solution error on the N_x × N_t grid is estimated as the difference between the two solutions, Eq. (5.9), where the norm is chosen as the L1 norm on [0, L] × [0, T].
We fix the spatial-temporal ratio N_t/N_x = 4 and increase N_x from 30 to 120, then plot the errors computed by Eq. (5.9) against the spatial grid size N_x. From Fig. 6 we observe first-order convergence for all three numerical examples.
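The grid-comparison error estimate and the observed-order computation can be sketched as follows. The "true solution" here is a smooth synthetic stand-in, and nearest-cell repetition is an assumed interpolation; the point is the mechanics of Eq. (5.9) and of reading off the convergence order:

```python
import numpy as np

def l1_error(fine, coarse, dx, dt):
    # Eq. (5.9)-style estimate: lift the coarse-grid solution onto the fine grid
    # (here by nearest-cell repetition, an assumed interpolation) and take the
    # discrete L1 norm of the difference
    lifted = np.repeat(np.repeat(coarse, 2, axis=0), 2, axis=1)
    return np.sum(np.abs(fine - lifted)) * dx * dt

def sample(Nx, Nt, L=1.0, T=3.0):
    # cell-centered samples of a smooth stand-in for the true solution
    x = (np.arange(Nx) + 0.5) * (L / Nx)
    t = (np.arange(Nt) + 0.5) * (T / Nt)
    return np.sin(2 * np.pi * x)[None, :] * np.cos(t)[:, None]

errs = []
for Nx in (40, 80):                      # fixed ratio Nt/Nx = 4, as in the text
    Nt = 4 * Nx
    errs.append(l1_error(sample(Nx, Nt), sample(Nx // 2, Nt // 2),
                         1.0 / Nx, 3.0 / Nt))
order = np.log2(errs[0] / errs[1])       # observed convergence order, close to 1
print(order)
```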

From MFG Back to N -Car Differential Game
Summarizing the previous sections, we have derived a continuous mean field game [MFG] from a discrete differential game (Definition 3.1) and developed a solution algorithm for the mean field game. In this section we build the connection between the discrete differential game equilibrium (DGE) and the continuous mean field game equilibrium (MFE) in the sense of ε-Nash equilibrium. First we provide a way to construct a tuple of discrete controls from an MFE solution. Then we introduce the concept of ε-Nash equilibrium and show how to characterize the accuracy of the MFE-constructed controls. Numerical examples validate that the MFE-constructed controls are a good approximate equilibrium of the original N-car differential game (DG) when N is large.

MFE-Constructed Controls and Accuracy Characterization
From a continuous MFE solution, we construct a tuple of discrete controls v̂_1(t), ..., v̂_N(t) for the DG by letting the N cars follow the MFE velocity field u*(x, t) and tracking their motions. The rationale underlying this construction is straightforward: for i = 1, ..., N, the i-th car's instantaneous velocity choice at time t is determined by the MFE velocity field u*(x_i(t), t), evaluated at that time and at the i-th car's location x_i(t). Mathematically, for i = 1, ..., N and t ∈ [0, T]: v̂_i(t) = u*(x_i(t), t). (6.1) Integrating the above dynamical system gives the i-th car's velocity control v̂_i(t) and trajectory x_i(t) over the planning horizon t ∈ [0, T], i = 1, ..., N. Now we would like to know whether the MFE-constructed controls v̂_1(t), ..., v̂_N(t) form a good approximate equilibrium of the DG. Since the DG's true equilibrium v*_1(t), ..., v*_N(t) may neither exist nor be unique, and is typically hard to obtain, we characterize the accuracy of the constructed controls in terms of the objective function. Along this line, such an approximate equilibrium can be treated as an ε-Nash equilibrium of the DG, which is formally defined below (Cardaliaguet, 2010, 2015).
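The construction amounts to integrating dx_i/dt = u*(x_i(t), t) for each car. The sketch below does this with forward Euler on a periodic road; the velocity field u_star is a synthetic stand-in for an actual MFE field, and the wrap-around on [0, L) is an assumption:

```python
import numpy as np

L, T, Nt = 1.0, 1.0, 200
dt = T / Nt

def u_star(x, t):
    # synthetic stand-in for the MFE velocity field u*(x, t)
    return 0.5 + 0.3 * np.sin(2 * np.pi * (x - 0.2 * t) / L)

def construct_controls(x0):
    xs = np.empty((Nt + 1, len(x0)))      # trajectories x_i(t)
    vs = np.empty((Nt, len(x0)))          # MFE-constructed controls v_hat_i(t)
    xs[0] = x0
    for n in range(Nt):
        vs[n] = u_star(xs[n], n * dt)     # Eq. (6.1): v_hat_i(t) = u*(x_i(t), t)
        xs[n + 1] = (xs[n] + dt * vs[n]) % L
    return xs, vs

x0 = np.linspace(0.0, L, 21, endpoint=False)   # N = 21 cars on the ring road
xs, vs = construct_controls(x0)
```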
Definition 6.1. A tuple of controls ṽ_1, ..., ṽ_N is an ε-Nash equilibrium of the DG if, for every i = 1, ..., N and every admissible control v_i, J^N_i(ṽ_i, ṽ_{−i}) ≤ J^N_i(v_i, ṽ_{−i}) + ε. (6.3) At an ε-Nash equilibrium, no car can improve its driving cost by more than ε by unilaterally switching its velocity control.
For a potential game where the objective function f(u, ρ) is separable, e.g., [MFG-Separable], Cardaliaguet (2015) proved the correspondence between the MFE-constructed controls and an ε-Nash equilibrium of the DG; that is, for any ε > 0 there exist N, σ > 0 such that v̂_1(t), ..., v̂_N(t) generated from the MFE solution is an ε-Nash equilibrium of the DG.
Unfortunately, no such theoretical result exists for a general cost function, such as [MFG-NonSeparable]. In this paper, instead of offering a formal proof, we validate this correspondence using numerical examples; a rigorous proof for general cost functions is left for future research.
Tailoring to our context, in the subsequent numerical examples we aim to illustrate that the MFE-constructed controls are an ε-Nash equilibrium by characterizing the accuracy across all feasible controls and all cars. There always exists an arbitrarily large ε that makes v̂_1(t), ..., v̂_N(t) satisfy the condition Eq. (6.3). What we are more interested in is the smallest such bound, denoted ε̂ > 0:

ε̂ ≡ max_i max_{v_i} [J^N_i(v̂_i, v̂_{−i}) − J^N_i(v_i, v̂_{−i})]. (6.4)

Moving the second maximum inside, in front of the second term (which alone depends on v_i), we have

ε̂ = max_i [J^N_i(v̂_i, v̂_{−i}) − J^N_i(v̄_i, v̂_{−i})], (6.5)

where v̄_i is the best response solving argmin_{v_i} J^N_i(v_i, v̂_{−i}). We obtain v̄_i from an optimal control problem in which car i optimizes its own cost while the other cars' strategies v̂_{−i} are kept unchanged. Definition 6.2. The accuracy of a tuple of controls v̂_1(t), ..., v̂_N(t) is the ε̂ in Eq. (6.5).
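Once the per-car costs under the MFE-constructed controls and under the best responses are available, computing ε̂ is a simple reduction. In the sketch below, the cost arrays are synthetic stand-ins (the real values come from simulating the DG and solving the best-response optimal control problems):

```python
import numpy as np

# accuracy estimate in the spirit of Eq. (6.5); J_hat[i] is car i's cost under
# the MFE-constructed controls and J_best[i] its best-response cost
# argmin_{v_i} J_i(v_i, v_hat_{-i}); both arrays are synthetic stand-ins here
rng = np.random.default_rng(1)
N = 21
J_best = rng.uniform(0.5, 1.0, N)
J_hat = J_best + rng.uniform(0.0, 0.05, N)   # best response can only do better

eps_hat = np.max(J_hat - J_best)             # the epsilon-hat of Eq. (6.5)
rel = np.max((J_hat - J_best) / np.abs(J_best))  # maximal relative accuracy
print(eps_hat, rel)
```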

Accuracy Validation with Numerical Examples
Summarizing Section 3, Section 5 and Section 6.1, we now reiterate, in a more systematic way, the procedure of solving for an approximate equilibrium of the DG from its corresponding MFG. The procedure of computing the MFE-constructed controls and validating their accuracy is listed in Algorithm 1. We test numerical examples following this procedure.
The general set-up of the following numerical examples is similar to that used previously. We fix the length of the planning horizon T = 1. Each time we solve a different DG by varying the number of cars N = 21, 41, 61, 81, 101. For each N, the cars' initial positions are sampled from the same initial distribution defined in Eq. (5.8), with ρ_a = 0.2, ρ_b = 0.8 and γ = 0.15L. For each N we take the road length L = N and choose the smoothing parameter σ = 0.05L. The cost functions defined in [MFG-NonSeparable] and [MFG-Separable] are used, respectively. Two messages emerge from the results. First, Fig. 7 compares the costs computed from the MFE-constructed controls with those of the corresponding best-response strategies when N = 21. In Fig. 7, the x-axis is the car index i and the y-axis is the corresponding cost J^N_i, for i = 1, 2, ..., N. We see that for both cost functions the MFE solutions generate good approximate equilibria of the DGs.
Second, we compute the maximal relative accuracy and the mean relative accuracy of the MFE-constructed controls for different numbers of cars N, shown in Fig. 8. For both cost functions, the accuracy improves as N increases.