
Linear-quadratic zero-sum mean-field type games: Optimality conditions and policy optimization

A preliminary version of this work was submitted to the 59th IEEE Conference on Decision and Control.
Abstract
In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and of the actions is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods based on policy gradients are proposed, one for the model-based framework and one for the sample-based framework. In the model-based case, the gradients are computed exactly from the model, whereas in the sample-based case they are estimated using Monte-Carlo simulations. Numerical experiments show the convergence of the utility function as well as of the two players' controls.
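The model-based scheme can be illustrated in the scalar case. The Python sketch below rests on assumptions that the abstract does not spell out. First, controls are taken to be linear feedbacks $ u_i = -K_i (x - \mathbb{E}[x]) - L_i \mathbb{E}[x] $. Under that choice, the deviation $ y = x - \mathbb{E}[x] $ and the mean $ z = \mathbb{E}[x] $ evolve separately, and the utility splits into $ C_y(K_1,K_2) + C_z(L_1,L_2) $, matching the split plotted in Figures 1 and 3. Second, the coefficients come from Table 1, with $ \mathcal{N}(0, 0.01) $ read as a variance of 0.01. Third, the update rule is plain simultaneous gradient descent-ascent, which is not necessarily the authors' AG or DGA method. The names `cost_and_grad` and `descent_ascent` are hypothetical.

```python
import math

# Hypothetical scalar reading of Table 1: the y-part (deviation from the mean)
# uses (A, B_i, Q, R_i); the z-part (the mean) uses A + Abar, B_i + Bbar_i, Q + Qbar, R_i + Rbar_i.
Y_PART = {"a": 0.4, "b1": 0.4, "b2": 0.3, "q": 0.4, "r1": 0.4, "r2": 0.4}
Z_PART = {"a": 0.8, "b1": 0.8, "b2": 0.6, "q": 0.8, "r1": 0.8, "r2": 0.8}
GAMMA = 0.9
# E[y_0^2] + gamma * sigma^2 / (1 - gamma): U([-1, 1]) start, noise variance 0.01 (assumed).
C0 = 1.0 / 3.0 + GAMMA * 0.01 / (1.0 - GAMMA)


def cost_and_grad(k1, k2, part, gamma=GAMMA, c0=C0):
    """Exact cost sum_t gamma^t E[q_eff * y_t^2] and its gradient, for u_i = -k_i * y."""
    a = part["a"] - part["b1"] * k1 - part["b2"] * k2    # closed-loop coefficient
    rho = gamma * a * a
    if rho >= 1.0:
        raise ValueError("closed loop is not stable in the discounted sense")
    s = c0 / (1.0 - rho)                                  # sum_t gamma^t E[y_t^2]
    ds_da = 2.0 * gamma * a * c0 / (1.0 - rho) ** 2
    q_eff = part["q"] + part["r1"] * k1**2 - part["r2"] * k2**2  # q + r1*k1^2 - r2*k2^2
    g1 = 2.0 * part["r1"] * k1 * s - part["b1"] * q_eff * ds_da
    g2 = -2.0 * part["r2"] * k2 * s - part["b2"] * q_eff * ds_da
    return q_eff * s, g1, g2


def descent_ascent(part, eta1=0.1, eta2=0.1, iters=2000):
    """Player 1 (minimizer) descends, player 2 (maximizer) ascends, from zero gains."""
    k1 = k2 = 0.0
    for _ in range(iters):
        _, g1, g2 = cost_and_grad(k1, k2, part)
        k1, k2 = k1 - eta1 * g1, k2 + eta2 * g2
    return k1, k2


K1, K2 = descent_ascent(Y_PART)   # feedback gains on y = x - E[x]
L1, L2 = descent_ascent(Z_PART)   # feedback gains on z = E[x]
utility = cost_and_grad(K1, K2, Y_PART)[0] + cost_and_grad(L1, L2, Z_PART)[0]
print(K1, K2, L1, L2, utility)
```

The initial-state and noise terms enter only through the positive factor `C0`. They therefore rescale each part of the utility without moving its saddle point. Because the closed form is differentiated exactly, this corresponds to the model-based setting. The step sizes, iteration count and zero initial gains mirror $ \eta_1 $, $ \eta_2 $, $ T $ and $ K_i^0 $, $ L_i^0 $ in Table 1.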

    Mathematics Subject Classification: Primary: 91A05, 91A07, 93E20, 49N80.


    Figure 1. Model-based policy optimization: convergence of each part of the utility. (a) $ C_y $ as a function of $ (K_1,K_2) $. (b) $ C_z $ as a function of $ (L_1,L_2) $.

    Figure 2. Model-based policy optimization: convergence of the control parameters (a) and of the relative error on the utility (b).

    Figure 3. Sample-based policy optimization: convergence of each part of the utility. (a) $ C_y $ as a function of $ (K_1,K_2) $. (b) $ C_z $ as a function of $ (L_1,L_2) $.

    Figure 4. Sample-based policy optimization: convergence of the control parameters (a) and of the relative error on the utility (b).

    Table 1. Simulation parameters

    Model parameters
    | $ A $ | $ \overline{A} $ | $ B_1=\overline{B}_1 $ | $ B_2=\overline{B}_2 $ | $ Q $ | $ \overline{Q} $ | $ R_1=\overline{R}_1 $ | $ R_2=\overline{R}_2 $ | $ \gamma $ |
    | 0.4 | 0.4 | 0.4 | 0.3 | 0.4 | 0.4 | 0.4 | 0.4 | 0.9 |

    Initial distribution and noise processes
    | $ \epsilon_0^0 $ | $ \epsilon^1_0 $ | $ \epsilon^0_t $ | $ \epsilon^1_t $ |
    | $ \mathcal{U}([-1, 1]) $ | $ \mathcal{U}([-1, 1]) $ | $ \mathcal{N}(0, 0.01) $ | $ \mathcal{N}(0, 0.01) $ |

    AG and DGA method parameters
    | $ \mathcal{N}^{max}_1 $ | $ \mathcal{N}^{max}_2 $ | $ T $ | $ \eta_1 $ | $ \eta_2 $ | $ K_1^0 $ | $ L_1^0 $ | $ K_2^0 $ | $ L_2^0 $ |
    | 10 | 200 | 2000 | 0.1 | 0.1 | 0.0 | 0.0 | 0.0 | 0.0 |

    Gradient estimation algorithm parameters
    | $ \mathcal{T} $ | $ M $ | $ \tau $ |
    | 50 | 10000 | 0.1 |
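For the sample-based case, one plausible reading of the last block of Table 1 is a zeroth-order gradient estimator. Under this reading, $ \mathcal{T} $ is the rollout truncation horizon, $ M $ is the number of random perturbations and $ \tau $ is the smoothing radius. The sketch below estimates $ \nabla C_y $ in the scalar case under that reading. It is an illustration built on assumptions, not the paper's algorithm:

- It uses a two-point (antithetic) estimator with common random numbers to reduce variance.
- It simulates one trajectory of $ y = x - \mathbb{E}[x] $ per perturbation, with controls $ u_i = -k_i y $.
- It treats 0.01 as the noise variance.

The names `truncated_costs`, `estimate_gradient` and `Y_PARAMS` are hypothetical.

```python
import numpy as np

# Hypothetical scalar coefficients of the deviation (y) dynamics, read from Table 1.
# "sigma2" interprets N(0, 0.01) as a variance, which is an assumption.
Y_PARAMS = {"A": 0.4, "B1": 0.4, "B2": 0.3, "Q": 0.4, "R1": 0.4, "R2": 0.4,
            "gamma": 0.9, "sigma2": 0.01}


def truncated_costs(k, y0, noise, p):
    """Discounted cost of one rollout per row of k = [[k1, k2], ...], truncated at noise.shape[0] steps."""
    k1, k2 = k[:, 0], k[:, 1]
    a = p["A"] - p["B1"] * k1 - p["B2"] * k2            # closed loop with u_i = -k_i * y
    q_eff = p["Q"] + p["R1"] * k1**2 - p["R2"] * k2**2   # stage cost q*y^2 + r1*u1^2 - r2*u2^2
    y = y0.copy()
    cost = np.zeros_like(y0)
    for t in range(noise.shape[0]):
        cost += p["gamma"] ** t * q_eff * y**2
        y = a * y + noise[t]
    return cost


def estimate_gradient(theta, p, horizon=50, n_samples=10000, radius=0.1, rng=None):
    """Two-point zeroth-order estimate of grad C_y at theta = (k1, k2)."""
    rng = np.random.default_rng() if rng is None else rng
    d = theta.size
    # Perturbations drawn uniformly on the sphere of radius tau.
    u = rng.normal(size=(n_samples, d))
    u *= radius / np.linalg.norm(u, axis=1, keepdims=True)
    # Common random numbers: both rollouts of a pair share y_0 and the noise path.
    y0 = rng.uniform(-1.0, 1.0, size=n_samples)
    noise = rng.normal(0.0, np.sqrt(p["sigma2"]), size=(horizon, n_samples))
    diff = truncated_costs(theta + u, y0, noise, p) - truncated_costs(theta - u, y0, noise, p)
    return d / (2.0 * n_samples * radius**2) * (diff[:, None] * u).sum(axis=0)


rng = np.random.default_rng(0)
grad = estimate_gradient(np.array([0.0, 0.0]), Y_PARAMS, rng=rng)
print(grad)
```

The two-point form follows the standard smoothing identity $ \nabla f_\tau(\theta) = \frac{d}{2\tau^2}\,\mathbb{E}[(f(\theta+u) - f(\theta-u))\,u] $ for $ u $ uniform on the sphere of radius $ \tau $ in $ \mathbb{R}^d $. The resulting estimate can take the place of the exact gradient in a descent-ascent loop.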