Stochastic differential games have been used extensively to model competition among agents in finance, for instance in peer-to-peer (P2P) lending platforms in the fintech industry, in the banking system for systemic risk, and in insurance markets. The recently proposed machine learning algorithm deep fictitious play provides a novel and efficient tool for finding the Markovian Nash equilibrium of large $ N $-player asymmetric stochastic differential games [J. Han and R. Hu, Mathematical and Scientific Machine Learning Conference, pages 221-245, PMLR, 2020]. By incorporating the idea of fictitious play, the algorithm decouples the game into $ N $ sub-optimization problems and identifies each player's optimal strategy with the deep backward stochastic differential equation (BSDE) method, in parallel and repeatedly. In this paper, we prove the convergence of deep fictitious play (DFP) to the true Nash equilibrium. We also show that the strategy based on DFP forms an $ \epsilon $-Nash equilibrium. We generalize the algorithm by proposing a new approach to decouple the games, and present numerical results for large-population games showing the empirical convergence of the algorithm beyond the technical assumptions in the theorems.
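The decoupling loop can be illustrated with a deliberately simplified sketch. It is not the DFP algorithm itself and works only under strong assumptions. It uses a hypothetical static $ N $-player game in which player $ i $ pays $ J_i = \tfrac12 (a_i - c_i)^2 + \tfrac{\kappa}{2}(a_i - \bar a_{-i})^2 $. Here $ \bar a_{-i} $ is the average action of the other players, and the targets $ c_i $ and the coupling $ \kappa $ are made up for the example. At each stage every player best-responds to the other players' actions from the previous stage, so the $ N $ sub-problems are independent and could be solved in parallel. The closed-form `best_response` stands in for the deep BSDE solve of a stochastic control problem. The function `exploitability` measures the largest gain from a unilateral deviation, that is, the smallest $ \epsilon $ for which the profile is an $ \epsilon $-Nash equilibrium.

```python
def others_mean(i, actions):
    """Average action of all players except player i."""
    return (sum(actions) - actions[i]) / (len(actions) - 1)


def cost(i, a_i, actions, c, kappa):
    """Hypothetical toy cost of player i: track a private target c[i]
    while staying close to the other players' average action."""
    m = others_mean(i, actions)
    return 0.5 * (a_i - c[i]) ** 2 + 0.5 * kappa * (a_i - m) ** 2


def best_response(i, actions, c, kappa):
    """Exact minimizer of cost(i, ., actions, c, kappa) from the first-order
    condition. In DFP this sub-problem is a stochastic control problem
    solved with the deep BSDE method; here it is closed form."""
    return (c[i] + kappa * others_mean(i, actions)) / (1.0 + kappa)


def fictitious_play(c, kappa, max_stages=500, tol=1e-12):
    """Each stage: every player best-responds to the others' actions from the
    previous stage. The N sub-problems do not interact within a stage,
    so they could be solved in parallel."""
    actions = [0.0] * len(c)
    for stage in range(1, max_stages + 1):
        new_actions = [best_response(i, actions, c, kappa) for i in range(len(c))]
        change = max(abs(new - old) for new, old in zip(new_actions, actions))
        actions = new_actions
        if change < tol:
            break
    return actions, stage


def exact_nash(c, kappa):
    """Closed-form Nash equilibrium of the toy game (used only for checking)."""
    n, total = len(c), sum(c)
    return [((n - 1) * ci + kappa * total) / ((n - 1) * (1.0 + kappa) + kappa)
            for ci in c]


def exploitability(actions, c, kappa):
    """Largest cost reduction any single player can obtain by deviating alone;
    the profile is an epsilon-Nash equilibrium for every epsilon >= this value."""
    gains = []
    for i in range(len(c)):
        br = best_response(i, actions, c, kappa)
        gains.append(cost(i, actions[i], actions, c, kappa)
                     - cost(i, br, actions, c, kappa))
    return max(0.0, max(gains))  # clip tiny negative values caused by rounding


# Usage: N = 10 players with illustrative targets spread over [-1, 1].
c = [-1.0 + 2.0 * k / 9 for k in range(10)]
kappa = 1.0
approx, stages = fictitious_play(c, kappa)
exact = exact_nash(c, kappa)
print(stages, max(abs(a - b) for a, b in zip(approx, exact)),
      exploitability(approx, c, kappa))
```

In this toy game the best-response map is a contraction with factor $ \kappa/(1+\kappa) $ in the sup norm, so the iteration converges. Convergence of DFP for stochastic differential games needs its own assumptions and does not follow from this case.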
Figure 1. A sample path for all $ N = 10 $ players in the inter-bank game, obtained by decoupling the problem via policy update and solving the sub-problems with the deep BSDE method. Top: the optimal state process $ X_t^i $ (solid lines) and its neural network approximation $ \hat{X}_t^i $ (circles), under the same realized path of Brownian motion. Bottom: comparisons of the strategies $ \alpha_t^i $ and $ \hat{\alpha}_t^i $ (dashed lines).
[1] A. Angiuli, J.-P. Fouque and M. Laurière, Unified reinforcement Q-learning for mean field game and control problems, arXiv: 2006.13912, 2020.
[2] M. Arjovsky, S. Chintala and L. Bottou, Wasserstein generative adversarial networks, In Proceedings of the 34th International Conference on Machine Learning, volume 70 of PMLR, 2017, 214-223.
[3] R. Arora, A. Basu, P. Mianjy and A. Mukherjee, Understanding deep neural networks with rectified linear units, arXiv preprint, arXiv: 1611.01491, 2016.
[4] E. Bayraktar, A. Budhiraja and A. Cohen, A numerical scheme for a mean field game in some queueing systems based on Markov chain approximation method, SIAM J. Control Optim., 56 (2018), 4017-4044. doi: 10.1137/17M1154357.
[5] C. Beck, S. Becker, P. Cheridito, A. Jentzen and A. Neufeld, Deep splitting method for parabolic PDEs, SIAM J. Sci. Comput., 43 (2021), A3135-A3154. doi: 10.1137/19M1297919.
[6] C. Beck, W. E and A. Jentzen, Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations, J. Nonlinear Sci., 29 (2019), 1563-1619. doi: 10.1007/s00332-018-9525-3.
[7] A. Bensoussan, C. C. Siu, S. C. P. Yam and H. Yang, A class of non-zero-sum stochastic differential investment and reinsurance games, Automatica J. IFAC, 50 (2014), 2025-2037. doi: 10.1016/j.automatica.2014.05.033.
[8] U. Berger, Fictitious play in 2 × n games, J. Econom. Theory, 120 (2005), 139-154. doi: 10.1016/j.jet.2004.02.003.
[9] H. Brezis, Functional Analysis, Sobolev Spaces and Partial Differential Equations, Universitext, Springer, New York, 2011.
[10] A. Briani and P. Cardaliaguet, Stable solutions in potential mean field game systems, NoDEA Nonlinear Differential Equations Appl., 25 (2018), Paper No. 1, 26 pp. doi: 10.1007/s00030-017-0493-3.
[11] G. W. Brown, Some Notes on Computation of Games Solutions, Technical report, Rand Corp Santa Monica CA, 1949.
[12] G. W. Brown, Iterative solution of games by fictitious play, Activity Analysis of Production and Allocation, 13 (1951), 374-376.
[13] P. Cardaliaguet and S. Hadikhanloo, Learning in mean field games: The fictitious play, ESAIM Control Optim. Calc. Var., 23 (2017), 569-591. doi: 10.1051/cocv/2016004.
[14] P. Cardaliaguet and C.-A. Lehalle, Mean field game of controls and an application to trade crowding, Math. Financ. Econ., 12 (2018), 335-363. doi: 10.1007/s11579-017-0206-z.
[15] R. Carmona and F. Delarue, Probabilistic Theory of Mean Field Games with Applications I-II, Springer, 2018.
[16] R. Carmona, J.-P. Fouque and L.-H. Sun, Mean field games and systemic risk, Commun. Math. Sci., 13 (2015), 911-933. doi: 10.4310/CMS.2015.v13.n4.a4.
[17] P. Casgrain, B. Ning and S. Jaimungal, Deep Q-learning for Nash equilibria: Nash-DQN, arXiv: 1904.10554, 2019.
[18] S. Chen, H. Yang and Y. Zeng, Stochastic differential games between two insurers with generalized mean-variance premium principle, Astin Bull., 48 (2018), 413-434. doi: 10.1017/asb.2017.35.
[19] E. J. Dockner, S. Jørgensen, N. V. Long and G. Sorger, Differential Games in Economics and Management Science, Cambridge University Press, 2000. doi: 10.1017/CBO9780511805127.
[20] W. E, J. Han and A. Jentzen, Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations, Commun. Math. Stat., 5 (2017), 349-380. doi: 10.1007/s40304-017-0117-6.
[21] N. El Karoui, S. Peng and M. C. Quenez, Backward stochastic differential equations in finance, Math. Finance, 7 (1997), 1-71. doi: 10.1111/1467-9965.00022.
[22] R. Elie, J. Pérolat, M. Laurière, M. Geist and O. Pietquin, On the convergence of model free learning in mean field games, AAAI-20 Technical Tracks 5, Vol. 34, 2020. arXiv: 1907.02633. doi: 10.1609/aaai.v34i05.6203.
[23] M. Fazlyab, A. Robey, H. Hassani, M. Morari and G. Pappas, Efficient and accurate estimation of Lipschitz constants for deep neural networks, In Advances in Neural Information Processing Systems, (2019), 11427-11438.
[24] M. Germain, H. Pham and X. Warin, Deep backward multistep schemes for nonlinear PDEs and approximation error analysis, arXiv preprint, arXiv: 2006.01496, 2020.
[25] D. A. Gomes, S. Patrizi and V. Voskanyan, On the existence of classical solutions for stationary extended mean field games, Nonlinear Anal., 99 (2014), 49-79. doi: 10.1016/j.na.2013.12.016.
[26] D. A. Gomes and V. K. Voskanyan, Extended deterministic mean-field games, SIAM J. Control Optim., 54 (2016), 1030-1055. doi: 10.1137/130944503.
[27] A. Gosavi, A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis, Machine Learning, 55 (2004), 5-29.
[28] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin and A. C. Courville, Improved training of Wasserstein GANs, In Advances in Neural Information Processing Systems, (2017), 5767-5777.
[29] X. Guo, A. Hu, R. Xu and J. Zhang, Learning mean-field games, Advances in Neural Information Processing Systems, 32 (2019), 4966-4976.
[30] J. Han and W. E, Deep learning approximation for stochastic control problems, arXiv: 1611.07422, 2016.
[31] J. Han and R. Hu, Deep fictitious play for finding Markovian Nash equilibrium in multi-agent games, Proceedings of The First Mathematical and Scientific Machine Learning Conference (MSML), 107 (2020), 221-245.
[32] J. Han, A. Jentzen and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. USA, 115 (2018), 8505-8510. doi: 10.1073/pnas.1718942115.
[33] J. Han and J. Long, Convergence of the deep BSDE method for coupled FBSDEs, Probab. Uncertain. Quant. Risk, 5 (2020), Paper No. 5, 33 pp. doi: 10.1186/s41546-020-00047-w.
[34] J. Han, J. Lu and M. Zhou, Solving high-dimensional eigenvalue problems using deep neural networks: A diffusion Monte Carlo like approach, J. Comput. Phys., 423 (2020), 109792, 13 pp. doi: 10.1016/j.jcp.2020.109792.
[35] J. Han, L. Zhang and W. E, Solving many-electron Schrödinger equation using deep neural networks, J. Comput. Phys., 399 (2019), 108929, 8 pp. doi: 10.1016/j.jcp.2019.108929.
[36] J. Hofbauer and W. H. Sandholm, On the global convergence of stochastic fictitious play, Econometrica, 70 (2002), 2265-2294.
[37] U. Horst, Stability of linear stochastic difference equations in strategically controlled random environments, Adv. in Appl. Probab., 35 (2003), 961-981. doi: 10.1239/aap/1067436330.
[38] U. Horst, Stationary equilibria in discounted stochastic games with weakly interacting players, Games Econom. Behav., 51 (2005), 83-108. doi: 10.1016/j.geb.2004.03.003.
[39] R. A. Howard, Dynamic Programming and Markov Processes, John Wiley, 1960.
[40] R. Hu, Deep learning for ranking response surfaces with applications to optimal stopping problems, Quant. Finance, 20 (2020), 1567-1581. doi: 10.1080/14697688.2020.1741669.
[41] R. Hu, Deep fictitious play for stochastic differential games, Commun. Math. Sci., 19 (2021), 325-353. doi: 10.4310/CMS.2021.v19.n2.a2.
[42] C. Huré, H. Pham and X. Warin, Deep backward schemes for high-dimensional nonlinear PDEs, Math. Comp., 89 (2020), 1547-1579. doi: 10.1090/mcom/3514.
[43] S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, In International Conference on Machine Learning, (2015), 448-456.
[44] R. Isaacs, Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization, John Wiley & Sons, Inc., New York-London-Sydney, 1965.
[45] S. Ji, S. Peng, Y. Peng and X. Zhang, Three algorithms for solving high-dimensional fully-coupled FBSDEs through deep learning, IEEE Intelligent Systems, 35 (2020), 71-84. doi: 10.1109/MIS.2020.2971597.
[46] D. Kingma and J. Ba, Adam: A method for stochastic optimization, In Proceedings of the International Conference on Learning Representations, 2015.
[47] P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations, volume 23, Springer-Verlag, Berlin, 1992. doi: 10.1007/978-3-662-12616-5.
[48] V. Krishna and T. Sjöström, On the convergence of fictitious play, Math. Oper. Res., 23 (1998), 479-511. doi: 10.1287/moor.23.2.479.
[49] H. Liu, H. Qiao, S. Wang and Y. Li, Platform competition in peer-to-peer lending considering risk control ability, European J. Oper. Res., 274 (2019), 280-290. doi: 10.1016/j.ejor.2018.09.024.
[50] N. V. Long, Dynamic games in the economics of natural resources: A survey, Dyn. Games Appl., 1 (2011), 115-148. doi: 10.1007/s13235-010-0003-2.
[51] J. Ma, P. Protter and J. Yong, Solving forward-backward stochastic differential equations explicitly-a four step scheme, Probab. Theory Related Fields, 98 (1994), 339-359. doi: 10.1007/BF01192258.
[52] J. Ma and J. Zhang, Representation theorems for backward stochastic differential equations, Ann. Appl. Probab., 12 (2002), 1390-1418. doi: 10.1214/aoap/1037125868.
[53] E. J. McShane, Extension of range of functions, Bull. Amer. Math. Soc., 40 (1934), 837-842. doi: 10.1090/S0002-9904-1934-05978-0.
[54] P. Milgrom and J. Roberts, Adaptive and sophisticated learning in normal form games, Games Econom. Behav., 3 (1991), 82-100. doi: 10.1016/0899-8256(91)90006-Z.
[55] D. Monderer and L. S. Shapley, Fictitious play property for games with identical interests, J. Econom. Theory, 68 (1996), 258-265. doi: 10.1006/jeth.1996.0014.
[56] T. Nakamura-Zimmerer, Q. Gong and W. Kang, Adaptive deep learning for high dimensional Hamilton-Jacobi-Bellman equations, SIAM J. Sci. Comput., 43 (2021), A1221-A1247. doi: 10.1137/19M1288802.
[57] É. Pardoux and S. Peng, Backward stochastic differential equations and quasilinear parabolic partial differential equations, in Stochastic Partial Differential Equations and their Applications, Springer, 1992, 200-217. doi: 10.1007/BFb0007334.
[58] E. Pardoux and S. Tang, Forward-backward stochastic differential equations and quasilinear parabolic PDEs, Probab. Theory Related Fields, 114 (1999), 123-150. doi: 10.1007/s004409970001.
[59] P. Pauli, A. Koch, J. Berberich, P. Kohler and F. Allgöwer, Training robust neural networks using Lipschitz bounds, 2021 American Control Conference (ACC), (2021), 2595-2600. doi: 10.23919/ACC50511.2021.9482773.
[60] D. Pfau, J. S. Spencer, A. G. D. G. Matthews and W. M. C. Foulkes, Ab-initio solution of the many-electron Schrödinger equation with deep neural networks, Phys. Rev. Research, 2 (2020), 033429. doi: 10.1103/PhysRevResearch.2.033429.
[61] H. Pham, X. Warin and M. Germain, Neural networks-based backward scheme for fully nonlinear PDEs, Partial Differ. Equ. Appl., 2 (2021), Paper No. 16, 24 pp. doi: 10.1007/s42985-020-00062-8.
[62] W. B. Powell and J. Ma, A review of stochastic algorithms with continuous value function approximation and some new approximate policy iteration algorithms for multidimensional continuous applications, J. Control Theory Appl., 9 (2011), 336-352. doi: 10.1007/s11768-011-0313-y.
[63] A. Prasad and S. P. Sethi, Competitive advertising under uncertainty: A stochastic differential game approach, J. Optim. Theory Appl., 123 (2004), 163-185. doi: 10.1023/B:JOTA.0000043996.62867.20.
[64] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, John Wiley & Sons, 1994.
[65] S. Cacace, F. Camilli and A. Goffi, A policy iteration method for mean field games, ESAIM Control Optim. Calc. Var., 27 (2021).
[66] J. Sirignano and K. Spiliopoulos, DGM: A deep learning algorithm for solving partial differential equations, J. Comput. Phys., 375 (2018), 1339-1364. doi: 10.1016/j.jcp.2018.08.029.
[67] Z. Wei and M. Lin, Market mechanisms in online peer-to-peer lending, Management Science, 63 (2017), 4236-4257. doi: 10.1287/mnsc.2016.2531.
[68] Y. Xuan, R. Balkin, J. Han, R. Hu and H. D. Ceniceros, Optimal policies for a pandemic: A stochastic game approach and a deep learning algorithm, Proceedings of The Second Mathematical and Scientific Machine Learning Conference (MSML), 145 (2022), 987-1012.
[69] B. Yu, X. Xing and A. Sudjianto, Deep-learning based numerical BSDE method for barrier options, Available at SSRN, arXiv: 1904.05921, 2019. doi: 10.2139/ssrn.3366314.
[70] X. Zeng, A stochastic differential reinsurance game, J. Appl. Probab., 47 (2010), 335-349. doi: 10.1239/jap/1276784895.
[71] J. Zhang, Backward Stochastic Differential Equations: From Linear to Fully Nonlinear Theory, Springer, 2017. doi: 10.1007/978-1-4939-7256-2.