
Mean-field and kinetic descriptions of neural differential equations

Corresponding author: Giuseppe Visconti

Abstract
Nowadays, neural networks are widely used as artificial intelligence models for learning tasks in many applications. Since neural networks typically process very large amounts of data, it is convenient to formulate them within mean-field and kinetic theory. In this work we focus on a particular class of neural networks, namely residual neural networks, assuming that each layer is characterized by the same number of neurons $ N $, which is fixed by the dimension of the data. This assumption allows us to interpret the residual neural network as a time-discretized ordinary differential equation, in analogy with neural differential equations. The mean-field description is then obtained in the limit of infinitely many input data. This leads to a Vlasov-type partial differential equation which describes the evolution of the distribution of the input data. We analyze steady states and the sensitivity with respect to the parameters of the network, namely the weights and the bias. In the simple setting of a linear activation function and one-dimensional input data, the study of the moments provides insight into the choice of the parameters of the network. Furthermore, a modification of the microscopic dynamics, inspired by stochastic residual neural networks, leads to a Fokker-Planck formulation of the network, in which the concept of network training is replaced by the task of fitting distributions. The analysis is validated by numerical simulations on artificial data. In particular, results on classification and regression problems are presented.

    Mathematics Subject Classification: Primary: 35Q83, 35Q84; Secondary: 90C31, 92B20.
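To make the continuous-time reading of a residual network concrete, the following display is a brief sketch of the construction described in the abstract; the forward-Euler scaling of the residual update and the exact form of the transport field are assumptions made here for illustration, not a quotation of the paper.

$$ x_i^{k+1} = x_i^k + h\,\sigma\!\left(w^k x_i^k + b^k\right), \qquad i = 1, \dots, M, $$

where $ x_i^k $ denotes the state of the $ i $-th input datum at layer $ k $ and $ h $ plays the role of a time step. Letting $ h \to 0 $ with $ t = kh $ yields the neural differential equation

$$ \dot{x}_i(t) = \sigma\!\left(w(t)\, x_i(t) + b(t)\right), $$

and, in the limit of infinitely many input data $ M \to \infty $, the distribution $ f(t, x) $ of the states formally satisfies the Vlasov-type transport equation

$$ \partial_t f(t, x) + \partial_x\!\left(\sigma(w(t)\, x + b(t))\, f(t, x)\right) = 0. $$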

Figure 1.  Left: Moments of our PDE model with $ \sigma(x) = x, w = -1, b = 0 $. Right: Moments of our PDE model with $ \sigma(x) = x, w = -1, b = -\frac{m_1(t)}{m_0(0)} $

    Figure 2.  Left: The energy and variance plotted against the desired values with $ \sigma(x) = x, w = -1, b = 0 $. Right: The energy and variance plotted against the desired values with $ \sigma(x) = x, w = -1, b = -\frac{m_1(t)}{m_0(0)} $
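For the parameter choices in Figures 1 and 2, the qualitative behaviour of the moments can be read off the transport equation sketched above; the following derivation is our reconstruction for the linear activation $ \sigma(x) = x $, not a quotation of the paper. Writing $ m_k(t) = \int x^k f(t, x)\, dx $, multiplying the equation by $ x^k $ and integrating by parts gives

$$ \dot{m}_0 = 0, \qquad \dot{m}_1 = w\, m_1 + b\, m_0, \qquad \dot{m}_2 = 2\left(w\, m_2 + b\, m_1\right). $$

For $ w = -1 $, $ b = 0 $ the mean and the energy decay exponentially to zero, while the time-dependent bias $ b = -\frac{m_1(t)}{m_0(0)} $ gives $ \dot{m}_1 = -2 m_1 $ (using that $ m_0 $ is conserved), i.e. the mean is driven to zero at twice the rate; this is the kind of moment steering explored in the two figures.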

Figure 3.  We consider $ 50 $ vehicles whose measured lengths lie between $ 2 $ and $ 8 $, obtained as uniformly distributed random realizations. Left: Histogram of the measured lengths of the vehicles. Right: Trajectories of the neuron activation energies of the $ 50 $ measurements

Figure 4.  Solution of the mean field neural network model at different time steps. The initial value is a uniform distribution on $ [2, 8] $ and the weight and bias are chosen as $ w = 1, \ b = -5 $
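The classification experiment of Figures 3 and 4 can be mimicked at the microscopic level. The sketch below is a hypothetical reproduction, assuming the forward-Euler dynamics $ \dot{x}_i = \sigma(w x_i + b) $ with $ \sigma = \tanh $, a step size of $ 0.05 $ and $ 200 $ layers (none of these choices are taken from the paper); it illustrates the separation mechanism around the threshold length $ 5 $ rather than the paper's exact setup.

```python
import numpy as np

# Hypothetical sketch of the dynamics behind Figures 3-4: 50 vehicle lengths,
# uniformly distributed on [2, 8], evolved layer by layer with the residual
# (forward-Euler) update  x <- x + h * sigma(w*x + b),  w = 1, b = -5.
# The activation sigma = tanh, the step size h and the number of layers are
# assumptions made for this illustration.
rng = np.random.default_rng(0)
x = rng.uniform(2.0, 8.0, size=50)      # measured lengths (input data)
w, b = 1.0, -5.0                        # weight and bias as in Figure 4
h, n_layers = 0.05, 200                 # assumed step size and depth

traj = [x.copy()]
for _ in range(n_layers):
    x = x + h * np.tanh(w * x + b)      # residual update / forward-Euler step
    traj.append(x.copy())

# Lengths below 5 drift downwards ("car"), lengths above 5 drift upwards ("truck").
labels = np.where(traj[-1] > traj[0], "truck", "car")
print(dict(zip(np.round(traj[0][:6], 2), labels[:6])))
```

Plotting the rows of `traj` against the layer index produces the kind of separating trajectories shown in the right panel of Figure 3.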

Figure 5.  Left: Regression problem with $ 5\cdot10^3 $ measurements at fixed positions around $ y = x $. Measurement errors are distributed according to a standard Gaussian. Center: Numerical slopes computed from the previous measurements. Right: Numerical intercepts computed from the previous measurements

    Figure 6.  Evolution at time $ t = 0 $ (left plot), $ t = 1 $ (center plot), $ t = 2 $ (right plot) of the mean field neural network model (30) for the regression problem with weights $ w_{xx} = 1 $, $ w_{xy} = w_{yx} = 0 $, $ w_{yy} = -1 $, and biases $ b_x = -1 $, $ b_y = 0 $

Figure 7.  Evolution at time $ t = 0 $ (left plot), $ t = 1 $ (center plot), $ t = 5 $ (right plot) of the one-dimensional mean field neural network model for the regression problem with weight $ w = 1 $ and bias $ b = -1 $

    Figure 8.  Results of the mean field neural network model with updated weights and biases in the case of a novel target

    Figure 9.  Solution of the Fokker-Planck neural network model at different times. Here, we have chosen the identity as activation function with weight $ w = -1 $, bias $ b = 0 $ and diffusion function $ K(x) = 1 $
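Figure 9 refers to the Fokker-Planck formulation mentioned in the abstract. As a hedged reconstruction, assuming the drift is the same transport field as above and that the diffusion enters through the function $ K(x) $ named in the caption, the one-dimensional equation may be written as

$$ \partial_t f(t, x) + \partial_x\!\left(\sigma(w x + b)\, f(t, x)\right) = \frac{1}{2}\, \partial_{xx}\!\left(K(x)^2\, f(t, x)\right), $$

where the factor $ \frac{1}{2} $ and the precise placement of $ K $ are assumptions. With the parameters of Figure 9 ($ \sigma(x) = x $, $ w = -1 $, $ b = 0 $, $ K \equiv 1 $) this reduces to an Ornstein-Uhlenbeck equation, so instead of concentrating, the solution relaxes towards a centred Gaussian steady state whose variance is set by the diffusion scaling.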

    Table 1.  Example of a data set for a classification problem

Measurement | 3   | 3.5 | 5.5   | 7     | 4.5 | 8     | $ \dots$
Classifier  | car | car | truck | truck | car | truck | $ \dots$
