In recent years, residual neural networks (ResNets), as introduced by He et al. [17], have become very popular in a large number of applications, including image classification and segmentation. They provide a new perspective on training very deep neural networks without suffering from the vanishing gradient problem. In this article we show that ResNets are able to approximate solutions of Kolmogorov partial differential equations (PDEs) with constant diffusion and possibly nonlinear drift coefficients without suffering the curse of dimensionality, that is, the number of parameters of the approximating ResNets grows at most polynomially in the reciprocal of the approximation accuracy $ \varepsilon > 0 $ and in the dimension $ d\in \mathbb{N} $ of the considered PDE. We adapt a proof of Jentzen et al. [20], who showed a similar result for feedforward neural networks (FNNs), to ResNets. In contrast to FNNs, the Euler-Maruyama approximation structure of ResNets simplifies the construction of the approximating ResNets substantially. Moreover, contrary to [20], our proof for ResNets does not require the existence of an FNN representing the identity map, which enlarges the set of admissible activation functions.
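To make the structural parallel mentioned above concrete, the following display sketches the link between the Euler-Maruyama scheme and residual blocks; the notation is ours, chosen for illustration, and is not taken verbatim from the article or from [20]. For a Kolmogorov PDE with constant diffusion $ \sigma \in \mathbb{R}^{d\times d} $ and (possibly nonlinear) drift $ \mu\colon \mathbb{R}^d \to \mathbb{R}^d $,
$$ \tfrac{\partial u}{\partial t}(t,x) = \tfrac{1}{2}\operatorname{Trace}\!\big(\sigma\sigma^{*}(\operatorname{Hess}_x u)(t,x)\big) + \big\langle \mu(x), (\nabla_x u)(t,x)\big\rangle, \qquad u(0,x) = \varphi(x), $$
the Feynman-Kac formula represents the solution as $ u(t,x) = \mathbb{E}\big[\varphi(X_t^x)\big] $, where $ dX_t^x = \mu(X_t^x)\,dt + \sigma\,dW_t $ and $ X_0^x = x $. An Euler-Maruyama discretization of this SDE with step size $ h = T/N $ reads
$$ \mathcal{X}_{k+1} = \mathcal{X}_k + \mu(\mathcal{X}_k)\,h + \sigma\,\big(W_{(k+1)h} - W_{kh}\big), $$
which has exactly the additive "identity plus update" form $ x_{k+1} = x_k + f_k(x_k) $ of a residual block. Replacing $ \mu $ by an approximating subnetwork in each step therefore yields a ResNet directly, with the identity map supplied by the skip connection rather than having to be represented by an FNN.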
References:
[1] M. Asaduzzaman, M. Shahjahan and K. Murase, Faster training using fusion of activation functions for feed forward neural networks, Int. J. Neural Syst., 19 (2009), 437-448. doi: 10.1142/S0129065709002130.
[2] B. Avelin and K. Nyström, Neural ODEs as the deep limit of ResNets with constant weights, Anal. Appl., 19 (2021), 397-437. doi: 10.1142/S0219530520400023.
[3] C. Beck, S. Becker, P. Grohs, N. Jaafari and A. Jentzen, Solving the Kolmogorov PDE by means of deep learning, J. Sci. Comput., 88 (2021), Paper No. 73, 28 pp. doi: 10.1007/s10915-021-01590-0.
[4] C. Beck, W. E and A. Jentzen, Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations, J. Nonlinear Sci., 29 (2019), 1563-1619. doi: 10.1007/s00332-018-9525-3.
[5] W. E, C. Ma and Q. Wang, A priori estimates of the population risk for residual networks, arXiv: 1903.02154, (2019), 19 pages.
[6] W. E and B. Yu, The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems, Commun. Math. Stat., 6 (2018), 1-12. doi: 10.1007/s40304-018-0127-z.
[7] D. Elbrächter, P. Grohs, A. Jentzen and C. Schwab, DNN expression rate analysis of high-dimensional PDEs: Application to option pricing, Constr. Approx., 55 (2022), 3-71. doi: 10.1007/s00365-021-09541-6.
[8] R. Gribonval, G. Kutyniok, M. Nielsen and F. Voigtlaender, Approximation spaces of deep neural networks, Constr. Approx., 55 (2022), 259-367. doi: 10.1007/s00365-021-09543-4.
[9] P. Grohs and L. Herrmann, Deep neural network approximation for high-dimensional elliptic PDEs with boundary conditions, IMA J. Numer. Anal., 42 (2022), 2055-2082. doi: 10.1093/imanum/drab031.
[10] P. Grohs, F. Hornung, A. Jentzen and P. von Wurstemberger, A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations, accepted in Mem. Amer. Math. Soc., arXiv: 1809.02362, (2018), 124 pages.
[11] P. Grohs, F. Hornung, A. Jentzen and P. Zimmermann, Space-time error estimates for deep neural network approximations for differential equations, revision requested from Adv. Comput. Math., arXiv: 1908.03833, (2019), 86 pages.
[12] P. Grohs, A. Jentzen and D. Salimova, Deep neural network approximations for solutions of PDEs based on Monte Carlo algorithms, Partial Differ. Equ. Appl., 3 (2022), Paper No. 45, 41 pp. doi: 10.1007/s42985-021-00100-z.
[13] I. Gruber, M. Hlaváč, M. Železný and A. Karpov, Facing face recognition with ResNet: Round one, Interactive Collaborative Robotics, 10459 (2017), 67-74. doi: 10.1007/978-3-319-66471-2_8.
[14] M. Hairer, M. Hutzenthaler and A. Jentzen, Loss of regularity for Kolmogorov equations, Ann. Probab., 43 (2015), 468-527. doi: 10.1214/13-AOP838.
[15] J. Han, A. Jentzen and W. E, Solving high-dimensional partial differential equations using deep learning, Proc. Natl. Acad. Sci. USA, 115 (2018), 8505-8510. doi: 10.1073/pnas.1718942115.
[16] K. He, X. Zhang, S. Ren and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification, In Proceedings of the IEEE International Conference on Computer Vision, (2015), 1026-1034.
[17] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 770-778.
[18] K. He, X. Zhang, S. Ren and J. Sun, Identity mappings in deep residual networks, In European Conference on Computer Vision, (2016), 630-645.
[19] F. Hornung, A. Jentzen and D. Salimova, Space-time deep neural network approximations for high-dimensional partial differential equations, arXiv: 2006.02199, (2020), 52 pages.
[20] A. Jentzen, D. Salimova and T. Welti, A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients, Commun. Math. Sci., 19 (2021), 1167-1205. doi: 10.4310/CMS.2021.v19.n5.a1.
[21] A. Kolmogoroff, Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung, Math. Ann., 104 (1931), 415-458. doi: 10.1007/BF01457949.
[22] N. V. Krylov, M. Röckner and J. Zabczyk, Stochastic PDE's and Kolmogorov Equations in Infinite Dimensions, vol. 1715 of Lecture Notes in Mathematics, Springer-Verlag, Berlin; Centro Internazionale Matematico Estivo (C.I.M.E.), Florence, 1999. doi: 10.1007/BFb0092416.
[23] G. Kutyniok, P. Petersen, M. Raslan and R. Schneider, A theoretical analysis of deep neural networks and parametric PDEs, Constr. Approx., 55 (2022), 73-125. doi: 10.1007/s00365-021-09551-4.
[24] J. Müller, On the space-time expressivity of ResNets, In ICLR 2020 Workshop on Integration of Deep Neural Models and Differential Equations, (2020).
[25] E. Orhan and X. Pitkow, Skip connections eliminate singularities, In International Conference on Learning Representations, (2018).
[26] P. K. Panigrahi, S. Ghosh and D. R. Parhi, Navigation of autonomous mobile robot using different activation functions of wavelet neural network, Arch. Control Sci., 25 (2015), 21-34. doi: 10.1515/acsc-2015-0002.
[27] P. Petersen and F. Voigtlaender, Optimal approximation of piecewise smooth functions using deep ReLU neural networks, Neural Networks, 108 (2018), 296-330.
[28] T. Qin, K. Wu and D. Xiu, Data driven governing equations approximation using deep neural networks, J. Comput. Phys., 395 (2019), 620-635. doi: 10.1016/j.jcp.2019.06.042.
[29] P. Ramachandran, B. Zoph and Q. V. Le, Searching for activation functions, arXiv: 1710.05941, (2017), 13 pages.
[30] C. Szegedy, S. Ioffe, V. Vanhoucke and A. A. Alemi, Inception-v4, Inception-ResNet and the impact of residual connections on learning, In Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI'17), AAAI Press, (2017), 4278-4284.
[31] Y. Wang, Y. Li, Y. Song and X. Rong, The influence of the activation function in a convolution neural network model of facial expression recognition, Appl. Sci., 10 (2020), 1897.
[32] Z. Wu, C. Shen and A. van den Hengel, Wider or deeper: Revisiting the ResNet model for visual recognition, Pattern Recognition, 90 (2019), 119-133.
[33] K. Zhang, M. Sun, T. X. Han, X. Yuan, L. Guo and T. Liu, Residual networks of residual networks: Multilevel residual networks, IEEE Transactions on Circuits and Systems for Video Technology, 28 (2017), 1303-1314.
Figure: Realization of an FNN.
Figure: Realization of a ResNet.
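As a hedged sketch of what these two figures depict (the precise definitions and notation below are ours, modeled on standard conventions such as those in [20], and may differ in detail from the article): for an activation function $ a\colon \mathbb{R} \to \mathbb{R} $ applied componentwise, weight matrices $ W_k $ and bias vectors $ b_k $, the realization of an FNN composes affine maps with activations,
$$ x_{k+1} = a\big(W_k x_k + b_k\big), \qquad k = 0, 1, \dots, L-1, $$
whereas the realization of a ResNet adds a skip connection around each such block,
$$ x_{k+1} = x_k + a\big(W_k x_k + b_k\big), \qquad k = 0, 1, \dots, L-1, $$
so the identity map is built into the architecture itself and need not be represented by the weights of the network.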