
Deep Learning approximation of diffeomorphisms via linear-control systems

* Corresponding author: Alessandro Scagliotti

The first author is partially supported by INdAM–GNAMPA.

Abstract

In this paper we propose a Deep Learning architecture to approximate diffeomorphisms diffeotopic to the identity. We consider a control system of the form $ \dot x = \sum_{i = 1}^{l} F_i(x)u_i $, which depends linearly on the controls, and we use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points. Despite the simplicity of the control system, it has recently been shown that a Universal Approximation Property holds. The problem of minimizing the sum of the training error and a regularizing term induces a gradient flow in the space of admissible controls. A possible training procedure for the discrete-time neural network consists in projecting this gradient flow onto a finite-dimensional subspace of the admissible controls. An alternative approach relies on an iterative method, based on the Pontryagin Maximum Principle, for the numerical resolution of Optimal Control problems; here the maximization of the Hamiltonian can be carried out with extremely low computational effort, owing to the linear dependence of the system on the control variables. Finally, we use tools from $ \Gamma $-convergence to provide an estimate of the expected generalization error.
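As an illustration of the discrete-time architecture described above, the following minimal sketch discretizes the flow of $ \dot x = \sum_{i = 1}^{l} F_i(x)u_i $ by explicit Euler steps with piecewise-constant controls (one control vector per layer) and trains the controls by gradient descent on the regularized training error. The vector fields, the toy target map, and all hyperparameters are illustrative assumptions, not the choices made in the paper.

```python
# Minimal sketch (not the authors' code): Euler discretization of the
# linear-control flow, trained by gradient descent on
# (mean squared training error) + beta * ||controls||^2.
import jax
import jax.numpy as jnp

d, n_layers, h, beta, lr = 2, 16, 1.0 / 16, 1e-3, 1e-1

def fields(x):
    """Values F_1(x), ..., F_l(x) of the controlled vector fields on R^2.
    These particular fields are placeholders chosen only for the sketch."""
    x1, x2 = x
    return jnp.stack([
        jnp.array([1.0, 0.0]),            # constant field e_1
        jnp.array([0.0, 1.0]),            # constant field e_2
        jnp.array([jnp.tanh(x1), 0.0]),   # nonlinear field acting on x1
        jnp.array([0.0, jnp.tanh(x2)]),   # nonlinear field acting on x2
    ])                                    # shape (l, d)

def flow(u, x0):
    """Explicit Euler: one ResNet layer per time step,
    x_{k+1} = x_k + h * sum_i F_i(x_k) u_i[k]."""
    x = x0
    for k in range(n_layers):
        x = x + h * u[k] @ fields(x)
    return x

def loss(u, X0, Y):
    """Mean squared training error plus a quadratic penalty on the controls."""
    pred = jax.vmap(lambda x0: flow(u, x0))(X0)
    return jnp.mean(jnp.sum((pred - Y) ** 2, axis=1)) + beta * jnp.sum(u ** 2)

# Toy data: points of a grid and their images under a simple shear-like
# diffeomorphism standing in for the target map Psi.
key = jax.random.PRNGKey(0)
X0 = jax.random.uniform(key, (64, d), minval=-1.0, maxval=1.0)
Y = jnp.stack([X0[:, 0] + 0.3 * jnp.sin(X0[:, 1]), X0[:, 1]], axis=1)

u = jnp.zeros((n_layers, 4))              # 16 layers x 4 controls in this sketch
grad_loss = jax.jit(jax.grad(loss))
for it in range(200):
    u = u - lr * grad_loss(u, X0, Y)      # Euler step of the projected gradient flow
```

Gradient descent on the layer-wise control values is one concrete instance of projecting the gradient flow onto a finite-dimensional subspace of the admissible controls, namely the piecewise-constant ones.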

    Mathematics Subject Classification: Primary: 49M05, 49M25; Secondary: 68T07, 49J15.
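The low cost of the Hamiltonian maximization mentioned above can be made explicit under the assumption of a quadratic regularizing term $ \frac{\beta}{2}\int_0^1 |u(t)|^2\,\mathrm{d}t $ (an assumption consistent with the weight $ \beta $ appearing in the tables below; the paper's exact regularizer may differ). For a single trajectory with state $ x $ and covector $ p $, the Hamiltonian is concave and quadratic in the controls, so its maximizer is available in closed form:

\[
\mathcal{H}(x,p,u) \;=\; \Big\langle p,\ \sum_{i=1}^{l} F_i(x)\,u_i \Big\rangle \;-\; \frac{\beta}{2}\,|u|^{2},
\qquad
\frac{\partial \mathcal{H}}{\partial u_i} \;=\; \langle p, F_i(x)\rangle - \beta\, u_i \;=\; 0
\;\Longrightarrow\;
u_i^{\star} \;=\; \frac{1}{\beta}\,\langle p, F_i(x)\rangle .
\]

For an ensemble of $ M $ points the pairing is summed over the ensemble, $ u_i^{\star} = \frac{1}{\beta}\sum_{j = 1}^{M}\langle p^j, F_i(x^j)\rangle $ (up to the normalization of the training error), so each maximization reduces to $ l $ inner products per time step.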

    Figure 1.  On the left, the grid of points $ \{ x^1_0,\ldots,x^M_0 \} $ on which we evaluate the diffeomorphism $ \Psi: \mathbb{R}^2\to \mathbb{R}^2 $ defined in (53). On the right, the image of the training dataset under $ \Psi $.

    Figure 2.  ResNet (52), 16 layers, Algorithm 1, $ \beta = 10^{-4} $. On the top left we report the transformation of the initial grid under the approximating diffeomorphism (red circles) and under the original one (blue circles). On the top right we plot the predictions of the approximating diffeomorphism on the testing dataset (red crosses) and the correct values obtained through the original transformation (blue crosses). In both cases the approximation is unsatisfactory. At the bottom we plot the decrease of the training and testing errors versus the number of iterations; the magenta curve is the estimate of the generalization error provided by (35).

    Figure 3.  ResNet (55), 16 layers, Algorithm 1, $ \beta = 10^{-3} $. On the top left we report the transformation of the initial grid under the approximating diffeomorphism (red circles) and under the original one (blue circles). On the top right we plot the predictions of the approximating diffeomorphism on the testing dataset (red crosses) and the correct values obtained through the original transformation (blue crosses). In both cases the approximation is good, and we observe that it is better where the data density is higher. At the bottom we plot the decrease of the training and testing errors versus the number of iterations; the magenta curve is the estimate of the generalization error provided by (35).

    Table 1.  ResNet (52), $ 16 $ layers, $ 128 $ parameters, Algorithm 1. Running time $ \sim 160 $ s

    $ \beta $ $ L_{\Phi_u} $ Training error Testing error
    $ 10^0 $ $ 1.19 $ $ 3.8785 $ $ 3.8173 $
    $ 10^{-1} $ $ 8.40 $ $ 1.3143 $ $ 1.2476 $
    $ 10^{-2} $ $ 9.32 $ $ 1.1991 $ $ 1.1451 $
    $ 10^{-3} $ $ 9.37 $ $ 1.1852 $ $ 1.1330 $
    $ 10^{-4} $ $ 9.37 $ $ 1.1839 $ $ 1.1318 $

    Table 2.  ResNet (52), $ 16 $ layers, $ 128 $ parameters, Algorithm 2. Running time $ \sim 130 $ s

    $ \beta $ $ L_{\Phi_u} $ Training error Testing error
    $ 10^0 $ $ 1.19 $ $ 3.8749 $ $ 3.8157 $
    $ 10^{-1} $ $ 8.40 $ $ 1.3084 $ $ 1.2455 $
    $ 10^{-2} $ $ 9.32 $ $ 1.2014 $ $ 1.1486 $
    $ 10^{-3} $ $ 9.33 $ $ 1.1898 $ $ 1.1387 $
    $ 10^{-4} $ $ 9.33 $ $ 1.1898 $ $ 1.1379 $

    Table 3.  ResNet (52), $ 32 $ layers, $ 256 $ parameters, Algorithm 1. Running time $ \sim 320 $ s

    $ \beta $ $ L_{\Phi_u} $ Training error Testing error
    $ 10^0 $ $ 1.19 $ $ 3.8779 $ $ 3.8168 $
    $ 10^{-1} $ $ 8.40 $ $ 1.3074 $ $ 1.2425 $
    $ 10^{-2} $ $ 9.26 $ $ 1.2015 $ $ 1.1477 $
    $ 10^{-3} $ $ 9.34 $ $ 1.1860 $ $ 1.1352 $
    $ 10^{-4} $ $ 9.34 $ $ 1.1842 $ $ 1.1332 $

    Table 4.  ResNet (52), $ 32 $ layers, $ 256 $ parameters, Algorithm 2. Running time $ \sim 260 $ s

    $ \beta $ $ L_{\Phi_u} $ Training error Testing error
    $ 10^0 $ $ 1.19 $ $ 3.8739 $ $ 3.8148 $
    $ 10^{-1} $ $ 8.35 $ $ 1.3085 $ $ 1.2449 $
    $ 10^{-2} $ $ 9.23 $ $ 1.2075 $ $ 1.1538 $
    $ 10^{-3} $ $ 9.26 $ $ 1.1931 $ $ 1.1416 $
    $ 10^{-4} $ $ 9.26 $ $ 1.1918 $ $ 1.1404 $

    Table 5.  ResNet (55), $ 16 $ layers, $ 224 $ parameters, Algorithm 1. Running time $ \sim 320 $ s

    $ \beta $ $ L_{\Phi_u} $ Training error Testing error
    $ 10^0 $ $ 10.14 $ $ 2.3791 $ $ 2.3036 $
    $ 10^{-1} $ $ 13.84 $ $ 0.1809 $ $ 0.2314 $
    $ 10^{-2} $ $ 15.64 $ $ 0.1290 $ $ 0.1784 $
    $ 10^{-3} $ $ 15.83 $ $ 0.1254 $ $ 0.1747 $
    $ 10^{-4} $ $ 15.86 $ $ 0.1257 $ $ 0.1751 $

    Table 6.  ResNet (55), $ 16 $ layers, $ 224 $ parameters, Algorithm 2. Running time $ \sim 310 $ s

    $ \beta $ $ L_{\Phi_u} $ Training error Testing error
    $ 10^0 $ $ 10.78 $ $ 2.3638 $ $ 2.3910 $
    $ 10^{-1} $ $ 14.32 $ $ 0.1921 $ $ 0.2422 $
    $ 10^{-2} $ $ 15.43 $ $ 0.1887 $ $ 0.2347 $
    $ 10^{-3} $ $ 15.56 $ $ 0.2260 $ $ 0.2719 $
    $ 10^{-4} $ $ 15.59 $ $ 0.2127 $ $ 0.2564 $
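A consistency check on the captions above (an inference from the reported numbers, not a statement made in the tables themselves): with piecewise-constant controls carrying one value per controlled field per layer, $ 128 = 16\times 8 $ and $ 256 = 32\times 8 $ correspond to $ l = 8 $ fields for ResNet (52), while $ 224 = 16\times 14 $ corresponds to $ l = 14 $ fields for ResNet (55); the larger control family of Tables 5 and 6 is consistent with the markedly smaller training and testing errors reported there.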
