Article Contents
Article Contents

# Low-rank kernel approximation of Lyapunov functions using neural networks

• We study the use of single hidden layer neural networks for the approximation of Lyapunov functions in autonomous ordinary differential equations. In particular, we focus on the connection between this approach and that of the meshless collocation method using reproducing kernel Hilbert spaces. It is shown that under certain conditions, an optimised neural network is functionally equivalent to the RKHS generalised interpolant solution corresponding to a kernel function that is implicitly defined by the neural network. We demonstrate convergence of the neural network approximation using several numerical examples, and compare with approximations obtained by the meshless collocation method. Finally, motivated by our theoretical and numerical findings, we propose a new iterative algorithm for the approximation of Lyapunov functions using single hidden layer neural networks.

Mathematics Subject Classification: Primary: 37B25, 68T07; Secondary: 46E22.

 Citation:

• Figure 1.  Left: convergence of Lyapunov function approximation for generalised interpolant (kernel) and neural network models. The neural network converges more quickly with respect to $n$. Right: the RKHS norm of the generalised interpolant grows as the dataset size $n$ increases and the approximation improves, whereas the Barron norm of the neural network is bounded

Figure 2.  Contour plots showing the approximations of the Lyapunov function $V$ for the time-reversed van der Pol equations (29). On the left is the kernel approximation using the Gaussian RKHS $\mathcal{H}_{K_G}$. In the rightmost four plots are neural network approximations with fixed input-hidden layer weights, for increasing values of hidden width $m$. The approximation that achieves the lowest test error is the neural network with $m = 200$

Figure 3.  a) Training and test loss for the neural network approximations for increasing values of $m$. Small values of $m$ exhibit underfitting, whilst large values of $m$ (as well as the Gaussian kernel generalised interpolant) overfit. The best approximation was given by the neural network with $m = 200$. b) Training time for the networks increases significantly with $m$. This can be rectified by scaling the learning rate appropriately (see text for further details)

Figure 4.  Contour plots showing the Lyapunov function approximations for the neural network (21) for varying hidden widths $m$, where the parameters $({\bf{w}}_k, b_k)_{k = 1}^m$ are not trained. Top row: the network function approximations for a fixed dataset size $n = 90$ as in Section 4.2. Bottom row: network function approximations from training on randomly sampled minibatches

Figure 5.  Contour plots showing the kernels $K_{NN}$ learned by the neural networks for varying widths $m$. As $m$ increases, $K_{NN}$ converges to a fixed, translation-invariant kernel function

Figure 6.  Test loss vs network width $m$ for the networks obtained by Algorithm 1 for system (30), using the neural network model (21)

Figure 7.  Level sets of the Lyapunov function approximation for system (30), given by the neural network model (21) with $m = 102400$

Table 1.  Test error for network models of varying widths $m$, with a fixed dataset size as in Section 4.2, and an effectively infinite dataset size created through randomly sampled minibatches

 $m$ 40 80 200 400 800 1600 3200 Fixed dataset 2.229 0.959 0.772 0.855 0.684 0.949 1.612 Random minibatches 0.851 0.395 0.312 0.238 0.207 0.178 0.12

Table 2.  Test loss for network models of varying widths $m$, where the first layer weights $({\bf{w}}_k, b_k)_{k = 1}^m$ are untrained (fixed kernel $K_{NN}$) and trained (variable kernel $K_{NN}$)

 $m$ 40 80 200 400 800 1600 3200 Fixed $K_{NN}$ 0.851 0.417 0.312 0.238 0.207 0.178 0.12 Variable $K_{NN}$ 0.453 0.27 0.189 0.103 0.073 0.076 0.063
•  [1] H. Ban and W. D. Kalies, A computational approach to Conley's decomposition theorem, J. Comput. Nonlinear Dynam., 1 (2006), 312-319.  doi: 10.1115/1.2338651. [2] A. R. Barron, Universal approximation bounds for superpositions of a sigmoidal function, IEEE Trans. Inf. Theor., 39 (1993), 930-945.  doi: 10.1109/18.256500. [3] N. P. Bhatia, On asymptotic stability in dynamical systems, Mathematical Systems Theory, 1 (1967), 113-127.  doi: 10.1007/BF01705521. [4] N. P. Bhatia and G. P. Szegö, Stability Theory of Dynamical Systems, Classics in Mathematics, Springer-Verlag Berlin Heidelberg, 1970. [5] A. Caponnetto and E. De Vito, Optimal rates for the regularized least-squares algorithm, Foundations of Computational Mathematics, 7 (2007), 331-368.  doi: 10.1007/s10208-006-0196-8. [6] F. Cucker and S. Smale, On the mathematical foundations of learning, Bulletin of the American Mathematical Society, 39 (2002), 1-49.  doi: 10.1090/S0273-0979-01-00923-5. [7] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals, and Systems (MCSS), 2 (1989), 303-314.  doi: 10.1007/BF02551274. [8] U. Dini, Fondamenti per la Teoria Delle Funzioni di Variabili Reali, (In Italian), Pisa, 1878. [9] W. E, C. Ma and L. Wu, Barron spaces and the compositional function spaces for neural network models, Constr. Approx., 55 (2022), 369–406. arXiv: 1906.08039, (2019). doi: 10.1007/s00365-021-09549-y. [10] W. E, C. Ma and L. Wu, A priori estimates of the population risk for two-layer neural networks, Communications in Mathematical Sciences, 17 (2019), 1407-1425.  doi: 10.4310/CMS.2019.v17.n5.a11. [11] C. Franke and R. Schaback, Solving partial differential equations by collocation using radial basis functions, Appl. Math. Comput., 93 (1998), 73-82.  doi: 10.1016/S0096-3003(97)10104-7. [12] P. Giesl, Construction of a finite-time Lyapunov function by meshless collocation, Discrete & Continuous Dynamical Systems - B, 17 (2012), 2387-2412.  doi: 10.3934/dcdsb.2012.17.2387. [13] P. Giesl, Construction of Global Lyapunov Functions Using Radial Basis Functions, Lecture Notes in Mathematics, Springer-Verlag Berlin Heidelberg, 2007. [14] P. Giesl, On the determination of the basin of attraction of discrete dynamical systems, Journal of Difference Equations and Applications, 13 (2007), 523-546.  doi: 10.1080/10236190601135209. [15] P. Giesl and S. Hafstein, Computation and verification of Lyapunov functions, SIAM Journal on Applied Dynamical Systems, 14 (2015), 1663-1698.  doi: 10.1137/140988802. [16] P. Giesl and S. Hafstein, Review on computational methods for Lyapunov functions, Discrete & Continuous Dynamical Systems - B, 20 (2015), 2291-2331.  doi: 10.3934/dcdsb.2015.20.2291. [17] P. Giesl, B. Hamzi, M. Rasmussen and K. N. Webster, Approximation of Lyapunov functions from noisy data, Journal of Computational Dynamics, 7 (2020), 57-81.  doi: 10.3934/jcd.2020003. [18] P. Giesl and H. Wendland, Approximating the basin of attraction of time-periodic ODEs by meshless collocation of a Cauchy problem, Discrete and Continuous Dynamical Systems. Series A, Supplement (2009), 259-268. [19] P. Giesl and H. Wendland, Meshless collocation: Error estimates with application to dynamical systems, SIAM Journal of Numerical Analysis, 45 (2007), 1723-1741.  doi: 10.1137/060658813. [20] L. Grüne, Computing Lyapunov functions using deep neural networks, Journal of Computational Dynamics, 8 (2021), 131-152.  doi: 10.3934/jcd.2021006. [21] L. Grüne, Overcoming the curse of dimensionality for approximating Lyapunov functions with deep neural networks under a small-gain condition, in 24th International Symposium on Mathematical Theory of Networks and Systems - MTNS 2020, (2020), to appear. [22] S. Hafstein, An algorithm for constructing Lyapunov function, Electronic Journal of Differential Equations. Monograph, 8 (2007). [23] W. Hahn, Stability of Motion, Grundlehren der mathematischen Wissenschaften, Springer-Verlag Berlin Heidelberg, 1967. [24] K. Hornik, M. Stinchcombe and H. White, Multilayer feedforward networks are universal approximators, Neural Netw., 2 (1989), 359-366.  doi: 10.1016/0893-6080(89)90020-8. [25] A. Jacot, F. Gabriel and C. Hongler, Neural tangent kernel: Convergence and generalization in neural networks, in Proceedings of the 32nd International Conference on Neural Information Processing Systems, Curran Associates Inc. (2018), 8580-8589. [26] C. M. Kellett, Classical converse theorems in Lyapunov's second method, Discrete & Continuous Dynamical Systems - B, 20 (2015), 2333-2360.  doi: 10.3934/dcdsb.2015.20.2333. [27] J. M. Klusowski and A. R. Barron, Risk bounds for high-dimensional ridge function combinations including neural networks, arXiv: 1607.01434, (2018). [28] A. M. Lyapunov, Problème général de la stabilité du mouvement, Ann. of math. Stud., 17 (1949), 203-474. [29] H. Q. Minh, Some properties of gaussian reproducing kernel hilbert spaces and their implications for function approximation and learning theory, Constructive Approximation, 32 (2009), 307-338.  doi: 10.1007/s00365-009-9080-0. [30] B. S. Mityagin, The zero set of a real analytic function, Mathematical Notes, 107 (2020), 529-530.  doi: 10.1134/S0001434620030189. [31] N. Mohammed and P. Giesl, Grid refinement in the construction of Lyapunov functions using radial basis functions, Discrete & Continuous Dynamical Systems - B, 20 (2015), 2453-2476.  doi: 10.3934/dcdsb.2015.20.2453. [32] Yu. E. Nesterov, A method for unconstrained convex minimization problem with the rate of convergence o(1/k2), Doklady ANSSSR (translated as Soviet. Math. Docl.), 269 (1983), 543-547. [33] A. Papachristodoulou and S. Prajna, On the construction of Lyapunov functions using the sum of squares decomposition, Proceedings of the 41st IEEE Conference on Decision and Control, 2002, 3 (2002), 3482-3487.  doi: 10.1109/CDC.2002.1184414. [34] T. Poggio, H. Mhaskar, L. Rosasco, B. Miranda and Q. Liao, Why and when can deep-but not shallow-networks avoid the curse of dimensionality: A review, Int. J Automat. Computing, 14 (2017), 503-519.  doi: 10.1007/s11633-017-1054-2. [35] A. Rahimi and B. Recht, Random features for large-scale kernel machines, in Proceedings of the 20th International Conference on Neural Information Processing System, Curran Associates Inc. (2007), 1177-1184. [36] A. Rahimi and B. Recht, Uniform approximation of functions with random bases, in 2008 46th Annual Allerton Conference on Communication, Control, and Computing, (2008), 555-561. doi: 10.1109/ALLERTON.2008.4797607. [37] H. Robbins and S. Monro, A stochastic approximation method, The Annals of Mathematical Statistics, 22 (1951), 400-407.  doi: 10.1214/aoms/1177729586. [38] W. Rudin, Fourier Analysis on Groups, Wiley-Interscience, 1990. doi: 10.1002/9781118165621. [39] S. Sastry, Nonlinear Systems: Analysis, Stability, and Control, Springer, New York, 1999. doi: 10.1007/978-1-4757-3108-8. [40] I. Steinwart, On the influence of the kernel on the consistency of support vector machines, J. Mach. Learn. Res., 2 (2002), 67-93. [41] A. M. Stuart and A. R. Humphries, Dynamical Systems and Numerical Analysis, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 1998. [42] H. Wendland, Error estimates for interpolation by compactly supported radial basis functions of minimal degree, J. Approx. Theory, 93 (1998), 258-272.  doi: 10.1006/jath.1997.3137. [43] H. Wendland, Scattered Data Approximation, Cambridge Monographs on Applied and Computational Mathematics, Cambridge University Press, 2005. [44] V. I. Zubov, Methods of A. M. Lyapunov and their application, P. Noordhoff Ltd., Groningen, 1964.

Figures(7)

Tables(2)