In this work we propose a method to perform optimization on manifolds. We assume to have an objective function $ f $ defined on a manifold and think of it as the potential energy of a mechanical system. By adding a momentum-dependent kinetic energy, we define its Hamiltonian function, which allows us to write the corresponding Hamiltonian system. We make it into a conformal Hamiltonian system by adding a dissipation term: the result is the continuous model of our scheme. We solve it via splitting methods (Lie-Trotter and leapfrog): we combine the RATTLE scheme, approximating the conserved flow, with the exact dissipated flow. The result is a conformal symplectic method for constant stepsizes. We also propose an adaptive stepsize implementation. We test the proposed schemes on an example, the minimization of a function defined on a sphere and they are competitive in comparison with the standard gradient descent method.
| Citation: |
Figure 3. Verification of the constraints for all methods used: SM of order 1, SM of order 2 and the adaptive routine (AD). Left: checking $ q_n $ belongs to the unit sphere by plotting $ g(q_n) = ||q_n||^2 - 1 $. Right: checking $ p_n $ belongs to the tangent space related to the point $ q_n $ by plotting the inner product $ \langle q_n, p_n \rangle $
Figure 5. Constant $ \alpha $ such that $ e_{n+1} = \alpha e_n $ related to the plots in Figure 4
Figure 8. Examples of the evolution of the 2-norm of $ DF(q) $ for different values of $ h $. The second row is a zoom of the plots in the first one, and they can be compared to the one in Figure 7 as the data are the same
| [1] |
P. A. Absil, R. Mahony and R. Sepulchre, Optimization Algorithms on Matrix Manifolds, Princeton University Press, Princeton, NJ, 2008.
doi: 10.1515/9781400830244.
|
| [2] |
H. C. Andersen, Rattle: A "velocity" version of the shake algorithm for molecular dynamics calculations, Journal of computational Physics, Elsevier, 52, (1983), 24-34.
doi: 10.1016/0021-9991(83)90014-1.
|
| [3] |
N. Boumal, An Introduction to Optimization on Smooth Manifolds, 1st edition, Cambridge University Press, 2023.
|
| [4] |
E. Celledoni, S. Eidnes, B. Owren and T. Ringholm, Dissipative numerical schemes on Riemannian manifolds with applications to gradient flows, SIAM Journal on Scientific Computing, SIAM, 40 (2018), A3789-A3806.
doi: 10.1137/18M1190628.
|
| [5] |
E. Celledoni, S. Eidnes, B. Owren and T. Ringholm, Energy-preserving methods on Riemannian manifolds, Mathematics of Computation, 89 (2020), 699-716.
doi: 10.1090/mcom/3470.
|
| [6] |
G. França, J. Sulam, D. Robinson and R. Vidal, Conformal symplectic and relativistic optimization, J. Stat. Mech. Theory Exp., 2020 (2020), 124008, 29 pp.
doi: 10.1088/1742-5468/abcaee.
|
| [7] |
L. C. García-Naranjo and M. Vermeeren, Structure preserving discretization of time-reparametrized Hamiltonian systems with application to nonholonomic mechanics, Journal of Computational Dynamics, 8 (2021), 241-271.
doi: 10.3934/jcd.2021011.
|
| [8] |
E. Hairer, C. Lubich and G. Wanner, Geometric Numerical Integration, 2$^{st}$ edition, Springer-Verlag, Berlin, 2006.
|
| [9] |
E. Hairer and G. Wanner, Solving Ordinary Differential Equations II, Springer Ser. Comput. Math., 14, Springer-Verlag, Berlin, 1996.
doi: 10.1007/978-3-642-05221-7.
|
| [10] |
P. Libermann and C. M. Marle, Symplectic Geometry and Analytical Mechanics, 1$^{st}$ edition, Springer Science & Business Media, 2012.
|
| [11] |
C. J. Maddison, D. Paulin, Y. W. Teh, B. O'Donoghue and A. Doucet, Hamiltonian Descent Methods, preprint, 2018, arXiv: 1809.05042.
|
| [12] |
J. E. Marsden and T. S Ratiu, Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, Texts Appl. Math., 17, Springer-Verlag, New York, 1994.
|
| [13] |
E. Massart and V. Abrol, Coordinate descent on the orthogonal group for recurrent neural network training, Proceedings of the AAAI Conference on Artificial Intelligence, textbf36 (2022), 7744-7751.
|
| [14] |
R. McLachlan and M. Perlmutter, Conformal Hamiltonian systems, Journal of Geometry and Physics, Elsevier, 39 (2001), 276-300.
doi: 10.1016/S0393-0440(01)00020-1.
|
| [15] |
Y. Nesterov, A method for solving the convex programming problem with convergence rate $O(1/k^2)$, Proceedings of the USSR Academy of Sciences, textbf269 (1983), 543-547.
|
| [16] |
B. T. Polyak, Some methods of speeding up the convergence of iteration methods, Ussr Computational Mathematics and Mathematical Physics, Elsevier, 4 (1964), 791-803.
|
| [17] |
W. Su, S. Boyd and E. J. Candès, A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights, Journal of Machine Learning Research, 17 (2016), Paper No. 153, 43 pp.
|
| [18] |
N. S. Wadia, M. I. Jordan and M. Muehlebach, Optimization with adaptive step size selection from a dynamical systems perspective, Proceedings of the 35th Conference on Neural Information Processing Systems, NeurIPS Workshop on Optimization for Machine Learning, 2021.
|
| [19] |
H. Zhang and S. Sra, Towards Riemannian Accelerated Gradient Methods, preprint, 2018, arXiv: 1806.02812.
|
Results for the SM order 1, for different matrices
Results for the SM of order 1, the SM of order 2 and the adaptive scheme (AD), for different matrices
Verification of the constraints for all methods used: SM of order 1, SM of order 2 and the adaptive routine (AD). Left: checking
Distance between the exact solution and the iterates computed with the SM of order 1, of order 2 and the adaptive routine (AD). The distance is calculated between points on the sphere,
Constant
Results for the SM of order 1, the SM of order 2, the adaptive scheme (AD) and the GD method, for different matrices
Examples of the evolution of the spectral radius
Examples of the evolution of the 2-norm of