
Smart Gradient - An adaptive technique for improving gradient estimation
CEMSE Division, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia
Computing the gradient of a function provides fundamental information about its behavior. This information is essential for many applications and algorithms across various fields. A common class of applications that requires gradients is optimization: stochastic gradient descent, Newton's method and trust-region methods all rely on them. However, these methods usually require the gradient to be computed numerically at every iteration, and this computation is prone to numerical errors. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient within this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically through extensive experiments on both test functions and real-data applications. The proposed method is implemented in the $\texttt{R}$ package $\texttt{smartGrad}$ and in C$\texttt{++}$.
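The abstract describes the technique only at a high level. The Python sketch below illustrates the general idea under stated assumptions; it is not the `smartGrad` implementation. It assumes that the "vanilla" gradient is a central finite-difference estimate along the coordinate axes, and that the transformed variant estimates directional derivatives along an orthonormal basis built by modified Gram-Schmidt from recent descent directions (padded with coordinate axes so the basis spans the whole space), then maps them back to the original coordinates. The helper names, the step size `h` and the padding rule are hypothetical choices made for illustration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def central_diff_gradient(f: Callable[[Vector], float], x: Sequence[float], h: float = 1e-3) -> Vector:
    """'Vanilla' estimate (assumed): central differences along the canonical axes e_1, ..., e_n."""
    grad = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        grad.append((f(xp) - f(xm)) / (2.0 * h))
    return grad


def gram_schmidt(vectors: Sequence[Sequence[float]], tol: float = 1e-12) -> List[Vector]:
    """Modified Gram-Schmidt: orthonormalize `vectors` in order, dropping (near-)dependent ones."""
    basis: List[Vector] = []
    for v in vectors:
        w = [float(vi) for vi in v]
        # Subtract projections one basis vector at a time, using the already-reduced w.
        for q in basis:
            c = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def transformed_gradient(f: Callable[[Vector], float], x: Sequence[float],
                         directions: Sequence[Sequence[float]], h: float = 1e-3) -> Vector:
    """Hypothetical 'smart' estimate: differentiate along a basis built from past descent directions."""
    n = len(x)
    # Recent descent directions come first; canonical axes pad the basis so it spans R^n.
    axes = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    basis = gram_schmidt(list(directions) + axes)[:n]
    # Central-difference directional derivative along each orthonormal basis vector q_k.
    derivs = []
    for q in basis:
        xp = [xi + h * qi for xi, qi in zip(x, q)]
        xm = [xi - h * qi for xi, qi in zip(x, q)]
        derivs.append((f(xp) - f(xm)) / (2.0 * h))
    # The basis is orthonormal, so the gradient in original coordinates is sum_k derivs[k] * q_k.
    return [sum(derivs[k] * basis[k][i] for k in range(len(basis))) for i in range(n)]
```

In an optimizer, `directions` would hold the last few steps $x_k - x_{k-1}$; keeping only a short history is what makes such a scheme limited-memory. Because the basis is orthonormal, mapping back to the original coordinates is a weighted sum of basis vectors; with a non-orthonormal basis one would instead have to solve a linear system.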





| Dimension | Average MSE, vanilla gradient | Average MSE, smart gradient | Improvement (vanilla / smart) |
|---|---|---|---|
| Extended Rosenbrock function | | | |
| 5 | 2.60e-04 | 1.04e-04 | 2.50 |
| 10 | 2.71e-04 | 0.78e-04 | 3.47 |
| 25 | 2.80e-04 | 0.49e-04 | 5.71 |
| Extended Freudenstein-Roth function | | | |
| 5 | 2.26e-04 | 1.39e-04 | 1.63 |
| 10 | 2.78e-04 | 1.42e-04 | 1.96 |
| 25 | 2.84e-04 | 1.25e-04 | 2.27 |
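The Improvement column can be read as the ratio of the two MSE columns (vanilla MSE divided by smart-gradient MSE); recomputing it from the listed values reproduces every reported figure to two decimals. The short Python check below makes that arithmetic explicit. The helper names are illustrative only, and the values are copied from the table.

```python
from typing import List, Tuple

# (test function, first-column value, average MSE vanilla, average MSE smart, reported improvement),
# copied from the table above.
TABLE: List[Tuple[str, int, float, float, float]] = [
    ("Extended Rosenbrock", 5, 2.60e-04, 1.04e-04, 2.50),
    ("Extended Rosenbrock", 10, 2.71e-04, 0.78e-04, 3.47),
    ("Extended Rosenbrock", 25, 2.80e-04, 0.49e-04, 5.71),
    ("Extended Freudenstein-Roth", 5, 2.26e-04, 1.39e-04, 1.63),
    ("Extended Freudenstein-Roth", 10, 2.78e-04, 1.42e-04, 1.96),
    ("Extended Freudenstein-Roth", 25, 2.84e-04, 1.25e-04, 2.27),
]


def improvement(mse_vanilla: float, mse_smart: float) -> float:
    """Factor by which the smart-gradient MSE is smaller than the vanilla-gradient MSE."""
    return mse_vanilla / mse_smart


def mismatched_rows(places: int = 2) -> List[Tuple[str, int]]:
    """Rows whose recomputed ratio, rounded to `places` decimals, differs from the reported value."""
    return [
        (name, first_col)
        for name, first_col, vanilla, smart, reported in TABLE
        if round(improvement(vanilla, smart), places) != round(reported, places)
    ]


print(mismatched_rows())  # prints []: every reported improvement equals vanilla / smart
```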