# American Institute of Mathematical Sciences

March 2022, 4(1): 123-136. doi: 10.3934/fods.2021037

CEMSE Division, King Abdullah University of Science and Technology, Kingdom of Saudi Arabia

* Corresponding author: Esmail Abdul Fattah

Received: June 2021. Revised: December 2021. Early access: January 2022. Published: March 2022.

Computing the gradient of a function provides fundamental information about its behavior. This information is essential for several applications and algorithms across various fields. One common application that requires gradients is optimization, via techniques such as stochastic gradient descent, Newton's method, and trust-region methods. However, these methods usually require a numerical computation of the gradient at every iteration, which is prone to numerical error. We propose a simple limited-memory technique for improving the accuracy of a numerically computed gradient in this gradient-based optimization framework by exploiting (1) a coordinate transformation of the gradient and (2) the history of previously taken descent directions. The method is verified empirically by extensive experimentation on both test functions and real data applications. The proposed method is implemented in the $\texttt{R}$ package $\texttt{smartGrad}$ and in C$\texttt{++}$.
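To make the idea in the abstract concrete, the sketch below contrasts a standard central-difference ("vanilla") gradient with a gradient estimated along an orthonormal basis built from previous descent directions and then mapped back to canonical coordinates. This is an illustrative reading of the technique, not the authors' `smartGrad` implementation; the function names, the Gram-Schmidt construction, and the step size `h` are all assumptions made here for demonstration.

```python
import numpy as np

def vanilla_gradient(f, x, h=1e-4):
    # Central differences along the canonical basis vectors e_1, ..., e_n.
    n = len(x)
    g = np.zeros(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def orthonormal_from_history(directions, n):
    # Gram-Schmidt on the most recent descent directions, padded with
    # canonical basis vectors so the result is a full n x n orthonormal matrix.
    basis = []
    candidates = list(directions) + [np.eye(n)[:, i] for i in range(n)]
    for v in candidates:
        w = np.asarray(v, dtype=float).copy()
        for b in basis:
            w -= (w @ b) * b
        norm = np.linalg.norm(w)
        if norm > 1e-12:
            basis.append(w / norm)
        if len(basis) == n:
            break
    return np.column_stack(basis)

def smart_gradient(f, x, G, h=1e-4):
    # Directional central differences along the columns of G (an orthonormal
    # coordinate transformation), mapped back: grad f(x) ~= G @ d, where
    # d_i is the directional derivative of f along G[:, i].
    n = len(x)
    d = np.zeros(n)
    for i in range(n):
        v = G[:, i]
        d[i] = (f(x + h * v) - f(x - h * v)) / (2 * h)
    return G @ d
```

Inside an optimizer loop, one would append each new descent direction to a short history (hence "limited-memory"), rebuild `G`, and call `smart_gradient` at the next iterate. When `G` is the identity, the estimate reduces to the vanilla gradient.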

Citation: Esmail Abdul Fattah, Janet Van Niekerk, Håvard Rue. Smart Gradient - An adaptive technique for improving gradient estimation. Foundations of Data Science, 2022, 4 (1) : 123-136. doi: 10.3934/fods.2021037

##### Figures:
- The 2-dimensional Rosenbrock function and a contour plot around the point p = (-0.29, 0.4); at the same point p, the MSE and magnitude of the gradient are plotted using different directions.
- Directions used to compute the Vanilla Gradient (black), Smart Gradient at iteration 1 (red), and Smart Gradient at iteration 2 (blue).
- MSE for the Smart and Vanilla Gradients at each iteration, using Rosenbrock and Roth functions of different dimensions.
- MSE and computational time (to reach the optimum) as a function of step size, for Rosenbrock functions of different dimensions.
- Triangulated mesh with 1000 locations on the unit square.
The average MSE when using the Vanilla and Smart Gradient approaches, compared to the exact gradient, for Rosenbrock and Roth functions of different dimensions:
| Function | $\pmb x$ dimension | Average MSE, Vanilla Gradient | Average MSE, Smart Gradient | Improvement |
|---|---|---|---|---|
| Extended Rosenbrock | 5 | 2.60e-04 | 1.04e-04 | 2.50 |
| Extended Rosenbrock | 10 | 2.71e-04 | 0.78e-04 | 3.47 |
| Extended Rosenbrock | 25 | 2.80e-04 | 0.49e-04 | 5.71 |
| Extended Freudenstein Roth | 5 | 2.26e-04 | 1.39e-04 | 1.63 |
| Extended Freudenstein Roth | 10 | 2.78e-04 | 1.42e-04 | 1.96 |
| Extended Freudenstein Roth | 25 | 2.84e-04 | 1.25e-04 | 2.27 |
