October  2018, 38(10): 4951-4977. doi: 10.3934/dcds.2018216

A convergence analysis of the perturbed compositional gradient flow: Averaging principle and normal deviations

1. 

Department of Mathematics and Statistics, Missouri University of Science and Technology, (formerly University of Missouri, Rolla), Rolla, MO 65409-0020, USA

2. 

Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA

* Corresponding author: Wenqing Hu

Received  October 2017 Published  July 2018

Fund Project: The first author is supported by an University of Missouri Research Board (UMRB) grant

We consider in this work a system of two stochastic differential equations named the perturbed compositional gradient flow. By introducing a separation of fast and slow scales of the two equations, we show that the limit of the slow motion is given by an averaged ordinary differential equation. We then demonstrate that the deviation of the slow motion from the averaged equation, after proper rescaling, converges to a stochastic process with Gaussian inputs. This indicates that the slow motion can be approximated in the weak sense by a standard perturbed gradient flow or the continuous-time stochastic gradient descent algorithm that solves the optimization problem for a composition of two functions. As an application, the perturbed compositional gradient flow corresponds to the diffusion limit of the Stochastic Composite Gradient Descent (SCGD) algorithm for minimizing a composition of two expected-value functions in the optimization literatures. For the strongly convex case, such an analysis implies that the SCGD algorithm has the same convergence time asymptotic as the classical stochastic gradient descent algorithm. Thus it validates, at the level of continuous approximation, the effectiveness of using the SCGD algorithm in the strongly convex case.

Citation: Wenqing Hu, Chris Junchi Li. A convergence analysis of the perturbed compositional gradient flow: Averaging principle and normal deviations. Discrete & Continuous Dynamical Systems - A, 2018, 38 (10) : 4951-4977. doi: 10.3934/dcds.2018216
References:
[1]

L. Arnold, Hasslemann's program revisited: The analysis of stochasticity in deterministic climate models, in Progress in Probability Book Series, Springer, 49, Stochastic Climate Models, 141-157.

[2]

V. I. Bakhtin, Asymptotics of superregular perturbations of fiber ergodic semigroups, Stochastics and Stochastic Reports, 75 (2003), 295-318. doi: 10.1080/1045112031000155678.

[3]

V. Bakhtin and Y. Kifer, Diffusion approximation for slow motion in fully coupled averaging, Probability Theory and Related Fields, 129 (2004), 157-181. doi: 10.1007/s00440-003-0326-7.

[4]

A. Benveniste, M. Metivier and P. Priouret, Adaptive algorithms and stochastic approximations, Applications of Mathematics, Springer, 22 (1990), xii+365pp. doi: 10.1007/978-3-642-75894-2.

[5]

V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008.

[6]

S. Cerrai, A Khasminskii's averaging principle for stochastic reaction-diffusion equations, Annals of Applied Probability, 19 (2009), 899-948. doi: 10.1214/08-AAP560.

[7]

S. Cerrai, Normal deviations from the averaged motion for some reaction-diffusion equations with fast oscillating perturbation, Journal de Mathématiques Pures et Appliquées, 91 (2009), 614-647. doi: 10.1016/j.matpur.2009.04.007.

[8]

Y. Ermoliev, Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow, 1976.

[9]

M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton University Press, Princeton, 1985. doi: 10.1515/9781400881598.

[10]

M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, 2$^{nd}$ Edition, Springer, 1998. doi: 10.1007/978-1-4612-0611-8.

[11]

M. Hairer and G. Pavliotis, Periodic homogenization for hypoelliptic diffusions, Journal of Statistical Physics, 117 (2004), 261-279. doi: 10.1023/B:JOSS.0000044055.59822.20.

[12]

K. Hasselmann, Stochastic climate models, Part Ⅰ, Theory, Tellus, 28 (1976), 473-485.

[13]

W. Hu and C. J. Li, On the fast convergence of random perturbations of the gradient flow, preprint, arXiv: 1706.00837

[14]

W. Hu, C. J. Li, L. Li and J. Liu, On the diffusion approximation of nonconvex stochastic gradient descent, Annals of Mathematical Science and Applications, to appear. arXiv: 1705.07562

[15]

R. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Theory of Probability and its Applications, 11 (1966), 211-228.

[16]

R. Khasminskii, On the principle of averaging the It ô's stochastic differential equations, Kybernetika (Prague), 4 (1968), 260-279.

[17]

Y. Kifer, The exit problem for small random perturbations of dynamical systems with a hyperbolic fixed point, Israel Journal of Mathematics, 40 (1981), 74-96. doi: 10.1007/BF02761819.

[18]

C. Kipnis and S. R. S. Varadhan, Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Communications in Mathematical Physics, 104 (1986), 1-19. doi: 10.1007/BF01210789.

[19]

H. Kushner and G. George Yin, Stochastic approximations and recursive algorithms and applications, Applications of Mathematics (Stochastic Modeling and Applied Probability), 35, 2$^{nd}$ Edition, Springer, 2003.

[20]

R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., New York, 1982. doi: 10.1109/IGARSS.2010.5653437.

[21]

B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1992. doi: 10.1007/978-3-662-02847-6.

[22]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 1, Annals of Probability, 29 (2001), 1061-1085. doi: 10.1214/aop/1015345596.

[23]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 2, Annals of Probability, 31 (2003), 1166-1192. doi: 10.1214/aop/1055425774.

[24]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 3, Annals of Probability, 33 (2005), 1111-1133. doi: 10.1214/009117905000000062.

[25]

D. Revuz and M. Yor, Continuous martingales and Brownian motion, Grundlehren der Mathematischen Wissenschaften, 293, 3rd edition, Springer-Verlag, Berlin, 1999, xiv+602 pp. doi: 10.1007/978-3-662-06400-9.

[26]

N. G. Van Kampen, The diffusion approximation for markov processes, Reprinted from: Thermodynamics & kinetics of biological processes (eds. I. Lamprecht and A. I. Zotin) Walter de Gruyter & Co., New York (1982). 181-195.

[27]

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995. doi: 10.1007/978-1-4757-2440-0.

[28]

A. Yu. Veretennikov, On polynomial mixing and convergence rate for stochastic difference and differential equations, Theory of Probability and its Applications, 44 (2000), 361-374. doi: 10.1137/S0040585X97977550.

[29]

M. WangE. X. Fang and H. Liu, Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions, Mathematical Programming, 161 (2016), 419-449. doi: 10.1007/s10107-016-1017-3.

[30]

M. Wang, J. Liu and E. X. Fang, Accelerating stochastic composition optimization, Advances in Neural Information Processing Systems, 2016. arXiv: 1607.07329

show all references

References:
[1]

L. Arnold, Hasslemann's program revisited: The analysis of stochasticity in deterministic climate models, in Progress in Probability Book Series, Springer, 49, Stochastic Climate Models, 141-157.

[2]

V. I. Bakhtin, Asymptotics of superregular perturbations of fiber ergodic semigroups, Stochastics and Stochastic Reports, 75 (2003), 295-318. doi: 10.1080/1045112031000155678.

[3]

V. Bakhtin and Y. Kifer, Diffusion approximation for slow motion in fully coupled averaging, Probability Theory and Related Fields, 129 (2004), 157-181. doi: 10.1007/s00440-003-0326-7.

[4]

A. Benveniste, M. Metivier and P. Priouret, Adaptive algorithms and stochastic approximations, Applications of Mathematics, Springer, 22 (1990), xii+365pp. doi: 10.1007/978-3-642-75894-2.

[5]

V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008.

[6]

S. Cerrai, A Khasminskii's averaging principle for stochastic reaction-diffusion equations, Annals of Applied Probability, 19 (2009), 899-948. doi: 10.1214/08-AAP560.

[7]

S. Cerrai, Normal deviations from the averaged motion for some reaction-diffusion equations with fast oscillating perturbation, Journal de Mathématiques Pures et Appliquées, 91 (2009), 614-647. doi: 10.1016/j.matpur.2009.04.007.

[8]

Y. Ermoliev, Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow, 1976.

[9]

M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton University Press, Princeton, 1985. doi: 10.1515/9781400881598.

[10]

M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, 2$^{nd}$ Edition, Springer, 1998. doi: 10.1007/978-1-4612-0611-8.

[11]

M. Hairer and G. Pavliotis, Periodic homogenization for hypoelliptic diffusions, Journal of Statistical Physics, 117 (2004), 261-279. doi: 10.1023/B:JOSS.0000044055.59822.20.

[12]

K. Hasselmann, Stochastic climate models, Part Ⅰ, Theory, Tellus, 28 (1976), 473-485.

[13]

W. Hu and C. J. Li, On the fast convergence of random perturbations of the gradient flow, preprint, arXiv: 1706.00837

[14]

W. Hu, C. J. Li, L. Li and J. Liu, On the diffusion approximation of nonconvex stochastic gradient descent, Annals of Mathematical Science and Applications, to appear. arXiv: 1705.07562

[15]

R. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Theory of Probability and its Applications, 11 (1966), 211-228.

[16]

R. Khasminskii, On the principle of averaging the It ô's stochastic differential equations, Kybernetika (Prague), 4 (1968), 260-279.

[17]

Y. Kifer, The exit problem for small random perturbations of dynamical systems with a hyperbolic fixed point, Israel Journal of Mathematics, 40 (1981), 74-96. doi: 10.1007/BF02761819.

[18]

C. Kipnis and S. R. S. Varadhan, Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Communications in Mathematical Physics, 104 (1986), 1-19. doi: 10.1007/BF01210789.

[19]

H. Kushner and G. George Yin, Stochastic approximations and recursive algorithms and applications, Applications of Mathematics (Stochastic Modeling and Applied Probability), 35, 2$^{nd}$ Edition, Springer, 2003.

[20]

R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., New York, 1982. doi: 10.1109/IGARSS.2010.5653437.

[21]

B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1992. doi: 10.1007/978-3-662-02847-6.

[22]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 1, Annals of Probability, 29 (2001), 1061-1085. doi: 10.1214/aop/1015345596.

[23]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 2, Annals of Probability, 31 (2003), 1166-1192. doi: 10.1214/aop/1055425774.

[24]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 3, Annals of Probability, 33 (2005), 1111-1133. doi: 10.1214/009117905000000062.

[25]

D. Revuz and M. Yor, Continuous martingales and Brownian motion, Grundlehren der Mathematischen Wissenschaften, 293, 3rd edition, Springer-Verlag, Berlin, 1999, xiv+602 pp. doi: 10.1007/978-3-662-06400-9.

[26]

N. G. Van Kampen, The diffusion approximation for markov processes, Reprinted from: Thermodynamics & kinetics of biological processes (eds. I. Lamprecht and A. I. Zotin) Walter de Gruyter & Co., New York (1982). 181-195.

[27]

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995. doi: 10.1007/978-1-4757-2440-0.

[28]

A. Yu. Veretennikov, On polynomial mixing and convergence rate for stochastic difference and differential equations, Theory of Probability and its Applications, 44 (2000), 361-374. doi: 10.1137/S0040585X97977550.

[29]

M. WangE. X. Fang and H. Liu, Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions, Mathematical Programming, 161 (2016), 419-449. doi: 10.1007/s10107-016-1017-3.

[30]

M. Wang, J. Liu and E. X. Fang, Accelerating stochastic composition optimization, Advances in Neural Information Processing Systems, 2016. arXiv: 1607.07329

[1]

Timothy Blass, Rafael De La Llave, Enrico Valdinoci. A comparison principle for a Sobolev gradient semi-flow. Communications on Pure & Applied Analysis, 2011, 10 (1) : 69-91. doi: 10.3934/cpaa.2011.10.69

[2]

Shishun Li, Zhengda Huang. Guaranteed descent conjugate gradient methods with modified secant condition. Journal of Industrial & Management Optimization, 2008, 4 (4) : 739-755. doi: 10.3934/jimo.2008.4.739

[3]

Wataru Nakamura, Yasushi Narushima, Hiroshi Yabe. Nonlinear conjugate gradient methods with sufficient descent properties for unconstrained optimization. Journal of Industrial & Management Optimization, 2013, 9 (3) : 595-619. doi: 10.3934/jimo.2013.9.595

[4]

Xiaming Chen. Kernel-based online gradient descent using distributed approach. Mathematical Foundations of Computing, 2019, 2 (1) : 1-9. doi: 10.3934/mfc.2019001

[5]

Yong Xu, Bin Pei, Rong Guo. Stochastic averaging for slow-fast dynamical systems with fractional Brownian motion. Discrete & Continuous Dynamical Systems - B, 2015, 20 (7) : 2257-2267. doi: 10.3934/dcdsb.2015.20.2257

[6]

Matthias Erbar, Jan Maas. Gradient flow structures for discrete porous medium equations. Discrete & Continuous Dynamical Systems - A, 2014, 34 (4) : 1355-1374. doi: 10.3934/dcds.2014.34.1355

[7]

Bertram Düring, Daniel Matthes, Josipa Pina Milišić. A gradient flow scheme for nonlinear fourth order equations. Discrete & Continuous Dynamical Systems - B, 2010, 14 (3) : 935-959. doi: 10.3934/dcdsb.2010.14.935

[8]

Saman Babaie–Kafaki, Reza Ghanbari. A class of descent four–term extension of the Dai–Liao conjugate gradient method based on the scaled memoryless BFGS update. Journal of Industrial & Management Optimization, 2017, 13 (2) : 649-658. doi: 10.3934/jimo.2016038

[9]

Gaohang Yu, Lutai Guan, Guoyin Li. Global convergence of modified Polak-Ribière-Polyak conjugate gradient methods with sufficient descent property. Journal of Industrial & Management Optimization, 2008, 4 (3) : 565-579. doi: 10.3934/jimo.2008.4.565

[10]

Rubén Caballero, Alexandre N. Carvalho, Pedro Marín-Rubio, José Valero. Robustness of dynamically gradient multivalued dynamical systems. Discrete & Continuous Dynamical Systems - B, 2019, 24 (3) : 1049-1077. doi: 10.3934/dcdsb.2019006

[11]

Delio Mugnolo, René Pröpper. Gradient systems on networks. Conference Publications, 2011, 2011 (Special) : 1078-1090. doi: 10.3934/proc.2011.2011.1078

[12]

Matthias Erbar, Max Fathi, Vaios Laschos, André Schlichting. Gradient flow structure for McKean-Vlasov equations on discrete spaces. Discrete & Continuous Dynamical Systems - A, 2016, 36 (12) : 6799-6833. doi: 10.3934/dcds.2016096

[13]

Jonathan Zinsl. The gradient flow of a generalized Fisher information functional with respect to modified Wasserstein distances. Discrete & Continuous Dynamical Systems - S, 2017, 10 (4) : 919-933. doi: 10.3934/dcdss.2017047

[14]

K.H. Wong, C. Myburgh, L. Omari. A gradient flow approach for computing jump linear quadratic optimal feedback gains. Discrete & Continuous Dynamical Systems - A, 2000, 6 (4) : 803-808. doi: 10.3934/dcds.2000.6.803

[15]

Dmitrii Rachinskii. Realization of arbitrary hysteresis by a low-dimensional gradient flow. Discrete & Continuous Dynamical Systems - B, 2016, 21 (1) : 227-243. doi: 10.3934/dcdsb.2016.21.227

[16]

Zheng Sun, José A. Carrillo, Chi-Wang Shu. An entropy stable high-order discontinuous Galerkin method for cross-diffusion gradient flow systems. Kinetic & Related Models, 2019, 12 (4) : 885-908. doi: 10.3934/krm.2019033

[17]

Siniša Slijepčević. Extended gradient systems: Dimension one. Discrete & Continuous Dynamical Systems - A, 2000, 6 (3) : 503-518. doi: 10.3934/dcds.2000.6.503

[18]

Yong Xu, Rong Guo, Di Liu, Huiqing Zhang, Jinqiao Duan. Stochastic averaging principle for dynamical systems with fractional Brownian motion. Discrete & Continuous Dynamical Systems - B, 2014, 19 (4) : 1197-1212. doi: 10.3934/dcdsb.2014.19.1197

[19]

Fabrice Bethuel, Didier Smets. Slow motion for equal depth multiple-well gradient systems: The degenerate case. Discrete & Continuous Dynamical Systems - A, 2013, 33 (1) : 67-87. doi: 10.3934/dcds.2013.33.67

[20]

Anna Amirdjanova, Jie Xiong. Large deviation principle for a stochastic navier-Stokes equation in its vorticity form for a two-dimensional incompressible flow. Discrete & Continuous Dynamical Systems - B, 2006, 6 (4) : 651-666. doi: 10.3934/dcdsb.2006.6.651

2017 Impact Factor: 1.179

Metrics

  • PDF downloads (43)
  • HTML views (107)
  • Cited by (0)

Other articles
by authors

[Back to Top]