October  2018, 38(10): 4951-4977. doi: 10.3934/dcds.2018216

A convergence analysis of the perturbed compositional gradient flow: Averaging principle and normal deviations

1. 

Department of Mathematics and Statistics, Missouri University of Science and Technology, (formerly University of Missouri, Rolla), Rolla, MO 65409-0020, USA

2. 

Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA

* Corresponding author: Wenqing Hu

Received  October 2017 Published  July 2018

Fund Project: The first author is supported by an University of Missouri Research Board (UMRB) grant.

We consider in this work a system of two stochastic differential equations named the perturbed compositional gradient flow. By introducing a separation of fast and slow scales of the two equations, we show that the limit of the slow motion is given by an averaged ordinary differential equation. We then demonstrate that the deviation of the slow motion from the averaged equation, after proper rescaling, converges to a stochastic process with Gaussian inputs. This indicates that the slow motion can be approximated in the weak sense by a standard perturbed gradient flow or the continuous-time stochastic gradient descent algorithm that solves the optimization problem for a composition of two functions. As an application, the perturbed compositional gradient flow corresponds to the diffusion limit of the Stochastic Composite Gradient Descent (SCGD) algorithm for minimizing a composition of two expected-value functions in the optimization literatures. For the strongly convex case, such an analysis implies that the SCGD algorithm has the same convergence time asymptotic as the classical stochastic gradient descent algorithm. Thus it validates, at the level of continuous approximation, the effectiveness of using the SCGD algorithm in the strongly convex case.

Citation: Wenqing Hu, Chris Junchi Li. A convergence analysis of the perturbed compositional gradient flow: Averaging principle and normal deviations. Discrete & Continuous Dynamical Systems - A, 2018, 38 (10) : 4951-4977. doi: 10.3934/dcds.2018216
References:
[1]

L. Arnold, Hasslemann's program revisited: The analysis of stochasticity in deterministic climate models, in Progress in Probability Book Series, Springer, 49, Stochastic Climate Models, 141-157.  Google Scholar

[2]

V. I. Bakhtin, Asymptotics of superregular perturbations of fiber ergodic semigroups, Stochastics and Stochastic Reports, 75 (2003), 295-318.  doi: 10.1080/1045112031000155678.  Google Scholar

[3]

V. Bakhtin and Y. Kifer, Diffusion approximation for slow motion in fully coupled averaging, Probability Theory and Related Fields, 129 (2004), 157-181.  doi: 10.1007/s00440-003-0326-7.  Google Scholar

[4]

A. Benveniste, M. Metivier and P. Priouret, Adaptive algorithms and stochastic approximations, Applications of Mathematics, Springer, 22 (1990), xii+365pp. doi: 10.1007/978-3-642-75894-2.  Google Scholar

[5]

V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008.  Google Scholar

[6]

S. Cerrai, A Khasminskii's averaging principle for stochastic reaction-diffusion equations, Annals of Applied Probability, 19 (2009), 899-948.  doi: 10.1214/08-AAP560.  Google Scholar

[7]

S. Cerrai, Normal deviations from the averaged motion for some reaction-diffusion equations with fast oscillating perturbation, Journal de Mathématiques Pures et Appliquées, 91 (2009), 614-647.  doi: 10.1016/j.matpur.2009.04.007.  Google Scholar

[8]

Y. Ermoliev, Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow, 1976.  Google Scholar

[9]

M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton University Press, Princeton, 1985. doi: 10.1515/9781400881598.  Google Scholar

[10]

M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, 2$^{nd}$ Edition, Springer, 1998. doi: 10.1007/978-1-4612-0611-8.  Google Scholar

[11]

M. Hairer and G. Pavliotis, Periodic homogenization for hypoelliptic diffusions, Journal of Statistical Physics, 117 (2004), 261-279.  doi: 10.1023/B:JOSS.0000044055.59822.20.  Google Scholar

[12]

K. Hasselmann, Stochastic climate models, Part Ⅰ, Theory, Tellus, 28 (1976), 473-485.   Google Scholar

[13]

W. Hu and C. J. Li, On the fast convergence of random perturbations of the gradient flow, preprint, arXiv: 1706.00837 Google Scholar

[14]

W. Hu, C. J. Li, L. Li and J. Liu, On the diffusion approximation of nonconvex stochastic gradient descent, Annals of Mathematical Science and Applications, to appear. arXiv: 1705.07562 Google Scholar

[15]

R. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Theory of Probability and its Applications, 11 (1966), 211-228.   Google Scholar

[16]

R. Khasminskii, On the principle of averaging the It ô's stochastic differential equations, Kybernetika (Prague), 4 (1968), 260-279.   Google Scholar

[17]

Y. Kifer, The exit problem for small random perturbations of dynamical systems with a hyperbolic fixed point, Israel Journal of Mathematics, 40 (1981), 74-96.  doi: 10.1007/BF02761819.  Google Scholar

[18]

C. Kipnis and S. R. S. Varadhan, Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Communications in Mathematical Physics, 104 (1986), 1-19.  doi: 10.1007/BF01210789.  Google Scholar

[19]

H. Kushner and G. George Yin, Stochastic approximations and recursive algorithms and applications, Applications of Mathematics (Stochastic Modeling and Applied Probability), 35, 2$^{nd}$ Edition, Springer, 2003. Google Scholar

[20]

R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., New York, 1982. doi: 10.1109/IGARSS.2010.5653437.  Google Scholar

[21]

B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1992. doi: 10.1007/978-3-662-02847-6.  Google Scholar

[22]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 1, Annals of Probability, 29 (2001), 1061-1085.  doi: 10.1214/aop/1015345596.  Google Scholar

[23]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 2, Annals of Probability, 31 (2003), 1166-1192.  doi: 10.1214/aop/1055425774.  Google Scholar

[24]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 3, Annals of Probability, 33 (2005), 1111-1133.  doi: 10.1214/009117905000000062.  Google Scholar

[25]

D. Revuz and M. Yor, Continuous martingales and Brownian motion, Grundlehren der Mathematischen Wissenschaften, 293, 3rd edition, Springer-Verlag, Berlin, 1999, xiv+602 pp. doi: 10.1007/978-3-662-06400-9.  Google Scholar

[26]

N. G. Van Kampen, The diffusion approximation for markov processes, Reprinted from: Thermodynamics & kinetics of biological processes (eds. I. Lamprecht and A. I. Zotin) Walter de Gruyter & Co., New York (1982). 181-195. Google Scholar

[27]

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995. doi: 10.1007/978-1-4757-2440-0.  Google Scholar

[28]

A. Yu. Veretennikov, On polynomial mixing and convergence rate for stochastic difference and differential equations, Theory of Probability and its Applications, 44 (2000), 361-374.  doi: 10.1137/S0040585X97977550.  Google Scholar

[29]

M. WangE. X. Fang and H. Liu, Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions, Mathematical Programming, 161 (2016), 419-449.  doi: 10.1007/s10107-016-1017-3.  Google Scholar

[30]

M. Wang, J. Liu and E. X. Fang, Accelerating stochastic composition optimization, Advances in Neural Information Processing Systems, 2016. arXiv: 1607.07329 Google Scholar

show all references

References:
[1]

L. Arnold, Hasslemann's program revisited: The analysis of stochasticity in deterministic climate models, in Progress in Probability Book Series, Springer, 49, Stochastic Climate Models, 141-157.  Google Scholar

[2]

V. I. Bakhtin, Asymptotics of superregular perturbations of fiber ergodic semigroups, Stochastics and Stochastic Reports, 75 (2003), 295-318.  doi: 10.1080/1045112031000155678.  Google Scholar

[3]

V. Bakhtin and Y. Kifer, Diffusion approximation for slow motion in fully coupled averaging, Probability Theory and Related Fields, 129 (2004), 157-181.  doi: 10.1007/s00440-003-0326-7.  Google Scholar

[4]

A. Benveniste, M. Metivier and P. Priouret, Adaptive algorithms and stochastic approximations, Applications of Mathematics, Springer, 22 (1990), xii+365pp. doi: 10.1007/978-3-642-75894-2.  Google Scholar

[5]

V. S. Borkar, Stochastic Approximation: A Dynamical Systems Viewpoint, Cambridge University Press, 2008.  Google Scholar

[6]

S. Cerrai, A Khasminskii's averaging principle for stochastic reaction-diffusion equations, Annals of Applied Probability, 19 (2009), 899-948.  doi: 10.1214/08-AAP560.  Google Scholar

[7]

S. Cerrai, Normal deviations from the averaged motion for some reaction-diffusion equations with fast oscillating perturbation, Journal de Mathématiques Pures et Appliquées, 91 (2009), 614-647.  doi: 10.1016/j.matpur.2009.04.007.  Google Scholar

[8]

Y. Ermoliev, Methods of Stochastic Programming. Monographs in Optimization and OR, Nauka, Moscow, 1976.  Google Scholar

[9]

M. I. Freidlin, Functional Integration and Partial Differential Equations, Princeton University Press, Princeton, 1985. doi: 10.1515/9781400881598.  Google Scholar

[10]

M. Freidlin and A. Wentzell, Random Perturbations of Dynamical Systems, 2$^{nd}$ Edition, Springer, 1998. doi: 10.1007/978-1-4612-0611-8.  Google Scholar

[11]

M. Hairer and G. Pavliotis, Periodic homogenization for hypoelliptic diffusions, Journal of Statistical Physics, 117 (2004), 261-279.  doi: 10.1023/B:JOSS.0000044055.59822.20.  Google Scholar

[12]

K. Hasselmann, Stochastic climate models, Part Ⅰ, Theory, Tellus, 28 (1976), 473-485.   Google Scholar

[13]

W. Hu and C. J. Li, On the fast convergence of random perturbations of the gradient flow, preprint, arXiv: 1706.00837 Google Scholar

[14]

W. Hu, C. J. Li, L. Li and J. Liu, On the diffusion approximation of nonconvex stochastic gradient descent, Annals of Mathematical Science and Applications, to appear. arXiv: 1705.07562 Google Scholar

[15]

R. Khasminskii, On stochastic processes defined by differential equations with a small parameter, Theory of Probability and its Applications, 11 (1966), 211-228.   Google Scholar

[16]

R. Khasminskii, On the principle of averaging the It ô's stochastic differential equations, Kybernetika (Prague), 4 (1968), 260-279.   Google Scholar

[17]

Y. Kifer, The exit problem for small random perturbations of dynamical systems with a hyperbolic fixed point, Israel Journal of Mathematics, 40 (1981), 74-96.  doi: 10.1007/BF02761819.  Google Scholar

[18]

C. Kipnis and S. R. S. Varadhan, Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions, Communications in Mathematical Physics, 104 (1986), 1-19.  doi: 10.1007/BF01210789.  Google Scholar

[19]

H. Kushner and G. George Yin, Stochastic approximations and recursive algorithms and applications, Applications of Mathematics (Stochastic Modeling and Applied Probability), 35, 2$^{nd}$ Edition, Springer, 2003. Google Scholar

[20]

R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., New York, 1982. doi: 10.1109/IGARSS.2010.5653437.  Google Scholar

[21]

B. Oksendal, Stochastic Differential Equations, Springer-Verlag, 1992. doi: 10.1007/978-3-662-02847-6.  Google Scholar

[22]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 1, Annals of Probability, 29 (2001), 1061-1085.  doi: 10.1214/aop/1015345596.  Google Scholar

[23]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 2, Annals of Probability, 31 (2003), 1166-1192.  doi: 10.1214/aop/1055425774.  Google Scholar

[24]

E. Pardoux and A. Yu. Veretennikov, On the Possion equation and diffusion approximation 3, Annals of Probability, 33 (2005), 1111-1133.  doi: 10.1214/009117905000000062.  Google Scholar

[25]

D. Revuz and M. Yor, Continuous martingales and Brownian motion, Grundlehren der Mathematischen Wissenschaften, 293, 3rd edition, Springer-Verlag, Berlin, 1999, xiv+602 pp. doi: 10.1007/978-3-662-06400-9.  Google Scholar

[26]

N. G. Van Kampen, The diffusion approximation for markov processes, Reprinted from: Thermodynamics & kinetics of biological processes (eds. I. Lamprecht and A. I. Zotin) Walter de Gruyter & Co., New York (1982). 181-195. Google Scholar

[27]

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer, 1995. doi: 10.1007/978-1-4757-2440-0.  Google Scholar

[28]

A. Yu. Veretennikov, On polynomial mixing and convergence rate for stochastic difference and differential equations, Theory of Probability and its Applications, 44 (2000), 361-374.  doi: 10.1137/S0040585X97977550.  Google Scholar

[29]

M. WangE. X. Fang and H. Liu, Stochastic compositional gradient descent: Algorithms for minimizing compositions of expected-value functions, Mathematical Programming, 161 (2016), 419-449.  doi: 10.1007/s10107-016-1017-3.  Google Scholar

[30]

M. Wang, J. Liu and E. X. Fang, Accelerating stochastic composition optimization, Advances in Neural Information Processing Systems, 2016. arXiv: 1607.07329 Google Scholar

[1]

Martin Heida, Stefan Neukamm, Mario Varga. Stochastic homogenization of $ \Lambda $-convex gradient flows. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 427-453. doi: 10.3934/dcdss.2020328

[2]

Predrag S. Stanimirović, Branislav Ivanov, Haifeng Ma, Dijana Mosić. A survey of gradient methods for solving nonlinear optimization. Electronic Research Archive, 2020, 28 (4) : 1573-1624. doi: 10.3934/era.2020115

[3]

Thomas Frenzel, Matthias Liero. Effective diffusion in thin structures via generalized gradient systems and EDP-convergence. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 395-425. doi: 10.3934/dcdss.2020345

[4]

Giuseppina Guatteri, Federica Masiero. Stochastic maximum principle for problems with delay with dependence on the past through general measures. Mathematical Control & Related Fields, 2020  doi: 10.3934/mcrf.2020048

[5]

Andrea Braides, Antonio Tribuzio. Perturbed minimizing movements of families of functionals. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 373-393. doi: 10.3934/dcdss.2020324

[6]

Li-Bin Liu, Ying Liang, Jian Zhang, Xiaobing Bao. A robust adaptive grid method for singularly perturbed Burger-Huxley equations. Electronic Research Archive, 2020, 28 (4) : 1439-1457. doi: 10.3934/era.2020076

[7]

Marco Ghimenti, Anna Maria Micheletti. Compactness results for linearly perturbed Yamabe problem on manifolds with boundary. Discrete & Continuous Dynamical Systems - S, 2020  doi: 10.3934/dcdss.2020453

[8]

Zhenzhen Wang, Tianshou Zhou. Asymptotic behaviors and stochastic traveling waves in stochastic Fisher-KPP equations. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020323

[9]

Anna Canale, Francesco Pappalardo, Ciro Tarantino. Weighted multipolar Hardy inequalities and evolution problems with Kolmogorov operators perturbed by singular potentials. Communications on Pure & Applied Analysis, 2021, 20 (1) : 405-425. doi: 10.3934/cpaa.2020274

[10]

Lorenzo Zambotti. A brief and personal history of stochastic partial differential equations. Discrete & Continuous Dynamical Systems - A, 2021, 41 (1) : 471-487. doi: 10.3934/dcds.2020264

[11]

Siyang Cai, Yongmei Cai, Xuerong Mao. A stochastic differential equation SIS epidemic model with regime switching. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020317

[12]

Xuhui Peng, Rangrang Zhang. Approximations of stochastic 3D tamed Navier-Stokes equations. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5337-5365. doi: 10.3934/cpaa.2020241

[13]

Yahia Zare Mehrjerdi. A new methodology for solving bi-criterion fractional stochastic programming. Numerical Algebra, Control & Optimization, 2020  doi: 10.3934/naco.2020054

[14]

Yueyang Zheng, Jingtao Shi. A stackelberg game of backward stochastic differential equations with partial information. Mathematical Control & Related Fields, 2020  doi: 10.3934/mcrf.2020047

[15]

Reza Chaharpashlou, Abdon Atangana, Reza Saadati. On the fuzzy stability results for fractional stochastic Volterra integral equation. Discrete & Continuous Dynamical Systems - S, 2020  doi: 10.3934/dcdss.2020432

[16]

Christian Beck, Lukas Gonon, Martin Hutzenthaler, Arnulf Jentzen. On existence and uniqueness properties for solutions of stochastic fixed point equations. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020320

[17]

Yangrong Li, Shuang Yang, Qiangheng Zhang. Odd random attractors for stochastic non-autonomous Kuramoto-Sivashinsky equations without dissipation. Electronic Research Archive, 2020, 28 (4) : 1529-1544. doi: 10.3934/era.2020080

[18]

Pengyu Chen. Non-autonomous stochastic evolution equations with nonlinear noise and nonlocal conditions governed by noncompact evolution families. Discrete & Continuous Dynamical Systems - A, 2020  doi: 10.3934/dcds.2020383

[19]

Lin Shi, Xuemin Wang, Dingshi Li. Limiting behavior of non-autonomous stochastic reaction-diffusion equations with colored noise on unbounded thin domains. Communications on Pure & Applied Analysis, 2020, 19 (12) : 5367-5386. doi: 10.3934/cpaa.2020242

[20]

Fathalla A. Rihan, Hebatallah J. Alsakaji. Stochastic delay differential equations of three-species prey-predator system with cooperation among prey species. Discrete & Continuous Dynamical Systems - S, 2020  doi: 10.3934/dcdss.2020468

2019 Impact Factor: 1.338

Metrics

  • PDF downloads (88)
  • HTML views (164)
  • Cited by (0)

Other articles
by authors

[Back to Top]