
A backward SDE method for uncertainty quantification in deep learning
1. Computational Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
2. Department of Mathematics, Florida State University, Tallahassee, Florida, USA
3. Department of Mathematics and Statistics, Auburn University, Auburn, Alabama, USA
4. School of Mathematics, Jilin University, Changchun, China
We develop a probabilistic machine learning method based on backward stochastic differential equations (BSDEs), which formulates a class of stochastic neural networks as a stochastic optimal control problem. We introduce an efficient stochastic gradient descent algorithm in which the gradient is computed through a BSDE. A convergence analysis of the stochastic gradient descent optimization and numerical experiments on applications of stochastic neural networks validate the methodology in both theory and performance.
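For orientation, the following is a generic sketch of how a stochastic optimal control problem yields a gradient through a BSDE, written for scalar state and control. The symbols $f$, $g$, $r$, $\Phi$, $X$, $Y$, $Z$ and $u$ are assumed notation for the drift, diffusion, running cost, terminal cost, state, adjoint pair and control; they are not taken from the numbered equations of the paper, whose exact formulation may differ.

```latex
\begin{aligned}
dX_t &= f(X_t, u_t)\,dt + g(X_t, u_t)\,dW_t, \qquad X_0 = x_0, \\
J(u) &= \mathbb{E}\Big[\int_0^T r(X_t, u_t)\,dt + \Phi(X_T)\Big], \\
dY_t &= -\big(f_x(X_t, u_t)\,Y_t + g_x(X_t, u_t)\,Z_t + r_x(X_t, u_t)\big)\,dt + Z_t\,dW_t, \qquad Y_T = \Phi_x(X_T), \\
\nabla J(u)_t &= \mathbb{E}\big[f_u(X_t, u_t)\,Y_t + g_u(X_t, u_t)\,Z_t + r_u(X_t, u_t)\big].
\end{aligned}
```

In this reading, the forward equation plays the role of the network, the control $u$ plays the role of the trainable parameters, and the adjoint pair $(Y, Z)$ carries the sensitivity of the cost backward in time.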
Algorithm 1.
Formulate the SNN model (1) as the stochastic optimal control problem (3)-(5) and give a partition.
Choose the number of SGD iteration steps.
for each SGD iteration step:
    Simulate one realization of the state process.
    Simulate one pair of solution paths.
    Calculate the gradient process and update the estimated optimal control.
end for
Output: the estimated optimal control.
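The loop in Algorithm 1 can be illustrated on a toy problem. The sketch below is not the paper's implementation: the scalar drift `u * tanh(x)`, the multiplicative noise `sigma * x`, the quadratic terminal cost, the step-size schedule, and all function names are hypothetical choices made for illustration. It also assumes that the adjoint pair is approximated from a single sample path, with the conditional expectations of the backward scheme replaced by one-sample values.

```python
import math
import random


def f(x, u):
    """Toy drift u * tanh(x), standing in for one network layer (assumption)."""
    return u * math.tanh(x)


def f_x(x, u):
    return u * (1.0 - math.tanh(x) ** 2)


def f_u(x, u):
    return math.tanh(x)


def simulate_state(u, x0, h, sigma, rng):
    """One Euler-Maruyama path of dX = f(X, u) dt + sigma * X dW."""
    xs, xis = [x0], []
    for u_n in u:
        xi = rng.gauss(0.0, 1.0)
        x = xs[-1]
        xs.append(x + h * f(x, u_n) + sigma * x * math.sqrt(h) * xi)
        xis.append(xi)
    return xs, xis


def simulate_adjoint(xs, xis, u, target, h, sigma):
    """One sample path of the adjoint pair (Y, Z), swept backward in time."""
    n_steps = len(u)
    ys = [0.0] * (n_steps + 1)
    zs = [0.0] * n_steps
    ys[n_steps] = xs[n_steps] - target  # Y_T = Phi'(X_T), Phi(x) = (x - target)^2 / 2
    for n in reversed(range(n_steps)):
        # One-sample stand-ins for the conditional expectations in the BSDE scheme.
        zs[n] = ys[n + 1] * xis[n] / math.sqrt(h)
        ys[n] = ys[n + 1] + h * (f_x(xs[n], u[n]) * ys[n + 1] + sigma * zs[n])
    return ys, zs


def train(n_steps=20, n_iters=2000, x0=1.0, target=2.0, T=1.0,
          sigma=0.2, lam=0.01, eta0=0.2, seed=0):
    """SGD on the control, one state path and one adjoint path per iteration."""
    rng = random.Random(seed)
    h = T / n_steps
    u = [0.0] * n_steps
    for k in range(n_iters):
        xs, xis = simulate_state(u, x0, h, sigma, rng)
        ys, _ = simulate_adjoint(xs, xis, u, target, h, sigma)
        eta = eta0 / (1.0 + k / 100.0)  # decaying step size (assumed schedule)
        # Gradient process f_u * Y + lam * u; the diffusion does not depend on u here.
        u = [u_n - eta * (f_u(x_n, u_n) * y_n + lam * u_n)
             for u_n, x_n, y_n in zip(u, xs, ys)]
    return u


def mean_cost(u, n_paths=2000, x0=1.0, target=2.0, T=1.0,
              sigma=0.2, lam=0.01, seed=1):
    """Monte Carlo estimate of the terminal cost plus the running penalty."""
    rng = random.Random(seed)
    h = T / len(u)
    running = 0.5 * lam * h * sum(u_n ** 2 for u_n in u)
    total = 0.0
    for _ in range(n_paths):
        xs, _ = simulate_state(u, x0, h, sigma, rng)
        total += 0.5 * (xs[-1] - target) ** 2
    return running + total / n_paths


if __name__ == "__main__":
    u_hat = train()
    print("cost with zero control:", mean_cost([0.0] * len(u_hat)))
    print("cost with trained control:", mean_cost(u_hat))
```

Because the toy diffusion depends on the state but not on the control, `Z` enters only through the backward sweep for `Y`; a control-dependent diffusion would add a `g_u * Z` term to the gradient update.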