# American Institute of Mathematical Sciences

doi: 10.3934/dcdss.2022062
Online First


## A backward SDE method for uncertainty quantification in deep learning

1. Computational Science and Mathematics Division, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA
2. Department of Mathematics, Florida State University, Tallahassee, Florida, USA
3. Department of Mathematics and Statistics, Auburn University, Auburn, Alabama, USA
4. School of Mathematics, Jilin University, Changchun, China

\* Corresponding author: He Zhang

Received September 2021; revised January 2022; early access March 2022.

Fund Project: The second and third authors are partially supported by the U.S. Department of Energy under grant numbers DE-SC0022297 and DE-SC0022253. The last author is supported by NSFC 12071175 and by the Science and Technology Development of Jilin Province, China, no. 201902013020.

We develop a probabilistic machine learning method based on backward stochastic differential equations (BSDEs). The method formulates a class of stochastic neural networks (SNNs) as a stochastic optimal control problem. We introduce an efficient stochastic gradient descent (SGD) algorithm whose gradient is computed through a BSDE. We carry out a convergence analysis of the SGD optimization and numerical experiments on applications of SNNs to validate our methodology in both theory and performance.
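
For orientation, the generic setting behind this kind of BSDE-based training can be written as below. This is the standard stochastic-maximum-principle form. It is not a reproduction of the paper's equations (3)-(7).

The form rests on several assumptions. It uses a scalar Brownian motion $W_t$ and a terminal-only cost $\Phi$. The network input is $\xi$, and deterministic controls $u_t$ play the role of the trainable weights. The symbols $b$, $\sigma$, $\Phi$ and $\xi$ are notation introduced here for illustration.

$$
\begin{aligned}
&dX_t = b(X_t, u_t)\,dt + \sigma(X_t, u_t)\,dW_t, \qquad X_0 = \xi, \quad 0 \le t \le T,\\
&J(u) = \mathbb{E}\big[\Phi(X_T)\big] \;\to\; \min_u,\\
&dY_t = -\big(b_x(X_t,u_t)^{\top} Y_t + \sigma_x(X_t,u_t)^{\top} Z_t\big)\,dt + Z_t\,dW_t, \qquad Y_T = \nabla_x \Phi(X_T),\\
&\nabla_u J(u)_t = \mathbb{E}\big[\,b_u(X_t,u_t)^{\top} Y_t + \sigma_u(X_t,u_t)^{\top} Z_t\,\big].
\end{aligned}
$$

An SGD step in this setting replaces the expectation in the last line by a single simulated path of $(X_t, Y_t, Z_t)$.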

Citation: Richard Archibald, Feng Bao, Yanzhao Cao, He Zhang. A backward SDE method for uncertainty quantification in deep learning. Discrete and Continuous Dynamical Systems - S, doi: 10.3934/dcdss.2022062

##### Figures and tables:

- Classified samples of the deterministic and random classification functions
- Comparison between BNN output and SNN output
- Weight surface for classification results obtained by the SNN
- Example of data samples
- Comparison of the BNN approach and the backward SDE approach for SNNs
- Example of data samples
- Comparison of the BNN approach and the backward SDE approach for SNNs
- 4D function approximation: surface views
- 4D function approximation: section views
- Behavior of model curve with different parameters
- Prediction performance of SNNs for parameter estimation
- Distributions for SNN output
- Predicted function values corresponding to different parameters

**Algorithm 1.** Numerical implementation of the backward SDE method for SNNs

1. Formulate the SNN model (1) as the stochastic optimal control problem (3)-(5), and choose a partition $\Pi^N$ for the control problem.
2. Choose the following:
   - the number of SGD iteration steps $K \in \mathbb{N}$, according to a stopping criterion;
   - the learning rate $\{\eta_k\}_k$;
   - the initial guess $\{u_n^0\}_n$ for the optimal control.
3. For SGD iteration steps $k = 0, 1, 2, \cdots, K-1$:
   - Simulate one realization of the state process $\{X^{k}_{n}\}_{n}$ through the scheme (28).
   - Simulate one pair of solution paths $\{(\hat{Y}^{k}_{n}, \hat{Z}^k_n)\}_{n}$ of the adjoint BSDE system (7) corresponding to $\{X^{k}_{n}\}_{n}$ through the schemes (29).
   - Calculate the gradient process, and update the estimated optimal control $\{u^{k+1}_n\}_n$ through the SGD iteration scheme (30).
4. The estimated optimal control is given by $\{\hat{u}_n\}_n := \{u_{n}^{K}\}_n$.
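
The loop in Algorithm 1 can be sketched in a few lines. This is a minimal, hypothetical instance, not the paper's schemes (28)-(30), which are not reproduced here. It makes the following assumptions:

- a scalar state;
- a drift $b(x,u_n)=\tanh(w_n x + c_n)$;
- an additive diffusion $\sigma(u_n)=s_n$;
- a terminal loss $\Phi(x)=\tfrac12(x-\text{target})^2$;
- an Euler-Maruyama forward pass;
- an explicit backward recursion for $Y$;
- the one-sample approximation $Z_n \approx Y_{n+1}\Delta W_n/\Delta t$;
- a constant learning rate in place of $\{\eta_k\}_k$.

The function names `forward`, `backward_gradients`, `train` and `mean_loss` are illustrative.

```python
import math
import random


# Hypothetical example model (not the paper's scheme (28)), one Euler-Maruyama step per layer:
#   X_{n+1} = X_n + tanh(w_n X_n + c_n) dt + s_n dW_n
def forward(x0, w, c, s, dW, dt):
    X = [x0]
    for n in range(len(w)):
        X.append(X[n] + math.tanh(w[n] * X[n] + c[n]) * dt + s[n] * dW[n])
    return X


def backward_gradients(X, w, c, dW, dt, target):
    """Backward recursion for (Y_n, Z_n) along one path, plus the control gradients."""
    N = len(w)
    Y = X[N] - target  # Y_N = dPhi/dx for Phi(x) = 0.5 * (x - target)^2
    gw, gc, gs = [0.0] * N, [0.0] * N, [0.0] * N
    for n in reversed(range(N)):  # on entry Y holds Y_{n+1}
        dtanh = 1.0 - math.tanh(w[n] * X[n] + c[n]) ** 2  # tanh' at step n
        Z = Y * dW[n] / dt  # hypothetical one-sample approximation of Z_n
        gw[n] = dt * X[n] * dtanh * Y  # dt * b_w * Y_{n+1}
        gc[n] = dt * dtanh * Y  # dt * b_c * Y_{n+1}
        gs[n] = dt * Z  # dt * sigma_s * Z_n, with sigma_s = 1
        Y = Y * (1.0 + dt * w[n] * dtanh)  # Y_n = Y_{n+1} + dt * b_x * Y_{n+1}
    return gw, gc, gs


def train(x0, target, N=20, T=1.0, K=500, eta=0.5, seed=0):
    rng = random.Random(seed)
    dt = T / N
    w, c, s = [0.0] * N, [0.0] * N, [0.1] * N  # initial guess u^0
    for _ in range(K):  # SGD iterations k = 0, ..., K-1
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(N)]  # one noise path
        X = forward(x0, w, c, s, dW, dt)  # state path X^k
        gw, gc, gs = backward_gradients(X, w, c, dW, dt, target)
        for n in range(N):  # u^{k+1} = u^k - eta * gradient
            w[n] -= eta * gw[n]
            c[n] -= eta * gc[n]
            s[n] -= eta * gs[n]
    return w, c, s


def mean_loss(x0, target, w, c, s, T=1.0, M=2000, seed=1):
    """Monte Carlo estimate of J(u) = E[Phi(X_N)] using fresh noise paths."""
    rng = random.Random(seed)
    N = len(w)
    dt = T / N
    total = 0.0
    for _ in range(M):
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(N)]
        total += 0.5 * (forward(x0, w, c, s, dW, dt)[-1] - target) ** 2
    return total / M


if __name__ == "__main__":
    N = 20
    before = mean_loss(0.0, 0.5, [0.0] * N, [0.0] * N, [0.1] * N)
    w, c, s = train(0.0, 0.5, N=N)
    after = mean_loss(0.0, 0.5, w, c, s)
    print(f"mean loss before: {before:.4f}, after: {after:.4f}")
```

Here $\sigma$ does not depend on $x$, so the $Z$ term drops out of the $Y$ recursion. It enters only the gradient with respect to $s_n$.

Along a fixed noise path, the recursion reproduces the exact derivative of the discretized map. This makes it straightforward to check against finite differences.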
