
A backward SDE method for uncertainty quantification in deep learning

  • * Corresponding author: He Zhang


The second and third authors are partially supported by the U.S. Department of Energy under grant numbers DE-SC0022297 and DE-SC0022253. The last author is supported by NSFC grant 12071175 and by the Science and Technology Development of Jilin Province, China, no. 201902013020.

  • We develop a probabilistic machine learning method based on backward stochastic differential equations, which formulates a class of stochastic neural networks as a stochastic optimal control problem. An efficient stochastic gradient descent algorithm is introduced, in which the gradient is computed through a backward stochastic differential equation. A convergence analysis of the stochastic gradient descent optimization and numerical experiments on applications of stochastic neural networks are carried out to validate the methodology in both theory and performance.

    Mathematics Subject Classification: 60H35, 68T07, 93E20.


  • Figure 1.  Classified samples of the deterministic and random classification functions

    Figure 2.  Comparison between BNN output and SNN output

    Figure 3.  Weight surface for classification results obtained by the SNN

    Figure 4.  Example of data samples

    Figure 5.  Comparison of the BNN approach and the backward SDE approach for SNNs

    Figure 6.  Example of data samples

    Figure 7.  Comparison of the BNN approach and the backward SDE approach for SNNs

    Figure 8.  4D Function approximation – surface views

    Figure 9.  4D Function approximation – section views

    Figure 10.  Behavior of model curve with different parameters

    Figure 11.  Prediction performance of SNNs for parameter estimation

    Figure 12.  Distributions for SNN output

    Figure 13.  Predicted function values corresponding to different parameters

    Table 1.  Numerical implementation of the backward SDE method for SNNs

    Algorithm 1.
    Formulate the SNN model (1) as the stochastic optimal control problem (3)-(5) and fix a partition $ \Pi^N $ for the control problem.
    Choose the number of SGD iteration steps $ K \in \mathbb{N} $ corresponding to a stopping criterion, the learning rates $ \{\eta_k\}_k $, and the initial guess for the optimal control $ \{u_n^0\}_n $.
    for SGD iteration steps $ k = 0, 1, 2, \cdots, K-1 $,
        Simulate one realization of the state process $ \{X^{k}_{n}\}_{n} $ through the scheme (28);
        Simulate one pair of solution paths $ \{\big( \hat{Y}^{k}_{n}, \ \hat{Z}^k_n \big)\}_{n} $ of the adjoint BSDE system (7) corresponding to $ \{X^{k}_{n}\}_{n} $ through the scheme (29);
        Calculate the gradient process and update the estimated optimal control $ \{u^{k+1}_n\}_n $ through the SGD iteration scheme (30);
    end for
    The estimated optimal control is given by $ \{ \hat{u}_n\}_n := \{ u_{n}^{K} \}_n $.
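
The loop structure of Algorithm 1 can be illustrated with a minimal Python sketch. Schemes (28)-(30) are not reproduced on this page, so everything specific below is an assumption made for illustration: a scalar toy problem with drift $ f(x,u) = -x + u $, constant diffusion `sigma`, terminal cost $ \tfrac{1}{2}(X_N - \text{target})^2 $, an Euler-type forward step, a one-sample backward recursion for the adjoint pair, and a decaying learning rate. The names `train_control` and `mean_terminal_state` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def train_control(K=2000, N=20, T=1.0, x0=0.0, target=1.0, sigma=0.1, seed=0):
    """Single-sample SGD over a discretized control {u_n} (toy problem, assumed)."""
    rng = np.random.default_rng(seed)
    h = T / N                      # step size of the partition
    u = np.zeros(N)                # initial guess {u_n^0}
    for k in range(K):
        eta = 1.0 / (1.0 + k / 50.0)          # assumed decaying learning rate eta_k
        w = rng.standard_normal(N)            # one realization of the noise

        # Forward pass: one realization of the state process (Euler-type step).
        x = np.empty(N + 1)
        x[0] = x0
        for n in range(N):
            x[n + 1] = x[n] + h * (-x[n] + u[n]) + sigma * np.sqrt(h) * w[n]

        # Backward pass: one sample path of the adjoint pair (Y, Z).
        y = np.empty(N + 1)
        z = np.empty(N)
        y[N] = x[N] - target                  # derivative of the terminal cost
        for n in range(N - 1, -1, -1):
            f_x = -1.0                        # d f / d x for the toy drift
            y[n] = y[n + 1] + h * f_x * y[n + 1]
            z[n] = y[n + 1] * w[n] / np.sqrt(h)

        # Gradient process f_u * Y + g_u * Z; here f_u = 1 and g_u = 0
        # because the toy diffusion does not depend on the control.
        grad = 1.0 * y[:N] + 0.0 * z
        u -= eta * grad                       # SGD update of the control
    return u, h


def mean_terminal_state(u, h, x0=0.0):
    """Noise-free terminal state under control u (equals E[X_N] for this linear toy)."""
    m = x0
    for un in u:
        m = m + h * (-m + un)
    return m


if __name__ == "__main__":
    u_hat, h = train_control()
    print("mean terminal state:", mean_terminal_state(u_hat, h))
```

Each iteration uses a single simulated path rather than a Monte Carlo average, which is what keeps the per-step cost low. In this toy case the diffusion is control-independent, so $ \hat{Z} $ is computed but does not contribute to the gradient.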