Affine invariant ensemble transform methods to improve predictive uncertainty in neural networks

  • *Corresponding author: Diksha Bhandari

  • We consider the problem of performing Bayesian inference for logistic regression using appropriate extensions of the ensemble Kalman filter. We propose two interacting particle systems that sample from an approximate posterior, and we prove quantitative convergence rates of these systems to their mean-field limit as the number of particles tends to infinity. Furthermore, we apply these techniques and examine their effectiveness as methods of Bayesian approximation for quantifying predictive uncertainty in neural networks.

    Mathematics Subject Classification: Primary: 62F15, 65C05, 68T37, 62J02, 34F05.

  • Figure 1.  2D binary classification data set

    Figure 2.  Binary classification on a toy dataset using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. The background colour depicts the confidence in classification, while the black line represents the decision boundary obtained for the toy classification problem

    Figure 3.  Zoomed-out versions of the results in Figure 2 for binary classification on a toy dataset using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. The background colour depicts the confidence in classification

    Figure 4.  Confidence of MLE, ensembles of neural networks, last-layer Laplace approximation, HMC, the moment matching method, and the deterministic second-order dynamical sampler as functions of $ \delta $ over the test set. Thick blue lines and shaded regions correspond to means and $ \pm $ standard deviations, respectively. Dashed black lines mark the desirable confidence for sufficiently large $ \delta $

    Figure 5.  Effect of varying ensemble sizes $ (J) $ on confidence in prediction for binary classification using proposed ensemble sampling methods for Bayesian inference over the network's output (last) layer

    Figure 6.  Multi-class classification on a toy dataset using (a) MLE estimates, (b) an ensemble of neural networks, and last-layer Gaussian approximations over the weights obtained via (c) Laplace approximation, (d) Hamiltonian Monte Carlo, (e) the moment matching method, and (f) the deterministic second-order dynamical sampler. The background colour depicts the confidence in classification obtained for the toy classification problem

    Figure 7.  Binary classification on the CIFAR10 dataset. Blue bars show the maximum probability of a test image belonging to one of the two classes for in-distribution test data; red bars show the same maximum probability for out-of-distribution (OOD) test images
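    The confidence reported in the figure captions above is the maximum predicted class probability. As a rough illustration only, the sketch below assumes a binary last layer with a logistic link and a prediction formed by averaging the class probability over an ensemble of last-layer weight vectors. The weight values, the feature vector, and the function names are hypothetical, and the ensemble is hand-picked rather than produced by any of the samplers compared in the figures.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ensemble_confidence(features, ensemble):
    """Average the class-1 probability over all ensemble members,
    then return the maximum class probability (the 'confidence')."""
    probs = [
        sigmoid(sum(w * f for w, f in zip(theta, features)))
        for theta in ensemble
    ]
    p = sum(probs) / len(probs)
    return max(p, 1.0 - p)


# Hypothetical ensemble of J = 4 last-layer weight vectors (assumed values).
ensemble = [[1.0, -0.5], [0.8, 0.2], [0.6, 0.1], [-0.9, 0.3]]

# Hypothetical feature vector far away from the origin.
far_features = [50.0, 50.0]

# Confidence of a single weight vector (a point estimate).
point = ensemble_confidence(far_features, [ensemble[0]])

# Confidence after averaging over the whole ensemble.
far = ensemble_confidence(far_features, ensemble)
```

    With these assumed values the single weight vector is almost certain of its prediction, whereas the ensemble average stays near 0.75 because one of the four members disagrees. This is the qualitative effect that averaging over weight samples is meant to have far from the training data.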

    Table 3.  $ l_2 $-difference between true parameter values and the posterior ensemble mean with $ P_{\rm{prior}} = I $, averaged over 100 experimental runs, and its standard deviation as a function of ensemble size $ J $

    | Method / $ J $ | 10 | 20 | 50 | 100 |
    |---|---|---|---|---|
    | Homotopy using moment matching method | $ 1.518 \pm 0.017 $ | $ 1.283 \pm 0.009 $ | $ 0.814 \pm 0.002 $ | $ 0.484 \pm 0.001 $ |
    | Deterministic second-order dynamical sampler | $ 0.895 \pm 0.004 $ | $ 0.502 \pm 0.007 $ | $ 0.422 \pm 0.003 $ | $ 0.282 \pm 0.002 $ |
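    The quantity tabulated above is the $ l_2 $-distance between the true parameter vector and the posterior ensemble mean, summarised over repeated runs. The sketch below shows one way such a summary could be computed. It is not the authors' code: the sampler is replaced by a hypothetical stand-in that scatters ensemble members around the true parameters with Gaussian noise, and the reported spread is assumed to be the sample standard deviation across runs.

```python
import math
import random
import statistics


def ensemble_mean(ensemble):
    """Component-wise mean of a list of parameter vectors."""
    dim = len(ensemble[0])
    return [sum(member[d] for member in ensemble) / len(ensemble) for d in range(dim)]


def l2_error(theta_true, ensemble):
    """l2-distance between the true parameters and the ensemble mean."""
    mean = ensemble_mean(ensemble)
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mean, theta_true)))


def summarize(theta_true, J, runs=100, spread=1.0, seed=0):
    """Mean and sample standard deviation of the l2 error over `runs` runs.

    Each run draws a fresh ensemble of size J. The Gaussian scatter is a
    placeholder (assumption) for the output of an actual ensemble sampler.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(runs):
        ensemble = [
            [t + rng.gauss(0.0, spread) for t in theta_true] for _ in range(J)
        ]
        errors.append(l2_error(theta_true, ensemble))
    return statistics.mean(errors), statistics.stdev(errors)


theta_true = [1.0, -2.0]  # hypothetical true parameter vector
results = {J: summarize(theta_true, J) for J in (10, 20, 50, 100)}
```

    Here `results[J]` holds a `(mean, standard deviation)` pair for each ensemble size, laid out like one row of the table. The numbers it produces reflect only the placeholder noise model and are not comparable to the tabulated values.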

    Table 4.  $ l_2 $-difference between true parameter values and the posterior ensemble mean with non-diagonal $ P_{\rm{prior}} $, averaged over 100 experimental runs, and its standard deviation as a function of ensemble size $ J $

    | Method / $ J $ | 10 | 20 | 50 | 100 |
    |---|---|---|---|---|
    | Homotopy using moment matching method | $ 1.519 \pm 0.019 $ | $ 1.438 \pm 0.008 $ | $ 0.857 \pm 0.003 $ | $ 0.495 \pm 0.002 $ |
    | Deterministic second-order dynamical sampler | $ 0.887 \pm 0.012 $ | $ 0.504 \pm 0.005 $ | $ 0.461 \pm 0.005 $ | $ 0.287 \pm 0.003 $ |