
Bayesian neural network priors for edge-preserving inversion

Partially supported by the US National Science Foundation under grants #1723211 and #1913129

We consider Bayesian inverse problems wherein the unknown state is assumed a priori to be a function with discontinuous structure. A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced, motivated by existing results concerning the infinite-width limit of such networks. We show theoretically that samples from such priors have desirable discontinuous-like properties even when the network width is finite, making them appropriate for edge-preserving inversion. Numerically, we consider deconvolution problems defined on one- and two-dimensional spatial domains to illustrate the effectiveness of these priors; MAP estimation, dimension-robust MCMC sampling and ensemble-based approximations are used to probe the posterior distribution. The point estimates are shown to be more accurate than those obtained with non-heavy-tailed priors, and the uncertainty estimates are shown to provide more useful qualitative information.
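A minimal NumPy sketch of drawing one realization from a prior of this type, assuming a fully connected network with $\tanh$ activations whose weights and biases are i.i.d. Cauchy or Gaussian. Each layer's sum over its `fan_in` inputs is rescaled by `fan_in**(-1/alpha)`, with `alpha = 1` for Cauchy and `alpha = 2` for Gaussian weights; this is the standard normalization under which infinite-width limits of the kind mentioned above exist, but it is an assumption here, as are the function name `sample_bnn_prior` and its parameters.

```python
import numpy as np


def sample_bnn_prior(x, widths, dists, rng, scale=1.0):
    """Draw one realization of a random fully connected tanh network at the points x.

    x      : array of shape (n_points, d) of evaluation points
    widths : hidden-layer widths, e.g. [80, 80, 500]
    dists  : weight distribution per layer, "cauchy" or "gaussian"; len(widths) + 1 entries
    scale  : common scale of all weights and biases (hypothetical parameter)
    """
    sizes = [x.shape[1]] + list(widths) + [1]
    if len(dists) != len(sizes) - 1:
        raise ValueError("need one distribution per weight layer")
    h = x
    for layer, dist in enumerate(dists):
        fan_in, fan_out = sizes[layer], sizes[layer + 1]
        if dist == "cauchy":
            W = scale * rng.standard_cauchy((fan_in, fan_out))
            b = scale * rng.standard_cauchy(fan_out)
            alpha = 1.0  # stability index of the Cauchy distribution
        elif dist == "gaussian":
            W = scale * rng.standard_normal((fan_in, fan_out))
            b = scale * rng.standard_normal(fan_out)
            alpha = 2.0  # stability index of the Gaussian distribution
        else:
            raise ValueError(f"unknown weight distribution: {dist}")
        # Rescale the sum over fan_in inputs by fan_in**(-1/alpha)
        z = (h @ W) * fan_in ** (-1.0 / alpha) + b
        # tanh on hidden layers, identity on the scalar output layer
        h = np.tanh(z) if layer < len(dists) - 1 else z
    return h[:, 0]


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 400)[:, None]  # 1D grid on [-1, 1]
u_cauchy = sample_bnn_prior(x, [80, 80, 500], ["cauchy"] * 4, rng)
u_gauss = sample_bnn_prior(x, [80, 80, 500], ["gaussian"] * 4, rng)
```

For two-dimensional realizations on $[-1, 1]^2$, pass points of shape `(n, 2)`, for example `np.stack([X.ravel(), Y.ravel()], axis=1)` built from `np.meshgrid`. A mixed list such as `["cauchy", "cauchy", "cauchy", "gaussian"]` gives one possible Cauchy-Gaussian variant; which layers carry which distribution in the figures is not stated here, so that list is only an example.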

Mathematics Subject Classification: Primary: 62F15, 68T07, 60E07; Secondary: 65C05.


Figure 1.  Comparison between outputs on $[-1, 1]^2$ of Bayesian neural network priors with three hidden layers. Shown are realizations of networks with the $\tanh(\cdot)$ activation function and Cauchy weights (top) or Gaussian weights (bottom)

Figure 2.  Realizations of the neural network with two different weight distributions on the interval $[-1, 1]$. Shown are realizations with Cauchy weights (a) and Gaussian weights (b)

Figure 3.  Setup for Problem 1. Shown in (a) are the truth model and synthetic observations. For reference, (b) shows the result of Gaussian process regression with the covariance operator $0.25(I - 10 \Delta)^{-2}$
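The reference reconstruction in Figure 3(b) uses a Gaussian prior with covariance $0.25(I - 10\Delta)^{-2}$. A minimal sketch of Gaussian process regression with this covariance for a linear forward map, assuming a finite-difference Laplacian with homogeneous Dirichlet boundary conditions on a uniform grid of $[-1, 1]$, Gaussian noise, and a hypothetical Gaussian blurring matrix `A`; the actual boundary conditions, blur and noise level of Problem 1 are not given in this caption.

```python
import numpy as np


def laplacian_1d(n, h):
    """Second-difference Laplacian on n interior points, homogeneous Dirichlet BCs (assumed)."""
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    return L / h**2


def gp_prior_covariance(n, h, scale=0.25, ell=10.0, power=2):
    """Discretization of scale * (I - ell * Laplacian)^(-power) via an eigendecomposition."""
    lam, V = np.linalg.eigh(laplacian_1d(n, h))  # eigenvalues lam are negative
    return (V * (scale / (1.0 - ell * lam) ** power)) @ V.T


def gp_posterior(A, y, C, noise_std):
    """Posterior mean and covariance for y = A u + e, e ~ N(0, noise_std^2 I), u ~ N(0, C)."""
    S = A @ C @ A.T + noise_std**2 * np.eye(A.shape[0])
    K = np.linalg.solve(S, A @ C).T  # gain C A^T S^{-1}, using that C and S are symmetric
    return K @ y, C - K @ A @ C


n = 200
x = np.linspace(-1.0, 1.0, n + 2)[1:-1]  # interior grid points
h = x[1] - x[0]
C = gp_prior_covariance(n, h)
# Hypothetical blurring operator: Gaussian kernel of width 0.05, rows summing to 1
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.05) ** 2)
A /= A.sum(axis=1, keepdims=True)
u_true = np.where(np.abs(x) < 0.5, 1.0, 0.0)  # hypothetical piecewise-constant truth
y = A @ u_true + 0.01 * np.random.default_rng(0).standard_normal(n)
mean, cov = gp_posterior(A, y, C, noise_std=0.01)
pointwise_std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
```

Building the covariance from the eigendecomposition of the symmetric Laplacian keeps it symmetric positive definite by construction, which avoids forming and squaring an explicit inverse.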

Figure 4.  Reconstructions for Problem 1 obtained by optimization from different initializations. Results are shown for Gaussian (a), Cauchy-Gaussian (b), and fully Cauchy (c) weights
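Reconstructions of this kind are minimizers of a negative log posterior over the network parameters, and different initializations can end in different local minima. A minimal sketch of that objective, assuming a fully connected $\tanh$ network whose layer sums are rescaled by `fan_in**(-1/alpha)`, a linear forward matrix `A`, Gaussian noise, and i.i.d. Cauchy or Gaussian priors with scale `gamma` on every weight and bias; all names are illustrative. Minimizing it requires gradients, for example from an automatic differentiation library, which this NumPy sketch does not provide.

```python
import numpy as np


def neg_log_posterior(params, dists, x, A, y, noise_std, gamma=1.0):
    """Negative log posterior of network parameters, up to additive constants.

    params : list of (W, b) pairs, W of shape (fan_in, fan_out)
    dists  : "cauchy" or "gaussian" for each layer
    """
    h = x
    penalty = 0.0
    for layer, ((W, b), dist) in enumerate(zip(params, dists)):
        alpha = 1.0 if dist == "cauchy" else 2.0
        z = (h @ W) * W.shape[0] ** (-1.0 / alpha) + b
        h = np.tanh(z) if layer < len(params) - 1 else z
        theta = np.concatenate([W.ravel(), b]) / gamma
        if dist == "cauchy":
            # -log of the Cauchy(0, gamma) density is log(1 + (w/gamma)^2) + const
            penalty += np.sum(np.log1p(theta**2))
        else:
            # -log of the N(0, gamma^2) density is 0.5 * (w/gamma)^2 + const
            penalty += 0.5 * np.sum(theta**2)
    u = h[:, 0]
    misfit = 0.5 * np.sum((A @ u - y) ** 2) / noise_std**2
    return misfit + penalty


def random_params(sizes, rng):
    """Hypothetical Gaussian initialization; each seed gives a different starting point."""
    return [(rng.standard_normal((m, k)), rng.standard_normal(k))
            for m, k in zip(sizes[:-1], sizes[1:])]


x = np.linspace(-1.0, 1.0, 100)[:, None]
A = np.eye(100)  # placeholder forward operator
y = np.sign(x[:, 0])  # hypothetical data with a jump
sizes, dists = [1, 20, 20, 1], ["cauchy", "cauchy", "gaussian"]
values = [neg_log_posterior(random_params(sizes, np.random.default_rng(s)), dists, x, A, y, 0.1)
          for s in range(5)]
```

The Cauchy term grows only logarithmically in each weight, so a few large weights are penalized far less than under the Gaussian term; this is what lets heavy-tailed priors favour sharp transitions.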

Figure 5.  Shown are the means and pointwise standard deviations obtained with the ensemble method (a) and the last-layer Gaussian regression method (b). Both results are for Cauchy-Gaussian priors for Problem 1
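The pointwise mean and standard deviation of an ensemble can be sketched as below, assuming, as is usual for ensemble methods, that several networks are optimized independently (for example from different random initializations) and their reconstructions are stacked with one row per ensemble member. The unbiased (`ddof=1`) standard deviation is a choice made here, not something the caption states.

```python
import numpy as np


def ensemble_statistics(reconstructions):
    """Pointwise mean and standard deviation over ensemble members (axis 0).

    reconstructions : array of shape (n_members, ...) holding one reconstruction
                      per independently optimized network; works for 1D or 2D grids
    """
    R = np.asarray(reconstructions, dtype=float)
    if R.shape[0] < 2:
        raise ValueError("need at least two ensemble members")
    mean = R.mean(axis=0)
    std = R.std(axis=0, ddof=1)  # unbiased sample standard deviation
    return mean, std


members = np.array([[0.0, 1.0, 1.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.5]])
mean, std = ensemble_statistics(members)  # mean [0, 1, 0.5], std [0, 0, 0.5]
```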

Figure 6.  Shown in (a, b, c) are samples and in (d, e, f) the uncertainty of posterior distributions with different neural network priors for Problem 1. The plots correspond to the Gaussian (a, d), Cauchy-Gaussian (b, e), and Cauchy (c, f) neural network priors
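Table 2 reports samples computed with the preconditioned Crank-Nicolson (pCN) method, whose proposal is designed for Gaussian priors. One common way to apply it to heavy-tailed weights, assumed here rather than taken from the paper, is to write every Cauchy weight as a deterministic transform of a standard Gaussian variable, $w = \gamma \tan(\pi(\Phi(\xi) - 1/2))$ with $\Phi$ the standard normal CDF, and to run pCN on $\xi$. A minimal sketch; `log_likelihood` stands for the data log-likelihood evaluated after this transform and is a hypothetical callable.

```python
import math

import numpy as np

# Elementwise standard normal CDF via the complementary error function
_norm_cdf = np.vectorize(lambda t: 0.5 * math.erfc(-t / math.sqrt(2.0)))


def gaussian_to_cauchy(xi, gamma=1.0):
    """Map standard Gaussian variables to Cauchy(0, gamma) ones via the inverse Cauchy CDF."""
    return gamma * np.tan(np.pi * (_norm_cdf(xi) - 0.5))


def pcn(log_likelihood, dim, n_steps, beta, rng):
    """pCN Metropolis-Hastings chain targeting exp(log_likelihood(xi)) * N(0, I)."""
    xi = rng.standard_normal(dim)
    ll = log_likelihood(xi)
    samples = np.empty((n_steps, dim))
    accepted = 0
    for step in range(n_steps):
        # The proposal leaves N(0, I) invariant, so only the likelihood ratio enters
        proposal = np.sqrt(1.0 - beta**2) * xi + beta * rng.standard_normal(dim)
        ll_prop = log_likelihood(proposal)
        if np.log(rng.uniform()) < ll_prop - ll:
            xi, ll = proposal, ll_prop
            accepted += 1
        samples[step] = xi
    return samples, accepted / n_steps


# Toy check: a Gaussian likelihood centred at 1 with std 0.5 gives the posterior N(0.8, 0.2)
rng = np.random.default_rng(0)
chain, rate = pcn(lambda xi: -0.5 * np.sum((xi - 1.0) ** 2) / 0.25, 1, 20000, 0.5, rng)
post_mean = chain[2000:, 0].mean()
```

Because the acceptance ratio involves only the likelihood, the step size `beta` does not have to shrink as the number of network parameters grows, which is the sense in which pCN is dimension-robust.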

Figure 7.  Shown in the first row are the truth $u^\dagger$ and blurred models used for Problem 2. We show minimizers obtained using the optimization method with different initializations. Results are shown for Gaussian weights (second row), Cauchy-Gaussian weights (third row), and Cauchy weights (fourth row)
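Problem 2 uses blurred two-dimensional images as data. A minimal sketch of a blurring operator of that kind, assuming a Gaussian kernel and periodic boundary conditions so that the convolution can be applied with the FFT; the kernel shape, its width `sigma` (in pixels), the boundary treatment and the test image are all assumptions, since the blur used for Problem 2 is not described in the caption.

```python
import numpy as np


def gaussian_blur_2d(image, sigma):
    """Periodic convolution of a 2D array with a unit-mass Gaussian kernel of std sigma (pixels)."""
    ny, nx = image.shape
    fy = np.fft.fftfreq(ny)[:, None]  # frequencies in cycles per pixel
    fx = np.fft.fftfreq(nx)[None, :]
    # Fourier transform of a Gaussian with standard deviation sigma
    transfer = np.exp(-2.0 * np.pi**2 * sigma**2 * (fy**2 + fx**2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))


n = 128
X, Y = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n))
u_true = ((np.abs(X) < 0.4) & (np.abs(Y) < 0.4)).astype(float)  # hypothetical square
rng = np.random.default_rng(0)
y = gaussian_blur_2d(u_true, sigma=3.0) + 0.01 * rng.standard_normal((n, n))
```

The transfer function equals 1 at zero frequency, so the blur preserves the total mass of the image and leaves constant images unchanged.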

Figure 8.  Results obtained with ensemble method for Problem 2. Shown are the ensemble means (top row) and standard deviation (bottom row) obtained with Gaussian weights (left), Cauchy-Gaussian weights (middle), and fully Cauchy weights (right)

Figure 9.  Results for last-layer Gaussian regression, Problem 2. Shown are the standard deviations obtained with last-layer basis functions from a pre-trained network. The figures are for networks with Gaussian (left) and Cauchy-Gaussian (right) weights
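In last-layer Gaussian regression, the hidden layers of a pre-trained network are frozen and their outputs serve as basis functions $\phi_j$, so that $u = \sum_j w_j \phi_j$ is linear in the last-layer weights $w$. With a Gaussian prior on $w$, a linear forward operator and Gaussian noise, the posterior over $w$ is Gaussian and available in closed form. A minimal sketch, assuming the prior $w \sim N(0, s^2 I)$ and noise $N(0, \sigma^2 I)$; the feature matrix `Phi` (one column per last-hidden-layer unit, evaluated on the grid) and the other names are hypothetical.

```python
import numpy as np


def last_layer_gaussian_regression(Phi, A, y, noise_std, prior_std=1.0):
    """Posterior mean and pointwise std of u = Phi @ w for y = A u + noise, w ~ N(0, prior_std^2 I)."""
    B = A @ Phi  # forward operator acting on the last-layer weights
    precision = B.T @ B / noise_std**2 + np.eye(Phi.shape[1]) / prior_std**2
    cov_w = np.linalg.inv(precision)
    mean_w = cov_w @ (B.T @ y) / noise_std**2
    mean_u = Phi @ mean_w
    # diag(Phi cov_w Phi^T) without forming the full n x n matrix
    var_u = np.einsum("ij,jk,ik->i", Phi, cov_w, Phi)
    return mean_u, np.sqrt(np.clip(var_u, 0.0, None))
```

Only the last layer is treated as uncertain here; all other weights stay fixed at their pre-trained values, so the computation costs one small linear solve of the size of the last hidden layer.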

Figure 10.  MCMC sampling results for Problem 2. Shown are the means (top row) and standard deviations (bottom row) for neural network priors with Gaussian weights (left), Cauchy-Gaussian weights (middle), and fully Cauchy weights (right)

Figure 11.  Comparison between outputs on $[-1, 1]^2$ of Bayesian neural network priors with different widths of the last hidden layer. Shown in each column is a realization of $[80, 80, D_3]$ networks, where $D_3 = 5$, $50$, $500$ and $5000$, for Cauchy weights (top row) and Gaussian weights (bottom row). Columns use the same random seed for Cauchy and Gaussian weights
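The width comparison reflects how sums of many weights behave. If $w_1, \dots, w_D$ are i.i.d. standard Cauchy, then $\frac{1}{D}\sum_{i=1}^D w_i$ is again standard Cauchy for every $D$, whereas for standard Gaussian weights the corresponding normalization is $D^{-1/2}$. A minimal Monte Carlo sketch of these two scalings on plain i.i.d. sums (not on the network itself); it uses the fact that $|X|$ has median $1$ when $X$ is standard Cauchy.

```python
import numpy as np


def scaled_sums(D, n_draws, rng):
    """n_draws realizations of D**(-1/alpha) * (sum of D i.i.d. weights), for alpha = 1 and 2."""
    cauchy = rng.standard_cauchy((n_draws, D)).sum(axis=1) / D
    gauss = rng.standard_normal((n_draws, D)).sum(axis=1) / np.sqrt(D)
    return cauchy, gauss


rng = np.random.default_rng(0)
for D in [5, 50, 500]:
    c, g = scaled_sums(D, 4000, rng)
    # Both statistics stay near 1 for every D: each scaling removes the width dependence
    print(D, np.median(np.abs(c)), g.std())
```

For Cauchy weights the largest single term stays comparable to the whole normalized sum for every $D$, which is one way to see why heavy-tailed weights do not average out as the width grows.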

Figure 12.  Comparison between outputs on $[-1, 1]^2$ of Bayesian neural network priors with Cauchy weights and network widths $[80, 80, 1000]$. The top row shows random draws from fully connected networks and the bottom row draws from networks with $5$ blocks, i.e., only $5$ blocks along the diagonal contain nonzero weights while all other entries of the weight matrices are zero. The number of weights in the sparse network is about $20\%$ of that in the fully connected network
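A minimal sketch of a block-diagonal sparsity pattern like the one described in the caption, with a helper that counts the remaining weights. Which layers are sparsified is an assumption here: the masks are applied only to the hidden-to-hidden matrices ($80 \times 80$ and $80 \times 1000$), with a two-dimensional input and a scalar output, and biases are not counted. Under that assumption about 21% of the weights remain, close to the caption's figure; the helper names are hypothetical.

```python
import numpy as np


def block_diagonal_mask(n_in, n_out, n_blocks):
    """Boolean mask that is True only on n_blocks blocks along the diagonal."""
    mask = np.zeros((n_in, n_out), dtype=bool)
    row_groups = np.array_split(np.arange(n_in), n_blocks)
    col_groups = np.array_split(np.arange(n_out), n_blocks)
    for rows, cols in zip(row_groups, col_groups):
        mask[np.ix_(rows, cols)] = True
    return mask


def count_weights(sizes, n_blocks=None):
    """Number of nonzero weights; only hidden-to-hidden layers are made block sparse."""
    total = 0
    for i, (n_in, n_out) in enumerate(zip(sizes[:-1], sizes[1:])):
        hidden_to_hidden = 0 < i < len(sizes) - 2
        if n_blocks is not None and hidden_to_hidden:
            total += int(block_diagonal_mask(n_in, n_out, n_blocks).sum())
        else:
            total += n_in * n_out
    return total


sizes = [2, 80, 80, 1000, 1]
full, sparse = count_weights(sizes), count_weights(sizes, n_blocks=5)
print(full, sparse, round(sparse / full, 3))  # 87560 18440 0.211

# Applying the mask to a Cauchy weight matrix zeroes everything off the diagonal blocks
rng = np.random.default_rng(0)
W = rng.standard_cauchy((80, 1000)) * block_diagonal_mask(80, 1000, 5)
```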

Table 1.  Relative $L^1$-error of reconstructions obtained using the ensemble method with Gaussian, Cauchy-Gaussian and Cauchy priors in Problem 1

| Neural network prior | Gaussian | Cauchy-Gaussian | Cauchy |
|---|---|---|---|
| $\lVert \mathbb E[u] - u^{\dagger} \rVert_{L^1} / \lVert u^{\dagger} \rVert_{L^1}$ | 8.35 | 5.90 | 5.53 |
| $\mathbb E[\lVert u - u^{\dagger} \rVert_{L^1}] / \lVert u^{\dagger} \rVert_{L^1}$ | 8.44 | 5.91 | 5.56 |
| ${\rm Std}[\lVert u - u^{\dagger} \rVert_{L^1}] / \lVert u^{\dagger} \rVert_{L^1}$ | 0.10 | 0.11 | 0.13 |

Table 2.  Relative $L^1$-error of samples computed by the preconditioned Crank-Nicolson (pCN) method with Gaussian, Cauchy-Gaussian and Cauchy priors in Problem 1

| Neural network prior | Gaussian | Cauchy-Gaussian | Cauchy |
|---|---|---|---|
| $\lVert \mathbb E[u] - u^{\dagger} \rVert_{L^1} / \lVert u^{\dagger} \rVert_{L^1}$ | 8.14 | 5.66 | 4.74 |
| $\mathbb E[\lVert u - u^{\dagger} \rVert_{L^1}] / \lVert u^{\dagger} \rVert_{L^1}$ | 8.57 | 6.45 | 5.63 |
| ${\rm Std}[\lVert u - u^{\dagger} \rVert_{L^1}] / \lVert u^{\dagger} \rVert_{L^1}$ | 0.63 | 0.57 | 0.73 |
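The three rows of Tables 1 and 2 can be computed from a set of posterior samples or ensemble members as below. A minimal sketch, assuming the reconstructions and the truth $u^\dagger$ are given on a common uniform grid, so that the grid spacing cancels in every ratio and the $L^1$ norms reduce to sums of absolute values; the population (`ddof=0`) standard deviation is a choice made here. The function returns plain ratios; the tables do not state whether their entries are scaled, for example to percent.

```python
import numpy as np


def relative_l1_errors(samples, u_true):
    """Relative L1 error metrics on a uniform grid.

    samples : array of shape (n_samples, ...) of posterior samples or ensemble members
    u_true  : array of the grid shape, the true field u_dagger
    Returns (||E[u] - u_true|| / ||u_true||,
             E[||u - u_true||] / ||u_true||,
             Std[||u - u_true||] / ||u_true||), all in the L1 norm.
    """
    samples = np.asarray(samples, dtype=float)
    u_true = np.asarray(u_true, dtype=float)
    norm_true = np.abs(u_true).sum()  # grid spacing cancels in each ratio
    error_of_mean = np.abs(samples.mean(axis=0) - u_true).sum() / norm_true
    axes = tuple(range(1, samples.ndim))
    errors = np.abs(samples - u_true).sum(axis=axes) / norm_true  # one error per sample
    return error_of_mean, errors.mean(), errors.std()


e_mean, e_avg, e_std = relative_l1_errors([[1, 1, 1, 1], [3, 1, 1, 1]], [1, 1, 1, 1])
```

By Jensen's inequality the first value never exceeds the second, which is consistent with every column of both tables.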
