October 2022, 16(5): 1229-1254. doi: 10.3934/ipi.2022022

## Bayesian neural network priors for edge-preserving inversion

Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA

Received December 2021; revised March 2022; early access April 2022; published October 2022

Fund Project: Partially supported by the US National Science Foundation under grants #1723211 and #1913129

We consider Bayesian inverse problems wherein the unknown state is assumed to be a function with discontinuous structure a priori. A class of prior distributions based on the output of neural networks with heavy-tailed weights is introduced, motivated by existing results concerning the infinite-width limit of such networks. We show theoretically that samples from such priors have desirable discontinuous-like properties even when the network width is finite, making them appropriate for edge-preserving inversion. Numerically, we consider deconvolution problems defined on one- and two-dimensional spatial domains to illustrate the effectiveness of these priors; MAP estimation, dimension-robust MCMC sampling, and ensemble-based approximations are used to probe the posterior distribution. Point estimates are shown to be more accurate than those obtained from non-heavy-tailed priors, and the uncertainty estimates are shown to provide more useful qualitative information.
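
The prior construction described in the abstract can be illustrated with a short sampler. The sketch below draws one function from a fully connected $\tanh$ network with i.i.d. Cauchy or Gaussian weights. The layer widths, the unit scale of the weight distributions, and the $N^{-1/\alpha}$ normalization of incoming weights ($\alpha = 1$ for Cauchy, $\alpha = 2$ for Gaussian, the usual scaling for stable infinite-width limits) are assumptions made for illustration, not necessarily the paper's exact parametrization.

```python
import numpy as np

def sample_bnn_prior(x, widths, weight_dist="cauchy", seed=None):
    """Draw one realization u(x) of a tanh network with i.i.d. random weights.

    x           -- evaluation points, shape (n_points, d_in)
    widths      -- hidden-layer widths, e.g. [80, 80, 500] (illustrative choice)
    weight_dist -- "cauchy" (heavy-tailed, alpha = 1) or "gaussian" (alpha = 2)
    """
    rng = np.random.default_rng(seed)
    if weight_dist == "cauchy":
        draw, alpha = rng.standard_cauchy, 1.0
    elif weight_dist == "gaussian":
        draw, alpha = rng.standard_normal, 2.0
    else:
        raise ValueError("weight_dist must be 'cauchy' or 'gaussian'")

    h = np.asarray(x, dtype=float)
    for width in widths:
        fan_in = h.shape[1]
        # Assumed stable-limit scaling: incoming weights divided by fan_in^(1/alpha).
        W = draw((fan_in, width)) / fan_in ** (1.0 / alpha)
        b = draw(width)
        h = np.tanh(h @ W + b)

    # Linear output layer producing a scalar field.
    fan_in = h.shape[1]
    w_out = draw((fan_in, 1)) / fan_in ** (1.0 / alpha)
    return (h @ w_out).ravel()

# Usage: one realization of each prior on [-1, 1] with the same seed.
x = np.linspace(-1.0, 1.0, 400)[:, None]
u_cauchy = sample_bnn_prior(x, [80, 80, 500], "cauchy", seed=0)
u_gauss = sample_bnn_prior(x, [80, 80, 500], "gaussian", seed=0)
```

An occasional very large first-layer Cauchy weight turns its $\tanh$ unit into a near step function, which offers one intuitive reading of the jump-like features of the heavy-tailed realizations; Gaussian weights rarely produce such steep units.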

Citation: Chen Li, Matthew Dunlop, Georg Stadler. Bayesian neural network priors for edge-preserving inversion. Inverse Problems and Imaging, 2022, 16(5): 1229-1254. doi: 10.3934/ipi.2022022
Comparison of outputs on $[-1, 1]^2$ of Bayesian neural network priors with three hidden layers. Shown are realizations of networks with $\tanh(\cdot)$ activation and Cauchy weights (top) or Gaussian weights (bottom).
Realizations on the interval $[-1, 1]$ of the neural network with two different weight distributions: Cauchy weights (a) and Gaussian weights (b).
Setup for Problem 1. Shown in (a) are the truth model and the synthetic observations. For reference, (b) shows the result of Gaussian process regression with the covariance operator $0.25(I - 10 \Delta)^{-2}$.
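
The Gaussian process reference requires a discretization of the covariance operator $0.25(I - 10\Delta)^{-2}$. A minimal sketch follows, assuming a uniform cell-centred grid on $[-1, 1]$, a finite-difference Laplacian with homogeneous Neumann boundary conditions, a zero prior mean, a hypothetical Gaussian blurring matrix `A`, and data $y = Au + \eta$ with $\eta \sim N(0, \sigma^2 I)$; none of these choices is stated in the caption. The posterior mean is the standard linear-Gaussian formula $C A^\top (A C A^\top + \sigma^2 I)^{-1} y$.

```python
import numpy as np

def laplacian_neumann(n, h):
    """Cell-centred finite-difference Laplacian with homogeneous Neumann BCs (assumed)."""
    L = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = -1.0          # zero-flux ghost cells at x = -1 and x = 1
    return L / h**2

def gp_posterior(A, y, sigma, h, a=0.25, ell2=10.0):
    """Posterior mean and pointwise std for u ~ N(0, a (I - ell2 * Laplacian)^{-2}),
    observed through y = A u + noise with noise ~ N(0, sigma^2 I)."""
    n = A.shape[1]
    M = np.eye(n) - ell2 * laplacian_neumann(n, h)
    M_inv = np.linalg.inv(M)
    # M^{-2} approximates h times the covariance kernel, so divide by h for nodal values.
    C = a * (M_inv @ M_inv) / h
    S = A @ C @ A.T + sigma**2 * np.eye(A.shape[0])
    K = np.linalg.solve(S, A @ C).T     # gain C A^T S^{-1} (C and S are symmetric)
    mean = K @ y
    cov = C - K @ A @ C
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Hypothetical setup: a step function blurred by a Gaussian kernel plus noise.
n = 200
h = 2.0 / n
x = -1.0 + (np.arange(n) + 0.5) * h      # cell centres on [-1, 1]
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2))
A /= A.sum(axis=1, keepdims=True)       # row-normalised blurring matrix
u_true = np.where(np.abs(x) < 0.4, 1.0, 0.0)
sigma = 0.01
rng = np.random.default_rng(0)
y = A @ u_true + sigma * rng.standard_normal(n)
mean, std = gp_posterior(A, y, sigma, h)
```

Because the prior is Gaussian and the forward map is linear, the reconstruction is smooth by construction, which is why such a reference cannot recover sharp edges.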
Reconstructions for Problem 1 obtained by optimization from different initializations, using Gaussian (a), Cauchy-Gaussian (b), and fully Cauchy (c) weights.
Shown are the means and pointwise standard deviations obtained with the ensemble method (a) and the last-layer Gaussian regression method (b). Both results are for Cauchy-Gaussian priors for Problem 1
Posterior samples (a, b, c) and uncertainty (d, e, f) for Problem 1 under different neural network priors: Gaussian (a, d), Cauchy-Gaussian (b, e), and Cauchy (c, f).
The first row shows the truth $u^\dagger$ and the blurred models used for Problem 2. The remaining rows show minimizers obtained with the optimization method from different initializations, for Gaussian weights (second row), Cauchy-Gaussian weights (third row), and Cauchy weights (fourth row).
Results obtained with ensemble method for Problem 2. Shown are the ensemble means (top row) and standard deviation (bottom row) obtained with Gaussian weights (left), Cauchy-Gaussian weights (middle), and fully Cauchy weights (right)
Results of last-layer Gaussian regression for Problem 2. Shown are the standard deviations obtained using the last-layer basis functions of a pre-trained network with Gaussian (left) and Cauchy-Gaussian (right) weights.
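
Last-layer Gaussian regression can be sketched as a linear-Gaussian problem, under our own assumptions: the hidden layers of a pre-trained network are frozen, so the reconstruction is $u = F w$ with a fixed matrix $F$ of last-layer basis functions evaluated on the grid; the output weights receive an isotropic prior $w \sim N(0, s^2 I)$; and the data are $y = A u + \eta$ with $\eta \sim N(0, \sigma^2 I)$. The posterior of $w$, and hence the pointwise standard deviation of $u$, is then available in closed form. The names `F`, `A`, `s`, `sigma` and the random stand-in features are hypothetical.

```python
import numpy as np

def last_layer_posterior(F, A, y, sigma, s=1.0):
    """Closed-form Gaussian posterior for u = F @ w with prior w ~ N(0, s^2 I)
    and data y = A @ u + noise, noise ~ N(0, sigma^2 I).

    F -- (n_points, k) last-layer basis functions of a frozen network on the grid
    """
    H = A @ F                                   # forward map applied to each basis function
    k = F.shape[1]
    precision = H.T @ H / sigma**2 + np.eye(k) / s**2
    Sigma = np.linalg.inv(precision)            # posterior covariance of w
    w_mean = Sigma @ (H.T @ y) / sigma**2
    # Pointwise variance of u: diagonal of F Sigma F^T, without forming the full matrix.
    u_var = np.einsum("ij,jk,ik->i", F, Sigma, F)
    return F @ w_mean, np.sqrt(np.clip(u_var, 0.0, None))

# Stand-in for a pre-trained network: random tanh features (hypothetical).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 150)
F = np.tanh(np.outer(x, 5.0 * rng.standard_normal(40)) + rng.standard_normal(40))
A = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05**2))
A /= A.sum(axis=1, keepdims=True)               # row-normalised blurring matrix
sigma = 0.01
y = A @ np.sign(x) + sigma * rng.standard_normal(x.size)
u_mean, u_std = last_layer_posterior(F, A, y, sigma)
```

Since the posterior precision is at least $I/s^2$, the posterior standard deviation of $u$ never exceeds its prior value $s\,(\sum_j F_{ij}^2)^{1/2}$ at any grid point.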
MCMC sampling results for Problem 2. Shown are the means (top row) and standard deviations (bottom row) for neural network priors with Gaussian weights (left), Cauchy-Gaussian weights (middle), and fully Cauchy weights (right)
Comparison of outputs on $[-1, 1]^2$ of Bayesian neural network priors with different widths of the last hidden layer. Each column shows a realization of a $[80, 80, D_3]$ network, where $D_3 = 5, 50, 500$, and $5000$, for Cauchy weights (top row) and Gaussian weights (bottom row). Each column uses the same random seed for the Cauchy and Gaussian weights.
Comparison of outputs on $[-1, 1]^2$ of Bayesian neural network priors with Cauchy weights and network widths $[80, 80, 1000]$. The top row shows random draws from networks with full weights and the bottom row from networks with $5$ blocks, i.e., only $5$ blocks along the diagonal contain nonzero weights, while the remaining entries of the weight matrices are zero. The number of weights in the sparse network is about $20\%$ of the number of weights in the fully connected network.
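
The "about $20\%$" figure can be checked with a quick count. A block-diagonal matrix with $k$ equal blocks keeps exactly $1/k$ of the entries of the dense matrix. The sketch assumes a two-dimensional input, a scalar output, biases excluded from the count, and the 5-block structure applied only to the two hidden-to-hidden matrices ($80 \times 80$ and $80 \times 1000$); the caption does not state which matrices are blocked, so these are assumptions.

```python
def weight_counts(d_in, widths, d_out, n_blocks):
    """Count weights (biases excluded) in a dense network and its block-diagonal variant.

    Assumption: only hidden-to-hidden matrices are block-diagonal with n_blocks equal blocks.
    """
    dims = [d_in] + list(widths) + [d_out]
    dense = sparse = 0
    for i, (m, n) in enumerate(zip(dims[:-1], dims[1:])):
        dense += m * n
        if 0 < i < len(dims) - 2:          # hidden-to-hidden matrix
            assert m % n_blocks == 0 and n % n_blocks == 0
            sparse += n_blocks * (m // n_blocks) * (n // n_blocks)
        else:                               # input and output layers stay dense
            sparse += m * n
    return dense, sparse

dense, sparse = weight_counts(d_in=2, widths=[80, 80, 1000], d_out=1, n_blocks=5)
print(dense, sparse, round(sparse / dense, 3))  # 87560 18440 0.211
```

Under these assumptions the sparse network keeps about $21\%$ of the weights, consistent with the caption; the dense input and output layers are why the ratio sits slightly above $1/5$.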
Relative $L^1$-error of reconstructions obtained using the ensemble method with Gaussian, Cauchy-Gaussian and Cauchy priors in Problem 1

| Regularizations | Gaussian | Cauchy-Gaussian | Cauchy |
|---|---|---|---|
| $\lVert \mathbb E[u] - u^{\dagger}\rVert_{L^1}/\lVert u^{\dagger}\rVert_{L^1}$ | 8.35 | 5.90 | 5.53 |
| $\mathbb E[\lVert u - u^{\dagger}\rVert_{L^1}]/\lVert u^{\dagger}\rVert_{L^1}$ | 8.44 | 5.91 | 5.56 |
| ${\rm Std}[\lVert u - u^{\dagger}\rVert_{L^1}]/\lVert u^{\dagger}\rVert_{L^1}$ | 0.10 | 0.11 | 0.13 |

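The three error statistics reported in the tables of this section can be computed from a set of reconstructions (ensemble members or MCMC samples) as sketched below. We assume all reconstructions live on a uniform grid, so the cell volume cancels in each ratio, and we use the population standard deviation (`ddof=0`), which may differ from the authors' convention.

```python
import numpy as np

def relative_l1_metrics(samples, truth):
    """Relative L1 error statistics for reconstructions on a uniform grid.

    samples -- array (n_samples, n_points): ensemble members or MCMC samples
    truth   -- array (n_points,): the true state u_dagger on the same grid
    Returns (error of the mean, mean error, std of the error), each divided by ||u_dagger||_1.
    """
    samples = np.asarray(samples, dtype=float)
    truth = np.asarray(truth, dtype=float)

    def l1(v):
        # On a uniform grid the cell volume cancels in every ratio, so it is omitted.
        return np.sum(np.abs(v), axis=-1)

    norm_truth = l1(truth)
    error_of_mean = l1(samples.mean(axis=0) - truth) / norm_truth
    per_sample = l1(samples - truth) / norm_truth
    return error_of_mean, per_sample.mean(), per_sample.std()  # std uses ddof=0
```

By the triangle inequality, $\lVert \mathbb E[u] - u^\dagger\rVert_{L^1} \le \mathbb E[\lVert u - u^\dagger\rVert_{L^1}]$, so the first statistic can never exceed the second; every column of the tables reported here satisfies this.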
Relative $L^1$-error of samples computed by pCN with Gaussian, Cauchy-Gaussian and Cauchy priors in Problem 1

| Neural network prior | Gaussian | Cauchy-Gaussian | Cauchy |
|---|---|---|---|
| $\lVert \mathbb E[u] - u^{\dagger}\rVert_{L^1}/\lVert u^{\dagger}\rVert_{L^1}$ | 8.14 | 5.66 | 4.74 |
| $\mathbb E[\lVert u - u^{\dagger}\rVert_{L^1}]/\lVert u^{\dagger}\rVert_{L^1}$ | 8.57 | 6.45 | 5.63 |
| ${\rm Std}[\lVert u - u^{\dagger}\rVert_{L^1}]/\lVert u^{\dagger}\rVert_{L^1}$ | 0.63 | 0.57 | 0.73 |

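The samples summarized above were computed with pCN, a dimension-robust MCMC method that requires a Gaussian reference measure. One common device for non-Gaussian weights, assumed here rather than taken from the paper, is to write each Cauchy weight as a deterministic transform of Gaussian variables (the ratio $z_1/z_2$ of two independent standard normals is standard Cauchy) and run pCN on the Gaussian latent vector $\xi$. The proposal $\xi' = \sqrt{1-\beta^2}\,\xi + \beta\,\zeta$ with $\zeta \sim N(0, I)$ is accepted with probability $\min\{1, \exp(\Phi(\xi) - \Phi(\xi'))\}$, where $\Phi$ is the data misfit. The one-hidden-layer network, the moving-average blur, the noise level, and the step size `beta` below are hypothetical.

```python
import numpy as np

def cauchy_from_gaussian(xi):
    """Map a latent vector of 2m standard normals to m standard Cauchy weights."""
    z1, z2 = np.split(xi, 2)
    return z1 / z2  # ratio of independent N(0, 1) variables is standard Cauchy

def network(xi, x, width):
    """Hypothetical one-hidden-layer tanh network with Cauchy weights built from xi."""
    a, b, c = np.split(cauchy_from_gaussian(xi), 3)
    hidden = np.tanh(np.outer(x, a) + b)
    return hidden @ c / width  # 1/width scaling for alpha = 1 (assumed)

def pcn(misfit, dim, n_steps, beta=0.1, seed=0):
    """Preconditioned Crank-Nicolson MCMC with an N(0, I) reference measure."""
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal(dim)
    phi = misfit(xi)
    chain, n_accept = np.empty((n_steps, dim)), 0
    for k in range(n_steps):
        proposal = np.sqrt(1.0 - beta**2) * xi + beta * rng.standard_normal(dim)
        phi_prop = misfit(proposal)
        # The prior terms cancel for pCN, so acceptance depends only on the misfit difference.
        if np.log(rng.uniform()) < phi - phi_prop:
            xi, phi = proposal, phi_prop
            n_accept += 1
        chain[k] = xi
    return chain, n_accept / n_steps

# Hypothetical 1D deconvolution: blurred step function with Gaussian noise.
width = 20
x = np.linspace(-1.0, 1.0, 100)
kernel = np.ones(9) / 9.0                      # moving-average blur
sigma = 0.05
rng = np.random.default_rng(1)
y = np.convolve(np.where(x > 0.0, 1.0, -1.0), kernel, mode="same") + sigma * rng.standard_normal(x.size)

def misfit(xi):
    r = np.convolve(network(xi, x, width), kernel, mode="same") - y
    return 0.5 * np.dot(r, r) / sigma**2

chain, acc = pcn(misfit, dim=6 * width, n_steps=2000, beta=0.05)
u_mean = np.mean([network(xi, x, width) for xi in chain[1000:]], axis=0)
```

Because the proposal preserves the $N(0, I)$ reference measure exactly, the acceptance rate does not degenerate as the number of latent variables grows, which is the sense in which pCN is dimension-robust.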
