## Overcomplete representation in a hierarchical Bayesian framework

1 University of Bologna, Department of Mathematics, Piazza di Porta San Donato 5, Bologna, Italy

2 Case Western Reserve University, Department of Mathematics, Applied Mathematics and Statistics, 10900 Euclid Avenue, Cleveland, OH 44106, USA

*Corresponding author: Monica Pragliola

Received: September 2020; Revised: April 2021; Early access: May 2021

Fund Project: MP acknowledges the National Group for Scientific Computation (GNCS-INDAM). The work of DC was partly supported by NSF grants DMS-1522334 and DMS-1951446, and the work of ES was partly supported by NSF grant DMS-1714617.

A common task in inverse problems and imaging is finding a solution that is sparse, in the sense that most of its components vanish. In the framework of compressed sensing, general results guaranteeing exact recovery have been proven. In practice, sparse solutions are often computed by combining $\ell_1$-penalized least squares optimization with an appropriate numerical scheme. A computationally efficient alternative for finding sparse solutions to linear inverse problems is provided by Bayesian hierarchical models, in which sparsity is encoded by a conditionally Gaussian prior model whose prior variances obey a generalized gamma distribution. The iterative alternating sequential (IAS) algorithm has been shown to yield a computationally efficient scheme, and when combined with Krylov subspace iterations with an early termination condition, the approach is particularly well suited for large-scale problems. Here, the Bayesian approach to sparsity is extended to problems whose solution admits a sparse coding in an overcomplete system, such as a composite frame. It is shown that, among the multiple possible representations of the unknown, the IAS algorithm, and in particular a hybrid version of it, effectively identifies the sparsest solution. Computed examples show that the method is well suited not only for traditional imaging applications but also for dictionary learning problems in machine learning.
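The abstract describes the IAS scheme only in words, so a minimal NumPy sketch may help make its alternating structure concrete. Several assumptions are ours and do not come from the paper. We use the plain gamma hyperprior (the $r = 1$ member of the generalized gamma family) with shape $\beta > 3/2$ and scale $\vartheta$, for which the variance update has the closed form $\theta_j = \vartheta\big(\eta/2 + \sqrt{\eta^2/4 + z_j^2/(2\vartheta)}\big)$ with $\eta = \beta - 3/2$. We also solve the Gaussian step with a dense least-squares solve instead of the CGLS iterations with early termination mentioned above. The helper `composite_frame` is a hypothetical two-block frame (increments plus cosine transform) loosely mirroring the representations in the figures. All names (`ias_gamma`, `composite_frame`, `beta`, `vartheta`, `n_iter`) are illustrative.

```python
import numpy as np


def composite_frame(n):
    """Hypothetical two-block frame W = [W_1, W_2] of size n x 2n.

    W_1: cumulative sums, so alpha_1 holds the increments of x.
    W_2: inverse orthonormal DCT-II, so alpha_2 holds cosine coefficients.
    """
    W1 = np.tril(np.ones((n, n)))
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    C = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    C[0, :] = np.sqrt(1.0 / n)  # DC row of the orthonormal DCT-II
    return np.hstack([W1, C.T])  # synthesis: x = W1 @ a1 + C.T @ a2


def ias_gamma(A, b, sigma, beta=1.6, vartheta=1e-3, n_iter=50):
    """Minimal IAS sketch with a gamma hyperprior (assumption: r = 1).

    Assumed model: b = A z + e, e ~ N(0, sigma^2 I),
    z_j | theta_j ~ N(0, theta_j), theta_j ~ Gamma(beta, vartheta).
    """
    m, n = A.shape
    eta = beta - 1.5  # the closed-form update below needs eta > 0
    if eta <= 0:
        raise ValueError("beta must exceed 3/2")
    theta = np.full(n, vartheta)
    z = np.zeros(n)
    for _ in range(n_iter):
        # z-update: minimize ||A z - b||^2 / (2 sigma^2) + sum z_j^2 / (2 theta_j).
        # Substituting z = sqrt(theta) * w turns it into a stacked least-squares problem.
        d = np.sqrt(theta)
        K = np.vstack([A * d / sigma, np.eye(n)])
        rhs = np.concatenate([b / sigma, np.zeros(n)])
        w = np.linalg.lstsq(K, rhs, rcond=None)[0]
        z = d * w
        # theta-update: exact minimizer of the negative log-posterior in theta.
        theta = vartheta * (eta / 2 + np.sqrt(eta**2 / 4 + z**2 / (2 * vartheta)))
    return z, theta
```

For the overcomplete setting, one would call `ias_gamma(A @ W, b, sigma)` with `W = composite_frame(n)` and synthesize the signal as `x = W @ alpha`, so that `alpha[:n]` and `alpha[n:]` play the roles of $\alpha_1$ and $\alpha_2$. The dense solve keeps the sketch short but scales poorly with the problem size, and that is the regime where a Krylov inner solver with early stopping pays off.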

Citation: Monica Pragliola, Daniela Calvetti, Erkki Somersalo. Overcomplete representation in a hierarchical Bayesian framework. Inverse Problems & Imaging, doi: 10.3934/ipi.2021039

The generative model (left) and the blurred and noisy data vector $b \in {\mathbb R}^{46}$ (right)
Reconstruction of the signal $x$ (left) and the count of CGLS steps per outer iteration of the global hybrid IAS (right)
Vector $\alpha_i$ (left panels), corresponding scaled variances (middle panels) and contribution ${\mathsf W}_i \alpha_i$ to the signal (right panels) for $i = 1$, i.e., the representation in terms of increments (top row), and $i = 2$, i.e., the representation in terms of the cosine transform (bottom row)
Original image (left), observed data (middle) and reconstructed image (right)
Vector $\alpha_i$ (left panels), base-10 logarithmic plot of the corresponding scaled variances (middle panels) and vector ${\mathsf W}_i\alpha_i$ contributing to the final restoration (right panels) for $i = 1$, i.e., the vertical increments representation (top row), and $i = 2$, i.e., the horizontal increments representation (bottom row)
Top row: original $512\times512$ test image (left), observed data (middle) and denoised image (right). Bottom row: respective close-ups
Vector $\alpha_i$ (left panels), base-10 logarithmic plot of the corresponding scaled variances (middle panels), and vector $\mathsf{W}_i\alpha_i$ (right panels) for $i = 1$, i.e., the vertical increments representation (top row), and $i = 2$, i.e., the horizontal increments representation (bottom row)
Value distributions of the coefficients corresponding to vertical (left) and horizontal (right) increments in logarithmic scale. The blue distributions correspond to the original image, which does not admit a sparse representation in the basis, as reflected in the high percentage of non-vanishing coefficients; the red distributions correspond to the sparse denoised reconstruction, in which the percentage of coefficients above the negligible threshold value is reduced by an order of magnitude
Original image (left), observed data (middle) and reconstructed image (right)
Representation vector $\alpha_i$ (left panels), base-10 logarithmic plot of the corresponding scaled variances (middle panels) and vector ${\mathsf W}_i\alpha_i$ contributing to the final restoration (right panels) for $i = 1$ (first row), $i = 2$ (second row), $i = 3$ (third row) and $i = 4$ (fourth row)
A schematic representation of the dictionary learning example. The non-annotated digit image $b$ is approximated in terms of the annotated atoms $w_i$, and the coefficients $\alpha_i$ can then be used to identify the digit
Dictionary learning results. The first row shows the test images of the digits to be classified by the dictionary learning algorithm (vector $b$), with the true annotation indicated in the figure; the second row shows the vectors $\theta$ after the IAS iteration with the first hyperprior, and the third row after the iteration with the second hyperprior. The fourth row shows the synthesis ${\mathsf W}\alpha$ approximating the original digit, and the fifth row gives the histogram of the annotations of the atoms whose coefficients exceed the threshold $\tau = 0.01$. The annotation is done by majority vote, choosing the largest bin. In this example, the standard deviation of the noise representing the mismatch was $\sigma = 0.01$
The rows are as in Figure 12. In this example, the standard deviation of the noise representing the mismatch was $\sigma = 0.05$. Observe that the approximation becomes sparser
The rows are as in Figure 12. In this example, the standard deviation of the noise representing the mismatch was $\sigma = 0.1$. The increased sparsity comes at the cost of more misclassifications, as in the first and third columns
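The majority-vote annotation described in the dictionary learning captions can be sketched in a few lines. This is a hypothetical helper, not the authors' code. It assumes that the coefficients $\alpha_i$ of the test image $b$ have already been computed (for instance by an IAS run), that each atom $w_i$ carries an integer digit label, and that "above the threshold" means $|\alpha_i| > \tau$. The function name and its parameters are illustrative.

```python
import numpy as np


def classify_by_majority_vote(alpha, atom_labels, tau=0.01, n_classes=10):
    """Hypothetical helper: annotate a test digit from its sparse code.

    alpha       coefficients of the test image b in the annotated dictionary W
    atom_labels integer annotation (0..n_classes-1) of each atom, i.e. column of W
    tau         threshold on |alpha_i| (assumption: absolute value is used)
    """
    alpha = np.asarray(alpha, dtype=float)
    atom_labels = np.asarray(atom_labels, dtype=int)
    active = np.abs(alpha) > tau  # atoms that contribute noticeably to W @ alpha
    # Histogram of the annotations of the active atoms, one bin per class.
    counts = np.bincount(atom_labels[active], minlength=n_classes)
    if counts.sum() == 0:
        return None, counts  # no coefficient exceeds tau, so no vote is possible
    # The largest bin wins; np.argmax breaks ties in favour of the smaller label.
    return int(np.argmax(counts)), counts
```

With the coefficients `[0.5, 0.0, 0.02, 0.3, 0.005]` and atom labels `[3, 1, 3, 7, 7]`, three atoms pass the threshold (two labelled 3, one labelled 7), so the vote returns 3. How ties are broken is a design choice the captions do not specify.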
