A common task in inverse problems and imaging is finding a solution that is sparse, in the sense that most of its components vanish. In the framework of compressed sensing, general results guaranteeing exact recovery have been proven. In practice, sparse solutions are often computed combining $ \ell_1 $-penalized least squares optimization with an appropriate numerical scheme to accomplish the task. A computationally efficient alternative for finding sparse solutions to linear inverse problems is provided by Bayesian hierarchical models, in which the sparsity is encoded by defining a conditionally Gaussian prior model with the prior parameter obeying a generalized gamma distribution. An iterative alternating sequential (IAS) algorithm has been demonstrated to lead to a computationally efficient scheme, and combined with Krylov subspace iterations with an early termination condition, the approach is particularly well suited for large scale problems. Here the Bayesian approach to sparsity is extended to problems whose solution allows a sparse coding in an overcomplete system such as composite frames. It is shown that among the multiple possible representations of the unknown, the IAS algorithm, and in particular, a hybrid version of it, is effectively identifying the most sparse solution. Computed examples show that the method is particularly well suited not only for traditional imaging applications but also for dictionary learning problems in the framework of machine learning.
Citation: |
Figure 3. Vector $ \alpha_i $ (left panels), corresponding scaled variances (middle panels) and contribution of the signal $ {\mathsf W}_i \alpha_i $ (right panels) for $ i = 1 $, i.e. representation in terms of increments, (top row) and $ i = 2 $, i.e. representation in terms of cosine transform, (bottom row)
Figure 5. Vector $ \alpha_i $ (left panels), base-10 logarithmic plot of the corresponding scaled variances (middle panels) and vector $ {\mathsf W}_i\alpha_i $ contributing to the final restoration (right panels) for $ i = 1 $, i.e. vertical increments representation, (top row), $ i = 2 $, i.e. horizontal increments representation, (bottom row)
Figure 7. Vector $ \alpha_i $ (left panels), base-10 logarithmic plot of the corresponding scaled variances (middle panels), and vector $ \mathsf{W}_i\alpha_i $ (right panels) for $ i = 1 $, i.e. vertical increments representation, (top row) and $ i = 2 $, horizontal increments representation, (bottom row)
Figure 8. Value distributions of the coefficients corresponding to vertical (left) and horizontal (right) increments in logarithmic scale. The blue distributions correspond to the original image which does not allow sparse representation in the basis, which is reflected in the high percentage of non-vanishing coefficients, while the red distribution is the sparse denoised reconstruction, in which the percentage of coefficients above the negligible threshold value is reduced by an order of magnitude
Figure 10. Representation vector $ \alpha_i $ (left panels), base-10 logarithmic plot of the corresponding scaled variances (middle panels) and vector $ {\mathsf W}_i\alpha_i $ contributing to the final restoration (right panels) for $ i = 1 $ (first row), $ i = 2 $ (second row), $ i = 3 $ (third row) and $ i = 4 $ (fourth row)
Figure 12. Dictionary learning results. The first row shows the test images of the digits to be classified by the dictionary learning algorithm (vector $ b $), the true annotation indicated in the figure, the second row the vectors $ \theta $ after the IAS iteration with the first hyperprior, and the third row after the iteration with the second hyperprior. The fourth row represents the synthesis $ {\mathsf W}\alpha $ approximating the original digit, and finally, the fifth row gives the histogram of the annotations of the atoms corresponding to coefficients above a threshold $ \tau = 0.01 $. The annotation is done by majority vote, choosing the largest of the bins. In this example, the standard deviation of the noise representing the mismatch was $ \sigma = 0.01 $
[1] | J. M. Bardsley, D. Calvetti and E. Somersalo, Hierarchical regularization for edge-preserving reconstruction of PET images, Inverse Problems, 26 (2010), 035010. doi: 10.1088/0266-5611/26/3/035010. |
[2] | S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends in Machine Learning, 3 (2011), 1-122. doi: 10.1561/2200000016. |
[3] | A. M. Bruckstein, D. L. Donoho and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review, 51 (2009), 34-81. doi: 10.1137/060657704. |
[4] | D. Calvetti, H. Hakula, S. Pursiainen and E. Somersalo, Conditionally Gaussian hypermodels for cerebral source localization, SIAM Journal on Imaging Sciences, 2 (2009), 879-909. doi: 10.1137/080723995. |
[5] | D. Calvetti, F. Pitolli, J. Prezioso, E. Somersalo and B. Vantaggi, Priorconditioned CGLS-based quasi-MAP estimate, statistical stopping rule, and ranking of priors, SIAM Journal of Scientific Computing, 39 (2017), S477–S500. doi: 10.1137/16M108272X. |
[6] | D. Calvetti, A. Pascarella, F. Pitolli, E. Somersalo and B. Vantaggi, Brain activity mapping from MEG data via a hierarchical Bayesian algorithm with automatic depth weighting, Brain Topography, 32 (2019), 363-393. doi: 10.1007/s10548-018-0670-7. |
[7] | D. Calvetti, F. Pitolli, E. Somersalo and B. Vantaggi, Bayes meets Krylov: Statistically inspired preconditioners for CGLS, SIAM Review, 60 (2018), 429-461. doi: 10.1137/15M1055061. |
[8] | D. Calvetti, M. Pragliola and E. Somersalo, Sparsity promoting hybrid solvers for hierarchical Bayesian inverse problems, SIAM Journal on Scientific Computing 42 (2020), A3761–A3784. doi: 10.1137/20M1326246. |
[9] | D. Calvetti, M. Pragliola, E. Somersalo and A. Strang, Sparse reconstructions from few noisy data: Analysis of hierarchical Bayesian models with generalized gamma hyperpriors, Inverse Problems, 36 (2020), 025010. doi: 10.1088/1361-6420/ab4d92. |
[10] | D. Calvetti, E. Somersalo and A. Strang, Hierachical Bayesian models and sparsity: $\ell_2$-magic, Inverse Problems, 35 (2019), 035003. doi: 10.1088/1361-6420/aaf5ab. |
[11] | E. J. Candes, J. Romberg and T. Tao, Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory, 52 (2006), 489-509. doi: 10.1109/TIT.2005.862083. |
[12] | E. J. Candes, Y. C. Eldar, D. Needell and P. Randall, Compressed sensing with coherent and redundant dictionaries, Applied and Computational Harmonic Analysis, 31 (2011), 59-73. doi: 10.1016/j.acha.2010.10.002. |
[13] | A. Chambolle, M. Holler and T. Pock, A convex variational model for learning convolutional image atoms from incomplete data, Journal of Mathematical Imaging and Vision, 62 (2020), 417-444. doi: 10.1007/s10851-019-00919-7. |
[14] | G. Chen and D. Needell, Compressed sensing and dictionary learning, Finite Frame Theory, Proceedings of Symposia in Applied Mathematics, Vol. 73, Providence, RI, (2016), 201–241. |
[15] | S. S. Chen, D. L. Donoho and and M. A. Saunders, Atomic decomposition by basis pursuit, SIAM Journal on Scientific Computing, 20 (1998), 33-61. doi: 10.1137/S1064827596304010. |
[16] | D. Ma, V. Gulani, N. Seiberlich, K. Liu, J. L. Sunshine, J. L. Duerk and M. A. Griswold, Magnetic resonance fingerprinting, Nature, 495 (2013), 187-192. doi: 10.1038/nature11971. |
[17] | S. G. Mallat and Z. Zhang, Matching pursuits with time-frequency dictionaries, IEEE Transactions on Signal Processing, 41 (1993), 3397-3415. |
[18] | R. Rubinstein, A. M. Bruckstein and M. Elad, Dictionaries for sparse representation modeling, Proceedings of the IEEE, 98 (2010), 1045-1057. doi: 10.1109/JPROC.2010.2040551. |
[19] | J. Starck, J. Fadili and F. J. Murtagh, The undecimated wavelet decomposition and its reconstruction, IEEE Transactions on Image Processing, 16 (2007), 297-309. doi: 10.1109/TIP.2006.887733. |
[20] | J. L. Starck, M. Elad and D. Donoho, Redundant multiscale transforms and their application for morphological component separation, Advances in Imaging and Electron Physics, 132 (2004), 287-348. doi: 10.1016/S1076-5670(04)32006-9. |
[21] | A. F. Vidal, V. De Bortoli, M. Pereyra and A. Durmus, Maximum likelihood estimation of regularization parameters in high-dimensional inverse problems: An empirical Bayesian approach Part Ⅰ: Methodology and experiments, SIAM Journal on Imaging Sciences, 13 (2020), 1945-1989. doi: 10.1137/20M1339829. |
The generative model (left) and the blurred and noisy data vector
Reconstruction of the signal
Vector
Original image (left), observed data (middle) and reconstructed image (right)
Vector
Top row: original
Vector
Value distributions of the coefficients corresponding to vertical (left) and horizontal (right) increments in logarithmic scale. The blue distributions correspond to the original image which does not allow sparse representation in the basis, which is reflected in the high percentage of non-vanishing coefficients, while the red distribution is the sparse denoised reconstruction, in which the percentage of coefficients above the negligible threshold value is reduced by an order of magnitude
Original image (left), observed data (middle) and reconstructed image (right)
Representation vector
A schematic representation of the dictionary learning example. The digit on the right is the non-annotated image
Dictionary learning results. The first row shows the test images of the digits to be classified by the dictionary learning algorithm (vector
The rows are as in Figure 12. In this example, the standard deviation of the noise representing the mismatch was
The rows are as in Figure 12. In this example, the standard deviation of the noise representing the mismatch was