Variational regularization methods are commonly used to approximate solutions of inverse problems. In recent years, model-based variational regularization methods have often been replaced with data-driven ones such as the fields-of-experts model [32]. Training the parameters of such data-driven methods can be formulated as a bilevel optimization problem. In this paper, we compare the framework of bilevel learning for the training of data-driven variational regularization models with the novel framework of deep equilibrium models [3], which has recently been introduced in the context of inverse problems [13]. We show that solving the lower-level optimization problem within the bilevel formulation with a fixed-point iteration is a special case of the deep equilibrium framework. We compare both approaches computationally on a variety of numerical examples for the inverse problems of denoising, inpainting, and deconvolution.
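The fixed-point view mentioned in the abstract can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a 1-D quadratic (Tikhonov-type) denoising energy $ E(u) = \frac{1}{2}\|u - f\|^2 + \frac{\gamma}{2}\|Du\|^2 $ with a forward-difference matrix $ D $, solved by gradient descent with step size $ \tau $. The function names `forward_difference`, `energy_gradient`, and `fixed_point_solve` are hypothetical. The lower-level minimizer is then a fixed point of the map $ T(u) = u - \tau \nabla E(u) $; a deep equilibrium model likewise defines its output as the fixed point of a map, but allows that map to be a more general learned one.

```python
import numpy as np


def forward_difference(n):
    """Return the (n-1) x n forward-difference matrix D, (D u)[i] = u[i+1] - u[i]."""
    D = np.zeros((n - 1, n))
    idx = np.arange(n - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D


def energy_gradient(u, f, D, gamma):
    """Gradient of E(u) = 0.5*||u - f||^2 + 0.5*gamma*||D u||^2."""
    return (u - f) + gamma * (D.T @ (D @ u))


def fixed_point_solve(f, D, gamma, tau, tol=1e-10, max_iter=10_000):
    """Iterate u <- T(u) = u - tau * grad E(u) until successive iterates agree."""
    u = f.copy()
    for k in range(max_iter):
        u_next = u - tau * energy_gradient(u, f, D, gamma)
        if np.linalg.norm(u_next - u) <= tol:
            return u_next, k + 1
        u = u_next
    return u, max_iter


# Synthetic 1-D signal with additive Gaussian noise (illustrative data only).
rng = np.random.default_rng(0)
n = 64
clean = np.sin(np.linspace(0.0, 2.0 * np.pi, n))
f = clean + 0.1 * rng.standard_normal(n)

D = forward_difference(n)
# Step size must satisfy tau < 2 / (1 + gamma * ||D||^2); here ||D||^2 <= 4.
gamma, tau = 1.0, 0.2
u_star, iters = fixed_point_solve(f, D, gamma, tau)

# For this quadratic energy the minimizer solves (I + gamma * D^T D) u = f,
# which gives an independent check of the fixed point.
u_direct = np.linalg.solve(np.eye(n) + gamma * (D.T @ D), f)
print(iters, np.max(np.abs(u_star - u_direct)))
```

In this toy setting the fixed point can be verified against a direct linear solve. In bilevel learning, parameters such as $ \gamma $ or the filters inside the regularizer would be trained through this lower-level solution rather than fixed by hand as they are here.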
Figure 1. Comparison between bilevel optimization and deep equilibrium models for each of the three inverse problems considered (denoising, inpainting, and deblurring) over the full range of parameters tested. The boxplots show the loss of the trained models evaluated on the test dataset. We removed all configurations with a final loss larger than $ 0.5 $, a threshold we chose heuristically by looking for an empirical relation between the loss and the image quality.
Figure 2. Denoising the MNIST dataset. Visual comparison between the bilevel method (left) and the deep equilibrium model (right), with parameters $ \tau = 0.5 $, $ \gamma = 0.1 $, and $ \sigma = \text{ReLU} $. Images are taken from the test dataset. The first row shows the original images; the second row shows the model input. The last row shows the output of the trained models.
Figure 3. Inpainting MNIST. Comparison between the bilevel method (left) and the deep equilibrium model (right), with parameters $ \tau = 0.5 $, $ \gamma = 1.0 $, and $ \sigma = \text{Softshrink} $. Images are taken from the test dataset. The first row shows the original images; the second row shows the masked images, i.e., the input of the algorithm. The third row shows the result of applying the inpainting operator to the output, and the fourth row shows the output of the trained models. Ideally, the difference between the second and third rows should be small.
Figure 4. Deblurring MNIST. Comparison between the bilevel method (left) and the deep equilibrium model (right), with parameters $ \tau = 0.5 $, $ \gamma = 0.5 $, and $ \sigma = \text{Softshrink} $. Images are taken from the test dataset. The first row shows the original images; the second row shows the model input. The third row shows the model output after the convolution kernel is applied to it, and the last row shows the output of the trained models. Ideally, the difference between the second and third rows should be small.
Figure 5. Comparison of the test loss evaluated after each training epoch, for increasing noise levels in training (noise levels from top to bottom row: $ 0, 0.05, 0.1, 0.5, 1 $). Simulations are grouped by task: denoising (left column), inpainting (center column), and deblurring (right column). Each plot shows the simulation with the configuration that achieves the lowest final test loss.
Figure 7. Reconstruction for the inpainting task with a bilevel optimization model whose parameters have been trained by minimizing the error of the reconstruction $ u^* $ with respect to the true image $ u^\dagger $ (naïve approach). We show how the reconstruction $ \{u^k\} $ changes with the iteration index $ k $. The model trained with the naïve approach is not able to inpaint the masked area.
Figure 8. Denoising CelebA; sample from the test dataset. The first row contains the original image $ u $ and the noisy image $ f^\delta $. From left to right in the second and third rows: reconstructed image with randomly initialized kernels (left), with parameters learned using bilevel learning (center), and with parameters learned using the DEQ model (right).
Figure 10. Comparison of $ 11\times 11 $ kernels of $ A $ (first two columns on the left) and of $ C^\top $ (two columns on the right), shown before and after training on the denoising task using the DEQ model. Note that, in the first row, the kernels in the first and third columns coincide, as do those in the second and fourth columns; this is because we initialize the kernels so that $ C^\top = A^\top $ before training. After training, they are different (second row).
Figure 12. Comparison of $ 3\times 3 $ kernels of $ A $ (first five columns on the left) and of $ C^\top $ (five columns on the right), shown before and after training on the denoising task using the DEQ model. Note that, in the first row, the kernels are pairwise equal; this is because we initialize the kernels so that $ C^\top = A^\top $ before training. After training, they are different (second row).
Figure 13. Deblurring CelebA; sample from the test dataset. The first row contains the original image $ u $ and the noisy blurred image $ f^\delta $. From left to right in the second and third rows: reconstructed image with the optimal kernels found for the denoising task in the bilevel scenario (left), with parameters learned using bilevel learning (center-left), with parameters learned using the DEQ model (center-right), and with the optimal kernels found for the denoising task in the DEQ scenario (right).
Figure 15. Comparison of $ 11\times 11 $ kernels of $ A $ (first two columns on the left) and of $ C^\top $ (two columns on the right), shown before and after training on the deblurring task using the DEQ model. Note that, in the first row, the kernels in the first and third columns coincide, as do those in the second and fourth columns; this is because we initialize the kernels so that $ C^\top = A^\top $ before training. After training, they are different (second row).
Figure 17. Comparison of $ 3\times 3 $ kernels of $ A $ (first five columns on the left) and of $ C^\top $ (five columns on the right), shown before and after training on the deblurring task using the DEQ model. Note that, in the first row, the kernels are pairwise equal; this is because we initialize the kernels so that $ C^\top = A^\top $ before training.
[1] | J. Adler and O. Öktem, Solving ill-posed inverse problems using iterative deep neural networks, Inverse Problems, 33 (2017), 124007. doi: 10.1088/1361-6420/aa9581. |
[2] | J. Adler and O. Öktem, Learned primal-dual reconstruction, IEEE Transactions on Medical Imaging, 37 (2018), 1322-1332. |
[3] | S. Bai, J. Z. Kolter and V. Koltun, Deep Equilibrium Models, in Advances in Neural Information Processing Systems 32, 2019. |
[4] | H. H. Bauschke, P. L. Combettes et al., Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Vol. 408, Springer, 2011. doi: 10.1007/978-1-4419-9467-7. |
[5] | A. Beck, First-Order Methods in Optimization, SIAM, 2017. doi: 10.1137/1.9781611974997.ch1. |
[6] | M. Benning and M. Burger, Modern regularization methods for inverse problems, Acta Numerica, 27 (2018), 1-111. doi: 10.1017/s0962492918000016. |
[7] | Y. Chen, R. Ranftl and T. Pock, Insights into analysis operator learning: From patch-based sparse models to higher order MRFs, IEEE Transactions on Image Processing, 23 (2014), 1060-1072. doi: 10.1109/TIP.2014.2299065. |
[8] | P. L. Combettes and J.-C. Pesquet, Fixed point strategies in data science, IEEE Transactions on Signal Processing, 69 (2021), 3878-3905. doi: 10.1109/TSP.2021.3069677. |
[9] | C. Crockett and J. A. Fessler, Bilevel methods for image reconstruction, URL http://arXiv.org/abs/2109.09610, 2021. doi: 10.1561/2000000111. |
[10] | J. C. De los Reyes, C.-B. Schönlieb and T. Valkonen, The structure of optimal parameters for image restoration problems, Journal of Mathematical Analysis and Applications, 434 (2016), 464-500. doi: 10.1016/j.jmaa.2015.09.023. |
[11] | M. J. Ehrhardt and L. Roberts, Inexact derivative-free optimization for bilevel learning, Journal of Mathematical Imaging and Vision, 63 (2021), 580-600. doi: 10.1007/s10851-021-01020-8. |
[12] | H. W. Engl, M. Hanke and A. Neubauer, Regularization of inverse problems, vol. 375, Springer Science & Business Media, 1996. |
[13] | D. Gilton, G. Ongie and R. Willett, Deep equilibrium architectures for inverse problems in imaging, IEEE Transactions on Computational Imaging. doi: 10.1109/tci.2021.3118944. |
[14] | K. Gregor and Y. LeCun, Learning fast approximations of sparse coding, in 27th International Conference on Machine Learning, 2010, 399-406. |
[15] | H. Heaton, S. Wu Fung, A. Gibali and W. Yin, Feasibility-based fixed point networks, Fixed Point Theory and Algorithms for Sciences and Engineering, 2021 (2021), 1-19. doi: 10.1186/s13663-021-00706-3. |
[16] | K. H. Jin, M. T. McCann, E. Froustey and M. Unser, Deep convolutional neural network for inverse problems in imaging, IEEE Transactions on Image Processing, 26 (2017), 4509-4522. doi: 10.1109/TIP.2017.2713099. |
[17] | T. King, S. Butcher and L. Zalewski, Apocrita - High Performance Computing Cluster for Queen Mary University of London, 2017, URL https://doi.org/10.5281/zenodo.438045. |
[18] | D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv: 1412.6980. |
[19] | E. Kobler, A. Effland, K. Kunisch and T. Pock, Total deep variation: A stable regularizer for inverse problems, accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence. |
[20] | E. Kobler, T. Klatzer, K. Hammernik and T. Pock, Variational networks: Connecting variational methods and deep learning, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 10496 LNCS (2017), 281-293. doi: 10.1007/978-3-319-66709-6. |
[21] | K. Kunisch and T. Pock, A bilevel optimization approach for parameter learning in variational models, SIAM Journal on Imaging Sciences, 6 (2013), 938-983. doi: 10.1137/120882706. |
[22] | Y. LeCun, C. Cortes and C. J. Burges, MNIST handwritten digit database, 1998, URL http://yann.lecun.com/exdb/mnist/. |
[23] | Z. Liu, P. Luo, X. Wang and X. Tang, Deep learning face attributes in the wild, in Proceedings of International Conference on Computer Vision (ICCV), 2015. |
[24] | J. C. De los Reyes, C.-B. Schönlieb and T. Valkonen, Bilevel parameter learning for higher-order total variation regularisation models, Journal of Mathematical Imaging and Vision, 57 (2017), 1-25. doi: 10.1007/s10851-016-0662-8. |
[25] | B. S. Mordukhovich, Variational Analysis and Generalized Differentiation Ⅱ: Applications, vol. 331, Springer, 2006. |
[26] | J. J. Moreau, Fonctions convexes duales et points proximaux dans un espace hilbertien, Comptes rendus hebdomadaires des séances de l'Académie des sciences, 255 (1962), 2897-2899. |
[27] | J.-J. Moreau, Proximité et dualité dans un espace hilbertien, Bulletin de la Société mathématique de France, 93 (1965), 273-299. |
[28] | V. Nair and G. E. Hinton, Rectified linear units improve restricted Boltzmann machines, in ICML, 2010. |
[29] | F. Natterer and F. Wübbeling, Mathematical Methods in Image Reconstruction, SIAM, 2001. doi: 10.1137/1.9780898718324. |
[30] | P. Ochs, R. Ranftl, T. Brox and T. Pock, Bilevel optimization with nonsmooth lower level problems, in SSVM, vol. 9087, 2015. doi: 10.1007/978-3-319-18461-6_52. |
[31] | J. C. De los Reyes and C.-B. Schönlieb, Image denoising: Learning the noise model via nonsmooth PDE-constrained optimization, Inverse Problems and Imaging, 7 (2013), 1183-1214. doi: 10.3934/ipi.2013.7.1183. |
[32] | S. Roth and M. J. Black, Fields of experts, International Journal of Computer Vision, 82 (2009), 205-229. |
[33] | L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259-268. doi: 10.1016/0167-2789(92)90242-F. |
[34] | O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier and F. Lenzen, Variational Methods in Imaging, Springer, 2009. |
[35] | C.-B. Schönlieb, Partial differential equation methods for image inpainting, vol. 29, Cambridge University Press, 2015. |
[36] | A. Tikhonov, A. Goncharsky and M. Bloch, Ill-posed Problems in the Natural Sciences, Mir. |
[37] | A. N. Tikhonov, On the solution of ill-posed problems and the method of regularization, in Doklady Akademii Nauk, vol. 151 |
[38] | H. F. Walker and P. Ni, Anderson acceleration for fixed-point iterations, SIAM Journal on Numerical Analysis, 49 (2011), 1715-1735. doi: 10.1137/10078356X. |
[39] | G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo and D. Firmin, DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction, IEEE Transactions on Medical Imaging, 37 (2018), 1310-1321. |
[40] | K. Yosida, Functional Analysis, Springer, 1964. |
[41] | E. Zeidler, Applied Functional Analysis: Applications to Mathematical Physics, vol. 108, Springer Science & Business Media, 2012. |
[42] | B. Zhu, J. Z. Liu, S. F. Cauley, B. R. Rosen and M. S. Rosen, Image reconstruction by domain-transform manifold learning, Nature, 555 (2018), 487-492. |
Figure 6. These histograms show how many simulations finished within an hour as a function of the number of epochs. Each simulation corresponds to a different configuration of hyperparameters. We consider only those runs where the loss on the test dataset is smaller than