# American Institute of Mathematical Sciences

June 2022, 16(3): 525-545. doi: 10.3934/ipi.2021060

## Generative imaging and image processing via generative encoder

1 National University of Singapore, 21 Lower Kent Ridge Rd, Singapore 119077
2 Purdue University, 610 Purdue Mall, West Lafayette, IN 47907, USA

Received: January 2021. Revised: July 2021. Early access: October 2021. Published: June 2022.

This paper introduces a novel generative encoder (GE) framework for generative imaging and image processing tasks such as image reconstruction, compression, denoising, inpainting, deblurring, and super-resolution. GE unifies the generative capacity of GANs and the stability of AEs in an optimization framework, instead of stacking GANs and AEs into a single network or combining their loss functions as in the existing literature. GE also provides a novel approach to visualizing relationships between latent spaces and the data space. The GE framework consists of a pre-training phase and a solving phase. In the former, a GAN with generator $G$ capturing the data distribution of a given image set and an AE network with encoder $E$ compressing images following the distribution estimated by $G$ are trained separately, resulting in two latent representations of the data, denoted the generative and encoding latent spaces, respectively. In the solving phase, given a noisy image $x = \mathcal{P}(x^*)$, where $x^*$ is the target unknown image and $\mathcal{P}$ is an operator applying additive, multiplicative, or convolutional noise, or, equivalently, given such an image $x$ in the compressed domain, i.e., given $m = E(x)$, the two latent spaces are unified by solving the optimization problem
$$z^* = \underset{z}{\mathrm{argmin}} \|E(G(z))-m\|_2^2+\lambda\|z\|_2^2,$$
where $\lambda>0$ is a hyperparameter, and the image $x^*$ is recovered in a generative way via $\hat{x} := G(z^*)\approx x^*$. The unification of the two spaces allows improved performance against the corresponding GAN and AE networks while visualizing interesting properties in each latent space.
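The solving phase reduces to a first-order optimization over the latent variable $z$. As a minimal sketch (not the paper's implementation), the toy example below replaces the pre-trained networks $G$ and $E$ with fixed random linear maps of hypothetical dimensions (latent 4, image 16, code 8) and runs plain gradient descent on the objective $\|E(G(z))-m\|_2^2+\lambda\|z\|_2^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the pre-trained networks: the paper uses a GAN generator G
# and an AE encoder E; here both are fixed random *linear* maps so the sketch
# stays self-contained.
A = rng.standard_normal((16, 4))    # "generator"  G(z) = A @ z
B = rng.standard_normal((8, 16))    # "encoder"    E(x) = B @ x
lam = 1e-3                          # the hyperparameter lambda > 0

z_true = rng.standard_normal(4)     # latent code of the unknown target image
m = B @ (A @ z_true)                # the given compressed observation m = E(x)

def loss(z):
    r = B @ (A @ z) - m
    return r @ r + lam * (z @ z)    # ||E(G(z)) - m||^2 + lambda ||z||^2

# Plain gradient descent on z; the step size is chosen from the spectral norm
# of the composite map so the iteration is guaranteed to converge.
M = B @ A
lr = 0.5 / (np.linalg.norm(M, 2) ** 2 + lam)
z = np.zeros(4)
for _ in range(5000):
    z -= lr * (2 * M.T @ (M @ z - m) + 2 * lam * z)

x_hat = A @ z                       # generative recovery: x_hat = G(z*)
```

In practice $G$ and $E$ are the pre-trained (nonlinear) networks and the gradient with respect to $z$ flows through both by backpropagation; any first-order optimizer such as Adam can replace plain gradient descent.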
Citation: Yong Zheng Ong, Haizhao Yang. Generative imaging and image processing via generative encoder. Inverse Problems and Imaging, 2022, 16 (3) : 525-545. doi: 10.3934/ipi.2021060
##### Figures and Tables:
Flow of the training process in GE. Steps 1 and 2 form the pre-training phase, while the remaining steps form the solving phase
Reconstruction results on CelebA dataset
Reconstruction results on Digital Rock dataset
Reconstruction results on LSUN church dataset
Denoising results on CelebA dataset
Deblurring results on CelebA dataset
Super-resolution results on CelebA dataset
Inpainting results on CelebA dataset
Plot of the log of the average MSE against the number of iterations in the solving phase
Comparison of image reconstructions of the detail region (red box) of the original image (left). From left to right: Original, GE, invertGAN, ConvAE
Comparison of image reconstructions of the detail region (red box) of the original image (left). From top to bottom: Original, GE, invertGAN, ConvAE
Additional pore sample result on Digital Rock dataset
Missing spectacles sample results on CelebA dataset
Reconstruction results for $64\times 64\times 3$ images in CelebA with GE using BEGAN instead of pGAN
Structure of $E$. The decoder $DC$ is a mirror of $E$ using conv_transpose and upsample
| Layer type | Parameters |
| --- | --- |
| conv2d | $k=[3,3,3,f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,f,2f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,2f,4f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,4f,8f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,8f,16f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,16f,32f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| fullyconnected | $h=256$ |
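A quick way to read this architecture is to trace feature-map shapes through the six conv+pool stages: each $3\times 3$ convolution (stride 1, assuming "same" padding) preserves the spatial size, each $2\times 2$ max pool halves it, and the channel count doubles from $f$ to $32f$. The helper below is a hypothetical sketch, not code from the paper:

```python
def encoder_shapes(h, w, f):
    """Feature-map shapes (h, w, channels) after each conv+pool stage of E."""
    shapes = []
    for stage in range(6):
        channels = f * 2 ** stage   # f, 2f, 4f, 8f, 16f, 32f
        h, w = h // 2, w // 2       # 2x2 max pool with stride 2 halves h and w
        shapes.append((h, w, channels))
    return shapes

# For 128x128x3 inputs with (an assumed) f = 16, the final feature map is
# 2x2x512, which is then flattened and mapped to the h = 256 output.
```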
Quantitative results comparing models on CelebA. FID scores reported by recent GAN papers that used CelebA $128\times128\times3$ images are also presented for comparison, labelled with *
| Model | MSE | SSIM | FID |
| --- | --- | --- | --- |
| CRGAN* | | | 16.97 |
| SSGAN* | | | 24.36 |
| Our pGAN | | | 22.13 |
| ConvAE | 0.03386 | 0.6823 $\pm$ 0.051 | 87.71 |
| AEGAN | 0.03317 | 0.6907 $\pm$ 0.050 | 34.53 |
| invertGAN | 0.03529 | 0.7203 $\pm$ 0.038 | 19.19 |
| GE | 0.03262 | 0.7329 $\pm$ 0.025 | 17.42 |
Quantitative results comparing models on Digital Rock. The number in brackets shows the size of the pGAN latent vector the model is trained on. Models with the same latent size are solved with the same pGAN weights. The same AE is used for all models
| Model | MSE | PSNR |
| --- | --- | --- |
| ConvAE | 0.009271 | 20.32 |
| invertGAN (512) | 0.008185 | 20.86 |
| GE (512) | 0.007470 | 21.26 |
| GE (256) | 0.007741 | 21.11 |
| GE (128) | 0.007839 | 21.05 |
| GE (64) | 0.008499 | 20.70 |
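The MSE and PSNR columns are related by the standard formula $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$; for intensities in $[0,1]$, $\mathrm{MAX}=1$, and the table's pairs are consistent with it up to rounding. A minimal check:

```python
import math

def psnr(mse, peak=1.0):
    """PSNR in dB from mean squared error, for pixel values in [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse)

# e.g. ConvAE: psnr(0.009271) ~ 20.33 dB, matching the table's 20.32 up to
# rounding; GE (512): psnr(0.007470) ~ 21.27 dB vs the table's 21.26.
```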
Results of invertGAN and GE on spectacles. T refers to samples that produced spectacles, F to samples that did not; the remaining samples are invalid reconstructions
| | invertGAN, F | invertGAN, T |
| --- | --- | --- |
| GE, F | 289 | 32 |
| GE, T | 157 | 469 |
