# American Institute of Mathematical Sciences

June 2022, 16(3): 525-545. doi: 10.3934/ipi.2021060

## Generative imaging and image processing via generative encoder

1 National University of Singapore, 21 Lower Kent Ridge Rd, Singapore 119077
2 Purdue University, 610 Purdue Mall, West Lafayette, IN 47907, USA

Received: January 2021. Revised: July 2021. Early access: October 2021. Published: June 2022.

This paper introduces a novel generative encoder (GE) framework for generative imaging and image processing tasks such as image reconstruction, compression, denoising, inpainting, deblurring, and super-resolution. GE unifies the generative capacity of GANs and the stability of AEs in an optimization framework, instead of stacking GANs and AEs into a single network or combining their loss functions as in the existing literature. GE also provides a novel approach to visualizing relationships between latent spaces and the data space. The GE framework consists of a pre-training phase and a solving phase. In the former, a GAN with generator $G$ capturing the data distribution of a given image set and an AE network with encoder $E$ compressing images following the distribution estimated by $G$ are trained separately, resulting in two latent representations of the data, denoted the generative and encoding latent spaces, respectively. In the solving phase, given a noisy image $x = \mathcal{P}(x^*)$, where $x^*$ is the target unknown image and $\mathcal{P}$ is an operator applying additive, multiplicative, or convolutional noise, or, equivalently, given such an image $x$ in the compressed domain, i.e., given $m = E(x)$, the two latent spaces are unified by solving the optimization problem
$$z^* = \underset{z}{\mathrm{argmin}} \|E(G(z))-m\|_2^2+\lambda\|z\|_2^2,$$
where $\lambda>0$ is a hyperparameter, and the image $x^*$ is recovered in a generative way via $\hat{x} := G(z^*)\approx x^*$. The unification of the two spaces allows improved performance against the corresponding GAN and AE networks while visualizing interesting properties in each latent space.
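The solving phase reduces to a first-order optimization over the latent variable $z$. As a minimal sketch (not the paper's implementation), the toy example below replaces the pre-trained networks $G$ and $E$ with fixed random linear maps of hypothetical dimensions (latent 4, image 16, code 8) and runs plain gradient descent on the objective $\|E(G(z))-m\|_2^2+\lambda\|z\|_2^2$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the pre-trained networks: the paper uses a GAN generator G
# and an AE encoder E; here both are fixed random *linear* maps so the sketch
# stays self-contained.
A = rng.standard_normal((16, 4))    # "generator"  G(z) = A @ z
B = rng.standard_normal((8, 16))    # "encoder"    E(x) = B @ x
lam = 1e-3                          # the hyperparameter lambda > 0

z_true = rng.standard_normal(4)     # latent code of the unknown target image
m = B @ (A @ z_true)                # the given compressed observation m = E(x)

def loss(z):
    r = B @ (A @ z) - m
    return r @ r + lam * (z @ z)    # ||E(G(z)) - m||^2 + lambda ||z||^2

# Plain gradient descent on z; the step size is chosen from the spectral norm
# of the composite map so the iteration is guaranteed to converge.
M = B @ A
lr = 0.5 / (np.linalg.norm(M, 2) ** 2 + lam)
z = np.zeros(4)
for _ in range(5000):
    z -= lr * (2 * M.T @ (M @ z - m) + 2 * lam * z)

x_hat = A @ z                       # generative recovery: x_hat = G(z*)
```

In practice $G$ and $E$ are the pre-trained (nonlinear) networks and the gradient with respect to $z$ flows through both by backpropagation; any first-order optimizer such as Adam can replace plain gradient descent.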
Citation: Yong Zheng Ong, Haizhao Yang. Generative imaging and image processing via generative encoder. Inverse Problems and Imaging, 2022, 16 (3) : 525-545. doi: 10.3934/ipi.2021060
##### Figures and Tables:
Flow of the training process in GE. Steps 1 and 2 form the pre-training phase, while the remaining steps form the solving phase
Reconstruction results on CelebA dataset
Reconstruction results on Digital Rock dataset
Reconstruction results on LSUN church dataset
Denoising results on CelebA dataset
Deblurring results on CelebA dataset
Super-resolution results on CelebA dataset
Inpainting results on CelebA dataset
Plot of the log of the average MSE against the number of iterations in the solving phase
Comparison of image reconstructions of the detail region (red box) of the original image (left). From left to right: Original, GE, invertGAN, ConvAE
Comparison of image reconstructions of the detail region (red box) of the original image (left). From top to bottom: Original, GE, invertGAN, ConvAE
Additional pore sample result on Digital Rock dataset
Missing spectacles sample results on CelebA dataset
Reconstruction results for $64\times 64\times 3$ images in CelebA with GE using BEGAN instead of pGAN
Structure of $E$. The decoder $DC$ is a mirror of $E$ using conv_transpose and upsample
| Layer type | Parameters |
| --- | --- |
| conv2d | $k=[3,3,3,f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,f,2f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,2f,4f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,4f,8f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,8f,16f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| conv2d | $k=[3,3,16f,32f]$, $s=[1,1]$, $a=\mathrm{ReLU}$ |
| maxpool2d | $k=[1,2,2,1]$, $s=[2,2]$ |
| fullyconnected | $h=256$ |
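A quick way to read this architecture is to trace feature-map shapes through the six conv+pool stages: each $3\times 3$ convolution (stride 1, assuming "same" padding) preserves the spatial size, each $2\times 2$ max pool halves it, and the channel count doubles from $f$ to $32f$. The helper below is a hypothetical sketch, not code from the paper:

```python
def encoder_shapes(h, w, f):
    """Feature-map shapes (h, w, channels) after each conv+pool stage of E."""
    shapes = []
    for stage in range(6):
        channels = f * 2 ** stage   # f, 2f, 4f, 8f, 16f, 32f
        h, w = h // 2, w // 2       # 2x2 max pool with stride 2 halves h and w
        shapes.append((h, w, channels))
    return shapes

# For 128x128x3 inputs with (an assumed) f = 16, the final feature map is
# 2x2x512, which is then flattened and mapped to the h = 256 output.
```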
Quantitative results comparing models on CelebA. FID scores reported by recent GAN papers that used CelebA $128\times128\times3$ images are also presented for comparison, labelled with *
| Model | MSE | SSIM | FID |
| --- | --- | --- | --- |
| CRGAN* | | | 16.97 |
| SSGAN* | | | 24.36 |
| Our pGAN | | | 22.13 |
| ConvAE | 0.03386 | 0.6823 $\pm$ 0.051 | 87.71 |
| AEGAN | 0.03317 | 0.6907 $\pm$ 0.050 | 34.53 |
| invertGAN | 0.03529 | 0.7203 $\pm$ 0.038 | 19.19 |
| GE | 0.03262 | 0.7329 $\pm$ 0.025 | 17.42 |
Quantitative results comparing models on Digital Rock. The number in brackets shows the size of the pGAN latent vector the model is trained on. Models with the same latent size are solved with the same pGAN weights. The same AE is used for all models
| Model | MSE | PSNR |
| --- | --- | --- |
| ConvAE | 0.009271 | 20.32 |
| invertGAN (512) | 0.008185 | 20.86 |
| GE (512) | 0.007470 | 21.26 |
| GE (256) | 0.007741 | 21.11 |
| GE (128) | 0.007839 | 21.05 |
| GE (64) | 0.008499 | 20.70 |
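The MSE and PSNR columns are related by the standard formula $\mathrm{PSNR} = 10\log_{10}(\mathrm{MAX}^2/\mathrm{MSE})$; for intensities in $[0,1]$, $\mathrm{MAX}=1$, and the table's pairs are consistent with it up to rounding. A minimal check:

```python
import math

def psnr(mse, peak=1.0):
    """PSNR in dB from mean squared error, for pixel values in [0, peak]."""
    return 10.0 * math.log10(peak ** 2 / mse)

# e.g. ConvAE: psnr(0.009271) ~ 20.33 dB, matching the table's 20.32 up to
# rounding; GE (512): psnr(0.007470) ~ 21.27 dB vs the table's 21.26.
```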
Results of invertGAN and GE on spectacles. T refers to samples that produced spectacles, F to samples that did not; the remaining samples are invalid reconstructions
| | invertGAN, F | invertGAN, T |
| --- | --- | --- |
| GE, F | 289 | 32 |
| GE, T | 157 | 469 |
