
CAFLOW: Conditional autoregressive flows

  • *Corresponding author: Georgios Batzolis

The first author is supported by GSK.

  • We introduce CAFLOW, a new diverse image-to-image translation model that simultaneously leverages the power of autoregressive modeling and the efficiency of conditional normalizing flows. We transform the conditioning image into a sequence of latent encodings using a multiscale normalizing flow, and repeat the process for the conditioned image. We model the conditional distribution of the latent encodings autoregressively, realizing each autoregressive factor with an efficient multiscale conditional normalizing flow in which each conditioning factor affects image synthesis at its respective resolution scale. Our proposed framework performs well on a range of image-to-image translation tasks and outperforms earlier designs of conditional flows because of its expressive autoregressive structure.

    Mathematics Subject Classification: Primary: 68T07; Secondary: 68U10.
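    The modeling scheme described in the abstract can be summarized with a short derivation. The notation below is ours rather than verbatim from the paper: the conditioning image $ Y $ is encoded into latents $ L_1, \dots, L_n $ and the conditioned image $ W $ into $ Z_1, \dots, Z_n $, and one plausible indexing of the autoregressive factorization (consistent with Figure 1, where the $ i^{th} $ component sees $ L_i $ and the coarser latents) is

    ```latex
    % Sketch of the autoregressive conditional factorization (our notation).
    % Encode Y -> (L_1, ..., L_n) and W -> (Z_1, ..., Z_n) with multiscale flows, then:
    \begin{align}
    p_\theta(W \mid Y)
      &= p_\theta(Z_1, \dots, Z_n \mid L_1, \dots, L_n) \\
      &= \prod_{i=1}^{n} p_\theta\big(Z_i \mid Z_{i+1}, \dots, Z_n,\; L_i, \dots, L_n\big).
    \end{align}
    ```

    Each factor is modeled by an invertible conditional transformation (the $ G_i^\theta $ of Figure 2), so its likelihood remains tractable via the change-of-variables formula.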

  • Figure 1.  From left to right: ideal dependencies in the $ i^{th} $ autoregressive component; the Dual-Glow modeling assumption [23], where information is exchanged only between latent spaces of the same dimension; and our modeling assumption, which retains the dependencies between $ L_i $ and the latent spaces of lower dimension

    Figure 2.  Left: unconditional normalizing flow architecture used to encode conditioning and conditioned images, denoted by $ Y_n = Y $ and $ W_n = W $ respectively, into a sequence of hierarchical latent variables. Right: design of the conditional transformation $ G_{i}^\theta $ that models the $ i^{th} $ autoregressive component. The index of the flow $ i $ is omitted in both the transformed latent variable $ Z_j $ and the intermediate latent variables $ Z_j^{\prime} $ for simplicity

    Figure 3.  10 super-resolved versions of the LR image in decreasing conditional log-likelihood order

    Figure 4.  Qualitative comparison of Dual-Glow+ and CAFLOW

    Figure 5.  Qualitative evaluation on FFHQ 4x super-resolution of 16x16 resolution images

    Figure 6.  Qualitative evaluation: four colorizations proposed by CAFLOW, CINN and ColorGAN for three test images. ColorGAN generates unrealistically diverse colorizations with significant color artifacts (for example, a yellow region on a white wall). CINN generates more realistic, less diverse colorizations with less pronounced color artifacts than ColorGAN, which is reflected in its improved FID score. Finally, CAFLOW generates even more realistic and less diverse colorizations than CINN, with even rarer color artifacts, which is more representative of the data distribution according to the FID score

    Figure 7.  Different inpaintings proposed by CAFLOW with $ \tau = 0.5 $. Ground truth on the right

    Figure 8.  Image super-resolution on the FFHQ dataset. Left: LR image, bicubically upsampled. Right: HR image. Middle: 10 super-resolved versions in decreasing conditional log-likelihood order from left to right. We sampled 20 super-resolved images for each LR image and present the 10 with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.5 $

    Figure 9.  Image super-resolution on the FFHQ dataset. Left: LR image, bicubically upsampled. Right: HR image. Middle: 10 super-resolved versions in decreasing conditional log-likelihood order from left to right. We sampled 20 super-resolved images for each LR image and present the 10 with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.55 $

    Figure 10.  Image inpainting on the CelebA dataset. Left: Masked image. Right: Ground truth. Middle: 10 inpainted versions in decreasing conditional log-likelihood order from left to right. We sampled 30 inpainted images for each masked image and we present the 10 images with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.5 $

    Figure 11.  Image inpainting on the CelebA dataset. Left: Masked image. Right: Ground truth. Middle: 10 inpainted versions in decreasing conditional log-likelihood order from left to right. We sampled 30 inpainted images for each masked image and we present the 10 images with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.5 $

    Figure 12.  Image colorization on the LSUN BEDROOM dataset. Left: Grayscale image. Right: Ground truth. Middle: 10 colorized versions in decreasing conditional log-likelihood order from left to right. We sampled 25 colorized images for each greyscale image and we present the 10 images with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.85 $

    Figure 13.  Image colorization on the LSUN BEDROOM dataset. Left: Grayscale image. Right: Ground truth. Middle: 10 colorized versions in decreasing conditional log-likelihood order from left to right. We sampled 25 colorized images for each greyscale image and we present the 10 images with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.85 $

    Figure 14.  Image colorization on the FFHQ dataset. Left: Grayscale image. Right: Ground truth. Middle: 10 colorized versions in decreasing conditional log-likelihood order from left to right. We sampled 25 colorized images for each greyscale image and we present the 10 images with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.7 $

    Figure 15.  Image colorization on the FFHQ dataset. Left: Grayscale image. Right: Ground truth. Middle: 10 colorized versions in decreasing conditional log-likelihood order from left to right. We sampled 25 colorized images for each greyscale image and we present the 10 images with the highest conditional log-likelihood. We used sampling temperature $ \tau = 0.7 $

    Figure 16.  Sketch to image synthesis on the edges2shoes dataset [10]. Left: Sketch. Right: Ground truth. Middle: 6 samples taken with sampling temperature $ \tau = 0.8 $
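    The captions above report a sampling temperature $ \tau $ between 0.5 and 0.85. In Glow-style flows this conventionally means drawing the Gaussian latents at reduced standard deviation before inverting the flow, trading diversity for fidelity. A minimal sketch of that step (function name is ours, and the flow inverse itself is omitted):

    ```python
    import random

    def sample_latents(sizes, tau=0.5, rng=None):
        """Draw Gaussian latents at temperature tau, i.e. z ~ N(0, tau^2 I).

        tau < 1 concentrates samples near the mode of the latent prior;
        the flow inverse then maps these latents back to image space.
        """
        rng = rng or random.Random(0)
        return [[tau * rng.gauss(0.0, 1.0) for _ in range(n)] for n in sizes]

    # e.g. three latent scales of decreasing size, sampled at tau = 0.5
    latents = sample_latents([16, 8, 4], tau=0.5)
    ```

    Setting $ \tau = 0 $ collapses every latent to the prior mode, while $ \tau = 1 $ recovers unscaled sampling from the training prior.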

    Table 1.  Quantitative evaluation of (x4) super-resolution on FFHQ $ 16^2 $. We report LPIPS/RMSE scores for each method. Lower scores are better

    Dataset        CAFLOW      Dual-Glow+  BRGM [18]   ESRGAN [26]  SRFBN [13]  BICUBIC
    FFHQ $ 16^2 $  0.08/17.56  0.14/18.56  0.24/25.66  0.35/29.32   0.33/22.07  0.34/20.10

    Table 2.  Quantitative evaluation of colorization on LSUN BEDROOM $ 64\times 64 $ dataset. We report FID score for each method. Lower scores are better

    Metric  CAFLOW  CINN [1]  ColorGAN [3]
    FID     18.15   26.48     28.31

    Table 3.  Quantitative evaluation of inpainting on the CelebA dataset. We report PSNR and LPIPS scores for each method

    Method  PSNR$ \uparrow $  LPIPS$ \downarrow $
    CAFLOW  26.08             0.06
    [16]    24.88             -
  • [1] L. Ardizzone, C. Lüth, J. Kruse, C. Rother and U. Köthe, Guided image generation with conditional invertible neural networks, arXiv preprint, arXiv: 1907.02392.
    [2] J. Behrmann, P. Vicol, K.-C. Wang, R. Grosse and J.-H. Jacobsen, Understanding and mitigating exploding inverses in invertible neural networks, in International Conference on Artificial Intelligence and Statistics, PMLR, (2021), 1792-1800.
    [3] M. G. Blanch, M. Mrak, A. F. Smeaton and N. E. O'Connor, End-to-end conditional gan-based architectures for image colourisation, in 2019 IEEE 21st International Workshop on Multimedia Signal Processing (MMSP), 2019, 1-6.
    [4] R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. K. Duvenaud, Neural ordinary differential equations, in Advances in Neural Information Processing Systems, 31 2018.
    [5] L. Dinh, D. Krueger and Y. Bengio, NICE: Non-linear independent components estimation, in 3rd International Conference on Learning Representations, ICLR 2015 (eds. Y. Bengio and Y. LeCun), 2015.
    [6] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets, in Advances in Neural Information Processing Systems, 27 2014.
    [7] A. Grover, C. Chute, R. Shu, Z. Cao and S. Ermon, Alignflow: Cycle consistent learning from multiple domains via normalizing flows, Proceedings of the AAAI Conference on Artificial Intelligence, 34 (2020), 4028-4035. doi: 10.1609/aaai.v34i04.5820.
    [8] J. Ho, X. Chen, A. Srinivas, Y. Duan and P. Abbeel, Flow++: Improving flow-based generative models with variational dequantization and architecture design, in International Conference on Machine Learning, PMLR, (2019), 2722-2730.
    [9] C.-W. Huang, D. Krueger, A. Lacoste and A. Courville, Neural autoregressive flows, Proceedings of the 35th International Conference on Machine Learning, 80 (2018), 2078-2087.
    [10] P. Isola, J.-Y. Zhu, T. Zhou and A. A. Efros, Image-to-image translation with conditional adversarial networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 1125-1134.
    [11] T. Karras, S. Laine and T. Aila, A style-based generator architecture for generative adversarial networks, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 4401-4410.
    [12] D. P. Kingma and P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, Advances in Neural Information Processing Systems, 31 (2018), 10215-10224. 
    [13] Z. Li, J. Yang, Z. Liu, X. Yang, G. Jeon and W. Wu, Feedback network for image super-resolution, in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), 3867-3876.
    [14] J. Liang, A. Lugmayr, K. Zhang, M. Danelljan, L. Van Gool and R. Timofte, Hierarchical conditional flow: A unified framework for image super-resolution and image rescaling, in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), 4076-4085.
    [15] Z. Liu, P. Luo, X. Wang and X. Tang, Deep learning face attributes in the wild, in Proceedings of the IEEE International Conference on Computer Vision, (2015), 3730-3738.
    [16] Y. Lu and B. Huang, Structured output learning with conditional generative flows, in Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
    [17] A. Lugmayr, M. Danelljan, L. Van Gool and R. Timofte, Srflow: Learning the super-resolution space with normalizing flow, in Computer Vision–ECCV 2020, 2020.
    [18] R. V. Marinescu, D. Moyer and P. Golland, Bayesian image reconstruction using deep generative models, arXiv preprint, arXiv: 2012.04567.
    [19] D. Onken, S. W. Fung, X. Li and L. Ruthotto, Ot-flow: Fast and accurate continuous normalizing flows via optimal transport, Proceedings of the AAAI Conference on Artificial Intelligence, 35 (2021), 9223-9232. doi: 10.1609/aaai.v35i10.17113.
    [20] A. Pumarola, S. Popov, F. Moreno-Noguer and V. Ferrari, C-flow: Conditional generative flow models for images and 3d point clouds, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2020), 7946-7955.
    [21] D. Rezende and S. Mohamed, Variational inference with normalizing flows, in International Conference on Machine Learning, PMLR, (2015), 1530-1538.
    [22] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon and B. Poole, Score-based generative modeling through stochastic differential equations, arXiv preprint, arXiv: 2011.13456.
    [23] H. Sun, R. Mehta, H. H. Zhou, Z. Huang, S. C. Johnson, V. Prabhakaran and V. Singh, Dual-glow: Conditional flow-based generative model for modality transfer, in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
    [24] A. Verine, B. Negrevergne, Y. Chevaleyre and F. Rossi, On the expressivity of Bi-Lipschitz normalizing flows, in Asian Conference on Machine Learning, PMLR, (2023), 1054-1069.
    [25] Y. Viazovetskyi, V. Ivashkin and E. Kashin, Stylegan2 distillation for feed-forward image manipulation, in European Conference on Computer Vision, Springer, (2020), 170-186.
    [26] X. Wang, K. Yu, S. Wu, J. Gu, Y. Liu, C. Dong, Y. Qiao and C. C. Loy, Esrgan: Enhanced super-resolution generative adversarial networks, in Computer Vision–ECCV 2018 Workshops (eds. L. Leal-Taixé and S. Roth), 2019, 63-79.
    [27] H. Wu, J. Köhler and F. Noé, Stochastic normalizing flows, arXiv preprint, arXiv: 2002.06707.
    [28] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser and J. Xiao, Lsun: Construction of a large-scale image dataset using deep learning with humans in the loop, arXiv preprint, arXiv: 1506.03365.
    [29] J. J. Yu, K. Derpanis and M. A. Brubaker, Wavelet flow: Fast training of high resolution normalizing flows, in NeurIPS, 2020.
