August  2021, 4(3): 145-165. doi: 10.3934/mfc.2021009

Global-Affine and Local-Specific Generative Adversarial Network for semantic-guided image generation

Qufu Normal University, Qufu, China

* Corresponding author: nijch@163.com

Received: September 2020. Revised: March 2021. Early access: June 2021. Published: August 2021.

The recent progress in learning image feature representations has opened the way for tasks such as label-to-image or text-to-image synthesis. However, one challenge widely observed in existing methods is the difficulty of synthesizing fine-grained textures and small-scale instances. In this paper, we propose a novel Global-Affine and Local-Specific Generative Adversarial Network (GALS-GAN) that explicitly constructs global semantic layouts and learns distinct instance-level features. To achieve this, we adopt a graph convolutional network to infer instance locations and spatial relationships from scene graphs, which allows our model to obtain high-fidelity semantic layouts. In addition, a local-specific generator, in which a feature filtering mechanism separately learns semantic maps for different categories, is used to disentangle and generate category-specific visual features. Moreover, since the two generation sub-networks are highly complementary, we apply a weight map predictor to better combine the global and local pathways. Extensive experiments on the COCO-Stuff and Visual Genome datasets demonstrate the superior generation performance of our model over previous methods; our approach is more capable of capturing photo-realistic local characteristics and rendering small-sized entities in greater detail.
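The abstract describes a weight map predictor that fuses the global-affine and local-specific pathways. The minimal sketch below illustrates one plausible form of that fusion, a per-pixel convex combination of the two feature maps; the combination form, the stand-in random projection, and all names (predict_weight_map, fuse_pathways) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_weight_map(f_global, f_local, rng):
    # Stand-in for the learned weight map predictor: projects the
    # concatenated feature maps down to one scalar per spatial location.
    concat = np.concatenate([f_global, f_local], axis=0)      # (2C, H, W)
    proj = rng.standard_normal(concat.shape[0]) / concat.shape[0]
    logits = np.tensordot(proj, concat, axes=1)               # (H, W)
    return sigmoid(logits)                                    # values in (0, 1)

def fuse_pathways(f_global, f_local, rng):
    # Per-pixel convex combination of the global-affine and
    # local-specific feature maps (assumed fusion rule).
    w = predict_weight_map(f_global, f_local, rng)            # (H, W)
    return w[None] * f_global + (1.0 - w[None]) * f_local     # (C, H, W)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    C, H, W = 64, 32, 32
    fused = fuse_pathways(rng.standard_normal((C, H, W)),
                          rng.standard_normal((C, H, W)), rng)
    print(fused.shape)  # (64, 32, 32)
```

In a trained model, the random projection would be replaced by a learned predictor (for example, a small convolutional network) conditioned on both feature maps.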

Citation: Susu Zhang, Jiancheng Ni, Lijun Hou, Zili Zhou, Jie Hou, Feng Gao. Global-Affine and Local-Specific Generative Adversarial Network for semantic-guided image generation. Mathematical Foundations of Computing, 2021, 4 (3) : 145-165. doi: 10.3934/mfc.2021009
References:
[1] H. Caesar, J. Uijlings and V. Ferrari, COCO-Stuff: Thing and stuff classes in context, IEEE Conference on Computer Vision and Pattern Recognition, (2018), 1209–1218. doi: 10.1109/CVPR.2018.00132.
[2] W. L. Chen and J. Hays, SketchyGAN: Towards diverse and realistic sketch to image synthesis, IEEE Conference on Computer Vision and Pattern Recognition, (2018), 9416–9425. doi: 10.1109/CVPR.2018.00981.
[3] B. Chen, T. Liu, K. Liu, H. Liu and S. Pei, Image super-resolution using complex dense block on generative adversarial networks, IEEE International Conference on Image Processing, (2019), 2866–2870. doi: 10.1109/ICIP.2019.8803711.
[4] Y. Choi, M. Choi, M. Kim, J. M. Ha, S. H. Kim and J. Choo, StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation, IEEE Conference on Computer Vision and Pattern Recognition, (2018), 8789–8797. doi: 10.1109/CVPR.2018.00916.
[5] Y. Choi, Y. Uh, J. Yoo and J. W. Ha, StarGAN v2: Diverse image synthesis for multiple domains, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 8185–8194. doi: 10.1109/CVPR42600.2020.00821.
[6] H. Dhamo, A. Farshad, I. Laina, N. Navab, G. D. Hager, F. Tombari and C. Rupprecht, Semantic image manipulation using scene graphs, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 5212–5221. doi: 10.1109/CVPR42600.2020.00526.
[7] C. Gao, Q. Liu, Q. Xu, L. Wang, J. Liu and C. Zou, SketchyCOCO: Image generation from freehand scene sketches, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 5173–5182. doi: 10.1109/CVPR42600.2020.00522.
[8] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets, Advances in Neural Information Processing Systems, (2014), 2672–2680.
[9] S. Hong, D. Yang, J. Choi and H. Lee, Inferring semantic layout for hierarchical text-to-image synthesis, IEEE Conference on Computer Vision and Pattern Recognition, (2018), 7986–7994. doi: 10.1109/CVPR.2018.00833.
[10] J. Johnson, A. Gupta and F. F. Li, Image generation from scene graphs, IEEE Conference on Computer Vision and Pattern Recognition, (2018), 1219–1228. doi: 10.1109/CVPR.2018.00133.
[11] T. Kaneko, Y. Ushiku and T. Harada, Label-noise robust generative adversarial networks, IEEE Conference on Computer Vision and Pattern Recognition, (2019), 2462–2471. doi: 10.1109/CVPR.2019.00257.
[12] S. W. Kim, Y. Zhou, J. Philion, A. Torralba and S. Fidler, Learning to simulate dynamic environments with GameGAN, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 1228–1237. doi: 10.1109/CVPR42600.2020.00131.
[13] D. Kingma and J. Ba, Adam: A method for stochastic optimization, International Conference on Learning Representations, 2015.
[14] T. N. Kipf and M. Welling, Semi-supervised classification with graph convolutional networks, preprint, arXiv:1609.02907.
[15] R. Krishna, et al., Visual Genome: Connecting language and vision using crowdsourced dense image annotations, International Journal of Computer Vision, 123 (2017), 32–73. doi: 10.1007/s11263-016-0981-7.
[16] T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. L. Zitnick, Microsoft COCO: Common objects in context, European Conference on Computer Vision, 8693 (2014), 740–755. doi: 10.1007/978-3-319-10602-1_48.
[17] M. Li, H. Huang, L. Ma, W. Liu, T. Zhang and Y. Jiang, Unsupervised image-to-image translation with stacked cycle-consistent adversarial networks, European Conference on Computer Vision, (2018), 186–201. doi: 10.1007/978-3-030-01240-3_12.
[18] W. Li, P. Zhang, L. Zhang, Q. Huang, X. He, S. Lyu and J. Gao, Object-driven text-to-image synthesis via adversarial training, IEEE Conference on Computer Vision and Pattern Recognition, (2019), 12166–12174. doi: 10.1109/CVPR.2019.01245.
[19] Y. Li, T. Ma, Y. Bai, N. Duan, S. Wei and X. Wang, PasteGAN: A semi-parametric method to generate image from scene graph, Advances in Neural Information Processing Systems, 2019.
[20] B. Li, B. Zhuang, M. Li and J. Gu, Seq-SG2SL: Inferring semantic layout from scene graph through sequence to sequence learning, IEEE International Conference on Computer Vision, (2019), 7434–7442. doi: 10.1109/ICCV.2019.00753.
[21] S. Liu, T. Wang, D. Bau, J. Y. Zhu and A. Torralba, Diverse image generation via self-conditioned GANs, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 14274–14283. doi: 10.1109/CVPR42600.2020.01429.
[22] S. Nam, Y. Kim and S. J. Kim, Text-adaptive generative adversarial networks: Manipulating images with natural language, Advances in Neural Information Processing Systems, (2018), 42–51.
[23] J. C. Ni, S. S. Zhang, Z. L. Zhou, J. Hou and F. Gao, Instance mask embedding and attribute-adaptive generative adversarial network for text-to-image synthesis, IEEE Access, 8 (2020), 37697–37711. doi: 10.1109/ACCESS.2020.2975841.
[24] T. Park, M. Y. Liu, T. C. Wang and J. Y. Zhu, Semantic image synthesis with spatially-adaptive normalization, IEEE Conference on Computer Vision and Pattern Recognition, (2019), 2332–2341. doi: 10.1109/CVPR.2019.00244.
[25] T. Qiao, J. Zhang, D. Xu and D. Tao, MirrorGAN: Learning text-to-image generation by redescription, IEEE Conference on Computer Vision and Pattern Recognition, (2019), 1505–1514.
[26] S. Ravuri and O. Vinyals, Classification accuracy score for conditional generative models, preprint, arXiv:1905.10887.
[27] S. Ren, K. He, R. Girshick and J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39 (2016), 1137–1149. doi: 10.1109/TPAMI.2016.2577031.
[28] S. Sah, D. Peri, A. Shringi, C. Zhang, M. Dominguez, A. Savakis and R. Ptucha, Semantically invariant text-to-image generation, IEEE International Conference on Image Processing, (2018), 3783–3787. doi: 10.1109/ICIP.2018.8451656.
[29] Y. Shen, J. Gu, X. Tang and B. Zhou, Interpreting the latent space of GANs for semantic face editing, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 9240–9249. doi: 10.1109/CVPR42600.2020.00926.
[30] T. R. Shaham, T. Dekel and T. Michaeli, SinGAN: Learning a generative model from a single natural image, IEEE International Conference on Computer Vision, (2019), 4569–4579. doi: 10.1109/ICCV.2019.00467.
[31] W. Sun and T. F. Wu, Learning layout and style reconfigurable GANs for controllable image synthesis, preprint, arXiv:2003.11571.
[32] T. Sylvain, P. C. Zhang, Y. Bengio, R. D. Hjelm and S. Sharma, Object-centric image generation from layouts, preprint, arXiv:2003.07449.
[33] C. Szegedy, et al., Going deeper with convolutions, IEEE Conference on Computer Vision and Pattern Recognition, (2015), 1–9. doi: 10.1109/CVPR.2015.7298594.
[34] H. Tang, H. Liu and N. Sebe, Unified generative adversarial networks for controllable image-to-image translation, IEEE Transactions on Image Processing, 29 (2020), 8916–8929. doi: 10.1109/TIP.2020.3021789.
[35] N. N. Vo and J. Hays, Localizing and orienting street views using overhead imagery, European Conference on Computer Vision, (2016), 494–509. doi: 10.1007/978-3-319-46448-0_30.
[36] D. M. Vo and A. Sugimoto, Visual-relation conscious image generation from structured-text, preprint, arXiv:1908.01741.
[37] H. Yu, Y. Huang, L. Pi and L. Wang, Recurrent deconvolutional generative adversarial networks with application to video generation, Pattern Recognition and Computer Vision, (2019), 18–28. doi: 10.1007/978-3-030-31723-2_2.
[38] L. Z. Zhang, J. C. Wang, Y. S. Xu, J. Min, T. Wen, J. C. Gee and J. B. Shi, Nested scale-editing for conditional image synthesis, IEEE Conference on Computer Vision and Pattern Recognition, (2020), 5476–5486. doi: 10.1109/CVPR42600.2020.00552.

Figure 1.  Overview of the proposed GALS-GAN
Figure 2.  Illustration of a single graph convolution layer
Figure 3.  Architecture of the MLP
Figure 4.  Inference process of the mask predictor
Figure 5.  Architecture of the local-specific generator
Figure 6.  Architecture of the multi-scale discriminators
Figure 7.  Images generated by generators at different levels
Figure 8.  Qualitative examples generated by our GALS-GAN based on the COCO-Stuff dataset
Figure 9.  Qualitative examples generated by our GALS-GAN based on the Visual Genome dataset
Figure 10.  Qualitative comparison of different models
Figure 11.  An example of manipulating the synthesized image
Figure 12.  Example results of different image manipulation types
Figure 13.  Ablation study of the global-affine generator
Figure 14.  Ablation study of the local-specific generator
Table 1.  Statistics of the COCO-Stuff and Visual Genome datasets
Dataset          Train    Val     Test    Categories    Max objects/image    Min objects/image
COCO-Stuff       74121    1024    2048    171           8                    3
Visual Genome    62565    5506    5088    178           30                   3
Table 2.  Quantitative comparison of images generated by different methods on the COCO-Stuff dataset
Methods                      IS $\uparrow$                       FID $\downarrow$
                             64$\times$64      128$\times$128    64$\times$64    128$\times$128
sg2im [10]                   6.7$\pm$0.1       5.99$\pm$0.27     67.99           95.18
stacking-GANs [36]           9.1$\pm$0.20      12.01$\pm$0.40    50.94           39.78
PasteGAN [19]                9.2$\pm$0.32      -                 42.30           -
PasteGAN (GT layout) [19]    10.20$\pm$0.20    -                 34.30           -
ours                         9.85$\pm$0.15     13.82$\pm$0.30    38.29           29.62
Table 3.  Quantitative comparison of images generated by different methods on the Visual Genome dataset
Methods                      IS $\uparrow$                      FID $\downarrow$
                             64$\times$64     128$\times$128    64$\times$64    128$\times$128
sg2im [10]                   5.5$\pm$0.10     4.78$\pm$0.15     73.79           70.40
stacking-GANs [36]           6.90$\pm$0.20    9.24$\pm$0.41     59.53           50.19
PasteGAN [19]                7.97$\pm$0.30    -                 58.37           -
PasteGAN (GT layout) [19]    9.15$\pm$0.20    -                 34.91           -
ours                         8.87$\pm$0.15    11.20$\pm$0.55    39.25           29.94
Table 4.  Comparison of classification accuracy
Methods               Classification Accuracy Score
                      COCO-Stuff                        Visual Genome
                      64$\times$64    128$\times$128    64$\times$64    128$\times$128
sg2im [10]            28.8            24.1              26.7            23.4
stacking-GANs [36]    33.9            31.2              32.7            30.3
PasteGAN [19]         40.3            -                 38.7            -
ours                  46.1            44.6              45.4            43.5
Table 5.  Quantitative comparison of predicted semantic layouts
Methods               R@0.3                          R@0.5
                      COCO-Stuff    Visual Genome    COCO-Stuff    Visual Genome
sg2im [10]            52.4          21.9             32.2          10.6
stacking-GANs [36]    65.3          35.0             49.1          23.2
PasteGAN [19]         71.2          45.2             62.4          33.8
ours                  80.7          48.4             66.2          36.5
Table 6.  Ablation study of different GALS-GAN architectures
Architectures      IS $\uparrow$      FID $\downarrow$
w/o $G_{g-a}$      7.52$\pm$0.40      78.94
w/o $G_{l-s}$      11.30$\pm$0.12     46.83
full model         13.82$\pm$0.30     29.62