How convolutional neural networks see the world --- A survey of convolutional neural network visualization methods

  • * Corresponding author: Xiang Chen
The authors are supported by NSF Grant CNS-1717775.
Abstract
  • Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision tasks, such as object detection, image recognition, and image retrieval. These achievements stem from the CNNs' outstanding capability to learn input features through deep layers of neurons and an iterative training process. However, the learned features are hard to identify and interpret from a human vision perspective, which leaves the CNNs' internal working mechanism poorly understood. To improve CNN interpretability, CNN visualization is widely used as a qualitative analysis method that translates internal features into visually perceptible patterns. Many CNN visualization works have been proposed in the literature to interpret CNNs from the perspectives of network structure, operation, and semantic concept.

    In this paper, we provide a comprehensive survey of several representative CNN visualization methods, including Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. These methods are presented in terms of their motivations, algorithms, and experimental results. Based on these visualization methods, we also discuss their practical applications to demonstrate the significance of CNN interpretability in areas such as network design, optimization, and security enhancement.

    Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.

  • Figure 1.  CaffeNet architecture

    Figure 2.  Convolutional and max-pooling process

    Figure 3.  Human vision and CNNs visualization

    Figure 4.  First layer of CaffeNet visualized by Activation Maximization

    Figure 5.  Hidden layers of CaffeNet visualized by Activation Maximization. Adapted from "Understanding Neural Networks Through Deep Visualization," by J. Yosinski, 2015

    Figure 6.  Output layer of CaffeNet visualized by Activation Maximization

    Figure 7.  The structure of the Deconvolutional Network

    Figure 8.  CaffeNet visualized by DeconvNet

    Figure 9.  First and second layer visualization of AlexNet and ZFNet. Adapted from "Visualizing and Understanding Convolutional Networks," by M.D. Zeiler, 2014

    Figure 10.  Feature evolution during training ZFNet. Adapted from "Visualizing and Understanding Convolutional Networks," by M.D. Zeiler, 2014

    Figure 11.  The data flow of the two Network Inversion algorithms

    Figure 12.  AlexNet reconstruction by Network Inversion with regularizer and UpconvNet. Adapted from "Inverting Visual Representations with Convolutional Networks," by A. Dosovitskiy, 2016

    Figure 13.  AlexNet reconstruction by perturbing the feature maps. Adapted from "Inverting Visual Representations with Convolutional Networks," by A. Dosovitskiy, 2016

    Figure 14.  The Broden images that activate certain neurons in AlexNet

    Figure 15.  Illustration of network dissection for measuring the semantic alignment of neurons in a given CNN. Adapted from "Network Dissection: Quantifying Interpretability of Deep Visual Representations," by D. Bau, 2017

    Figure 16.  AlexNet visualization by Network Dissection

    Figure 17.  Semantic concepts emerging in each layer and under different training conditions

    Figure 18.  Network Dissection with single neuron and neuron combinations. Adapted from "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks," by R. Fong, 2018

    Figure 19.  Adversarial noise that manipulates the CNN classification

    Figure 20.  Adversarial example visualization

    Figure 21.  Style transfer example

    Table 1.  Visualization methods

    Method                          | Interpretation Perspective                | Focused Layer | Applied Network                        | Representative Study
    --------------------------------|-------------------------------------------|---------------|----------------------------------------|---------------------
    Activation Maximization         | Individual neuron with visualized pattern | CLs, FLs      | Auto-Encoder, DBN, AlexNet             | [26]
    Deconvolutional Neural Networks | Neuron activation in input image          | CLs           | AlexNet                                | [55]
    Network Inversion               | One layer                                 | CLs, FLs      | HOG, SIFT, LBD, Bag of words, CaffeNet | [29][64]
    Network Dissection              | Individual neuron with semantic concept   | CLs           | AlexNet, VGG, GoogLeNet, ResNet        | [32][70]
  • [1] P. Agrawal, R. Girshick and J. Malik, Analyzing the performance of multilayer neural networks for object recognition, in Proceedings of the European Conference on Computer Vision, 2014, 329-344. doi: 10.1007/978-3-319-10584-0_22.
    [2] M. Arjovsky, S. Chintala and L. Bottou, Wasserstein gan, arXiv preprint, arXiv: 1701.07875.
    [3] D. Bau, B. Zhou, A. Khosla, A. Oliva and A. Torralba, Network dissection: Quantifying interpretability of deep visual representations, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 3319-3327. doi: 10.1109/CVPR.2017.354.
    [4] D. C. Ciresan, U. Meier, J. Masci, L. Maria Gambardella and J. Schmidhuber, Flexible, High performance convolutional neural networks for image classification, in Proceedings of the International Joint Conference on Artificial Intelligence, vol. 22, 2011, p1237.
    [5] R. Collobert, K. Kavukcuoglu and C. Farabet, Torch7: A matlab-like environment for machine learning, in Workshop on BigLearn, NIPS, 2011.
    [6] G. Csurka, C. Dance, L. Fan, J. Willamowski and C. Bray, Visual categorization with bags of keypoints, in Workshop on statistical learning in computer vision, ECCV, vol. 1, 2004, 1-2.
    [7] N. Dalal and B. Triggs, Histograms of oriented gradients for human detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005, 886-893. doi: 10.1109/CVPR.2005.177.
    [8] E. d'Angelo, A. Alahi and P. Vandergheynst, Beyond bits: Reconstructing images from local binary descriptors, in Proceedings of the IEEE Conference on Pattern Recognition, 2012, 935-938.
    [9] E. L. Denton, S. Chintala, R. Fergus et al., Deep generative image models using a Laplacian pyramid of adversarial networks, in Proceedings of the Advances in Neural Information Processing Systems, 2015, 1486-1494.
    [10] A. Dosovitskiy and T. Brox, Generating images with perceptual similarity metrics based on deep networks, in Proceedings of the Advances in Neural Information Processing Systems, 2016, 658-666.
    [11] A. Dosovitskiy and T. Brox, Inverting visual representations with convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 4829-4837. doi: 10.1109/CVPR.2016.522.
    [12] A. Dosovitskiy, J. Tobias Springenberg and T. Brox, Learning to generate chairs with convolutional neural networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 1538-1546. doi: 10.1109/CVPR.2015.7298761.
    [13] J. Duchi, E. Hazan and Y. Singer, Adaptive subgradient methods for online learning and stochastic optimization, Journal of Machine Learning Research, 12 (2011), 2121-2159.
    [14] D. Erhan, Y. Bengio, A. Courville and P. Vincent, Visualizing higher-layer features of a deep network, Technical report, University of Montreal, (2009), p3.
    [15] P. F. Felzenszwalb, R. B. Girshick, D. McAllester and D. Ramanan, Object detection with discriminatively trained part-based models, IEEE Transactions on Pattern Analysis and Machine Intelligence, 32 (2010), 1627-1645. doi: 10.1109/TPAMI.2009.167.
    [16] R. Fong and A. Vedaldi, Net2vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks, arXiv preprint, arXiv: 1801.03454.
    [17] L. A. Gatys, A. S. Ecker and M. Bethge, A neural algorithm of artistic style, Journal of Vision, 16 (2016), p326, arXiv: 1508.06576. doi: 10.1167/16.12.326.
    [18] L. A. Gatys, A. S. Ecker and M. Bethge, Texture synthesis and the controlled generation of natural stimuli using convolutional neural networks, arXiv preprint, arXiv: 1505.07376, 12.
    [19] R. B. Girshick, P. F. Felzenszwalb and D. McAllester, Discriminatively trained deformable part models, release 5, http://people.cs.uchicago.edu/~rbg/latent-release5/.
    [20] R. Girshick, J. Donahue, T. Darrell and J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, 580-587. doi: 10.1109/CVPR.2014.81.
    [21] X. Glorot and Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in Proceedings of the International Conference on Artificial Intelligence and Statistics, 2010, 249-256.
    [22] Y. Gong, L. Wang, R. Guo and S. Lazebnik, Multi-scale orderless pooling of deep convolutional activation features, in Proceedings of the European Conference on Computer Vision, 2014, 392-407. doi: 10.1007/978-3-319-10584-0_26.
    [23] A. Gonzalez-Garcia, D. Modolo and V. Ferrari, Do semantic parts emerge in convolutional neural networks?, International Journal of Computer Vision, 126 (2018), 476-494. doi: 10.1007/s11263-017-1048-0.
    [24] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial nets, in Proceedings of the Advances in Neural Information Processing Systems, 2014, 2672-2680.
    [25] A. Gordo, J. Almazán, J. Revaud and D. Larlus, Deep image retrieval: Learning global representations for image search, in Proceedings of the European Conference on Computer Vision, Springer, 2016, 241-257. doi: 10.1007/978-3-319-46466-4_15.
    [26] S. Han, H. Mao and W. J. Dally, Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding, arXiv preprint, arXiv: 1510.00149.
    [27] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2016, 770-778. doi: 10.1109/CVPR.2016.90.
    [28] G. E. Hinton, S. Osindero and Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Computation, 18 (2006), 1527-1554. doi: 10.1162/neco.2006.18.7.1527.
    [29] D. H. Hubel and T. N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, The Journal of Physiology, 195 (1968), 215-243. doi: 10.1113/jphysiol.1968.sp008455.
    [30] D. H. Hubel and T. N. Wiesel, Receptive fields of single neurones in the cat's striate cortex, The Journal of Physiology, 148 (1959), 574-591.  doi: 10.1113/jphysiol.1959.sp006308.
    [31] S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proceedings of the International Conference on Machine Learning, 2015, 448-456.
    [32] S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, in Proceedings of the International Conference on Machine Learning, 2015, 448-456.
    [33] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, Caffe: Convolutional architecture for fast feature embedding, in Proceedings of the International Conference on Multimedia, 2014, 675-678. doi: 10.1145/2647868.2654889.
    [34] K. Grill-Spector and R. Malach, The human visual cortex, Annual Review of Neuroscience, 27 (2004), 649-677.
    [35] K. N. Kay, T. Naselaris, R. J. Prenger and J. L. Gallant, Identifying natural images from human brain activity, Nature, 452 (2008), p352. doi: 10.1038/nature06713.
    [36] A. Krizhevsky, I. Sutskever and G. E. Hinton, Imagenet classification with deep convolutional neural networks, in Proceedings of the Advances in Neural Information Processing Systems, 2012, 1097-1105. doi: 10.1145/3065386.
    [37] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A. J. Rodriguez-Sanchez and L. Wiskott, Deep hierarchies in the primate visual cortex: What can we learn for computer vision?, IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (2013), 1847-1871. doi: 10.1109/TPAMI.2012.272.
    [38] A. Kurakin, I. Goodfellow and S. Bengio, Adversarial examples in the physical world, arXiv preprint, arXiv: 1607.02533.
    [39] Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86 (1998), 2278-2324. doi: 10.1109/5.726791.
    [40] Y. LeCun, C. Cortes and C. J. Burges, The mnist database of handwritten digits, 1998.
    [41] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., Photo-realistic single image super-resolution using a generative adversarial network, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2017. doi: 10.1109/CVPR.2017.19.
    [42] H. Lee, C. Ekanadham and A. Y. Ng, Sparse deep belief net model for visual area v2, in Proceedings of the Advances in Neural Information Processing Systems, 2008, 873-880.
    [43] H. Lee, R. Grosse, R. Ranganath and A. Y. Ng, Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations, in Proceedings of the International Conference on Machine Learning, 2009, 609-616. doi: 10.1145/1553374.1553453.
    [44] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár and C. Zitnick, Microsoft COCO: Common objects in context, arXiv preprint, arXiv: 1405.0312.
    [45] D. G. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60 (2004), 91-110.  doi: 10.1023/B:VISI.0000029664.99615.94.
    [46] A. Mahendran and A. Vedaldi, Understanding deep image representations by inverting them, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2015, 5188-5196. doi: 10.1109/CVPR.2015.7299155.
    [47] A. Mahendran and A. Vedaldi, Visualizing deep convolutional neural networks using natural pre-images, International Journal of Computer Vision, 120 (2016), 233-255.  doi: 10.1007/s11263-016-0911-8.
    [48] M. Manassi, B. Sayim and M. H. Herzog, When crowding of crowding leads to uncrowding, Journal of Vision, 13 (2013), 10-10.
    [49] A. Mordvintsev, C. Olah and M. Tyka, Inceptionism: Going deeper into neural networks, Google Research Blog. Retrieved June, 20 (2015), 14pp.
    [50] A. Nguyen, A. Dosovitskiy, J. Yosinski, T. Brox and J. Clune, Synthesizing the preferred inputs for neurons in neural networks via deep generator networks, in Proceedings of the Advances in Neural Information Processing Systems, 2016, 3387-3395.
    [51] A. Nguyen, J. Yosinski and J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 427-436. doi: 10.1109/CVPR.2015.7298640.
    [52] A. Nguyen, J. Yosinski and J. Clune, Multifaceted feature visualization: Uncovering the different types of features learned by each neuron in deep neural networks, arXiv preprint, arXiv: 1602.03616.
    [53] S. J. Pan and Q. Yang, A survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering, 22 (2010), 1345-1359.  doi: 10.1109/TKDE.2009.191.
    [54] M. I. Posner and S. E. Petersen, The attention system of the human brain, Annual Review of Neuroscience, 13 (1990), 25-42. 
    [55] C. Poultney, S. Chopra, Y. L. Cun et al., Efficient learning of sparse representations with an energy-based model, in Proceedings of the Advances in Neural Information Processing Systems, 2007, 1137-1144.
    [56] N. Qian, On the momentum term in gradient descent learning algorithms, Neural Networks, 12 (1999), 145-151.  doi: 10.1016/S0893-6080(98)00116-6.
    [57] R. Q. Quiroga, L. Reddy, G. Kreiman, C. Koch and I. Fried, Invariant visual representation by single neurons in the human brain, Nature, 435 (2005), 1102-1107. doi: 10.1038/nature03687.
    [58] S. Ren, K. He, R. Girshick and J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence, 39 (2017), 1137-1149. doi: 10.1109/TPAMI.2016.2577031.
    [59] L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259-268. doi: 10.1016/0167-2789(92)90242-F.
    [60] H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura and R. M. Summers, Deep convolutional neural networks for computer-aided detection: Cnn architectures, dataset characteristics and transfer learning, IEEE Transactions on Medical Imaging, 35 (2016), 1285-1298. doi: 10.1109/TMI.2016.2528162.
    [61] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., Mastering the game of go with deep neural networks and tree search, Nature, 529 (2016), 484-489. doi: 10.1038/nature16961.
    [62] K. Simonyan, A. Vedaldi and A. Zisserman, Deep inside convolutional networks: Visualising image classification models and saliency maps, arXiv preprint, arXiv: 1312.6034.
    [63] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv preprint, arXiv: 1409.1556.
    [64] J. Sivic and A. Zisserman, Video google: A text retrieval approach to object matching in videos, in Proceeding of Ninth IEEE International Conference on Computer Vision, 2003, 1470. doi: 10.1109/ICCV.2003.1238663.
    [65] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, 15 (2014), 1929-1958.
    [66] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, Going deeper with convolutions, in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2015, 1-9. doi: 10.1109/CVPR.2015.7298594.
    [67] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow and R. Fergus, Intriguing properties of neural networks, arXiv preprint, arXiv: 1312.6199.
    [68] P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio and P.-A. Manzagol, Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion, Journal of Machine Learning Research, 11 (2010), 3371-3408.
    [69] L. Wang, Y. Zhang and J. Feng, On the euclidean distance of images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 27 (2005), 1334-1339.
    [70] D. Wei, B. Zhou, A. Torralba and W. Freeman, Understanding intra-class knowledge inside cnn, arXiv preprint, arXiv: 1507.02379.
    [71] J. Yosinski, J. Clune, A. Nguyen, T. Fuchs and H. Lipson, Understanding neural networks through deep visualization, arXiv preprint, arXiv: 1506.06579.
    [72] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in Proceedings of the European Conference on Computer Vision, 2014, 818-833. doi: 10.1007/978-3-319-10590-1_53.
    [73] M. D. Zeiler, D. Krishnan, G. W. Taylor and R. Fergus, Deconvolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, 2528-2535. doi: 10.1109/CVPR.2010.5539957.
    [74] M. D. Zeiler, G. W. Taylor and R. Fergus, Adaptive deconvolutional networks for mid and high level feature learning, in Proceedings of the IEEE International Conference on Computer Vision, 2011, 2018-2025. doi: 10.1109/ICCV.2011.6126474.
    [75] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva and A. Torralba, Object detectors emerge in deep scene CNNs, arXiv preprint, arXiv: 1412.6856.