
# How convolutional neural networks see the world --- A survey of convolutional neural network visualization methods

* Corresponding author: Xiang Chen

The authors are supported by NSF Grant CNS-1717775.
Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision tasks, such as object detection, image recognition, and image retrieval. These achievements stem from the CNNs' outstanding capability to learn input features through deep layers of neurons and an iterative training process. However, the learned features are hard to identify and interpret from a human vision perspective, leaving the CNNs' internal working mechanism poorly understood. To improve CNN interpretability, CNN visualization is widely used as a qualitative analysis method that translates internal features into visually perceptible patterns, and many CNN visualization works have been proposed in the literature to interpret CNNs in terms of network structure, operation, and semantic concept.

In this paper, we provide a comprehensive survey of several representative CNN visualization methods: Activation Maximization, Network Inversion, Deconvolutional Neural Networks (DeconvNet), and Network Dissection based visualization. Each method is presented in terms of its motivation, algorithm, and experimental results. Based on these visualization methods, we also discuss their practical applications to demonstrate the significance of CNN interpretability in areas such as network design, optimization, and security enhancement.
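To make the first of these techniques concrete: Activation Maximization synthesizes an input that maximally activates a chosen neuron by gradient ascent on the input, usually with a regularizer to keep the result well-behaved. The following is a minimal sketch on a hypothetical toy linear "neuron" (the weights and step sizes are illustrative, not from any of the surveyed works; a real CNN neuron would replace `activation` and `grad` with a forward pass and backpropagated gradient):

```python
import numpy as np

# Toy "neuron": a dot product with fixed random weights.
rng = np.random.default_rng(0)
w = rng.normal(size=64)           # weights of the neuron under study

def activation(x):
    return float(w @ x)           # neuron response to input x

def grad(x):
    return w                      # gradient of w·x with respect to x

x = np.zeros(64)                  # start from a blank "image"
lr, l2 = 0.1, 0.01                # step size and L2 regularization weight
for _ in range(100):
    # Gradient ascent on the activation, with weight decay as regularizer.
    x += lr * (grad(x) - l2 * x)

# x now points along w: the input pattern this neuron responds to most.
```

In a real network the same loop runs through automatic differentiation, and the regularizer (L2 penalty, blurring, jitter) is what turns raw gradient ascent into the recognizable patterns shown in Figures 4-6.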

Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.


Figure 1.  CaffeNet architecture

Figure 2.  Convolutional and max-pooling process

Figure 3.  Human vision and CNNs visualization

Figure 4.  First layer of CaffeNet visualized by Activation Maximization

Figure 5.  Hidden layers of CaffeNet visualization by Activation Maximization. Adapted from "Understanding Neural Networks Through Deep Visualization," by J. Yosinski, 2015

Figure 6.  Output layer of CaffeNet visualized by Activation Maximization

Figure 7.  The structure of the Deconvolutional Network

Figure 8.  CaffeNet visualized by DeconvNet

Figure 9.  First and second layer visualization of AlexNet and ZFNet. Adapted from "Visualizing and Understanding Convolutional Networks," by M.D. Zeiler, 2014

Figure 10.  Feature evolution during training ZFNet. Adapted from "Visualizing and Understanding Convolutional Networks," by M.D. Zeiler, 2014

Figure 11.  The data flow of the two Network Inversion algorithms

Figure 12.  AlexNet reconstruction by Network Inversion with regularizer and UpconvNet. Adapted from "Inverting Visual Representations with Convolutional Networks," by A. Dosovitskiy, 2016

Figure 13.  AlexNet reconstruction by perturbing the feature maps. Adapted from "Inverting Visual Representations with Convolutional Networks," by A. Dosovitskiy, 2016

Figure 14.  The Broden images that activate certain neurons in AlexNet

Figure 15.  Illustration of Network Dissection for measuring the semantic alignment of a neuron in a given CNN. Adapted from "Network Dissection: Quantifying Interpretability of Deep Visual Representations," by D. Bau, 2017

Figure 16.  AlexNet visualization by Network Dissection

Figure 17.  Semantic concepts emerging in each layer and under different training conditions

Figure 18.  Network Dissection with single neuron and neuron combinations. Adapted from "Net2Vec: Quantifying and Explaining how Concepts are Encoded by Filters in Deep Neural Networks," by R. Fong, 2018

Figure 19.  Adversarial noises that manipulate the CNN classification

Figure 21.  Style transfer example

Table 1.  Visualization methods

| Method | Interpretation Perspective | Focused Layer | Applied Network | Representative Study |
|---|---|---|---|---|
| Activation Maximization | Individual neuron with visualized pattern | CLs, FLs | Auto-Encoder, DBN, AlexNet | [26] |
| Deconvolutional Neural Networks | Neuron activation in input image | CLs | AlexNet | [55] |
| Network Inversion | One layer | CLs, FLs | HOG, SIFT, LBD, Bag of words, CaffeNet | [29][64] |
| Network Dissection | Individual neuron with semantic concept | CLs | AlexNet, VGG, GoogLeNet, ResNet | [32][70] |
