# American Institute of Mathematical Sciences

doi: 10.3934/ipi.2020046

## Adversarial defense via the data-dependent activation, total variation minimization, and adversarial training

1. Department of Mathematics, Scientific Computing and Imaging Institute, University of Utah, Salt Lake City, UT 84112-0090, USA
2. Department of Mathematics, University of California, Los Angeles, Los Angeles, CA 90095, USA
3. Department of Mathematics, Duke University, Durham, NC 27708, USA

Received November 2019 | Revised April 2020 | Published August 2020

We improve the robustness of deep neural networks (DNNs) to adversarial attacks by using an interpolating function as the output activation. This data-dependent activation remarkably improves both the generalization and the robustness of DNNs. On the CIFAR10 benchmark, we raise the robust accuracy of the adversarially trained ResNet20 from $\sim 46\%$ to $\sim 69\%$ under the state-of-the-art Iterative Fast Gradient Sign Method (IFGSM) adversarial attack. When we combine this data-dependent activation with total variation minimization (TVM) on adversarial images and training-data augmentation, we improve the robust accuracy of ResNet56 by 38.9$\%$ under the strongest IFGSM attack. Furthermore, we provide an intuitive explanation of our defense by analyzing the geometry of the feature space.
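The IFGSM attack used throughout the benchmarks iterates the fast-gradient-sign step and projects the perturbed image back onto an $\ell_\infty$ ball of radius $\epsilon$ around the clean image. A minimal NumPy sketch, assuming a `grad_fn` that returns the loss gradient with respect to the input (in the paper this gradient comes from the attacked DNN); the function name and step parameters are illustrative:

```python
import numpy as np

def ifgsm_attack(x, grad_fn, epsilon, alpha, n_iter):
    """Iterative FGSM: repeatedly step in the sign of the loss gradient,
    keeping the total perturbation inside an L-infinity ball of radius
    epsilon. `grad_fn(x)` must return dLoss/dx for input x."""
    x_adv = x.copy()
    for _ in range(n_iter):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)  # project to eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)                  # keep valid pixel range
    return x_adv
```

With $\alpha \cdot n_{\text{iter}} > \epsilon$, the iterate saturates the $\epsilon$-ball in the direction of the gradient sign, which is why IFGSM is a stronger attack than a single FGSM step of the same budget.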

Citation: Bao Wang, Alex Lin, Penghang Yin, Wei Zhu, Andrea L. Bertozzi, Stanley J. Osher. Adversarial defense via the data-dependent activation, total variation minimization, and adversarial training. Inverse Problems & Imaging, doi: 10.3934/ipi.2020046

##### Figures and tables:
Training and testing procedures of the DNN with the softmax and WNLL functions as the output activation layer. (a) and (b) show the training and testing steps of the standard DNN, respectively; (c) and (d) illustrate the training and testing procedures of the WNLL-activated DNN, respectively
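The WNLL activation in the caption above replaces the softmax output by a data-dependent interpolation of training (template) labels in feature space. The actual method solves a weighted nonlocal Laplacian system; the sketch below, with a hypothetical function name, keeps only the weighted-interpolation flavor using a Gaussian kernel, as a simplified stand-in:

```python
import numpy as np

def interpolate_labels(x_test, x_templ, y_templ, sigma=1.0):
    """Simplified stand-in for the WNLL output activation: infer a test
    point's class scores by Gaussian-weighted interpolation of template
    (training) one-hot labels in the learned feature space."""
    d2 = ((x_templ - x_test) ** 2).sum(axis=1)  # squared feature distances
    w = np.exp(-d2 / sigma ** 2)                # Gaussian edge weights
    return w @ y_templ / w.sum()                # normalized weighted vote
```

Because the prediction is anchored to real training features rather than to a learned linear boundary, small adversarial perturbations of the input move the test feature only slightly relative to the template cloud, which is the geometric intuition behind the robustness gain.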
Samples from CIFAR10. Panel (a): rows from top to bottom show the original images, adversarial images crafted by attacking ResNet56 with FGSM and IFGSM ($\epsilon = 0.02$), and adversarial images crafted by attacking ResNet56-WNLL. Panel (b) corresponds to panel (a) with $\epsilon = 0.08$. Panels (c) and (d) show the TV-minimized versions of the images in (a) and (b), respectively
$\epsilon$ vs. accuracy without defense and with defense by the WNLL activation, TVM, and augmented training. (a) and (b) plot results for the FGSM and IFGSM attacks, respectively
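The TVM defense referenced above denoises adversarial images by total variation minimization before classification. A minimal 1-D sketch of gradient descent on a smoothed ROF-type energy $E(u) = \sum_i \sqrt{(u_{i+1}-u_i)^2 + \varepsilon} + \frac{\lambda}{2}\|u - u_0\|^2$, with illustrative parameter values (the paper applies 2-D TV minimization to images):

```python
import numpy as np

def tv_minimize(u0, lam=1.0, step=0.01, n_iter=500, eps=1e-2):
    """Gradient descent on a smoothed ROF energy for a 1-D signal u0:
    TV term encourages piecewise-constant output, fidelity term keeps
    the result close to the (possibly adversarial) input."""
    u = u0.copy()
    for _ in range(n_iter):
        du = np.diff(u)                    # forward differences
        w = du / np.sqrt(du ** 2 + eps)    # smoothed derivative of |du|
        grad_tv = np.zeros_like(u)
        grad_tv[:-1] -= w                  # each term touches two nodes
        grad_tv[1:] += w
        u = u - step * (grad_tv + lam * (u - u0))
    return u
```

High-frequency adversarial perturbations carry large total variation, so they are preferentially removed while the piecewise-smooth image content survives; this is why TVM composes well with the other two defenses.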
Epochs vs. accuracy of ResNet56 on CIFAR10. (a): without the additional FC layer; (b): with the additional FC layer
Visualization of the features learned by DNNs with the softmax ((a), (b), (c), (d)) and WNLL ((e), (f), (g), (h)) activation functions. (a) and (b) plot the 2D features of the original and adversarial testing images; (c) and (d) plot the first two principal components of the 64D features for the original and adversarial testing images, respectively. Panels (e) and (f) plot the first two principal components of the training and testing features learned by ResNet56-WNLL; (g) and (h) show the first two principal components of the adversarial and TV-minimized adversarial images for the test set
(a): number of IFGSM iterations vs. accuracy for ResNet20 and ResNet20-WNLL trained with PGD adversarial training. (b): $\epsilon$ vs. accuracy for ResNet20 and ResNet20-WNLL trained with PGD adversarial training
Running time and GPU memory for ResNet20 with two different activation functions
| | Training time | Testing time | Memory |
|---|---|---|---|
| ResNet20 | 3925.6 s | 0.657 s | 1007 MB |
| ResNet20-WNLL | 7378.4 s | 14.09 s | 1563 MB |
Mutual classification accuracy on the adversarial images crafted by using FGSM and IFGSM to attack ResNet56 and ResNet56-WNLL. (Unit: $\%$)
**Accuracy of ResNet56 on adversarial images crafted by attacking ResNet56-WNLL**

| Attack | Training data | $\epsilon=0$ | $\epsilon=0.02$ | $\epsilon=0.04$ | $\epsilon=0.06$ | $\epsilon=0.08$ | $\epsilon=0.1$ |
|---|---|---|---|---|---|---|---|
| FGSM | Original data | 93.0 | 69.8 | 56.9 | 44.6 | 34.6 | 28.3 |
| FGSM | TVM data | 88.3 | 51.5 | 37.9 | 30.1 | 24.7 | 20.9 |
| FGSM | Original + TVM | 93.1 | 78.5 | 70.9 | 64.6 | 59.8 | 55.8 |
| IFGSM | Original data | 93.0 | 5.22 | 5.73 | 6.73 | 7.55 | 8.55 |
| IFGSM | TVM data | 88.3 | 7.00 | 6.82 | 8.30 | 9.28 | 10.7 |
| IFGSM | Original + TVM | 93.1 | 27.3 | 28.6 | 29.5 | 29.1 | 29.4 |

**Accuracy of ResNet56-WNLL on adversarial images crafted by attacking ResNet56**

| Attack | Training data | $\epsilon=0$ | $\epsilon=0.02$ | $\epsilon=0.04$ | $\epsilon=0.06$ | $\epsilon=0.08$ | $\epsilon=0.1$ |
|---|---|---|---|---|---|---|---|
| FGSM | Original data | 94.5 | 65.2 | 49.0 | 39.3 | 32.8 | 28.3 |
| FGSM | TVM data | 90.6 | 45.9 | 30.9 | 22.2 | 16.9 | 13.8 |
| FGSM | Original + TVM data | 94.7 | 78.3 | 68.2 | 61.1 | 56.5 | 52.5 |
| IFGSM | Original data | 94.5 | 3.37 | 3.71 | 3.54 | 4.69 | 6.41 |
| IFGSM | TVM data | 90.6 | 7.88 | 7.51 | 7.58 | 8.07 | 9.67 |
| IFGSM | Original + TVM data | 94.7 | 34.3 | 33.4 | 33.1 | 34.6 | 35.8 |
Mutual classification accuracy on the adversarial images crafted by using CW-L2 to attack ResNet56 and ResNet56-WNLL. (Unit: $\%$)
| Training data | Original data | TVM data | Original + TVM data |
|---|---|---|---|
| Exp-Ⅰ | 52.1 | 43.2 | 80.0 |
| Exp-Ⅱ | 59.7 | 41.1 | 80.1 |
Testing accuracy on the adversarial / TVM adversarial CIFAR10 dataset. In the published version, the testing accuracy with no defense is shown in red italics, and the results with all three defenses are in boldface. (Unit: $\%$)
| Training data | Original data | TVM data | Original + TVM data |
|---|---|---|---|
| ResNet56 | 4.94/32.2 | 11.8/54.0 | 15.1/52.4 |
| ResNet56-WNLL | 18.3/35.2 | 15.0/53.9 | 28/54.5 |
Testing accuracy on the adversarial / TVM adversarial CIFAR10 dataset. In the published version, the testing accuracy with no defense is shown in red italics, and the results with all three defenses are in boldface. (Unit: $\%$)
**ResNet56**

| Attack | Training data | $\epsilon=0$ | $\epsilon=0.02$ | $\epsilon=0.04$ | $\epsilon=0.06$ | $\epsilon=0.08$ | $\epsilon=0.1$ |
|---|---|---|---|---|---|---|---|
| FGSM | Original data | 93.0 | 36.9/19.4 | 29.6/18.9 | 26.1/18.4 | 23.1/17.9 | 20.5/17.1 |
| FGSM | TVM data | 88.3 | 27.4/50.4 | 19.1/47.2 | 16.6/43.7 | 15.0/38.9 | 13.7/35.0 |
| FGSM | Original + TVM | 93.1 | 48.6/51.1 | 42.0/47.6 | 39.1/44.2 | 37.1/41.8 | 35.6/39.1 |
| IFGSM | Original data | 93.0 | 0/16.6 | 0/16.1 | 0.02/15.9 | 0.1/15.5 | 0.25/16.1 |
| IFGSM | TVM data | 88.3 | 0.01/43.4 | 0/42.5 | 0.02/42.4 | 0.18/42.7 | 0.49/42.4 |
| IFGSM | Original + TVM | 93.1 | 0.1/38.4 | 0.09/37.9 | 0.36/37.9 | 0.84/37.6 | 1.04/37.9 |

**ResNet56-WNLL**

| Attack | Training data | $\epsilon=0$ | $\epsilon=0.02$ | $\epsilon=0.04$ | $\epsilon=0.06$ | $\epsilon=0.08$ | $\epsilon=0.1$ |
|---|---|---|---|---|---|---|---|
| FGSM | Original data | 94.5 | 58.5/26.0 | 50.1/25.4 | 42.3/25.5 | 35.7/24.9 | 29.2/22.9 |
| FGSM | TVM data | 90.6 | 31.5/52.6 | 24.5/49.6 | 20.2/45.3 | 17.3/41.6 | 14.4/37.5 |
| FGSM | Original + TVM | 94.7 | 60.5/55.4 | 56.7/52.0 | 55.3/48.6 | 53.2/45.9 | 50.1/43.7 |
| IFGSM | Original data | 94.5 | 0.49/16.7 | 0.14/17.3 | 0.3/16.9 | 1.01/16.6 | 0.94/16.5 |
| IFGSM | TVM data | 90.6 | 0.61/37.3 | 0.43/36.3 | 0.63/35.9 | 0.87/35.9 | 1.19/35.5 |
| IFGSM | Original + TVM | 94.7 | 0.19/38.5 | 0.3/39.4 | 0.63/40.1 | 1.26/38.9 | 1.72/39.1 |
