A deep neural network with invertible hidden layers has the attractive property of preserving all information in the feature-learning stage. In this paper, we analyse the hidden layers of residual rectifier neural networks and investigate the conditions under which they are invertible. A new fixed-point algorithm is developed to invert the hidden layers of residual networks. The proposed algorithm is capable of inverting some residual networks that cannot be inverted by existing inversion algorithms. Furthermore, a special residual rectifier network is designed and trained on MNIST so that it achieves performance comparable to the state of the art while its hidden layers remain invertible.
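To make the idea of fixed-point inversion concrete, the following is a minimal sketch of the standard contraction-based iteration for a single residual rectifier unit. It is not the new algorithm proposed in the paper, whose details are not given in this abstract. The unit form y = x + relu(Wx + b), the function names, and the rescaling of W to a spectral norm of 0.5 are all assumptions made for illustration; the iteration is only guaranteed to converge when the residual branch is a contraction (Lipschitz constant below 1).

```python
import numpy as np


def residual_unit(x, W, b):
    """Assumed residual rectifier unit: y = x + relu(W x + b)."""
    return x + np.maximum(W @ x + b, 0.0)


def invert_residual_unit(y, W, b, n_iter=200, tol=1e-10):
    """Recover x from y by iterating x <- y - relu(W x + b).

    Converges when the residual branch is a contraction, e.g. when
    the spectral norm of W is below 1 (ReLU is 1-Lipschitz).
    """
    x = y.copy()  # start the iteration from the output
    for _ in range(n_iter):
        x_next = y - np.maximum(W @ x + b, 0.0)
        if np.linalg.norm(x_next - x) < tol:
            return x_next
        x = x_next
    return x


# Hypothetical example data (not from the paper).
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W *= 0.5 / np.linalg.norm(W, 2)  # rescale so the spectral norm is 0.5
b = rng.standard_normal(8)

x_true = rng.standard_normal(8)
y = residual_unit(x_true, W, b)
x_rec = invert_residual_unit(y, W, b)
max_abs_error = float(np.max(np.abs(x_rec - x_true)))
```

When the spectral norm of W is 1 or larger, this plain iteration may fail to converge; according to the abstract, this is the kind of case in which the proposed algorithm can still invert some networks.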
The proposed residual network architecture
Inverse of the rectifier linear transform: Invertible percentage of 500 cases changes along with
Inverse of the residual unit with the fully-connected layer: Invertible percentage of 500 cases changes along with
Comparison of recovered images with the original digit images. The first row shows the original images; the second and third rows show the images recovered by the proposed fixed-point method and the existing fixed-point method, respectively
Relative error rates (%) of the recovered images. One hundred samples per class (1000 samples in total) were chosen and recovered