February  2021, 4(1): 31-44. doi: 10.3934/mfc.2020024

Fixed-point algorithms for inverse of residual rectifier neural networks

School of Electrical Engineering, Computing and Mathematical Sciences, Curtin University, Bentley, WA, Australia

* Corresponding author: Ruhua Wang

Received August 2020; Revised September 2020; Early access October 2020; Published February 2021

A deep neural network with invertible hidden layers preserves all the information in the feature learning stage. In this paper, we analyse the hidden layers of residual rectifier neural networks and investigate the conditions under which they are invertible. A new fixed-point algorithm is developed to invert the hidden layers of residual networks. The proposed inverse algorithms can invert some residual networks that existing inversion algorithms cannot. Furthermore, a special residual rectifier network is designed and trained on MNIST so that it achieves performance comparable to the state of the art while its hidden layers remain invertible.
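The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea of fixed-point inversion, not the authors' method. It uses the standard contraction-based iteration $ \mathbf{x}_{k+1} = \mathbf{y} - g(\mathbf{x}_k) $ for a residual unit $ \mathbf{y} = \mathbf{x} + g(\mathbf{x}) $ with $ g(\mathbf{x}) = \mathrm{ReLU}(W\mathbf{x} + \mathbf{b}) $, which is guaranteed to converge when the spectral norm of $ W $ is below 1. The function names, the weights `W`, the bias `b` and the tolerance are all assumptions chosen for illustration.

```python
# Hypothetical example: invert y = x + relu(W x + b) by fixed-point iteration.
# Assumes ||W||_2 < 1, so that g(x) = relu(W x + b) is a contraction.

def relu(v):
    return [max(0.0, t) for t in v]

def matvec(W, x):
    return [sum(w * t for w, t in zip(row, x)) for row in W]

def g(x, W, b):
    """Residual branch g(x) = relu(W x + b)."""
    return relu([s + bi for s, bi in zip(matvec(W, x), b)])

def residual_unit(x, W, b):
    """Forward pass y = x + g(x)."""
    return [xi + gi for xi, gi in zip(x, g(x, W, b))]

def invert_residual_unit(y, W, b, tol=1e-12, max_iter=1000):
    """Recover x from y by iterating x <- y - g(x), starting at x = y."""
    x = list(y)
    for _ in range(max_iter):
        x_new = [yi - gi for yi, gi in zip(y, g(x, W, b))]
        if max(abs(a - c) for a, c in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("fixed-point iteration did not converge")

# Illustrative weights: Frobenius norm is about 0.55, so the spectral norm is < 1.
W = [[0.3, -0.2], [0.1, 0.4]]
b = [0.05, -0.1]
x_true = [1.0, -2.0]

y = residual_unit(x_true, W, b)
x_rec = invert_residual_unit(y, W, b)
```

When the contraction condition fails, this simple iteration may diverge or oscillate; according to the abstract, the proposed algorithms are intended to handle some networks that existing inversion algorithms cannot invert.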

Citation: Ruhua Wang, Senjian An, Wanquan Liu, Ling Li. Fixed-point algorithms for inverse of residual rectifier neural networks. Mathematical Foundations of Computing, 2021, 4 (1) : 31-44. doi: 10.3934/mfc.2020024
References:

[1] S. An, F. Boussaid and M. Bennamoun, How can deep rectifier networks achieve linear separability and preserve distances?, International Conference on Machine Learning, 2015, 514–523.

[2] J. Behrmann, W. Grathwohl, R. T. Q. Chen, D. Duvenaud and J.-H. Jacobsen, Invertible residual networks, International Conference on Machine Learning, 2019, 573–582.

[3] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 1251–1258. doi: 10.1109/CVPR.2017.195.

[4] M. Cisse, P. Bojanowski, E. Grave, Y. Dauphin and N. Usunier, Parseval networks: Improving robustness to adversarial examples, in Proceedings of the 34th International Conference on Machine Learning-Volume 70, 2017, 854–863.

[5] L. Dinh, D. Krueger and Y. Bengio, NICE: Non-linear independent components estimation, preprint, arXiv: 1410.8516.

[6] L. Dinh, J. Sohl-Dickstein and S. Bengio, Density estimation using real NVP, preprint, arXiv: 1605.08803.

[7] A. Dosovitskiy and T. Brox, Inverting visual representations with convolutional networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, 4829–4837. doi: 10.1109/CVPR.2016.522.

[8] A. N. Gomez, M. Ren, R. Urtasun and R. B. Grosse, The reversible residual network: Backpropagation without storing activations, in Advances in Neural Information Processing Systems, 2017, 2214–2224.

[9] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2016.

[10] A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto and H. Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications, preprint, arXiv: 1704.04861.

[11] S. Ioffe and C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, preprint, arXiv: 1502.03167.

[12] J.-H. Jacobsen, A. Smeulders and E. Oyallon, $i$-RevNet: Deep invertible networks, preprint, arXiv: 1802.07088.

[13] D. P. Kingma and P. Dhariwal, Glow: Generative flow with invertible 1x1 convolutions, in Advances in Neural Information Processing Systems, 2018, 10215–10224.

[14] A. Mahendran and A. Vedaldi, Understanding deep image representations by inverting them, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, 5188–5196. doi: 10.1109/CVPR.2015.7299155.

[15] E. Oyallon, Building a regular decision boundary with deep networks, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, 5106–5114. doi: 10.1109/CVPR.2017.204.

[16] R. Prenger, R. Valle and B. Catanzaro, Waveglow: A flow-based generative network for speech synthesis, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, 3617–3621. doi: 10.1109/ICASSP.2019.8683143.

[17] A. Saberi, A. A. Stoorvogel and P. Sannuti, Inverse filtering and deconvolution, Internat. J. Robust Nonlinear Control, 11 (2001), 131–156. doi: 10.1002/rnc.553.

[18] R. Shwartz-Ziv and N. Tishby, Opening the black box of Deep Neural Networks via Information, preprint, arXiv: 1703.00810.

[19] T. F. van der Ouderaa and D. E. Worrall, Reversible GANs for memory-efficient image-to-image translation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, 4720–4728.

[20] J. Wang and L. Perez, The effectiveness of data augmentation in image classification using deep learning, preprint, arXiv: 1712.04621.

[21] M. D. Zeiler and R. Fergus, Visualizing and understanding convolutional networks, in Computer Vision–ECCV 2014, Lecture Notes in Computer Science, 8689, Springer, Cham, 2014, 818–833. doi: 10.1007/978-3-319-10590-1_53.


Figure 1. The proposed residual network architecture.
Figure 2. Inverse of the rectifier linear transform: percentage of invertible cases (out of 500) as a function of $ \gamma $ when the dimension of $ \mathbf{x} $ is 10.
Figure 3. Inverse of the residual unit with the fully-connected layer: percentage of invertible cases (out of 500) as a function of $ \gamma $ when the dimension of $ \mathbf{x} $ is 10.
Figure 4. Comparison of recovered images with the original digit images. The 1st row shows the original images; the 2nd and 3rd rows show the images recovered by the proposed fixed-point method and the existing fixed-point method, respectively.
Figure 5. Relative error rates (%) of the recovered images. One hundred samples per class (1000 samples in total) were chosen and recovered.