October 2021, 8(4): 495-520. doi: 10.3934/jcd.2021018

Classification with Runge-Kutta networks and feature space augmentation

1. Institut für Mathematik, Humboldt-Universität zu Berlin, Unter den Linden 6, 10099 Berlin, Germany

2. Martin-Luther-Universität Halle-Wittenberg, Theodor-Lieser-Str. 5, 06120 Halle, Germany

* Corresponding author: Axel Kröner

Received: April 2021. Revised: September 2021. Published: October 2021. Early access: November 2021.

Funding: The second author is supported by DAAD project 57570343.

In this paper we combine an approach based on Runge-Kutta networks considered in [Benning et al., J. Comput. Dyn., 6, 2019] with a technique for augmenting the input space introduced in [Dupont et al., NeurIPS, 2019] to obtain network architectures with improved numerical performance on point and image classification problems. The approach is illustrated with several examples implemented in PyTorch.
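To make the construction concrete, the following is a minimal PyTorch sketch — not the authors' implementation (see [6] for the reference code) — of a network whose layers each perform one classic RK4 step on a learnable map f(y) = tanh(Wy + b), combined with feature space augmentation by zero-padding. The class names RK4Block and RK4Net, the shared weights within a step, and the fixed final time T are illustrative assumptions.

import torch
import torch.nn as nn

class RK4Block(nn.Module):
    """One layer y_{k+1} = y_k + h/6 (K1 + 2 K2 + 2 K3 + K4) with f(y) = tanh(W y + b)."""
    def __init__(self, width, h):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.h = h

    def f(self, y):
        return torch.tanh(self.linear(y))

    def forward(self, y):
        k1 = self.f(y)
        k2 = self.f(y + 0.5 * self.h * k1)
        k3 = self.f(y + 0.5 * self.h * k2)
        k4 = self.f(y + self.h * k3)
        return y + self.h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

class RK4Net(nn.Module):
    """RK4 network of width `width` (the augmented dimension) and depth `depth`."""
    def __init__(self, in_dim, width, n_classes, depth, T=1.0):
        super().__init__()
        assert width >= in_dim, "the augmented width must be at least the input dimension"
        self.n_aug = width - in_dim                     # number of zero-padded feature coordinates
        h = T / depth                                   # step size of the underlying ODE discretization
        self.blocks = nn.ModuleList([RK4Block(width, h) for _ in range(depth)])
        self.classifier = nn.Linear(width, n_classes)   # affine output layer

    def forward(self, x):
        # feature space augmentation: append zeros so the flow has extra room to separate the classes
        y = torch.cat([x, x.new_zeros(x.shape[0], self.n_aug)], dim=1)
        for block in self.blocks:
            y = block(y)
        return self.classifier(y)

In the figure and table captions below, $\hat{d}$ corresponds to the width of such a network and $L$ to its depth, i.e. the number of RK steps.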

Citation: Elisa Giesecke, Axel Kröner. Classification with Runge-Kutta networks and feature space augmentation. Journal of Computational Dynamics, 2021, 8 (4) : 495-520. doi: 10.3934/jcd.2021018
References:
[1] M. Benning, E. Celledoni, M. J. Ehrhardt, B. Owren and C.-B. Schönlieb, Deep learning as optimal control problems: Models and numerical methods, J. Comput. Dyn., 6 (2019), 171-198. doi: 10.3934/jcd.2019009.
[2] E. Celledoni, M. J. Ehrhardt, C. Etmann, R. I. McLachlan, B. Owren, C.-B. Schönlieb and F. Sherry, Structure-preserving deep learning, European J. Appl. Math., 32 (2021), 888-936. doi: 10.1017/S0956792521000139.
[3] R. T. Q. Chen, Y. Rubanova, J. Bettencourt and D. K. Duvenaud, Neural ordinary differential equations, Advances in Neural Information Processing Systems, 31, Curran Associates, Inc., 2018.
[4] E. Dupont, A. Doucet and Y. W. Teh, Augmented neural ODEs, Adv. Neural Inf. Process. Syst., 32 (2019).
[5] W. E, A proposal on machine learning via dynamical systems, Commun. Math. Stat., 5 (2017), 1-11. doi: 10.1007/s40304-017-0103-z.
[6] E. Giesecke, Augmented-RK-Nets, 2021. Available from: https://github.com/ElisaGiesecke/augmented-RK-Nets.
[7] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2016.
[8] W. W. Hager, Runge-Kutta methods in optimal control and the transformed adjoint system, Numer. Math., 87 (2000), 247-282. doi: 10.1007/s002110000178.
[9] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016. doi: 10.1109/CVPR.2016.90.
[10] C. F. Higham and D. J. Higham, Deep learning: An introduction for applied mathematicians, SIAM Rev., 61 (2019), 860-891. doi: 10.1137/18M1165748.
[11] M. Raissi, P. Perdikaris and G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, J. Comput. Phys., 378 (2019), 686-707. doi: 10.1016/j.jcp.2018.10.045.
[12] D. Ruiz-Balet and E. Zuazua, Neural ODE control for classification, approximation and transport, preprint, arXiv:2104.05278.
[13] J. M. Sanz-Serna, Symplectic Runge-Kutta and related methods: Recent results, Phys. D, 60 (1992), 293-302. doi: 10.1016/0167-2789(92)90245-I.
[14] J. M. Sanz-Serna, Symplectic Runge-Kutta schemes for adjoint equations, automatic differentiation, optimal control, and more, SIAM Rev., 58 (2016), 3-33. doi: 10.1137/151002769.

Figure 1.  Butcher tableaux: (from left to right) general form, forward Euler and classic RK4.
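For reference, a standard LaTeX rendering of the three tableaux named in the caption, with stage coefficients $a_{ij}$, weights $b_i$ and nodes $c_i$; the forward Euler and classic RK4 coefficients are textbook values and are not taken from the figure itself.

\[
\begin{array}{c|ccc}
c_1    & a_{11} & \cdots & a_{1s}\\
\vdots & \vdots &        & \vdots\\
c_s    & a_{s1} & \cdots & a_{ss}\\ \hline
       & b_1    & \cdots & b_s
\end{array}
\qquad
\begin{array}{c|c}
0 & 0\\ \hline
  & 1
\end{array}
\qquad
\begin{array}{c|cccc}
0            &              &              &              & \\
\tfrac{1}{2} & \tfrac{1}{2} &              &              & \\
\tfrac{1}{2} & 0            & \tfrac{1}{2} &              & \\
1            & 0            & 0            & 1            & \\ \hline
             & \tfrac{1}{6} & \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{6}
\end{array}
\]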
Figure 2.  Two dimensional datasets for binary point classification with 1500 samples each: donut 1D and donut 2D (top), squares and spiral (bottom).
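The exact construction of these datasets is available in the repository [6]; purely as an illustration of the kind of data, the following sketch generates a generic two-class donut (inner disk versus surrounding ring). The radii, gap width and labels are assumptions, not the values used in the paper.

import numpy as np

def make_donut(n_samples=1500, r_inner=1.0, r_outer=2.0, seed=0):
    rng = np.random.default_rng(seed)
    n0 = n_samples // 2                # class 0: inner disk
    n1 = n_samples - n0                # class 1: surrounding ring
    theta = rng.uniform(0.0, 2.0 * np.pi, n_samples)
    radius = np.concatenate([
        r_inner * np.sqrt(rng.uniform(0.0, 1.0, n0)),   # uniformly distributed in the disk
        rng.uniform(r_inner + 0.2, r_outer, n1),        # uniformly distributed in the ring
    ])
    X = np.stack([radius * np.cos(theta), radius * np.sin(theta)], axis=1)
    y = np.concatenate([np.zeros(n0, dtype=int), np.ones(n1, dtype=int)])
    return X, y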
Figure 3.  Classification of donut_2D with RK4Net of width $ \hat{d} = 2 $, corresponding to the NODE approach (top), and $ \hat{d} = 3 $, i.e. with space augmentation characterizing the ANODE approach (bottom), both of the same depth $ L = 100 $ and with $ \tanh $ activation. The plots show (from left to right) the trajectories of the features starting at the small dot and terminating at the large dot, their final transformation in the output layer, and the resulting prediction with a coloured background according to the network's classification.
Figure 4.  Classification of squares with RK4Net of width $ \hat{d} = 2 $, corresponding to the NODE approach (top), and $ \hat{d} = 3 $, i.e. with space augmentation characterizing the ANODE approach (bottom), both of the same depth $ L = 100 $ and with $ \tanh $ activation. The plots show (from left to right) the trajectories of the features starting at the small dot and terminating at the large dot, their final transformation in the output layer, and the resulting prediction with a coloured background according to the network's classification.
Figure 5.  Feature transformation of spiral with StandardNet (top) and RK4Net (bottom) of width $ \hat{d} = 16 $, depth $ L = 20 $ and $ \tanh $ activation. (From left to right) features in input layer, hidden layers and output layer.
Figure 6.  Prediction of donut_1D with RK4Net of width $ \hat{d} = 16 $, depth $ L = 20 $ and $ \tanh $ activation.
Figure 7.  Accuracy (left) and cost (right) over the course of epochs on donut_1D with RK4Net of width $ \hat{d} = 16 $ and depth $ L = 20 $. Solid lines represent metrics on validation and dotted lines on training data.
Figure 8.  Donut and squares datasets of different dimensionality and with varying numbers of classes, used to compare the networks' performance on binary versus multiclass classification (first column) and on 2D versus 3D input spaces (second and third column).
Figure 9.  Repetitions with random initializations for RK4Net with width $ \hat{d} = 16 $, depth $ L = 100 $ and $ \tanh $ activation, on donut 2D & 6C. The plots show (upper row) the feature transformation in the output layer reduced by PCA to 3D, and (lower row) the resulting prediction underlaid with a coloured background according to the network's classification.
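The dimensionality reduction mentioned here and in Figures 10, 11 and 16 can be reproduced with any standard PCA routine; a minimal sketch assuming scikit-learn, with random placeholder features of width $ \hat{d} = 16 $ standing in for the network's output-layer features:

import numpy as np
from sklearn.decomposition import PCA

output_features = np.random.randn(1500, 16)                        # placeholder for the output-layer features
features_3d = PCA(n_components=3).fit_transform(output_features)   # project onto the first three principal components
print(features_3d.shape)                                           # (1500, 3)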
Figure 10.  Classification of donut 2D & 6C with network width $ \hat{d} = 16 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation. The plots show (from left to right) the feature transformation in the output layer reduced by PCA to 3D and 2D, and the resulting prediction. Two dimensional plots are underlaid with a coloured background according to the network's classification.
Figure 11.  Classification of squares 2D & 4C with network width $ \hat{d} = 16 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation. The plots show (from left to right) the feature transformation in the output layer reduced by PCA to 3D and 2D, and the resulting prediction. Two dimensional plots are underlaid with a coloured background according to the network's classification.
Figure 12.  Validation accuracy (left) and cost (right) over the course of epochs on donut 3D & 6C with network width $ \hat{d} = 16 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation. Solid line represents the mean and shaded area the standard deviation over repetitions.
Figure 13.  Validation accuracy (left) and cost (right) over the course of epochs on squares 3D & 4C with network width $ \hat{d} = 16 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation. Solid line represents the mean and shaded area the standard deviation over repetitions.
Figure 14.  Example images from MNIST with the true label and the prediction produced by RK4Net with width $ \hat{d} = 30^2 $, depth $ L = 100 $ and $ \tanh $ activation.
Figure 15.  Example images from Fashion-MNIST with the true label and the prediction produced by RK4Net with width $ \hat{d} = 30^2 $, depth $ L = 100 $ and $ \tanh $ activation.
Figure 16.  Feature transformation in the output layer of StandardNet (left) and RK4Net (right) of Fashion-MNIST images reduced by PCA to 3D. Each color represents one article class.
Figure 17.  Accuracy (left) and cost (right) over the course of epochs on MNIST with network width $ \hat{d} = 30^2 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation. Solid lines represent metrics on validation and dotted lines on training data.
Figure 18.  Accuracy (left) and cost (right) over the course of epochs on Fashion-MNIST with network width $ \hat{d} = 30^2 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation. Solid lines represent metrics on validation and dotted lines on training data.
Table 1.  Mean of training (upper row) and validation (lower row) accuracy (%) over four repetitions on spiral with network width $ \hat{d} = 16 $ and $ \tanh $ activation.
depth L        1       3       5       10      20      40      100
StandardNet    92.73   92.87   98.12   97.52   67.62   51.08   50.67
               91.88   92.50   98.10   97.45   66.87   48.92   49.33
RK4Net         75.60   91.42   97.90   99.77   99.93   99.73   99.95
               75.12   90.68   97.33   99.47   99.70   99.50   99.75
Table 2.  Mean of training (upper row) and validation (lower row) cost ($ \times 10^{-1} $) over four repetitions on spiral with network width $ \hat{d} = 16 $ and $ \tanh $ activation.
depth L        1      3      5      10     20     40     100
StandardNet    2.23   1.38   0.66   0.77   6.09   6.93   6.93
               2.33   1.53   0.67   0.77   6.13   6.94   6.93
RK4Net         4.32   2.68   0.98   0.16   0.04   0.10   0.01
               4.39   2.69   1.06   0.28   0.13   0.12   0.12
Table 3.  Variability of accuracy (%) and cost ($ \times 10^{-1} $) over four repetitions for RK4Net with width $ \hat{d} = 16 $, depth $ L = 100 $ and $ \tanh $ activation, on donut 2D & 6C.
                      training accuracy   validation accuracy   training cost   validation cost
mean                  77.13               74.92                 5.13            5.59
standard deviation    0.76                0.89                  0.08            0.16
Table 4.  Mean of validation accuracy (%, upper row) and cost ($ \times 10^{-1} $, lower row) over four repetitions with network width $ \hat{d} = 16 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation.
              donut     donut     donut     donut     squares   squares
              3D & 2C   3D & 3C   2D & 6C   3D & 6C   2D & 4C   3D & 4C
StandardNet   92.37     87.75     75.12     73.00     94.12     89.68
              1.71      2.85      5.60      5.86      1.57      3.03
EulerNet      91.88     88.30     74.87     74.63     93.35     89.48
              1.84      2.75      5.56      5.67      1.66      2.71
RK4Net        92.73     87.13     74.92     74.88     93.20     89.37
              1.72      2.95      5.59      5.73      1.64      2.81
Table 5.  Mean and standard deviation of accuracy (%) and cost ($ \times 10^{-1} $) over four repetitions for non-augmented (upper row) and augmented (lower row) RK4Net with depth $ L = 100 $ and $ \tanh $ activation on MNIST.
width $\hat{d}$   training accuracy   validation accuracy   training cost     validation cost
$28^2$            $97.70 \pm 2.80$    $87.27 \pm 2.93$      $0.78 \pm 0.95$   $7.71 \pm 1.62$
$30^2$            $99.77 \pm 0.40$    $90.40 \pm 1.08$      $0.10 \pm 0.17$   $5.36 \pm 0.46$
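The widths in Table 5 correspond to the non-augmented, flattened $28 \times 28$ images (784 features) and to an augmented feature space of $30^2 = 900$ coordinates. A minimal sketch of such an augmentation, assuming the extra coordinates are simply appended as zeros (how the zeros are arranged is an implementation choice):

import torch
from torchvision import datasets, transforms

to_flat = transforms.Compose([transforms.ToTensor(),
                              transforms.Lambda(lambda t: t.view(-1))])     # flatten to 28*28 = 784 features
mnist = datasets.MNIST(root="data", train=True, download=True, transform=to_flat)

x, label = mnist[0]
x_aug = torch.cat([x, x.new_zeros(30 * 30 - x.numel())])   # zero-pad to 30^2 = 900 features
print(x_aug.shape)                                          # torch.Size([900])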
Table 6.  Mean and standard deviation of validation accuracy (%, upper row) and cost ($ \times 10^{-1} $, lower row) over four repetitions with network width $ \hat{d} = 30^2 $, depth $ L = 5 $ for StandardNet and $ L = 100 $ for EulerNet and RK4Net, and $ \tanh $ activation.
              MNIST              Fashion-MNIST
StandardNet   $85.67 \pm 0.78$   $61.23 \pm 6.00$
              $8.95 \pm 0.92$    $11.41 \pm 1.37$
EulerNet      $90.98 \pm 0.48$   $77.62 \pm 2.57$
              $5.71 \pm 0.36$    $9.52 \pm 1.87$
RK4Net        $90.40 \pm 1.08$   $79.13 \pm 1.57$
              $5.36 \pm 0.46$    $8.24 \pm 0.60$
