Foundations of Data Science, September 2020, 2(3): 309-332. doi: 10.3934/fods.2020014

Adaptive random Fourier features with Metropolis sampling

Aku Kammonen, Jonas Kiessling, Petr Plecháč, Mattias Sandberg and Anders Szepessy

1. Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden

2. H-Ai AB, Grevgatan 23, 114 53 Stockholm, Sweden

3. Department of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA

* Corresponding author: Aku Kammonen

Received: July 2020. Revised: August 2020. Published: September 2020.

Fund Project: The research was supported by Swedish Research Council grant 2019-03725. The work of P. P. was supported in part by ARO grant W911NF-19-1-0243.

The supervised learning problem of determining a neural network approximation $ \mathbb{R}^d\ni x\mapsto\sum_{k = 1}^K\hat\beta_k e^{{{\mathrm{i}}}\omega_k\cdot x} $ with one hidden layer is studied as a random Fourier features algorithm. The Fourier features, i.e., the frequencies $ \omega_k\in\mathbb{R}^d $, are sampled using an adaptive Metropolis sampler. The Metropolis test accepts a proposal frequency $ \omega_k' $, with corresponding amplitude $ \hat\beta_k' $, with probability $ \min\big\{1, (|\hat\beta_k'|/|\hat\beta_k|)^{\gamma}\big\} $, for a positive parameter $ {\gamma} $ determined by minimizing the approximation error for a given computational work. This adaptive, non-parametric stochastic method leads asymptotically, as $ K\to\infty $, to equidistributed amplitudes $ |\hat\beta_k| $, analogous to deterministic adaptive algorithms for differential equations. The equidistributed amplitudes are shown to correspond asymptotically to the optimal density for independent samples in random Fourier features methods. Numerical evidence is provided to demonstrate the approximation properties and efficiency of the proposed algorithm, which is tested on both synthetic data and a real-world high-dimensional benchmark.
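To make the sampling procedure concrete, the following is a minimal sketch in Python/NumPy of an adaptive Metropolis random Fourier features loop of the kind described above. It is not the authors' reference implementation: the regularized least-squares fit of the amplitudes, the refit after each accept/reject sweep, and the helper names (`fit_amplitudes`, `adaptive_rff`) are assumptions introduced for illustration; the defaults $\delta = 2.4^2/d$ and $\gamma = 3d-2$ are taken from the parameter choices summarized in Table 2 below.

```python
import numpy as np

def fit_amplitudes(omega, x, y, lam):
    """Least-squares amplitudes beta_hat for fixed frequencies omega, with
    Tikhonov regularization lam (an assumption consistent with the penalty
    parameter lambda listed in the experiment table)."""
    K = omega.shape[0]
    S = np.exp(1j * x @ omega.T)                      # N x K features e^{i w_k . x_n}
    A = S.conj().T @ S / x.shape[0] + lam * np.eye(K)
    b = S.conj().T @ y / x.shape[0]
    return np.linalg.solve(A, b)

def adaptive_rff(x, y, K=64, M=1000, delta=None, gamma=None, lam=0.1, rng=None):
    """Sketch of adaptive random Fourier features: propose w_k' = w_k + delta*N(0,I)
    and accept componentwise with probability min(1, (|beta_k'|/|beta_k|)**gamma)."""
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[1]
    delta = 2.4**2 / d if delta is None else delta    # random-walk step size
    gamma = 3 * d - 2 if gamma is None else gamma     # exponent used in the experiments
    omega = rng.standard_normal((K, d))               # initial frequencies
    beta = fit_amplitudes(omega, x, y, lam)
    for _ in range(M):
        omega_new = omega + delta * rng.standard_normal((K, d))   # proposals
        beta_new = fit_amplitudes(omega_new, x, y, lam)
        # componentwise Metropolis test on the amplitude ratio
        accept = rng.random(K) < np.minimum(1.0, (np.abs(beta_new) / np.abs(beta)) ** gamma)
        omega[accept] = omega_new[accept]
        beta = fit_amplitudes(omega, x, y, lam)       # refit after the accept/reject step
    return omega, beta

# usage on simple synthetic one-dimensional data
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
y = np.exp(-x[:, 0] ** 2 / 2)
omega, beta = adaptive_rff(x, y, K=32, M=200, rng=rng)
y_hat = np.real(np.exp(1j * x @ omega.T) @ beta)
print("training RMSE:", np.sqrt(np.mean((y_hat - y) ** 2)))
```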

Citation: Aku Kammonen, Jonas Kiessling, Petr Plecháč, Mattias Sandberg, Anders Szepessy. Adaptive random Fourier features with Metropolis sampling. Foundations of Data Science, 2020, 2 (3) : 309-332. doi: 10.3934/fods.2020014

Figure 1.  Case 1: Graph of the target function $ f $ with the sampled data points $ (x_n,y_n) $ marked (red online). The inset shows the modulus $ |\hat f| $ of its Fourier transform and a detail of its behaviour near the origin
Figure 2.  Case 1: The same data are shown in all plots, with each of the six experiments highlighted in turn (blue online)
Figure 3.  Case 2: The figure illustrates the generalization error with respect to $ K $ for a target function in dimension $ d = 5 $
Figure 4.  Case 3: The generalization error as a function of computation time when the approximation is computed using Algorithm 1, Algorithm 2 and stochastic gradient descent
Figure 5.  Case 4: Dependence on $ K $ of the misclassification percentage on the MNIST test data set
Figure 6.  The generalization error as a function of the standard deviation $ \sigma_\omega $ of $ p({\omega}) $
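The fixed-distribution baseline behind Figure 6, and behind the RFF columns of the tables below, draws the frequencies once from $\mathcal{N}(0,\sigma_\omega^2 I)$ and never adapts them. A brief sketch of such a baseline and of a scan over $\sigma_\omega$ follows; the regularized least-squares fit and the synthetic Gaussian target used here are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

def rff_fixed_error(x_tr, y_tr, x_te, y_te, K, sigma_w, lam=0.1, rng=None):
    """Fixed-distribution RFF baseline: frequencies drawn once from
    N(0, sigma_w^2 I) and kept fixed; amplitudes from a regularized
    least-squares fit (an assumption consistent with the penalty lambda
    in the experiment tables). Returns the test (generalization) error."""
    rng = np.random.default_rng() if rng is None else rng
    d = x_tr.shape[1]
    omega = sigma_w * rng.standard_normal((K, d))       # frequencies are not adapted
    S = np.exp(1j * x_tr @ omega.T)                     # N x K feature matrix
    beta = np.linalg.solve(S.conj().T @ S / len(x_tr) + lam * np.eye(K),
                           S.conj().T @ y_tr / len(x_tr))
    y_hat = np.real(np.exp(1j * x_te @ omega.T) @ beta)
    return np.sqrt(np.mean((y_hat - y_te) ** 2))

# scan the standard deviation sigma_w, as in Figure 6, on a simple synthetic target
rng = np.random.default_rng(1)
x_tr, x_te = rng.standard_normal((4000, 1)), rng.standard_normal((1000, 1))
y_tr, y_te = np.exp(-x_tr[:, 0] ** 2 / 2), np.exp(-x_te[:, 0] ** 2 / 2)
for sigma_w in [0.1, 0.5, 1.0, 2.0, 4.0]:
    err = rff_fixed_error(x_tr, y_tr, x_te, y_te, K=64, sigma_w=sigma_w, rng=rng)
    print(f"sigma_w = {sigma_w:4.1f}  test RMSE = {err:.3e}")
```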
Table 2.  Summary of numerical experiments together with their corresponding parameter choices
Regression cases:

| | Case 1: a regularised step function | Case 2: a high-dimensional function |
| --- | --- | --- |
| Purpose | Find $p$ such that the constant $\mathbb{E}_\omega\big[\lvert\hat f(\omega)\rvert^2/\big((2\pi)^{d}p^2(\omega)\big)\big]$ does not become large | Study the ability of Alg. 1 to approximate a high-dimensional function |
| Target $f(x)$ | $\text{Si}\left(\frac{x}{a}\right)e^{-\frac{x^2}{2}}$, $a = 10^{-3}$ | $\text{Si}\left(\frac{x_1}{a}\right)e^{-\frac{\lvert x\rvert^2}{2}}$, $a = 10^{-1}$ |
| $d$ | $1$ | $5$ |
| $K$ | $2^i, \, i = 1, 2, \ldots, 11$ | $2^i, \, i = 1, 2, \ldots, 10$ |

| | Exp. 1 (Case 1) | Exp. 2 (Case 1) | Exp. 3 (Case 1) | Exp. 4 (Case 1) | Exp. 5 (Case 1) | Exp. 1 (Case 2) | Exp. 4 (Case 2) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Method | Alg. 1 | Alg. 2 | RFF$^{1}$ | SGM | Alg. 1 | Alg. 1 | SGM |
| $N$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ |
| $\tilde{N}$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ |
| $\gamma$ | $3d-2$ | $3d-2$ | – | – | $3d-2$ | $3d-2$ | – |
| $\lambda$ | $0.1$ | $0.1$ | $0.1$ | $0$ | $0.1$ | $0.1$ | $0$ |
| $M$ | $10^3$ | $5000$ | N/A | $10^7$ | $10^4$ | $2.5\times 10^3$ | $10^7$ |
| $\bar{M}$ | $10$ | $10$ | $10$ | $10$ | $10$ | $10$ | $10$ |
| $\delta$ | $2.4^2/d$ | $0.1$ | – | – | $2.4^2/d$ | $\frac{2.4^2}{10d}$ | – |
| $\Delta t$ | – | – | – | $\mathtt{1.5e-4}$ | – | – | $\mathtt{3.0e-4}$ |
| $t_0$ | – | $M/10$ | – | – | – | – | – |
| $\omega_{\text{max}}$ | – | $\infty$ | – | – | – | – | – |
| $m$ | $10$ | $50$ | – | – | $100$ | $25$ | – |

Regression and classification cases:

| | Case 3: anisotropic Gaussian function (regression) | Case 4: the MNIST data set (classification) |
| --- | --- | --- |
| Purpose | Computational complexity comparison | Study the ability of Alg. 1 to work with non-synthetic data |
| Target $f(x)$ | $e^{-(32 x_1)^2/2}e^{-(32^{-1} x_2)^2/2}$ | – |
| $d$ | $2$ | $784$ |
| $K$ | $256$ | $2^i, \, i = 1, 2, \ldots, 13$ |

| | Exp. 1 (Case 3) | Exp. 2 (Case 3) | Exp. 4 (Case 3) | Exp. 1 (Case 4) | Exp. 3 (Case 4) |
| --- | --- | --- | --- | --- | --- |
| Method | Alg. 1 | Alg. 2 | SGM | Alg. 1 | RFF$^{2}$ |
| $N$ | $10^4$ | $10^4$ | $3\times 10^7$ | $6\times 10^4$ | $6\times 10^4$ |
| $\tilde{N}$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ | $10^4$ |
| $\gamma$ | $3d-2$ | $3d-2$ | – | $3d-2$ | – |
| $\lambda$ | $0.1$ | $0.1$ | $0$ | $0.1$ | $0.1$ |
| $M$ | $10^4$ | $10^4$ | $3\times 10^7$ | $10^2$ | – |
| $\bar{M}$ | $1$ | $1$ | $1$ | $1$ | $1$ |
| $\delta$ | $0.5$ | $0.1$ | – | $0.1$ | – |
| $\Delta t$ | – | – | $\mathtt{1.5e-3}$ | – | – |
| $t_0$ | – | $M/10$ | – | – | – |
| $\omega_{\text{max}}$ | – | $\infty$ | – | – | – |
| $m$ | $100$ | $100$ | – | $M+1$ | – |

$ ^1 $ – Random Fourier features with frequencies sampled from the fixed distribution $ \mathcal{N}(0,1) $
$ ^2 $ – Random Fourier features with frequencies sampled from the fixed distribution $ \mathcal{N}(0,1) $ or $ \mathcal{N}(0,0.1^2) $
Table 1.  Case 4: Percentage of misclassified digits on the MNIST test data set for different values of $ K $, comparing the adaptively computed distribution of the frequencies $ \omega_k $ with sampling from a fixed normal distribution
| $K$ | Fixed, $\omega\sim\mathcal{N}(0, 1)$ | Fixed, $\omega\sim\mathcal{N}(0, 0.1^2)$ | Adaptive |
| --- | --- | --- | --- |
| 2 | 89.97% | 81.65% | 80.03% |
| 4 | 89.2% | 70.65% | 65.25% |
| 8 | 88.98% | 55.35% | 53.44% |
| 16 | 88.69% | 46.77% | 36.42% |
| 32 | 88.9% | 30.43% | 23.52% |
| 64 | 88.59% | 19.73% | 16.98% |
| 128 | 88.7% | 13.6% | 11.13% |
| 256 | 88.09% | 10.12% | 7.99% |
| 512 | 88.01% | 8.16% | 5.93% |
| 1024 | 87.34% | 6.29% | 4.57% |
| 2048 | 86.5% | 4.94% | 3.5% |
| 4096 | 85.21% | 3.76% | 2.74% |
| 8192 | 83.98% | 3.16% | 1.98% |
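Table 1 reports misclassification percentages for frequencies sampled from a fixed normal distribution and for adaptively computed frequencies. One plausible way to obtain numbers of this kind for the fixed-distribution columns, sketched under the assumption of a regularized least-squares fit against one-hot class labels (the classification head is not spelled out in this summary), is:

```python
import numpy as np

def rff_classifier_error(x_tr, labels_tr, x_te, labels_te, K, sigma_w,
                         lam=0.1, n_classes=10, rng=None):
    """Hypothetical classification head for fixed-distribution random Fourier
    features: regularized least squares against one-hot labels, predict the
    class with the largest real score, report the misclassification rate."""
    rng = np.random.default_rng() if rng is None else rng
    d = x_tr.shape[1]
    omega = sigma_w * rng.standard_normal((K, d))        # fixed frequencies
    Y = np.eye(n_classes)[labels_tr]                     # one-hot targets, N x 10
    S = np.exp(1j * x_tr @ omega.T)                      # N x K feature matrix
    B = np.linalg.solve(S.conj().T @ S / len(x_tr) + lam * np.eye(K),
                        S.conj().T @ Y / len(x_tr))      # K x 10 amplitudes
    scores = np.real(np.exp(1j * x_te @ omega.T) @ B)
    pred = scores.argmax(axis=1)
    return 100.0 * np.mean(pred != labels_te)            # misclassified percentage

# stand-in arrays with MNIST-like shapes (d = 784, labels 0..9); substitute the
# actual MNIST training and test sets to produce numbers comparable to Table 1
rng = np.random.default_rng(2)
x_tr, x_te = rng.random((6000, 784)), rng.random((1000, 784))
y_tr, y_te = rng.integers(0, 10, 6000), rng.integers(0, 10, 1000)
print(rff_classifier_error(x_tr, y_tr, x_te, y_te, K=256, sigma_w=0.1, rng=rng))
```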