# American Institute of Mathematical Sciences

September  2020, 2(3): 309-332. doi: 10.3934/fods.2020014

## Adaptive random Fourier features with Metropolis sampling

 1 Department of Mathematics, Royal Institute of Technology, 100 44, Stockholm, Sweden 2 H-Ai AB, Grevgatan 23,114 53, Stockholm, Sweden 3 Department of Mathematical Sciences, University of Delaware, Newark, DE 19716, USA

* Corresponding author: Aku Kammonen

Received  July 2020 Revised  August 2020 Published  September 2020

Fund Project: The research was supported by Swedish Research Council grant 2019-03725. The work of P. P. was supported in part by the ARO Grant W911NF-19-1-0243

The supervised learning problem to determine a neural network approximation $\mathbb{R}^d\ni x\mapsto\sum_{k = 1}^K\hat\beta_k e^{{{\mathrm{i}}}\omega_k\cdot x}$ with one hidden layer is studied as a random Fourier features algorithm. The Fourier features, i.e., the frequencies $\omega_k\in\mathbb{R}^d$, are sampled using an adaptive Metropolis sampler. The Metropolis test accepts proposal frequencies $\omega_k'$, having corresponding amplitudes $\hat\beta_k'$, with the probability $\min\big\{1, (|\hat\beta_k'|/|\hat\beta_k|)^{\gamma}\big\}$, for a certain positive parameter ${\gamma}$, determined by minimizing the approximation error for given computational work. This adaptive, non-parametric stochastic method leads asymptotically, as $K\to\infty$, to equidistributed amplitudes $|\hat\beta_k|$, analogous to deterministic adaptive algorithms for differential equations. The equidistributed amplitudes are shown to asymptotically correspond to the optimal density for independent samples in random Fourier features methods. Numerical evidence is provided in order to demonstrate the approximation properties and efficiency of the proposed algorithm. The algorithm is tested both on synthetic data and a real-world high-dimensional benchmark.

Citation: Aku Kammonen, Jonas Kiessling, Petr Plecháč, Mattias Sandberg, Anders Szepessy. Adaptive random Fourier features with Metropolis sampling. Foundations of Data Science, 2020, 2 (3) : 309-332. doi: 10.3934/fods.2020014
##### References:

show all references

##### References:
Case 1: Graph of the target function $f$ with sampled data set points $(x_n,y_n)$ marked (red on-line). The inset shows $|\hat f|$ of its Fourier transform and the detail of its behaviour at the origin
Case 1: The same data are shown in all figures with each of the six different experiments highlighted (blue on-line)
Case 2: The figure illustrates the generalization error with respect to $K$ for a target function in dimension $d = 5$
Case 3: The figure illustrates the generalization error over time when the approximate problem solution is computed using Algorithm 1, Algorithm 2 and stochastic gradient descent
Case 4: Dependence on $K$ of the misclassification percentage in the MNIST
The generalization error as a function of the standard deviation $\sigma_\omega$ of $p({\omega})$
Summary of numerical experiments together with their corresponding parameter choices
 Regression Case Case 1: a regularised step function Case 2: a high dimensional function Purpose Find $p$ such that the constant $\mathbb{E}_\omega[\frac{|\hat f(\omega)|^2}{(2\pi)^{d}p^2(\omega)}]$ does not become large Study the ability of Alg. 1 to find a high dimensional dimensional function Target $f(x)$ $\text{Si}\left(\frac{x}{a}\right)e^{-\frac{x^2}{2}}$ $a = 10^{-3}$ $\text{Si}\left(\frac{x_1}{a}\right)e^{-\frac{|x|^2}{2}}$ $a = 10^{-1}$ $d$ $1$ $5$ $K$ $2^i, \, i = 1, 2, ..., 11$ $2^i, \, i = 1, 2, ..., 10$ Experiment Exp. 1 Exp. 2 Exp. 3 Exp. 4 Exp. 5 Exp. 1 Exp. 4 Method Alg. 1 Alg. 2 RFF$^{1}$ SGM Alg. 1 tabular Alg. 1 SGM $N$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $\tilde{N}$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $γ$ $3d-2$ $3d-2$ $3d-2$ $3d-2$ $\lambda$ $0.1$ $0.1$ $0.1$ $0$ $0.1$ $0.1$ 0 $M$ $10^3$ $5000$ N/A $10^7$ $10^4$ $2.5\times 10^3$ $10^7$ $\bar{M}$ $10$ $10$ $10$ $10$ $10$ $10$ $10$ $\delta$ $2.4^2/d$ $0.1$ $2.4^2/d$ $\frac{2.4^2}{10d}$ $\Delta t$ $\mathtt{1.5e-4}$ $\mathtt{3.0e-4}$ $t_0$ $M/10$ $ω_{\text{max}}$ $\infty$ $m$ $10$ $50$ $100$ $25$ Regression Classification Case Case 3: anisotropic Gaussian function Case 4: The MNIST data set Purpose Computational complexity comparison Study the ability of Alg. 1 to work with non synthetic data Target $f(x)$ $e^{-(32 x_1)^2/2}e^{-(32^{-1} x_2)^2/2}$ $d$ $2$ $784$ $K$ $256$ $2^i, \, i = 1, 2, ..., 13$ Experiment Exp. 1 Exp. 2 Exp. 4 Exp. 1 Exp. 3 Method Alg. 1 Alg. 2 SGM Alg. 1 RFF$^{2}$ $N$ $10^4$ $10^4$ $3\times 10^7$ $6\times 10^4$ $6 \times 10^4$ $\tilde{N}$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $γ$ $3d-2$ $3d-2$ $3d-2$ $\lambda$ $0.1$ $0.1$ $0$ $0.1$ $0.1$ $M$ $10^4$ $10^4$ $3\times 10^7$ $10^2$ $\bar{M}$ $1$ $1$ $1$ $1$ $1$ $\delta$ $0.5$ $0.1$ $0.1$ $\Delta t$ $\mathtt{1.5e-3}$ $t_0$ $M/10$ $ω_{\text{max}}$ $\infty$ $m$ $100$ $100$ $M+1$ $^1$ – Random Fourier features with frequencies sampled from the fixed distribution $\mathcal{N}(0,1)$ $^2$ – Random Fourier features with frequencies sampled from the fixed distribution $\mathcal{N}(0,1)$, or $\mathcal{N}(0,0.1^2)$
 Regression Case Case 1: a regularised step function Case 2: a high dimensional function Purpose Find $p$ such that the constant $\mathbb{E}_\omega[\frac{|\hat f(\omega)|^2}{(2\pi)^{d}p^2(\omega)}]$ does not become large Study the ability of Alg. 1 to find a high dimensional dimensional function Target $f(x)$ $\text{Si}\left(\frac{x}{a}\right)e^{-\frac{x^2}{2}}$ $a = 10^{-3}$ $\text{Si}\left(\frac{x_1}{a}\right)e^{-\frac{|x|^2}{2}}$ $a = 10^{-1}$ $d$ $1$ $5$ $K$ $2^i, \, i = 1, 2, ..., 11$ $2^i, \, i = 1, 2, ..., 10$ Experiment Exp. 1 Exp. 2 Exp. 3 Exp. 4 Exp. 5 Exp. 1 Exp. 4 Method Alg. 1 Alg. 2 RFF$^{1}$ SGM Alg. 1 tabular Alg. 1 SGM $N$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $\tilde{N}$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $γ$ $3d-2$ $3d-2$ $3d-2$ $3d-2$ $\lambda$ $0.1$ $0.1$ $0.1$ $0$ $0.1$ $0.1$ 0 $M$ $10^3$ $5000$ N/A $10^7$ $10^4$ $2.5\times 10^3$ $10^7$ $\bar{M}$ $10$ $10$ $10$ $10$ $10$ $10$ $10$ $\delta$ $2.4^2/d$ $0.1$ $2.4^2/d$ $\frac{2.4^2}{10d}$ $\Delta t$ $\mathtt{1.5e-4}$ $\mathtt{3.0e-4}$ $t_0$ $M/10$ $ω_{\text{max}}$ $\infty$ $m$ $10$ $50$ $100$ $25$ Regression Classification Case Case 3: anisotropic Gaussian function Case 4: The MNIST data set Purpose Computational complexity comparison Study the ability of Alg. 1 to work with non synthetic data Target $f(x)$ $e^{-(32 x_1)^2/2}e^{-(32^{-1} x_2)^2/2}$ $d$ $2$ $784$ $K$ $256$ $2^i, \, i = 1, 2, ..., 13$ Experiment Exp. 1 Exp. 2 Exp. 4 Exp. 1 Exp. 3 Method Alg. 1 Alg. 2 SGM Alg. 1 RFF$^{2}$ $N$ $10^4$ $10^4$ $3\times 10^7$ $6\times 10^4$ $6 \times 10^4$ $\tilde{N}$ $10^4$ $10^4$ $10^4$ $10^4$ $10^4$ $γ$ $3d-2$ $3d-2$ $3d-2$ $\lambda$ $0.1$ $0.1$ $0$ $0.1$ $0.1$ $M$ $10^4$ $10^4$ $3\times 10^7$ $10^2$ $\bar{M}$ $1$ $1$ $1$ $1$ $1$ $\delta$ $0.5$ $0.1$ $0.1$ $\Delta t$ $\mathtt{1.5e-3}$ $t_0$ $M/10$ $ω_{\text{max}}$ $\infty$ $m$ $100$ $100$ $M+1$ $^1$ – Random Fourier features with frequencies sampled from the fixed distribution $\mathcal{N}(0,1)$ $^2$ – Random Fourier features with frequencies sampled from the fixed distribution $\mathcal{N}(0,1)$, or $\mathcal{N}(0,0.1^2)$
Case 4: The table shows the percentage of misclassified digits in the MNIST test data set for different values of $K$. Comparison between adaptively computed distribution of frequencies $\omega_k$ and sampling a fixed (normal) distribution
 K Fixed Fixed Adaptive $\omega\sim\mathcal{N}(0, 1)$ $\omega\sim\mathcal{N}(0, 0.1^2)$ 2 89.97% 81.65% 80.03% 4 89.2% 70.65% 65.25% 8 88.98% 55.35% 53.44% 16 88.69% 46.77% 36.42% 32 88.9% 30.43% 23.52% 64 88.59% 19.73% 16.98% 128 88.7% 13.6% 11.13% 256 88.09% 10.12% 7.99% 512 88.01% 8.16% 5.93% 1024 87.34% 6.29% 4.57% 2048 86.5% 4.94% 3.5% 4096 85.21% 3.76% 2.74% 8192 83.98% 3.16% 1.98%
 K Fixed Fixed Adaptive $\omega\sim\mathcal{N}(0, 1)$ $\omega\sim\mathcal{N}(0, 0.1^2)$ 2 89.97% 81.65% 80.03% 4 89.2% 70.65% 65.25% 8 88.98% 55.35% 53.44% 16 88.69% 46.77% 36.42% 32 88.9% 30.43% 23.52% 64 88.59% 19.73% 16.98% 128 88.7% 13.6% 11.13% 256 88.09% 10.12% 7.99% 512 88.01% 8.16% 5.93% 1024 87.34% 6.29% 4.57% 2048 86.5% 4.94% 3.5% 4096 85.21% 3.76% 2.74% 8192 83.98% 3.16% 1.98%
 [1] Lars Grüne. Computing Lyapunov functions using deep neural networks. Journal of Computational Dynamics, 2020  doi: 10.3934/jcd.2021006 [2] Leslaw Skrzypek, Yuncheng You. Feedback synchronization of FHN cellular neural networks. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2021001 [3] Editorial Office. Retraction: Honggang Yu, An efficient face recognition algorithm using the improved convolutional neural network. Discrete & Continuous Dynamical Systems - S, 2019, 12 (4&5) : 901-901. doi: 10.3934/dcdss.2019060 [4] Martin Heida, Stefan Neukamm, Mario Varga. Stochastic homogenization of $\Lambda$-convex gradient flows. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 427-453. doi: 10.3934/dcdss.2020328 [5] Bixiang Wang. Mean-square random invariant manifolds for stochastic differential equations. Discrete & Continuous Dynamical Systems - A, 2021, 41 (3) : 1449-1468. doi: 10.3934/dcds.2020324 [6] Yangrong Li, Shuang Yang, Qiangheng Zhang. Odd random attractors for stochastic non-autonomous Kuramoto-Sivashinsky equations without dissipation. Electronic Research Archive, 2020, 28 (4) : 1529-1544. doi: 10.3934/era.2020080 [7] Leanne Dong. Random attractors for stochastic Navier-Stokes equation on a 2D rotating sphere with stable Lévy noise. Discrete & Continuous Dynamical Systems - B, 2020  doi: 10.3934/dcdsb.2020352 [8] Wenlong Sun, Jiaqi Cheng, Xiaoying Han. Random attractors for 2D stochastic micropolar fluid flows on unbounded domains. Discrete & Continuous Dynamical Systems - B, 2021, 26 (1) : 693-716. doi: 10.3934/dcdsb.2020189 [9] Xiaoxian Tang, Jie Wang. Bistability of sequestration networks. Discrete & Continuous Dynamical Systems - B, 2021, 26 (3) : 1337-1357. doi: 10.3934/dcdsb.2020165 [10] Predrag S. Stanimirović, Branislav Ivanov, Haifeng Ma, Dijana Mosić. A survey of gradient methods for solving nonlinear optimization. Electronic Research Archive, 2020, 28 (4) : 1573-1624. doi: 10.3934/era.2020115 [11] Timothy Chumley, Renato Feres. Entropy production in random billiards. Discrete & Continuous Dynamical Systems - A, 2021, 41 (3) : 1319-1346. doi: 10.3934/dcds.2020319 [12] Wolfgang Riedl, Robert Baier, Matthias Gerdts. Optimization-based subdivision algorithm for reachable sets. Journal of Computational Dynamics, 2021, 8 (1) : 99-130. doi: 10.3934/jcd.2021005 [13] Gabrielle Nornberg, Delia Schiera, Boyan Sirakov. A priori estimates and multiplicity for systems of elliptic PDE with natural gradient growth. Discrete & Continuous Dynamical Systems - A, 2020, 40 (6) : 3857-3881. doi: 10.3934/dcds.2020128 [14] D. R. Michiel Renger, Johannes Zimmer. Orthogonality of fluxes in general nonlinear reaction networks. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 205-217. doi: 10.3934/dcdss.2020346 [15] Bernold Fiedler. Global Hopf bifurcation in networks with fast feedback cycles. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 177-203. doi: 10.3934/dcdss.2020344 [16] Pedro Aceves-Sanchez, Benjamin Aymard, Diane Peurichard, Pol Kennel, Anne Lorsignol, Franck Plouraboué, Louis Casteilla, Pierre Degond. A new model for the emergence of blood capillary networks. Networks & Heterogeneous Media, 2020  doi: 10.3934/nhm.2021001 [17] Patrick W. Dondl, Martin Jesenko. Threshold phenomenon for homogenized fronts in random elastic media. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 353-372. doi: 10.3934/dcdss.2020329 [18] Hongfei Yang, Xiaofeng Ding, Raymond Chan, Hui Hu, Yaxin Peng, Tieyong Zeng. A new initialization method based on normed statistical spaces in deep networks. Inverse Problems & Imaging, 2021, 15 (1) : 147-158. doi: 10.3934/ipi.2020045 [19] Charlotte Rodriguez. Networks of geometrically exact beams: Well-posedness and stabilization. Mathematical Control & Related Fields, 2021  doi: 10.3934/mcrf.2021002 [20] Thomas Frenzel, Matthias Liero. Effective diffusion in thin structures via generalized gradient systems and EDP-convergence. Discrete & Continuous Dynamical Systems - S, 2021, 14 (1) : 395-425. doi: 10.3934/dcdss.2020345

Impact Factor:

## Tools

Article outline

Figures and Tables