High-dimensional real-world systems can often be characterized well by a small number of simultaneous low-complexity interactions. The analysis of variance (ANOVA) decomposition and the anchored decomposition are typical techniques for finding sparse additive decompositions of functions. In this paper, we are interested in a setting where these decompositions are not directly sparse, but become so after an appropriate basis transform. Noting that the sparsity of such an additive decomposition is equivalent to most of the mixed partial derivatives of the function vanishing, we can exploit a connection to the underlying function graph to determine an orthogonal transform that realizes the appropriate basis change. This is done in three steps: we apply a singular value decomposition to minimize the number of vertices of the function graph, followed by joint block diagonalization techniques for families of matrices and sparse minimization based on relaxations of the zero "norm" to minimize the number of edges. For the latter, we propose and analyze minimization techniques over the manifold of special orthogonal matrices. Various numerical examples illustrate the reliability of our approach for functions having, after a basis transform, a sparse additive decomposition into summands with at most two variables.
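To make the connection concrete: an edge $ \{i,j\} $ belongs to the function graph precisely when the mixed partial derivative $ \partial_{x_i}\partial_{x_j} f $ does not vanish identically, so the sparsity of the additive decomposition can be probed numerically. The following sketch is our own illustration of this criterion (the sampling domain, tolerance, and function names are assumptions, not the paper's implementation):

```python
import numpy as np
from itertools import combinations

def mixed_partial(f, x, i, j, h=1e-4):
    """Central finite-difference estimate of (d^2 f)/(dx_i dx_j) at x."""
    ei, ej = np.eye(len(x))[i] * h, np.eye(len(x))[j] * h
    return (f(x + ei + ej) - f(x + ei - ej)
            - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)

def function_graph_edges(f, d, n_samples=100, tol=1e-4, seed=0):
    """Keep edge {i, j} iff the mixed partial does not vanish (numerically)
    on a random sample of points, i.e., x_i and x_j genuinely interact."""
    X = np.random.default_rng(seed).uniform(-1.0, 1.0, size=(n_samples, d))
    return [(i, j) for i, j in combinations(range(d), 2)
            if max(abs(mixed_partial(f, x, i, j)) for x in X) > tol]

# f(x) = sin(x_0) + x_1 * x_2 has a sparse additive decomposition:
# only x_1 and x_2 interact, so only the edge (1, 2) survives.
f = lambda x: np.sin(x[0]) + x[1] * x[2]
print(function_graph_edges(f, d=3))   # [(1, 2)]
```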
Figure 1. Left: $ f(x) := \sin(u_1^{\mathrm{T}}x\sqrt{2}/2) $, where $ U = (u_1,u_2) $ is a $ 2 $-dimensional rotation matrix with rotation angle $ \pi/4 $. Right: Plot of $ f_{U} $. The left-hand diagram shows that $ f $ depends on both variables $ x_1 $ and $ x_2 $, while $ f_U $ is constant in $ x_1 $.
Figure 2. Left: $ f(x) = \sin(5 u_1^{\mathrm{T}}x) + \sin(5 u_2^{\mathrm{T}}x) $, where $ U = (u_1, u_2) $ is a $ 2 $-dimensional rotation matrix with rotation angle $ \pi/4 $. Right: Plot of the partial derivative $ \partial_{x_1}f(x) $. The partial derivative $ \partial_{x_1}f(x) $ depends on both $ x_1 $ and $ x_2 $, since its values vary in both variables; thus $ f $ cannot be decomposed into a sum of two univariate functions.
Figure 3. Left: $ f_U $, where $ f $ and $ U $ are as in Figure 2. Right: Plot of the partial derivative $ \partial_{x_1}f_{U} $, which is constant with respect to $ x_2 $. Therefore, $ f_U $ can be decomposed into a sum of two univariate functions (see Theorem 2.4).
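The effect shown in Figures 1-3 can be checked numerically. A minimal sketch, assuming the convention $ f_U(x) := f(U^{\mathrm{T}}x) $ (an assumption on our part, consistent with Figure 1, where $ f_U $ is constant in $ x_1 $): the first partial derivative of $ f $ varies along $ x_2 $, while that of $ f_U $ does not, so only $ f_U $ splits into univariate summands.

```python
import numpy as np

theta = np.pi / 4
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
u1, u2 = U[:, 0], U[:, 1]

f   = lambda x: np.sin(5 * u1 @ x) + np.sin(5 * u2 @ x)  # as in Figure 2
f_U = lambda x: f(U.T @ x)          # assumed convention: f_U = f(U^T .)

def d_x1(g, x, h=1e-6):
    """Central difference for the first partial derivative."""
    e1 = np.array([h, 0.0])
    return (g(x + e1) - g(x - e1)) / (2 * h)

x, shift = np.array([0.3, -0.2]), np.array([0.0, 0.5])
print(d_x1(f, x),   d_x1(f, x + shift))    # values differ: d_{x1} f depends on x_2
print(d_x1(f_U, x), d_x1(f_U, x + shift))  # values agree:  d_{x1} f_U does not
```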
Figure 4. Mean optimality gap $ \ell_{\mathcal H}\bigl(U^{(r)}\bigr) - \ell_{\mathcal H}\bigl(R^{\mathrm{T}}\bigr) $ over 100 experiments. Top: Noise-free matrices. Bottom: Noisy matrices. RI denotes random initialization, and $ h $ denotes grid-search initialization with the corresponding grid density. The subscripts Rgd and La stand for Riemannian gradient descent and the Landing algorithm, respectively.
Figure 6. Mean optimality gap $ \overline\ell_{\mathcal H}(U^{(r)}) - \overline\ell_{\mathcal H}(R^{\mathrm{T}}) $ and failure ratio $ \mathcal R $ for 50 noise-free functions; see (28) and (26), respectively. RI denotes random initialization, and $ h $ denotes grid-search initialization with the corresponding grid density. The subscripts Rgd and La stand for Riemannian gradient descent and the Landing algorithm, respectively.
Figure 7. Reconstructions for noisy test functions $ f_n $, with notation as in Figure 6.
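Both optimizers act on the special orthogonal group. As a rough, self-contained sketch of Riemannian gradient descent on $ \mathrm{SO}(d) $ (a generic QR-retraction variant with names of our own choosing, not the paper's exact update (21); the Landing algorithm instead replaces the retraction by a term that drives the iterates back towards the manifold):

```python
import numpy as np

def riemannian_gd(egrad, U0, step=0.1, iters=500):
    """Riemannian gradient descent on SO(d) with a QR retraction:
    project the Euclidean gradient onto the tangent space at U,
    step in that direction, and retract back onto the manifold."""
    U = U0.copy()
    for _ in range(iters):
        G = egrad(U)                         # Euclidean gradient at U
        A = 0.5 * (U.T @ G - G.T @ U)        # Riemannian gradient is U @ A
        Q, R = np.linalg.qr(U - step * (U @ A))
        U = Q * np.sign(np.diag(R))          # sign fix keeps det(U) = +1
    return U

# Toy check: minimize ||U - R_target||_F^2 over SO(3); the minimizer is R_target.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
R_target = Q if np.linalg.det(Q) > 0 else -Q   # det(-Q) = +1 for odd d
U_opt = riemannian_gd(lambda U: 2 * (U - R_target), np.eye(3))
print(np.linalg.norm(U_opt - R_target))        # ~ 0
```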
Figure 8. First row: $ f = f^1_{U_1} $. Second row: $ f = f^{1}_{U_{1,n}} $. First column: $ p = \infty $. Second column: $ p = 1 $. Blue bars: $ \|f\|_{\{i\},p} := \max_{\{i\} \subseteq \mathbf{u} \subseteq [7]}\left\| f_{\mathbf{u},A} \right\|_p $, in decreasing order. Orange bars: $ L^p $-norm of the corresponding first-order partial derivative $ \|\partial_{\{i\}} f\|_p $. The green dashed line represents $ 2^6\max\{\|\partial_{\{i\}} f\|_p : \|\partial_{\{i\}} f\|_p \leq 10^{-4}\} $, corresponding to Theorem 2.5, and the red dashed line represents the truncation value $ 10^{-4} $.
Figure 9. First row: $ f = f^1_{U_1} $. Second row: $ f = f^{1}_{U_{1,n}} $. First column: $ p = \infty $. Second column: $ p = 1 $. Blue bars: $ \|f\|_{\{i,j\},p} := \max_{\{i,j\} \subseteq \mathbf{u} \subseteq [7]}\left\| f_{\mathbf{u},A} \right\|_p $, in decreasing order. Orange bars: corresponding $ L^p $-norm of the second-order partial derivative $ \|\partial_{\{i,j\}} f\|_p $. Only the largest $ 8 $ terms are displayed. The green dashed line represents $ 2^5\max\{\|\partial_{\{i,j\}} f\|_p : \|\partial_{\{i,j\}} f\|_p \leq 10^{-4}\} $, corresponding to Theorem 2.5, and the red dashed line represents the truncation value $ 10^{-4} $.
Figure 10. First row: $ f = f^2_{U_2} $. Second row: $ f = f^{2}_{U_{2,n}} $. First column: $ p = \infty $. Second column: $ p = 1 $. Blue bars: $ \|f\|_{\{i,j\},p} := \max_{\{i,j\} \subseteq \mathbf{u} \subseteq [7]}\left\| f_{\mathbf{u},A} \right\|_p $, in decreasing order. Orange bars: corresponding $ L^p $-norm of the second-order partial derivative $ \|\partial_{\{i,j\}} f\|_p $. Only the largest $ 8 $ terms are displayed. The green dashed line represents $ 2^5\max\{\|\partial_{\{i,j\}} f\|_p : \|\partial_{\{i,j\}} f\|_p \leq 10^{-4}\} $, corresponding to Theorem 2.5, and the red dashed line represents the truncation value $ 10^{-4} $.
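The orange bars and the $ 10^{-4} $ truncation in Figures 8-10 amount to estimating $ L^p $-norms of partial derivatives and declaring the small ones to vanish. A minimal sketch of this test for first-order derivatives (Monte-Carlo norm estimates on an assumed domain $ [0,1]^d $; not the code behind the figures):

```python
import numpy as np

def first_order_norms(f, d, p=np.inf, n=2000, h=1e-4, seed=0):
    """Monte-Carlo estimate of ||d f / d x_i||_p, i = 1, ..., d, on [0,1]^d:
    the sup norm via the sample maximum, the L^1 norm via the sample mean."""
    X = np.random.default_rng(seed).uniform(0.0, 1.0, size=(n, d))
    norms = np.empty(d)
    for i in range(d):
        e = np.zeros(d); e[i] = h
        vals = np.array([(f(x + e) - f(x - e)) / (2 * h) for x in X])
        norms[i] = np.abs(vals).max() if p == np.inf else np.abs(vals).mean()
    return norms

# Variables x_3, ..., x_6 are inactive, so their norms fall below 1e-4.
f = lambda x: np.sin(x[0]) + x[1] * x[2]
print(first_order_norms(f, d=7) > 1e-4)   # [ True  True  True False False False False]
```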
Table 1. Minimal values of $ \mathcal{A}(f, d_{sp}, p) $ for $ f \in \{f^1, f^2\} $, $ d_{sp} \in \{1, 2\} $ and $ p \in \{\infty, 1\} $.

| $ f= $ | $ f^1 $ | | $ f^2 $ | |
| $ d_{sp}= $ | 1 | 2 | 1 | 2 |
| $ \mathcal{A}(f, d_{sp}, \infty) $ | 0.6788 | 0.2465 | 0.3421 | 0.5783 |
| $ \mathcal{A}(f, d_{sp}, 1) $ | 0.3614 | 0.0453 | 0.2102 | 0.1203 |
Table 2. Numbers $ G(f, p) $ and $ H(f, p) $ for the test functions.

| $ f= $ | $ \tilde{f}^1 $ | $ f^1_{U_1} $ | $ f^{1}_{U_{1,n}} $ | $ \tilde{f}^2 $ | $ f^2_{U_2} $ | $ f^{2}_{U_{2,n}} $ |
| $ G(f, \infty) $ | 2 | 2 | 2 | 0 | 0 | 0 |
| $ G(f, 1) $ | 2 | 2 | 2 | 0 | 0 | 0 |
| $ H(f, \infty) $ | 18 | 18 | 17 | 16 | 15 | 14 |
| $ H(f, 1) $ | 18 | 18 | 18 | 16 | 15 | 15 |
Table 3. Number of operations for the grid search and for the manifold optimization.

| Grid search | Number of grid points $ \Theta(h) $ | $ \mathcal{O}\bigl((\lceil 2\pi/h \rceil)^{d}(\lceil \pi/h \rceil)^{d^2}\bigr) $ |
| | Matrix $ U \in \Gamma $ | $ \mathcal{O}(d^3) $ |
| | Loss function $ \ell(U) $ | $ \mathcal{O}(Nd^3) $ |
| | Total | $ \mathcal{O}\bigl(Nd^3(\lceil 2\pi/h \rceil)^{d}(\lceil \pi/h \rceil)^{d^2}\bigr) $ |
| Manifold optimization | Riemannian/Euclidean gradient | $ \mathcal{O}(N d^3) $ |
| | Riemannian gradient descent (21) | $ \mathcal{O}(K N d^{3}) $ |
| | Landing (22) | $ \mathcal{O}(K N d^3) $ |
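The exponential growth of the grid explains the missing entries for $ d = 5 $ in Table 8 below. A small sketch that evaluates the operation counts of Table 3 (orders of magnitude only, constants omitted; the function name is ours):

```python
import math

def grid_search_cost(d, h, N):
    """Order-of-magnitude counts from Table 3: number of grid points
    Theta(h) and the total cost of evaluating the loss on the grid."""
    n_grid = math.ceil(2 * math.pi / h) ** d * math.ceil(math.pi / h) ** (d * d)
    return n_grid, n_grid * N * d ** 3     # O(N d^3) work per grid point

for h in (1.0, 0.5, 0.25):
    print(h, grid_search_cost(d=2, h=h, N=100))   # grid size explodes as h shrinks
```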
Table 4. Number of sets of matrices such that diff

| | | Clean data | | | | | | | | Noisy data $ \sigma=10^{-3} $ | | | | | | | |
| | | Rgd | | | | Landing | | | | Rgd | | | | Landing | | | |
| | $ i $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ |
| Random initialization | | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 98 | 2 | 0 | 0 |
| Grid-search initialization | $ h=1 $ | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 98 | 2 | 0 | 0 |
| | $ h=0.5 $ | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 98 | 2 | 0 | 0 |
| | $ h=0.25 $ | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 98 | 2 | 0 | 0 |
| | $ h=0.125 $ | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 98 | 2 | 0 | 0 |
| | $ h=0.1 $ | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 98 | 2 | 0 | 0 |
Table 5. Same as Table 4 for input dimension $ d=3 $.

| | | Clean data | | | | | | | | Noisy data $ \sigma=10^{-3} $ | | | | | | | |
| | | Rgd | | | | Landing | | | | Rgd | | | | Landing | | | |
| | $ i $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ |
| Random initialization | | 98 | 0 | 0 | 2 | 87 | 0 | 5 | 8 | 98 | 1 | 1 | 0 | 95 | 2 | 3 | 0 |
| Grid-search initialization | $ h=1 $ | 99 | 1 | 0 | 0 | 91 | 1 | 0 | 8 | 98 | 1 | 1 | 0 | 98 | 1 | 1 | 0 |
| | $ h=0.5 $ | 100 | 0 | 0 | 0 | 91 | 0 | 0 | 9 | 99 | 0 | 1 | 0 | 99 | 0 | 1 | 0 |
| | $ h=0.25 $ | 100 | 0 | 0 | 0 | 91 | 0 | 0 | 9 | 99 | 0 | 1 | 0 | 99 | 0 | 1 | 0 |
| | $ h=0.125 $ | 100 | 0 | 0 | 0 | 92 | 0 | 0 | 8 | 99 | 0 | 1 | 0 | 99 | 0 | 1 | 0 |
| | $ h=0.1 $ | 100 | 0 | 0 | 0 | 92 | 0 | 0 | 8 | 100 | 0 | 0 | 0 | 100 | 0 | 0 | 0 |
Table 6. Same as Table 4 for input dimension $ d=4 $.

| | | Clean data | | | | | | | | Noisy data $ \sigma=10^{-3} $ | | | | | | | |
| | | Rgd | | | | Landing | | | | Rgd | | | | Landing | | | |
| | $ i $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ |
| Random initialization | | 86 | 1 | 8 | 5 | 57 | 2 | 21 | 20 | 84 | 6 | 7 | 3 | 88 | 1 | 11 | 0 |
| Grid-search initialization | $ h=1 $ | 94 | 3 | 2 | 1 | 84 | 1 | 7 | 8 | 92 | 5 | 2 | 1 | 92 | 5 | 2 | 1 |
| | $ h=0.5 $ | 98 | 0 | 1 | 1 | 87 | 0 | 7 | 6 | 96 | 2 | 2 | 0 | 96 | 2 | 2 | 0 |
| | $ h=0.25 $ | 100 | 0 | 0 | 0 | 89 | 0 | 6 | 5 | 99 | 1 | 0 | 0 | 99 | 1 | 0 | 0 |
| | $ h=0.125 $ | 100 | 0 | 0 | 0 | 89 | 0 | 6 | 5 | 99 | 1 | 0 | 0 | 99 | 1 | 0 | 0 |
| | $ h=0.1 $ | 100 | 0 | 0 | 0 | 89 | 0 | 6 | 5 | 99 | 1 | 0 | 0 | 99 | 1 | 0 | 0 |
Table 7. Same as Table 4 for input dimension $ d=5 $.

| | | Clean data | | | | | | | | Noisy data $ \sigma=10^{-3} $ | | | | | | | |
| | | Rgd | | | | Landing | | | | Rgd | | | | Landing | | | |
| | $ i $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ | 0 | 1 | 2 | $ \geq3 $ |
| Random initialization | | 71 | 4 | 13 | 12 | 37 | 7 | 13 | 43 | 70 | 9 | 9 | 12 | 74 | 5 | 8 | 13 |
| Grid-search initialization | $ h=1 $ | 93 | 2 | 4 | 1 | 85 | 2 | 8 | 5 | 93 | 2 | 4 | 1 | 93 | 2 | 4 | 1 |
Table 8. Mean runtime in seconds.

| Input dimension | Random Init. | | Grid-search Init. | | | | | | | | | |
| | Rgd | La | Rotations $ U \in \Gamma(h) $ | | | | | Losses $ \ell(U), U \in \Gamma(h) $ | | | | |
| | | | $ h=1 $ | 0.5 | 0.25 | 0.125 | 0.1 | 1 | 0.5 | 0.25 | 0.125 | 0.1 |
| $ d=2 $ | 29.36 | 33.36 | 0.0010 | 0.0009 | 0.0008 | 0.0008 | 0.0008 | 0.0003 | 0.0003 | 0.0003 | 0.0003 | 0.0003 |
| | | | | | | | | 0.0009 | 0.0009 | 0.0009 | 0.0009 | 0.0009 |
| $ d=3 $ | 29.39 | 32.18 | 0.0211 | 0.0211 | 0.0211 | 0.0213 | 0.0215 | 0.0032 | 0.0032 | 0.0032 | 0.0035 | 0.0041 |
| | | | | | | | | 0.0194 | 0.0193 | 0.0193 | 0.0210 | 0.0244 |
| $ d=4 $ | 29.66 | 32.50 | 0.6105 | 0.7062 | 2.0015 | 11.3731 | 168.4473 | 0.0343 | 0.0761 | 0.1732 | 5.9366 | 12.4121 |
| | | | | | | | | 0.3434 | 0.7612 | 1.7323 | 59.3655 | 124.1206 |
| $ d=5 $ | 29.77 | 32.55 | 23.5289 | - | - | - | - | 1.1344 | - | - | - | - |
| | | | | | | | | 17.0158 | - | - | - | - |