
Sparse additive function decompositions facing basis transforms

Corresponding author: Fatima Antarou Ba

The first author is supported by the BMBF project SAℓE (01|S20053B), and the third author is supported by the DFG within the SFB "Tomography Across the Scales" (STE 571/19-1, project number 495365311).

Abstract

High-dimensional real-world systems can often be well characterized by a small number of simultaneous low-complexity interactions. The analysis of variance (ANOVA) decomposition and the anchored decomposition are typical techniques to find sparse additive decompositions of functions. In this paper, we are interested in a setting where these decompositions are not directly sparse, but become so after an appropriate basis transform. Noting that the sparsity of such additive decompositions is equivalent to the vanishing of most mixed partial derivatives of the function, we can exploit a connection to the underlying function graphs to determine an orthogonal transform that realizes the appropriate basis change. This is done in three steps: we apply a singular value decomposition to minimize the number of vertices of the function graph, and then minimize the number of edges by joint block diagonalization of families of matrices followed by sparse minimization based on relaxations of the zero "norm". For the latter, we propose and analyze minimization techniques over the manifold of special orthogonal matrices. Various numerical examples illustrate the reliability of our approach for functions having, after a basis transform, a sparse additive decomposition into summands with at most two variables.
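
To make the role of the basis transform concrete: after a change of variables $ x \mapsto Ux $ with orthogonal $ U $, the Hessian of $ f_U := f(U\cdot) $ at $ y $ is $ U^{ \mathrm{T}}\nabla^2 f(Uy)U $, and a vanishing off-diagonal entry $ (i,j) $ at all evaluation points means that the transformed variables $ i $ and $ j $ never interact. The following minimal sketch (not the code used in the paper; the finite-difference Hessian, the smoothed zero-"norm" surrogate, and all function names are our own choices for illustration) searches a grid of planar rotations for the angle that makes sampled Hessians of the test function from Figure 2 jointly as close to diagonal as possible.

```python
import numpy as np

def hessian(f, x, eps=1e-3):
    """Central finite-difference Hessian of a scalar function f at the point x."""
    d = x.size
    H = np.zeros((d, d))
    I = np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + eps*I[i] + eps*I[j]) - f(x + eps*I[i] - eps*I[j])
                       - f(x - eps*I[i] + eps*I[j]) + f(x - eps*I[i] - eps*I[j])) / (4*eps**2)
    return H

def rot(a):
    return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])

# Test function from Figure 2: additively sparse only after rotating by pi/4.
U_true = rot(np.pi / 4)
u1, u2 = U_true[:, 0], U_true[:, 1]
f = lambda x: np.sin(5 * u1 @ x) + np.sin(5 * u2 @ x)

rng = np.random.default_rng(0)
hessians = [hessian(f, x) for x in rng.uniform(-1, 1, size=(20, 2))]

def loss(U, delta=1e-3):
    """Smoothed zero-"norm" surrogate of the off-diagonal entry of U^T H U."""
    return sum(np.log1p(((U.T @ H @ U)[0, 1] / delta) ** 2) for H in hessians)

# Grid search over SO(2); any angle pi/4 + k*pi/2 diagonalizes all sampled Hessians.
angles = np.linspace(0.0, np.pi, 721)
best = min(angles, key=lambda a: loss(rot(a)))
print(f"recovered angle: {best:.4f}  (pi/4 = {np.pi/4:.4f})")
```

In higher dimensions such a grid becomes infeasible, which motivates the Riemannian gradient descent and landing methods compared in Figures 4-7 and Tables 3-8.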

    Mathematics Subject Classification: Primary: 26Bxx, 33F05, 58C05, 90C26, 65Kxx; Secondary: 65F25, 15B99.

    Figure 1.  Left: $ f(x) := \sin(u_1^{ \mathrm{T}}x\sqrt{2}/2) $ where $ U = (u_1,u_2) $ is a $ 2 $-dimensional rotation matrix of rotation angle $ \pi/4. $ Right: Plot of $ f_{U}. $ The left-hand diagram shows that $ f $ depends on both variables $ x_1 $ and $ x_2 $, while $ f_U $ is constant in $ x_1$

    Figure 2.  Left: $ f(x) = \sin(5 u_1^{ \mathrm{T}}x) + \sin(5 u_2^{ \mathrm{T}}x) $ where $ U = (u_1, u_2) $ is a $ 2 $-dimensional rotation matrix of angle $ \pi/4. $ Right: Image of the partial derivative $ \partial_{x_1}f(x). $ The partial derivative $ \partial_{x_1}f(x) $ depends on $ x_1 $ and $ x_2 $ since its values vary in both variables and thus $ f $ cannot be decomposed as a sum of two univariate functions

    Figure 3.  Left: $ f_U $ where $ f, U $ are as in Figure 2. Right: Image of the partial derivative $ \partial_{x_1}f_{U} $ which is constant with respect to $ x_2 $. Therefore $ f_U $ can be decomposed as a sum of two univariate functions (see Theorem 2.4)
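
The statement behind Figures 2 and 3 can also be checked symbolically. The short script below (an illustrative verification using SymPy, not part of the paper's pipeline) confirms that the mixed partial derivative of $ f $ does not vanish, whereas the mixed partial of $ f_U(x) = f(Ux) $ is identically zero, so that $ f_U $ is a sum of two univariate functions, as asserted via Theorem 2.4.

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])

theta = sp.pi / 4
U = sp.Matrix([[sp.cos(theta), -sp.sin(theta)],
               [sp.sin(theta),  sp.cos(theta)]])
u1, u2 = U[:, 0], U[:, 1]

# Function from Figure 2, written in the original coordinates.
f = sp.sin(5 * (u1.T * x)[0]) + sp.sin(5 * (u2.T * x)[0])

# f_U(x) = f(Ux): substitute the rotated coordinates simultaneously.
Ux = U * x
f_U = f.subs({x1: Ux[0], x2: Ux[1]}, simultaneous=True)

print(sp.simplify(sp.diff(f, x1, x2)))    # nonzero: f itself has no sparse additive split
print(sp.simplify(sp.diff(f_U, x1, x2)))  # 0: the interaction disappears after the transform
print(sp.simplify(f_U))                   # sin(5*x1) + sin(5*x2)
```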

    Figure 4.  Mean optimality gap $ \ell_{\mathcal H}\bigl(U^{(r)}\bigr) - \ell_{\mathcal H}\bigl(R^{ \mathrm{T}}\bigr) $ over 100 experiments. Top: Noise-free matrices. Bottom: Noisy matrices. RI denotes random initialization and $ h $ uses the grid search with the corresponding grid density value. Subscripts Rgd and La stand for Riemannian gradient descent and Landing algorithm, respectively
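
For readers unfamiliar with the two optimizers compared in Figure 4: Riemannian gradient descent moves along a tangent direction and retracts back onto the manifold in every step, whereas the landing method of Ablin and Peyré avoids retractions and only drives the iterates towards the manifold through a penalty term. The sketch below shows schematic single steps of both updates; it is our own simplified rendering with generic names, not the exact implementation, step sizes, or loss used in the experiments.

```python
import numpy as np

def skew(A):
    return 0.5 * (A - A.T)

def riemannian_gd_step(U, egrad, step):
    """One Riemannian gradient step on the orthogonal group with a QR retraction.
    egrad is the Euclidean gradient of the loss at U."""
    direction = skew(egrad @ U.T) @ U        # tangent direction (relative gradient)
    Q, R = np.linalg.qr(U - step * direction)
    return Q * np.sign(np.diag(R))           # sign-fixed Q factor: the retracted iterate

def landing_step(U, egrad, step, lam=1.0):
    """One landing step: no retraction; the term lam*(U U^T - I)U pulls the
    iterate back towards the orthogonal matrices while descending the loss."""
    field = skew(egrad @ U.T) @ U + lam * (U @ U.T - np.eye(U.shape[0])) @ U
    return U - step * field
```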

    Figure 5.  Failure ratio $ \mathcal R $ (26) of suboptimal joint sparsity reconstructions for a given thresholding parameter $ \eta $. First row: Clean data. Second row: Noisy data with additive random Gaussian noise $ \mathcal{N}(0,\sigma), \sigma = 10^{-3} $

    Figure 6.  Mean optimality gap $ \overline\ell_{\mathcal H}(U^{(r)}) - \overline\ell_{\mathcal H}(R^{ \mathrm{T}}) $ and failure ratio $ \mathcal R $ for 50 noise-free functions, see (28) and (26), respectively. RI denotes random initialization and $ h $ uses the grid search with the corresponding grid density value. Subscripts Rgd and La stand for Riemannian gradient descent and Landing algorithm, respectively

    Figure 7.  Reconstructions for noisy test functions $ f_n $ with notation as in Figure 6

    Figure 8.  First row: $ f = f^1_{U_1} $. Second row: $ f = f^{1}_{U_{1,n}} $. First column: $ p = \infty. $ Second column: $ p = 1. $ Blue bars: $ \|f\|_{\{i\},p}: = \max_{\{i\} \subseteq \mathbf{u} \subseteq [7]}\left\| {f_{ \mathbf{u},A}} \right\|_p $, in decreasing order. Orange bars: $ L^p $-norm of the corresponding first-order partial derivative $ \|\partial_{\{i\}} f\|_p $. The green dashed line represents $ 2^6\max\{\|\partial_{\{i\}} f\|_p:\|\partial_{\{i\}} f\|_p \leq 10^{-4}\} $ corresponding to Theorem 2.5 and the red dashed line represents the truncation value $ 10^{-4} $

    Figure 9.  First row: $ f = f^1_{U_1} $. Second row: $ f = f^{1}_{ U_{1,n}} $. First column: $ p = \infty. $ Second column: $ p = 1. $ Blue bars: $ \|f\|_{\{i,j\},p}: = \max_{\{i,j\} \subseteq \mathbf{u} \subseteq [7]}\left\| {f_{ \mathbf{u},A}} \right\|_p $ in decreasing order. Orange bars: corresponding $ L^p $-norm of the second-order partial derivative $ \|\partial_{\{i,j\}} f\|_p $. Only the largest $ 8 $ terms are displayed. The green dashed line represents $ 2^5\max\{\|\partial_{\{i,j\}} f\|_p:\|\partial_{\{i,j\}} f\|_p \leq 10^{-4}\} $ corresponding to Theorem 2.5 and the red dashed line represents the truncation value $ 10^{-4} $

    Figure 10.  First row: $ f = f^2_{U_2} $. Second row: $ f = f^{2}_{ U_{2,n}} $. First column: $ p = \infty. $ Second column: $ p = 1. $ Blue bars: $ \|f\|_{\{i,j\},p}: = \max_{\{i,j\} \subseteq \mathbf{u} \subseteq [7]}\left\| {f_{ \mathbf{u},A}} \right\|_p $ in decreasing order. Orange bars: corresponding $ L^p $-norm of the second-order partial derivative $ \|\partial_{\{i,j\}} f\|_p $. Only the largest $ 8 $ terms are displayed. The green dashed line represents $ 2^5\max\{\|\partial_{\{i,j\}} f\|_p:\|\partial_{\{i,j\}} f\|_p \leq 10^{-4}\} $ corresponding to Theorem 2.5 and the red dashed line represents the truncation value $ 10^{-4} $
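
    For orientation: the prefactors of the green reference lines in Figures 8-10 follow the pattern $ 2^{d-|\mathbf{u}|} $ for input dimension $ d = 7 $ (our reading of the two captions; the precise bound is the subject of Theorem 2.5),
    \[ 2^{\,d-|\mathbf{u}|}\big|_{|\mathbf{u}| = 1} = 2^{6} = 64, \qquad 2^{\,d-|\mathbf{u}|}\big|_{|\mathbf{u}| = 2} = 2^{5} = 32, \]
    so each green line marks $ 2^{d-|\mathbf{u}|} $ times the largest derivative norm that still lies below the truncation value $ 10^{-4} $ (the red line).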

    Table 1.  Minimal values of the $ L^p $-norm, $ p\in \{1, \infty\} $ among all first- and second-order ANOVA terms, $ \mathcal{A}(f, d_{sp}, p) : = \min_{\substack{\mathbf{u} \subseteq [d]\\ | \mathbf{u}| = d_{sp}}} \left\| {f_{ \mathbf{u},A}} \right\|_p $

    $ f = $ | $ f^1 $ | $ f^1 $ | $ f^2 $ | $ f^2 $
    $ d_{sp} = $ | 1 | 2 | 1 | 2
    $ \mathcal{A}(f, d_{sp}, \infty) $ | 0.6788 | 0.2465 | 0.3421 | 0.5783
    $ \mathcal{A}(f, d_{sp}, 1) $ | 0.3614 | 0.0453 | 0.2102 | 0.1203

    Table 2.  Number $ G(f, p): = \big|\{i \in [d]:\|\partial_{\{i\}}f\|_p \leq 10^{-4}\}\big| $ of small first-order derivatives and number $ H(f, p) : = \big|\{\{i,j\}\subset [d]: i\neq j \text{ and } \|\partial_{\{i,j\}}f\|_p \leq 10^{-4}\}\big| $ of small second-order derivatives for the ground truth functions $ \tilde{f}^i $ and for $ f^i_{U_i}, f^{i}_{U_{i,n}} $, where $ U_i, U_{i,n} $ are the outputs of the sparsifying algorithm

    $ f = $ | $ \tilde{f}^1 $ | $ f^1_{U_1} $ | $ f^{1}_{U_{1,n}} $ | $ \tilde{f}^2 $ | $ f^2_{U_2} $ | $ f^{2}_{ U_{2,n}} $
    $ G(f, \infty) $ | 2 | 2 | 2 | 0 | 0 | 0
    $ G(f, 1) $ | 2 | 2 | 2 | 0 | 0 | 0
    $ H(f, \infty) $ | 18 | 18 | 17 | 16 | 15 | 14
    $ H(f, 1) $ | 18 | 18 | 18 | 16 | 15 | 15

    Table 3.  Number of operations for the grid search and for $ K $ iterations of the manifold optimization method. Here $ d $ is the dimension, $ N $ the number of Hessians and $ h $ the density of the grid

    Grid search | Number of grid points $ \Theta(h) $ | $ \mathcal{O}\bigl((\lceil 2\pi/h \rceil)^{d}(\lceil \pi/h \rceil)^{d^2}\bigr) $
    Grid search | Matrix $ U \in \Gamma $ | $ \mathcal{O}(d^3) $
    Grid search | Loss function $ \ell(U) $ | $ \mathcal{O}(Nd^3) $
    Grid search | Total | $ \mathcal{O}\bigl(Nd^3(\lceil 2\pi/h \rceil)^{d}(\lceil \pi/h \rceil)^{d^2}\bigr) $
    Manifold optimization | Riemannian/Euclidean gradient | $ \mathcal{O}(N d^3) $
    Manifold optimization | Riemannian gradient descent (21) | $ \mathcal{O}(K N d^{3}) $
    Manifold optimization | Landing (22) | $ \mathcal{O}(K N d^3) $
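
Evaluating the expressions of Table 3 with the constants of the $ \mathcal{O} $-notation dropped already shows why the grid search is only practical in very low dimensions. The snippet below is an illustrative back-of-the-envelope computation, taking the grid-point count verbatim from the first row of the table, and compares the resulting operation counts with the cost of $ K $ manifold iterations:

```python
import math

def grid_points(d, h):
    """Grid-point count from Table 3: (ceil(2*pi/h))**d * (ceil(pi/h))**(d*d)."""
    return math.ceil(2 * math.pi / h) ** d * math.ceil(math.pi / h) ** (d * d)

def grid_search_ops(d, h, N):
    """Grid search: one O(N d^3) loss evaluation per grid point (constants dropped)."""
    return N * d**3 * grid_points(d, h)

def manifold_ops(d, N, K):
    """K iterations of Riemannian gradient descent or landing: O(K N d^3)."""
    return K * N * d**3

for d in (2, 3, 4, 5):
    N = d * (d + 1) // 2          # largest number of Hessians used in Table 8
    print(f"d={d}:  grid (h=1) ~ {grid_search_ops(d, 1.0, N):.1e}   "
          f"manifold (K=5e3) ~ {manifold_ops(d, N, 5000):.1e}")
```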

    Table 4.  Number of sets of matrices such that diff $ \chi(U, \eta)=i $ (see (25)), for input dimension $ d=2 $, with $ \eta=10^{-9} $ and $ \eta=10^{-4} $ for clean and noisy data, respectively. Rgd: Riemannian gradient descent

    | Clean data: Rgd | Clean data: Landing | Noisy data ($ \sigma=10^{-3} $): Rgd | Noisy data ($ \sigma=10^{-3} $): Landing
    $ i = $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $
    Random initialization | 100 0 0 0 | 100 0 0 0 | 98 2 0 0 | 98 2 0 0
    Grid-search initialization, $ h=1 $ | 100 0 0 0 | 100 0 0 0 | 98 2 0 0 | 98 2 0 0
    $ h=0.5 $ | 100 0 0 0 | 100 0 0 0 | 98 2 0 0 | 98 2 0 0
    $ h=0.25 $ | 100 0 0 0 | 100 0 0 0 | 98 2 0 0 | 98 2 0 0
    $ h=0.125 $ | 100 0 0 0 | 100 0 0 0 | 98 2 0 0 | 98 2 0 0
    $ h=0.1 $ | 100 0 0 0 | 100 0 0 0 | 98 2 0 0 | 98 2 0 0

    Table 5.  Same as Table 4, for input dimension $ d=3 $

    | Clean data: Rgd | Clean data: Landing | Noisy data ($ \sigma=10^{-3} $): Rgd | Noisy data ($ \sigma=10^{-3} $): Landing
    $ i = $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $
    Random initialization | 98 0 0 2 | 87 0 5 8 | 98 1 1 0 | 95 2 3 0
    Grid-search initialization, $ h=1 $ | 99 1 0 0 | 91 1 0 8 | 98 1 1 0 | 98 1 1 0
    $ h=0.5 $ | 100 0 0 0 | 91 0 0 9 | 99 0 1 0 | 99 0 1 0
    $ h=0.25 $ | 100 0 0 0 | 91 0 0 9 | 99 0 1 0 | 99 0 1 0
    $ h=0.125 $ | 100 0 0 0 | 92 0 0 8 | 99 0 1 0 | 99 0 1 0
    $ h=0.1 $ | 100 0 0 0 | 92 0 0 8 | 100 0 0 0 | 100 0 0 0

    Table 6.  Same as Table 4, for input dimension $ d=4 $

    | Clean data: Rgd | Clean data: Landing | Noisy data ($ \sigma=10^{-3} $): Rgd | Noisy data ($ \sigma=10^{-3} $): Landing
    $ i = $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $
    Random initialization | 86 1 8 5 | 57 2 21 20 | 84 6 7 3 | 88 1 11 0
    Grid-search initialization, $ h=1 $ | 94 3 2 1 | 84 1 7 8 | 92 5 2 1 | 92 5 2 1
    $ h=0.5 $ | 98 0 1 1 | 87 0 7 6 | 96 2 2 0 | 96 2 2 0
    $ h=0.25 $ | 100 0 0 0 | 89 0 6 5 | 99 1 0 0 | 99 1 0 0
    $ h=0.125 $ | 100 0 0 0 | 89 0 6 5 | 99 1 0 0 | 99 1 0 0
    $ h=0.1 $ | 100 0 0 0 | 89 0 6 5 | 99 1 0 0 | 99 1 0 0

    Table 7.  Same as Table 4, for input dimension $ d=5 $

    | Clean data: Rgd | Clean data: Landing | Noisy data ($ \sigma=10^{-3} $): Rgd | Noisy data ($ \sigma=10^{-3} $): Landing
    $ i = $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $ | 0 1 2 $ \geq3 $
    Random initialization | 71 4 13 12 | 37 7 13 43 | 70 9 9 12 | 74 5 8 13
    Grid-search initialization, $ h=1 $ | 93 2 4 1 | 85 2 8 5 | 93 2 4 1 | 93 2 4 1

    Table 8.  Mean runtime in seconds for the random initialization method (5 restarts, $ K=5\cdot 10^3 $ iterations) and for the grid-search initialization procedure (step size $ h=1, 0.5, 0.25, 0.125, 0.1 $), averaged over $ 10 $ examples chosen at random from the 100 randomly generated examples, for dimension $ d=2, 3, 4, 5 $. For each dimension, the first row of the losses subtable $ \ell(U), U \in \Gamma(h) $ gives the minimal runtime, obtained for $ N=1 $ Hessians, and the second row the maximal runtime, obtained for $ N=d(d+1)/2 $ Hessians. Rgd: Riemannian gradient descent. La: Landing method

    Input dimension | Random Init.: Rgd | Random Init.: La | Grid-search Init.: Rotation $ U \in \Gamma(h) $, $ h = $ 1 / 0.5 / 0.25 / 0.125 / 0.1 | Grid-search Init.: Losses $ \ell(U) $, $ h = $ 1 / 0.5 / 0.25 / 0.125 / 0.1
    $ d=2 $ | 29.36 | 33.36 | 0.0010 / 0.0009 / 0.0008 / 0.0008 / 0.0008 | $ N=1 $: 0.0003 / 0.0003 / 0.0003 / 0.0003 / 0.0003
            |       |       |                                            | $ N=d(d+1)/2 $: 0.0009 / 0.0009 / 0.0009 / 0.0009 / 0.0009
    $ d=3 $ | 29.39 | 32.18 | 0.0211 / 0.0211 / 0.0211 / 0.0213 / 0.0215 | $ N=1 $: 0.0032 / 0.0032 / 0.0032 / 0.0035 / 0.0041
            |       |       |                                            | $ N=d(d+1)/2 $: 0.0194 / 0.0193 / 0.0193 / 0.0210 / 0.0244
    $ d=4 $ | 29.66 | 32.50 | 0.6105 / 0.7062 / 2.0015 / 11.3731 / 168.4473 | $ N=1 $: 0.0343 / 0.0761 / 0.1732 / 5.9366 / 12.4121
            |       |       |                                               | $ N=d(d+1)/2 $: 0.3434 / 0.7612 / 1.7323 / 59.3655 / 124.1206
    $ d=5 $ | 29.77 | 32.55 | 23.5289 / - / - / - / - | $ N=1 $: 1.1344 / - / - / - / -
            |       |       |                         | $ N=d(d+1)/2 $: 17.0158 / - / - / - / -
