Advanced Search
Article Contents
Article Contents

Krylov subspace methods to accelerate kernel machines on graphs

Abstract / Introduction Full Text(HTML) Figure(4) / Table(2) Related Papers Cited by
  • In classical frameworks as the Euclidean space, positive definite kernels as well as their analytic properties are explicitly available and can be incorporated directly in kernel-based learning algorithms. This is different if the underlying domain is a discrete irregular graph. In this case, respective kernels have to be computed in a preliminary step in order to apply them inside a kernel machine. Typically, such a kernel is given as a matrix function of the graph Laplacian. Its direct calculation leads to a high computational burden if the size of the graph is very large. In this work, we investigate five different block Krylov subspace methods to obtain cheaper iterative approximations of these kernels. We will investigate convergence properties of these Krylov subspace methods and study to what extent these methods are able to preserve the symmetry and positive definiteness of the original kernels they are approximating. We will further discuss the computational complexity and the memory requirements of these methods, as well as possible implications for the kernel predictors in machine learning.

    Mathematics Subject Classification: Primary: 65F60, 65F50; Secondary: 65D15.


    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  Comparison between Lanczos and Chebyshev Krylov subspace approximation of a diffusion and a variation spline kernel on the path graph $ G_1 $. The blue line indicates the support of the approximant, the error with respect to the exact kernel column is measured in the uniform norm

    Figure 2.  Kernel interpolant on the bunny graph using $ N = 20 $ samples and the variational spline kernel with parameters $ s = 2 $ and $ \epsilon = 0.05 $

    Figure 3.  Uniform error $ \|y - y^{(\mathrm{kr})}\|_\infty $ for the five block Krylov methods $ \mathrm{kr} \in \{\mathrm{cbl}, \mathrm{gbl}, \mathrm{sbl}, \mathrm{cheb}, \mathrm{cheb}^2\} $ in terms of the iteration numbers $ m $

    Figure 4.  Eigenvalues of $ \mathrm{E}_W^* p_{\phi,5}^{(\mathrm{kr})}( \mathbf{L}) \mathrm{E}_W $ for the five methods $ \mathrm{kr} \in \{\mathrm{cbl}, \mathrm{gbl}, \mathrm{sbl}, \mathrm{cheb}, \mathrm{cheb}^2\} $, with kernel $ \phi( \mathbf{L}) = e^{-t \mathbf{L}} $, $ t = 20 $

    Table 1.  Required operations to calculate $ p_{\phi,m-1}^{(\mathrm{kr})}( \mathbf{L}) \mathrm{E}_W $ for the Krylov space methods $ \mathrm{kr} \in \{\mathrm{cbl},\mathrm{gbl},\mathrm{sbl},\mathrm{cheb}\} $

    Operations $ \rm cbl $ $ \rm gbl $ $ \rm sbl $ $ \rm cheb $
    MVs $ m N $ $ m N $ $ m N $ $ m N $
    DOTs $ \mathcal{O}(m N^2) $ $ \mathcal{O}(m N) $ $ \mathcal{O}(m N) $ -
    AXPYs $ \mathcal{O}(m N^2) $ $ \mathcal{O}(m N) $ $ \mathcal{O}(m N) $ $ \mathcal{O}(m N) $
    $ \phi(\mathbf{H}_m) / c_k(\phi) $ $ \mathcal{O}(m N^3) + \mathcal{O}(m^2 N^2) $ $ \mathcal{O}(m^2) $ $ \mathcal{O}(m^2 N) $ $ \mathcal{O}(m \log m) $
     | Show Table
    DownLoad: CSV

    Table 2.  Memory requirements for the calculation of the matrix polynomial $ p_{\phi,m-1}^{(\mathrm{kr})}( \mathbf{L}) \mathrm{E}_W $ for the Krylov space methods $ \mathrm{kr} \in \{\mathrm{cbl},\mathrm{gbl},\mathrm{sbl},\mathrm{cheb}\} $

    Storage $ \rm cbl $ $ \rm gbl $ $ \rm sbl $ $ \rm cheb $
    $ \mathrm{Q}_k/ T_k( \mathbf{L}) \mathrm{E}_W $
    $ \mathbf{H}_m / c_k(\phi) $
    $ m n N $
    $ \mathcal{O}(m N^2) $
    $ m n N $
    $ \mathcal{O}(m) $
    $ m n $
    $ \mathcal{O}(m) $
    $ 2n $
    $ m $
     | Show Table
    DownLoad: CSV
  • [1] M. Belkin, T. Matveeva and P. Niyogi, Regularization and semi-supervised learning on large graphs, Learning Theory, Lecture Notes in Comput. Sci., Springer, Berlin, Heidelberg, 3120 (2004), 624-638. doi: 10.1007/978-3-540-27819-1_43.
    [2] L. Bergamaschi and M. Vianello, Efficient computation of the exponential operator for large, sparse, symmetric matrices, Numer. Linear Algebra Appl., 7 (2000), 27-45.  doi: 10.1002/(SICI)1099-1506(200001/02)7:1<27::AID-NLA185>3.0.CO;2-4.
    [3] R. Cavoretto, A. De Rossi and W. Erb., Partition of unity methods for signal processing on graphs, J. Fourier Anal. Appl., 27 (2021), Paper No. 66, 29 pp. doi: 10.1007/s00041-021-09871-w.
    [4] R. CavorettoA. De Rossi and W. Erb., GBFPUM - A MATLAB package for partition of unity based signal interpolation and approximation on graphs, J. Fourier Anal. Appl., 15 (2022), 25-34. 
    [5] R. Coifman and M. Maggioni., Diffusion wavelets, Appl. Comput. Harmonic Anal., 21 (2006), 53-94.  doi: 10.1016/j.acha.2006.04.004.
    [6] S. CuomoW. Erb and G. Santin, Kernel-Based Models for Influence Maximization on Graphs based on Gaussian Process Variance Minimization, J. Comput. Appl. Math., 423 (2023), 114951.  doi: 10.1016/j.cam.2022.114951.
    [7] R. A. DeVore and G. G. Lorentz, Constructive Approximation, Springer-Verlag, Berlin, Heidelberg, 1993.
    [8] S. Elsworth and S. Güttel, The block rational Arnoldi method, SIAM Journal on Matrix Analysis and Applications, 41 (2020), 365-388.  doi: 10.1137/19M1245505.
    [9] W. Erb, Graph signal interpolation with positive definite graph basis functions, Appl. Comput. Harmon. Anal., 60 (2022), 368-395.  doi: 10.1016/j.acha.2022.03.005.
    [10] W. Erb, Semi-supervised learning on graphs with feature-augmented graph basis functions, preprint, 2020, arXiv: 2003.07646.
    [11] W. Erb, Graph wedgelets: Adaptive data compression on graphs based on binary wedge partitioning trees and geometric wavelets, IEEE Transactions on Signal and Information Processing over Networks, 9 (2023), 24-34.  doi: 10.1109/tsipn.2023.3240899.
    [12] A. FrommerK. Lund and D. B. Szyld, Block Krylov subspace methods for functions of matrices, Electron. Trans. Numer. Anal., 47 (2017), 100-126.  doi: 10.1553/etna_vol47s100.
    [13] E. Gallopoulos and Y. Saad, Efficient solution of parabolic equations by Krylov approximation methods, SlAM J. Sci. Statist. Comput., 13 (1992), 1236-1264.  doi: 10.1137/0913071.
    [14] C. Godsil and G. Royle, Algebraic Graph Theory, Springer-Verlag, New York, 2001. doi: 10.1007/978-1-4613-0163-9.
    [15] M. H. Gutknecht, Block Krylov space methods for linear systems with multiple right-hand sides: An introduction, Modern Mathematical Models, Methods and Algorithms for Real World Systems, Anshan Ltd, 2006/2007,420-447.
    [16] M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential, SIAM J. Numer. Anal., 34 (1997), 1911-1925.  doi: 10.1137/S0036142995280572.
    [17] K. JbilouA. Messaoudi and H. Sadok, Global FOM and GMRES algorithms for matrix equations, Appl. Numer. Math., 31 (1999), 49-63.  doi: 10.1016/S0168-9274(98)00094-4.
    [18] R. I. Kondor and J. Lafferty, Diffusion kernels on graphs and other discrete input spaces, Proc. of the 19th. Intern. Conf. on Machine Learning ICML02, 2002,315-322.
    [19] J. Liesen and  Z. StrakosKrylov Subspace Methods: Principles and Analysis, Numerical Mathematics and Scientific Computation. Oxford University Press, Oxford, 2013. 
    [20] L. Lopez and V. Simoncini, Preserving geometric properties of the exponential matrix by block Krylov subspace methods, BIT, 46 (2006), 813-830.  doi: 10.1007/s10543-006-0096-6.
    [21] K. Lund, A New Block Krylov Subspace Framework with Applications to Functions of Matrices Acting on Multiple Vectors, Ph.D thesis, Temple University, 2018.
    [22] C. Musco, C. Musco and A. Sidford, Stability of the Lanczos method for matrix function approximation, Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM, Philadelphia, PA, 2018, 1605-1624. doi: 10.1137/1.9781611975031.105.
    [23] I. P. Natanson, Constructive Function Theory. Vol. I. Uniform Approximation, Frederick Ungar Publishing, New York, 1964.
    [24] B. N. Parlett, The Symmetric Eigenvalue Problem, Corrected reprint of the 1980 original. Classics in Applied Mathematics, 20. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1998. doi: 10.1137/1.9781611971163.
    [25] I. Z. Pesenson, Variational Splines and Paley-Wiener Spaces on Combinatorial Graphs, Constr. Approx., 29 (2009), 1-21.  doi: 10.1007/s00365-007-9004-9.
    [26] R. RifkinG. Yeo and T. Poggio, Regularized least-squares classification, Nato Science Series Sub Series III Computer and SystemsSciences, 190 (2003), 131-154. 
    [27] D. RomeroM. Ma and G. B. Giannakis, Kernel-Based Reconstruction of Graph Signals, IEEE Transactions on Signal Processing, 65 (2017), 764-778.  doi: 10.1109/TSP.2016.2620116.
    [28] Y. Saad, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 29 (1992), 209-228.  doi: 10.1137/0729014.
    [29] Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd Edition, SIAM, Philadelphia, 2003. doi: 10.1137/1.9780898718003.
    [30] T. Schmelzer, Block Krylov Methods for Hermitian Linear Systems, Master's thesis, University of Kaiserslautern, 2004.
    [31] B. Schölkopf and  A. SmolaLearning with Kernels, MIT Press, Cambridge, 2002. 
    [32] D. I. Shuman, Localized spectral graph filter frames: A unifying framework, survey of design considerations, and numerical comparison, IEEE Sig. Proc. Mag, 37 (2020), 43-63. 
    [33] D. I. ShumanB. Ricaud and P. Vandergheynst, Vertex-frequency analysis on graphs, Appl. Comput. Harm. Anal., 40 (2016), 260-291.  doi: 10.1016/j.acha.2015.02.005.
    [34] V. Simoncini and E. Gallopoulos, An iterative method for nonsymmetric systems with multiple right-hand sides, SIAM J. Sci. Comput., 16 (1995), 917-933.  doi: 10.1137/0916053.
    [35] V. Simoncini and E. Gallopoulos, Convergence properties of block GMRES and matrix polynomials, Linear Algebra Appl., 247 (1996), 97-119. 
    [36] A. Smola and R. I. Kondor, Kernels and regularization on graphs, Learning Theory and Kernel Machines, Springer, Berlin, Heidelberg, 2003,144-158.
    [37] D. E. Stewart and T. S. Leyk, Error estimates for Krylov subspace approximations of matrix exponentials, J. Comput. Appl. Math., 72 (1996), 359-369.  doi: 10.1016/0377-0427(96)00006-4.
    [38] L. N. Trefethen, Approximation Theory and Approximation Practice, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2013.
    [39] V. N. Vapnik, Statistical Learning Theory, Wiley, New York, 1998.
    [40] J. P. WardF. J. Narcowich and J. D. Ward, Interpolating splines on graphs for data science applications, Appl. Comput. Harmon. Anal., 49 (2020), 540-557.  doi: 10.1016/j.acha.2020.06.001.
    [41] X. Zhu, Semi-Supervised Learning with Graphs, Ph.D thesis, Carnegie Mellon University, 2005.
  • 加载中




Article Metrics

HTML views(884) PDF downloads(195) Cited by(0)

Access History

Other Articles By Authors



    DownLoad:  Full-Size Img  PowerPoint