\`x^2+y_1+z_12^34\`
Advanced Search
Article Contents
Article Contents

On exact computational complexity of triangular factorization algorithms for general banded matrices

  • *Corresponding author: Liyong Zhu

    *Corresponding author: Liyong Zhu

The second author is supported by [the National Natural Science Foundation of China (No.11871091 and 91730305)].

Abstract / Introduction Full Text(HTML) Figure(8) / Table(6) Related Papers Cited by
  • Matrix triangular factorizations, as an intermediate step of some algorithms, are widely employed to solve scientific and engineering problems. However, there is no exact and explicit expressions on the computational complexity of triangular factorizations for general banded matrices in literatures so far. In this paper, specific and detailed descriptions on triangular factorization algorithms are presented for general banded matrices, and then by carefully dividing matrix into special blocks and with the help of the mathematical software "Maple", exact and explicit expressions on the computational complexity of these algorithms are rigorously derived. These theoretical results are helpful for calculating the computational complexity of numerical algorithms that employ triangular factorizations, and also provide guidance for choosing appropriate algorithms for specific problems. Numerical experiments validate the obtained theoretical results.

    Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.

    Citation:

    \begin{equation} \\ \end{equation}
  • 加载中
  • Figure 1.  Block division for general band Doolittle factorization

    Figure 2.  Blocks division for band Cholesky factorization

    Figure 3.  Nonzero entries of different scale matrices with same bandwidth(one blue point corresponding to one nonzero element)

    Figure 4.  Runtime of different scale matrices with same bandwidth

    Figure 5.  Nonzero entries of same scale matrices with different bandwidth

    Figure 6.  Runtime of same scale matrices with different bandwidth

    Figure 7.  Nonzeros of different scale matrices with different bandwidth

    Figure 8.  Runtime of different scale matrices with different bandwidth

    Table 1.  Subscript ranges of the sub-blocks in Figure 1

    $ k $ $ j $ $ t $
    $ U_1 $ $ 2,3,...,r $ $ k,k+1,...,s $ $ 1,2,...,k-1 $
    $ U_2 $ $ 2,3,...,r $ $ s+1,s+2,..., s+k $ $ j-s, j-s+1,..., k-1 $
    $ U_3 $ $ r+1, r+2,..., n-s $ $ k+s-r, k+s-r+1,..., k+s $ $ j-s, j-s+1,..., k-1 $
    $ U_4 $ $ n-s+1, n-s+2,...,n-s+r $ $ k+s-r, k+s-r+1,..., n $ $ j-s, j-s+1,..., k-1 $
    $ U_5 $ $ r+1, r+2,..., n-s+r $ $ k, k+1,..., k+s-r $ $ k-r, k-r+1,..., k-1 $
    $ U_6 $ $ n-s+r+1, n-s+r+2,..., n $ $ k, k+1,..., n $ $ k-r, k-r+1,..., k-1 $
    $ k $ $ i $ $ t $
    $ L_1 $ $ 2, 3,..., r $ $ k+1, k+2,..., r $ $ 1,2,...,k-1 $
    $ L_2 $ $ 2,3,...,r $ $ r+1, r+2,..., r+k $ $ i-r, i-r+1,..., k-1 $
    $ L_3 $ $ r+1, r+2,..., n-r $ $ k+1, k+2,..., k+r $ $ i-r, i-r+1,..., k-1 $
    $ L_4 $ $ n- r+1, n- r+2,..., n $ $ k+1, k+2,..., n $ $ i-r, i-r+1,..., k-1 $
     | Show Table
    DownLoad: CSV

    Table 2.  Testbeds for our experiments

    Item Specification of CPU
    Model Intel(R) Core(TM) i5-7200U
    Clock rate(Hz) 2.5-2.7G
    Memory(GB) 8
     | Show Table
    DownLoad: CSV

    Table 3.  Different problems used for testing algorithms

    Name $ n $ $ r $ $ s $ $ rsn $ Symm Application
    b10 10 2 3 $ 6.00\times 10^1 $ no optimization problem
    olm100 100 2 3 $ 6.000\times 10^2 $ no computational fluid dynamics problem
    olm500 500 2 3 $ 3.000\times 10^3 $ no computational fluid dynamics problem
    olm1000 1000 2 3 $ 6.000\times 10^3 $ no computational fluid dynamics problem
    olm2000 2000 2 3 $ 1.200\times 10^4 $ no computational fluid dynamics problem
    olm5000 5000 2 3 $ 3.000\times 10^4 $ no computational fluid dynamics problem
    sherman1 1000 100 100 $ 1.000\times 10^7 $ yes computational fluid dynamics problem
    sherman2 1080 221 221 $ 5.275\times 10^7 $ no computational fluid dynamics problem
    sherman4 1104 368 368 $ 1.495\times 10^8 $ no computational fluid dynamics problem
    bcsstm07 420 47 47 $ 9.278\times 10^5 $ yes structural problem
    bcsstm10 1086 37 37 $ 1.487\times 10^6 $ yes structural problem
    bcsstm34 588 95 95 $ 5.307\times 10^6 $ yes structural problem
    bcsstm35 30237 5 5 $ 7.559\times 10^5 $ yes structural problem
    bcsstm37 25503 5 5 $ 6.376\times 10^5 $ yes structural problem
     | Show Table
    DownLoad: CSV

    Table 4.  Runtime of different scale matrices with same bandwidth

    Name b10 olm100 olm500 olm1000 olm2000 olm5000
    Runtime/$ s $ 0.0025 0.014 0.065 0.13 0.26 0.70
     | Show Table
    DownLoad: CSV

    Table 5.  Runtime of same scale matrices with different bandwidth

    Name olm1000 bcsstm10 sherman1 sherman2 sherman4
    Runtime/$ s $ 0.13 12.97 78.06 414.45 1133.61
     | Show Table
    DownLoad: CSV

    Table 6.  Runtime of different scale matrices with different bandwidth

    Name bcsstm07 bcsstm34 bcsstm35 bcsstm37
    Runtime/$ s $ 8.25 52.23 10.45 8.78
     | Show Table
    DownLoad: CSV
  • [1] C. W. Cryer, The solution of a quadratic programing problem using systematic over-relaxation, SIAM J. Control, 9 (1971), 385-392. 
    [2] T. A. Davis and Y.-F. Hu, The university of florida sparse matrices collection, ACM Trans. Math. Software, 38 (2011), 25pp.  doi: 10.1145/2049662.2049663.
    [3] J. W. Demmel, Applied Numerical Linear Algebra, Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 1997. doi: 10.1137/1.9781611971446.
    [4] M. Faverge, J. Herrmann, J. Langou, et al., Mix LU and QR factorization algorithms to design high-performance dense linear algebra solvers, J. Parallel and Distrib Comput., 85 (2015), 32-46.
    [5] G. H. Golub and  C. F. V. LoanMatrix Computations, 3$^{rd}$ edition, Johns Hopkins University Press, Baltimore, 1996. 
    [6] S. Kapoor, D. Vigh, H. Li, et al., Full waveform inversion for detialed velocity model building, in 74th EAGE Conference and Exhibition, Copenhagen Europen Press, 2012.
    [7] X. S. Li, An overview of SuperLU:Algorithms, implementation, and user interface, ACM Trans. Math. Software, 31 (2005), 302-325.  doi: 10.1145/1089014.1089017.
    [8] E. J. PlaskaczM. R. Ramirez and S. Gupta, Nonlinear explicit transient finite element analysis on the Intel Delta, Comput. Syst. Eng., 5 (1994), 1-17. 
    [9] D. M. SmithEngineering Computation with Matlab, 2$^{nd}$ edition, Pearson higher education, Boston, 2009. 
    [10] P. Soudais, Iterative solution of a 3-D scattering problem from arbitrary shaped multidielectric and multiconducting bodies, EEE Transanctions on Antennas and Propagation, 42 (1994), 954-959. 
    [11] Ta rantola, Inversion of seismic reflection data in the acoustic approximations, Geophysics, 49 (1984), 1259-1266. 
    [12] Q. J. YanNumerical analysis, 4$^{th}$ edition, Beihang University Press, Beijing, 2012. 
  • 加载中

Figures(8)

Tables(6)

SHARE

Article Metrics

HTML views(1600) PDF downloads(330) Cited by(0)

Access History

Other Articles By Authors

Catalog

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return