A PROJECTED PRECONDITIONED CONJUGATE GRADIENT METHOD FOR THE LINEAR RESPONSE EIGENVALUE PROBLEM

Abstract. The linear response eigenvalue problem aims at computing a few smallest positive eigenvalues together with the associated eigenvectors of a special Hamiltonian matrix, and plays an important role in estimating the excited states of physical systems. A subspace version of the Thouless minimization principle was established by Bai and Li (SIAM J. Matrix Anal. Appl., 33:1075-1100, 2012), which characterizes the desired eigenpairs as its solution. In this paper, we propose a Projected Preconditioned Conjugate Gradient (PPCG_lrep) method to solve this subspace version of Thouless's minimization directly. We show that PPCG_lrep is an efficient implementation of the inverse power iteration and can be performed in parallel. It also enjoys several properties, including monotonicity and constraint preservation, in the Thouless minimization principle. Convergence of both the eigenvalues and the eigenvectors is established, and numerical experience on various problems is reported.

A further step towards Thouless's minimization principle (1.6) is made in [1], where a subspace version is developed, namely

  min { χ(U, V) := (1/2) tr(U^T K U + V^T M V) : U^T V = I_k },

where U, V ∈ R^{n×k}; its minimum value is λ_1 + λ_2 + · · · + λ_k. The condition (1.2) implies that both K and M in (1.4) are symmetric and positive definite [1], and computing the first k eigenpairs of (1.5) associated with λ_j for j = 1, 2, . . . , k is also referred to as the Linear Response Eigenvalue Problem (LREP) in this paper.
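As a quick numerical illustration of this minimization principle, the following NumPy sketch (our own illustrative code, not from the paper, with random symmetric positive definite stand-ins for K and M) checks that the eigenvalues of H come in ± pairs, that any biorthogonal pair (U, V) gives χ(U, V) ≥ λ_1 + · · · + λ_k, and that the bound is attained at a pair built from the decomposition K = YΛ²Y^T, M = XX^T with X = Y^{-T}:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 3

# Random symmetric positive definite stand-ins for K and M.
A = rng.standard_normal((n, n)); K = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)

# Eigenvalues of H = [[0, K], [M, 0]] come in +/- pairs +/-lambda_i,
# where lambda_i^2 are the eigenvalues of K*M.
H = np.block([[np.zeros((n, n)), K], [M, np.zeros((n, n))]])
lam = np.sort(np.sqrt(np.linalg.eigvals(K @ M).real))
assert np.allclose(np.sort(np.linalg.eigvals(H).real),
                   np.concatenate([-lam[::-1], lam]))

def chi(U, V):
    return 0.5 * np.trace(U.T @ K @ U + V.T @ M @ V)

# Decomposition M = X X^T, K = Y Lam^2 Y^T with X = Y^{-T}:
Lc = np.linalg.cholesky(M)                     # M = Lc Lc^T
w, Q = np.linalg.eigh(Lc.T @ K @ Lc)           # w = lambda^2, ascending
X, Y = Lc @ Q, np.linalg.solve(Lc.T, Q)

# The minimum of chi over U^T V = I_k is lam_1 + ... + lam_k, attained at
# U = X_1 Lam_1^{-1/2}, V = Y_1 Lam_1^{1/2}.
d = w[:k] ** 0.25
Uopt, Vopt = X[:, :k] / d, Y[:, :k] * d
assert np.allclose(Uopt.T @ Vopt, np.eye(k))
assert np.isclose(chi(Uopt, Vopt), lam[:k].sum())

# Any other biorthogonal pair can only do worse.
Vr = rng.standard_normal((n, k))
Ur = rng.standard_normal((n, k))
Ur = Ur @ np.linalg.inv(Vr.T @ Ur)             # enforce Ur^T Vr = I_k
assert chi(Ur, Vr) >= lam[:k].sum() - 1e-8
```

The construction of X and Y above is one convenient route (Cholesky of M plus a symmetric eigen-decomposition); the paper does not prescribe it.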
Apart from some direct methods [5,8] for obtaining the full eigenvalue decomposition of the matrix H in (1.1), several projection methods [2,3,10,13,14] have been proposed based on this extension of Thouless's minimization principle (1.7) for finding the desired eigenpairs. The notable work of Bai and Li [2] proposes the LO4DCG method, an efficient realization of the locally optimal block preconditioned CG that makes full use of the special structure of (1.5); LO4DCG is an improvement over the block 4-D steepest descent method introduced in [10]. Single-vector Lanczos-type methods are discussed in [3,13], and [14] considers a block Chebyshev-Davidson iteration to compute the desired eigenpairs of (1.5). All these algorithms follow the Rayleigh-Ritz projection technique used in traditional eigenvalue computations: each algorithm introduces its own subspace expansion procedure, but then projects the original LREP onto the resulting subspace to form a smaller LREP of the form (1.5) for the Ritz pairs.

XING LI, CHUNGEN SHEN AND LEI-HONG ZHANG
In this paper, we suggest a Projected Preconditioned Conjugate Gradient (PPCG_lrep) method to solve the subspace version of Thouless's minimization principle (1.7) directly. The proposed PPCG_lrep is indeed an efficient implementation of an alternative variables iteration for the minimization principle (1.7). PPCG_lrep iteratively computes a sequence of pairs (U_j, V_j) for j = 1, 2, . . . , preserves the biorthogonality U_j^T V_j = I_k, and ensures the monotonicity χ(U_{j+1}, V_{j+1}) ≤ χ(U_j, V_j). At iteration j, the k columns of either U_j or V_j can be computed by a projected preconditioned CG as a whole in a block manner, or computed separately in a parallel scheme. Moreover, if appropriate preconditioners of K and/or M are available, they can be incorporated to speed up PPCG_lrep. The linear convergence of (U_j, V_j) to the minimizer of Thouless's minimization principle (1.7) is proved under the gap assumption λ_k < λ_{k+1}, and the k approximate eigenpairs can finally be obtained by solving an LREP of size k. Numerical experience with PPCG_lrep, coded both in MATLAB and in the C language, on several problems with various choices of preconditioners is reported.
We organize the paper as follows. Relevant preliminary results are stated in Section 2. In Section 3, we first give the framework of the alternative variables iteration and then describe its implementation details, i.e., the PPCG_lrep method. The convergence analysis of PPCG_lrep is presented in Section 4. Our numerical evaluation of PPCG_lrep for the LREP, on both randomly generated problems and two practical problems from computational quantum chemistry, is conducted in Section 5, and conclusions are drawn in Section 6.
2. Preliminary results. To facilitate our discussion, we first collect several necessary properties of the LREP in Theorem 2.1; the reader can refer to [1, Section 2] and [15,16,17,18] for more details.

Theorem 2.1.
1. There exists a nonsingular Y ∈ R^{n×n} such that K = Y Λ² Y^T and M = X X^T, where Λ = diag{λ_1, λ_2, . . . , λ_n} and X = Y^{-T}.
2. The eigen-decompositions of KM and MK are KM = Y Λ² Y^{-1} and MK = X Λ² X^{-1}, respectively.
3. If K is also definite, then all λ_i > 0 and H is diagonalizable.
4. H is not diagonalizable if and only if λ_1 = 0, which happens when and only when K is singular.
5. The vector formed from the ith columns of Y and X (suitably scaled) is the eigenvector corresponding to λ_i, and it is unique if (a) λ_i is a simple eigenvalue of H, or (b) i = 1 and λ_1 = 0 < λ_2; in this case, 0 is a double eigenvalue of H but there is only one eigenvector associated with it.
The property (2.2) follows directly from (2.1), which implies that we can alternatively solve the LREP by solving either of the product eigenvalue problems in (2.2). It is worth noting that only the k smallest positive eigenvalues together with the associated eigenvectors are of interest, so we are concerned with the extreme eigenpairs of the product eigenvalue problems in (2.2).
By the Lagrangian multiplier theory, for a KKT pair (U, V) of the Thouless-type minimization principle (1.7), there exist two matrices Ξ ∈ R^{k×k} and Υ ∈ R^{k×k} such that KU = VΞ and MV = UΥ. The deflating subspace pair {𝒰, 𝒱} of {K, M} in [1] is essentially the one spanned by a KKT pair (U, V), satisfying K𝒰 ⊆ 𝒱 and M𝒱 ⊆ 𝒰. For any deflating subspace pair {𝒰, 𝒱} with

  dim(𝒰) = dim(𝒱) = k and 𝒰 ⊕ 𝒱^⊥ = R^n,  (2.3)

there is a KKT pair (U, V) such that R(U) = 𝒰 and R(V) = 𝒱, and vice versa. The solution of the Thouless-type minimization principle (1.7) corresponds precisely to the extreme positive deflating subspace pair at which the objective value χ(U, V) achieves its minimum.
Let U ∈ R^{n×k} and V ∈ R^{n×k} be basis matrices of the subspaces 𝒰 and 𝒱 satisfying (2.3); then W = U^T V is nonsingular. Factorize W as W = W_1^T W_2 with nonsingular W_1 and W_2 to obtain a structure-preserving projection matrix H_SR. Consequently, any eigenpair (λ̂, [ŷ; x̂]) of H_SR yields an eigenvalue λ̂ of H and the corresponding eigenvector of (1.5) with x = U W_1^{-1} x̂ and y = V W_2^{-1} ŷ.
2.2. Canonical angles. In our discussion of the convergence, we will use canonical angles and angles in the M-inner product. For two subspaces A and B of R^n with k = dim(A) ≤ dim(B) = ℓ, the angles θ_i(A, B) are defined recursively for i = 1, 2, . . . , k by [7]

  cos θ_i(A, B) = max { u^T v : u ∈ A, v ∈ B, ||u|| = ||v|| = 1, u^T u_s = v^T v_s = 0 for s = 1, . . . , i−1 } =: u_i^T v_i.

If A ∈ R^{n×k} and B ∈ R^{n×ℓ} are orthonormal basis matrices of A and B, respectively, and σ_1 ≤ · · · ≤ σ_k are the singular values of B^T A, then the k canonical angles θ_j(A, B) from A to B are

  0 ≤ θ_j(A, B) := arccos σ_j ≤ π/2 for 1 ≤ j ≤ k,  (2.8)

and we set Θ(A, B) = diag{θ_1(A, B), . . . , θ_k(A, B)}. Note that θ_1(A, B) ≥ · · · ≥ θ_k(A, B) and that the angles are independent of the choice of the orthonormal basis matrices A and B; therefore, no confusion can arise from writing Θ(A, B) with basis matrices in place of the subspaces themselves.
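The SVD characterization (2.8) is easy to exercise numerically. The sketch below (our own Python/NumPy code, not from the paper) computes the canonical angles of two random subspaces, verifies their independence of the chosen orthonormal bases, and checks the identity ||sin Θ(A, B)||_2 = ||B_⊥^T A||_2 used in the sequel:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, ell = 9, 2, 4

# Orthonormal bases of two random subspaces, dim(A) = k <= dim(B) = ell.
A, _ = np.linalg.qr(rng.standard_normal((n, k)))
B, _ = np.linalg.qr(rng.standard_normal((n, ell)))

# theta_j = arccos(sigma_j) with sigma_1 <= ... <= sigma_k the singular
# values of B^T A; all angles lie in [0, pi/2].
sigma = np.sort(np.linalg.svd(B.T @ A, compute_uv=False))
theta = np.arccos(np.clip(sigma, -1.0, 1.0))
assert np.all((theta >= 0) & (theta <= np.pi / 2))

# Independence of the chosen orthonormal bases:
A2, _ = np.linalg.qr(A @ rng.standard_normal((k, k)))  # another basis of R(A)
sigma2 = np.sort(np.linalg.svd(B.T @ A2, compute_uv=False))
assert np.allclose(sigma, sigma2)

# ||sin Theta(A, B)||_2 = ||B_perp^T A||_2:
Qf, _ = np.linalg.qr(B, mode='complete')
Bperp = Qf[:, ell:]                        # orthonormal complement of R(B)
assert np.isclose(np.sin(theta).max(),
                  np.linalg.svd(Bperp.T @ A, compute_uv=False)[0])
```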

The value ||sin Θ(A, B)||_2 defines a distance metric between A and B, and ||sin Θ(A, B)||_2 = ||B_⊥^T A||_2, where B_⊥ is an orthonormal basis matrix of the orthogonal complement of B. Now, for the symmetric and positive definite matrix M, we denote the M-inner product by ⟨x, y⟩_M = x^T M y. Generalizing the canonical angles given by (2.6) and (2.7) leads to the angles in the M-inner product, which we denote by θ_i(A, B)_M and Θ(A, B)_M = diag{θ_1(A, B)_M, . . . , θ_k(A, B)_M}. The canonical angles θ_i(A, B) and the angles θ_i(A, B)_M in the M-inner product are related as follows [7, Theorem 4.2]: if M = XX^T, then the angles between the subspaces A and B relative to the M-inner product coincide with the canonical angles between the subspaces X^T A and X^T B. Based on this connection, instead of the distance metric ||sin Θ(A, B)||_2 between A and B, we use the distance ||sin Θ(A, B)_M||_2.  (2.9)

3. The Projected Preconditioned Conjugate Gradient Method. In this section, we first present the framework of the alternative variables iteration and then discuss an efficient Projected Preconditioned Conjugate Gradient (PPCG_lrep) method for its implementation. We assume that K and M are symmetric positive definite from now on.
3.1. The framework of the alternative variables iteration. Relying on the minimization principle (1.7), a natural idea is to iterate alternately on U ∈ R^{n×k} and V ∈ R^{n×k} to improve the objective value χ(U, V), each with the other variable fixed. In particular, starting from an initial pair [U_0, V_0] ∈ R^{n×2k} satisfying U_0^T V_0 = I_k, Algorithm 3.1 provides the basic iteration step.
Algorithm 3.1 The framework of the alternative variables iteration for LREP
Given a pair [U_0, V_0] ∈ R^{n×2k} with U_0^T V_0 = I_k, the following iteration computes an approximate minimizer for the optimization problem (1.7).
1: j ← 0;
2: while not converged do
3:   U_{j+1} ← argmin {χ(U, V_j) : U^T V_j = I_k};  (3.1a)
4:   V_{j+1} ← argmin {χ(U_{j+1}, V) : U_{j+1}^T V = I_k};  (3.1b)
5:   j ← j + 1;
6: end while

The initial pair [U_0, V_0] ∈ R^{n×2k} can simply be U_0 = V_0 = [e_1, . . . , e_k], or more generally, U_0 = U(U^T V)^{-1} and V_0 = V for any full-column-rank U and V with rank(U^T V) = k. It is thus seen that the main computational burden in each outer iteration of Algorithm 3.1 is solving the two subproblems (3.1a) and (3.1b). Taking (3.1a) as an example, the Lagrangian multiplier theory says that there exists a matrix Ξ ∈ R^{k×k} satisfying

  KU − V_j Ξ = 0 and U^T V_j = I_k,  (3.2)

which leads to the solution of (3.1a)

  U_{j+1} = K^{-1} V_j (V_j^T K^{-1} V_j)^{-1}.  (3.3)

By a similar argument, we can easily see that the solution of (3.1b) is

  V_{j+1} = M^{-1} U_{j+1} (U_{j+1}^T M^{-1} U_{j+1})^{-1}.  (3.4)

Now, by substituting (3.3) into (3.4), the iteration from V_j to the next V_{j+1} is given by

  V_{j+1} = (KM)^{-1} V_j (V_j^T K^{-1} V_j)^{-1} (U_{j+1}^T M^{-1} U_{j+1})^{-1}.  (3.5)

One can easily see that exchanging the subproblems (3.1a) and (3.1b) in Algorithm 3.1 results in the analogous iteration formula (3.6) for the sequence {U_j}. An interesting observation from (3.5) or (3.6) is that our alternative variables iteration is essentially a special type of inverse power iteration for solving the product eigenvalue problems (2.2). However, we remark that the explicit iteration formula (3.5) is of little use from the computational point of view, as it would be too expensive to compute the matrix-matrix products and inverses; moreover, we will not rely upon (3.5) to get an approximate solution (using, for example, a Krylov subspace iteration to obtain an inexact V_{j+1}) either, because the decrease of the objective function χ(U, V) would then not be guaranteed.
Instead, in the next subsection, we will propose an implementation for obtaining the solution, or an inexact solution, which guarantees that (1) the objective function χ(U, V) is non-increasing, that is, χ(U_j, V_j) ≥ χ(U_{j+1}, V_{j+1}); (2) the constraint is preserved, i.e., U_{j+1}^T V_{j+1} = I_k; (3) the columns of U_{j+1} and V_{j+1} in (3.1a) and (3.1b), respectively, can be computed in parallel; and (4) each column of U_{j+1} and V_{j+1} can be computed by an efficient preconditioned CG iteration, so that only matrix-vector products are involved. We conclude this subsection by suggesting a stopping criterion for line 2 of Algorithm 3.1. Let (U_j, V_j) be the current iteration pair, which is an approximate KKT pair for Thouless's minimization principle (1.7). According to (2.3), (U_j, V_j) spans an approximate deflating subspace pair of {K, M}, which, by U_j^T V_j = I_k, (2.4) and (2.5), implies that the k positive eigenvalues (the Ritz values) of the projected matrix H_j in (3.8) serve as approximations for λ_1, . . . , λ_k. Based upon this observation, we can terminate the iteration if the relative residual error criterion (3.9) is fulfilled. Alternatively, to measure the accuracy of the Ritz values, we can choose a measure χ_r of their relative change and stop the iteration if χ_r ≤ ε_χ for a given tolerance ε_χ (3.10). In our numerical testing, the two measures are combined as the stopping criterion. The Ritz pairs corresponding to the k positive Ritz values of H_j in (3.8) at the final iteration j are used as the approximations for the LREP (1.5).
3.2. Solving the subproblems. We now discuss how to solve the subproblems (3.1a) and (3.1b) efficiently; the implementation results in our Projected Preconditioned Conjugate Gradient (PPCG_lrep) method. Take the former as an example. Using the condition U_j^T V_j = I_k, we first parameterize the constraint U^T V_j = I_k as in (3.11), where P_v ∈ R^{n×(n−k)} is a basis matrix of Null(V_j^T). It should be pointed out that, in our computation, P_v need not be formed explicitly. Substituting (3.11) into (3.1a) yields (3.12), and therefore we only need to consider the solutions of the reduced systems (3.13), each of which can be computed by the (preconditioned) CG iteration [9, Section 16.3]. To be precise, if W = P_v^T K̃ P_v is a preconditioner such that K̃ ≈ K is symmetric and positive definite and W^{−1/2} (P_v^T K P_v) W^{−1/2} ≈ I_{n−k}, then the preconditioned CG iteration [9, Algorithm 16.1] for (3.13) proceeds as Algorithm 3.2.
The (preconditioned) CG iteration enjoys many nice properties (e.g., [9, Chapter 5]); among them, the following proposition [9, Theorem 5.2] implies the monotonic decrease of the objective value.
Algorithm 3.2 Preconditioned CG for the reduced systems w.r.t. z_i
Starting from an initial point z_i^{(0)} ∈ R^{n−k} and a preconditioner W = P_v^T K̃ P_v, this preconditioned CG iteration solves (3.13) inexactly.
1: r^{(0)} ← (P_v^T K P_v) z_i^{(0)} − b_i; g^{(0)} ← W^{−1} r^{(0)}; d^{(0)} ← −g^{(0)}; ℓ ← 0;
2: while not converged do
3:   α ← ((r^{(ℓ)})^T g^{(ℓ)}) / ((d^{(ℓ)})^T (P_v^T K P_v) d^{(ℓ)});
4:   z_i^{(ℓ+1)} ← z_i^{(ℓ)} + α d^{(ℓ)}; r^{(ℓ+1)} ← r^{(ℓ)} + α (P_v^T K P_v) d^{(ℓ)};
5:   g^{(ℓ+1)} ← W^{−1} r^{(ℓ+1)};
6:   d^{(ℓ+1)} ← −g^{(ℓ+1)} + (((r^{(ℓ+1)})^T g^{(ℓ+1)}) / ((r^{(ℓ)})^T g^{(ℓ)})) d^{(ℓ)};
7:   ℓ ← ℓ + 1;
8: end while
(Here b_i denotes the right-hand side of the ith reduced system in (3.13).)

Indeed, Algorithm 3.2 can be taken one step further to yield a projected CG iteration (see, e.g., [9, Algorithm 16.2]), which works directly on systems with respect to the columns u_i of U. This can be realized by introducing new variables r_i, g_i and d_i for the residual, the projected residual and the search direction, respectively. With these new variables, we can work on the variable u_i with the corresponding preconditioner F_K = P_v W^{−1} P_v^T and with a special initial point u_i^{(0)} (i.e., corresponding to z_i^{(0)} = 0 in Algorithm 3.2); the iteration is summarized in Algorithm 3.3, in which the stopping criterion is based on (r_i^{(ℓ)})^T g_i^{(ℓ)}.

Algorithm 3.3 Projected preconditioned CG for the systems w.r.t. u_i
Given a preconditioner F_K = P_v (P_v^T K̃ P_v)^{−1} P_v^T, this preconditioned CG iteration obtains a vector that approximates the ith column of the solution U of (3.1a).

The sequences {u_i^{(ℓ)}} generated by Algorithm 3.3 are also the sequences generated by Algorithm 3.2 starting from the initial point z_i^{(0)} = 0, and it holds that (3.14).

Moreover, with the initial point u_i^{(0)} and by the relation (3.14), we know that, if u_i^{(ℓ)} is an inexact solution obtained from Algorithm 3.3, then V_j^T u_i^{(ℓ)} = e_i, which means that the constraint U^T V_j = I_k is preserved. In addition, by Proposition 1 and (3.14), the objective value decreases monotonically along the inner iteration, and as a result we have (3.15). One can easily see that the vectors u_i^{(ℓ)} for i = 1, 2, . . . , k generated by Algorithm 3.3 can also be obtained simultaneously by updating the corresponding matrices; this block version is stated as Algorithm 3.4. The other subproblem (3.1b) can be solved similarly. In fact, if we denote by P_u a basis matrix of Null(U_{j+1}^T) and by F_M = P_u (P_u^T M̃ P_u)^{−1} P_u^T a preconditioner, where M̃ ≈ M is symmetric and positive definite, then the subproblem (3.1b) can be solved by Algorithm 3.5.
Theorem 3.1. Suppose {U_j} and {V_j} are the sequences generated from Algorithm 3.1, where each U_j and V_j (j ≥ 1) is computed (inexactly) by Algorithms 3.4 and 3.5, respectively. Then for j = 0, 1, . . . ,

  U_{j+1}^T V_{j+1} = I_k and χ(U_{j+1}, V_{j+1}) ≤ χ(U_{j+1}, V_j) ≤ χ(U_j, V_j).

Proof. These conclusions follow by induction. Since U_0^T V_0 = I_k, by (3.15) and (3.16), we have U_1^T V_0 = I_k and χ(U_1, V_0) ≤ χ(U_0, V_0). Then, by a similar argument, the next V_1 generated by Algorithm 3.5 satisfies U_1^T V_1 = I_k and χ(U_1, V_1) ≤ χ(U_1, V_0).

Algorithm 3.5 Projected preconditioned CG for (3.1b)
Given a preconditioner F_M = P_u (P_u^T M̃ P_u)^{−1} P_u^T, this preconditioned CG iteration obtains an (inexact) solution V_{j+1} = V^{(ℓ)} of (3.1b).
For a detailed implementation of Algorithms 3.4 and 3.5, we make the following additional remarks:
1. There are many simple choices of the preconditioners F_K and F_M. For instance, we can take [K̃, M̃] = [I_n, I_n], [K̃, M̃] = [diag{|K_ii|}, diag{|M_ii|}], or block diagonal forms.
2. In computing f_K = F_K t and f_M = F_M t for a given vector or matrix t, we do not need to form the basis matrices P_v and P_u explicitly; instead, these products can be computed as in [9, Section 16.3].
3. The columns of U_{j+1} and V_{j+1} can be computed in parallel, in which case the preconditioners F_K and F_M can be reused for all i = 1, 2, . . . , k.
4. Convergence analysis. Assuming the subproblems (3.1a) and (3.1b) are solved exactly, we investigate the convergence of the alternative variables iteration Algorithm 3.1 in this section.

4.1. Accuracy of the deflating subspaces. Relying upon the explicit updating formula (3.5), we first estimate how fast R(V_j) and R(U_j) approach R(Y_1) and R(X_1), respectively, where R(Y_1) and R(X_1) are the eigenspaces associated with the k smallest eigenvalues of KM and MK, respectively, and Y = [Y_1, Y_2], X = [X_1, X_2] and Λ² = diag{Λ_1², Λ_2²} with Λ_1² = diag{λ_1², . . . , λ_k²}. Taking advantage of the equal roles of K and M in Thouless's minimization principle (1.7), we can focus on the accuracy of R(V_j) in approximating R(Y_1) first, and then apply the results to R(U_j). To this end, we will use the distance ||sin Θ(V_j, Y_1)_M||_2. For convenience, in the following discussion the subscripts are omitted, so that V_j and V_{j+1} are denoted simply by V and Ṽ, respectively.

By (2.9) and (2.1), for X = [X_1, X_2] with X_1 ∈ R^{n×k} and X_2 ∈ R^{n×(n−k)}, we can express ||sin Θ(V, Y_1)_M||_2 through S := X^T V, partitioned as S = [S_1^T, S_2^T]^T with S_1 ∈ R^{k×k}; this defines the quantity μ in (4.1). On the other hand, the analogous expression holds for Ṽ with S̃ := X^T Ṽ. By (3.5) and (2.1), S̃ can be written in terms of Λ^{−2} S, and thus, by (4.3), we arrive at (4.6) for the corresponding quantity μ̃. The values μ and μ̃ in (4.1) and (4.6), respectively, can be further simplified, as the following lemma shows.
where λ_max(·) stands for the largest eigenvalue of its matrix argument.

Proof. The identity (4.7) is obvious from (4.1). For (4.8), the expression follows from (4.6).

Using Lemma 4.1, we are able to give a relation between μ̃ and μ in Lemma 4.2, in which ζ := λ_{k+1}/λ_k ≥ 1.

Proof. By (4.7) in Lemma 4.1 and S^T S = S_1^T S_1 + S_2^T S_2, we obtain a bound on μ; similarly, a bound on μ̃ follows by (4.8). On the other hand, by the ascending order of the λ_i > 0, we have (4.11), where A ⪰ B means that A − B is positive semi-definite. Therefore, (4.11) and (4.10) together give a bound which consequently leads to (4.9), and the proof is completed.
We can now present our convergence result for the alternative variables iteration, Algorithm 3.1, which also reveals the numerical behavior of the PPCG_lrep method.
Remark 2. Theorem 4.3 implies that the larger ζ is, the smaller σ can be, and therefore the faster the convergence is. Moreover, it can be seen from the initial condition (4.13) that as ζ gets large (i.e., σ gets small), the basin of attraction for the local convergence becomes large, too.
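The predicted linear rate is easy to observe numerically. In the sketch below (our own NumPy code; a synthetic LREP with λ_i = i by construction, so ζ = λ_{k+1}/λ_k = 4/3 for k = 3), the exact alternative variables iteration drives ||sin Θ(V_j, Y_1)_M||_2 to zero with an asymptotic per-sweep factor close to (λ_k/λ_{k+1})² = 0.5625, consistent with the linear convergence asserted above:

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 15, 3

# Synthetic LREP with lambda_i = i, so zeta = lambda_4 / lambda_3 = 4/3.
Xm = rng.standard_normal((n, n))
Ym = np.linalg.inv(Xm).T
Lam = np.arange(1.0, n + 1.0)
K = Ym @ np.diag(Lam**2) @ Ym.T
M = Xm @ Xm.T

def sin_theta_M(V):
    # Angles in the M-inner product = canonical angles of X^T R(V) versus
    # X^T R(Y_1), and here X^T Y_1 = [I_k; 0].
    Qv, _ = np.linalg.qr(Xm.T @ V)
    s = np.linalg.svd(Qv[:k, :], compute_uv=False)
    return np.sqrt(max(1.0 - s.min() ** 2, 0.0))

U = V = np.eye(n)[:, :k]
errs = [sin_theta_M(V)]
for _ in range(40):                          # exact sweeps of (3.3)-(3.4)
    KiV = np.linalg.solve(K, V)
    U = KiV @ np.linalg.inv(V.T @ KiV)
    MiU = np.linalg.solve(M, U)
    V = MiU @ np.linalg.inv(U.T @ MiU)
    errs.append(sin_theta_M(V))

rates = [b / a for a, b in zip(errs, errs[1:])]
assert errs[-1] < 1e-6                       # linear convergence to R(Y_1)
assert abs(rates[20] - (3.0 / 4.0) ** 2) < 0.05
```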
Remark 3. Note that for an arbitrary basis pair (U, V) of (R(X_1), R(Y_1)) with U^T V = I_k, it is true [1, Appendix A] that χ(U, V) ≥ Σ_{i=1}^k λ_i, and strict inequality is possible; but as in [1, Appendix A], we can choose another basis pair (Ǔ, V̌) of (R(X_1), R(Y_1)) attaining the minimum. In other words, an arbitrary basis pair (U, V) of the deflating subspaces (R(X_1), R(Y_1)) with U^T V = I_k may not achieve the minimum Σ_{i=1}^k λ_i of Thouless's minimization principle (1.7); the solution to (1.7) is only a special basis of (R(X_1), R(Y_1)). Now, let (Û, V̂) be a limit point of the sequence {(U_j, V_j)}_{j=0}^∞, which by Theorem 4.3 satisfies (R(Û), R(V̂)) = (R(X_1), R(Y_1)). It might be true that χ(Û, V̂) > Σ_{i=1}^k λ_i, but this has no impact on our final task of solving the LREP (1.5), as the projected matrix H_j in (3.8) still recovers λ_1, . . . , λ_k, which are its k positive eigenvalues.
First, it is known [1, Theorem 4.1] that 0 ≤ λ_i ≤ λ̃_i for i = 1, 2, . . . , k, and hence Σ_{i=1}^k λ_i is always a lower bound for Σ_{i=1}^k λ̃_i; thus, we are interested in an upper bound for Σ_{i=1}^k λ̃_i. The following Theorem 4.5 is a Rayleigh-Ritz type perturbation theory for the LREP (see [18] for other types of Rayleigh-Ritz approximations for the LREP), which is of interest in its own right and also quantifies the accuracy of Σ_{i=1}^k λ̃_i in approximating the Thouless minimum.

Theorem 4.5. Let 0 < λ̃_1 ≤ · · · ≤ λ̃_k be the k positive eigenvalues of H_SR in (4.19), and suppose the assumptions (4.18) hold. Then for any sufficiently small ε ∈ [0, 1), we have (4.20).

Proof. Note from M = XX^T, K = (Y Λ)(Y Λ)^T and Y^T X = I_n that the assumptions (4.18) imply (4.21) and (4.22), where Ĉ = [C_1^T, C_2^T]^T ∈ R^{n×k} and C̃ = [C_3^T, C_4^T]^T ∈ R^{n×k}, both with orthonormal columns.

Since U^T V = I_k, Thouless's minimization principle (1.7) gives a lower bound; on the other hand, Σ_{i=1}^k λ̃_i is the sum of the k positive eigenvalues of H_SR given in (4.19). It has been shown [1, Theorem 2.9] that the Ritz values ±λ̃_i for i = 1, 2, . . . , k are invariant with respect to the choice of basis (Ŭ, V̆) of (R(U), R(V)) as long as Ŭ^T V̆ = I_k. Thus, we choose a new and special basis (Ŭ, V̆) for (R(U), R(V)) which satisfies Ŭ^T V̆ = I_k and (R(Ŭ), R(V̆)) = (R(U), R(V)); the nonsingularity of the matrix Ĉ^T Λ^{−1} C̃ involved follows from (4.21), (4.22) and U^T V = I_k. With this choice of basis, it follows from (4.23) that the traces tr(Ŭ^T K Ŭ) and tr(V̆^T M V̆) can be estimated. To this end, we first notice, using Ĉ^T Ĉ = I_k and ||C_2||_2 ≤ ε, a bound on tr(V̆^T M V̆). For tr(Ŭ^T K Ŭ), consider the middle matrix in (4.27): for any sufficiently small ε, it admits a bound which, by Lemma 4.4 and (4.27), yields the desired estimate. Finally, for any sufficiently small ε and any i = 1, 2, . . . , k, we obtain (4.30).
As revealed by Theorem 4.5, similar to the Hermitian eigenvalue problem, the accuracy of the sum of the k positive Ritz values is proportional to the square of the accuracy of the deflating subspaces. As ||sin Θ(V, Y_1)_M||_2 and ||sin Θ(U, X_1)_K||_2 converge to zero linearly with factor √σ (by Theorem 4.3), we know that Σ_{i=1}^k λ̃_i converges to Σ_{i=1}^k λ_i with factor σ.

Corollary 1. Suppose {U_j} and {V_j} are the sequences generated by Algorithm 3.1 with U_0 and V_0 satisfying (4.15) and (4.13), respectively. Assume ζ = λ_{k+1}/λ_k > 1 and let σ be arbitrary with 1/ζ⁴ ≤ σ < 1. Then for sufficiently large j, the k positive eigenvalues λ̃_i, i = 1, 2, . . . , k, of H_j defined by (3.8) satisfy (4.33).

Proof. According to Theorem 4.3, we know the convergence rate of the deflating subspaces; now use Theorem 4.5 to conclude (4.33) for any sufficiently large j.
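The lower-bound property recalled in this section (0 ≤ λ_i ≤ λ̃_i, [1, Theorem 4.1]) can also be observed directly. The following NumPy sketch (our own, with random SPD stand-ins for K and M) draws random biorthogonal pairs and checks the bound on the Ritz values of the projected problem:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 10, 3

A = rng.standard_normal((n, n)); K = A @ A.T + n * np.eye(n)
B = rng.standard_normal((n, n)); M = B @ B.T + n * np.eye(n)
lam = np.sort(np.sqrt(np.linalg.eigvals(K @ M).real))

# For any biorthogonal (U, V), the k positive Ritz values satisfy
# lam_tilde_i^2 = eig((U^T K U)(V^T M V)) and lam_i <= lam_tilde_i.
for _ in range(5):
    V = rng.standard_normal((n, k))
    U = rng.standard_normal((n, k))
    U = U @ np.linalg.inv(V.T @ U)                 # enforce U^T V = I_k
    lam_t = np.sort(np.sqrt(
        np.linalg.eigvals((U.T @ K @ U) @ (V.T @ M @ V)).real))
    assert np.all(lam[:k] <= lam_t + 1e-8)         # eigenvalue lower bound
    assert lam[:k].sum() <= lam_t.sum() + 1e-8     # bound on the sum
```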

5. Numerical experiments. In this section, we evaluate our proposed PPCG_lrep method from several aspects. In the next subsection, we first show that PPCG_lrep is a much more efficient implementation of the inverse power iteration (3.5)-(3.6) than the straightforward one in which the linear systems in (3.5) and (3.6) are solved by the preconditioned CG (PCG); our next goal is to test several types of preconditioners for K and M. Both of these tasks are carried out in MATLAB (R2011b) on made-up LREPs with randomly generated dense matrices and with sparse matrices from the University of Florida Sparse Matrix Collection [4]. Our final goal is to evaluate the parallelization capability of PPCG_lrep when the columns of U_j and V_j are computed in parallel. For that purpose, we code PPCG_lrep in the C language with the inner Algorithms 3.2 and 3.3 parallelized by OpenMP, and test two practical problems arising from TDDFT. All of our tests are conducted on a PC running Linux with an Intel Core i5-3230M CPU (2.6 GHz) and 4 GB of memory.

5.1. Numerical tests for dense random and sparse made-up problems.
Our reported numerical experiments with PPCG_lrep (using the block implementation via Algorithms 3.4 and 3.5 to compute (U_j, V_j)) are based on several types of preconditioners, namely, Iden, Diag, Chol, IChol, and SSOR. Specifically, Iden and Diag are diagonal preconditioners using the identity I_n and the diagonal matrices of K and M, respectively; Chol and IChol are triangular preconditioners resulting from the Cholesky factorization in the dense case and the incomplete Cholesky factorization in the sparse case, respectively; SSOR uses the symmetric successive over-relaxation method as the preconditioner. For the stopping criterion, we terminate Algorithm 3.1 whenever either (3.9) or (3.10) is satisfied, where both ε_r and ε_χ are set to 10^{-6}. The initial U_0 and V_0 are simply [e_1, . . . , e_k], and the maximum numbers of outer and inner iterations are chosen to be 100 and 500, respectively. In addition, we vary the tolerance (cg_tol) of the inner CG from 10^{-5} to 10^{-10} in order to investigate the overall performance of PPCG_lrep with approximations of (3.1a) and (3.1b) of various accuracies. For our first test example, we use a pair of matrices (K, M) of order n = 500 generated randomly by (5.1) in MATLAB. Our goal is to compute the 4 smallest positive eigenvalues 0 < λ_1 ≤ λ_2 ≤ λ_3 ≤ λ_4. In Table 5.1, we compare the performance of PPCG_lrep with the straightforward approach that applies the preconditioned CG (PCG) to the subproblems (3.1a) and (3.1b). The numerical results listed in Table 5.1 clearly indicate that PPCG_lrep is a much more efficient implementation of (3.3) and (3.4) than the direct way by PCG. For more detail, in Table 5.2 we report the relative eigenvalue errors (RESeig), the total number of inner CG iterations (inner iter.) and the number of outer loop iterations (outer iter.) of PPCG_lrep.
The relative error RESeig measures the accuracy of the sum of the approximate eigenvalues (Ritz values) λ̃_i^{(j)} for i = 1, . . . , k at the jth iteration, and is defined as the relative error of this sum, where the reference values λ_i^exact for i = 1, . . . , k are computed by the MATLAB function eig applied to the matrix H given in (1.4).
We observe from Table 5.2 that (1) the number of outer iterations remains roughly the same (about 25) for the different preconditioners, but the efficiency, in terms of CPU time and the number of inner CG steps, depends largely on the specific preconditioner; and (2) the relative eigenvalue error RESeig (i.e., the accuracy of the outer loop) decreases when the relevant systems are solved more accurately by the inner CG loop; in particular, for Iden and Chol, the relative error RESeig is roughly of the same order as the tolerance cg_tol of the inner CG. In Figure 5.1, we further demonstrate the history of the relative error of each individual eigenvalue λ̃_i^{(j)} for i = 1, 2, 3, 4 with respect to the outer iteration j. It is observed that the various preconditioners perform differently for the individual eigenvalues. Next, we test PPCG_lrep on two sparse problems, where the matrices K and M are from the University of Florida Sparse Matrix Collection [4]. The features of these matrices are presented in Table 5.3; in the case when the two matrices from the collection have different dimensions, we extract the leading principal submatrix of the larger one so that K and M are of equal size. Moreover, in this test, we set the tolerance of the inner CG to cg_tol = 10^{-8} and compute the 4 smallest positive eigenvalues. The computational results of PPCG_lrep are listed in Table 5.4, and we observe that IChol is a good preconditioner for PPCG_lrep on these test problems. As the final part of this subsection, we extend our numerical evaluation of PPCG_lrep by comparing it with another gradient-type method, namely, the block 4-D steepest descent algorithm (B4dSD) proposed in [10].
In this test, we set the tolerance cg_tol = 10^{-6} and ε_r = ε_χ = 10^{-8} for PPCG_lrep, while we terminate B4dSD whenever the number of iterations exceeds 5000, or each relative residual associated with (1.5) at the approximate solution (λ̃_i, z̃_i) satisfies

  ||H z̃_i − λ̃_i z̃_i||_1 / (||H||_1 ||z̃_i||_1 + |λ̃_i| ||z̃_i||_1) ≤ 10^{-8}, i = 1, 2, . . . , k.

To demonstrate the performance, besides the test matrices of the form (5.1), we use another kind of random symmetric and positive definite matrices,

  [Q, ~] = qr(randn(n)); D = diag(rand(n,1) + 0.01); K = Q' * D * Q,  (5.2)

for K, and similarly for M. Tables 5.5 and 5.6 summarize parts of our experiments for (5.1) and (5.2), respectively. Note that for B4dSD, various types of preconditioners, including the identity (Iden), the Cholesky (Chol) and the CG (CG) ones, are used in the tests, and we refer to [10] for the details of B4dSD as well as of the preconditioning technique. One can see that PPCG_lrep can be more efficient on these randomly generated problems; in particular, for the type (5.1), the condition numbers of K and M are generally large and PPCG_lrep achieves more accurate solutions (with smaller RESeig) than B4dSD.

Both versions of PPCG_lrep are then applied to solve two LREPs for computing the optical spectra of the Na_2 sodium cluster and of silane (SiH_4). The test matrices are obtained from the plane-wave pseudopotential turbo_TDDFT code, which is part of the Quantum ESPRESSO (QE) package. In particular, the dimensions of the Na_2 and SiH_4 problems are 1864 and 5660, respectively, and we set the stopping criteria ε_r = ε_χ = 10^{-8} for both versions to compute the first k = 4 smallest positive eigenvalues. The numerical results are displayed in Table 5.7. Although these are preliminary numerical tests for the parallel version PPCGp_lrep, they show its improvements over the block version PPCGb_lrep, which is based on level-3 BLAS operations.