A bundle scheme for solving arbitrary eigenvalue optimization problems

Optimization involving eigenvalues arises in a wide spectrum of applications in various domains, such as physics, engineering, statistics and finance. In this paper, we consider arbitrary eigenvalue minimization problems over an affine family of symmetric matrices, which involve a special class of eigenvalue functions, the D.C. functions $\lambda_l$. An explicit proximal bundle approach for solving this class of nonsmooth, nonconvex (D.C.) optimization problems is developed. We prove the global convergence of our method, in the sense that every accumulation point of the sequence of iterates generated by the proposed algorithm is stationary. As an application, we use the proposed bundle method to solve some particular eigenvalue problems. Encouraging preliminary numerical results indicate that our method can solve the test problems efficiently.


1. Introduction. During the past decade, eigenvalue optimization problems have received remarkable attention in many application fields. Among the plethora of applications of eigenvalues in mathematics and engineering, one can mention numerical analysis, structural design, quantum mechanics and system dynamics (physical, chemical and biological models). Singular values and the condition number of a matrix are also defined in terms of eigenvalues. Eigenvalue optimization is an important testing ground for nonsmooth optimization theory. Optimization problems involving the eigenvalues of a real symmetric matrix arise in many applications, from engineering design to graph partitioning; two extensive surveys are [19,37].
The optimization of eigenvalues of a symmetric matrix is a classical subject in applied and numerical analysis. For problems involving eigenvalues, the task is in general more than simply computing the eigenvalues of a given matrix $A$.
To obtain fast convergence, our interest here is the nonconvex arbitrary eigenvalue optimization problem. Very little research dealing with the general (arbitrary) eigenvalue case has been presented. The aim of this paper is to formulate a general implementable algorithm for solving the following model problem
$$\min_{x\in\mathbb{R}^m} \lambda_l(A(x)), \qquad (1)$$
and the corresponding constrained optimization problem (2), where $\lambda_l(\cdot)$ is the $l$-th largest eigenvalue function of $A(x)$, and the mapping $A : \mathbb{R}^m \to S^n$ is a symmetric matrix depending smoothly on the variable $x$. We assume $A(x) = A_0 + \mathcal{A}\cdot x$ is affine: $\mathcal{A}$ is a linear operator mapping the vector $x$ to a matrix, defined by $\mathcal{A}\cdot x := \sum_{i=1}^{m} x_i A_i$, where $A_i$, $i = 0, 1, \cdots, m$, are given real $n\times n$ symmetric matrices. We suppose the eigenvalues are sorted in decreasing order $\lambda_1(A(x)) \ge \lambda_2(A(x)) \ge \cdots \ge \lambda_n(A(x))$. Note that programs of the form (2) may be transformed into (1) using exact penalization, so for simplicity we only need to design the method for the unconstrained problem (1). Generally speaking, one of the main difficulties in the numerical analysis of such problems is that the eigenvalues $\lambda_l(A(x))$, considered as functions of a symmetric matrix, can be nonsmooth, especially at points where $\lambda_l(A(x))$ is a multiple eigenvalue, which are precisely the points of utmost interest. Consequently, we cannot directly apply smooth optimization techniques. A gradient-based numerical method (e.g., Newton's method) may run into trouble when hitting those points, and theoretical analysis becomes difficult without differentiability. Motivated by the above, the purpose of this paper is to introduce an algorithm for solving the arbitrary eigenvalue optimization problem. In this work we describe an approach that solves arbitrary eigenvalue problems using the proximal bundle method.
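To fix notation, the affine family and the $l$-th largest eigenvalue above can be evaluated directly. The following minimal Python sketch does this with NumPy; the helper names `affine_matrix` and `lambda_l` are ours, not from the paper:

```python
import numpy as np

def affine_matrix(x, A0, As):
    """A(x) = A_0 + sum_{i=1}^m x_i A_i."""
    return A0 + sum(xi * Ai for xi, Ai in zip(x, As))

def lambda_l(x, l, A0, As):
    """l-th largest eigenvalue of A(x) (l is 1-based)."""
    w = np.linalg.eigvalsh(affine_matrix(x, A0, As))   # ascending order
    return w[::-1][l - 1]                              # sort descending, pick l-th

# tiny instance: n = 2, m = 1
A0 = np.diag([3.0, 1.0])
As = [np.array([[0.0, 1.0], [1.0, 0.0]])]
print(lambda_l(np.array([0.0]), 1, A0, As))   # largest eigenvalue of A0
```

Note that `eigvalsh` returns eigenvalues in ascending order, so the array is reversed before indexing.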
The idea of using bundle methods in nonsmooth optimization dates back to the 1970s [18,39], where they were introduced for studying convex nondifferentiable functions. To the best of our knowledge, up to now there has been no successful method for computing this arbitrary eigenvalue (D.C.) problem. Therefore, this paper can be regarded as a novel technique which uses the corresponding structural information to deal with this special class of eigenvalue functions.
Next, to handle the arbitrary eigenvalue function, we study a phenomenon of intrinsic interest in eigenvalue optimization. Let $f_k(\cdot)$ be the sum of the $k$ largest eigenvalues. The classical eigenvalue optimization problem is then often formulated as
$$\min_{x\in\mathbb{R}^m} f_k(A(x)) := \sum_{j=1}^{k}\lambda_j(A(x)), \quad k = 1,\cdots,n. \qquad (3)$$
It is not hard to verify that $\lambda_1(\cdot)$ (the largest eigenvalue function) is convex and $\lambda_n(\cdot)$ is concave; and, for $l = 2,\cdots,n-1$, $\lambda_l(\cdot)$ is the difference of two convex functions, which can be written as $\lambda_l(A(x)) = f_l(A(x)) - f_{l-1}(A(x))$. Besides, by Ky Fan's maximum principle [3,7], $f_k(A(x))$ is a positively homogeneous and Lipschitz continuous convex function, so the arbitrary eigenvalue function is in general a nonconvex, positively homogeneous and Lipschitz continuous function (in fact the Lipschitz constant is 1). The function $\lambda_l(\cdot)$ can thus be written as a difference $h_1 - h_2$ of two convex functions, where $h_1, h_2$ are proper convex functions on $\mathbb{R}^n$, and is called a D.C. function. At optimal solutions of (3) the eigenvalues of the optimal matrix tend to coalesce at a minimum point $x^*$, and the multiplicity of $\lambda_k(A(x^*))$ can be greater than two. This clustering phenomenon plays a central role in eigenvalue optimization, and the function $f_k$ is then nondifferentiable at $x^*$. The reader can refer to Overton and Womersley (1993, [26]) and Hiriart-Urruty and Ye (1995, [12]) for characterizations of the subdifferential of $f_k$; a more general treatment of computing subdifferentials of functions of eigenvalues can be found in [19].
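The D.C. decomposition $\lambda_l = f_l - f_{l-1}$ is easy to verify numerically. A small sketch, where the function names `f_k` and `lambda_l_dc` are illustrative:

```python
import numpy as np

def f_k(M, k):
    """Ky Fan function: sum of the k largest eigenvalues of a symmetric M (convex in M)."""
    w = np.linalg.eigvalsh(M)          # ascending order
    return w[::-1][:k].sum()

def lambda_l_dc(M, l):
    """l-th largest eigenvalue written as the D.C. difference f_l - f_{l-1}."""
    return f_k(M, l) - (f_k(M, l - 1) if l > 1 else 0.0)

# check the decomposition on a random symmetric matrix
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
M = (B + B.T) / 2
w = np.sort(np.linalg.eigvalsh(M))[::-1]
for l in range(1, 5):
    assert np.isclose(lambda_l_dc(M, l), w[l - 1])
```

Each $f_k$ is convex, so the sketch exhibits $\lambda_l$ as a difference of two convex functions, exactly as in the decomposition above.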
We note that the arbitrary eigenvalue functions have a special structure, the D.C. form, which makes effective algorithms for such problems possible under mild conditions. When $l = 1$, the maximum eigenvalue function $\lambda_1(A(x))$ has been studied from a theoretical point of view in [24,25,32,33]; for the sum of the $k$ largest eigenvalues, we refer to [12,26,27]. Many practical problems derive from the arbitrary eigenvalue problem. For instance, when $l = n-1$, $\lambda_l$ becomes the second smallest eigenvalue, which was used to solve the fastest mixing Markov process on a graph by Boyd et al. in 2006 [34], and for homogeneous quasi-linear elliptic operators by Robinson [30]. Friedland et al. proposed inverse eigenvalue problems [8]; Cullum et al. solved graph partitioning problems [6]; low rank matrix optimization can also be considered an arbitrary eigenvalue problem [9]; Polak and Wardi presented structural optimization problems [28,29]. Torki studied the arbitrary eigenvalue function by way of epi-differentiability [35,36]. In 1995 Hiriart-Urruty and Ye studied the arbitrary eigenvalue [12], presenting a first-order sensitivity analysis of all eigenvalues of a symmetric matrix.
Approaches for solving the convex maximum eigenvalue optimization (i.e., $l = 1$) can be divided into interior point methods and nonsmooth optimization methods. Interior-point methods for convex eigenvalue optimization and semidefinite programming gained great interest in [2,13,23]; most of the interior-point schemes proposed in the early 1990s were path-following or potential reduction methods [1,14,21,22]. There has also been a renewed interest in first order subgradient methods for semidefinite programming (SDP) and eigenvalue optimization, see [10,17]. But when $2\le l\le n$, little work has studied implementable algorithms in detail. For this reason, studying an implementable algorithm for (1) becomes all the more meaningful. Our primary contribution is an efficient and fast algorithm with global convergence. To the best of our knowledge, this is the first time that such a method for solving the arbitrary eigenvalue problem has been presented in the literature. The main purpose of this paper is to take full advantage of the special structure of the objective function and design a specialized bundle method for the nonconvex arbitrary eigenvalue optimization (1). The main motivation is to show that, using ideas from nonsmooth analysis, we can design a proximal bundle method for the nonconvex arbitrary eigenvalue problem. In addition, global convergence of the proximal bundle method can be proved, in the sense that the generated sequence tends to a stationary point of the arbitrary eigenvalue function.
The rest of the paper is organized as follows. In Section 2, we present the bundle method for the convex maximum eigenvalue. The corresponding proximal bundle algorithm for the arbitrary (D.C.) eigenvalue function is established in Section 3. The main results are given in Section 4: for the arbitrary eigenvalue function $\lambda_l$, we design an implementable proximal bundle algorithm with aggregate subgradients. In Section 5 we present the convergence result: our method is both readily implementable and globally convergent in the sense that all its accumulation points are stationary. As an important application, we use the proposed method to solve a class of eigenvalue problems and report numerical results in Section 6. Finally, we present some concluding remarks and point out possible research topics.
2. The proximal bundle algorithm for the convex maximum eigenvalue. In this section, we present the basic ideas of the general proximal bundle method of Kiwiel [17] in the convex case, i.e., $f = \lambda_1$.
Our aim is to produce a sequence $\{x^k\}_{k=1}^{\infty}\subset\mathbb{R}^m$ converging to some minimizer of the problem. Consider the minimization of a finite-valued convex function $\lambda_1 : \mathbb{R}^m\to\mathbb{R}$. Although $\lambda_1$ may be nonsmooth, it is well known that its Moreau-Yosida regularization $f_\mu$ is a Lipschitz continuously differentiable convex function with the same set of minimizers as $\lambda_1$ [11]. The unique minimizer $\hat{x}$, called the proximal point of $x$ in (4), satisfies the optimality condition of (4) and hence represents a step from $x$ along the negative gradient of $f_\mu$, with $\mu^{-1}$ as the step length. The proximal point algorithm for minimizing $f$ then minimizes $f_\mu$ by updating the current point $x$ to its proximal point $\hat{x}$. The idea behind the proximal bundle algorithm is to approximate $\hat{x}$ by replacing the term $\lambda_1(y)$ on the right-hand side of (4) with a cutting plane model based on the subdifferential [11,31] of $\lambda_1$. Suppose that the starting point $x^1$ is feasible and that at the $k$-th iteration of the algorithm we have the current iterate $x^k$, some trial points $y^j\in\mathbb{R}^m$ for a subset of past iterations $J_k\subset\{1,\cdots,k\}$, the so-called "bundle" of function values $\lambda_1(y^j)$ $(j\in J_k)$, and a set $G_k = \{g^j\in\partial\lambda_1(y^j) : j\in J_k\}$ of subgradients. These furnish a piecewise linear function which minorizes $\lambda_1$; in other words, we replace $\lambda_1$ by the so-called polyhedral cutting plane model $\hat{\lambda}_1^k$, which can equivalently be written in the form (5) with the linearization errors $\alpha_j$, for all $j\in J_k$. Note that in the convex case the subdifferential can be rewritten as in [31]. It is then not difficult to prove that, when we study the maximum eigenvalue function, the cutting plane model $\hat{\lambda}_1^k$ is an underestimate of $f = \lambda_1$ and the linearization error $\alpha_j$ is nonnegative; it measures how well the cutting plane model approximates the function $\lambda_1$. Using $\hat{\lambda}_1^k(y)$ from (5) in lieu of $\lambda_1(y)$ and setting $x = x^k$ in (4) yields the subproblem (8), which is readily formulated as a quadratic program.
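As an illustration of the cutting plane model, the following hedged Python sketch builds the piecewise-linear minorant from stored linearizations. For $f = \lambda_1$ a subgradient with respect to $x$ is $g_i = u^T A_i u$, where $u$ is a unit eigenvector of the largest eigenvalue; the helper names are ours:

```python
import numpy as np

def max_eig_and_subgrad(x, A0, As):
    """Return lambda_1(A(x)) and a subgradient: for a unit eigenvector u of the
    largest eigenvalue, the i-th subgradient coordinate is u^T A_i u."""
    M = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(M)              # ascending eigenvalues
    u = V[:, -1]                          # eigenvector of the largest one
    return w[-1], np.array([u @ Ai @ u for Ai in As])

def cutting_plane_model(y, bundle):
    """Piecewise-linear minorant: max over stored planes f(y_j) + <g_j, y - y_j>."""
    return max(fj + gj @ (y - yj) for (yj, fj, gj) in bundle)
```

Since each stored plane is a supporting hyperplane of the convex function $\lambda_1$, the model never exceeds the true function value, matching the underestimate property described above.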
By solving this program, one obtains the optimal subgradient in (8), call it $\bar{g}$, as well as the unique minimizer $y^{k+1}$, which approximates the proximal point of $x^k$. If $y^{k+1}$ meets the 'sufficient descent' criterion for a given constant $m_L\in(0,1/2)$, it is accepted as the new iterate by setting $x^{k+1} = y^{k+1}$; this is termed a serious step. Otherwise, $x^{k+1} = x^k$, and we have a null step. In either case, the cutting plane model is improved by adding $\bar{g}$ and a new subgradient $g\in\partial\lambda_1(y^{k+1})$ to $G_k$ to yield $G_{k+1}$, thus making a serious step more likely when (8) is solved at the next iteration. The stopping criterion tests whether the upper bound on the maximum attainable descent in this step [the left-hand side of (10)] falls below a small fraction $\delta$ of the magnitude of the objective value [the right-hand side of (10)], in which case $x^k$ is returned as the computed solution. Two additional points need to be addressed to complete the description of the algorithm. First, less significant subgradients in $G_k$ may be merged into an aggregate subgradient, to keep storage bounded. Second, for good practical performance the penalty parameter $\mu$ has to be varied judiciously from iteration to iteration, as was already recognized in the early computational works of Lemaréchal. Both issues have been addressed by Kiwiel [15,17]. The resulting algorithm has the following framework.
Step 2 (Stopping criterion). If (10) is satisfied, stop. Return y * = x k as the computed solution.
Step 4. Replace k by k + 1 and loop to Step 1.

End of the algorithm
In the following we give the solution of program (8). Notice that problem (8) is still a nonsmooth optimization problem. However, due to its piecewise linear nature it can be rewritten as a smooth quadratic programming subproblem with affine constraints. The dual of this quadratic program can be formulated explicitly, yielding very instructive interpretations; this is a key point for understanding the different variants of bundle methods. In the following, $\Delta_k$ denotes the unit simplex of $\mathbb{R}^k$, defined by $\Delta_k = \{\theta\in\mathbb{R}^k : \theta_j\ge 0,\ \sum_{j=1}^{k}\theta_j = 1\}$, and the $\alpha_i$ are the linearization errors between $y^i$ and $x^k$.
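For illustration, the dual subproblem over the unit simplex can be attacked with a generic constrained solver. This is only a sketch: Kiwiel's method uses a specialized QP routine, and `solve_bundle_dual` is an illustrative name, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def solve_bundle_dual(G, alpha, mu):
    """Solve the dual bundle subproblem
         min_{theta in unit simplex}  (1/(2*mu)) ||G @ theta||^2 + alpha @ theta,
    where column j of G is the bundle subgradient g_j and alpha_j is its
    linearization error.  Returns theta and the trial step d = -(1/mu) G @ theta."""
    k = G.shape[1]
    obj = lambda t: (G @ t) @ (G @ t) / (2.0 * mu) + alpha @ t
    res = minimize(obj, np.full(k, 1.0 / k),
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq", "fun": lambda t: t.sum() - 1.0}])
    theta = res.x
    return theta, -(G @ theta) / mu
```

With two opposite subgradients and zero errors the optimal combination cancels them, giving a zero trial step, which matches the intuition that $0\in\partial\lambda_1$ at a minimizer.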

Proposition 1. [11] For $\mu_k > 0$, the unique solution of the augmented model problem (8) is
$$y^{k+1} = x^k - \frac{1}{\mu_k}\sum_{j\in J_k}\hat{\theta}_j g^j,$$
where $\hat{\theta}\in\Delta_k$ solves the dual problem
$$\min_{\theta\in\Delta_k}\ \frac{1}{2\mu_k}\Big\|\sum_{j\in J_k}\theta_j g^j\Big\|^2 + \sum_{j\in J_k}\theta_j\alpha_j.$$
Furthermore, the corresponding primal-dual relation between the optimal values holds.

The convergence of the algorithm, which requires only that the updated bundle contain $\bar{g}$ and one new subgradient from $\partial\lambda_1(y^{k+1})$, may be summarized as follows.

Theorem ([15,17]). Suppose that Algorithm 2.1 is implemented with $\delta = 0$ in the stopping test (10). If the algorithm executes a finite number of serious steps, i.e., performs only null steps from some iteration $k$ onwards, then $x^k\in\arg\min f$. If the algorithm generates an infinite sequence of serious steps, then $\{x^k\}$ is a minimizing sequence for $f$, and it converges to a minimizer of $f$ whenever one exists.
3. Proximal bundle method for the D.C. arbitrary eigenvalue. In this section, we describe the ideas of the proximal bundle method for minimizing the nonsmooth D.C. eigenvalue function.
Because the eigenvalue functions $\lambda_l$ are locally Lipschitz, we next recall the Clarke generalized gradient for locally Lipschitz functions.
Moreover, for such $f$ the Clarke subdifferential of $f$ at $x$ can be written as
$$\partial f(x) = \mathrm{conv}\Big\{\lim_{i\to\infty}\nabla f(x_i) : x_i\to x,\ \nabla f(x_i)\ \text{exists}\Big\},$$
which is a well-defined, nonempty, convex and compact subset of $\mathbb{R}^m$. We also assume that at each point $x$ both the objective function value $\lambda_l(x)$ and some subgradient $g\in\partial\lambda_l(x)$ can be obtained.
The basic idea of bundle methods is to approximate the whole subdifferential of the objective function by only one arbitrary subgradient at each point. In practice, this is done by gathering subgradients from previous iterations into a bundle. Suppose that at the $k$-th iteration of the algorithm we have the current iterate $x^k$, some trial points $y^j\in\mathbb{R}^m$ (from past iterations), and subgradients $g^j\in\partial\lambda_l(y^j)$ for $j\in J_k$, where the index set $J_k\ne\emptyset$ and $J_k\subset\{1,\cdots,k\}$.
The idea is to approximate the objective function $\lambda_l$ from below by a piecewise linear function; that is, $\lambda_l$ is replaced by the so-called cutting-plane model $\hat{\lambda}_l^k$, with linearization errors $\alpha_j^k$. Nonconvexity brings many difficulties compared with the convex case. If $l = 1$ (i.e., for the maximum eigenvalue function), $\hat{\lambda}_l^k$ is an underestimate of $\lambda_l$ and $\alpha_j^k\ge 0$ for all $j\in J_k$, which measures how good an approximation the model is to the original problem. In the nonconvex (D.C.) case these facts are no longer valid, and negative linearization errors may occur, which greatly influences the convergence of the algorithm. To overcome this difficulty, the linearization error $\alpha_j^k$ can be replaced by the so-called subgradient locality measure
$$\beta_j^k = \max\{|\alpha_j^k|,\ \gamma (s_j^k)^2\},$$
where $\gamma\ge 0$ is the distance measure parameter ($\gamma = 0$ if $l = 1$) and $s_j^k$ is a distance estimate between $y^j$ and $x^k$. Then obviously $\beta_j^k\ge 0$, and the quantity $\beta_j^k$ indicates how far $g^j$ is from $\partial\lambda_l(x^k)$. Hence, the $k$-th search direction is obtained by solving problem (22), which is stated in terms of the $k$-th polyhedral approximation to $\lambda_l$. In the convex case, (22) reduces to the search direction finding subproblem of the method in [17]. These observations give grounds for deriving processes similar to those described therein.
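The locality measure is a one-line computation; the following small sketch makes the replacement of $\alpha_j^k$ explicit (`locality_measure` is an illustrative name):

```python
def locality_measure(alpha_j, dist_j, gamma):
    """Subgradient locality measure beta_j = max{ |alpha_j|, gamma * dist_j**2 }.
    alpha_j: linearization error of the j-th plane at x^k (may be negative
             in the D.C. case);
    dist_j:  distance-type quantity s_j between y^j and x^k;
    gamma:   distance measure parameter (gamma = 0 in the convex case l = 1)."""
    return max(abs(alpha_j), gamma * dist_j ** 2)

# a negative linearization error is replaced by a nonnegative measure:
assert locality_measure(-0.3, 0.1, 1.0) == 0.3
assert locality_measure(0.0, 2.0, 1.0) == 4.0
```

The point of the construction is exactly what the assertions show: the measure is always nonnegative, even where the D.C. model would yield a negative error.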
Proposition 2. For $\mu_k > 0$, the unique solution of the augmented model problem (22) is
$$d^k = -\frac{1}{\mu_k}\sum_{j\in J_k}\hat{\theta}_j g^j,$$
where $\hat{\theta}\in\Delta_k$ solves
$$\min_{\theta\in\Delta_k}\ \frac{1}{2\mu_k}\Big\|\sum_{j\in J_k}\theta_j g^j\Big\|^2 + \sum_{j\in J_k}\theta_j\beta_j^k,$$
and the corresponding primal-dual relation holds.

The proximal bundle algorithm for the arbitrary eigenvalue function is organized as follows.

End of the algorithm
It is not hard to see that the size of $J_k$ increases as the iterations proceed. In practice this strategy presents serious storage and computation problems after a large number of iterations. To overcome this drawback, we shall present a subgradient aggregation strategy, which aggregates the constraints generated by the subgradients and thus keeps the number of constraints bounded.

4. Proximal bundle algorithm with subgradient aggregation. In the last section, at each iteration of Algorithm 3.1 every previously computed subgradient generates one linear inequality in the current search direction finding subproblem, so there are many inequalities at the $k$-th iteration. When the size of the bundle becomes too big, it is necessary to compress it and clean the model. Here we introduce rules for aggregating the past subgradient information; this leads to the concept of a first-order proximal bundle algorithm with subgradient aggregation.
First we describe the search direction finding subproblems based on subgradient aggregation. For convenience, suppose first that $f$ is convex and consider the $k$-th iteration of the method with subgradient aggregation. For the search direction finding, the scheme replaces the past subgradient pairs $(g^j, f_j^k)$, where $g^j = g(y^j)\in\partial\lambda_l(y^j)$ and $f_j^k = \lambda_l(y^j) + g_j^T(x^k - y^j)$, $j = 1,\cdots,k-1$, by just one aggregate subgradient $(p^{k-1}, f_p^k)$, which is their convex combination calculated at the $(k-1)$-th iteration. The former linearizations of $\lambda_l$,
$$f_j(x) = \lambda_l(y^j) + \langle g^j, x - y^j\rangle = f_j^k + \langle g^j, x - x^k\rangle,$$
and the $(k-1)$-th aggregate linearization minorize $\lambda_l$ for all $x\in\mathbb{R}^m$ when $l = 1$. The $k$-th aggregate polyhedral approximation $\hat{\lambda}_{l,\mathrm{agg}}^k$ to $f$ is therefore defined by choosing a set $J_k\subset\{1,\cdots,k\}$ with $k\in J_k$, where the linearization errors satisfy (27). We extend this construction to the D.C. case. Suppose that at the $k$-th iteration we have the $(k-1)$-th aggregate subgradient $(p^{k-1}, f_p^k, s_p^k)\in\mathbb{R}^m\times\mathbb{R}\times\mathbb{R}$ satisfying the following generalization of (27):
$$(p^{k-1}, f_p^k, s_p^k) = \sum_{j=1}^{k-1}\hat{\theta}_j^{k-1}\,(g^j, f_j^k, s_j^k),$$
where $\hat{\theta}_j^{k-1}\ge 0$ for $j = 1,\cdots,k-1$ and $\sum_{j=1}^{k-1}\hat{\theta}_j^{k-1} = 1$. We then define the aggregate subgradient locality measure $\beta_p^k$ analogously. The value of $\beta_p^k$ shows how far $p^{k-1}$ is from $\partial\lambda_l(x^k)$. Moreover, only local subgradients $g^j$ with small values of $s_j^k$ contribute to $p^{k-1}$; hence in this case $p^{k-1}$ is close to $\partial\lambda_l(x^k)$ by the local upper semicontinuity of $\partial\lambda_l$.
So we may define the $k$-th aggregate polyhedral approximation $\hat{\lambda}_{l,\mathrm{agg}}^k$ to the D.C. function $\lambda_l$ and use it for finding the $k$-th search direction $d^k$, which solves the problem
$$\min_{d\in\mathbb{R}^m}\ \hat{\lambda}_{l,\mathrm{agg}}^k(x^k + d) + \frac{\mu_k}{2}\|d\|^2. \qquad (33)$$
The above problem (33) can be recast as a quadratic programming problem (34) for $(d^k, \hat{w}^k)\in\mathbb{R}^m\times\mathbb{R}$; the variable $\hat{w}^k = \hat{\lambda}_{l,\mathrm{agg}}^k(x^k + d^k) - \lambda_l(x^k)$ then approximates the derivative of $\lambda_l$ at $x^k$ in the direction $d^k$, which can be used for the line search.
Next we show how to update the aggregate subgradient recursively. Let $\theta_j^k$, $j\in J_k$, and $\theta_p^k$ be the Lagrange multipliers of the $k$-th subproblem (34). These multipliers form a convex combination, and the sets $\{J_k\}$ are selected recursively so that $|J_k|\le M$, where $M\ge 2$ is a user-supplied fixed upper bound on the number of stored subgradients. One can then control the size of subproblem (34). The current aggregate subgradient is given by (36), and by (30) this leads to the property
$$(p^k, \tilde{f}_p^k, \tilde{s}_p^k)\in\mathrm{conv}\{(g^j, f_j^k, s_j^k) : j = 1,\cdots,k\}.$$
We can update the linearization values accordingly. Because $s_j^{k+1} = s_j^k + \|x^{k+1} - x^k\|$ for $j\in J_k$, we set $s_p^{k+1} = \tilde{s}_p^k + \|x^{k+1} - x^k\|$. The above updating formulas and (36) then complete the recursion, so the subgradients $(g^j, f_j^k, s_j^k)$ for $j\in\{1,\cdots,k\}\setminus J_k$ need not be stored. We now state an algorithmic procedure for solving the arbitrary eigenvalue problem under consideration.
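The aggregation step itself is just a convex combination of the stored triples with the previous aggregate. A hedged sketch (the function `aggregate` and its argument names are ours, not from the paper):

```python
import numpy as np

def aggregate(gs, fs, ss, theta, theta_p, prev):
    """New aggregate (p^k, f_p^k, s_p^k) as the convex combination
       theta_p * (p^{k-1}, f_p^k, s_p^k) + sum_j theta_j * (g^j, f_j^k, s_j^k),
    using the multipliers theta_j, theta_p of the k-th subproblem."""
    p_prev, f_prev, s_prev = prev
    p = theta_p * p_prev + sum(t * g for t, g in zip(theta, gs))
    f = theta_p * f_prev + sum(t * fj for t, fj in zip(theta, fs))
    s = theta_p * s_prev + sum(t * sj for t, sj in zip(theta, ss))
    return p, f, s

# with theta_p = 0 and a single multiplier 1, the aggregate is that triple itself
p, f, s = aggregate([np.array([1.0, 0.0])], [2.0], [0.5], [1.0], 0.0,
                    (np.zeros(2), 0.0, 0.0))
assert np.allclose(p, [1.0, 0.0]) and f == 2.0 and s == 0.5
```

Because the multipliers are nonnegative and sum to one, the new triple stays in the convex hull of all past triples, which is exactly the containment property stated above.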
Begin of the algorithm Algorithm 4.1
Step 0 (Initialization). Select a starting point $x^1\in\mathbb{R}^m$ and a final accuracy parameter $\varepsilon\ge 0$. Choose fixed positive line search parameters $m_L$, $m_R$, $m_\beta$, $\bar{a}$ and $t$, $\bar{t}\le 1$, with $m_L + m_\beta < m_R < 1$, a distance measure parameter $\gamma > 0$, an initial weight $\mu_1 > 0$, and a lower bound for the weights $\mu_{\min} > 0$. Set $a^1 = 0$ and the reset indicator $r_a^1 = 1$. Set the counter $k = 1$.
Step 1 (Direction finding). Find the solution $(\hat{d}, \hat{w})$ of the $k$-th quadratic programming problem (38). Compute the Lagrange multipliers $\theta_j^k$, $j\in J_k$, and $\theta_p^k$ of (38).
Step 2 (Stopping criterion). If $\delta_k\le\varepsilon$, stop. Otherwise, go to Step 3.
Step 3 (Line search). By the line search procedure given below, find two step sizes $t_L^k$ and $t_R^k$ such that $0\le t_L^k\le t_R^k$, and the two corresponding points defined by (45)-(48), where
$$\beta(x, y) = \max\{|\lambda_l(y) + \langle g(y), x - y\rangle - \lambda_l(x)|,\ \gamma\|x - y\|^2\}.$$
End of the algorithm

Next, we make some remarks about the algorithm.
Remark. (i) By Proposition 2, the dual of the $k$-th subproblem (38) is to find multipliers $\theta_j^k$, $j\in J_k$, and $\theta_p^k$ solving (58). Any solution of (58) is a Lagrange multiplier vector for (38), and it yields the unique solution $(d^k, \hat{w}^k)$ of (38) via
$$d^k = -\frac{1}{\mu_k}\, p^k,$$
where $p^k$ is given by (41). Moreover, any Lagrange multipliers of (38) also solve (58); in particular, they form a convex combination. Thus one may equivalently solve the dual search direction finding subproblem (58) in Step 1 of Algorithm 4.1.
(ii) Concerning the stopping criterion in Step 2, we offer the following explanation. The value of the locality measure $\tilde{\beta}_p^k$ given by (42) indicates the distance between $p^k$ and $\partial\lambda_l(x^k)$. A small value of $\delta_k$ therefore indicates both that $p^k$ is small and that $p^k$ is close to $\partial\lambda_l(x^k)$; thus $0$ is close to $\partial\lambda_l(x^k)$, i.e., $x^k$ is an approximately stationary point. Generally, $\delta_k$ may be thought of as a stationarity measure of $x^k$. In the D.C. (nonconvex) case the stopping criterion generalizes the standard criterion of a small gradient of $\lambda_l$.
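The stationarity measure used in the stopping test can be coded directly from its definition $\delta_k = \frac{1}{2}\|p^k\|^2 + \tilde{\beta}_p^k$, which appears again in the convergence analysis of Section 5 (`delta_k` is an illustrative name):

```python
import numpy as np

def delta_k(p, beta_p):
    """Stationarity measure delta_k = 0.5 * ||p^k||^2 + beta_p^k:
    it is small only when p^k is small AND p^k is close to the
    subdifferential at x^k (small locality measure)."""
    return 0.5 * float(np.dot(p, p)) + beta_p

assert delta_k(np.zeros(3), 0.0) == 0.0                      # exact stationarity
assert delta_k(np.array([1.0, 0.0, 0.0]), 0.25) == 0.75      # both terms contribute
```

Both summands are nonnegative, so $\delta_k\le\varepsilon$ bounds each of them separately; this is the property exploited in the proofs below.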
(iii) As for the line search in Step 3, it is easy to see that the line search is always entered with $\delta_k > 0$. Hence the criterion guarantees that the objective value at $x^{k+1}$ is significantly smaller than the one at $x^k$ whenever $x^{k+1}\ne x^k$. This ensures that the algorithm does not take infinitely many serious steps ($t_L^k > 0$) with no significant improvement in the objective function value, which could impair convergence.
(iv) The parameter $\bar{t} > 0$ is introduced to decrease the number of function and subgradient evaluations during line searches. The parameter $\bar{t}$ distinguishes "long" serious steps with $t_L^k\ge\bar{t}$ from "short" serious steps with $0 < t_L^k < \bar{t}$, for which (47) is satisfied. It will be seen that, as far as the convergence analysis is concerned, short serious steps are essentially equivalent to null steps ($t_L^k = 0$). If $t_L^k\ge\bar{t}$, i.e., a significant decrease of the objective value occurs, there is no need to detect discontinuities in the gradient of $\lambda_l$, so the algorithm sets $g^{k+1} = g(y^{k+1})$. On the other hand, if $t_L^k < \bar{t}$, which indicates that the algorithm is blocked at $x^k$ by the nondifferentiability of $\lambda_l$, then criterion (47) ensures that the new subgradient $g^{k+1}$, with $y^{k+1}$ and $x^k$ lying on opposite sides of a discontinuity of the gradient of $\lambda_l$, will force a significant modification of the next search direction finding subproblem. Criterion (48), which is related to the distance resetting test, prevents the algorithm from accumulating irrelevant subgradient information.
Clearly, the line search rules (45)-(48) are general enough that one can devise many procedures for implementing Step 3. For completeness, we give below a procedure for finding the step sizes $t_L = t_L^k$ and $t_R = t_R^k$, based on the ideas of Mifflin [20], which finds step sizes $t_L^k$ and $t_R^k$ such that the requirements in Step 3 hold. We assume that the line search parameters are fixed: $m_L\in(0,1/2)$, $m_R\in(m_L,1)$, $\bar{t}\in(0,1]$, and an interpolation safeguard parameter $\eta$.

Begin of the algorithm Algorithm 4.2 Line Search
Step 1. Set $t_L^k := 0$ and $t := t_U := 1$.
Step 3. If $t_L^k\ge\bar{t}$ or $\beta(x^k, x^k + t d^k) > m_\beta\delta_k$, set $t_R^k := t_L^k$ and stop. Otherwise calculate $\xi\in\partial\lambda_l(x^k + t d^k)$, set $t_R^k := t$, and stop.
Step 4. If $t_L^k = 0$, set $t := \max\{\eta\, t_U, \ldots\}$; and if $t_L^k > 0$, set $t := \frac{1}{2}(t_L^k + t_U)$.
Step 5. Go to Step 2.
End of the algorithm

The last, but not least important, question concerning the proximal bundle method is the choice of the weight $\mu_k$. We could adopt the simplest strategy and keep it constant, $\mu_k\equiv\mu_{fix}$. However, this leads to several difficulties. Due to relation (59) in Remark (i), we observe that: (i) if $\mu_{fix}$ is very large, the values of $\delta_k$ and $\|d^k\|$ will be small, almost all steps are serious, and the descent becomes slow; (ii) if $\mu_{fix}$ is very small, the values of $\delta_k$ and $\|d^k\|$ will be large, and each serious step will be followed by many null steps. For these reasons we treat $\mu_k$ as a variable and update it when necessary. In what follows we present a safeguarded quadratic interpolation technique by Kiwiel [17] for updating $\mu_k$.
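To illustrate the idea, the following sketch updates $\mu$ using the interpolation value $\mu^{int} = 2\mu_k(1 + [\lambda_l(y^{k+1}) - \lambda_l(x^k)]/\delta_k)$; the clipping factors ($10$ and $1/10$) and the serious/null branching are illustrative simplifications, not Kiwiel's full safeguarded rule:

```python
def update_weight(mu, delta, f_new, f_old, serious, mu_min):
    """Safeguarded quadratic-interpolation style update of the proximal weight:
    mu_int = 2*mu*(1 + (f_new - f_old)/delta), then clipped to safeguards.
    The factor-of-10 clips and the branching below are illustrative choices."""
    mu_int = 2.0 * mu * (1.0 + (f_new - f_old) / delta)
    if serious:
        # descent achieved: allow mu to decrease, but not too fast or below mu_min
        return max(mu_int, mu / 10.0, mu_min)
    # null step: allow mu to increase, but at most tenfold
    return min(max(mu_int, mu), 10.0 * mu)
```

The qualitative behavior matches the discussion above: a large descent shrinks $\mu$ (longer steps), while null steps can only push $\mu$ up, and the safeguards keep the weight within a bounded factor per update.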
We denote by $\varepsilon_v^k$ the variation estimate, which corresponds to the size of $\|p^k\| + \tilde{\beta}_p^k$, and by $i_\mu^k$ the step counter, which counts the number of long serious steps with $t_L^k = 1$ and null steps since the latest change of $\mu_k$. These variables are initialized by $\varepsilon_v^k := +\infty$, $i_\mu^k := 0$. In addition, we define $\mu_{k+1}^{int} := 2\mu_k\big(1 + [\lambda_l(y^{k+1}) - \lambda_l(x^k)]/\delta_k\big)$.
Begin of the algorithm Algorithm 4.3 Weight Updating
Step 1. Set $\mu := \mu_k$.
Step 3. If t k L = 0 go to Step 5.
End of the algorithm

5. Convergence analysis. In this section, we study the convergence of Algorithm 4.1. First the convergence result is given. Before presenting our results, we recall a proposition stated in Ref. [4].
When $l > 1$, the eigenvalue function $\lambda_l$ is no longer convex and the optimality condition of Proposition 3 is not sufficient, so the method cannot guarantee even local optimality of the solution. Hence we can only look for candidates, called stationary points, satisfying condition (65). We suppose that each execution of Line Search Procedure 4.2 is finite. Naturally, the convergence results assume that the final accuracy tolerance is set to zero. In the absence of convexity, we content ourselves with finding stationary points of $\lambda_l$. Our principal result states that Algorithm 4.1 either terminates at a stationary point or generates an infinite sequence $\{x^k\}$ whose accumulation points are stationary for $\lambda_l$.
For the finite termination case, we have the following conclusion.
Lemma 5.1. Suppose that $k\ge 1$ is such that Algorithm 4.1 did not stop before the $k$-th iteration. Then there exist numbers $\bar{\mu}_i^k$ and vectors $(y^{k,i}, f^{k,i}, s^{k,i})\in\mathbb{R}^m\times\mathbb{R}\times\mathbb{R}$, $i = 1,\cdots,M$, satisfying the stated relations, where $\hat{J}_p^k = J_{k_p(k)}\cup\{j : k_p(k) < j\le k\}$, $M = m+3$, and $k_p(k) = \max\{j : j\le k\ \text{and}\ \theta_p^j = 0\}$.
Proof. It follows from Lemma 3.2 of [38], Carathéodory's theorem, and the fact that $g^j = g_f(y^j)$ for $1\le j\le k$.
Proof. We distinguish two cases.
(2) Suppose that $\gamma = 0$. Then the function is convex (i.e., $l = 1$) and (70) gives, by the definition of the subdifferential, an inequality valid for all $z\in\mathbb{R}^m$ and $i = 1,\cdots,M$. Multiplying this inequality by $\bar{\mu}_i$ and summing, we obtain for each $z$ the relations (69) and (71). So, by the definition of the subdifferential in the convex case, $\bar{p}\in\partial\lambda_1(\bar{x})$.
Note that the results above show that $\gamma = 0$ only if $l = 1$ (i.e., for the maximum eigenvalue function); otherwise $\gamma > 0$.
Theorem 5.3. If Algorithm 4.1 terminates at the $k$-th iteration, $k\ge 1$, then the point $\bar{x} = x^k$ is stationary for $\lambda_l$.
For the case where the algorithm generates an infinite sequence $\{x^k\}$, i.e., the algorithm does not terminate, the following result is useful: (72) holds, or equivalently, there exists an infinite set $K\subset\{1,2,\cdots\}$ such that $x^k\to\bar{x}$ and $\delta_k\to 0$.
Proof. The equivalence follows from the fact that, since we always have $\delta_k = \frac{1}{2}\|p^k\|^2 + \tilde{\beta}_p^k$ and $\tilde{\beta}_p^k\ge 0$, $\delta_k$ is nonnegative for all $k$; hence $\delta_k\to 0$ implies $p^k\to 0$ and $\tilde{\beta}_p^k\to 0$. For the rest of the proof, see Ref. [38].
Proof. According to Lemma 5.4, we only need to prove that $\delta_k\to 0$. For contradiction, assume that $\delta_k\ge\bar{\delta} > 0$ for some $\bar{\delta}$ and all large $k\in K$. Since $x^k\to\bar{x}$, the continuity argument in (45) gives $t_L^k\delta_k\to 0$. But $\delta_k\ge\bar{\delta}$ for all large $k\in K$, hence $t_L^k\to 0$, and we obtain $\|x^{k+1} - x^k\|\to 0$ from Lemma 3.3 in [16]. Thus both $\{x^k\}_{k\in K}$ and $\{x^{k+1}\}_{k\in K}$ converge to $\bar{x}$, and the properties of $\beta(\cdot,\cdot)$ imply that $\beta(x^k, x^{k+1})\to 0$. So we have $t_L^k < \bar{t}$ and $\beta(x^k, x^{k+1}) < m_\beta\delta_k$ for all large $k\in K$. Since $t_L^k\to 0$ and $\delta_k\ge\bar{\delta} > 0$ for large $k\in K$, we obtain a contradiction with (46) and the definition of $K$. Therefore $\delta_k\to 0$, which completes the proof.
Next we show that the sequence generated by Algorithm 4.1 converges to a stationary point of the D.C. function $\lambda_l$.
Theorem 5.6. Each accumulation point of the sequence {x k } generated by Algorithm 4.1 is a stationary point for λ l . Moreover, if the level set S = {x ∈ R m : λ l (x) ≤ λ l (x 0 )} is bounded and the final accuracy tolerance is positive, then Algorithm 4.1 terminates in a finite number of iterates.
Proof. Combining Lemma 5.4 with Lemma 5.5, the first assertion holds.
For the second assertion, we argue by contradiction: the infinite sequence $\{x^k\}\subset S$ would have an accumulation point, say $\bar{x}$. Then Lemma 5.5 would yield (72). Let $\bar{\delta}_k$ denote the optimal value of the $k$-th dual search direction finding subproblem (58); then $0\le\delta_k\le\bar{\delta}_k$, and the algorithm would stop with $\delta_k\le\varepsilon$ for sufficiently large $k$, which contradicts the assumption.
6. Computational experiments. In this section, for a preliminary validation of our approach, we wrote a Matlab implementation of Algorithm 4.1 and analyzed its performance on some test problems. We report preliminary numerical results of the first-order proximal bundle method (Algorithm 4.1) for constrained eigenvalue problems. The goal is to provide a proof-of-concept implementation, not a complete benchmarking of the algorithm. All numerical experiments were implemented in Matlab 7.8.0 (2012a) on a computer with a Windows 7 system, an Intel Core™ i5-2400 CPU at 3.10 GHz, and 8.00 GB of RAM. We consider the following constrained eigenvalue optimization problem
$$\min_x\ \lambda_l(A(x)), \quad l = 1,\cdots,n, \qquad (74)$$
subject to the given constraints, where $A(x) = \sum_{i=1}^{m} x_i A_i$ and $A_i$, $i = 1,\cdots,m$, are given $n\times n$ symmetric matrices.
According to exact penalization, problems of the form (74) may be transformed into (1), even though it may be preferable to exploit the structure of (74) explicitly. Thus problems (74) are equivalent to (1) via exact penalization. We denote the exact penalty function by (75), where the penalty coefficient satisfies $\nu\ge\nu^*$ for some threshold value $\nu^* > 0$, and $[x]_+$ stands for $\max\{0, x\}$. Hence we only need to solve the resulting unconstrained problem. Our tests are based on the following examples, in which the matrices $A_i$ of the objective functions are generated randomly. All numerical examples are of the form (75).
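The exact penalty construction can be sketched in a few lines; `exact_penalty` and the representation of constraints as functions $c_i(x)\le 0$ are illustrative choices, not the paper's code:

```python
def exact_penalty(x, lam_l, constraints, nu):
    """Exact penalty F_nu(x) = lambda_l(A(x)) + nu * sum_i [c_i(x)]_+,
    where [t]_+ = max(t, 0) and each constraint is written c_i(x) <= 0.
    For nu above a threshold nu*, minimizers of F_nu solve the
    constrained problem."""
    return lam_l(x) + nu * sum(max(c(x), 0.0) for c in constraints)

# toy instance: minimize x^2 subject to x >= 1 (written as 1 - x <= 0)
lam = lambda x: x[0] ** 2
cons = [lambda x: 1.0 - x[0]]
print(exact_penalty([2.0], lam, cons, 10.0))   # feasible point: no penalty
```

At feasible points the bracket terms vanish and $F_\nu$ coincides with the objective; at infeasible points the violation is charged at rate $\nu$, which is what makes the penalty exact for $\nu\ge\nu^*$.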
The first weight µ 1 is chosen as µ 1 := ‖g 1 ‖, and the distance measure parameter is γ = 1.0. In the implementation, optimality is declared when the stopping criterion is satisfied. Table 1 reports the relevant numerical results for the problem. As Boyd et al. proposed in Ref. [34], the second eigenvalue has special significance in some practical problems, so the second largest eigenvalue is computed here. For each run of the algorithm we report the total number of function and subgradient evaluations (i.e., calls to the oracle), denoted #f /g. All coefficients of the symmetric matrices A i are randomly generated by computer. For each problem we also report its dimension n; the numbers n n and n d of null steps and descent steps, respectively; the computed solution x k and the computed function value f (x k ); the cpu running time (Time); the starting point x 0 ; and the total number of iterations k. The optimal solutions and optimal function values are also calculated, and Res stands for the relative error. Since Algorithm 4.1 focuses on minimizing the arbitrary eigenvalue function, which possesses special structure, few existing methods are suitable for comparison. Moreover, since numerical experiments are carried out on different computers, with different codings and precisions, the results can only be regarded as an indication of performance when evaluating the merits of the compared algorithms.
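The oracle counted by #f /g returns the value λ_l(A(x)) together with a subgradient. A minimal sketch, assuming the l-th eigenvalue is simple at x, in which case λ_l is differentiable there and first-order perturbation theory gives ∂λ_l/∂x_i = qᵀ A_i q for the corresponding unit eigenvector q (the function name and interface are ours, not the paper's):

```python
import numpy as np

def eig_oracle(x, A0, As, l):
    """Return (f, g): the value of lambda_l at x and its gradient,
    valid when the l-th eigenvalue of A(x) is simple.  For unit
    eigenvector q, d lambda_l / d x_i = q^T A_i q."""
    A = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(A)            # eigenvalues ascending
    idx = np.argsort(w)[::-1]           # reorder to descending
    q = V[:, idx[l - 1]]                # eigenvector of l-th largest
    f = float(w[idx[l - 1]])
    g = np.array([q @ Ai @ q for Ai in As])
    return f, g
```

At points where λ_l(A(x)) is multiple this formula no longer yields the full subdifferential, which is exactly the nonsmooth situation the bundle machinery of Algorithm 4.1 is designed to handle.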
We compare the numerical implementation of our algorithm with the generalized cutting plane method [16]; the corresponding results are listed in Table 2. The results in Table 2 show that Algorithm 4.1 performs better on almost all of the test problems. In our opinion, the obtained results demonstrate a reasonable performance of our method: in the test cases it succeeds in obtaining reasonably high accuracy. As a conclusion from our limited numerical experiments, we can state that our method works efficiently, requires less function computation time, and executes more descent steps; we emphasize that it is worthwhile to apply our algorithm to this class of eigenvalue optimization problems.

7. Conclusions. In this paper we propose an explicit proximal bundle algorithm, based on nonsmooth analysis, for a special class of eigenvalue functions: the arbitrary eigenvalue function λ l , which belongs to the class of D.C. functions. Because the objective function is nonsmooth and nonconvex (D.C.), bundle techniques are applied to alleviate these difficulties. We design a new fast algorithm by applying the subgradient aggregation idea to this class of functions. Moreover, we prove its global convergence, in the sense that any accumulation point of the sequence generated by our method is stationary. In addition, computational results show that the proposed method is effective and performs well in practice.
Although the method described here has already proved useful, several issues merit further work before the idea is complete. We have only studied an implementable algorithm for this special class of eigenvalue optimization (D.C. programming); by exploiting second-order information of the eigenvalue function, we will try to obtain faster algorithms. Meanwhile, we will strive to extend improvements of our algorithm to other large-scale optimization problems. This, however, seems to be a nontrivial task, and further improvement is expected from more sophisticated implementations. Detailed work will be presented in later papers.