THE SPACE DECOMPOSITION METHOD FOR THE SUM OF NONLINEAR CONVEX MAXIMUM EIGENVALUES AND ITS APPLICATIONS

Abstract. In this paper, we mainly consider optimization problems involving the sum of the largest eigenvalues of nonlinear symmetric matrix-valued mappings. One of the difficulties in the numerical analysis of such problems is that the eigenvalues, regarded as functions of a symmetric matrix, are not differentiable at points where they coalesce. The U-Lagrangian theory is applied to the sum of the largest eigenvalues composed with a convex matrix-valued mapping, which need not be affine. Some of the results generalize the corresponding conclusions for linear mappings. In this approach, we reformulate the first- and second-order derivatives of the U-Lagrangian in the space of decision variables R^m, under some mild conditions, in terms of the VU-space decomposition. We characterize a smooth trajectory along which the function has a second-order expansion. Moreover, an algorithmic framework with superlinear convergence is presented. Finally, an application of the VU-decomposition derivatives shows that the U-Lagrangian performs properly in a matrix variable.

1. Introduction. Eigenvalue optimization problems have attracted considerable research interest. As mentioned in [20], they have played an important role in both theory and practice since the 1980s. Early contributions include [5,20,27]. Eigenvalue optimization has a wide range of applications in several areas of engineering and science. This spectrum includes composite materials [4], quantum computational chemistry [39], optimal system design [2,25], shape optimization [6], pole placement in linear system theory, robotics, relaxations of combinatorial optimization problems [8], experimental design [37], and much more. Optimization problems involving eigenvalues of symmetric matrices arise in many applications; see, e.g., [20] for a survey and numerous references therein.
For the Euclidean space S^n of all n × n real symmetric matrices, denote by ⟨A, B⟩ = tr(AB) the inner product of A, B ∈ S^n. The n eigenvalues of A, denoted by λ_i(A), i ∈ {1, · · · , n}, are arranged in decreasing order λ_1(A) ≥ λ_2(A) ≥ · · · ≥ λ_n(A). The sum of the k largest eigenvalues is the function σ_k(A) := λ_1(A) + · · · + λ_k(A). It is known from [28] that λ_1(A) and σ_k(A) are convex functions. Optimization problems involving eigenvalues of symmetric matrices have various theoretical applications. Many classical papers address the form and the characterization of the subdifferential of the function σ_k (see [11,28,31,36], and see [7,26,27] for the special case k = 1). For the purpose of developing efficient algorithms for eigenvalue optimization, nonsmooth analysis of the eigenvalue problem plays an essential role. In the 1970s and 1980s, first-order algorithms for the optimization of nonsmooth functions were developed and applied to eigenvalue optimization problems. At the same time, various attempts were made to develop a second-order theory for nonsmooth optimization problems. An approach to a second-order analysis was suggested by Overton [27] and developed further in [9,18,24,28,29]; others applied Overton's method to particular problems [12,13,14,15,16,17]. As an application, Liu et al. [21,22,38] considered the discretized Kohn-Sham (KS) equation, a fundamental nonlinear eigenvalue problem arising from density functional theory, and computed the k algebraically smallest eigenvalues of a given real symmetric matrix.
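As a quick numerical illustration (not part of the original development; the matrices below are hypothetical random test data), the function σ_k and its convexity on S^n can be checked in a few lines:

```python
import numpy as np

def sigma_k(A, k):
    """Sum of the k largest eigenvalues of a symmetric matrix A."""
    return np.sort(np.linalg.eigvalsh(A))[::-1][:k].sum()   # eigvalsh is ascending

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5)); A = (M + M.T) / 2
M = rng.standard_normal((5, 5)); B = (M + M.T) / 2
k = 3
# convexity on S^n: sigma_k(tA + (1-t)B) <= t*sigma_k(A) + (1-t)*sigma_k(B)
for t in np.linspace(0, 1, 11):
    lhs = sigma_k(t * A + (1 - t) * B, k)
    rhs = t * sigma_k(A, k) + (1 - t) * sigma_k(B, k)
    assert lhs <= rhs + 1e-10
```

Of course, such a check is only a sanity test of the convexity stated in [28], not a proof.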
In this paper, we study the composite optimization problem (1): minimize σ_k(A(x)) over x ∈ R^m, where A : R^m → S^n is a continuously differentiable mapping. When A is affine we call (1) an affine problem, and a general problem otherwise. The composite function σ_k ∘ A is neither smooth nor convex in general, but it has local properties similar to those of convex functions [7,28,35]. This property has attracted much attention to the composite optimization problem (1). Problem (1) goes back to the 1970s and J. Cullum et al.; Overton and Womersley, and J.-B. Hiriart-Urruty and D. Ye in 1995, presented its optimality conditions and sensitivity analysis, respectively; see [11] and [28]. In this work, we study the second-order theory for eigenvalue optimization (1). One of the main difficulties in the numerical analysis of such problems is that the eigenvalues, considered as functions of a symmetric matrix, are not differentiable at points where the multiplicity of λ_k(A(x)) is larger than one. This makes such problems typically nonsmooth, and such is typically the case at a minimizer of (1). A theoretical explanation of this phenomenon can be found in [31], in fact for a more general eigenvalue function. Furthermore, there is no well-defined Hessian, or second-order object, and a straightforward application of a Newton-type method is not possible. Since the existence of well-defined Hessians is crucial for developing fast algorithms, this paper focuses on identifying the second-order information contained in σ_k, in spite of its nonsmoothness. Here we assume that the multiplicity r of λ_k(A(x*)) at an optimal point x* is known; the approach then consists of minimizing the maximum eigenvalue subject to the constraint that its multiplicity is constant, i.e., on a certain smooth manifold: the set M_r of matrices whose k-th largest eigenvalue has multiplicity r. We adopt a local C^2-parametrization of (1) to develop a successive quadratic programming method in this paper.
In [19], the authors present the so-called VU-decomposition theory. They show that the objective function f appears to be smooth on the U-subspace and may have a related Hessian there, while the nonsmoothness of f is concentrated essentially on the V-subspace. Our motivation here is to apply the U-Lagrangian theory to a more general case: the sum of the largest eigenvalues composed with a convex matrix-valued mapping, as defined in Section 2.
Building on the idea of Oustry [26], who treated the largest eigenvalue function with an affine inner mapping directly, we derive the VU-space decomposition of the sum of the largest eigenvalues with a convex matrix-valued mapping, without assuming that the mapping is linear; that is, our results are applicable to a broad problem class in which nonlinear models are permitted. A major difference from the linear case is that the space decomposition now depends on the point under consideration. Meanwhile, we assume that a regularity condition holds, under which the vectors of the V-space generate an implicit function from which a smooth trajectory tangent to U can be defined. This condition plays a role similar to that of constraint qualification conditions in nonlinear programming. Once it is satisfied, σ_k has a second-order expansion along the associated trajectory. We indicate how these theoretical results may be used for algorithmic development. The resulting VU-decomposition algorithms make a step in the V-subspace, followed by a U-Newton move, in order to obtain superlinear convergence. In addition, the eigenvalue function in a matrix variable is also studied. We present a condition weaker than transversality under which the corresponding results hold.
Next we introduce the basic notation and terminology used in the remainder of the paper. Let S^n denote the space of n × n symmetric matrices and S^n_+ the cone of n × n positive semidefinite symmetric matrices. Let proj_U : R^m → U be the projection operator that projects R^m onto the subspace U, and let proj*_U : U → R^m be the canonical injection U ∋ u ↦ u ⊕ 0 ∈ R^m. The sign A · B := tr AB denotes the Frobenius scalar product of A, B ∈ S^n, A† indicates the Moore-Penrose inverse of A, and A* : S^n → R^m is the adjoint of the linear operator A : R^m → S^n. Let q − p ≥ 1 be the multiplicity of the k-th largest eigenvalue λ_k(A) of A, i.e., A lies on the submanifold M(p, q) := {A ∈ S^n : λ_p(A) > λ_{p+1}(A) = · · · = λ_q(A) > λ_{q+1}(A)}, where M(p, q) is a C^∞-submanifold of S^n. The eigenvalue λ_{p+1}(A) ranks first in the group of eigenvalues equal to λ_k(A) and is called the leading eigenvalue. Let E_{p,q}(A) be the eigenspace associated with λ_{p+1}, · · · , λ_q, let Q_{p,q}(A) := Q_1(A) be an orthonormal basis of E_{p,q}(A), and let P_1(A) be an orthonormal basis associated with λ_1, · · · , λ_p. Denote by T_M(A) and N_M(A), respectively, the tangent and normal spaces to the submanifold M at A ∈ M. The rank of the matrix A is denoted by rank(A). The notation DA(x) is used for the differential of the mapping A(·) at x, where A_i(x) = ∂A(x)/∂x_i are the n × n partial derivatives. A ≻ B and A ⪰ B mean, respectively, that A − B is positive definite and positive semidefinite. For other notation, we refer to [10,32].
This article is organized as follows. In Sect. 2, we recall some definitions from U-Lagrangian theory related to the method, together with matrix convexity concepts. In Sect. 3, we give the first- and second-order development of the U-Lagrangian of the sum-of-largest-eigenvalues function σ_k; a second-order expansion of σ_k is derived under the transversality condition. In Sect. 4, we illustrate the new approach on an application, the eigenvalue function with respect to matrix variables, which displays the prospects of VU-theory, and we list its VU-decomposition results. Finally, some possible research topics are pointed out.
2. Preparation and preliminary results. This section contains some background material on nonsmooth analysis and preliminary results which will be used later. We only give concise definitions and results that will be needed in this paper. For a presentation of the U-Lagrangian theory in a more general framework, we refer to [19].
We start by reviewing the VU-space decomposition and U-Lagrangians. For a convex function f that is finite at a given point x̄ ∈ R^n, let g be any subgradient in ∂f(x̄). Then, letting lin Y denote the linear hull of a given set Y, the orthogonal subspaces V(x̄) := lin(∂f(x̄) − g) and U(x̄) := V(x̄)^⊥ define the VU-space decomposition at x̄ of [19]; i.e., V(x̄) and U(x̄) are, respectively, the subspaces parallel and orthogonal to the affine hull of the set ∂f(x̄). These spaces represent the directions from x̄ in which f behaves nonsmoothly (V(x̄)) and smoothly (U(x̄)). The goal is then to find a smooth function that describes f in the directions of U(x̄). We use the compact notation ⊕ for this decomposition and write R^n = U(x̄) ⊕ V(x̄). From (2), the relative interior of ∂f(x̄), denoted by ri ∂f(x̄), is the interior of ∂f(x̄) relative to its affine hull, a manifold parallel to V(x̄). Here B(0, η) denotes a ball in R^n centered at 0 with radius η. Likewise, we obtain the following two equivalent characterizations, which are stated in [19].
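The VU-objects above can be made concrete on a toy function outside the eigenvalue setting. In this sketch (our own hypothetical example, not from [19]) we take f(x) = |x_1| + x_2^2 at x̄ = 0, whose subdifferential is [−1, 1] × {0}, and recover V (the x_1-axis) and U = V^⊥ numerically:

```python
import numpy as np

# f(x) = |x1| + x2^2 at xbar = 0; its subgradients are (s, 0) with s in [-1, 1]
g = np.array([0.0, 0.0])                        # a subgradient in ri ∂f(0)
samples = np.array([[s, 0.0] for s in np.linspace(-1, 1, 9)])
D = samples - g                                  # the shifted set ∂f(0) - g
# orthonormal bases: V = lin(∂f(0) - g), U = V^⊥, via an SVD of the samples
U_svd, s_vals, _ = np.linalg.svd(D.T, full_matrices=True)
rank = int((s_vals > 1e-12).sum())
V_basis = U_svd[:, :rank]                        # the x1-axis: nonsmooth directions
U_basis = U_svd[:, rank:]                        # the x2-axis: f(0 + (0,u)) = u^2 is smooth
assert rank == 1
assert abs(abs(V_basis[0, 0]) - 1.0) < 1e-12    # V is spanned by e1
```

Along U the restriction u ↦ f(0 + (0, u)) = u^2 is C^2, matching the statement that f "appears smooth" on U.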
Proposition 1. Let f be a finite-valued convex function at x̄ ∈ R^n and let g ∈ ∂f(x̄) be given. Then: (i) U(x̄) is the subspace on which f'(x̄; ·) is linear; in other words, U(x̄) is the subspace along which f(x̄ + ·) appears to be "smooth" at 0, and V(x̄) = U(x̄)^⊥. (ii) For any g ∈ ri ∂f(x̄), U(x̄) and V(x̄) are, respectively, the normal and tangent cones to ∂f(x̄) at g.
For a finite-valued convex function f on R^n and a given subgradient ḡ ∈ ∂f(x̄) with V-component ḡ_V, the U-Lagrangian of f at the primal-dual pair (x̄, ḡ), depending on ḡ_V, is (3) L_U(u) := inf_{v ∈ V(x̄)} { f(x̄ + u ⊕ v) − ⟨ḡ_V, v⟩_V }, where ⟨·, ·⟩_V denotes the scalar product induced in the subspace V. When the infimum in (3) is attained, the associated set of V-space minimizers is (4) W(u) := {v ∈ V(x̄) : L_U(u) = f(x̄ + u ⊕ v) − ⟨ḡ_V, v⟩_V }. If ḡ ∈ ri ∂f(x̄), then W(u) is nonempty and each U-Lagrangian is a convex function that is differentiable at u = 0, with ∇L_U(0) = ḡ_U. We summarize some properties of L_U in the following theorem, which will be used in later sections. For details, see Ref. [19].
Theorem 2.1. The function L_U is well-defined and convex. In addition, if g ∈ ri ∂f(x̄), the set W(u) is nonempty and the following properties hold: 1. the value formula for L_U(u) holds, where v is taken arbitrarily in W(u); 2. when u = 0, we have W(0) = {0} and L_U(0) = f(x̄); moreover, L_U is differentiable at 0; 3. the multifunction u ↦ ∂L_U(u) is continuous at u = 0; 4. for all u ∈ U(x̄), the identity (8) holds; 5. denoting by ∂f(x̄ + u ⊕ w(u)) ∩ (g + U(x̄)) the right-hand side of (8), the multifunction u ↦ ∂f(x̄ + u ⊕ w(u)) ∩ (g + U(x̄)) is continuous at 0; 6. for all u ∈ U(x̄), W(u) is a nonempty compact convex set, and the multifunction u ↦ W(u) is continuous at u = 0. Theorem 2.2. (1): There exists a scalar δ with 0 < δ ≤ δ_0 and a unique mapping v(·); the mapping v is C^∞, with the stated values at u = 0. (2): Let A* ∈ M; then there exists δ > 0 such that the stated localization holds. The partial order in the space S^n with respect to the cone S^n_+ is called the Löwner partial order. That is, for A, B ∈ S^n, A ⪰ B if and only if A − B is a positive semidefinite matrix.
According to Bonnans and Shapiro [3], we have the following definition.
We say that the mapping G : R^m → S^n is matrix convex (on the convex set Q) if it is convex with respect to the Löwner partial order; that is, the inequality G(t x_1 + (1 − t) x_2) ⪯ t G(x_1) + (1 − t) G(x_2) holds for any t ∈ [0, 1] and any x_1, x_2 ∈ R^m (respectively, any x_1, x_2 ∈ Q).
Note that a matrix convex mapping is also called positive semidefinite convex (psd-convex). Clearly, if G(x) is matrix convex, then its largest eigenvalue function φ(x) := λ_1(G(x)) and the sum of the k largest eigenvalues σ_k(G(x)) are both convex.
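A hedged numerical sketch of this implication, with a hypothetical psd-convex mapping G(x) = A_0 + xA_1 + x^2 B (B ⪰ 0, so G is matrix convex along the real line), is the following:

```python
import numpy as np

rng = np.random.default_rng(1)
def sym(M): return (M + M.T) / 2
A0 = sym(rng.standard_normal((4, 4)))
A1 = sym(rng.standard_normal((4, 4)))
M = rng.standard_normal((4, 4)); B = M @ M.T      # B ⪰ 0 makes G matrix convex

def G(x):                                          # hypothetical psd-convex mapping
    return A0 + x * A1 + x**2 * B

def sigma_k_of_G(x, k):
    return np.sort(np.linalg.eigvalsh(G(x)))[::-1][:k].sum()

x1, x2, k = -1.3, 0.7, 2
for t in np.linspace(0, 1, 11):
    xm = t * x1 + (1 - t) * x2
    # Löwner convexity of G: t G(x1) + (1-t) G(x2) - G(xm) ⪰ 0
    S = t * G(x1) + (1 - t) * G(x2) - G(xm)
    assert np.linalg.eigvalsh(S).min() >= -1e-10
    # hence convexity of the scalar composite sigma_k ∘ G
    assert sigma_k_of_G(xm, k) <= t * sigma_k_of_G(x1, k) + (1 - t) * sigma_k_of_G(x2, k) + 1e-10
```

The second assertion uses that σ_k is both convex and monotone with respect to the Löwner order, which is exactly the mechanism behind the claim above.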
Moreover, any affine mapping G(x) := A_0 + Σ_{i=1}^m x_i A_i, where A_i ∈ S^n, i = 0, 1, . . . , m, are given matrices, is matrix convex. Let us look at the following examples of matrix convex mappings, which are stated in [34].
Consider the bilinear mapping G(x) := A_0 + Σ_{i=1}^m x_i A_i + Σ_{i,j=1}^m x_i x_j B_{i,j}, where A_i, B_{i,j} ∈ S^n are given matrices. This mapping is matrix convex if and only if Σ_{i,j=1}^m x_i x_j z^T B_{i,j} z ≥ 0 for any x ∈ R^m and z ∈ R^n. This example takes the form of bilinear matrix inequalities (BMIs) with a matrix convex mapping, which arise in many applications in automatic control, finance, and design engineering.
Example 2.1. Consider the mapping G(Z) := Z^2. This mapping is matrix convex. According to [3], to prove this it suffices to show that for any A, B ∈ S^n and any z ∈ R^n, the real-valued function ψ(t) := z^T (A + tB)^2 z is convex in t. A direct computation gives ψ''(t) = 2 z^T B^2 z. Since the matrix B^2 is always positive semidefinite, it follows that z^T B^2 z ≥ 0, and hence ψ(t) is indeed convex.
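This computation can be verified numerically; the matrices and the finite-difference step below are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((5, 5)); Z1 = (M + M.T) / 2
M = rng.standard_normal((5, 5)); Z2 = (M + M.T) / 2
# matrix convexity of G(Z) = Z^2:
# t Z1^2 + (1-t) Z2^2 - (t Z1 + (1-t) Z2)^2 = t(1-t)(Z1 - Z2)^2 ⪰ 0
for t in np.linspace(0, 1, 11):
    Zm = t * Z1 + (1 - t) * Z2
    S = t * Z1 @ Z1 + (1 - t) * Z2 @ Z2 - Zm @ Zm
    assert np.linalg.eigvalsh(S).min() >= -1e-10
# and psi(t) = z^T (A + tB)^2 z is quadratic in t with psi'' = 2 z^T B^2 z >= 0
z = rng.standard_normal(5)
psi = lambda t: z @ ((Z1 + t * Z2) @ (Z1 + t * Z2)) @ z
h = 1e-4
second_diff = (psi(h) - 2 * psi(0.0) + psi(-h)) / h**2
assert abs(second_diff - 2 * z @ Z2 @ Z2 @ z) < 1e-3
```

Since ψ is a quadratic polynomial in t, the central second difference recovers ψ''(0) = 2 z^T B^2 z essentially exactly.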
For this example, we will develop local nonsmooth optimization strategies suited for this new context in Sect.4, and show that they improve the situation considerably.
3.1. VU-space. We now study the sum-of-largest-eigenvalues function f_k(x) := σ_k(A(x)). In this subsection, we give the detailed structure of the VU-space for f_k(x). First, composing the subdifferential of the convex component with the derivative of the smooth component yields the well-known description (14) of ∂f_k(x), where A_k(x) = ∂A(x)/∂x_k for k = 1, . . . , m, and P_1(x) and Q_1(x) represent, respectively, the orthonormal bases corresponding to the p largest eigenvalues and to the eigenvalues equal to λ_k(A(x)).
The relative interior of ∂f_k(x) has the expression (15). Proof. Apply the chain rule given in [10] to obtain (14), and the behavior of relative interiors under linear operators to obtain (15). When r = 1, we can refer to [29].
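In the smooth case (λ_k(A(x)) simple, r = 1), the chain rule produces an ordinary gradient with components ⟨A_i(x), P_1(x)P_1(x)^T⟩. The following sketch, with a hypothetical affine mapping A(x) chosen so that the relevant eigenvalue stays simple, validates this by finite differences:

```python
import numpy as np

rng = np.random.default_rng(3)
def sym(M): return (M + M.T) / 2
A0 = np.diag([4., 2., 0.5, -1., -3.])       # well-separated spectrum keeps lambda_k simple
A1 = 0.2 * sym(rng.standard_normal((5, 5)))
A2 = 0.2 * sym(rng.standard_normal((5, 5)))
A = lambda x: A0 + x[0] * A1 + x[1] * A2    # hypothetical affine test mapping
k = 2

def f_k(x):
    return np.sort(np.linalg.eigvalsh(A(x)))[::-1][:k].sum()

def grad_f_k(x):
    w, V = np.linalg.eigh(A(x))
    P1 = V[:, np.argsort(w)[::-1][:k]]      # top-k eigenvectors
    PPt = P1 @ P1.T
    return np.array([np.trace(Ai @ PPt) for Ai in (A1, A2)])

x = np.array([0.3, -0.8])
g = grad_f_k(x)
h = 1e-6
for i in range(2):
    e = np.zeros(2); e[i] = h
    fd = (f_k(x + e) - f_k(x - e)) / (2 * h)
    assert abs(fd - g[i]) < 1e-4            # chain-rule gradient matches finite differences
```

When the multiplicity exceeds one, the same formula with Q_1(x)Z Q_1(x)^T inserted produces the whole subdifferential (14) rather than a single gradient.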
We denote by [DA(x)]^{-1} the multifunction whose graph is the inverse of the graph of DA(x). Next we state an assumption needed in later parts.
This assumption is natural and important; it appears in [1] and [3]. It ensures that the space decomposition can be carried out.
We should not miss a major fact: a good model of f_k must take into account the local behavior of all active constraints at x. Geometry suggests fixing the multiplicity (i.e., the activity) of f_k; this point of view is the one adopted in [30]. The surface of activity is precisely the smooth manifold M(p, q). This gives a geometrical interpretation of the subspaces in the decompositions (16) and (17). Proof. First, we take the affine hull of the right- and left-hand sides in (14). Since DA(x) is a linear operator, by convex analysis [32] and the definition of the VU-decomposition in (2), the first part of (17) holds.
This is just the former part of (16), so we only need to compute V_{σ_k}(A(x)); therefore we obtain the remaining part of (16). Likewise, using V_{σ_k}(A(x)) = U_{σ_k}(A(x))^⊥, the second part of (17) also holds.
The results on items (2) and (3) are evident.
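The decomposition (16)-(17) can be observed numerically on a small instance. The example below is our own: the classical 2×2 maximum eigenvalue with A(x) = [[x_1, x_3], [x_3, x_2]] at x* = 0, where the eigenvalue has multiplicity 2, Q_1 = I, and p = 0. Sampling the subdifferential recovers V_{f}(0) (dimension 2) and U_{f}(0) = span{(1, 1, 0)}, along which f(t, t, 0) = λ_max(tI) = t is smooth:

```python
import numpy as np

# A(x) = [[x1, x3], [x3, x2]]; partial derivatives A_i of the affine mapping
E = [np.array([[1., 0.], [0., 0.]]), np.array([[0., 0.], [0., 1.]]),
     np.array([[0., 1.], [1., 0.]])]

def adj(Z):
    """DA(x)* Z = (<A_1,Z>, <A_2,Z>, <A_3,Z>)."""
    return np.array([np.trace(Ai @ Z) for Ai in E])

# Subgradients at x = 0 are DA(0)* Z with Z ⪰ 0, tr Z = 1; sample them
rng = np.random.default_rng(4)
subgrads = []
for _ in range(50):
    M = rng.standard_normal((2, 2)); W = M @ M.T
    subgrads.append(adj(W / np.trace(W)))            # random Z ⪰ 0 with tr Z = 1
D = np.array(subgrads) - adj(np.eye(2) / 2)          # shift by g in ri ∂f(0)
_, s_vals, Vt = np.linalg.svd(D)
rank = int((s_vals > 1e-10).sum())
assert rank == 2                                      # dim V = 2, so dim U = 1
U_dir = Vt[rank]                                      # basis vector of U = V^⊥
# U is spanned by (1, 1, 0)/sqrt(2): f is smooth along x1 = x2, x3 = 0
assert abs(abs(U_dir @ np.array([1., 1., 0.]) / np.sqrt(2)) - 1.0) < 1e-8
```

Here V_{f}(0) is exactly the image under DA(0)* of V_{σ_1}(A(0)), in line with (16)-(17).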
3.2. The U-Lagrangian of σ_k(A(x)). Take g* ∈ ri ∂f_k(x*) and define the U-Lagrangian of f_k at (x*, g*) according to (3); in what follows we denote it by L_{U,f_k}(x*, g*; ·). By item 2 of Theorem 2.1, L_{U,f_k}(x*, g*; ·) is differentiable at u = 0. We can prove the following composition rule.
Proof. Because of (15), some G* satisfying the assumption can always be found. Utilizing (6), where the second equality holds because the operator DA(x*) is linear, and then (17), and finally ∇L_{U,f_k}(A(x*), G*; 0) = proj_{U_{σ_k}(A(x*))} G* together with the definition of the adjoint operator, the stated formula and the associated mapping are obtained.
Here we would like to identify a characteristic C^∞-manifold. To this end, a natural idea is to look for the set of vectors x ∈ R^m such that f_k(x) has a fixed multiplicity q − p, namely to consider the manifold A^{-1}(M(p, q)). The obstacle is that, even in the affine case, A^{-1}(M(p, q)) may fail to be smooth. To ensure that A^{-1}(M(p, q)) is a smooth manifold in a neighborhood of x*, an assumption that A is transversal to M(p, q) is needed.
Remark 1. The transversality assumption (T) was proposed by Shapiro and Fan (cf. [33,34]). Condition (T) is called the constraint nondegeneracy condition in semidefinite programming. In addition, it is an analogue of the linear independence of the gradients of active constraints used in nonlinear programming. In fact, a much weaker condition than transversality will be presented in Section 4.
Under that weaker condition, we can obtain the same results.
Next we obtain a local equation of W(p, q) := A^{-1}(M(p, q)) via a simple composition rule. Theorem 3.4. Suppose that condition (T) holds at x* and that g* ∈ ri ∂f_k(x*). Then (1): the subspaces U_{f_k}(x*) and V_{f_k}(x*) are, respectively, the tangent and normal spaces to W(p, q) at x*; (2): there exist ρ > 0 and a C^∞-mapping v such that the associated mapping is a C^∞ tangential parametrization of the submanifold W(p, q).
The mapping (22) has a geometrical interpretation: U_{f_k}(x*) is tangent at x* to the ridge. In our setting the geometrical set (22) coincides with W(p, q) in a neighborhood of x*, when g* ∈ ri ∂f_k(x*). In order to generate a candidate element of W(u), a useful interpretation of W(·) in (4) is described in the next result, which gives a local description of the surface x* + u ⊕ W(u).
Theorem 3.5. Suppose that the condition (T) holds at x * , let g * ∈ ri ∂f k (x * ). Then there exists ρ > 0 such that for all u ∈ B(0, ρ) ⊂ U f k (x * ), the set W (u) is a singleton: where v(·) is the C ∞ -map defined in Theorem 3.4 (2).
This implies that the following complementarity condition holds. By matrix analysis, we get the rank condition. Furthermore, at u = 0, G = G* ∈ ri ∂σ_k(A(x* + u ⊕ v)) and M* := G* − P_1(x*)P_1(x*)^T = Q_1(x*)Q_1(x*)^T, and from (15) we find that the following strict complementarity condition holds. By the continuity of eigenvalues together with (6) and (7), there is a number ρ > 0 such that, combining with inequality (23), we obtain the strict complementarity condition needed. Then, taking ρ small enough, we can apply Theorem 2.2 (2) to derive the formula W(u) = {v(u)}, and the proof is done.
Next we follow the path p_{x*}(u) ∈ W(p, q). The subspaces E_tot(·) and E_{p,q}(·) coincide on the manifold W(p, q) close enough to x*, where E_tot(·) is spanned by the q − p eigenvectors associated with the (p+1)-th to q-th largest eigenvalues. Meanwhile, one chooses an orthonormal basis mapping in a neighborhood of u = 0: U(x*) ∋ u ↦ Q_1(p_{x*}(u)) := Q_tot(p_{x*}(u)), where the columns of Q_tot(·) form an orthonormal basis of E_tot(·).
The stage is now set for giving expressions for the derivatives of the U-Lagrangian. We state our main result in the following theorem: an analytic construction of ∇^2 L_{U,f_k}(x*, g*; 0), called the U-Hessian matrix of f_k at x*, in terms of the U-Lagrangian.
Theorem 3.6. Suppose the transversality condition (T) holds at x* and take g* ∈ ri ∂f_k(x*). Then the U-Lagrangian L_{U,f_k}(x*, g*; ·) of f_k is C^∞ in a neighborhood of u = 0. In particular, at u = 0, where G* is the unique subgradient in ∂σ_k(A(x*)) such that g* = DA(x*)* G* and the operator H(A*, G*) is the symmetric positive semidefinite operator defined below; here we assume that at A(x*), λ_1(A(x*)) = . . . = λ_p(A(x*)). This can also be written with B(x*) = proj_{U_{σ_k}(A(x*))} ∘ DA(x*) ∘ proj*_{U_{f_k}(x*)}, where U_{f_k}(x*) is given by (16).
Proof. For all x ∈ W(p, q) close enough to x*, Theorem 3.5 provides the local representation; then L_U(x*, g*; ·) is C^∞ on B(0, ρ). Next we proceed in two steps.
(1) According to the condition that Z satisfies, set Z = ((k−p)/(q−p)) I_{q−p} + Ξ, where Ξ is an element of H defined in (19). Introducing the corresponding quantities and combining them with (28), the operator Dϕ(x*) ∘ Dϕ(x*)* is invertible, and it remains invertible for u small enough by continuity of Dϕ(x*) ∘ Dϕ(x*)*. We can then obtain the unique solution Ξ(u) by inverting both sides of (29) and define a C^∞-mapping so that (31) holds. (2) We differentiate formula (31) at u = 0 to derive the second-order term; since we apply a fixed linear operator proj_{U_{f_k}(x*)}, we obtain a sum of three terms, one of which is (30). Using (3.8) of [33], we obtain the required result. Finally, because of (16), we finish the proof.
In the above theorem, the operator induced by H(A*, G*) on the subspace U_{σ_k}(A(x*)) is called the U-Hessian of σ_k at the pair (A*, G*); it assembles the relevant second-order information on σ_k along the manifold M(p, q). Second-order U-Hessian derivatives are useful for specifying the second-order-like expansion of f_k on W(p, q), as shown next. Corollary 1. Assume that the transversality condition (T) is satisfied at x*. Then the expansion holds for all d ∈ R^m such that x* + d ∈ W(p, q) and d → 0. Proof. Let d be small enough that x* + d ∈ W(p, q); employ (12), take u to be the U-component of d, and apply Theorem 3.6. Moreover, the claim follows from the definition of W(u). This completes the proof.
4. An example. In this section, we study the explicit VU-decomposition forms for the matrix convex examples of Sect. 2. Our purpose here is to demonstrate that U-Lagrangian theory can potentially be very efficient in solving practical problems. In the proofs, we omit unnecessary details for brevity and state the important conclusions. We note that although the underlying ideas are similar, the technical details, as the reader will find, become much more involved: even though the formulation is simple, chain rules are not easy to obtain, and we still need to introduce some geometrical conditions to derive them. Besides, we propose a regularity condition weaker than transversality under which the conclusions still hold.
4.1. The sum of the largest eigenvalues in a matrix variable. We now give the main results for the eigenvalue function with a matrix variable, of the form F(X) := σ_k(C(X)), where the mapping C : S^n → S^n is matrix convex with respect to X ∈ S^n. Without loss of generality, we also assume C(X) ∈ M(p, q). The differential of the mapping C(X) is defined as follows, where H_{ij} is the ij-th component of the matrix H ∈ S^n and C_{ij}(X) is the partial derivative of C(·) with respect to the ij-th variable X_{ij}. Concerning the adjoint of DC(X), by the definition of DC(X) and its adjoint we have, for all H ∈ S^n and M ∈ S^n, the identity below. Proposition 3. The subdifferential of F(X) is given by the chain rule, where ∂σ_k(C(X)) = {U(X) ∈ S^n : Z(X) ∈ S^{q−p}_+, tr Z(X) = k − p, U(X) = P_1(X)P_1(X)^T + Q_1(X)Z(X)Q_1(X)^T}.
(1): The subspaces U_F(X) and V_F(X) are, respectively, characterized by the formulas below. (2): If k = q, then F(X) is a differentiable function; moreover, U_F(X) = S^n and V_F(X) = {0}. Take the primal-dual pair (X*, G*) ∈ S^n × ri ∂(σ_k(C(X*))) and define the U-Lagrangian of F at (X*, G*) in view of (3); in what follows we denote it by L_{U,F}(X*, G*; ·). By Theorem 2.1, L_{U,F}(X*, G*; ·) is differentiable at U = 0. We can obtain the following composition rule.
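A hedged sketch of the composition rule in the matrix-variable smooth case, using the mapping C(X) = X^2 of Example 2.1 (the data X, H below are hypothetical): here DC(X)(H) = XH + HX is self-adjoint for the trace inner product, so when λ_k(C(X)) is simple the chain-rule subgradient is DC(X)*(P_1 P_1^T) = X P_1 P_1^T + P_1 P_1^T X.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((4, 4))
X = np.diag([2., 1., -0.5, -3.]) + 0.1 * (M + M.T) / 2   # keeps lambda_k(X^2) simple
k = 2

def F(X):
    return np.sort(np.linalg.eigvalsh(X @ X))[::-1][:k].sum()

def grad_F(X):
    w, V = np.linalg.eigh(X @ X)
    P1 = V[:, np.argsort(w)[::-1][:k]]
    PPt = P1 @ P1.T
    return X @ PPt + PPt @ X          # DC(X)*(P1 P1^T) with DC(X)(H) = XH + HX

G = grad_F(X)
# finite-difference check of the directional derivative <G, H> = tr(G H)
M = rng.standard_normal((4, 4)); H = (M + M.T) / 2
h = 1e-6
fd = (F(X + h * H) - F(X - h * H)) / (2 * h)
assert abs(fd - np.trace(G @ H)) < 1e-4
```

In the multiple-eigenvalue case, the same adjoint applied to P_1P_1^T + Q_1ZQ_1^T produces the whole subdifferential of Proposition 3.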
(40) As in Sect. 3, we would like to identify a characteristic C^∞-manifold for the eigenvalue function with a matrix variable. A natural idea is to examine the set of matrices X ∈ S^n such that σ_k(C(X)) has a fixed multiplicity q − p, namely to consider C^{-1}(M(p, q)). The difficulty is that, even in the affine case, smoothness may break down. To ensure that C^{-1}(M(p, q)) is a smooth manifold in a neighborhood of X*, a regularity condition is needed. In practice, the transversality condition mentioned in Sect. 3 is often too conservative, and a weaker one is relevant, which consists in requiring that the image of C intersect M(p, q) cleanly (transversality thus implies clean intersection), i.e., Definition 4.3. [1] We say that the clean intersection condition (C) holds at X* if the dimension of the subspace range DC(X*) ∩ U_{σ_k}(C(X*)) is constant as X* varies locally in the intersection of M(p, q) with the image of C.
Under the clean intersection condition, the set C^{-1}(M(p, q)) is a smooth submanifold of S^n and the function F is locally C^∞ on C^{-1}(M(p, q)); we say that such a point is regular. Next we obtain a local equation of W(p, q) := C^{-1}(M(p, q)) via a simple composition rule.
Theorem 4.5. Assume (C) is satisfied at X* and take G* ∈ ri ∂F(X*). Then (1): the subspaces U_F(X*) and V_F(X*) are, respectively, the tangent and normal spaces to W(p, q) at X*. (2): there exist ρ > 0 and a C^∞-mapping V such that the associated mapping is a C^∞-tangential parametrization of the submanifold W(p, q).
Theorem 4.6. Assume (C) is satisfied at X* and take G* ∈ ri ∂F(X*). Then there exists ρ > 0 such that for all U ∈ B(0, ρ) ⊂ U_F(X*), the set W(U) is a singleton determined by the C^∞-map of Theorem 4.5 (2). Now we follow the path p_{X*}(U) ∈ W(p, q). On the manifold W(p, q) and close enough to X*, E_tot(·) and E_{p,q}(·) coincide. Hence one chooses an orthonormal basis mapping in a neighborhood of U = 0: U(X*) ∋ U ↦ Q_1(p_{X*}(U)) := Q_tot(p_{X*}(U)).
Combining with Theorem 4.6, this gives a second-order development of F along W(p, q) = C^{-1}(M(p, q)) at X*. Theorem 4.7. Assume (C) is satisfied at X* and take (X*, G*) ∈ S^n × ri ∂F(X*). Then the U-Lagrangian L_{U,F}(X*, G*; ·) is C^∞ in a neighborhood of U = 0. Moreover, at U = 0, where G* is the unique subgradient in ∂σ_k(C(X*)) such that the chosen subgradient of F equals DC(X*)* G* and the operator H(C*, G*) is the symmetric positive semidefinite operator defined below; here we assume that at X*, λ_1(X*) = . . . = λ_p(X*). This can also be written with B(X*) = proj_{U_{σ_k}(C(X*))} ∘ DC(X*) ∘ proj*_{U_F(X*)}, where U_F(X*) is given by (36). Moreover, we have the following second-order development of F for all D ∈ S^n such that X* + D ∈ W(p, q) and D → 0. The theory developed so far strongly suggests the following algorithmic application: near a minimizer of F, minimize the second-order development of the U-Lagrangian of F. We first present a conceptual algorithm that relies on this simple idea.
Consider a minimum point X* and let q − p be the multiplicity of λ_k(C(X*)). Given X ∈ B(X*, ρ) for some ρ > 0, we need to compute some X_+ superlinearly closer to X*. We consider the following conceptual algorithm. V-step: project X onto W(p, q) to obtain X̂. U-step: compute U from ∇L_{U,F}(X̂, G(X̂); 0). Update: set X_+ = X̂ + U.

Remark 2. Here we use a projection onto the manifold W(p, q) in place of the V-step in the U-Newton algorithm. It is perpendicular to the manifold at the projected point, so we refer to this "U-Newton method" as the "projected U-Newton method". As proposed in [23], a fast VU-algorithm performs a corrector-predictor step at each iteration. More precisely, a V (corrector) step is made in order to bring the iterate to the smooth manifold W; then the U-Newton (predictor) step is performed to gain superlinear decrease of the distance to the solution. This interplay between the V (corrector) step and the U-Newton step allows one to prove fast convergence under reasonable conditions. This is why the U-Newton steps are most important.
In order to get quadratic convergence, we first present the definitions of strict complementarity and the nondegeneracy condition, which can be seen as generalizations of the regularity assumption needed in all Newton-type methods. The computation of X̂ in Algorithm 4.1 is not easy; to tackle this difficulty, we proceed in two steps. First we present two propositions, whose ideas come from our latest work in Ref. [12]. For completeness, we merely state the results below and omit the proof details.
Remark 3. Unlike the SQP method in [33], the U-Newton method is effective only on the smooth manifold W(p, q), while SQP is valid over the whole n × n symmetric matrix space S^n. In the U-Newton method, X is updated to X̂ + U with U solving (33), where an explicit attempt is made to restore X to the manifold W(p, q); SQP is intended to achieve both feasibility and optimality asymptotically.
Just as the U-Newton method depends on a choice of G ∈ ∂F(X), which defines the U-Lagrangian, SQP depends on the choice of approximate Lagrange multipliers. Moreover, with this choice the Newton and SQP directions coincide.
The U-Hessian matrix defined by G reflects the curvature of F on W(p, q) near the minimizer X* and leads to the following quadratic convergence result. The proof is similar to our result in [12]; the main difference is that the variable is in matrix form.
Theorem 4.10. If (SOC) holds at X*, then there exist ρ > 0 and L > 0 such that, for all X ∈ B(X*, ρ), the point X_+ defined by Algorithm 4.2 satisfies the stated estimate. Proof. We first minimize a smooth function F(X) over a smooth manifold V ⊂ S^n. Suppose there exists a neighborhood of X* in which V can be expressed by a system of smooth equations E_i(X) = 0, i = 1, . . . , t. We adopt the standard Newton method. Let X^k be the current iterate and Θ^k the corresponding Lagrange multiplier vector. Then the next iterate is X^{k+1} = X^k + Y^{k+1}, where Y^{k+1} is the solution of the associated quadratic programming problem and H_k = D²_{XX} L(X^k, Θ^k) is the second-order differential operator of the Lagrangian. It is easy to verify that X^{k+1} and the corresponding Lagrange multiplier vector Θ^{k+1} can be obtained as a solution of the linear equations in the matrix variable F(Z^k) + DF(Z^k)(Z − Z^k) = 0, where F(Z) = (D_X L(X, Θ), E(X)), E(X) = (E_1(X), . . . , E_t(X)), and Z = (X, Θ).
If the algorithm starts at a point sufficiently close to the optimal solution X* and the second-order sufficient optimality conditions hold at X*, then the algorithm converges quadratically. In our Algorithm 4.2, the U-Hessian matrix H(Ĉ(X), Ĝ(X)) coincides with the above H_k, so the U-step (52) can be converted to formula (53). Under condition (SOC), the quadratic rate of convergence is obtained.
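The Lagrange-Newton iteration used in the proof can be sketched on a toy equality-constrained problem (unrelated to eigenvalues; the objective and constraint below are our own illustrative choices):

```python
import numpy as np

# minimize x1^2 + x2^2  s.t.  E(x) = x1^2 + x2 - 1 = 0
# Newton's method on the KKT system Phi(z) = (grad_x L, E) = 0, z = (x1, x2, theta)
def Phi(z):
    x1, x2, th = z
    return np.array([2*x1*(1 + th),        # dL/dx1, with L = x1^2 + x2^2 + th*E
                     2*x2 + th,            # dL/dx2
                     x1**2 + x2 - 1])      # constraint E(x)

def DPhi(z):
    x1, x2, th = z
    return np.array([[2*(1 + th), 0., 2*x1],
                     [0.,         2., 1. ],
                     [2*x1,       1., 0. ]])

z = np.array([0.7, 0.6, -0.9])             # start near the KKT point
for _ in range(8):
    z = z - np.linalg.solve(DPhi(z), Phi(z))
zstar = np.array([1/np.sqrt(2), 0.5, -1.0])
assert np.linalg.norm(z - zstar) < 1e-12   # fast local convergence to (x*, theta*)
```

Near the solution, the error is squared at every iteration, which is the quadratic rate invoked for Algorithm 4.2.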
Our scheme above is highly conceptual, because a practical algorithm needs to generate convergent estimates of V, U, the operator structure, and a positive definite U-Hessian corresponding to an optimal solution and a zero subgradient. Relevant ideas for doing this are contained in later work.

5. Conclusions. In the present paper, we establish the VU-space decomposition for a special class of eigenvalue functions, the sum of the largest eigenvalues with a matrix convex mapping, and reformulate the first- and second-order derivatives of f_k in terms of the U-Lagrangian. Moreover, using nice properties of the function, we can find the smooth track X(u) satisfying the regularity condition, along which f_k has a second-order expansion. A conceptual algorithm with local superlinear convergence is presented. Finally, this development suggests that our results may be used to deal with practical optimization problems involving the eigenvalue function with respect to matrix variables.
The algorithm in this work is only theoretical and conceptual. For numerical and computational purposes, we will continue to develop a rapidly convergent executable algorithm and will consider how to use bundle techniques to approximate proximal points in later work. In addition, extending the related theory from convex eigenvalue optimization to nonconvex cases is also of interest to us.