A SUBGRADIENT-BASED CONVEX APPROXIMATIONS METHOD FOR DC PROGRAMMING AND ITS APPLICATIONS

Abstract. We consider an optimization problem that minimizes a function of the form f = f_0 + f_1 − f_2 under the constraint g − h ≤ 0, where f_0 is continuously differentiable, f_1, f_2 are convex, and g, h are lower semicontinuous convex. We propose to solve the problem by an inexact subgradient-based convex approximations method. Under mild assumptions, we show that the method is guaranteed to converge to a stationary point. Finally, some preliminary numerical results are given.


1. Introduction. Let X be a finite dimensional real Hilbert space equipped with an inner product ⟨·,·⟩ and its induced norm ||·||. In this paper, we consider the following nonlinear optimization problem:

min_{x ∈ K} f_0(x) + f_1(x) − f_2(x)  s.t. g(x) − h(x) ≤ 0,   (1)

where f_0 : X → R is continuously differentiable, f_i : X → R, i = 1, 2, are convex, g = (g_1, g_2, …, g_p)^T : X → R^p and h = (h_1, h_2, …, h_p)^T : X → R^p are lower semicontinuous convex, and K ⊆ X is a closed convex set. Functions of the form f_1(x) − f_2(x) and g(x) − h(x) are usually called DC functions (differences of two convex functions), and problem (1) is called a DC program (DCP). Clearly, if f_0 ≡ 0 and both f_2(x) and h(x) are affine, problem (1) is a convex optimization problem. In applications, many problems either can be directly transformed into the form of DCP or can be solved by a sequence of DCP approximations. We list some of them in the following. For an introduction to the theory and applications of DCP, readers are referred to the survey [11] and the references [6,22,2,3].
1.1. Quadratically constrained quadratic programming. Consider quadratic functions of the form

q_i(x) = (1/2) x^T A_i x + b_i^T x + r_i,  i = 0, 1, …, m,

where A_i is an n × n symmetric matrix, b_i ∈ R^n and r_i ∈ R, i = 0, 1, …, m. The problem known as quadratically constrained quadratic programming (QCQP) is of the following form:

min q_0(x)  s.t. q_i(x) ≤ 0, i = 1, …, m,  x ∈ R^n.   (2)
In [1] the author pointed out that a quadratic function can be equivalently written as a DC function. For i ∈ {0, 1, …, m}, let λ_i be the largest eigenvalue of A_i and let ρ_i be a number satisfying ρ_i > max{0, λ_i}. Define

g_i(x) = (ρ_i/2)||x||^2 + b_i^T x + r_i,  h_i(x) = (1/2) x^T (ρ_i I − A_i) x,

where I denotes the identity matrix. Since ρ_i I − A_i is positive semidefinite, both g_i and h_i are convex and q_i(x) = g_i(x) − h_i(x), so q_i is a DC function. Thus, problem (2) can be written in the form of DCP.
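As a sanity check, this decomposition is easy to form numerically. The following MATLAB sketch (the function name and the choice rho = max{0, λ} + 1 are ours, for illustration only) returns handles for the two convex parts of a single quadratic q(x) = (1/2)x'Ax + b'x + r:

```matlab
% Sketch: DC split q = g - h of a quadratic q(x) = 0.5*x'*A*x + b'*x + r.
% rho is chosen strictly larger than max{0, lambda_max(A)}, as in the text,
% so that rho*I - A is positive semidefinite and both parts are convex.
function [g, h] = dc_split_quadratic(A, b, r)
    lam = max(eig((A + A')/2));   % largest eigenvalue of A (symmetrized)
    rho = max(0, lam) + 1;        % any rho > max{0, lambda_max(A)} works
    n   = size(A, 1);
    g = @(x) (rho/2)*(x'*x) + b'*x + r;      % convex quadratic
    h = @(x) 0.5*(x'*((rho*eye(n) - A)*x));  % convex since rho*I - A is PSD
end
```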
1.2. Low rank matrix optimization. Consider the following matrix optimization problem:

min f(X) + g(X)  s.t. rank(X) ≤ r,  X ∈ Q,   (3)

where f is continuously differentiable, g is convex and the set Q ⊆ R^{m×n} is closed convex. Without loss of generality, we assume m ≤ n. The constraint rank(X) ≤ r is called a low rank constraint since in applications r > 0 is very small. Recently, low rank matrix optimization has become a hot research topic in many fields, such as applied statistics, image processing and risk management (see [7,16,8]). The low rank constraint is nonconvex but can be written in DC form. Given a matrix X ∈ R^{m×n}, let σ_1(X) ≥ σ_2(X) ≥ ⋯ ≥ σ_m(X) ≥ 0 be the singular values of X arranged in non-increasing order. The Ky Fan k-norm ||X||_(k) of X is defined as the sum of its k largest singular values, i.e.,

||X||_(k) = σ_1(X) + σ_2(X) + ⋯ + σ_k(X),  1 ≤ k ≤ m.

Note that the constraint rank(X) ≤ r is satisfied if and only if σ_{r+1}(X) + σ_{r+2}(X) + ⋯ + σ_m(X) = ||X||_(m) − ||X||_(r) = 0. Therefore, it is possible to solve problem (3) by considering the penalized problem:

min f(X) + g(X) + c(||X||_(m) − ||X||_(r))  s.t. X ∈ Q,   (4)

where c > 0 is a penalty parameter.
Since ||X||_(k) is convex in X, the above problem is also a type of DCP.
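For illustration, the DC penalty term in (4) can be evaluated directly from the singular values; a minimal MATLAB sketch (our own helper, not from [8]):

```matlab
% Sketch: the penalty ||X||_(m) - ||X||_(r) from (4); it equals the sum of
% the m - r smallest singular values and vanishes iff rank(X) <= r.
function p = kyfan_rank_penalty(X, r)
    s = svd(X);              % singular values in non-increasing order
    p = sum(s(r+1:end));     % ||X||_(m) - ||X||_(r)
end
```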
1.3. Chance constrained programs. In the stochastic programming community, solving chance constrained programs has been an attractive topic since the 1960s. For a survey of the topic, readers are referred to Chapter 4 of [20] and the references therein. The problem is usually formulated as:

min f(x)  s.t. Pr{c(x, ξ) ≤ 0} ≥ 1 − α,  x ∈ Q,   (5)

where ξ is an m-dimensional random vector, Q ⊆ R^n is a closed convex set, f : R^n → R is a real-valued function and c : R^{n+m} → R is convex in x. In problem (5), the constraint c(x, ξ) ≤ 0 is required to hold with probability at least 1 − α. The constraint

Pr{c(x, ξ) ≤ 0} ≥ 1 − α   (6)

is called a chance constraint (also called a probabilistic constraint). Owing to their nonconvex nature, chance constrained programs are computationally intractable, which has motivated the development of approximate solution schemes.
Let 1_A(·) be the indicator function of a set A, i.e., 1_A(z) = 1 if z ∈ A and 1_A(z) = 0 otherwise.
Let Z = c(x, ξ). The chance constraint (6) can be rewritten as Pr{Z > 0} ≤ α, which is equivalent to an expectation constraint, E[1_(0,+∞)(Z)] ≤ α. Since 1_(0,+∞)(z) is clearly nonconvex, one way to approximate the chance constraint is to find a convex approximation ψ(z) of 1_(0,+∞)(z) such that ψ(z) ≥ 1_(0,+∞)(z) for any z ∈ R. Then, E[ψ(Z)] ≤ α is a convex conservative approximation of constraint (6). In [15], the CVaR approximation is proved to be the best conservative convex approximation; it uses

ψ(z, t) = [1 + z/t]_+

to approximate 1_(0,+∞)(z), where t > 0 is a parameter and [z]_+ = max{0, z}. The approximated constraint is formed as

inf_{s ∈ R} { s + (1/α) E[Z − s]_+ } ≤ 0.

The left hand side of the above inequality is well known as the conditional value at risk (CVaR), i.e.,

CVaR_{1−α}(Z) = inf_{s ∈ R} { s + (1/α) E[Z − s]_+ }.

From Figure 1 we can see that, although the CVaR approximation is the best convex conservative approximation, it is still not a good approximation. In [10,23,19], a DC approximation is applied, which uses

π(z, t) = ([z + t]_+ − [z]_+)/t

to approximate 1_(0,+∞)(z). When t > 0 is small enough, this is clearly a much better approximation. The chance constraint (6) is then approximated by

E[Z + t]_+ / t − E[Z]_+ / t ≤ α,

and the corresponding approximation of problem (5) has the same formulation as problem (1) (when t is fixed).
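To visualize the two approximations (this is essentially what Figure 1 compares), one can plot them against the indicator function; a minimal MATLAB sketch using the expressions above:

```matlab
% Sketch: compare the indicator 1_(0,inf)(z) with its CVaR-type convex
% approximation [1 + z/t]_+ and the DC approximation ([z+t]_+ - [z]_+)/t.
t = 0.1;                                      % approximation parameter
z = linspace(-1, 1, 401);
indicator = double(z > 0);
cvar_appr = max(1 + z./t, 0);                 % convex, conservative, loose
dc_appr   = (max(z + t, 0) - max(z, 0))./t;   % DC, conservative, much tighter
plot(z, indicator, 'k', z, cvar_appr, 'r--', z, dc_appr, 'b-.');
legend('indicator', 'CVaR approximation', 'DC approximation');
```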
All these examples demonstrate that the demand for efficient approaches to DCP is growing rapidly. Due to its nonconvex (and possibly nonsmooth) nature, few efficient approaches have been proposed in the literature for DCP. In [11,22], combinatorial methods of branch and bound type were established for solving DC programs globally. One obvious drawback of branch and bound type methods is that they usually need a large number of iterations to obtain a solution, which makes them impractical for large scale problems. A local method called DCA was studied in [3]; it minimizes a DC function over the whole space. Given a good starting point, DCA was proved to converge to a global solution and to be very efficient even for large scale problems. In [1], the author also gave a combined DCA branch-and-bound algorithm for minimizing a quadratic function under convex quadratic constraints, which is of the form (2).
In this paper, we consider a more general formulation in which DC terms appear both in the objective function and in the constraints. We propose an inexact subgradient-based convex approximations method for solving problem (1). The basic idea is that, by linearizing the second term of each DC function, DCP can be approximated by a convex program. We present an iterative method in which a sequence of convex subproblems is solved, and we show that the sequence of solutions converges to a stationary point of the DCP. We also report some preliminary numerical results of the method for solving joint chance constrained programs and quadratically constrained quadratic programs. In theory the method is only guaranteed to find a stationary point, but the numerical results show that in most cases we obtain the global solution, provided a suitable initial point is supplied.
The rest of the paper is organized as follows. We give some preliminaries in Section 2. In Section 3, the inexact subgradient-based convex approximations method is presented and its convergence results are established. The numerical results are reported in Section 4.

2. Preliminaries. For the sake of subsequent discussions, we introduce some related concepts in this section. Let X and Y be finite dimensional real Hilbert spaces.
Definition 2.1. We say that g : X → R is directionally differentiable at x in the direction d ∈ X if the limit

g′(x; d) := lim_{t↓0} (g(x + td) − g(x))/t

exists. If g is directionally differentiable at x in every direction d ∈ X, we say that g is directionally differentiable at x.
Definition 2.2. Let g : X → R be a convex function. The subdifferential of g at x is defined as

∂g(x) := {x* ∈ X* : g(y) ≥ g(x) + ⟨x*, y − x⟩ for all y ∈ X},

where X* is the dual space of X.
Definition 2.3. Let S ⊂ X be a closed convex set and x be a point in S. We define the following sets: the radial cone

R_S(x) := {t(y − x) : y ∈ S, t ≥ 0},

the tangent cone

T_S(x) := cl R_S(x)

(that is, the closure of the radial cone), and the normal cone

N_S(x) := {x* ∈ X* : ⟨x*, y − x⟩ ≤ 0 for all y ∈ S}.

Definition 2.4. A set-valued mapping T : X ⇒ X* is said to be monotone if

⟨x* − y*, x − y⟩ ≥ 0 for all x, y ∈ X, x* ∈ T(x), y* ∈ T(y),

and strictly monotone if this inequality is strict whenever x ≠ y.

Lemma 2.5. (i) For any proper convex function f : X → R, the mapping ∂f : X ⇒ X* is monotone. (ii) For a nonempty closed convex set C ⊂ X, the mapping N_C : X ⇒ X* is monotone.
Definition 2.6. We say f : X → R is LC^1 with constant L ≥ 0 if f is continuously differentiable and its gradient is Lipschitz continuous with constant L, i.e., ||∇f(x) − ∇f(y)|| ≤ L||x − y|| for all x, y ∈ X. For an operator M : X → X, we write M ≻ 0 if ⟨x, Mx⟩ > 0 for any x (≠ 0) ∈ X.
3. Inexact subgradient-based convex approximations method. In this section, we introduce the inexact subgradient-based convex approximations method (ISCA) for solving DCP. Before that, we first discuss the optimality conditions of problem (1).
Throughout, let Φ := {x ∈ K : g(x) − h(x) ≤ 0} denote the feasible set of problem (1) and set G(x) := g(x) − h(x).

Definition 3.1. A point x̄ ∈ Φ is called a stationary point of DCP if there exist ξ ∈ ∂f_1(x̄), v ∈ ∂f_2(x̄), W ∈ ∂g(x̄), V ∈ ∂h(x̄) and λ ∈ N_{R^p_−}(G(x̄)) such that

0 ∈ ∇f_0(x̄) + ξ − v + (W − V)^T λ + N_K(x̄).

For a point y ∈ K and V = (V_1, …, V_p)^T ∈ ∂h(y) (that is, V_i ∈ ∂h_i(y) for each i), we define

Φ_y(V) := {x ∈ K : g(x) − h(y) − V(x − y) ≤ 0}.

For any v ∈ ∂f_2(y), define the convex program

CP(y, V, v):  min f_0(x) + f_1(x) − f_2(y) − ⟨v, x − y⟩  s.t. x ∈ Φ_y(V).

Since h(x) is convex, we have h(x) ≥ h(y) + V(x − y) for all x, and hence Φ_y(V) ⊆ Φ. Then for any y ∈ K, V ∈ ∂h(y) and v ∈ ∂f_2(y), CP(y, V, v) is a convex conservative approximation of problem (1).
Lemma 3.2. Consider a point x̄ ∈ K. For any V ∈ ∂h(x̄) and v ∈ ∂f_2(x̄), if the relative interior of Φ_x̄(V) is nonempty and x̄ is an optimal solution of CP(x̄, V, v), then x̄ is a stationary point of DCP.
Proof. Note that CP(x̄, V, v) is a convex optimization problem and the generalized Slater condition holds. Hence x̄ is an optimal solution of CP(x̄, V, v) if and only if x̄ ∈ Φ_x̄(V) and

0 ∈ ∇f_0(x̄) + ∂f_1(x̄) − v + N_{Φ_x̄(V)}(x̄).

By Definition 3.1, x̄ is a stationary point.

Lemma 3.2 suggests a simple iterative approach to solve DCP by computing

x^{k+1} ∈ arg min CP(x^k, V^k, v^k),

where V^k ∈ ∂h(x^k) and v^k ∈ ∂f_2(x^k), which is also the basic idea of the sequential convex approximation method used in [10]. The result in Lemma 3.2 can also be characterized by a generalized equation with monotone operators. For any y ∈ K, V ∈ ∂h(y) and v ∈ ∂f_2(y), define a set-valued mapping T_{y,V,v} : X ⇒ X* as a nonnegative combination, with coefficients λ_1, λ_2 ≥ 0, of the mapping ∂f_1 − v and the normal cone mappings associated with the constraints. By Lemma 2.5, each of these mappings is monotone; since λ_1, λ_2 ≥ 0, the combination T_{y,V,v} is monotone. If 0 ∈ T_{x̄,V,v}(x̄), it is clear that x̄ is a stationary point of DCP.
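The iteration suggested by Lemma 3.2 can be sketched in MATLAB as follows; solve_CP, subgrad_f2 and subgrad_h are assumed black-box oracles (a convex solver for CP(x^k, V^k, v^k) and subgradient oracles for f_2 and h), not parts of Algorithm ISCA itself:

```matlab
% Sketch of the exact sequential convex approximation iteration:
% x^{k+1} solves CP(x^k, V^k, v^k). x0 (a feasible point in Phi),
% maxit, tol and the three oracles below are assumed given.
x = x0;
for k = 1:maxit
    v = subgrad_f2(x);           % v^k in the subdifferential of f_2 at x^k
    V = subgrad_h(x);            % V^k, row i a subgradient of h_i at x^k
    xnew = solve_CP(x, V, v);    % minimizer of the convex program CP
    if norm(xnew - x) <= tol, break; end
    x = xnew;
end
```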
Algorithm ISCA.
Step 0. Choose an initial point x^0 ∈ Φ, parameters ν ∈ (0, 1), ρ ∈ (0, 1) and a sequence of tolerances ε_k ≥ 0. Set k := 0.
Step 1. Choose V^k ∈ ∂h(x^k), v^k ∈ ∂f_2(x^k) and an operator M_k ≻ 0.
Step 2. Obtain d^k by solving the following convex problem approximately,

min_d ⟨∇f_0(x^k), d⟩ + (1/2)⟨d, M_k d⟩ + f_1(x^k + d) − f_1(x^k) − ⟨v^k, d⟩  s.t. x^k + d ∈ Φ_{x^k}(V^k),   (20)

such that the inexact stationarity condition (21) holds, where S_k(·) denotes the optimality mapping associated with (20).
Step 3. Choose α_k^0 > 0. Let l_k be the smallest nonnegative integer l satisfying the Armijo condition

f(x^k + ρ^l α_k^0 d^k) ≤ f(x^k) + ν ρ^l α_k^0 Δ_k,   (22)

where f := f_0 + f_1 − f_2 and Δ_k is the decrease predicted by the model in (20). Set α_k := ρ^{l_k} α_k^0 and x^{k+1} := x^k + α_k d^k.
Step 4. If x^{k+1} = x^k, then stop. Otherwise, replace k by k + 1 and go to Step 1.
A similar algorithm was introduced in [8] for solving a structured low rank matrix optimization problem of the form (4).
In this paper, we extend the algorithm to a more general form. Furthermore, from the computational point of view, the major cost of Algorithm ISCA lies in solving the subproblem (20) in Step 2. Solving problem (20) exactly at each iteration is prohibitively expensive, which motivates us to modify the algorithm into an inexact version: as shown in Step 2, at each iteration we only solve the subproblem approximately. In Step 3, we choose ρ^{l_k} by the Armijo line search rule, which can be replaced by various other line search rules if f_1(x) and f_2(x) are smooth. In the following two lemmas, we show that the Armijo rule is well defined in our algorithm.

Proof. From the convexity of f_1 and f_2, we obtain the bound (24). By the definition of S_k(x) and the inexactness condition (21), there exists an element of S_k(x^k + d^k) with norm at most ε_k ||d^k||, which together with W^k ∈ ∂g(x^k + d^k) and the fact G(x^k) ≤ 0 implies (27). Then, combining (26) and (27) gives the desired descent estimate, and the claim follows. The proof is complete.
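For concreteness, the backtracking loop behind Step 3 can be sketched as follows, assuming f is available as a function handle F and Delta (< 0) is the model decrease delivered by Step 2; all names are ours:

```matlab
% Sketch of the Armijo rule of Step 3: shrink alpha by the factor rho
% until the sufficient decrease test (22) holds. F, x, d, Delta,
% nu in (0,1), rho in (0,1) and alpha0 > 0 are assumed given.
alpha = alpha0;
while F(x + alpha*d) > F(x) + nu*alpha*Delta
    alpha = rho*alpha;           % i.e. alpha = alpha0 * rho^l after l shrinks
end
xnew = x + alpha*d;              % accepted step x^{k+1}
```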
Note that Assumption 1 can be easily satisfied by choosing M_k appropriately.

Convergence analysis.
Under the conditions in Lemma 3.5, if there exists an integer k ≥ 0 such that x^{k+1} = x^k in Algorithm ISCA, we will show that x^k is a stationary point of problem (1). Otherwise, we will show that any accumulation point of the infinite sequence {x^k} is a stationary point.

Lemma 3.6. Suppose that x^{k+1} = x^k for some k ≥ 0 in Algorithm ISCA. Then x^k is a stationary point of problem (1).

Proof. Since d^k = 0, by (21) we have 0 ∈ S_k(x^k), and thus 0 ∈ T_{x^k,V^k,v^k}(x^k). By Lemma 3.3, x^k is a stationary point of problem (1).
Theorem 3.7. Let the sequences {x^k} and {d^k} be generated by Algorithm ISCA. Suppose that Assumption 1 holds, that f_0 is LC^1 with constant L ≥ 0 and that, for each k ≥ 0, 2ε_k ≤ ν and inf_k α_k^0 > 0. Then the following results hold:
(i) lim_{k→+∞} ||d^k|| = 0;
(ii) if (x̄, V̄, v̄) is an accumulation point of {(x^k, V^k, v^k)} and the relative interior of Φ_x̄(V̄) is nonempty, then x̄ is a stationary point of problem (1).
Proof. In this theorem, we consider the case that x^{k+1} ≠ x^k for all k ≥ 0.
(i) From Lemma 3.4 and Assumption 1 we obtain a uniform descent estimate, and the line search condition (22) then yields (29). Suppose, to the contrary, that d^{k_j} ↛ 0 as j → +∞. By passing to a subsequence if necessary, we can assume that, for some δ > 0, ||d^{k_j}|| ≥ δ for all j ≥ 0. By (29) we then have α_{k_j} → 0. Together with α_{k_j} = α^0_{k_j} ρ^{l_{k_j}} and inf_j α^0_{k_j} > 0, there exists k̄ > 0 such that α_{k_j} < α^0_{k_j} and α_{k_j} ≤ ρ for all k_j > k̄. Furthermore, since α_k is chosen by the Armijo rule, the trial step α_{k_j}/ρ must violate (22), which gives (30). From the fact that −α_{k_j} Δ_{k_j} ≥ δν α_{k_j} ||d^{k_j}|| ≥ 0, we have α_{k_j} ||d^{k_j}||/ρ → 0 as j → +∞. Note that, by further passing to a subsequence if necessary, there exists d̄ with ||d̄|| = 1 such that lim_{j→+∞} d^{k_j}/||d^{k_j}|| = d̄.
By taking j → +∞ in (30), we arrive at a contradiction. This proves (i).
(ii) Since the relative interior of Φ_x̄(V̄) is nonempty, there exists d̄ ∈ R^n such that x̄ + d̄ lies in the relative interior of Φ_x̄(V̄). Then, for sufficiently large j, x^{k_j} + d̄ is in the relative interior of Φ_{x^{k_j}}(V^{k_j}). Since x^{k_j} → x̄ and d^{k_j} → 0, we have x^{k_j} + d^{k_j} → x̄ and w^{k_j} → 0. From the outer semicontinuity of ∂f_1, ∂f_2, ∂g and ∂h at x̄, for sufficiently large j_0 the sequences {ξ^{k_j}}_{j≥j_0}, {v^{k_j}}_{j≥j_0}, {W^{k_j}}_{j≥j_0} and {V^{k_j}}_{j≥j_0} are bounded. By taking further subsequences if necessary, there exist ξ ∈ ∂f_1(x̄), v ∈ ∂f_2(x̄), W ∈ ∂g(x̄) and V ∈ ∂h(x̄) to which they converge. Next we prove that {λ^{k_j}} is also bounded. Suppose not. Passing to a subsequence if necessary, we may assume that ||λ^{k_j}|| → +∞. Let λ̃^{k_j} := λ^{k_j}/||λ^{k_j}||. Passing to a further subsequence, we may assume λ̃^{k_j} → λ̃ with λ̃ ∈ N_{R^p_−}(G(x̄)) and ||λ̃|| = 1. Dividing both sides of the first formula in (31) by ||λ^{k_j}|| and letting j → +∞, we obtain (32); from x̄ + d̄ ∈ int K we obtain (33). Let I(x̄) := {i : G_i(x̄) = 0}; then λ̃_i ≥ 0 for i = 1, 2, …, p, and λ̃_i = 0 if i ∉ I(x̄). Then (33) yields a contradiction, which implies that {λ^{k_j}} is bounded. By taking a further subsequence if necessary, there exists λ ∈ N_{R^p_−}(G(x̄)) such that λ^{k_j} → λ. Letting j → +∞ in (31), we obtain

0 ∈ ∇f_0(x̄) + ξ − v + (W − V)^T λ + N_K(x̄),

which implies that x̄ is a stationary point of problem (1).

Implementation issues.
In this subsection, we address several practical issues in the implementation of Algorithm ISCA.
(i.) The choice of the initial point x^0 ∈ Φ. First we find a point x̄ ∈ K and choose v̄ ∈ ∂f_2(x̄), V̄ ∈ ∂h(x̄); we then obtain d̄ as an optimal point of the following convex problem:

min_d f_0(x̄ + d) + f_1(x̄ + d) − f_2(x̄) − ⟨v̄, d⟩  s.t. g(x̄ + d) − h(x̄) − V̄ d ≤ 0,  x̄ + d ∈ K.
Then the point x^0 = x̄ + d̄ ∈ Φ may be a good choice.
(ii.) The choice of the algorithm for solving the subproblems (20). The performance of our algorithm relies heavily on the algorithm used for the convex subproblems of the form (20), and these subproblems differ greatly across applications. For instance, when solving joint chance constrained programs by the Monte Carlo method, the constraint function of the subproblem is a sum of a large number of convex functions; in the structured low rank matrix optimization problem of [8], the subproblem is a least squares program with a positive semidefinite constraint. Thus, no single method is suitable for all kinds of subproblems. In our numerical experiments, we apply the standard MATLAB solver fmincon to the subproblems.
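For reference, a typical call pattern for one subproblem is sketched below; obj and nonlcon are stand-ins for the smooth model objective and the convexified constraints of (20) at the current iterate, and the option values are illustrative:

```matlab
% Sketch: solving one convex subproblem (20) with MATLAB's fmincon.
% obj(d) returns the model objective; nonlcon(d) returns the inequality
% constraint values c (and empty equalities ceq). n is the dimension.
opts = optimset('Algorithm', 'interior-point', 'Display', 'off');
d0 = zeros(n, 1);                % start the subproblem solver from d = 0
[d, fval] = fmincon(obj, d0, [], [], [], [], [], [], nonlcon, opts);
```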
4. Numerical experiments. In this section, we consider two kinds of numerical problems and use them to illustrate the performance of our method.

4.1. Joint chance constrained programs. Consider the following joint chance constrained problem:

min −Σ_{i=1}^n x_i  s.t. Pr{ Σ_{i=1}^n ξ_{ji}^2 x_i^2 ≤ b, j = 1, …, m } ≥ 1 − α,  x ∈ R^n,   (36)

where the ξ_{ji} are independent standard normal random variables. Problem (36) can be reformulated as

min −Σ_{i=1}^n x_i  s.t. Pr{c(x, ξ) ≤ 0} ≥ 1 − α,  x ∈ R^n,  where c(x, ξ) := max_{1≤j≤m} Σ_{i=1}^n ξ_{ji}^2 x_i^2 − b.   (37)

Note that problem (37) is a chance constrained problem as discussed in subsection 1.3. For any ε > 0, let

g_1(x, ε) := E[c(x, ξ) + ε]_+ / ε,  g_2(x, ε) := E[c(x, ξ)]_+ / ε;

then, as shown in subsection 1.3, the constraint Pr{c(x, ξ) ≤ 0} ≥ 1 − α can be approximated by g_1(x, ε) − g_2(x, ε) ≤ α for a small ε, and a good approximation of problem (37) is given by

(P_ε)  min −Σ_{i=1}^n x_i  s.t. g_1(x, ε) − g_2(x, ε) ≤ α,  x ∈ R^n.

In [10] the authors proved that, under mild assumptions, the KKT points of problem (P_ε) converge to the KKT points of problem (37) as ε → 0. Since problem (P_ε) is a DC program, our method can be applied to approximately solve problem (37).
Lemma 4.1 (Lemma 7.2, [19]). The optimal solution x* of problem (37) is

x* = sqrt( b / F_{χ²_n}^{-1}((1 − α)^{1/m}) ) (1, 1, …, 1)^T,

where F_{χ²_n}^{-1} is the inverse chi-squared distribution function with n degrees of freedom.
To implement Algorithm ISCA for solving problem (P_ε), we need closed form expressions for g_1(x, ε) and g_2(x, ε), which are generally difficult to obtain. However, we can use the Monte Carlo method to approximate them. Let ξ^1, …, ξ^N be an independent and identically distributed (i.i.d.) sample of ξ, and let

ḡ_1(x, ε) := (1/(Nε)) Σ_{j=1}^N [c(x, ξ^j) + ε]_+,  ḡ_2(x, ε) := (1/(Nε)) Σ_{j=1}^N [c(x, ξ^j)]_+.

Then ḡ_1(x, ε) and ḡ_2(x, ε) are the sample average approximations of g_1(x, ε) and g_2(x, ε), respectively.
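A direct MATLAB sketch of these sample average approximations, assuming a handle c(x, xi) for the constraint function and a matrix xi whose columns are the N samples (both names ours, for illustration):

```matlab
% Sketch: SAA of g_1(x, epsilon) and g_2(x, epsilon); the approximate DC
% constraint is then g1 - g2 <= alpha.
function [g1, g2] = saa_dc_terms(c, x, xi, epsilon)
    N = size(xi, 2);                        % number of samples
    z = zeros(N, 1);
    for j = 1:N
        z(j) = c(x, xi(:, j));              % Z_j = c(x, xi^j)
    end
    g1 = sum(max(z + epsilon, 0))/(N*epsilon);  % SAA of g_1(x, epsilon)
    g2 = sum(max(z, 0))/(N*epsilon);            % SAA of g_2(x, epsilon)
end
```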
The numerical experiments were carried out in MATLAB 7.10 on a PC with an Intel Core i5 CPU (3.20 GHz) and 4 GB of RAM. In the following test problems, we set α = 0.1 and terminate the algorithm when ||x^{k+1} − x^k|| is sufficiently small.

Example 1. In problem (37), let n = 3, m = 2, b = 9 and N = 10,000. By Lemma 4.1, the optimal solution is (1.08, 1.08, 1.08)^T and the optimal value is about −3.23.
Example 2. In problem (37), let n = 10, m = 10, b = 100 and N = 10,000. By Lemma 4.1, the optimal solution is (2.08, 2.08, …, 2.08)^T and the optimal value is about −20.82. The numerical results are shown in Figure 2. Taking Example 2 for instance, the algorithm typically requires fewer than 10 iterations to converge to the optimal value; with sample size N = 10,000, each convex subproblem is solved in about 6 seconds on average.

4.2. Homogeneous quadratically constrained quadratic optimization. In this subsection, we consider a homogeneous quadratic optimization problem of the form

(QCQP)  min x*A_0 x  s.t. x*A_k x ≥ 1, k = 1, …, m,  x ∈ F^n,

where A_k, k = 0, 1, …, m, are n × n real symmetric or complex Hermitian matrices, F is either the field of real numbers R or the field of complex numbers C, and the superscript * denotes the transpose or the Hermitian transpose, respectively. The homogeneous quadratically constrained quadratic optimization problem (QCQP) arises in the transmit beamforming problem for multicasting applications; see [21,13,9] and the references therein. The QCQP problem is NP-hard [13], even when all matrices are positive semidefinite. A natural approach for solving QCQP is to use its SDP relaxation:

(SDP)  min Tr(A_0 X)  s.t. Tr(A_k X) ≥ 1, k = 1, …, m,  X ∈ S^n_+,

where Tr(·) denotes the trace of a matrix and S^n_+ denotes the cone of all (Hermitian) positive semidefinite matrices. In [9] it is shown that if the matrix A_0 and all but one of the A_k, k = 1, …, m, are positive semidefinite, the ratio between the optimal values of QCQP and its SDP relaxation is bounded above by O(m^2) when F = R and by O(m) when F = C. Moreover, when two or more of the A_k, k = 1, …, m, are indefinite, the ratio can be arbitrarily large.
Here, we propose a DC programming method for solving QCQP. For each k = 0, 1, …, m, let λ_k be the largest eigenvalue of A_k. Let ρ_k := max{0, λ_k} and let I be the identity matrix. Define

h_k(x) := ρ_k x*x,  g_k(x) := x*(ρ_k I − A_k)x,  k = 0, 1, …, m;

then QCQP can be rewritten as

min h_0(x) − g_0(x)  s.t. g_k(x) + 1 − h_k(x) ≤ 0, k = 1, …, m,  x ∈ F^n,

which is a DC program in the form of (1). Thus we can use Algorithm ISCA to obtain a stationary point of QCQP.
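The DC data for each constraint can be formed mechanically; a MATLAB sketch (the helper name is ours), valid for real symmetric or complex Hermitian A_k:

```matlab
% Sketch: DC split x'*A*x = h(x) - g(x) with h(x) = rho*||x||^2 and
% g(x) = x'*(rho*I - A)*x, where rho = max{0, lambda_max(A)}.
function [g, h, rho] = qcqp_dc_split(A)
    rho = max(0, max(real(eig((A + A')/2))));  % rho_k = max{0, lambda_max}
    n = size(A, 1);
    g = @(x) real(x'*((rho*eye(n) - A)*x));    % convex: rho*I - A is PSD
    h = @(x) rho*real(x'*x);                   % convex
end
```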
Example 3. We test the method on a small instance of QCQP with F = R, four variables x_1, …, x_4, objective x_4^2 and four constraints. From the last two constraints we have |x_1 x_2| ≥ 2(x_3^2 + 1). Meanwhile, the first two constraints imply |x_1 x_2| ≤ x_3^2 + x_4^2 − 1. Combining these two inequalities yields x_4^2 ≥ x_3^2 + 3 ≥ 3. Thus, the optimal value is 3, attained exactly when x_3 = 0, x_4 = ±√3 and |x_1 x_2| = 2. We also use our algorithm to solve this example in MATLAB, with the initial point chosen randomly as x_0 = rand(4, 1). Over 100 replications, the algorithm converges to one of the optimal solutions in 90 cases, even though the problem is not convex, and it typically requires about 15 iterations to converge.