A NEW PROXIMAL CHEBYCHEV CENTER CUTTING PLANE ALGORITHM FOR NONSMOOTH OPTIMIZATION AND ITS CONVERGENCE

Motivated by the proximal-like bundle method [K. C. Kiwiel, Journal of Optimization Theory and Applications, 104(3) (2000), 589-603], we establish a new proximal Chebychev center cutting plane algorithm for a class of nonsmooth optimization problems. At each step of the algorithm, a new optimality measure is investigated in place of the classical one. The convergence analysis shows that an ε-optimal solution can be obtained within O(1/ε³) iterations. Numerical results are presented to validate this conclusion and to show that the method is competitive with the classical proximal-like bundle method.

2010 Mathematics Subject Classification. Primary: 90C25; Secondary: 65K05.


1. Introduction. Nonsmooth optimization problems (NSO) arise in many fields of application, for example in economics [18], mechanics [16], engineering [15] and optimal control [2]. Consider the unconstrained convex minimization problem

min_{x∈Rⁿ} f(x),    (1)

where f : Rⁿ → R is a nonsmooth closed proper convex function. We denote the optimal value of (1) by f* and the optimal solution set by X*. The nonlinear conjugate gradient method is one of the effective algorithms for solving (1); some recent results demonstrate its satisfactory performance under special conditions, where the search direction not only satisfies the sufficient descent condition but also belongs to a trust region, see [28,29,27,7]. The proximal-like bundle method is another promising and efficient algorithm for nonsmooth optimization problems; its convergence can be very rapid compared with the conjugate gradient method, and, unlike the nonlinear conjugate gradient method, it is less strict in accepting a candidate as a useful search direction, since it is concerned only with the descent of the objective function. Proximal-like bundle methods [13,17,24,20,23,21] approximate the objective function by a regularized cutting plane model, the sum of a piecewise linear function and a quadratic term; this construction has already been generalized by replacing the quadratic term with closed convex functions having certain properties. These methods can also be used to solve variational inequality problems, see [22,30,31,25]. Based on ideas and techniques similar to those in [3,1,8,9], the authors of [17] extend the Elzinga-Moore cutting plane algorithm by forcing the next trial point to stay close to the previous ones, which removes the compactness assumption.
Instead of the lower approximations used in proximal bundle methods, the approach in [17] is based on regularizing translated functions of the objective, and it can be viewed as a double regularization approach. In this paper, motivated by the work [13], we present a proximal Chebychev center cutting plane algorithm (pc³pa for short) and analyze its convergence from a new point of view, quite different from the traditional analyses of proximal-like bundle methods. Under the assumption that for each z_i ∈ Rⁿ the function value f(z_i) and one arbitrary subgradient g_i ∈ ∂f(z_i) can be computed through an oracle, we focus on estimating the negative optimal value w_k of the subproblem that determines the next trial point; we find that w_k decreases significantly after a null step, so it may serve as a new optimality measure for the current iterate x_k. We also answer the following question: after at most how many iterations can an approximate solution of a given accuracy be obtained, and how does the accuracy depend on the iteration count? We refer the reader to [9,11,10,12] for other discussions of similar efficiency estimates for subgradient projection methods, analytic center cutting plane methods, and so on.
The paper is organized as follows. In Section 2 we propose the new pc³pa algorithm and apply it to solving (1), adjusting the update of the proximity control parameter and eliminating the approximate stopping criterion. The convergence analysis of the proposed algorithm is presented in Section 3. Section 4 reports the numerical performance of the pc³pa algorithm on some nondifferentiable problems. In Section 5 we draw conclusions and make comparisons.
We denote the usual inner product and norm in Rⁿ by ⟨·, ·⟩ and ∥·∥, respectively. The subdifferential of a convex function f at x is defined by ∂f(x) = {p ∈ Rⁿ : f(y) ≥ f(x) + ⟨p, y − x⟩ for all y ∈ Rⁿ}.

2. Proximal Chebychev center cutting plane algorithm. In this part, by eliminating the approximate stopping criterion, we present a new pc³pa algorithm with an update rule for the proximity control parameter. The pc³pa algorithm proposed in our paper generates a sequence of iterates {x_k} called Chebychev centers; some trial points z_i are generated at the same time, and we evaluate the subgradient g_i ∈ ∂f(z_i) and the function value f(z_i) through an oracle as usual. Given the current Chebychev center x_k, the outer approximation of the part of the epigraph of f lying below the level f(x_k), i.e., the set

X̃_{x_k,k} = {(x, v) ∈ Rⁿ × R : v ≥ f(z_i) + ⟨g_i, x − z_i⟩ for i ∈ I_k, v ≤ f(x_k)},

is defined to be the localization set, where I_k = {1, 2, · · · , k}. Obviously, we have X* × {f*} ⊂ X̃_{x_k,k}. Therefore, the basic issue in solving (1) is how to choose the next iterate x_{k+1} so as to shrink the localization set X̃_{x_k,k}. By decreasing the upper bound f(x_k), the radius of the largest ball inside X̃_{x_k,k} shrinks to zero and the Chebychev centers {x_k} of these largest balls converge to a minimizer of f, if one exists. The next Chebychev center can be determined by solving the problem

min_{x∈Rⁿ} ψ_{x_k}(x),    (3)

whose optimal value gives the negative of the radius of the largest ball inside X̃_{x_k,k} and whose optimal solution is the next Chebychev center. Unfortunately, the minimization of ψ_{x_k} has no reason to be easy, since even computing the value of ψ_{x_k} at a point is already a difficult issue. However, with the trial points z_i, i ∈ I_k, we can build the following simpler function ψ̃_{x_k,k} to approximate ψ_{x_k}.
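For intuition, recall the classical Chebychev-center computation underlying the Elzinga-Moore scheme: the center and radius of the largest ball inscribed in a polyhedron described by linear inequalities can be found by a linear program. The block below is the standard textbook formulation, not the paper's scaled version.

```latex
% Chebychev center of P = {x : <a_i, x> <= b_i, i = 1,...,m}:
% a ball B(x, sigma) lies in P iff <a_i, x> + sigma ||a_i|| <= b_i for all i,
% so the largest inscribed ball solves the linear program
\begin{align*}
\max_{x \in \mathbb{R}^n,\ \sigma \ge 0} \quad & \sigma \\
\text{s.t.} \quad & \langle a_i, x\rangle + \sigma \|a_i\| \le b_i,
\qquad i = 1, \dots, m.
\end{align*}
% In the localization-set setting, the inequalities are the cutting planes
% v >= f(z_i) + <g_i, x - z_i> together with the upper bound v <= f(x_k).
```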
Therefore, computing the candidate Chebychev center of the localization set X̃_{x_k,k} amounts to solving the corresponding approximate problem. The model function ψ̃_{x_k,k} approximates ψ_{x_k} in a neighbourhood of the current iterate, and the approximation is unlikely to be reliable far away from it, so it is reasonable to keep the search for the next trial point close to the previous ones. By employing the idea of Moreau-Yosida regularization, the next candidate z_{k+1} is found by solving the following strongly convex quadratic program associated with the proximity control parameter μ_k:

min_{z∈Rⁿ} ψ̃_{x_k,k}(z) + (μ_k/2)∥z − x_k∥².    (7)

Note that here we employ the main idea of proximal-like bundle methods for (3): we minimize the model function ψ̃_{x_k,k} and use the resulting solutions to improve the model function again. The quadratic regularization term prevents the solutions from oscillating and makes the approach more efficient. Obviously, problem (7) is equivalent to (8), where v represents the negative value of the radius of the largest ball inside X̃_{x_k,k}. Optimization models of the form (7) can be found in many other scientific fields, see [4,6,5,14,19,26]. Let us introduce some notation that will be used in the sequel: for each i ∈ I_k, let g_i^s and α_{i,k} denote the scaled subgradient and the scaled linearization error, respectively.
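To illustrate the structure of the regularized subproblem (7), the following sketch solves a generic prox-regularized cutting-plane subproblem in epigraph form. It uses plain (unscaled) subgradients and an off-the-shelf solver, so it mirrors the classical bundle subproblem rather than the exact scaled model ψ̃_{x_k,k}; all function and variable names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def prox_cutting_plane_step(x_k, mu, cuts):
    """Solve  min_z  max_i [f_i + <g_i, z - z_i>] + (mu/2)||z - x_k||^2
    in epigraph form over u = (z, v), where cuts = [(z_i, f(z_i), g_i), ...]."""
    n = len(x_k)

    def objective(u):
        z, v = u[:n], u[n]
        return v + 0.5 * mu * np.dot(z - x_k, z - x_k)

    # One linear constraint  v >= f_i + <g_i, z - z_i>  per cut.
    constraints = [
        {"type": "ineq",
         "fun": (lambda u, z_i=z_i, f_i=f_i, g_i=g_i:
                 u[n] - f_i - np.dot(g_i, u[:n] - z_i))}
        for (z_i, f_i, g_i) in cuts
    ]

    u0 = np.concatenate([x_k, [max(f_i for _, f_i, _ in cuts)]])
    res = minimize(objective, u0, constraints=constraints, method="SLSQP")
    return res.x[:n], res.x[n]

# Example: f(x) = |x|, bundle cuts taken at z = -1 and z = 1, prox center 0.8.
# With mu = 1 the exact minimizer is z = 0 (a soft-thresholding step), v = 0.
x_k = np.array([0.8])
cuts = [(np.array([-1.0]), 1.0, np.array([-1.0])),
        (np.array([1.0]), 1.0, np.array([1.0]))]
z_next, v_next = prox_cutting_plane_step(x_k, mu=1.0, cuts=cuts)
```

The epigraph variable v returned here is the model value at the candidate point; in the paper's scaled setting its negative is the radius estimate of the ball inside the localization set.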
Problem (8) can be expressed with the notation above as (9). Define the linearization of the translated function f̃; since γ_{g_i} ≤ 1/2 and g_i ∈ ∂f(z_i), we obtain the corresponding bound. The dual problem of (9) is easily derived as (13). Let λ_i^k, i ∈ I_k, denote the optimal solution of (13), and define the aggregate scaled subgradient and the aggregate scaled linearization error accordingly. Based on the optimality condition of (10), 0 ∈ ∂φ_k(z_{k+1}), the optimal solution of (9) is then obtained. Problem (9) appears to be of the same type as the subproblem arising in proximal-like bundle methods, but here the scaled quantities g_i^s and α_{i,k} are used in place of the "ordinary" subgradient g_i and linearization error α(x_k, z_i), and v does not represent f̃_k-values either, since ψ̃_{x_k,k} is not a model for f. We now have all the necessary ingredients to state our implementable algorithm.

Proximal Chebychev Center Cutting Plane Algorithm for (1) (pc³pa):

Step 0: Select the parameter 0 < κ < 1 and the proximity control parameter bounds 0 < μ_min ≤ μ_max < ∞. Choose μ_1 ∈ [μ_min, μ_max] and an initial point. [k(l + 1) − 1 denotes the iteration number of the l-th descent step.]
Step 1: If g_k = 0, terminate.
Step 2: Solve (13) to obtain λ_i^k, i ∈ I_k. Compute g_a^k, α_a^k and γ_a^k by (14) and (16). Set
Step 3:
Step 4:
Step 5: Increase k by 1 and go to Step 1.
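The overall serious/null-step mechanics of such methods can be sketched as follows. This is a simplified classical proximal bundle loop (plain cutting-plane model of f, fixed proximity parameter μ, descent test with parameter κ), not a verbatim implementation of pc³pa; all names are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def solve_bundle_subproblem(x, mu, cuts):
    """Minimize  max_i [f_i + <g_i, z - z_i>] + (mu/2)||z - x||^2  (epigraph form)."""
    n = len(x)
    obj = lambda u: u[n] + 0.5 * mu * np.dot(u[:n] - x, u[:n] - x)
    cons = [{"type": "ineq",
             "fun": (lambda u, z_i=z_i, f_i=f_i, g_i=g_i:
                     u[n] - f_i - np.dot(g_i, u[:n] - z_i))}
            for (z_i, f_i, g_i) in cuts]
    u0 = np.concatenate([x, [max(f_i for _, f_i, _ in cuts)]])
    res = minimize(obj, u0, constraints=cons, method="SLSQP")
    return res.x[:n], res.x[n]          # candidate point, model value

def bundle_method(f, subgrad, x0, mu=1.0, kappa=0.1, tol=1e-6, max_iter=50):
    x = np.asarray(x0, dtype=float)
    z, cuts = x.copy(), []
    for _ in range(max_iter):
        cuts.append((z.copy(), f(z), subgrad(z)))   # oracle call, enrich bundle
        z_new, v = solve_bundle_subproblem(x, mu, cuts)
        delta = f(x) - v                 # predicted decrease (optimality measure)
        if delta <= tol:
            break
        if f(z_new) <= f(x) - kappa * delta:
            x = z_new                    # serious step: move the center
        z = z_new                        # null step keeps x, the model improves
    return x

# Example: minimize f(x) = |x| starting from x0 = 3.
f = lambda x: abs(x[0])
g = lambda x: np.array([np.sign(x[0])])
x_star = bundle_method(f, g, [3.0])
```

The quantity delta above plays the role of the optimality measure: the loop stops once the predicted decrease falls below the tolerance.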

Remark 1.
By imitating the analysis in [17], we have the following convergence result: if there are infinitely many Chebychev centers {x_k}_{k∈N}, then f(x_k) → f* as k → ∞. Furthermore, if X* is nonempty, then the sequence {σ_k}_{k∈N} (the radii of the largest balls inside X̃_{x_k,k}) tends to 0 and the sequence {x_k}_{k∈N} converges to an optimal solution of problem (1) as k → ∞. In the case when the algorithm stops at some point x_{k₀} (the last Chebychev center generated by the pc³pa algorithm), it can be shown that the sequence {σ_k}_{k∈N} tends to 0 as k → ∞ and the optimality of x_{k₀} follows.

Remark 2.
We have considerable freedom in choosing the proximity control parameter μ_k. Since it controls the strength of the quadratic term in (9), its choice is a delicate task. Here we employ the technique in [18] to update μ_k; other update techniques have been proposed in the literature, see, for example, [9].
3. Convergence analysis. The work presented in this section follows a line of investigation initiated in [13]. We expand and generalize the central idea of [13] to nonsmooth optimization problems based on the localization sets and Chebychev centers introduced above; some techniques have to be adjusted to the new setting.
We start this section by recalling several technical results from [13]. Define w_k to be the negative optimal value of subproblem (10). Combining (12) and (19), explicit expressions for w_k are easy to obtain. The following conclusion characterizes the relationship between v_k and w_k.

From (22) and the Cauchy-Schwarz inequality,
According to the boundedness of the sequence {g_s^k} and (27), we can derive a global optimality estimate involving w_k.
Lemma 3.2. The following conclusions hold: w_k → 0 as k → ∞, and G < ∞, D < ∞, where G bounds the scaled subgradients ∥g_s^k∥ and D bounds the distances ∥x_k − P_{X*}x_k∥.

Proof. By Lemma 3.1 and Remark 1, 0 ≤ w_k ≤ −v_k → 0 as k → ∞. The facts μ_k ≥ μ_min and 0 ≤ μ_k∥z_{k+1} − x_k∥² ≤ −v_k imply z_{k+1} − x_k → 0 as k → ∞. Thus, by the boundedness of the sequence {x_k}, the sequences {z_{k+1}} and {g_s^k} are also bounded, since g_s^k = γ_{g_k} g_k with 0 < γ_{g_k} ≤ 1/2 and g_k is locally bounded on Rⁿ. Taking x to be P_{X*}x_k := argmin_{x∈X*} ∥x − x_k∥ in (26), and using ∥x − x_k∥ ≤ D and μ_k ≤ μ_max, we obtain the relations that yield the desired result (27).
The result (27) shows that our convergence analysis boils down to estimating how fast w_k decreases. Lemma 3.3 below discusses how to bound the decrease w_{k−1} − w_k via Lagrangian relaxation after a null step.
Lemma 3.4. The following results hold:

Proof. (a) If k > k(l), then w_k ≤ w_{k(l)} by Lemma 3.3. If k = k(l), then according to (12) we have the required bound. The desired conclusion (a) follows from ∥g_s^k∥ ≤ G and μ_{k(l)} = μ_k ≥ μ_min. (b) By Lemma 3.3 and the definition of c_l, we obtain a bound on w_k c_l using ∥g_s^i∥ ≤ G. The facts μ_k ≥ μ_{k(l)} and w_k ≥ 0 yield the conclusion, which can be obtained by imitating the proof of Lemma 3.4 in [13].

Now we are ready to state and prove our principal result.
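The mechanism behind an O(1/ε³) bound can be seen from a standard recurrence argument of the kind used in Kiwiel's efficiency analysis [13]. The constant C below is illustrative and stands for the combination of G, D, μ_min and μ_max appearing in the lemmas above, so this is a sketch of the shape of the argument rather than the paper's exact estimate.

```latex
% Suppose the per-step decrease satisfies, for some constant C > 0,
%   w_{k-1} - w_k \ge C\, w_{k-1}^4 .
% Since w_{k-1} \ge w_k \ge 0, the elementary inequality
%   a^3 - b^3 \ge 3 b^2 (a - b)   for  a \ge b \ge 0
% gives
\begin{align*}
\frac{1}{w_k^3} - \frac{1}{w_{k-1}^3}
  = \frac{w_{k-1}^3 - w_k^3}{w_k^3\, w_{k-1}^3}
  \ge \frac{3 w_k^2 (w_{k-1} - w_k)}{w_k^3\, w_{k-1}^3}
  \ge \frac{3 C\, w_k^2\, w_{k-1}^4}{w_k^3\, w_{k-1}^3}
  = 3C\,\frac{w_{k-1}}{w_k} \ge 3C .
\end{align*}
% Summing over k yields 1/w_k^3 \ge 3Ck, i.e. w_k \le (3Ck)^{-1/3},
% so w_k \le \varepsilon after at most O(1/\varepsilon^3) iterations.
```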
4. Numerical test. In this section, we report numerical results on the computational behaviour of the proposed pc³pa algorithm and illustrate the convergence results presented above. All numerical experiments were implemented in MATLAB R2012a on a PC with a 1.80 GHz CPU. The quadratic programming solver is quadprog.m, which is available in the Optimization Toolbox.
We first introduce a subclass of polynomial functions h_i : Rⁿ → R, i = 1, 2, · · · , n, and then define several test functions from them. It can be shown that all the test functions are convex and that their minimal value over Rⁿ is 0. Tables 1-3 show that, using the pc³pa algorithm, the final objective value is much smaller than 10⁻⁴ in most test problems, whose theoretical optimal values are zero. Our limited computational experiments suggest the good performance and viability of the proposed method for a large class of problems.

Table 3. Test results obtained by the pc³pa algorithm for min_{x∈Rⁿ} f_3(x).

5. Conclusions. The pc³pa algorithm in this paper is based on the so-called localization set X̃_{x_k,k} and its Chebychev center, the center of the largest ball inside it. This kind of algorithm can be viewed as a serious alternative to proximal-like bundle methods, so its convergence analysis is especially important. We present a new optimality measure w_k, the negative optimal value of the subproblem that determines the next trial point, which can be computed easily along the iterations. Without additional boundedness assumptions, we conclude that for any ε > 0, after at most O(1/ε³) iterations we obtain an ε-approximate solution with the help of w_k. Compared with the convergence result in [13], which only shows that the sequence {x_k}_{k∈N} converges to some optimal solution, our result states exactly after at most how many iterations what kind of approximate solution can be obtained. This is more convenient for users who wish to acquire an approximate solution within some acceptance tolerance.