A PROXIMAL-PROJECTION PARTIAL BUNDLE METHOD FOR CONVEX CONSTRAINED MINIMAX PROBLEMS

(Communicated by Jinyan Fan)
Abstract. In this paper, we propose a partial bundle method for a convex constrained minimax problem where the objective function is expressed as maximum of finitely many convex (not necessarily differentiable) functions. To avoid complete evaluation of all component functions of the objective, a partial cutting-planes model is adopted instead of the traditional one. Based on the proximal-projection idea, at each iteration, an unconstrained proximal subproblem is solved first to generate an aggregate linear model of the objective function, and then another subproblem based on this model is solved to obtain a trial point. Moreover, a new descent test criterion is proposed, which can not only simplify the presentation of the algorithm, but also improve the numerical performance significantly. An explicit upper bound for the number of bundle resets is also derived. Global convergence of the algorithm is established, and some preliminary numerical results show that our method is very encouraging.
1. Introduction. We consider the following convex constrained minimax problem

min_{x ∈ C} f(x),    (1)

where f(x) = max{f_i(x), i ∈ I} with I = {1, . . . , m}, the component functions f_i (i ∈ I) : R^n → R are convex but not necessarily differentiable, and C is a nonempty closed convex set in R^n. Minimax problems are a typical and special class of nonsmooth optimization problems, which aim to make an "optimal" decision under the "worst" cost. They are widely used in practical applications, such as multi-facility location [28], portfolio selection [40], optimal control [4], energy consumption [1], waste management [24], etc. Due to the special structure of f(x), general-purpose nonsmooth methods ([33,22,3]) may not solve them efficiently. So, by exploiting their peculiar structure, a great deal of effort has been devoted to developing efficient methods for minimax problems, see, e.g., [37,6,31,23,41,25,38,39,9,14,15,10,11,7,12].
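For concreteness, evaluating a max-function and picking an active index can be sketched as follows; the two components and the evaluation point are our own illustration, not a test problem from the paper.

```python
def f1(x):
    return abs(x[0]) + abs(x[1])

def f2(x):
    return x[0] ** 2 + x[1] ** 2

def f(x):
    # Pointwise maximum and one active index (ties broken by first occurrence).
    vals = [f1(x), f2(x)]
    v = max(vals)
    return v, vals.index(v)

value, active = f([1.0, 2.0])
```

A subgradient of f at x is then any subgradient of an active component at x, which is the fact the bundle machinery below relies on.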
In particular, for the case where f i (i ∈ I) are continuously differentiable, one common approach for solving the original problem is the smoothing method. There are mainly two smoothing approaches: (i) by introducing a smoothing function, the original problem is approximated by a sequence of smooth optimization problems, and the solution can be obtained from the smoothing function, see e.g., [23,41]; (ii) by introducing an auxiliary variable, the minimax problem can be equivalently reformulated as a smooth problem, and then smooth methods can be designed to solve this special smooth problem, such as penalty methods [6], trust region methods [37,38,39], and sequential (quadratically constrained) quadratic programming [31,14,15].
For the case where the f_i (i ∈ I) are not necessarily differentiable, or are differentiable but their gradients are difficult to calculate, Gaudioso et al. [8] proposed an incremental bundle method for solving convex unconstrained minimax problems, which uses a partial cutting-planes model to avoid complete evaluation of the objective function f. Later, Gaudioso et al. [9] applied the method of [8] to solve the Lagrangian dual of integer programs, and Fuduli et al. [7] further extended it to convex minimax problems with infinitely many linear constraints. Jian et al. [16] proposed a feasible descent bundle method for inequality constrained minimax problems by using the partial cutting-planes idea of [8]. Liuzzi et al. [25] presented a derivative-free method for linearly constrained finite minimax problems by using a smoothing technique. Hare & Nutini [10] and Hare & Macklem [11] proposed derivative-free gradient sampling methods for solving unconstrained minimax problems.
We note that the above methods in the latter case (except for [16]) can only deal with unconstrained or linearly constrained problems. In this paper, we are interested in minimizing a max-function over a convex set of the form (1), and we are particularly concerned with the case where the f_i (i ∈ I) are not necessarily differentiable.
Our method falls within the well-known class of bundle methods for nonsmooth optimization. In particular, in order to reduce the number of component-function evaluations, we adopt the partial cutting-planes model of [8] instead of the traditional one. As a result, at each iteration, only one component function is newly used to enrich the model. Moreover, in order to deal with the constraint set C efficiently and make the proposed method more practical, we utilize the "proximal-projection" idea of [19,20], which is different from the traditional projection technique. Therefore, at each iteration, an unconstrained proximal subproblem is solved first to generate an aggregate linear model of the objective function, and then another subproblem based on this model is solved to obtain a trial point. We note that solving the latter subproblem is equivalent to projecting a certain point onto the feasible set C, which can be computed easily, or even in closed form, if C has special structure (e.g., boxes, simplices, balls, and special polyhedra [5]).
Our algorithm produces a sequence of feasible trial points, and ensures that the objective function is monotonically decreasing on the sequence of stability centers. Global convergence of the algorithm is proved. Finally, the algorithm is improved by the subgradient aggregation strategy, and some preliminary numerical results show that our method is quite efficient.
An important and interesting feature of this paper is that a new descent test criterion, different from that of [8], is proposed. Our criterion does not contain a problem-data-independent parameter, which may not be easy to choose numerically, and therefore the presentation of our algorithm and the theoretical analysis are simplified. Moreover, from the numerical comparisons reported in Section 5, our criterion performs better than that of [8]. That is to say, our method not only extends the method of [8] to the constrained case, but also improves its descent test criterion significantly. Based on the new criterion, Tang et al. [34] presented an improved partial bundle method for solving linearly constrained minimax problems. In addition, we derive an explicit upper bound for the number of bundle resets.
Compared to [19,20], we present a partial bundle variation of the proximal-projection bundle method for the purpose of reducing the number of component function evaluations. Moreover, there is a major difference in the theoretical analysis between our method and that of [19,20], since, apart from the proximal-projection technique, our algorithmic framework is closer to that of [8]. We also note that our method is very different from that in [16], due to the proximal-projection strategy and the new descent test criterion. Some other recent methods for general-purpose nonsmooth constrained minimization problems can be found in, e.g., [13,32,17,35,36].
The paper is organized as follows. In Section 2, we present the details of our algorithm and discuss its properties. In Section 3, the global convergence is proved. Improvement of the algorithm by subgradient aggregation is given in Section 4. Preliminary numerical results and conclusion are given in Sections 5 and 6, respectively.
The Euclidean inner product in R^n is denoted by ⟨x, y⟩ = x^T y, and the associated norm by ‖·‖. The subdifferential (in convex analysis [30]) of f at x ∈ R^n is denoted by ∂f(x), and each element g ∈ ∂f(x) is called a subgradient. As usual in nonsmooth methods, we assume that there is a subroutine that can evaluate one arbitrary subgradient g(x) ∈ ∂f(x) at any point x. P_C(x) := arg min_{y∈C} ‖y − x‖ denotes the Euclidean projector onto C.
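For the special sets used later (boxes and balls), the projector P_C has a closed form; a minimal sketch, with our own helper names:

```python
def project_box(x, l, u):
    # P_C for C = {y : l <= y <= u}: componentwise clipping.
    return [min(max(xi, li), ui) for xi, li, ui in zip(x, l, u)]

def project_ball(x, a, b):
    # P_C for C = {y : ||y - a|| <= b}: radial scaling toward the center a.
    d = [xi - ai for xi, ai in zip(x, a)]
    norm = sum(di * di for di in d) ** 0.5
    if norm <= b:
        return list(x)
    return [ai + b * di / norm for ai, di in zip(a, d)]
```

For example, project_box([2.0, -1.0], [0.0, 0.0], [1.0, 1.0]) clips to [1.0, 0.0], and project_ball([3.0, 4.0], [0.0, 0.0], 1.0) rescales to [0.6, 0.8].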
2. The algorithm. Let k be the current iteration index, and let y_ℓ, ℓ ∈ J_k := {1, · · · , k}, be given points with subgradients g_ℓ ∈ ∂f(y_ℓ). The classic bundle method [2] uses the following cutting-planes model for f at the k-th iteration:

f̂^k_cp(x) := max_{ℓ∈J_k} { f(y_ℓ) + ⟨g_ℓ, x − y_ℓ⟩ }.

However, if the model f̂^k_cp(x) is directly applied to the objective function (a max-function) of problem (1), then at each point y_ℓ we need to calculate the objective value f(y_ℓ) and a certain subgradient g_ℓ ∈ ∂f(y_ℓ), which in turn implies that we have to evaluate the values of all the component functions f_i (i ∈ I) at y_ℓ. This is time-consuming, especially when m, the number of component functions, is large. Moreover, for some practical applications (see, e.g., [9,7]), full knowledge of all component functions may be impossible.
In order to avoid complete evaluation of f, Gaudioso et al. [8] proposed to evaluate, at each point y_ℓ, just one of the component functions, say f_{i_ℓ}(y_ℓ) for some i_ℓ ∈ I, along with a corresponding subgradient g^ℓ_{i_ℓ} ∈ ∂f_{i_ℓ}(y_ℓ), and then define the partial cutting-planes (PCP) model

f̌^k_pcp(x) := max_{ℓ∈J_k} { f_{i_ℓ}(y_ℓ) + ⟨g^ℓ_{i_ℓ}, x − y_ℓ⟩ },    (2)

which is also a lower polyhedral model for f, i.e., f̌^k_pcp(x) ≤ f(x). This model has proved very efficient in some cases, such as the Lagrangian dual of integer programs [9] and convex semi-infinite minimax problems [7].
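As a sketch of the PCP idea (our own one-dimensional toy data, using a single component f_1(x) = x², whose subgradient at y is 2y): each bundle element stores one component value and one subgradient, and the model is the max of the resulting planes, which minorizes f.

```python
def pcp_model(bundle, x):
    # Max over stored planes fval + <g, x - y>; each plane comes from ONE component.
    return max(fv + sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
               for (y, fv, g) in bundle)

# Planes of f_1(x) = x^2 built at y = 0 and y = 1 (subgradients 0 and 2).
bundle = [([0.0], 0.0, [0.0]), ([1.0], 1.0, [2.0])]
```

At x = 0.5 the model value is 0.0, below f(0.5) = 0.25, illustrating the lower-model property; at a bundle point such as x = 1.0 the model is exact.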
In this section, we present a proximal-projection partial bundle method for solving problem (1), which combines the PCP model (2) with the proximal-projection idea of [19,20].
It is obvious that problem (1) is equivalent to the unconstrained problem

min_{x∈R^n} f(x) + δ_C(x),    (3)

where δ_C is the indicator function of C, i.e., δ_C(x) = 0 if x ∈ C and δ_C(x) = ∞ otherwise. In what follows, we aim to design an algorithm based on the model problem (3). Similar to the traditional proximal bundle methods, we may consider the following proximal partial bundle subproblem for problem (3):

min_{x∈R^n} f̌^k_pcp(x) + δ_C(x) + (1/(2t_k)) ‖x − x^k‖²,    (4)

where t_k > 0 is the proximal parameter, and x^k (called the stability center) is the "best" point obtained so far. However, subproblem (4) is usually not easy to solve, so we "decompose" it into two subproblems. One is an unconstrained proximal subproblem, which is much easier to solve than the original problem and is used to generate an aggregate linear model of f. The other subproblem, based on such a linear model, is solved to obtain a new trial point. Note that solving the second subproblem is equivalent to projecting a certain point onto the feasible set C. Before giving the algorithm, we first provide a brief description of the sequences it generates:
{z^k} — the sequence of proximal points, at which the aggregate linear models of f are generated;
{y^k} — the sequence of trial points. If f achieves a sufficient decrease at a trial point, then a descent step is declared and the point is accepted as a stability center;
{x^k} — the sequence of stability centers.
We emphasize that we only require full knowledge of f at the stability centers. This seems to be the minimum requirement for global convergence. Moreover, from the mechanism of our algorithm, there is no extra computational cost: once a new stability center is generated, the values of all the component functions are already known. Furthermore, from our numerical experience, the number of stability centers is generally a very small proportion of the whole iteration count.
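The indicator-function reformulation above can be sketched directly; the membership test and the unit box below are our own toy example, not part of the paper.

```python
INF = float("inf")

def indicator(x, in_C):
    # delta_C(x): 0 if x lies in C, +infinity otherwise.
    return 0.0 if in_C(x) else INF

# C = unit box [0, 1]^2: minimizing f over C equals minimizing f + delta_C over R^n.
in_unit_box = lambda x: all(0.0 <= xi <= 1.0 for xi in x)
```

Adding indicator(x, in_unit_box) to any objective value leaves feasible points unchanged and makes infeasible points infinitely penalized, which is exactly the equivalence between (1) and (3).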
Step 1. Proximal point finding. Compute the proximal point

z^{k+1} := arg min_{x∈R^n} { f̌^k_pcp(x) + δ̄^{k−1}_C(x) + (1/(2t_k)) ‖x − x^k‖² }.    (5)

Step 2. Projection. Compute the trial point

y^{k+1} := arg min_{x∈C} { f̄_k(x) + (1/(2t_k)) ‖x − x^k‖² }.    (7)

Step 3. Stopping criterion. Compute the aggregate optimality measures p^k and ε_k. If ‖p^k‖ ≤ η and ε_k ≤ ε, STOP. Else if ‖p^k‖ > η, go to Step 5. Else (‖p^k‖ ≤ η and ε_k > ε), go to Step 4.
Step 5. Descent test. Extract any index i from the index set I. If

f_i(y^{k+1}) − f(x^k) > α v_k,    (12)

then declare a null step: add the cutting plane of f_i at y^{k+1} to the bundle, restore the index set by setting I = {1, . . . , m}, let t_{k+1} := t_k, k := k + 1, and return to Step 1.
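The descent test of Step 5 compares the decrease of a single component at the trial point with a fraction α of the predicted decrease v_k (cf. the proof of Lemma 3.4, where the violated form reads f_{i_{k+1}}(y^{k+1}) − f(x̄) > αv_k). A sketch with hypothetical numbers:

```python
def sufficient_decrease(f_i_trial, f_center, v, alpha):
    # True when the component value at the trial point drops below the center
    # value by at least alpha times the predicted decrease v (v < 0);
    # its violation triggers a null step in Step 5.
    return f_i_trial - f_center <= alpha * v

# Hypothetical data: center value 1.0, predicted decrease v = -0.5, alpha = 0.3,
# so the required drop is at least 0.15.
ok = sufficient_decrease(0.8, 1.0, -0.5, 0.3)   # drop 0.2  -> passes
bad = sufficient_decrease(0.9, 1.0, -0.5, 0.3)  # drop 0.1  -> fails
```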
A few comments on the algorithm are in order.

Remark 1.
• (On Step 0) The algorithm needs to calculate the function value of f at the initial stability center x^1 (see (12)), so in Step 0 we simply use (y^1, f(y^1), g^1) to initialize the bundle. In addition, δ̄^0_C(·) is a linearization of δ_C(·) at y^1, since δ_C(y^1) = 0 and p^0_C = 0 ∈ ∂δ_C(y^1).
• (On Step 1) In subproblem (5), f̌^k_pcp and δ̄^{k−1}_C approximate f and δ_C, respectively; the third term is a quadratic proximal term. This subproblem can be cast as a simple convex quadratic program (QP), and therefore can be solved efficiently. By solving (5), a proximal point z^{k+1} is first found, and then a linearization f̄_k(·) of f̌^k_pcp(·) at z^{k+1} is built, due to the fact that p^k_f ∈ ∂f̌^k_pcp(z^{k+1}). Moreover, there exist multipliers λ^k_ℓ, ℓ ∈ J_k, for which (15) holds, which implies that the linear function f̄_k(·) is an aggregation (convex combination) of the component functions of f̌^k_pcp(·).
• (On Step 2) The solution of (7) can be obtained by an explicit projection onto C.
• (On Step 3) Three cases are considered in Step 3. (a) If both ‖p^k‖ and ε_k are "sufficiently small", then the algorithm stops, and we accept the current stability center x^k as an approximate optimal solution. (b) If ‖p^k‖ is not small, then we accept y^{k+1} as a new trial point, at which either sufficient decrease is achieved (descent step) or a new cutting plane is built to improve the model (null step). (c) The case "‖p^k‖ is small but ε_k is large" implies that the partial cutting-planes model is "bad", so the algorithm makes a bundle reset, i.e., it restarts from the current stability center.
• (On Step 4) In the case of a bundle reset, the model is not reliable enough, so we should become more "conservative" and therefore decrease the proximal parameter by t_{k+1} = t_k/σ (see [8]). Although we need to know f and a subgradient at the current stability center, there is no extra computational cost, since they have been generated in previous iterations.
• (On Step 5) The descent test criterion (12) is newly introduced in this paper and has clear advantages over that of [8]. In fact, Gaudioso et al. [8] (essentially) used criterion (17), which involves a problem-data-independent parameter θ_k > 0 satisfying θ_k/t_k < η². However, as a stopping tolerance, η should be a suitably small positive constant (say η = 10^{-3} in our numerical tests), and then η² becomes a much smaller constant (η² = 10^{-6}). As a result, the numerical choice of the parameters θ_k and t_k becomes very difficult, and it is therefore usually hard to obtain satisfactory numerical results. In contrast, without the parameter θ_k, and with a coefficient α ∈ (0, 1) multiplying v_k, our criterion (12) is more flexible and practical, since v_k itself represents a predicted decrease. Moreover, the removal of the parameter θ_k simplifies the presentation of the algorithm and the theoretical analysis. From our numerical experience, criterion (12) performs noticeably better than (17). We note that the constant α in (12) plays an important role not only in the theoretical analysis but also in improving the numerical results. In Step 5, the selection strategy for the index i ∈ I does not affect the theoretical analysis, but suitable rules can be useful in practice, such as fixed sorted selection, randomized selection, and selection exploiting knowledge of the active constraints from (15); see, e.g., [29]. If an index i is found such that (12) holds, then a new cutting plane built on the component function f_i at y^{k+1} is added, which can improve the model significantly [8, Lemma 2.4]. Otherwise, if no such index exists, the algorithm enters Step 7.
• (On Step 6) Delete from I each index i that violates (12), until I is empty. Indeed, for such an i, the component function f_i achieves a sufficient decrease at y^{k+1}, so it is not necessary to build a cutting plane based on this function.
• (On Step 7) Once the algorithm enters Step 7, the feasible descent property (see Lemma 2.2 below) is satisfied, and therefore we accept y^{k+1} as a new stability center. Note that the component function values f_i(y^{k+1}) for all i ∈ I are already calculated whenever the algorithm enters Step 7, so we can simply use (14) to enrich the bundle without extra computational cost.
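As noted in the remark on Step 2, the constrained model subproblem reduces to a Euclidean projection. A small numerical sketch with a 1-D box constraint and data of our own choosing: minimizing a linear model plus the proximal term over a box equals clipping the unconstrained minimizer x̂ − t·p, which we cross-check against a grid search.

```python
def solve_model_subproblem(p, xhat, t, l, u):
    # argmin_x <p, x> + (1/2t)||x - xhat||^2 over the box [l, u]:
    # project the unconstrained minimizer xhat - t*p onto the box (clip).
    return [min(max(xi - t * pi, li), ui)
            for xi, pi, li, ui in zip(xhat, p, l, u)]

# 1-D cross-check against a grid search (all data hypothetical).
p, xhat, t, l, u = [2.0], [0.0], 1.0, [-1.0], [1.0]
analytic = solve_model_subproblem(p, xhat, t, l, u)
obj = lambda x: p[0] * x + (x - xhat[0]) ** 2 / (2 * t)
best = min((l[0] + i * 0.001 for i in range(2001)), key=obj)
```

Here the unconstrained minimizer x̂ − t·p = −2 lies outside the box, so both the closed form and the grid search land on the boundary point −1.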
3. Global convergence. In this section, we prove the global convergence of Algorithm 2.1. We first make the following basic assumption: the level set L := {x ∈ C : f(x) ≤ f(x^1)} of problem (1) is bounded.
Between two consecutive descent steps, Algorithm 2.1 performs exactly one of the following: (i) it loops between Steps 1 and 6, the stability center remains unchanged, and only null steps are made; (ii) it enters Step 7, the stability center is updated, and a descent step is generated.
In what follows, we first show that Algorithm 2.1 is well defined, i.e., the number of loops between Step 1 and Step 6 is finite, and therefore Algorithm 2.1 must enter Step 7 after a finite number of iterations. In particular, we will show that the following claims hold if the stability center remains unchanged: (a) the algorithm gets through Step 4 finitely many times; (b) the algorithm gets through Step 6 finitely many times; (c) the number of loops between Step 1 and Step 5 is finite. Before proving (a), we give some important relations concerning the optimal values of subproblems (5) and (7). Lemma 3.1. Consider the two auxiliary subproblems (22) and (23). Then z^{k+1} and y^{k+1} are solutions of subproblems (22) and (23), respectively; furthermore, the relations used below hold among their optimal values. Proof. From (6), we have ∇φ^k_f(z^{k+1}) = 0, which implies that z^{k+1} is the solution of (22). Similarly, it follows from (8) that ∇φ^k_C(y^{k+1}) = 0, so y^{k+1} is the solution of (23).
The following lemma provides an explicit upper bound for the number of bundle resets; its proof is an extension of Lemma 4 in [34]. Lemma 3.3. Suppose that Algorithm 2.1 reaches a certain stability center x^k̄ for some iteration index k̄, and that it remains unchanged afterwards, i.e., Step 7 is not entered and only null steps are made. Then Algorithm 2.1 gets through Step 4 finitely many times. In particular, if N_k̄ denotes the number of times the algorithm gets through Step 4, then there exists a positive constant L such that the explicit bound (40) on N_k̄ holds, where ⌈A⌉ denotes the smallest integer greater than or equal to A.
Proof. We first note that x^k ≡ x^k̄ for all k ≥ k̄. Denote by k_r (≥ k̄) the index corresponding to the r-th time the algorithm enters Step 4. After entering Step 4, the bundle is reset to B_{k_r+1} = {(x^k̄, f(x^k̄), g^k̄)} with g^k̄ ∈ ∂f(x^k̄), and the parameter is updated by t_{k_r+1} = t̄/σ^r.

Remark 2. In the preceding lemma, the constant L can be chosen as the common Lipschitz modulus of the f_i over the bounded level set L.
We proceed to prove (b) and (c). In view of Lemma 3.3, we may assume that no bundle reset occurs. Lemma 3.4. Suppose that Algorithm 2.1 reaches a certain stability center x^k̄, that no bundle reset occurs, and that infinitely many null steps are made. Then lim sup_{k→∞} ε̌_k ≤ 0, where ε̌_k := f_{i_{k+1}}(y^{k+1}) − f̄_k(y^{k+1}), and i_{k+1} is the index generated in Step 5. Proof. First, from the statement of the lemma, it follows that (41) holds. Let g^{k+1}_{i_{k+1}} ∈ ∂f_{i_{k+1}}(y^{k+1}) be generated in Step 5. Then from (13), we have f̌^{k+1}_pcp(·) ≥ f_{i_{k+1}}(y^{k+1}) + ⟨g^{k+1}_{i_{k+1}}, · − y^{k+1}⟩. This, together with (41) and the objective functions of (5) and (7), yields (42). From (26), (27) and (30), the sequences {φ^k_f(z^{k+1})}_{k≥k̄} and {φ^k_C(y^{k+1})}_{k≥k̄} are nondecreasing and bounded above by f(x^k̄), so they are both convergent. By using (30) again, their limits are equal, which together with (29) and (41) yields (43). Moreover, (27) implies that {y^{k+1}} is bounded, so from (8) and (15) we obtain (44). Combining (45) with (44) gives (46). On the other hand, from (8), (46) and (47), we know that the right-hand side of (42) converges to zero, so the lemma holds. Lemma 3.5. Under the assumptions of Lemma 3.4, the algorithm passes through Step 6 only finitely many times, and the loop between Step 1 and Step 5 is finite. Proof. Suppose by contradiction that one of the following cases occurs: (i) the algorithm passes through Step 6 infinitely many times; (ii) the loop between Step 1 and Step 5 is infinite. In both cases, it follows that ‖p^k‖ > η for k ≥ k̄, while at the same time the descent property f(y^{k+1}) − f(x^k̄) ≤ αv_k, k ≥ k̄, is not satisfied. Thus, for each k ≥ k̄, there exists an index i_{k+1} generated in Step 5 such that f_{i_{k+1}}(y^{k+1}) − f(x^k̄) > αv_k. This together with (10) implies that ε̌_k stays bounded away from zero from below, which contradicts Lemma 3.4. From Lemmas 3.3 and 3.5, the following theorem holds immediately. Theorem. Algorithm 2.1 terminates after a finite number of iterations, where k* is the index of the last iteration. Furthermore, x^{k*} ∈ C can serve as an approximately optimal solution of problem (1).
Proof. The proof is similar to that of Theorem 2 in [34].
4. Improvement by subgradient aggregation. In this section, similar to [34], we improve Algorithm 2.1 by introducing the subgradient aggregation (SA) strategy of [21]. As the iterations proceed, the number of elements in the bundle may grow without bound. This could cause serious storage and computation problems after a large number of iterations, so we need to control the size of the bundle while retaining theoretical convergence. The subgradient aggregation strategy [21] is the most popular and efficient way to do this, so we use it to improve Algorithm 2.1 such that the storage requirements are bounded. To save space, we omit the steps that are identical to those of Algorithm 2.1. Step 0. Initialization. Select y^1 ∈ C, set x^1 = y^1, J_1 = {1}, and the initial bundle B_1 = {(y^1, f(y^1), g^1)} with g^1 ∈ ∂f(y^1). Set f̄_0(·) := f(y^1) + ⟨g^1, · − y^1⟩ and δ̄^0_C(·) := ⟨p^0_C, · − y^1⟩ with p^0_C := 0. Select ε > 0, η > 0, σ > 1, t̄ > 0, α ∈ (0, 1). Set t_1 = t̄, BR = ∅, k := 1.
Step 5. Descent test. Extract any index i from the index set I. If criterion (51) holds, then set x^{k+1} := x^k (null step). Select a subset J̄_k satisfying BR ⊆ J̄_k ⊆ J_k and generate its corresponding bundle B̄_k ⊆ B_k, set J_{k+1} = J̄_k ∪ {k + 1}, and update the bundle accordingly. Restore the index set by setting I = {1, . . . , m}, let t_{k+1} := t_k, k := k + 1, and return to Step 1.
Step 6. Updating. Set x^{k+1} := y^{k+1} (descent step). Select J̄_k ⊆ J_k and its corresponding bundle B̄_k ⊆ B_k, set J_{k+1} = J̄_k ∪ {k + 1}, and update the bundle accordingly. Restore the index set by setting I = {1, . . . , m}, let t_{k+1} := t̄, BR = ∅, k := k + 1, and return to Step 1.

Remark 3. (i) From (15), we know that f̄_{k−1} is an aggregation (convex combination) of the past linearizations, which is now treated as a usual linearization in (49). The model f̌^k_pcp contains at least the linearization at the current trial point y^k; in the case of a bundle reset, this linearization is f̄^k := f(y^k) + ⟨g^k, · − y^k⟩, and otherwise it is f̄^k := f_{i_k}(y^k) + ⟨g^k_{f_{i_k}}, · − y^k⟩ for some i_k ∈ I with g^k_{f_{i_k}} ∈ ∂f_{i_k}(y^k). Therefore, the model f̌_sa satisfies relation (52).
(ii) The set BR is used to record the first index after a bundle reset, since the corresponding element should be included in the new bundle of Step 5 such that Lemma 3.3 (especially (40)) holds. This together with (52) guarantees that, after suitable modifications, the convergence analysis for Algorithm 2.1 also applies to Algorithm 4.1. Some similar discussions can be found in [20].
(iii) The choice of the subset J̄_k is very flexible: theoretically speaking, it can be a singleton or even the empty set. In practice, a popular strategy is to delete some elements from the bundle when its size reaches a preset maximum value; see [2, Ch. 10] for more details.
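The aggregation and bundle-size control discussed in this section can be sketched as follows; the plane representation (intercept, slope) and the helper names are our own, not the paper's notation:

```python
def aggregate(bundle, lambdas):
    # Convex combination of planes (b_l, g_l), where plane_l(x) = b_l + <g_l, x>;
    # the result is the aggregate linearization (still a lower affine model).
    b = sum(lam * bl for lam, (bl, _) in zip(lambdas, bundle))
    n = len(bundle[0][1])
    g = [sum(lam * gl[j] for lam, (_, gl) in zip(lambdas, bundle)) for j in range(n)]
    return b, g

def prune(bundle, max_size, keep=()):
    # Drop the oldest planes beyond max_size, always retaining the indices in
    # `keep` (e.g. the first plane after a bundle reset, recorded in BR).
    if len(bundle) <= max_size:
        return list(bundle)
    kept = [bundle[i] for i in keep]
    rest = [pl for i, pl in enumerate(bundle) if i not in keep]
    return kept + rest[-(max_size - len(kept)):]
```

With equal multipliers λ = (0.5, 0.5), aggregating the planes (1.0, [2.0]) and (3.0, [4.0]) gives the single plane (2.0, [3.0]), so the storage for many planes collapses to one.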

5. Numerical results. This section tests the practical effectiveness of Algorithm 4.1 on a set of 14 convex minimax problems. The first 13 problems are generalized from the unconstrained versions in [26] (a widely used collection of nonsmooth test problems) by imposing suitable constraints. In particular, two classes of constraints are tested: (i) box constraints l ≤ x ≤ u; (ii) ball constraints ‖x − a‖ ≤ b. We add a superscript to the problem name in Table 1 to indicate that the problem is subject to a box constraint, and a superscript "•" in Table 2 to indicate a ball constraint. The 14th problem is taken from [27] (Example 8).
The detailed data for the first 13 problems are listed below, where y^1 denotes the starting point for box constraints; the starting points for ball constraints are always set to a. For simplicity, we may write vectors as row vectors and use some MATLAB notation: mod(x,y) returns the remainder after division of x by y; ones(p,q) and zeros(p,q) are p-by-q matrices of ones and zeros, respectively. MXHILB: y^1 = 0.1*ones(50,1); for j = 1, . . . , 50, if mod(j,2) = 0 then l_j = 0.1, u_j = 1.1, else l_j = −∞, u_j = +∞; a = (−ones(1,25), ones(1,25)), b = 2.
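The MXHILB data above translates to, e.g., Python as follows; this is a direct transcription of the description, with our own variable names:

```python
INF = float("inf")

n = 50
y1 = [0.1] * n                                # starting point 0.1*ones(50,1)
# mod(j,2) = 0 (even 1-based j) -> bounds [0.1, 1.1]; odd coordinates are free.
l = [0.1 if j % 2 == 0 else -INF for j in range(1, n + 1)]
u = [1.1 if j % 2 == 0 else INF for j in range(1, n + 1)]

a = [-1.0] * 25 + [1.0] * 25                  # (-ones(1,25), ones(1,25))
b = 2.0                                       # ball radius
```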
Some explanations are necessary for the above data. The bounds and balls for the first 8 problems and Goffin are simply set such that the unconstrained optimal solutions are excluded. The data for Maxquad are set such that the unconstrained solution is included. The starting points for Maxq and Maxl, and the bounds for Maxq, Maxl and MXHILB are suggested by [18].
Three algorithms are compared: PPPBM+ (Algorithm 4.1 of this paper); PPPBM− (Algorithm 4.1 with the descent test criterion proposed in [8], i.e., with (51) replaced by (17)); and PPBM (the algorithm in [20]). The comparisons between PPPBM+ and PPPBM− show the advantage of our new criterion (51) over the criterion (17) proposed in [8]; the comparisons between PPPBM+ and PPBM show the advantage of the partial bundle model over the classic bundle model.
All numerical experiments were implemented in MATLAB R2013b. Subproblem (50) is cast as a QP and solved by MOSEK [42]. The parameters are selected as ε = 10^{-4}, η = 10^{-3}, σ = 5, α = 0.3. Since it is well known that the choice of the proximal parameter t_k and the bundle size can have a considerable influence on the computational results, we choose a suitable t̄ for each test problem and set a maximum value M = min{10n, 50} for the bundle size. The index i in Step 5 is basically selected in the order 1, . . . , m, but we first test the indices for which (51) held in previous iterations, since they are more likely to satisfy it again.
The numerical results are reported in Tables 1, 2 and 3. The notation is: NI, the number of iterations; ND, the number of descent steps; Nf_i, the number of component function evaluations; f*, the approximate optimal value; N_eq_f, the number of equivalent objective function evaluations, i.e., N_eq_f = Nf_i/m. The comparisons for the first 13 problems are listed in Table 1 (box constraints) and Table 2 (ball constraints). From these two tables, we see that PPPBM+ needs noticeably fewer component function evaluations than PPPBM− and PPBM. The comparisons for the 14th problem are listed in Table 3. For simplicity, we only list the comparisons between PPPBM+ and PPBM, since PPPBM+ clearly outperforms PPPBM− by Tables 1-2 and our experience. From Table 3, we again see that PPPBM+ needs noticeably fewer component function evaluations than PPBM.
6. Conclusions. In this paper, we have presented a proximal-projection partial bundle method for convex constrained minimax problems. Interesting features of the proposed method are: (i) we extend the partial bundle method of [8] to the convex constrained case; (ii) the number of component function evaluations may be reduced greatly by using the partial cutting-planes model; (iii) the constraint set C can be handled efficiently by the proximal-projection technique; (iv) a new descent test criterion is proposed, which has both theoretical and numerical advantages; (v) global convergence is proved, and preliminary numerical results show that our method is promising.
As future work, it would be interesting to test our method on large-scale problems and practical applications. It would also be meaningful to consider the case where the component functions are not necessarily convex.