A proximal alternating direction method for multi-block coupled convex optimization



(Communicated by Kok Lay Teo)
Abstract. In this paper, we propose a proximal alternating direction method (PADM) for solving convex optimization problems with linear constraints whose objective function is the sum of multi-block separable functions and a coupled quadratic function. The algorithm generates the new iterate via a simple correction step, where the descent direction is based on the PADM. We prove the convergence of the generated sequence under some mild assumptions. Finally, preliminary numerical results are reported for the new algorithm.
1. Introduction. In this paper, we consider the following problem:
$$\min_{x_1,\dots,x_N}\ \sum_{i=1}^{N}\theta_i(x_i)+\frac{1}{2}x^\top Hx+g^\top x\qquad \mathrm{s.t.}\quad \sum_{i=1}^{N}A_ix_i=b,\tag{1}$$
where θ_i : R^{d_i} → (−∞, +∞] are closed convex (not necessarily smooth) functions; x_i ∈ R^{d_i}, x = (x_1, x_2, . . . , x_N) ∈ R^d; H ∈ R^{d×d} is a symmetric positive semidefinite matrix; g ∈ R^d; A_i ∈ R^{m×d_i} and b ∈ R^m.
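For concreteness, the following sketch assembles random data for an instance of problem (1) with N = 3 blocks. All sizes, names, and the random model are illustrative assumptions rather than anything specified in the paper; forming H = BᵀB simply guarantees that H is symmetric positive semidefinite.

```python
import numpy as np

# Hypothetical instance of problem (1): three blocks, random data.
rng = np.random.default_rng(0)
m, dims = 50, [20, 30, 40]                 # constraint rows, block sizes d_i
d = sum(dims)

B = rng.standard_normal((d, d))
H = B.T @ B                                # symmetric positive semidefinite coupling
g = rng.standard_normal(d)
A_blocks = [rng.standard_normal((m, di)) for di in dims]   # A_1, A_2, A_3
b = rng.standard_normal(m)
```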
This kind of problem has many applications in signal and image processing, machine learning, management, statistics, engineering computation, traffic management, cloud services, and biological data analysis. For example, many cloud traffic management problems [14] can be transformed into the following form:
$$\max\ \sum_{i}\psi_i(x_{i1},\dots,x_{in})-\sum_{j=1}^{n}\phi_j(y_j)\qquad \mathrm{s.t.}\quad \sum_{i}x_{ij}=y_j,\quad j=1,2,\dots,n.$$
Here each ψ_i(x_{i1}, · · · , x_{in}) represents the "level of satisfaction" of user i when she receives an amount x_{ij} of resources from facility j, and each ψ_i is a concave function; generally, ψ_i is a quadratic function. φ_j(y_j) represents the operational expense or congestion cost when facility j allocates an amount y_j of resources to all the users. All the functions φ_j (j = 1, · · · , n) are convex. For more applications, please refer to [1,8,27,33,34] and the references therein.
The augmented Lagrangian function of problem (1) is
$$\mathcal{L}_\beta(x_1,\dots,x_N,\lambda)=\sum_{i=1}^{N}\theta_i(x_i)+\frac{1}{2}x^\top Hx+g^\top x-\lambda^\top\Big(\sum_{i=1}^{N}A_ix_i-b\Big)+\frac{\beta}{2}\Big\|\sum_{i=1}^{N}A_ix_i-b\Big\|^2,$$
where λ ∈ R^m is the Lagrangian multiplier and β > 0 is a penalty parameter. When the coupled part of the objective function vanishes (H = 0 and g = 0), problem (1) reduces to a separable convex program with linear constraints, which can be solved successfully by many methods. Among these methods, the alternating direction method of multipliers (ADMM) is very efficient for large-scale problems; for N = 2 it solves the problem with the following iterative scheme:
$$x_1^{k+1}=\arg\min_{x_1}\mathcal{L}_\beta(x_1,x_2^k,\lambda^k),\qquad x_2^{k+1}=\arg\min_{x_2}\mathcal{L}_\beta(x_1^{k+1},x_2,\lambda^k),$$
and updates the multiplier via
$$\lambda^{k+1}=\lambda^k-\tau\beta\big(A_1x_1^{k+1}+A_2x_2^{k+1}-b\big),$$
where τ > 0 is a positive constant. The classic 2-block ADMM algorithm was originally introduced in the early 1970s [16,18], and its convergence has been studied extensively under various conditions and models. When θ_1 is strongly convex, A_1 is the identity mapping and A_2 is injective, [16] first proved the convergence of the classic 2-block ADMM for any τ ∈ (0, 2) if θ_2 is linear; when θ_2 is a general nonlinear convex function, convergence was established in [15] for any τ ∈ (0, (1 + √5)/2). For the case N ≥ 3, [5] showed by a counterexample that multi-block ADMM is not necessarily convergent. Many researchers have therefore put forward other kinds of conditions, or used proximal terms and correction steps, to ensure convergence for certain classes of problems. For example, the authors of [24] proposed a Gaussian back substitution method, a prediction-correction type method whose predictor is generated by the ADMM procedure and whose correction is completed by a Gaussian back substitution procedure. In [26], a Jacobian-like method was presented: each variable is updated with the other variables fixed at their last iterates in the prediction step, so the computation can be carried out in parallel at each iteration, which is particularly suitable for large-scale problems. For details of prediction-correction methods, please refer to [21,22,24,26] and the references therein.
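To illustrate the scheme above, here is a minimal sketch of the 2-block ADMM iteration for the simple separable choice θ_i(x_i) = ½‖x_i − c_i‖² (with H = 0 and g = 0), so that both subproblems have closed-form solutions. The data, sizes, and parameter values are assumptions made only for this example.

```python
import numpy as np

# 2-block ADMM sketch for  min 0.5||x1-c1||^2 + 0.5||x2-c2||^2
#                          s.t. A1 x1 + A2 x2 = b   (hypothetical data).
rng = np.random.default_rng(1)
m, d1, d2 = 30, 40, 40
A1, A2 = rng.standard_normal((m, d1)), rng.standard_normal((m, d2))
c1, c2 = rng.standard_normal(d1), rng.standard_normal(d2)
b = rng.standard_normal(m)
beta, tau = 1.0, 1.0                       # penalty parameter and dual step factor

x1, x2, lam = np.zeros(d1), np.zeros(d2), np.zeros(m)
for k in range(500):
    # x1-subproblem: (I + beta*A1'A1) x1 = c1 + A1'lam + beta*A1'(b - A2 x2)
    x1 = np.linalg.solve(np.eye(d1) + beta * A1.T @ A1,
                         c1 + A1.T @ lam + beta * A1.T @ (b - A2 @ x2))
    # x2-subproblem uses the freshly updated x1 (Gauss-Seidel order)
    x2 = np.linalg.solve(np.eye(d2) + beta * A2.T @ A2,
                         c2 + A2.T @ lam + beta * A2.T @ (b - A1 @ x1))
    # multiplier update with stepsize tau*beta
    lam -= tau * beta * (A1 @ x1 + A2 @ x2 - b)

print(np.linalg.norm(A1 @ x1 + A2 @ x2 - b))   # constraint residual, should be small
```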
If the penalty parameter β is restricted to a specific range and at least N − 2 of the functions are strongly convex, the convergence of multi-block ADMM was proved in [4,20,7]. If a certain error bound condition is satisfied, a sufficiently small dual stepsize guarantees linear convergence of multi-block ADMM [29]. The convergence rate of ADMM was also studied in [9,10,32,29]. Recently, ADMM has become widely popular in big-data related problems arising in machine learning, computer vision, signal processing, and so on; see [3,11,12,19] and the references therein.
In this paper, we consider how to solve (1) when H ≠ 0, i.e., when the objective function is not separable. Only a few papers in the literature have focused on this model [6,8,17,27,37]. In [6], the authors proposed a randomly permuted ADMM (RPADMM) to solve nonseparable multi-block convex optimization, and proved its expected convergence when applied to a class of quadratic programming problems. In [17], the authors solved the problem by ADMM, where the θ_i : R^{d_i} → (−∞, +∞] are all strongly convex and the coupled function is convex with Lipschitz continuous gradient. An upper bound minimization method was analyzed in [27]: during each iteration, an approximate augmented Lagrangian of the original problem is used to avoid employing the coupled function directly, and the algorithm updates the dual variable by a gradient ascent step followed by a block coordinate descent (BCD) step on this approximate augmented Lagrangian. The convergence analysis depends on restrictive conditions on the coupled function and on the chosen approximate function; in fact, choosing a proper approximate function is not easy precisely because of these restrictive conditions. A similar idea appeared in [8], which introduced majorization techniques and considered the 2-block case with a general coupled function. If the coupled function is merely quadratic and separably smooth, the majorized ADMM of [8] is exactly the same as the method proposed in [27] under a proper choice of the majorization function. As noted in [28], "when the objective function is not separable across the variables, the convergence of the ADMM is still open, even in the case where N = 2 and each θ_i is convex." We want to know when ADMM converges for such nonseparable problems, and how to implement it efficiently. Based on the work of [37], we use a correction step to obtain a descent direction for multi-block problems, and then focus on analyzing the 3-block problem. Our algorithm needs neither a carefully chosen majorization function nor restrictive assumptions.
This paper is organized as follows. In section 2, we give some notation and introduce our algorithm. In section 3, we discuss the convergence of the 3-block PADM. Numerical results are reported in section 4. Finally, we present some conclusions in section 5.
In this paper, we denote by ‖x‖₂ = √(xᵀx) the Euclidean norm and by ‖x‖_G = √(xᵀGx) the G-norm for a given symmetric positive definite matrix G. When G is only symmetric positive semidefinite, we still write ‖x‖_G = √(xᵀGx), although it is then merely a seminorm.

2. A proximal alternating direction method. To present the analysis in a compact way, we give some notation first.
Here β > 0 is a penalty parameter and H is a symmetric positive semidefinite matrix. We can choose proper matrices Q_1, · · · , Q_N that make Q and S symmetric positive definite. Xu and Han [37] proposed a proximal alternating direction method (PADM) to solve weakly coupled variational inequalities; in this paper, we extend it to the multi-block case and apply it to solve (1). The N-block proximal alternating direction method (PADM) reads as follows.

Algorithm 1. A proximal alternating direction method (denoted by PADM). Choose the constant β > 0 and symmetric positive definite matrices Q_1, Q_2, · · · , Q_N. Given the iterate ω^k = (x_1^k, . . . , x_N^k, λ^k), the predictor ω̃^k = (x̃_1^k, . . . , x̃_N^k, λ̃^k) is generated by the proximal alternating minimization sweep
$$\begin{cases}\tilde{x}_i^k=\arg\min_{x_i}\ \mathcal{L}_\beta(\tilde{x}_1^k,\dots,\tilde{x}_{i-1}^k,x_i,x_{i+1}^k,\dots,x_N^k,\lambda^k)+\frac{1}{2}\|x_i-x_i^k\|_{Q_i}^2,\quad i=1,\dots,N,\\[2pt] \tilde{\lambda}^k=\lambda^k-\beta\big(\textstyle\sum_{i=1}^{N}A_i\tilde{x}_i^k-b\big),\end{cases}\tag{10}$$
and the new iterate ω^{k+1} is obtained from the correction step described in Remark 2.3.
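To make the prediction sweep (10) concrete, the sketch below performs one predictor pass for the illustrative choices θ_i(x_i) = ½‖x_i − c_i‖² and Q_i = μI; both choices are assumptions made so that each subproblem reduces to a small linear system, not the paper's general setting.

```python
import numpy as np

def padm_predictor(x, lam, A, c, b, beta=1.0, mu=1.0):
    """One PADM prediction sweep in the spirit of (10), sketched for the
    hypothetical choices theta_i(x_i) = 0.5*||x_i - c_i||^2 and Q_i = mu*I."""
    x_new = [xi.copy() for xi in x]
    for i, (Ai, ci) in enumerate(zip(A, c)):
        # residual of the linear constraint, excluding block i
        # (blocks j < i already updated, blocks j > i still at x^k)
        r = sum(Aj @ xj for Aj, xj in zip(A, x_new)) - Ai @ x_new[i] - b
        di = Ai.shape[1]
        # optimality condition of the i-th proximal subproblem
        lhs = (1.0 + mu) * np.eye(di) + beta * Ai.T @ Ai
        rhs = ci + mu * x[i] + Ai.T @ lam - beta * Ai.T @ r
        x_new[i] = np.linalg.solve(lhs, rhs)
    lam_new = lam - beta * (sum(Aj @ xj for Aj, xj in zip(A, x_new)) - b)
    return x_new, lam_new
```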
Remark 2.2. In fact, it is easy to choose the matrices Q and S such that they are symmetric positive definite. Hence φ(ω^k, ω̃^k) can be regarded as a quantity of S-norm type.

Remark 2.3. In the new approach, we obtain the next iterate by moving along the descent direction −Mᵀ(ω^k − ω̃^k); we call this the correction step. For more details about correction steps, please refer to [22] and [39].
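In code, the correction step is a one-liner. In the sketch below, M and the stepsize alpha are treated as given inputs, since their precise definitions (and the self-adaptive choice of α_k used in section 4) come from formulas not reproduced here.

```python
import numpy as np

def correction_step(omega_k, omega_tilde, M, alpha):
    """Move from omega_k along the descent direction -M^T (omega_k - omega_tilde);
    M and alpha are assumed inputs taken from the method's definitions."""
    return omega_k - alpha * M.T @ (omega_k - omega_tilde)
```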
3. Convergence of 3-block PADM. For convenience of exposition, we specify N = 3 and analyze the convergence of the 3-block PADM for problem (1); the results in this section can be generalized directly to the case N > 3. We denote ω = (x_1, x_2, x_3, λ). As a result, problem (1) for N = 3 can be written as
$$\min_{x_1,x_2,x_3}\ \theta_1(x_1)+\theta_2(x_2)+\theta_3(x_3)+\frac{1}{2}x^\top Hx+g^\top x\qquad \mathrm{s.t.}\quad A_1x_1+A_2x_2+A_3x_3=b.\tag{15}$$
Assumption 3.1. Throughout this section, we assume:
(1) H is a symmetric positive semidefinite matrix.
(2) The solution set W^* of problem (15) is nonempty.
The following lemmas present some contraction properties of the 3-block PADM, which play an important role in the subsequent analysis.
Lemma 3.1. Suppose that Assumption 3.1 holds and let ω̃^k = (x̃_1^k, x̃_2^k, x̃_3^k, λ̃^k) be generated by (10). Then a contraction inequality holds for any ω^* ∈ W^*, where P and S_0 are defined in (8) and (9), respectively.
Corollary 3.1. Suppose that Assumption 3.1 holds and let ω̃^k = (x̃_1^k, x̃_2^k, x̃_3^k, λ̃^k) be generated by (10). Then, for any ω^* ∈ W^*:
(1) the sequence ‖ω^k − ω^*‖² is non-increasing.

We now present the convergence result of the proposed algorithm.
According to Corollary 3.1, we get Σ_{k=0}^∞ ‖ω^k − ω̃^k‖² < ∞, i.e., lim_{k→∞} ‖ω^k − ω̃^k‖ = 0.
From the optimality condition of (15), we obtain the variational characterization of a solution, which implies ω^* ∈ W^*. From (26)-(27), for any ε > 0 there exists an integer K such that ‖ω^K − ω̃^K‖² < ε/2 and ‖ω^K − ω^*‖² < ε/2. According to Corollary 3.1, the sequence ‖ω^k − ω^*‖² is non-increasing. Therefore, for any k > K, ‖ω^k − ω^*‖² ≤ ‖ω^K − ω^*‖² < ε, i.e., the sequence {ω^k} converges to a solution ω^* ∈ W^*.

4. Numerical experiments. In this section, we report several numerical results to illustrate the validity of the multi-block proximal alternating direction method, which is denoted as PADM. We code our algorithm in MATLAB R2012b, and all the tests are executed on a computer equipped with an Intel(R) Core(TM)2 Duo CPU (2.93 GHz) and 1.93 GB RAM, running Windows XP.

Example 1. Solving a linear system of equations. Chen et al. [5] showed that the classic alternating direction method of multipliers (ADMM) for solving a linear system of equations cannot be extended directly to three or more blocks, and gave a counterexample that can diverge. We consider the linear system of equations
$$A_1x_1+A_2x_2+A_3x_3=b,$$
where A_1 ∈ R^{p×m}, A_2 ∈ R^{p×n}, A_3 ∈ R^{p×l}, and the entries of A_i (i = 1, 2, 3) follow the Gaussian distribution N(0, 1). We denote A = (A_1, A_2, A_3) and x = (x_1ᵀ, x_2ᵀ, x_3ᵀ)ᵀ. The vector x_0 ∈ R^{m+n+l} is randomly generated with every component following the Gaussian distribution N(0, 1), and we choose b = Ax_0. This example is a special case of model (15). We compare our method with LADM [24], GADMM [17], ALM1 [26], and BSUM-M [27]. The parameter settings of these methods follow the relevant literature; in particular, the penalty parameters of all algorithms are the same for the same problem. The remaining examples are handled in the same manner.
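As a sketch, the Example 1 data can be generated as follows; the block sizes p, m, n, l below are illustrative placeholders, since the text does not fix them at this point. By construction b = Ax₀, so the linear system is consistent.

```python
import numpy as np

# Hypothetical sizes; entries of A_i and x0 are i.i.d. standard Gaussian.
rng = np.random.default_rng(2)
p, m, n, l = 100, 40, 40, 40
A1, A2, A3 = (rng.standard_normal((p, s)) for s in (m, n, l))
A = np.hstack([A1, A2, A3])
x0 = rng.standard_normal(m + n + l)
b = A @ x0                                 # consistent right-hand side
```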

Example 2.
A generalized Nash equilibrium problem. This example is a GNEP from [13], which consists of two players: one player controls a two-dimensional variable x = (x_1, x_2), and the other player controls a one-dimensional variable y ∈ R. The problem has an infinite number of solutions, given by {(t, 11 − t, 8 − t) | t ∈ [0, 2]}. It is obvious that the inequality constraints are active at all solutions of the problem; hence, it has the same solutions as the problem with equality constraints. The GNEP is then equivalent to a problem with three players, where each player controls a one-dimensional variable and the first and second players have the same objective function. In this experiment, we set Q = 10, β = 0.8, the maximal number of iterations is 100, and the stopping criterion is ‖ω^k − ω̃^k‖ ≤ 10⁻¹⁰. We use the self-adaptive projection method [23] to solve the subproblems (10). Regardless of the starting point, we find that PADM always obtains the solution (0, 11, 8) with ‖ω^k − ω̃^k‖ = 9.0091e−09. From Fig. 2(a), we see that each player finds its optimal solution. From Fig. 2(b), we find that PADM obtains better results than ALM [35] and GADMM [17], while BSUM-M [27] performs better: its convergence rate is faster and its accuracy is higher than those of the other methods. We observe that when the stepsize α_k is fixed, PADM still obtains a good result; Fig. 3 shows the results when α_k = 0.2 and β = 0.3. It should be noted that choosing a proper stepsize is not easy for BSUM-M, which reveals the significance of the self-adaptive stepsize.

Example 3. Basis pursuit. The basis pursuit problem, applied effectively in compressive sensing, aims to recover a sparse vector x from a small number of observations b:
$$\min_x\ \|x\|_1\qquad \mathrm{s.t.}\quad Ax=b.\tag{32}$$
We divide the vector x into N blocks, i.e., x = (x_1, x_2, · · · , x_N)ᵀ, x_i ∈ R^{n_i}, i = 1, 2, . . . , N. Therefore, (32) can be rewritten as
$$\min\ \sum_{i=1}^{N}\|x_i\|_1\qquad \mathrm{s.t.}\quad \sum_{i=1}^{N}A_ix_i=b.$$
We generate A ∈ R^{m×n} and b ∈ R^m similarly to Example 1. Let m = 250, n = 1000. We randomly generate the matrix A with entries following the Gaussian distribution N(0, 1) and orthonormalize its rows. Then we generate a sparse vector x_0 ∈ R^n with 25 nonzero entries, each following the Gaussian distribution N(0, 1). The observed vector b is generated by b = Ax_0 + ξ, where ξ ∼ N(0, 10⁻³I).
The maximal number of iterations is set to 500, the stopping criterion is the same as in Example 2 with ε = 10⁻¹⁵, and β = 0.05. Fig. 4 clearly shows the variation of the error as the iterations increase, where PADM is compared with FISTA [2], PALM [38], LADM [24], GADMM [17], ALM1 [26], and BSUM-M [27]. The results show that PADM achieves superior performance over all the other algorithms.
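For reproducibility, the following sketch generates the Example 3 test data as described above; the QR-based row orthonormalization is one standard way to realize the stated construction and is our assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, k = 250, 1000, 25
A = rng.standard_normal((m, n))
Q, _ = np.linalg.qr(A.T)                   # reduced QR: Q has orthonormal columns
A = Q.T                                    # rows of A are now orthonormal
x0 = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x0[support] = rng.standard_normal(k)       # 25 nonzero Gaussian entries
b = A @ x0 + np.sqrt(1e-3) * rng.standard_normal(m)   # xi ~ N(0, 1e-3 I)
```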
Example 4. The constrained LASSO. We consider the constrained LASSO (CLASSO) problem [30] as follows:
$$\min_x\ \frac{1}{2}\|Ax-b\|_2^2+\lambda\|x\|_1\qquad \mathrm{s.t.}\quad Cx=d,\tag{34}$$
where A ∈ R^{p×n} is a feature matrix and b ∈ R^p is an observed vector. The ℓ₁ norm, defined as ‖x‖₁ = Σ_{i=1}^n |x_i|, is known to promote sparsity of the solution. Many commonly applied statistical problems, such as the fused lasso, the generalized lasso, and monotone curve estimation, can be expressed as special cases of (34). The problem has the same structure as model (1). In this experiment, A ∈ R^{p×n}, C ∈ R^{m×n}, and a sparse vector x_0 ∈ R^n with 12 nonzero entries are randomly generated, with each element following the Gaussian distribution N(0, 1). Let p = 120, n = 1200, m = 20. We set b = Ax_0 + γ and d = Cx_0, where γ ∼ N(0, 10⁻³I), and choose the model parameter λ = 0.1. The maximal number of iterations is set to 100, and the stopping criterion uses ε = 10⁻¹⁵. Under this setting, PADM is compared with ALM [35], GADMM [17], and BSUM-M [27]. From Fig. 5, we see that the results obtained by PADM are far better than those of the other algorithms. The comparative analysis suggests that PADM is efficient for solving model (1).
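A sketch of the Example 4 data generation with the stated sizes; the variable names (and the seed) are ours.

```python
import numpy as np

rng = np.random.default_rng(4)
p, n, m = 120, 1200, 20
A = rng.standard_normal((p, n))            # feature matrix
C = rng.standard_normal((m, n))            # constraint matrix
x0 = np.zeros(n)
x0[rng.choice(n, size=12, replace=False)] = rng.standard_normal(12)
b = A @ x0 + np.sqrt(1e-3) * rng.standard_normal(p)   # gamma ~ N(0, 1e-3 I)
d = C @ x0
lam_model = 0.1                            # model parameter lambda
```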

5. Conclusion. In this paper, we proposed a multi-block proximal alternating direction method to solve the linearly constrained convex minimization model whose objective function is the sum of several separable functions and a coupled quadratic function. Under the mild condition that H is symmetric positive semidefinite, we showed the global convergence of the algorithm. The numerical experiments suggest that our method is effective and competitive in comparison with other methods.