A MODIFIED STRICTLY CONTRACTIVE PEACEMAN-RACHFORD SPLITTING METHOD FOR MULTI-BLOCK SEPARABLE CONVEX PROGRAMMING

Abstract. We propose a modified splitting method for a linearly constrained minimization model whose objective function is the sum of three convex functions without coupled variables. Our work is mainly inspired by the recently proposed strictly contractive Peaceman-Rachford splitting method (SC-PRSM) for the two-block separable convex minimization model. For the new method, we prove its convergence and estimate its convergence rate, measured by the iteration complexity in the nonergodic sense. We also test the modified SC-PRSM on the continuous resource allocation problem, and the numerical results show that our method is competitive with the direct extension of ADMM, which usually works well in practice but may fail to converge in theory.

1. Introduction. We consider the following convex minimization model with linear constraints and a separable objective function that is the sum of three convex functions without coupled variables:

    min  θ_1(x) + θ_2(y) + θ_3(z)
    s.t. Ax + By + Cz = b,                                        (1)
         x ∈ X, y ∈ Y, z ∈ Z,

where A ∈ ℝ^{m×n_1}, B ∈ ℝ^{m×n_2}, C ∈ ℝ^{m×n_3} and b ∈ ℝ^m; X ⊆ ℝ^{n_1}, Y ⊆ ℝ^{n_2} and Z ⊆ ℝ^{n_3} are closed convex sets; and θ_1 : X → ℝ, θ_2 : Y → ℝ and θ_3 : Z → ℝ are convex (but not necessarily smooth) functions. Throughout this paper, the matrices A, B and C are assumed to have full column rank, and the set of KKT points of (1), denoted by Ω*, is assumed to be nonempty. This kind of separable convex programming model arises frequently in a broad spectrum of application domains such as inverse problems, statistical learning, image processing, computer vision and wireless communication networks. In such applications, each function in the objective usually has a specific interpretation, e.g., as a data-fidelity or a regularization term. It is thus common sense that efficient algorithms should take full advantage of the individual properties of these functions, instead of treating all the functions/variables as a whole.

For a similar separable model but with only a two-block objective function,

    min  θ_1(x) + θ_2(y)
    s.t. Ax + By = b,                                             (2)
         x ∈ X, y ∈ Y,

i.e., θ_3(z) ≡ 0 and C = 0 in (1), the alternating direction method of multipliers (ADMM), proposed originally in [11], has recently become extremely popular because of its simplicity of implementation, its wide applicability in various areas and its numerical efficiency. The iterative scheme of ADMM for (2) reads as

    x^{k+1} := arg min { L(x, y^k, λ^k) | x ∈ X },
    y^{k+1} := arg min { L(x^{k+1}, y, λ^k) | y ∈ Y },            (3)
    λ^{k+1} := λ^k − β(Ax^{k+1} + By^{k+1} − b),

where L(x, y, λ) is the augmented Lagrangian function of (2) with the Lagrange multiplier λ and the penalty parameter β > 0. Note that ADMM originates from the Douglas-Rachford splitting method (DRSM) in [8,23]; see, e.g., [9]. Another equally important splitting method in the PDE literature is the Peaceman-Rachford splitting method (PRSM) in [23,27]. Applying the PRSM to the model (2), we obtain the scheme

    x^{k+1}   := arg min { L(x, y^k, λ^k) | x ∈ X },
    λ^{k+1/2} := λ^k − β(Ax^{k+1} + By^k − b),
    y^{k+1}   := arg min { L(x^{k+1}, y, λ^{k+1/2}) | y ∈ Y },    (4)
    λ^{k+1}   := λ^{k+1/2} − β(Ax^{k+1} + By^{k+1} − b).

Although the scheme (4) differs from (3) only in that the Lagrange multiplier is updated twice, in a symmetric way, at each iteration, and although (4) indeed works well for some applications (see [10]), it is elaborated in [5] that (4) is not necessarily convergent, because the sequence it generates is not necessarily strictly contractive. This difficulty inspired the so-called strictly contractive PRSM (SC-PRSM) in [14]:

    x^{k+1}   := arg min { L(x, y^k, λ^k) | x ∈ X },
    λ^{k+1/2} := λ^k − αβ(Ax^{k+1} + By^k − b),
    y^{k+1}   := arg min { L(x^{k+1}, y, λ^{k+1/2}) | y ∈ Y },    (5)
    λ^{k+1}   := λ^{k+1/2} − αβ(Ax^{k+1} + By^{k+1} − b).

Note that the parameter α ∈ (0, 1) in (5) serves as a factor enforcing the strict contraction of the iterative sequence and thus ensuring convergence. It has been shown in [14] that the SC-PRSM usually outperforms ADMM for a wide range of applications.

The success of ADMM and SC-PRSM encourages us to extend them to the three-block model (1) in the straightforward way, yielding the direct extension of ADMM,

    x^{k+1} := arg min { L(x, y^k, z^k, λ^k) | x ∈ X },
    y^{k+1} := arg min { L(x^{k+1}, y, z^k, λ^k) | y ∈ Y },
    z^{k+1} := arg min { L(x^{k+1}, y^{k+1}, z, λ^k) | z ∈ Z },   (6)
    λ^{k+1} := λ^k − β(Ax^{k+1} + By^{k+1} + Cz^{k+1} − b),

and the direct extension of SC-PRSM,

    x^{k+1}   := arg min { L(x, y^k, z^k, λ^k) | x ∈ X },
    λ^{k+1/3} := λ^k − αβ(Ax^{k+1} + By^k + Cz^k − b),
    y^{k+1}   := arg min { L(x^{k+1}, y, z^k, λ^{k+1/3}) | y ∈ Y },
    λ^{k+2/3} := λ^{k+1/3} − αβ(Ax^{k+1} + By^{k+1} + Cz^k − b),  (7)
    z^{k+1}   := arg min { L(x^{k+1}, y^{k+1}, z, λ^{k+2/3}) | z ∈ Z },
    λ^{k+1}   := λ^{k+2/3} − αβ(Ax^{k+1} + By^{k+1} + Cz^{k+1} − b),

where L(x, y, z, λ) now denotes the augmented Lagrangian function of (1). Despite some empirical efficiency in, e.g., [28,29], the scheme (6) has been proved in [3] to be not necessarily convergent. Studies on how to ensure the convergence of (6) under additional assumptions can be found in the literature, e.g., [2,4,19,21,22]. For (7), it is shown in [13] that convergence is not necessarily guaranteed, either.
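To fix ideas about the update order in these schemes, the following sketch spells out one iteration of the SC-PRSM scheme (5) for the two-block model (2). It is a minimal illustration, not an implementation from the literature: the routines argmin_x and argmin_y are hypothetical placeholders for application-specific subproblem solvers.

    import numpy as np

    def sc_prsm_step(x, y, lam, A, B, b, beta, alpha, argmin_x, argmin_y):
        """One SC-PRSM iteration (5) for min theta1(x)+theta2(y) s.t. Ax+By=b.
        argmin_x(y, lam) and argmin_y(x, lam) are placeholders that must solve
        the two augmented-Lagrangian subproblems; alpha in (0, 1) damps both
        multiplier updates and enforces strict contraction of the iterates."""
        x = argmin_x(y, lam)                            # x-subproblem
        lam = lam - alpha * beta * (A @ x + B @ y - b)  # first multiplier update
        y = argmin_y(x, lam)                            # y-subproblem
        lam = lam - alpha * beta * (A @ x + B @ y - b)  # second multiplier update
        return x, y, lam

Taking alpha = 1 in this sketch recovers the original PRSM scheme (4), the very case shown in [5] to lack convergence in general, which is why the damping factor matters.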
The failure of convergence of the schemes (6) and (7) indicates that extending the well-known and simple schemes (3) and (5) to the more complicated model (1) is not straightforward, and that the algorithmic design must be treated with care when doing so. In the literature, a number of papers focus on how to modify the direct extension of ADMM (6) so that convergence is ensured while the simplicity of implementation and good numerical performance are inherited; see, e.g., [12,15,16,20]. By comparison, research on extending the scheme (5) to multi-block separable convex models is still in its infancy.
In the following, we propose a modified strictly contractive Peaceman-Rachford splitting method (SC-PRSM), stated as Algorithm 1, which inherits the advantages of the scheme (5) while its convergence is guaranteed under mild assumptions. Each iteration of Algorithm 1 consists of a prediction stage and a correction stage: the prediction stage performs a slightly twisted SC-PRSM sweep (9)-(11) over the three blocks, including the update of the Lagrange multiplier (Step 2), and produces the predictor w̃^k = (x̃^k, ỹ^k, z̃^k, λ̃^k); the correction stage (Step 5) then corrects the output by

    v^{k+1} := v^k − γ M (v^k − ṽ^k),                             (12)

where v := (y, z, λ), the matrix M is defined in (13) (in which 0 denotes a matrix with all zero entries of appropriate dimensionality), and γ > 0 is a constant step size; finally, set k := k + 1 and go to Step 1.

Our modification in the new algorithm is thus to twist the scheme (7) slightly and then to employ a simple correction step with a constant step size to correct the output. The iterative sequence is proved to be strictly contractive with respect to the solution set of (1); furthermore, the strict contraction property ensures the convergence of the new algorithm and enables us to estimate its worst-case convergence rate in terms of iteration complexity.
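Schematically, then, Algorithm 1 is of prediction-correction type, and its outer loop can be sketched as below. This is only a structural illustration under the assumption that the twisted sweep (9)-(11) is wrapped in a routine predict; both predict and the concrete matrix M are left abstract here, since they are specified by (9)-(13).

    import numpy as np

    def modified_sc_prsm(v0, predict, M, gamma, tol=1e-10, max_iter=10000):
        """Outer loop of Algorithm 1 (schematic). predict(v) returns the
        predictor v_tilde = (y_tilde, z_tilde, lam_tilde) produced by the
        twisted SC-PRSM sweep (9)-(11); (12) is the correction step."""
        v = np.asarray(v0, dtype=float)
        for _ in range(max_iter):
            v_tilde = predict(v)             # prediction: twisted SC-PRSM sweep
            d = v - v_tilde                  # prediction residual
            if np.linalg.norm(d) < tol:      # d = 0 characterizes a solution
                break
            v = v - gamma * (M @ d)          # correction step (12)
        return v

The constant step size γ ∈ (0, 1/2), together with α ∈ (0, 1/2), is what renders the matrices H and G of Section 2 positive definite and the iterates strictly contractive.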
The rest of this paper is organized as follows. In Section 2, we summarize some preliminaries and develop some simple results that are useful for further analysis. Then, we prove the contractive properties and global convergence for the sequence generated by Algorithm 1 in Section 3. Section 4 is devoted to establishing the convergence rate in the nonergodic sense for Algorithm 1. In Section 5, we investigate the numerical performance of our modified SC-PRSM applied to the resource allocation problem. Finally, we draw some conclusions in Section 6.

2. Preliminaries. In this section, we summarize some useful preliminaries known in the literature and prove some simple conclusions for further analysis.
2.1. Variational reformulation of (1). First, as in the works [17,18] on convergence rate analysis, we need a variational inequality (VI) reformulation of the model (1) and a characterization of its solution set. More specifically, solving (1) is equivalent to finding w* ∈ Ω such that

    θ(u) − θ(u*) + (w − w*)ᵀ F(w*) ≥ 0,  ∀ w ∈ Ω,                 (14a)

where u := (x, y, z), w := (x, y, z, λ), Ω := X × Y × Z × ℝ^m,

    F(w) := ( −Aᵀλ, −Bᵀλ, −Cᵀλ, Ax + By + Cz − b ),               (14b)

and

    θ(u) := θ_1(x) + θ_2(y) + θ_3(z).                             (14c)

Since the mapping F(w) defined in (14b) is affine with a skew-symmetric matrix, it is monotone. The following lemma from [7] is useful for establishing a worst-case o(1/t) convergence rate based on the distance between consecutive iterates; we list it as follows.

Lemma 2.1. If a nonnegative sequence {a_k} ⊂ ℝ is monotonically non-increasing and satisfies Σ_{k=0}^∞ a_k < +∞, then a_k = o(1/k).
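The monotonicity of F is immediate from its structure, and we record the computation here for completeness: writing F(w) = Pw + q with q := (0, 0, 0, −b) and

    P = [ 0  0  0  −Aᵀ
          0  0  0  −Bᵀ
          0  0  0  −Cᵀ
          A  B  C   0  ],

we have Pᵀ = −P, and hence

    (w_1 − w_2)ᵀ ( F(w_1) − F(w_2) ) = (w_1 − w_2)ᵀ P (w_1 − w_2) = 0,

so the defining inner product of monotonicity is in fact identically zero.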
2.2. Some notation. As mentioned in [1] for ADMM, the variable x is an intermediate variable during the iteration, since essentially only v^k = (y^k, z^k, λ^k) is required in (9)-(12) to generate the (k+1)-th iterate. For this reason, we define the notation v := (y, z, λ) and V* := {(y*, z*, λ*) | (x*, y*, z*, λ*) ∈ Ω*}; it then suffices to analyze the convergence rate of the sequence {v^k} to the set V* in order to study the convergence rate of the sequence {w^k} generated by (9)-(12). Note that V* is also closed and convex.
Then, we define some matrices in order to present our analysis in a compact way: let the matrices M, Q and H be defined in (13), (15) and (16), respectively. Below we prove some assertions regarding the matrices just defined. These assertions will be used in our theoretical analysis of the convergence rate of (9)-(12); their role is to keep the proofs in compact notation.

Lemma 2.2. The matrix H defined in (16) is positive definite whenever α ∈ (0, 1/2).

Proof. It is easy to verify that H admits a factorization whose inner block is positive definite if and only if α ∈ (0, 1/2). Note that B and C have full column rank. The assertion is thus proved.

Lemma 2.3. Let the matrices M, Q and H be defined in (13), (15) and (16). Then the assertion (17), namely HM = Q, holds, and the matrix G is positive definite whenever α ∈ (0, 1/2) and γ ∈ (0, 1/2), which is the assertion (18).

Proof. Using the definitions of the matrices M, Q and H, a simple manipulation gives HM = Q; the first assertion (17) is proved. Consequently, using (15) and the resulting equation, the matrix G can be expressed through a matrix H_{α,γ}. By direct computation, all the leading principal minors of H_{α,γ} are positive when α ∈ (0, 1/2) and γ ∈ (0, 1/2), and consequently H_{α,γ} ≻ 0. Since B and C have full column rank, we obtain the assertion (18). The lemma is proved.
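The argument via leading principal minors is Sylvester's criterion, and for concrete choices of α and γ it can be sanity-checked numerically. The generic sketch below assumes nothing about the specific entries of H_{α,γ}; it merely evaluates the criterion for a given symmetric matrix.

    import numpy as np

    def sylvester_positive_definite(S, eps=1e-12):
        """Sylvester's criterion: a symmetric matrix is positive definite
        iff all leading principal minors det(S[:k, :k]) are positive."""
        S = np.asarray(S, dtype=float)
        return all(np.linalg.det(S[:k, :k]) > eps
                   for k in range(1, S.shape[0] + 1))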
3. Convergence analysis. In this section, we analyze the contraction property and the global convergence of the sequence {v^k} generated by the proposed algorithm. From now on, we assume that the parameters α and γ are chosen in the interval (0, 1/2); then, according to Lemmas 2.2 and 2.3, the matrices H and G are positive definite. To further alleviate the notation in our analysis, we first define an auxiliary sequence {w̃^k} by

    w̃^k := (x^{k+1}, ỹ^k, z̃^k, λ̃^k),                             (20)

where (x^{k+1}, ỹ^k, z̃^k, λ̃^k) is generated by (9) and (11). With the notation of w̃^k, the proposed algorithm can be written compactly as the scheme (21). Now, we start to prove some properties of the sequence {w̃^k} defined in (20).
Lemma 3.1. For given v^k ∈ Y × Z × ℝ^m, let w^{k+1} and w̃^k be generated by Algorithm 1. Then, we have w̃^k ∈ Ω and

    θ(u) − θ(ũ^k) + (w − w̃^k)ᵀ F(w̃^k) ≥ (v − ṽ^k)ᵀ Q (v^k − ṽ^k),  ∀ w ∈ Ω,   (22)

where the matrix Q is defined in (15).
Proof. Since x^{k+1} = x̃^k, deriving the first-order optimality condition of the x-minimization problem in (21) gives the inequality (23); using the notation of λ̃^k in (21), the inequality (23) can be written as (24). Similarly, deriving the first-order optimality condition of the y-minimization problem in (21) gives (25), which, again using the notation of λ̃^k in (21), can be rewritten accordingly; the same reasoning applies to the z-minimization problem. Consequently, summing the resulting inequalities, it follows from (25) that the assertion (22) holds.
The following lemma further bounds the term (v − ṽ^k)ᵀ Q (v^k − ṽ^k) appearing in Lemma 3.1 by the difference of two quadratic terms involving two consecutive iterates of the sequence {v^k}, plus a quadratic term involving v^k and the auxiliary iterate ṽ^k; the resulting bound is stated as the assertion (30). This refined bound is convenient for recursive manipulation along the whole sequence {v^k}, and thus for establishing the convergence rate of {v^k} in the nonergodic sense.
Proof. By using Q = HM and M(v^k − ṽ^k) = (1/γ)(v^k − v^{k+1}) (see (12)), it follows that

    (v − ṽ^k)ᵀ Q (v^k − ṽ^k) = (1/γ) (v − ṽ^k)ᵀ H (v^k − v^{k+1}).

For vectors a, b, c, d in the same space and a symmetric matrix H of appropriate dimensionality, we have the identity

    (a − b)ᵀ H (c − d) = (1/2)( ‖a − d‖_H² − ‖a − c‖_H² ) + (1/2)( ‖c − b‖_H² − ‖d − b‖_H² ).   (31)

In this identity, we take

    a = v,  b = ṽ^k,  c = v^k,  d = v^{k+1},

and substitute these choices into the right-hand side of (31); the resulting equation is (32). Now, we deal with the last term of the right-hand side of (32): by using (12) and (17), it can be rewritten, and substituting the result into (32), we obtain the assertion (30). The proof is complete.
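Parenthetically, the identity (31) can be checked by direct expansion — a routine computation recorded here only for completeness: since H is symmetric, expanding the four squared H-norms on the right-hand side of (31) gives

    (1/2)( ‖a − d‖_H² − ‖a − c‖_H² ) + (1/2)( ‖c − b‖_H² − ‖d − b‖_H² ) = aᵀHc − aᵀHd − bᵀHc + bᵀHd,

which is exactly (a − b)ᵀ H (c − d).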
Now we are ready to present an inequality which shows the contraction property of the iterative sequence.

Theorem 3.4. For given v^k ∈ Y × Z × ℝ^m, let w^{k+1} be generated by Algorithm 1, and let H be defined in (16). Then there is a constant c_0 > 0 such that the contraction inequality (33) holds.

Proof. Replacing the right-hand side term in (22) with the bound (30), we obtain an inequality valid for all w ∈ Ω. Setting w = w* in this inequality, where w* is an arbitrary solution point in Ω*, we get (35).

Since G ≻ 0, there is a constant c_0 > 0 such that G ⪰ c_0 MᵀHM. Thus, using (12), the assertion (33) follows from (35). This completes the proof.

Theorem 3.5. Let {w^k} and {w̃^k} be generated by Algorithm 1. Then the sequence {w^k} converges to a point in Ω*.

Proof. It follows from (35) that the sequence {v^k} is Fejér-monotone with respect to V*, and that (36) holds. Since H ≻ 0 and G ≻ 0, the sequences {v^k} and {ṽ^k} are bounded. Furthermore, by using (11) and Ax* + By* + Cz* = b for any w* ∈ Ω*, we deduce that the sequence {‖x^{k+1} − x*‖_{AᵀA}} is bounded. This shows that the sequence {x^k} is also bounded, since A has full column rank. Therefore, the sequence {w̃^k = (x^{k+1}, ṽ^k)} is bounded.
Since the sequence {w̃^k} is bounded, there is a subsequence {w̃^{k_j}} which converges to a cluster point, say w^∞ := (x^∞, y^∞, z^∞, λ^∞) ∈ Ω. Taking limits on both sides of (22) along the subsequences {w^{k_j}} and {w̃^{k_j}}, and using (36), we obtain that w^∞ ∈ Ω*. To complete the proof, we show next that w^∞ is in fact the unique limit point of {w^k}. Note that {ṽ^{k_j}} converges to v^∞, so by (36) the subsequence {v^{k_j}} converges to v^∞ as well. Therefore, from (36) and (38), we get lim_{k→∞} ‖v^k − v^∞‖_H = 0, i.e., lim_{k→∞} v^k = v^∞. Combining this with (36), we also have lim_{k→∞} ṽ^k = v^∞. It follows from (36), (37) and w^∞ ∈ Ω* that lim_{k→∞} x̃^k = x^∞; since A has full column rank, we also obtain that lim_{k→∞} x^k = x^∞. Therefore, we have shown that the sequences {w^k := (x^k, v^k)} and {w̃^k := (x^{k+1}, ṽ^k)} both converge to w^∞ ∈ Ω*. The proof is complete.
4. Convergence rate in the nonergodic sense. In this section, we show that the worst-case convergence rate of Algorithm 1 is o(1/t) in the nonergodic sense. Our starting point for the analysis is the inequality (33), and a crucial property is the monotonicity of the sequence {‖v^k − v^{k+1}‖_H²}. We first take a closer look at the assertion (22) in Lemma 3.1.
Lemma 4.1. Let the sequences {w^k} and {w̃^k} be generated by Algorithm 1, and let the matrix Q be defined in (15). Then, we have

    (ṽ^k − ṽ^{k+1})ᵀ Q [ (v^k − ṽ^k) − (v^{k+1} − ṽ^{k+1}) ] ≥ 0.   (40)

Proof. Setting w = w̃^{k+1} in (22), we have (41). Note that (22) is also true with k replaced by k + 1; setting w = w̃^k in the resulting inequality, we obtain (42). Adding (41) and (42) and using the monotonicity of F, we get (40) immediately.
Lemma 4.2. Let the sequences {w^k} and {w̃^k} be generated by Algorithm 1, and let the matrices M, Q and H be defined in (13), (15) and (16), respectively. Then, we have the assertion (43).

Proof. Adding the identity (44) to both sides of (40), we get (45). Applying the relation v^k − v^{k+1} = γ M (v^k − ṽ^k) (see (12) and (17)) to the term v^k − v^{k+1} on the left-hand side of (44), we obtain (43). The proof is complete.

Combining (43) with Lemma 4.1, one shows (Lemma 4.3) that the sequence {‖v^k − v^{k+1}‖_H²} is monotonically non-increasing; moreover, summing the contraction inequality (33) shows that this sequence is summable, which is the assertion (46). These two facts yield the nonergodic rate.

Lemma 4.5. Let {v^k} be generated by Algorithm 1. Then we have ‖v^k − v^{k+1}‖_H² = o(1/k). (48)

Proof. According to Lemma 4.3, the sequence {‖v^k − v^{k+1}‖_H²} is monotonically non-increasing. The assertion (48) follows from (46) and Lemma 2.1 immediately. The proof is complete.
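For intuition, the way Lemma 2.1 converts summability plus monotonicity into the o(1/k) order in (48) is the standard tail argument: setting a_k := ‖v^k − v^{k+1}‖_H², the monotonicity gives

    t · a_{2t} ≤ Σ_{k=t+1}^{2t} a_k → 0  as t → ∞,

since the tail sums of a summable series vanish; hence 2t · a_{2t} → 0, monotonicity carries the same conclusion to odd indices, and therefore k · a_k → 0, i.e., a_k = o(1/k).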
Corollary 4.6. Let {w̃^t := (x̃^t, ỹ^t, z̃^t, λ̃^t)} be defined by (20). Then the estimate (49) holds, where "dist" denotes the distance from a point to a convex set.
Proof. From (29), we see clearly that the left-hand side of (49) is bounded, up to a constant factor, by ‖v^t − v^{t+1}‖_H². This, together with Lemma 4.5, proves the corollary.
Note that the left-hand side of (49) is the squared residual of the KKT system at w̃^t. Hence, Corollary 4.6 actually establishes a worst-case o(1/t) convergence rate, measured by the squared KKT residual, for Algorithm 1.
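For reference, and consistent with the VI reformulation in Section 2.1, w* = (x*, y*, z*, λ*) is a KKT point of (1) exactly when

    θ_1(x) − θ_1(x*) + (x − x*)ᵀ(−Aᵀλ*) ≥ 0,  ∀ x ∈ X,
    θ_2(y) − θ_2(y*) + (y − y*)ᵀ(−Bᵀλ*) ≥ 0,  ∀ y ∈ Y,
    θ_3(z) − θ_3(z*) + (z − z*)ᵀ(−Cᵀλ*) ≥ 0,  ∀ z ∈ Z,
    Ax* + By* + Cz* = b;

the quantity bounded in (49) measures the violation of this system at w̃^t.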

5. Numerical results. In this section, we test an allocation problem arising from market mechanisms (see, e.g., [6,26,30]) to verify the efficiency of the modified SC-PRSM algorithm. Specifically, we consider a market in which n resources are allocated among 3 users. The cost function of the allocation to the i-th user is denoted by θ_i, and the amount of the j-th resource by b_j (j = 1, 2, . . . , n). The market manager wants to find a distribution strategy for the allocation which minimizes the total cost subject to the capacity constraints of the resources. Clearly, the allocation problem can be modeled in the form (51).

As in [6], the cost functions are constructed from the one-dimensional functions φ listed in Table 1 (for instance, a piecewise quadratic cost). Specifically, any given function φ(s_i) is easily extended to n dimensions by a separable sum over the components, where s = (s_1, s_2, . . . , s_n), w = (w_1, w_2, . . . , w_n) and k = (k_1, k_2, . . . , k_n) denote the variable and parameter vectors. Note that for each cost function in Table 1, the proximity operator either has a closed-form solution or can be computed by solving a one-dimensional polynomial system. As a result, the proximal operator of θ_i can be computed by a fast procedure, and hence all the subproblems in the modified SC-PRSM (and in ADMM) can be solved quickly; a toy sketch of such a setup, under simplifying assumptions, is given after the conclusions.

For the allocation problem (51), we test the proposed modified SC-PRSM (abbreviated as MC-ADM), the alternating proximal gradient method [24] (abbreviated as APGM) and the direct extension of ADMM (6) (abbreviated as EADMM). Although (6) may fail to converge, its efficiency has been verified numerically in many applications, so it can serve as a benchmark for measuring the performance of ADMM-type methods. In the implementation, we set b = nI in the model; β = 1 and α = 0.49 for the MC-ADM; β = 1, r_1 = 1/6 and r_2 = 1/6 for the APGM; and β = 1 for the EADMM. Figure 1 depicts the numerical performance of the three methods on the allocation problems with n = 100, 400 and 800. Clearly, the EADMM (without theoretical convergence) performs best in all cases. Our proposed MC-ADM, which has provable convergence, outperforms the APGM and is competitive with the EADMM; specifically, the MC-ADM runs a bit slower than the EADMM during the first few iterations, but both attain the optimal value at almost the same time.

6. Conclusions. This paper proposes a modified SC-PRSM for solving multi-block separable convex programming problems. The proposed algorithm inherits the advantages of the direct extension of ADMM (6) and of SC-PRSM (7), and it can fully exploit the properties of the θ_i's. We establish the convergence and the nonergodic convergence rate of the proposed algorithm under mild conditions, and we verify its efficiency on the resource allocation problem.
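As the toy sketch announced in Section 5, the listing below runs the direct extension of ADMM (6) on a small allocation instance. It rests on assumptions not stated above and is not the code behind Figure 1: identity coupling matrices (so the constraint reads u_1 + u_2 + u_3 = b), simple quadratic costs standing in for the Table 1 functions, and hypothetical prox routines supplied by the caller.

    import numpy as np

    def eadmm_allocation(b, prox_list, beta=1.0, iters=300):
        """Direct extension of ADMM (6) for min sum_i theta_i(u_i)
        s.t. u_1 + u_2 + u_3 = b, assuming identity coupling matrices.
        prox_list[i](v, rho) must return argmin_u theta_i(u) + (rho/2)||u - v||^2
        (a placeholder for the proximal maps of the Table 1 costs)."""
        u = [np.zeros_like(b) for _ in range(3)]
        lam = np.zeros_like(b)
        for _ in range(iters):
            for i in range(3):  # Gauss-Seidel sweep over the three blocks
                rest = sum(u[j] for j in range(3) if j != i)
                u[i] = prox_list[i](b - rest + lam / beta, beta)
            lam = lam - beta * (sum(u) - b)  # multiplier update
        return u, lam

    # Toy run with quadratic costs theta_i(u) = (w_i/2)||u||^2 (closed-form prox).
    n, wts = 100, [1.0, 2.0, 3.0]
    prox = [lambda v, rho, wi=wi: rho * v / (wi + rho) for wi in wts]
    u, lam = eadmm_allocation(np.full(n, float(n)), prox)
    print("feasibility residual:", np.linalg.norm(sum(u) - np.full(n, float(n))))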