Solving monotone inclusions involving parallel sums of linearly composed maximally monotone operators

The aim of this article is to present two different primal-dual methods for solving structured monotone inclusions involving parallel sums of compositions of maximally monotone operators with linear bounded operators. By employing elaborate splitting techniques, all operators occurring in the problem formulation are processed individually via forward or backward steps. The treatment of parallel sums of linearly composed maximally monotone operators is motivated by applications in imaging involving first- and second-order total variation functionals, to which special attention is given.


Introduction
In applied mathematics, a wide variety of convex optimization problems, such as single- or multifacility location problems, support vector machine problems for classification and regression, problems in clustering and portfolio optimization, as well as signal and image processing problems, all of them potentially possessing nonsmooth terms in their objectives, can be reduced to solving inclusion problems involving mixtures of monotone set-valued operators.
Our problem formulation is inspired by a real-world application in imaging (cf. [14,26]), where first- and second-order total variation functionals are linked via infimal convolutions in order to reduce staircasing effects in the reconstructed images. The problem under investigation reads as follows.

Problem 1.1. Let H be a real Hilbert space, z ∈ H, let A : H → 2^H be a maximally monotone operator, and let C : H → H be a monotone µ^{-1}-cocoercive operator for µ ∈ R_{++}. Furthermore, for every i = 1, …, m, let G_i, X_i, Y_i be real Hilbert spaces, r_i ∈ G_i, let B_i : X_i → 2^{X_i} and D_i : Y_i → 2^{Y_i} be maximally monotone operators, and consider the nonzero linear bounded operators L_i : H → G_i, K_i : G_i → X_i and M_i : G_i → Y_i. The problem is to solve the primal inclusion (1.1) together with its dual inclusion (1.2).

We provide in this paper two iterative methods, of forward-backward and forward-backward-forward type, respectively, for solving this primal-dual pair of monotone inclusion problems, and investigate their asymptotic behavior. A very similar problem formulation was recently investigated in [3]; however, the iterative scheme proposed there relies on the forward-backward-forward method and is different from the corresponding one we propose here. Since it requires one forward step fewer per iteration, the forward-backward method is more attractive from the perspective of numerical implementation. This advantage is supported by our experimental results reported in Section 5.
The article is organized as follows. In Section 2 we introduce notation and preliminary results from convex analysis and monotone operator theory. In Section 3 we formulate the two algorithms and study their convergence behavior. In Section 4 we apply the results of the previous section to the simultaneous solving of convex minimization problems and their conjugate dual problems. Numerical experiments in the context of image denoising with first- and second-order total variation functionals are presented in Section 5.

Notation and preliminaries
We consider the real Hilbert space H endowed with an inner product ⟨·, ·⟩ and associated norm ‖·‖ = √⟨·, ·⟩. The symbols ⇀ and → denote weak and strong convergence, respectively. Given sequences (x_n)_{n≥0} and (y_n)_{n≥0} in H, we account for errors in the implementation of the algorithms by using the following notation taken from [3]:

(x_n ≈ y_n ∀n ≥ 0) ⇔ Σ_{n≥0} ‖x_n − y_n‖ < +∞. (2.1)
and is taken to be the empty set, otherwise. For a linear bounded operator L : H → G, where G is another real Hilbert space, the operator L* : G → H, defined via ⟨Lx, y⟩ = ⟨x, L*y⟩ for all x ∈ H and all y ∈ G, denotes its adjoint.
Having two proper functions f, g : H → R̄, their infimal convolution is defined by (f □ g)(x) = inf_{y∈H} {f(y) + g(x − y)}. Moreover, for f ∈ Γ(H) and γ ∈ R_{++} the subdifferential ∂(γf) is maximally monotone (cf. [24]) and it holds J_{γ∂f} = (Id + γ∂f)^{-1} = Prox_{γf}. Here, Prox_{γf}(x) denotes the proximal point of γf at x ∈ H, representing the unique optimal solution of the optimization problem inf_{y∈H} { γf(y) + (1/2)‖y − x‖² }. In this particular situation, relation (2.2) becomes Moreau's decomposition formula Prox_{γf}(x) + γ Prox_{γ^{-1}f*}(γ^{-1}x) = x. When Ω ⊆ H is a nonempty, convex and closed set, the function δ_Ω : H → R̄, defined by δ_Ω(x) = 0 for x ∈ Ω and δ_Ω(x) = +∞, otherwise, denotes the indicator function of the set Ω. For each γ > 0 the proximal point of γδ_Ω at x ∈ H is nothing else than P_Ω(x), the projection of x onto Ω. Finally, when for i = 1, …, m the real Hilbert spaces H_i are endowed with inner products ⟨·, ·⟩_{H_i} and associated norms ‖·‖_{H_i} = √⟨·, ·⟩_{H_i}, we denote by H = H_1 ⊕ ⋯ ⊕ H_m their direct sum. For p = (p_1, …, p_m), q = (q_1, …, q_m) ∈ H, this real Hilbert space is endowed with inner product and associated norm defined via ⟨p, q⟩_H = Σ_{i=1}^m ⟨p_i, q_i⟩_{H_i} and ‖p‖_H = √(Σ_{i=1}^m ‖p_i‖²_{H_i}).
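As a numerical illustration of the proximal map and Moreau's decomposition formula recalled above, the following sketch (in Python, outside the paper's abstract Hilbert-space setting) takes f = ‖·‖₁, whose proximal map is componentwise soft-thresholding and whose conjugate f* is the indicator of the unit sup-norm ball, so that Prox of the conjugate term reduces to a projection.

```python
import numpy as np

def prox_l1(x, gamma):
    # Proximal operator of gamma*||.||_1: componentwise soft thresholding.
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_conj_l1(x, gamma):
    # Prox of gamma*f* for f = ||.||_1: f* is the indicator of the unit
    # sup-norm ball, so the prox is the projection onto [-1, 1]^n
    # (independent of gamma).
    return np.clip(x, -1.0, 1.0)

x = np.array([-2.5, -0.3, 0.0, 0.7, 4.0])
gamma = 1.5
# Moreau's decomposition: x = Prox_{gamma f}(x) + gamma*Prox_{f*/gamma}(x/gamma)
lhs = prox_l1(x, gamma) + gamma * prox_conj_l1(x / gamma, 1.0 / gamma)
print(np.allclose(lhs, x))  # True
```

The same identity underlies the backward steps of the algorithms in Section 3, where resolvents of the operators B_i and D_i and of their inverses are related via (2.2).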

The primal-dual iterative schemes
Within this section we provide two different algorithms for solving the primal-dual inclusion introduced in Problem 1.1 and discuss their asymptotic convergence. In Subsection 3.2, however, the assumptions imposed on the monotone operator C : H → H are weakened by assuming that C is only µ-Lipschitz continuous for some µ ∈ R_{++}. In the following let X = X_1 ⊕ ⋯ ⊕ X_m, Y = Y_1 ⊕ ⋯ ⊕ Y_m and G = G_1 ⊕ ⋯ ⊕ G_m, and let p = (p_1, …, p_m) ∈ X, q = (q_1, …, q_m) ∈ Y, y = (y_1, …, y_m) ∈ G.
We say that (x, p, q, y) ∈ H ⊕ X ⊕ Y ⊕ G is a primal-dual solution to Problem 1.1 if it satisfies the corresponding system of inclusions. If (x, p, q, y) ∈ H ⊕ X ⊕ Y ⊕ G is a primal-dual solution to Problem 1.1, then x is a solution to (1.1) and (p, q, y) is a solution to (1.2). Thus, if x is a solution to (1.1), then there exists (p, q, y) ∈ X ⊕ Y ⊕ G such that (x, p, q, y) is a primal-dual solution to Problem 1.1; and if (p, q, y) is a solution to (1.2), then there exists x ∈ H such that (x, p, q, y) is a primal-dual solution to Problem 1.1.
Remark 3.1. The notation (2.1) has been introduced in order to allow errors in the implementation of the algorithms without affecting the readability of the paper in the sequel. This is reasonable since errors preserve their summability under addition, scalar multiplication and the application of linear bounded operators.

An algorithm of forward-backward type
In this subsection we propose a forward-backward type algorithm for solving Problem 1.1 and prove its convergence by showing that it can be reduced to an error-tolerant forward-backward iterative scheme.

Algorithm 3.1. Let x_0 ∈ H, and for any i = 1, …, m, let p_{i,0} ∈ X_i, q_{i,0} ∈ Y_i and z_{i,0}, y_{i,0}, v_{i,0} ∈ G_i. For any i = 1, …, m, let τ, θ_{1,i}, θ_{2,i}, γ_{1,i}, γ_{2,i} and σ_i be strictly positive real numbers such that Furthermore, let ε ∈ (0, 1), let (λ_n)_{n≥0} be a sequence in [ε, 1] and set

and consider the sequences generated by Algorithm 3.1. Then there exists a primal-dual solution (x, p, q, y) to Problem 1.1 such that

Proof. We introduce the real Hilbert space (3.6) We introduce the maximally monotone operators Further, consider the set-valued operator which is maximally monotone, since A, B and D are maximally monotone (cf. [ The operator S is skew, as well, and hence maximally monotone. As dom S = K, the sum M + S is maximally monotone (see [2, Corollary 24.4(i)]). Finally, we introduce the monotone operator which is, obviously, µ^{-1}-cocoercive. By making use of (3.2), we observe that From here it follows that Further, for the positive real values τ, θ_{1,i}, θ_{2,i}, γ_{1,i}, γ_{2,i} and σ_i we define the linear bounded operator It is a simple calculation to prove that V is self-adjoint. Furthermore, the operator V is ρ-strongly positive with The fact that ρ is a positive real number follows from the assumptions made in Algorithm 3.1. Indeed, using that 2ab ≤ αa² + b²/α for every a, b ∈ R and every α ∈ R_{++}, we obtain for any i = 1, …, m Consequently, for each x = (x, p, q, z, y, v) ∈ K, using the Cauchy-Schwarz inequality and (3.8), it follows that In consideration of (2.1), the algorithmic scheme (3.4) can equivalently be written in the form Also, for any n ≥ 0, we consider the sequences defined by e_{1,n} = (e_{1,1,n}, …, e_{1,m,n}) ∈ G, e_{2,n} = (e_{2,1,n}, …, e_{2,m,n}) ∈ G, e_{3,n} = (e_{3,1,n}, …, e_{3,m,n}) ∈ G (3.11) that are summable in the corresponding norm. Further, by denoting for any n ≥ 0 e_n = (a_n, b_n, d_n, 0, 0, 0) ∈ K and e^τ_n = (a_n/τ, b_n/θ_1, d_n/θ_2, e_{1,n}, e_{2,n}, e_{3,n}) ∈ K, which are also terms of summable sequences in the corresponding norm, it follows that the scheme in (3.10) is equivalent to We now introduce the notations and the summable sequence with terms e^V_n = V^{-1}((V + S)e_n − e^τ_n) for any n ≥ 0.
Then, for any n ≥ 0, we have (3.14). Taking into account that the resolvent is Lipschitz continuous, the sequence having as terms is summable and we have Thus, the iterative scheme in (3.12) becomes which shows that the algorithm proposed in this subsection has the structure of a forward-backward method.
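The forward-backward structure derived above can be made concrete on a toy instance. The following is a minimal sketch of the relaxed scheme x_{n+1} = x_n + λ_n(J_{γA}(x_n − γBx_n) − x_n), not the full method (3.4): here A = ∂(α‖·‖₁), B = ∇(½‖·−b‖²) (which is 1-cocoercive), and the unique zero of A + B has the closed form soft(b, α), so convergence can be checked directly.

```python
import numpy as np

def soft(x, t):
    # Resolvent of t * subdifferential of ||.||_1 (soft thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# Toy instance: A = subdifferential of alpha*||.||_1, B = gradient of
# 0.5*||x - b||^2 (1-cocoercive), so zer(A + B) = {soft(b, alpha)}.
b = np.array([3.0, -0.2, 1.5, -4.0])
alpha = 1.0
gamma, lam = 0.5, 0.9           # step size and relaxation parameter
x = np.zeros_like(b)
for _ in range(100):
    y = x - gamma * (x - b)     # forward step on the cocoercive operator B
    p = soft(y, gamma * alpha)  # backward step: resolvent of gamma*A
    x = x + lam * (p - x)       # relaxation
print(np.allclose(x, soft(b, alpha)))  # True
```

The point of the scheme in the paper is that the same forward/backward pattern survives the much richer structure of Problem 1.1 once the problem is rewritten in the product space K.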
In addition, let us observe that We then introduce the Hilbert space K_V with inner product and norm respectively defined, for x, y ∈ K, via Since M + S and Q are maximally monotone on K, the operators A_K and B_K are maximally monotone on K_V. Moreover, since V is self-adjoint and ρ-strongly positive, one can easily see that weak and strong convergence in K_V are equivalent to weak and strong convergence in K, respectively. By making use of ‖V^{-1}‖ ≤ 1/ρ, one can show that B_K is (µ^{-1}ρ)-cocoercive on K_V. Indeed, we get for x, y ∈ K_V that (see, also, [29, Eq. (3.35)]) As our assumption imposes that 2µ^{-1}ρ > 1, we can use the statements given in [17, Corollary 6.5] in the context of an error-tolerant forward-backward algorithm in order to establish the desired convergence results.
(i) By Corollary 6.5 in [17], the sequence (x n ) n≥0 converges weakly in K V (and therefore in K) to some x = (x, p, q, z, y, v) ∈ zer (A K + B K ) = zer (M + S + Q). By (3.7), it thus follows that (x, p, q, y) is a primal-dual solution with respect to Problem 1.1.
(ii) From [17] it follows and therefore we have B_K x_n → B_K x or, equivalently, Q x_n → Q x as n → +∞. Considering the definition of Q, one can see that this implies Cx_n → Cx as n → +∞. As C is uniformly monotone, there exists an increasing function φ_C : [0, +∞) → [0, +∞] vanishing only at 0 such that The boundedness of (x_n − x)_{n≥0} and the convergence Cx_n → Cx further imply that x_n → x as n → +∞.
Then the scheme (3.15) reads and it can be shown to converge under the relaxed assumption that (λ_n)_{n≥0} ⊆ [ε, 2 − ε] for ε ∈ (0, 1) (see, for instance, [16,17,23]). (ii) The maximally monotone operators occurring in Problem 1.1 are accessed via their resolvents (so-called backward steps), also by taking into account the relation between the resolvent of a maximally monotone operator and its inverse given in (2.2). (iii) The possibility of performing a forward step for the cocoercive monotone operator C is an important aspect, since forward steps are usually much easier to implement than resolvents (resp. proximity operators). Due to the Baillon-Haddad theorem (cf. [2, Corollary 18.16]), the µ-Lipschitzian gradient, µ ∈ R_{++}, of a convex and Fréchet differentiable function f : H → R is µ^{-1}-cocoercive.
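The cocoercivity granted by the Baillon-Haddad theorem can be verified numerically on a quadratic function. This is a minimal sketch assuming f(x) = ½xᵀQx with Q symmetric positive semidefinite, so that ∇f = Q· is µ-Lipschitz with µ the largest eigenvalue of Q.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
Q = Q.T @ Q                                  # symmetric positive semidefinite
mu = np.linalg.eigvalsh(Q).max()             # Lipschitz constant of grad f
grad = lambda x: Q @ x                       # gradient of f(x) = 0.5 x^T Q x

# Baillon-Haddad: a mu-Lipschitz gradient of a convex function satisfies
# <grad f(x) - grad f(y), x - y> >= (1/mu) * ||grad f(x) - grad f(y)||^2.
ok = True
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    g = grad(x) - grad(y)
    ok &= g @ (x - y) >= (1.0 / mu) * (g @ g) - 1e-9  # small numerical slack
print(bool(ok))  # True
```

For this f the inequality follows directly from the spectral decomposition of Q; the theorem asserts it for every convex function with Lipschitzian gradient.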

An algorithm of forward-backward-forward type
In this subsection we propose a forward-backward-forward type algorithm for solving Problem 1.1, with the modification that the operator C : H → H is assumed to be µ-Lipschitz continuous for some µ ∈ R ++ , but not necessarily µ −1 -cocoercive.

Theorem 3.2. In Problem 1.1, let C : H → H be µ-Lipschitz continuous for some µ ∈ R_{++}.
In the following we use the sequences in (3.11) for modeling summable errors in the implementation. In addition we consider the summable sequences in K with terms defined for any n ≥ 0 as e_n = (a_n, b_n, d_n, 0, 0, 0) and ẽ_n = (0, 0, 0, e_{1,n}, e_{2,n}, e_{3,n}).
Note that (3.20) can equivalently be written as We observe that for one has x_n = J_{γ_n A_K}(x_n − γ_n B_K x_n) + e^K_n for any n ≥ 0 and it holds Σ_{n≥0} (‖e_n‖ + ‖ẽ_n‖) < +∞.
Thus, (3.26) becomes which is an error-tolerant forward-backward-forward method in K whose convergence has been investigated in [13]. Note that the exact version of this algorithm was proposed by Tseng in [28]. In consideration of (3.7), it follows that (x, p, q, y) is a primal-dual solution to Problem 1.1, that x_n ⇀ x, and for i = 1, …, m

Remark 3.4. (i) In contrast to Algorithm 3.1, the iterative scheme in Algorithm 3.2 requires twice as many forward steps per iteration and is therefore more time-intensive. On the other hand, many steps in Algorithm 3.2 can be processed in parallel. (ii) A related monotone inclusion problem involving linearly composed parallel sums of maximally monotone operators was investigated in [3], where an iterative scheme was proposed which can also be reduced to a forward-backward-forward type iteration. However, the algorithm there is different from the one given in Algorithm 3.2.
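Tseng's forward-backward-forward pattern can be illustrated on a small inclusion where plain forward-backward carries no guarantee because the single-valued operator is merely Lipschitz, not cocoercive. This is a toy sketch, not the scheme (3.20): A = ∂‖·‖₁ and B a skew (monotone, 1-Lipschitz) linear map; one checks directly that the unique zero of A + B is the origin.

```python
import numpy as np

def soft(x, t):
    # Resolvent of t * subdifferential of ||.||_1 (soft thresholding).
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

# B is skew: monotone and 1-Lipschitz but NOT cocoercive, so a plain
# forward-backward step has no convergence guarantee here; Tseng's
# additional correcting forward step repairs this.
B = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
gamma = 0.3                                # step size in (0, 1/mu), mu = 1
x = np.array([2.0, 1.0])
for _ in range(2000):
    p = soft(x - gamma * (B @ x), gamma)   # forward step on B + backward step
    x = p + gamma * (B @ x - B @ p)        # correcting forward step
print(np.linalg.norm(x))                   # close to 0, the unique zero
```

Once the iterates enter the region where the resolvent returns 0, the update reduces to x ↦ γBx, so the norm contracts geometrically with factor γ.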

Application to convex minimization
In this section we employ the algorithms introduced in the previous section in the context of solving primal-dual pairs of convex optimization problems. The problem under consideration is as follows.
Problem 4.1. Let H be a real Hilbert space, z ∈ H and f, h ∈ Γ(H) such that h is differentiable with µ-Lipschitzian gradient for µ ∈ R_{++}. Furthermore, for every i = 1, …, m, let G_i, X_i, Y_i be real Hilbert spaces, r_i ∈ G_i, let g_i ∈ Γ(X_i) and l_i ∈ Γ(Y_i) and consider the nonzero linear bounded operators L_i : H → G_i, K_i : G_i → X_i and M_i : G_i → Y_i. The problem is to solve the primal optimization problem (4.1) together with its conjugate dual problem (4.2).

For every x ∈ H, (p, q) ∈ X ⊕ Y, i = 1, …, m and y_i ∈ G_i, it holds that which means that weak duality always holds for the primal-dual pair of optimization problems (4.1)-(4.2). If (x, p, q, y) ∈ H ⊕ X ⊕ Y ⊕ G is a solution of the primal-dual system of monotone inclusions (4.4), then x is an optimal solution to (4.1) and (p, q) is an optimal solution to (4.2). Indeed, as h is convex and everywhere differentiable, it holds On the other hand, since g_i ∈ Γ(X_i) and l_i ∈ Γ(Y_i), we have for any i = 1, …, m By summing up these equations and using (4.4), we obtain which, together with (4.3), leads to the desired conclusion.

In the following, by extending the result in [3, Proposition 4.2] to our setting, we provide sufficient conditions which guarantee the validity of (3.5) when applied to convex minimization problems. To this end we mention that the strong quasi-relative interior of a nonempty convex set Ω ⊆ H is defined as

Proposition 4.1. Suppose that the primal problem (4.1) has an optimal solution, that (4.5) holds and 0 ∈ sqri E, (4.6) where Then

Proof. Let x ∈ H be an optimal solution to (4.1). Since (4.6) holds, we have for any i = 1, …, m that Further, because of (4.5), [2, Proposition 15.7] guarantees for any i = 1, …, m the existence of y_i ∈ G_i such that Hence, (x, y) = (x, y_1, …, y_m) is an optimal solution to the convex optimization problem By denoting Thus, 0 ∈ ∂(f + g ∘ L)(x, y).
then (4.6) is also true. This follows by using that in finite-dimensional spaces the strong quasi-relative interior of a convex set is nothing else than its relative interior, and by taking into account the properties of the latter.
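The infimal convolutions appearing in the objective functions above can be approximated numerically. The following sketch treats two one-dimensional quadratics f(y) = ½ay² and g(y) = ½by², for which the closed form (f □ g)(x) = ½(ab/(a+b))x² is known, and compares a grid-based evaluation of inf_y {f(y) + g(x − y)} against it.

```python
import numpy as np

a, b = 2.0, 3.0
f = lambda y: 0.5 * a * y ** 2
g = lambda y: 0.5 * b * y ** 2

# (f [] g)(x) = inf_y { f(y) + g(x - y) }, approximated on a fine grid.
grid = np.linspace(-5.0, 5.0, 100001)
x = 1.7
approx = np.min(f(grid) + g(x - grid))
exact = 0.5 * (a * b / (a + b)) * x ** 2    # closed form for two quadratics
print(abs(approx - exact) < 1e-4)  # True
```

The exact evaluation of such convolutions is rarely available in the total variation models of Section 5, which is precisely why the algorithms of this paper split them into individually processable pieces instead.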

An algorithm of forward-backward type
When applied to (4.4), the iterative scheme introduced in (3.4) and the corresponding convergence statements read as follows.
Furthermore, let ε ∈ (0, 1), let (λ_n)_{n≥0} be a sequence in [ε, 1] and set and consider the sequences generated by Algorithm 4.1. Then there exist an optimal solution x to (4.1) and an optimal solution (p, q) to (4.2) such that (i) x_n ⇀ x, p_{i,n} ⇀ p_i and q_{i,n} ⇀ q_i for any i = 1, …, m as n → +∞; (ii) if h is uniformly convex at x, then x_n → x as n → +∞.

An algorithm of forward-backward-forward type
On the other hand, when applied to (4.4), the iterative scheme introduced in (3.20) and the corresponding convergence statements read as follows.

Algorithm 4.2.
Let x_0 ∈ H, and for any i = 1, …, m, let p_{i,0} ∈ X_i, q_{i,0} ∈ Y_i, and z_{i,0}, y_{i,0}, v_{i,0} ∈ G_i. Set let ε ∈ (0, 1/(β+1)), let (γ_n)_{n≥0} be a sequence in [ε, (1−ε)/β] and set where addition, multiplication and square root of vectors are understood componentwise.

Numerical experiments

Further, we consider the forward difference matrix which models the discrete first-order derivative. Note that −D_k^T D_k is then an approximation of the second-order derivative. We denote by A ⊗ B the Kronecker product of the matrices A and B and define where D_x and D_y represent the vertical and horizontal difference operators, respectively. Further, we define the discrete second-order derivative matrices and and notice that D_2 = L_1 D_1. For other discrete second-order derivatives involving also mixed partial derivatives (in horizontal-vertical direction and vice versa) we refer to [26]. The two different convex optimization problems we considered for our numerical experiments were taken from [26, Example 2.2 and Example 3.1] and read and respectively, where α_1, α_2 ∈ R_{++} are the regularization parameters and the regularizers correspond to anisotropic total variation functionals. One can notice that in both settings a condition of type (4.5) is fulfilled, thus the infimal convolutions are proper, convex and lower semicontinuous functions. Since the objective functions of the two optimization problems are proper, strongly convex and lower semicontinuous, they have unique optimal solutions. Finally, in the light of Remark 4.1, a condition of type (4.6) holds; thus, according to Proposition 4.1, the hypotheses of Theorems 4.2 and 4.3 are fulfilled for both optimization problems ( 2 2 -IC/P) and ( 2 2 -MIC/P).
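The discrete difference operators described above can be sketched as follows. The zero last row of the difference matrix and the pixel-stacking order behind the Kronecker products are assumptions here and may differ from the exact conventions of [26]; the sketch only checks the defining property that all difference operators annihilate constant images.

```python
import numpy as np

def forward_diff(k):
    # k x k forward difference matrix: (D x)_j = x_{j+1} - x_j for j < k,
    # with a zero last row (one common boundary convention; [26] may differ).
    D = np.zeros((k, k))
    idx = np.arange(k - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0
    return D

k = 4
D = forward_diff(k)
I = np.eye(k)
Dx = np.kron(I, D)   # differences along one image axis (stacking order assumed)
Dy = np.kron(D, I)   # differences along the other axis
img = np.full(k * k, 3.0)                  # constant image, vectorized
print(np.allclose(Dx @ img, 0.0) and np.allclose(Dy @ img, 0.0))  # True
# -D.T @ D is the usual approximation of the one-dimensional second derivative.
```

Stacking second-order operators on top of first-order ones in this way is what produces the composed linear operators K_i, M_i and L_i of Problem 1.1 in the imaging application.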
In order to compare the performance of our two primal-dual iterative schemes with algorithms relying on (augmented) Lagrangian and smoothing techniques, we used the definition of the infimal convolution to formulate (5.3) and (5.4) as optimization problems with constraints of the form and ( 2 2 -MIC/P) inf_{x,y_1,y_2,z} (1/2)‖x − b‖² + α_1 ‖y_1‖_{1,ω_1} + α_2 ‖z‖_{1,ω_2}, subject to ( D_1 −Id ; 0 L_1 ) (x, y_2)ᵀ = (y_1, z)ᵀ, (5.6) respectively. We performed our numerical tests on the colored test image lichtenstein (see Figure 5.1) of size 256 × 256, with each color channel ranging in the closed interval from 0 to 1. By adding white Gaussian noise with standard deviation 0.08, we obtained the noisy image b ∈ R^n. We took ω_1 = (1, 1) and ω_2 = (1, 1), the regularization parameters in ( 2 2 -IC/P) and ( 2 2 -MIC/P) were set to α_1 = 0.06 and α_2 = 0.2, and the tests were run on an Intel Core i7-3770 processor.
When measuring the quality of the restored images, we used the improvement in signal-to-noise ratio (ISNR), which is given by where x, b, and x_k are the original image, the observed noisy image and the reconstructed image at iteration k ∈ N, respectively. In Figure 5.2 we compare the following methods. The double smoothing (DS) algorithm, as proposed in [12], is applied to the Fenchel dual problems of (5.5) and (5.6), together with the acceleration strategies in [11]. One should notice that, since the smoothing parameters are constant, (DS) solves continuously differentiable approximations of (5.5) and (5.6) and therefore does not necessarily converge to the unique minimizers of (5.3) and (5.4). As a second smoothing algorithm, we considered the variable smoothing technique (VS) in [8], which successively reduces the smoothing parameter in each iteration and therefore solves the primal optimization problems as the iteration counter increases. We further considered the primal-dual hybrid gradient method (PDHG) as discussed in [26], which is nothing else than the primal-dual method in [15]. Finally, the alternating direction method of multipliers (ADMM) was applied to (5.5), as was also done in [26]. Here, one makes use of the Moore-Penrose inverse of a special linear bounded operator, which can be implemented efficiently in this setting since D_1^T D_1 and D_2^T D_2 can be diagonalized by the discrete cosine transform. The problem arising in (5.6), however, is far more difficult to solve with this method (and was therefore not implemented), since the linear bounded operator that has to be inverted has a more complicated structure. This reveals a typical drawback of ADMM, namely that this method does not provide a full splitting, as primal-dual or smoothing algorithms do.
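For reference, the ISNR measure used above can be computed as follows; the formula ISNR_k = 10 log₁₀(‖x − b‖² / ‖x − x_k‖²) is the standard definition and is assumed to match the paper's normalization.

```python
import numpy as np

def isnr(x, b, xk):
    # Improvement in signal-to-noise ratio (in dB) of reconstruction xk
    # over the noisy observation b, with respect to the original image x.
    return 10.0 * np.log10(np.sum((x - b) ** 2) / np.sum((x - xk) ** 2))

x = np.array([1.0, 2.0, 3.0])
b = x + 0.1                       # "noisy" observation
print(isnr(x, b, b))              # 0.0: no improvement when xk = b
print(isnr(x, b, x + 0.01) > 0)   # True: closer reconstructions give ISNR > 0
```

A positive ISNR means the reconstruction is closer to the original than the noisy input; in the experiments, the curves in Figure 5.2 plot this quantity against computation time.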
As the comparisons shown in Figure 5.2 demonstrate, the FBF method suffers from its additional forward step. However, many time-intensive steps in this algorithm could be executed in parallel, which would lead to a significant decrease of the execution time. On the other hand, the FB method performs fast and stably in both examples, while optical differences between the reconstructions for ( 2 2 -IC/P) and ( 2 2 -MIC/P) are not observable.