ACCELERATED BREGMAN OPERATOR SPLITTING WITH BACKTRACKING

This paper develops two accelerated Bregman Operator Splitting (BOS) algorithms with backtracking for solving regularized large-scale linear inverse problems, where the regularization term may not be smooth. The first algorithm improves the rate of convergence of BOSVS [5], in terms of the smooth component of the objective function, by incorporating Nesterov's multi-step acceleration scheme under the assumption that the feasible set is bounded. The second algorithm can handle the case where the feasible set is unbounded. Moreover, it allows a more aggressive stepsize than the first scheme by properly selecting the penalty parameter and jointly updating the acceleration parameter and stepsize. Both algorithms exhibit better practical performance than BOSVS and AADMM [21], while preserving the same accelerated rate of convergence as AADMM. Numerical results on total-variation based image reconstruction problems demonstrate the effectiveness of the proposed algorithms.

1. Introduction. The main purpose of this paper is to develop accelerated Bregman Operator Splitting algorithms with backtracking for solving the following convex composite optimization problem:

(1.1)  \min_{u \in U} \; \tfrac{1}{2}\|Au - f\|^2 + \psi(Bu),

where U ⊆ C^n is a closed convex set, A ∈ C^{m×n} is a matrix, f ∈ C^m is a vector, ψ : C^{d×n} → ℝ is a proper convex and Lipschitz continuous function, B : C^n → C^{d×n} is a linear operator, and ‖·‖ is the 2-norm in Euclidean space.
The problem (1.1) has attracted much attention in recent years because of its applications in various disciplines, such as signal and image processing [1,2,4,22,27,28], compressive sensing [7,12,18], and machine learning [3,17,26]. In those fields, the first term in (1.1) represents the data fidelity and the second term is a regularization term. For instance, total variation based image reconstruction is often modeled by (1.1), where u = (u_1, ..., u_n)^T is the n-vector form of the image u, the matrix A depends on the physics of the data acquisition, f is the measurement, and

(1.2)  \psi(Bu) = \alpha \sum_{i=1}^{n} \|(\nabla u)_i\|

is the discrete form of the total variation of u. In (1.2), (∇u)_i is a discrete gradient (finite differences along the coordinate directions) of u at the i-th voxel, and α is a parameter that balances the fidelity term and the regularization term.
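As a concrete illustration of (1.2), the following NumPy sketch evaluates the isotropic discrete TV of a 2-D image; the boundary handling (replication of the last row/column, so the border differences vanish) is a common convention we assume here, not one fixed by the paper:

```python
import numpy as np

def total_variation(u, alpha=1.0):
    """Isotropic discrete TV, alpha * sum_i ||(grad u)_i||, of a 2-D image,
    using forward differences with replication at the border (an assumed
    convention; the paper does not specify the boundary handling here)."""
    dx = np.diff(u, axis=1, append=u[:, -1:])  # horizontal forward differences
    dy = np.diff(u, axis=0, append=u[-1:, :])  # vertical forward differences
    return alpha * np.sqrt(dx ** 2 + dy ** 2).sum()

u = np.zeros((4, 4))
u[1:3, 1:3] = 1.0                 # a small bright square on a dark background
tv = total_variation(u)           # equals 6 + sqrt(2) for this image
```

Sharp edges contribute to the TV in proportion to their length, which is why TV regularization favors piecewise-constant reconstructions.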
There have been a large number of algorithms developed for solving problem (1.1) (see, e.g., [5,13,12,16,23,25,27,29,30,31,32]). Among them, the alternating direction method of multipliers (ADMM) [8,9] is one of the most extensively used techniques. By introducing an auxiliary variable w ∈ C^{d×n}, the problem (1.1) is equivalent to the following constrained optimization problem:

(1.3)  \min_{u \in U, \, w} \; \tfrac{1}{2}\|Au - f\|^2 + \psi(w) \quad \text{s.t.} \quad w = Bu,

whose augmented Lagrangian is

(1.4)  \mathcal{L}_\rho(u, w, \lambda) = \tfrac{1}{2}\|Au - f\|^2 + \psi(w) - \langle \lambda, w - Bu \rangle + \tfrac{\rho}{2}\|w - Bu\|^2.

The ADMM minimizes (1.4) by iteratively updating (u, w, λ) as follows:

(1.5)  u_{k+1} = \arg\min_{u \in U} \tfrac{1}{2}\|Au - f\|^2 - \langle \lambda_k, w_k - Bu \rangle + \tfrac{\rho}{2}\|w_k - Bu\|^2,
       w_{k+1} = \arg\min_{w} \psi(w) - \langle \lambda_k, w - Bu_{k+1} \rangle + \tfrac{\rho}{2}\|w - Bu_{k+1}\|^2,
       \lambda_{k+1} = \lambda_k - \rho\,(w_{k+1} - Bu_{k+1}).

In many applications, the dimension of the matrix A can be very large, and A may also be dense and ill-conditioned. In those cases, the high computational cost of solving the u-subproblem hinders the applicability of ADMM. Recently, there has been active research on improving the theoretical and practical performance of ADMM. One line of effort has focused on developing backtracking strategies to search for a larger stepsize with guaranteed convergence [4,5,13]. The other is on improving the iteration complexity of ADMM by incorporating Nesterov's multi-step acceleration schemes [6,10,11,15,19,20,21]. To gain more insight into these two lines of work, we discuss a few closely related works below.
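To make the splitting (1.3)–(1.5) concrete, here is a minimal NumPy sketch for a one-dimensional analogue with ψ(w) = α‖w‖₁ and B a forward-difference operator; the ℓ₁ norm stands in for the TV term so that the w-step is a closed-form soft-threshold. `admm_tv1d`, `soft`, and all parameter values are illustrative, not the paper's implementation:

```python
import numpy as np

def soft(x, t):
    """Proximal map of t*||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def admm_tv1d(A, f, alpha, rho, iters=200):
    """ADMM as in (1.5) for min_u 0.5||Au-f||^2 + alpha*||Bu||_1, with B the
    1-D forward-difference matrix. An illustrative sketch of the splitting,
    not the paper's scheme (anisotropic 1-norm, exact u-solve)."""
    m, n = A.shape
    B = np.diff(np.eye(n), axis=0)          # (n-1) x n difference operator
    u, w, lam = np.zeros(n), np.zeros(n - 1), np.zeros(n - 1)
    Atf = A.T @ f
    M = A.T @ A + rho * B.T @ B             # normal matrix of the u-subproblem
    for _ in range(iters):
        # u-step: exact minimization of the augmented Lagrangian in u
        u = np.linalg.solve(M, Atf + B.T @ (rho * w - lam))
        # w-step: prox of (alpha/rho)*||.||_1 at Bu + lam/rho
        w = soft(B @ u + lam / rho, alpha / rho)
        # dual update: lam <- lam - rho*(w - Bu)
        lam = lam + rho * (B @ u - w)
    return u
```

The expensive step is exactly the u-update (a linear solve with A^T A), which is the cost that the linearized/BOS variants discussed next are designed to avoid.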
In [29,30], the u-subproblem of (1.5) is solved by the Bregman operator splitting (BOS) technique, i.e., linearized ADMM. It updates u_{k+1} of (1.5) by linearizing \tfrac{1}{2}\|Au - f\|^2 at u_k and adding a proximal term, i.e.,

(1.6)  u_{k+1} = \arg\min_{u \in U} \langle A^T(Au_k - f), u \rangle + \tfrac{\eta}{2}\|u - u_k\|^2 - \langle \lambda_k, w_k - Bu \rangle + \tfrac{\rho}{2}\|w_k - Bu\|^2.

It has been shown in [30] that the BOS algorithm converges to an optimal solution of (1.1) with the constant stepsize 1/η ≤ 1/‖A^T A‖. In [4], this stepsize is replaced by the Barzilai-Borwein (BB) stepsize, i.e., η = η_k^{BB}, where

\eta_k^{BB} = \frac{\|A(u_k - u_{k-1})\|^2}{\|u_k - u_{k-1}\|^2}.

Their experimental results showed that the objective function value decreases much faster by taking η = η_k^{BB} than η = ‖A^T A‖. However, the BB stepsize 1/η_k^{BB} ≥ 1/‖A^T A‖ violates the convergence condition of the BOS algorithm; thus, convergence is not guaranteed. An improvement was made in [5], where a scheme of BOS with variable stepsize (BOSVS) was developed to improve the performance of ADMM for solving (1.1) with guaranteed convergence. In that work, a line search strategy was presented for finding a better stepsize. The stepsize rule starts with a safeguarded BB stepsize and gradually increases the nominal stepsize η_k until a termination condition is satisfied. With a good choice of parameters in the line search conditions, a more aggressive stepsize is allowed, especially at the early iterates. The global convergence of the BOSVS iterates was established in [5]. Later, it was shown in [14] that the objective function at the average of the BOSVS iterates converges to the optimal value at the rate O(1/k), where k is the number of iterations.
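The safeguarded BB choice is easy to state in code. A sketch, assuming the BB formula η_k^{BB} = ‖A(u_k − u_{k−1})‖²/‖u_k − u_{k−1}‖² floored at a safeguard η_min; `bb_stepsize` is a hypothetical helper, not code from [4] or [5]:

```python
import numpy as np

def bb_stepsize(A, u_k, u_prev, eta_min):
    """Safeguarded Barzilai-Borwein choice eta = max(eta_min, eta_BB), with
    eta_BB = ||A(u_k - u_prev)||^2 / ||u_k - u_prev||^2 (an assumed form)."""
    s = u_k - u_prev
    ss = float(s @ s)
    if ss == 0.0:                 # no progress: fall back to the safeguard
        return eta_min
    eta_bb = float(np.linalg.norm(A @ s) ** 2) / ss
    return max(eta_min, eta_bb)
```

Since the quotient is a Rayleigh quotient of A^T A, η_k^{BB} never exceeds ‖A^T A‖, which is exactly why the proximal stepsize 1/η_k^{BB} can overshoot the safe threshold 1/‖A^T A‖.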
To improve the iteration complexity of ADMM, an accelerated ADMM (AADMM) was developed in [21] by incorporating Nesterov's fast gradient scheme [20]. AADMM solves a class of convex composite optimization problems with linear equality constraints, which includes the problem (1.3) as a special case. AADMM improves the rate of convergence of ADMM (or BOS), in terms of the smooth component of the objective function, from O(1/k) to O(1/k^2). The accelerated rate of convergence is achieved by the aggregated iterates rather than the average of the iterates as in [14]. To improve the practical performance of AADMM, a simple backtracking strategy is incorporated in the algorithm. The idea of the backtracking technique is to search for an underestimated Lipschitz constant L_k at iteration k, in view of the dependence of the stepsize on L_k. The backtracking procedure in [21] starts with a relatively small L_k and properly selects the involved parameters to solve for u_{k+1}. If the employed line search condition is violated, then L_k is doubled and used as the new L_k to solve for u_{k+1}. This procedure is repeated until the line search condition holds.
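The doubling procedure just described can be sketched generically. Here `grad_step` and `line_search_ok` are placeholder callbacks standing in for the u-update and the line search condition of [21]; the quadratic example below is illustrative only:

```python
def backtrack_lipschitz(grad_step, line_search_ok, L0):
    """Doubling backtracking on an underestimated Lipschitz constant:
    start from a small L, double until the line search condition holds.
    `grad_step(L)` returns a trial point; `line_search_ok(x, L)` tests it."""
    L = L0
    x = grad_step(L)
    while not line_search_ok(x, L):
        L *= 2.0
        x = grad_step(L)
    return x, L

# Illustrative use on f(x) = 0.5*c*x^2, whose true Lipschitz constant is c.
c = 10.0
f = lambda x: 0.5 * c * x * x
fprime = lambda x: c * x
x0 = 1.0
step = lambda L: x0 - fprime(x0) / L            # gradient step with stepsize 1/L
ok = lambda x, L: f(x) <= f(x0) + fprime(x0) * (x - x0) + 0.5 * L * (x - x0) ** 2
x1, L_final = backtrack_lipschitz(step, ok, 1.0)  # doubles 1 -> 2 -> 4 -> 8 -> 16
```

For this quadratic the descent condition holds exactly when L ≥ c, so starting from L0 = 1 the loop stops at the first power of two above c.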
Motivated by the aforementioned work, we propose two accelerated Bregman Operator Splitting (BOS) schemes with backtracking for solving the problem (1.1). The first proposed algorithm improves the convergence rate of BOSVS, in terms of the smooth component of the objective function, by incorporating Nesterov's multi-step acceleration scheme [20] under the assumption that the feasible set is bounded. The second one can deal with the situation where the feasible set is unbounded. By jointly computing the acceleration parameter and stepsize, the monotonicity condition on the nominal stepsize required in BOSVS [5] and in the first proposed algorithm can be removed in the second algorithm. Combined with good choices of the penalty parameters ρ for updating u, w, and λ in (1.5) (possibly different ρ's in the u-, w-, and λ-subproblems), a more aggressive stepsize is allowed in the second proposed algorithm. Instead of searching for the local Lipschitz constant to satisfy the conservative line search condition in AADMM [21], the proposed algorithms use the product of the acceleration parameter and a safeguarded Barzilai-Borwein (BB) choice as the initial stepsize, then gradually increase it until a line search condition more relaxed than that in [21] is satisfied. The proposed algorithms are thus capable of finding a more aggressive stepsize with fewer line searches. Meanwhile, they preserve the same accelerated rate of convergence as AADMM.
In [21], a centerpiece of the theoretical analysis is the convergence rate of AADMM in terms of its dependence on the Lipschitz constant of the smooth objective. A simple backtracking scheme is proposed in [21] to demonstrate that prior knowledge of the Lipschitz constant is not a definitive requirement. In this work, most of the convergence analysis is devoted to maintaining the accelerated convergence results under an aggressive stepsize strategy, which was not touched upon in either [5] or [21]. The problem (1.1) is a special case of the problem of interest in [21], which is called the unconstrained composite optimization (UCO) problem there. The backtracking technique in [21] is designed to solve UCO with bounded primal and dual feasible sets. In this manuscript, the proposed ABOSVS-II algorithm is able to solve (1.1) with an unbounded primal feasible set without any feasibility residue, which was not discussed in [21]. The technical details for studying (1.1) with an unbounded primal feasible set are nontrivial compared with the analysis in [21]. Moreover, in this work we use a termination criterion different from the one for AADMM in [21] to obtain the same accelerated rate of convergence. In [21], the convergence analysis is based on the estimation of the duality gap function for the corresponding saddle point problem. In this work, the termination criterion is based on the error between the objective function value at the aggregated iterates and the optimal value. Consequently, we are able to give a different proof of the accelerated rate of convergence when the feasible set is unbounded, by observing the relationship between the Lipschitz continuity of the function ψ in (1.1) and the boundedness of its subgradients. We believe that the strategy in the proof for the case of unbounded feasible sets provides a relatively simpler alternative to the proof of the accelerated convergence results in [21]. Our experimental results show that the proposed algorithms outperform several state-of-the-art algorithms on total variation based image reconstruction problems.

Outline of the paper
Our paper is organized as follows. Section 2 presents the proposed algorithms, namely Accelerated BOSVS-I (ABOSVS-I) and Accelerated BOSVS-II (ABOSVS-II), for solving problems of the type (1.1). Section 3 presents the convergence analysis for the proposed algorithms. Section 4 is devoted to numerical experiments and comparisons with state-of-the-art algorithms on total-variation based image reconstruction problems. The last section concludes the paper.

Notation and terminology
The Euclidean inner product of two column vectors x, y ∈ C^n is denoted by ⟨x, y⟩ = x^T y, where the superscript T denotes the conjugate transpose. Assume u^* is an optimal solution of (1.1) and define w^* := Bu^*.

2. Proposed algorithms. In this section, we present the frameworks of ABOSVS-I and ABOSVS-II. The first algorithm incorporates Nesterov's multi-step acceleration scheme to improve the rate of convergence of BOSVS under the assumption that the feasible set is bounded, while the second is capable of dealing with the case where the feasible set is unbounded. In the second algorithm, the acceleration parameter and the stepsize are updated jointly, and the penalty parameters ρ in (1.5) are chosen differently for updating u, w, and λ. For convenience, we use ρ_i^u, ρ_i^w, and ρ_i^λ to replace ρ in those three subproblems, respectively. For ABOSVS-I, under the assumption of a bounded feasible set, ρ can be chosen to be the same constant in (1.5) without affecting the convergence rate, i.e.,

(2.1)  \rho_i^u = \rho_i^w = \rho_i^\lambda \equiv \rho.

In the proposed algorithms, the initial choice of the nominal stepsize η_{0,i} is a safeguarded BB choice:

(2.2)  \eta_{0,i} = \max\Big( \eta_{\min},\; \frac{\|A(u_i - u_{i-1})\|^2}{\|u_i - u_{i-1}\|^2} \Big).

Now we present the scheme of ABOSVS-I in Algorithm 1 under the assumption that the feasible set U is bounded.
From ABOSVS-I, we can see that if α_i ≡ 1, then u_i^{md} = u_i, and the aggregate points u_{i+1}^{ag}, w_{i+1}^{ag}, λ_{i+1}^{ag} are exactly the iterates u_{i+1}, w_{i+1}, and λ_{i+1}, respectively. In this case, ABOSVS-I becomes BOSVS with a minor modification. For ABOSVS-I, the derivation of the accelerated convergence rate relies on the asymptotic monotonicity of η_i/α_i for i = 1, 2, .... Whenever η_i/α_i is not monotonically decreasing, η_min is increased by a factor τ > 1 in step 3. Hence, if the monotonicity of η_i/α_i is violated repeatedly, then η_i/α_i will approach a constant, usually smaller than ‖A^T A‖, after a finite number of iterations. Moreover, it should be noted that β_i and the condition on Q_i can be chosen jointly so that ABOSVS-II can deal with the cases where the feasible set is either bounded or unbounded. When the feasible set is unbounded, the penalty parameters ρ_i^u, ρ_i^w, and ρ_i^λ have to be chosen differently from those for a bounded feasible set. More precisely, if the feasible set U is unbounded, we choose the parameters as in (2.4), where K is the total number of iterations. If the feasible set U is bounded, we choose (2.5) in ABOSVS-II by jointly updating the acceleration parameter and the stepsize. The scheme of ABOSVS-II is presented in Algorithm 2.
Two remarks are in order. First, step 3 of Algorithm 1 is no longer required in Algorithm 2. Second, step 2 of Algorithm 2 can actually be rewritten in the form (2.6), which clearly also satisfies the condition imposed in (2.3). Additionally, in (2.6), other choices of the involved parameters are possible.

Inverse Problems and Imaging, Volume 11, No. 6 (2017), 1047-1070

Algorithm 2 Accelerated BOSVS-II (ABOSVS-II)
3. Convergence analysis. In this section, we focus on proving the convergence properties of the proposed algorithms. We start with two lemmas describing properties of the line search scheme in Algorithms 1 and 2. Lemma 3.1 below shows that the safeguard stepsize threshold η_min in step 3 of ABOSVS-I stops increasing after a finite number of iterations.
Lemma 3.1. The replacement of η_min by τ η_min in step 3 of ABOSVS-I can occur in at most a finite number of iterations, denoted by N_0.
Lemma 3.2. In steps 2 and 3 of Algorithm 1 and step 2 of Algorithm 2, the number of line searches, denoted by l, satisfies l ≤ ⌈log_σ(‖A^T A‖ / η_min)⌉, where ⌈x⌉ is the smallest integer greater than or equal to x for any x ∈ ℝ.
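The counting argument behind Lemma 3.2 can be checked numerically: each round multiplies the nominal stepsize by σ, and the search must terminate once the stepsize reaches ‖A^T A‖. In the small sketch below, the explicit test η ≥ ‖A^T A‖ is used as a sufficient stand-in for the actual stopping condition of the algorithms:

```python
import math

def max_line_searches(norm_AtA, eta_min, sigma):
    """Upper bound of Lemma 3.2: l <= ceil(log_sigma(||A^T A|| / eta_min))."""
    return math.ceil(math.log(norm_AtA / eta_min, sigma))

def count_line_searches(eta0, norm_AtA, sigma):
    """Count multiplications by sigma until the nominal stepsize reaches
    ||A^T A|| (a sufficient, simplified stand-in for the true stopping test)."""
    l, eta = 0, eta0
    while eta < norm_AtA:
        eta *= sigma
        l += 1
    return l
```

Since the initial stepsize η_{0,i} is at least η_min, the actual count can only be smaller than the bound computed from η_min.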
Proof. Since the proof for Algorithm 1 can be obtained by a similar argument, we only give the proof for Algorithm 2. By step 2 of Algorithm 2 (see also the remark on (2.6)) and the definition of Γ_j in Algorithm 2, the line search stops when the termination condition holds. After rearranging terms, this relation can be reformulated into an equivalent form. Noting the two negative terms on the right-hand side of the line search stopping criterion, we observe that it is satisfied when η_i ≥ ‖A^T A‖. By the definition of η_i = σ^l η_{0,i}, such a condition is satisfied after l rounds of line search, as long as σ^l η_min ≥ ‖A^T A‖, which yields the claimed bound on l.

The following lemma plays a primary role in the convergence analysis of the proposed algorithms. Since ρ_i^w = ρ_i^u for both proposed algorithms, in the following proof ρ_i^w is replaced by ρ_i^u for the purpose of a unified analysis. Throughout this section, we use the notations u_i^e = u_i − u and w_i^e = w_i − w for i ≥ 1. For convenience, we denote \tfrac{1}{2}\|Au - f\|^2 by H(u).
Lemma 3.3. For all u ∈ U and all w ∈ C^{d×n}, the iterates {(u_i^{ag}, w_i^{ag})}_{i≥1} generated by Algorithms 1 and 2 satisfy (3.1).

Proof. Since H is differentiable, applying the definition of H(u) and u_{i+1}^{ag}, observing the relationship (3.3), and using the convexity of H(u), we obtain (3.4). By (3.4) and the convexity of ψ, we can calculate the following difference,
where s_{i+1} ∈ ∂ψ(w_{i+1}). On the other hand, by the first-order optimality conditions for the sequence (u_{i+1}, w_{i+1}, λ_{i+1}) generated by Algorithms 1 and 2, for all u ∈ U and w ∈ C^{d×n}, we have (3.6). By (3.6) and the definitions of u_{i+1}^e and w_{i+1}^e, this can be rewritten as (3.7). Substituting (3.7) into (3.5), we obtain (3.8).
To further estimate (3.8), we next focus on bounding the terms (I)-(IV).
The following lemma presents an important property of the output (u_{k+1}^{ag}, w_{k+1}^{ag}) generated by Algorithm 1.
Lemma 3.4. Suppose that the parameters ρ_i^u, ρ_i^w, and ρ_i^λ in Algorithm 1 satisfy (2.1). Then the output (u_{k+1}^{ag}, w_{k+1}^{ag}) generated by Algorithm 1 satisfies (3.9).

Proof. Dividing both sides of (3.1) by α_i^2 and using (2.1) and step 1 in Algorithm 1, we obtain (3.10). Setting u = u^* and w = w^*, by the relationship (1 − α_{i+1})/α_{i+1}^2 = 1/α_i^2 and α_1 = 1, we obtain (3.9) after summing (3.11) from i = 1 to k.

Now we are ready to prove the accelerated convergence rate of the ABOSVS-I algorithm when the feasible sets U and V := dom ψ^* are compact, where ψ^* is the convex conjugate of ψ(·).
Proof. If η_min ≥ ‖A^T A‖, then by the definition of (2.2), η_{0,i} ≡ η_min, which yields Γ_i > 0. Thus, no line search is needed under the backtracking strategy in steps 1 and 2. We exclude this case by setting η_{0,1} < ‖A^T A‖, which can easily be satisfied.
Next we show several important properties of the sequence {α_i} for i ≥ 1. The update of α_{i+1} in step 4 of Algorithm 1 is actually equivalent to solving α_{i+1} from (3.13), based on which we have (3.14). Summing (3.13) and (3.14) from i = 1 to k and using α_1 = 1, we can easily derive (3.15). In the following, we estimate the terms on the right-hand side of (3.9). In view of Lemma 3.1, there is N_0 such that η_{i+1}/α_{i+1} ≤ η_i/α_i for any i ≥ N_0 + 1. Then the first term on the right-hand side of (3.9) can be estimated by (3.16), with the constant defined in (3.17). By the second inequality of (3.15), α_1 = 1, and (3.17), we obtain (3.18). Also, we have (3.19). By (3.16), (3.18), and (3.19), we obtain (3.20), where C_κ is a finite nonnegative number. To obtain (3.12), we also need the following estimation (3.21), where the first equality is obtained by summing the sequence {w_{i+1}^{ag} − Bu_{i+1}^{ag}} for i ≥ 1 and using the same technique as in Lemma 3.4, and the last inequality is derived by the same process as in (3.19).
By the convexity of ψ(·), for all λ ∈ V, we have (3.22).

Next we analyze the accelerated convergence rate of the ABOSVS-II algorithm. First, we establish an important lemma, similar to Lemma 3.4, before giving the accelerated convergence rate of ABOSVS-II.

Lemma 3.6. Suppose that the parameters ρ_i^u, ρ_i^w, and ρ_i^λ in Algorithm 2 satisfy (2.4) or (2.5). Then the output (u_{k+1}^{ag}, w_{k+1}^{ag}) generated by Algorithm 2 satisfies (3.23).

Proof. Dividing both sides of (3.1) by α_k η_k, we obtain (3.24). Setting u = u^* and w = w^*, by the relationship 1/(α_i η_i) = (1 − α_{i+1})/(α_{i+1} η_{i+1}) and α_1 = 1, we obtain (3.23) after summing (3.24) from i = 1 to k.
Now we analyze the accelerated convergence rate of the ABOSVS-II algorithm when the feasible set U is unbounded. In this case, the total number of iterations has to be fixed in advance and chosen based on a worst-case complexity analysis. Additionally, since ψ is Lipschitz continuous with constant ∆, for all w ∈ C^{d×n} we have ‖ξ‖ ≤ ∆ for all ξ ∈ ∂ψ(w).
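The implication from Lipschitz continuity to bounded subgradients used here is standard; for completeness, a short derivation:

```latex
% Let Delta be the Lipschitz constant of psi and xi in the subdifferential at w.
% The subgradient inequality evaluated at w' = w + xi gives
\|\xi\|^2 \;=\; \langle \xi,\; w' - w \rangle
        \;\le\; \psi(w') - \psi(w)
        \;\le\; \Delta\,\|w' - w\| \;=\; \Delta\,\|\xi\|,
% and dividing by ||xi|| (when nonzero) yields ||xi|| <= Delta.
```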

Theorem 3.7. Suppose that the parameters ρ_i^u, ρ_i^w, and ρ_i^λ in Algorithm 2 satisfy (2.4). Then the output (u_{k+1}^{ag}, w_{k+1}^{ag}) generated by Algorithm 2 satisfies (3.25) and (3.26).

Proof. Since η_min ≤ η_{0,1}, we have η_min ≤ ‖A^T A‖. Then, by the definition of η_{0,i}, we get (3.27). Denote σ^l η_{0,i} by η_i, where l is the number of line searches in step 3 of ABOSVS-II. By 1/(α_i η_i) = (1 − α_{i+1})/(α_{i+1} η_{i+1}) and the definition of η_i, we have (3.28). Then, by induction, we get (3.29), where we used (3.27) and the definition of η_i. This estimate is crucial for obtaining the accelerated rate of convergence. Also, from (3.30), by induction again we obtain (3.31). Next we estimate the terms on the right-hand side of (3.23). By the definition of ρ_i^u and ρ_i^λ, together with (3.31), it is clear that the first terms are bounded. Also, using the definition of ρ_i^u and ρ_i^λ and the non-increasing property of the involved sequences, the remaining terms can be bounded in terms of 2η_min.
Next we focus on estimating ‖w_{k+1}^{ag} − Bu_{k+1}^{ag}‖ in order to obtain (3.26). Similarly to (3.21), for all λ ∈ C^{d×n}, we can obtain (3.37). Thus, by setting k = K − 1, we obtain (3.39). It remains to bound the term ‖λ_{k+1}‖. By the optimality condition of the w-subproblem in step 3 of Algorithm 2, we have (3.40). By the convexity of ψ(·), we have (3.41), where ξ ∈ ∂ψ(Bu_K^{ag}). Then, by (3.36), (3.40), and (3.41), we obtain (3.26).

The following theorem gives the accelerated convergence rate of the ABOSVS-II algorithm when the feasible set U is compact. Since its proof is similar to that of Theorem 3.7, we present the convergence rate result without proof.

Theorem 3.8. Suppose that the parameters ρ_i^u, ρ_i^w, and ρ_i^λ in Algorithm 2 satisfy (2.5). Then the output (u_{k+1}^{ag}, w_{k+1}^{ag}) generated by Algorithm 2 satisfies a bound of the same form as (3.25), with η_min(k + 1) appearing in the denominator.
4. Numerical results. In this section, we conduct several experiments on synthetic data and on data from partially parallel imaging (PPI) to examine the performance of the proposed algorithms. We also compare them with several state-of-the-art algorithms. All the algorithms are implemented in MATLAB R2015a on a computer with a 2.6 GHz Intel i5 processor.
4.1. Total-variation based image reconstruction. In this subsection, we present numerical results on solving the following TV based image reconstruction problem (4.1), where U := {u ∈ ℝ^n : l_* ≤ u^{(i)} ≤ u^* for all i = 1, ..., n} or ℝ^n, and the entries of A ∈ ℝ^{m×n} are randomly generated from a normal distribution N(0, 1) or from a Bernoulli distribution taking the values 1 and −1 with equal probability, respectively. The measurements f are generated by f = Au_{true} + ε, where u_{true} is the true image and ε is Gaussian noise with distribution N(0, σ). We apply ABOSVS-I, ABOSVS-II, BOSVS, TVAL3, and ALADMML to solve (4.1). We consider two instances of this problem. In the first instance, we set U := {u ∈ ℝ^n : 0 ≤ u^{(i)} ≤ 1 for all i = 1, ..., n}, and the dimension of A is 22500 × 22500. The objective function values and relative errors are presented in Figures 1 and 2, respectively. In the first instance, the proposed algorithms perform much better than BOSVS, especially ABOSVS-II; it is also evident that ABOSVS-I and ABOSVS-II outperform TVAL3 and ALADMML in this case. In the second instance, ABOSVS-II and ALADMML achieve better performance than BOSVS, and ABOSVS-II outperforms TVAL3 and ALADMML.
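The data generation just described can be sketched as follows; the dimensions and seed are illustrative, much smaller than the 22500-column matrices used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, sigma = 120, 225, 1e-2          # illustrative sizes, not the paper's

# Sensing matrices: standard normal entries, or Bernoulli +-1 with equal probability
A_gauss = rng.standard_normal((m, n))
A_bern = rng.choice([-1.0, 1.0], size=(m, n))

u_true = rng.random(n)                # stand-in for the phantom/brain image
eps = sigma * rng.standard_normal(m)  # Gaussian noise with std. deviation sigma
f = A_gauss @ u_true + eps            # noisy measurements f = A u_true + eps
```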
where F_p is the undersampled Fourier transform defined by F_p := PF, F is the Fourier transform, P is a binary matrix representing the undersampling pattern, s_l is the sensitivity map for the l-th channel, and f_l is the measurement. The symbol ⊙ denotes the Hadamard product between two vectors. For notational simplicity, let S_l := diag(s_l) ∈ C^{n×n} be the diagonal matrix with s_l ∈ C^n on the diagonal, l = 1, 2, ..., L. Then the above optimization problem can be rewritten in a form that can clearly be solved by the proposed algorithm. Figures 3 and 6 show the sensitivity maps of data1 and data2, respectively.
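A minimal sketch of the per-channel PPI forward model F_p(s_l ⊙ u) = P F S_l u, assuming a 2-D FFT for F and an elementwise 0/1 mask for P; `ppi_forward` is an illustrative helper, not the paper's code:

```python
import numpy as np

def ppi_forward(u, sens, mask):
    """Per-channel PPI forward model: sensitivity weighting (Hadamard
    product), 2-D FFT, then undersampling by a 0/1 mask.
    `sens` has shape (L, N, N); returns an array of shape (L, N, N)."""
    return np.stack([mask * np.fft.fft2(s * u) for s in sens])
```

With an all-ones mask and unit sensitivities, the model reduces to a plain FFT per channel, which is a convenient sanity check.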
For data1 and data2, the performance of BOSVS, TVAL3, ALADMML, and ABOSVS-II is shown in Tables 3 and 4 in terms of the objective function value, relative error, and CPU time in seconds. From Tables 3 and 4, we can see that ABOSVS-II and ALADMML converge much faster than BOSVS, with the performance of TVAL3 lying between ALADMML and BOSVS; ABOSVS-II has the best performance among them. The reconstructed images by these algorithms and the differences between the reconstructed images and the true images are shown in Figures 5 and 8 for comparison. Clearly, the images recovered by ABOSVS-II, ALADMML, and TVAL3 have much sharper resolution than that recovered by BOSVS, and the image recovered by ABOSVS-II is slightly better than that by ALADMML and better than that by TVAL3.

Figure 1. The objective function values and relative errors vs. CPU time for the first instance

Figure 3. Sensitivity maps of data1
Figure 2. The objective function values and relative errors vs. CPU time for the second instance

Figures 4 and 7 show the true images and sampling patterns corresponding to data1 and data2, respectively. The Cartesian mask only samples 18% of the Fourier coefficients. In this experiment, the measurements {f_l} are generated by

f_l = P( F S_l u_{true} + \varepsilon_l^{re}/\sqrt{2} + i\,\varepsilon_l^{im}/\sqrt{2} ), for l = 1, ..., L,

where ε_l^{re} and ε_l^{im} are noise with entries independently generated from the distribution N(0, 10^{−4}√n I_n). The size of data1 is n = 256 × 256 and the size of data2 is n = 512 × 512. The parameter settings for ABOSVS-II are given in Table 2.

Figure 8. Comparison of 400 iterations of BOSVS, TVAL3, ALADMML, and ABOSVS-II in partially parallel image reconstruction (top) and their differences with the true image (bottom) using data2. From left to right: BOSVS, TVAL3, ALADMML, and ABOSVS-II.

Table 1. Parameter settings for ABOSVS-I and ABOSVS-II in Section 4.1. Note that ρ and τ are only used in the first instance.

In the first instance, u_{true} is a 150 × 150 Shepp-Logan phantom [24] generated by MATLAB with intensities in [0, 1]. Moreover, we set the standard deviation σ = 10^{−2}. We run 100 iterations to compare the performance of ABOSVS-I, ABOSVS-II, BOSVS, TVAL3, and ALADMML. In the second instance, we have U := ℝ^n and the dimension of A is 11250 × 22500. Here u_{true} is a brain image [4] of dimension 150 × 150, σ is the same as in the first instance, and 200 iterations are executed. The parameter settings for both instances are provided in Table 1. Figures 1 and 2 report the decrease of the objective function and the relative error of the reconstruction ũ, defined by ‖ũ − u_{true}‖/‖u_{true}‖.

Table 2. Parameter settings for ABOSVS-II in Section 4.2

Table 3. Comparison of objective function value, relative error, and CPU time in seconds using data1

Table 4. Comparison of objective function value, relative error, and CPU time in seconds using data2