Analysis non-sparse recovery for relaxed ALASSO

This paper considers the recovery of signals corrupted by noise. We focus on a novel model, the relaxed ALASSO (RALASSO) model introduced by Z. Tan et al. (2014). Compared to the well-known ALASSO, RALASSO can be solved more efficiently in practice. Z. Tan et al. (2014) used the D-RIP to characterize the sparse or approximately sparse solutions of RALASSO when the D-RIP constant satisfies δ_{2k} < 0.1907, where the solution is sparse or approximately sparse in terms of a tight frame D. However, their error bound for the solution depends heavily on the term ‖D*D‖_{1,1}. Besides, compared to other works on signal recovery via ALASSO, the condition δ_{2k} < 0.1907 is even stronger. Based on the RALASSO model, we use new methods in this article to obtain a better error bound and a weaker sufficient condition, addressing these inadequacies of the results of Z. Tan et al. (2014). One result of this paper uses another method, the robust ℓ2 D-Null Space Property, to obtain the sparse or non-sparse solution of RALASSO and its error estimate, eliminating the term ‖D*D‖_{1,1} from the constants. Another result uses the D-RIP to obtain a new condition δ_{2k} < 0.3162, which is weaker than the condition δ_{2k} < 0.1907. To some extent (as ρ → ∞), RALASSO is equivalent to ALASSO, and the new condition is also weaker than the comparable conditions δ_{3k} < 0.25 of J. Lin and S. Li (2014) and δ_{2k} < 0.25 of Y. Xia and S. Li (2016).


1. Introduction. For the last fifteen years, the recovery of sparse signals has been a very active area of research in mathematics, engineering and computer science [7,8,11], and has triggered an enormous amount of research activity in radar systems [16] and medical imaging [18].
In compressed sensing, the linear model is given by

b = Ax + w,

where A ∈ R^{m×n} is the sensing matrix (m < n), b ∈ R^m is a vector of noisy observations and w ∈ R^m is a noise vector. The purpose is to reconstruct the unknown signal x from A and b. Since m < n, we cannot solve this ill-posed inverse problem without further assumptions. When x is a sparse vector, under suitable conditions on A, we can use ℓ0-minimization to search for the sparsest vector consistent with the data. Based on the fact that handling the ℓ0-minimization is NP-hard [20], we turn to convex relaxations. Throughout, the signal of interest is sparse or approximately sparse in terms of a tight frame D ∈ R^{n×p}, where D_k (k = 1, ..., p) are the columns of D. In other words, if D is a tight frame, we have DD* = I.
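To make the setup concrete, the following Python sketch (not from the paper; all sizes and variable names are illustrative) builds a toy tight frame D with DD* = I and the noisy measurement model b = Ax + w:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 32, 64, 128                      # illustrative sizes with m < n < p

# A tight frame D in R^{n x p} satisfies D D* = I. One toy construction: take
# the first n rows of a p x p orthogonal matrix, so D @ D.T equals I exactly.
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
D = Q[:n, :]
assert np.allclose(D @ D.T, np.eye(n))

# Noisy linear measurements b = A x + w. Here f is 5-sparse, so the analysis
# coefficients D* x = (D* D) f are only approximately sparse (a projection of f).
A = rng.standard_normal((m, n)) / np.sqrt(m)
f = np.zeros(p)
f[rng.choice(p, 5, replace=False)] = rng.standard_normal(5)
x = D @ f
w = 0.01 * rng.standard_normal(m)
b = A @ x + w
```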
When the matrix AD satisfies the restricted isometry property (RIP), the ℓ1-synthesis method can stably recover a sparse signal f, but the frame D is then required to have columns that are highly uncorrelated [22]. If the columns of D are highly correlated, AD may not satisfy the RIP in general [22,9]. Good results are instead obtained with the ℓ1-analysis model, which finds the estimator x directly by solving an ℓ1-minimization problem [9,19], referred to as analysis basis pursuit (ABP). Combining the LASSO estimator, the recovery problem can be formulated as

min_{x∈R^n} (1/2)‖Ax − b‖₂² + λ‖D*x‖₁.   (1)

We call the model (1) analysis LASSO (ALASSO) [12,25], which is equivalent to ABP [9,19]. To ensure stable recovery, we need the matrix A to satisfy the D-RIP [9].

Definition 1.2 (D-RIP). Let D be an n × p matrix. A matrix A is said to obey the restricted isometry property adapted to D of order k with constant δ if for all k-sparse vectors f ∈ R^p we have

(1 − δ)‖Df‖₂² ≤ ‖ADf‖₂² ≤ (1 + δ)‖Df‖₂².   (2)

The D-RIP constant δ_k is defined as the smallest δ such that (2) holds.
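As an illustration of Definition 1.2, the sketch below (continuing the toy setup above; purely illustrative, since the true D-RIP constant takes a worst case over all k-sparse vectors and cannot be certified by sampling) records the range of the ratios ‖ADf‖₂²/‖Df‖₂² over random k-sparse f, which only lower-bounds δ_k:

```python
import numpy as np

def drip_ratio_range(A, D, k, trials=2000, seed=1):
    # Sample random k-sparse f and record ||A D f||^2 / ||D f||^2. The spread
    # of these ratios only *lower-bounds* the true D-RIP constant delta_k,
    # which is defined through a worst case over all k-sparse vectors.
    rng = np.random.default_rng(seed)
    p = D.shape[1]
    ratios = []
    for _ in range(trials):
        f = np.zeros(p)
        f[rng.choice(p, k, replace=False)] = rng.standard_normal(k)
        Df = D @ f
        ratios.append(np.linalg.norm(A @ Df) ** 2 / np.linalg.norm(Df) ** 2)
    return min(ratios), max(ratios)

lo, hi = drip_ratio_range(A, D, k=5)       # A, D from the sketch above
print("empirical lower bound on delta_k:", max(1.0 - lo, hi - 1.0))
```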
In this paper, we pay attention to a novel model:

min_{x∈R^n, z∈R^p} (1/2)‖Ax − b‖₂² + λ‖z‖₁ + (ρ/2)‖z − D*x‖₂²,

which we call relaxed ALASSO (RALASSO) [23]. As the name implies, RALASSO is similar to ALASSO. If we choose ρ → ∞, the third term forces z = D*x and the problem is equivalent to ALASSO. In general, however, RALASSO can be solved more efficiently in practice. So far, ALASSO has been solved by two common methods: interior point methods [4] and the alternating direction method of multipliers (ADMM) [5,1]. Interior point methods work efficiently in the low-dimensional case, but as the dimension grows their convergence becomes very slow because each iteration requires the solution of a linear system. The efficiency of ADMM, in turn, relies heavily on the matrix A having nice structure. Motivated by this, some researchers have sought fast algorithms for high-dimensional data that work for arbitrary A, not only for structured A.
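A hypothetical helper (ours, for illustration only) that evaluates the RALASSO objective makes the role of ρ explicit:

```python
import numpy as np

def ralasso_objective(x, z, A, b, D, lam, rho):
    # 0.5*||A x - b||_2^2 + lam*||z||_1 + 0.5*rho*||z - D* x||_2^2.
    # As rho -> infinity the last term forces z = D* x, recovering ALASSO.
    r = A @ x - b
    c = z - D.T @ x
    return 0.5 * r @ r + lam * np.abs(z).sum() + 0.5 * rho * c @ c
```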
The fast iterative shrinkage-thresholding algorithm (FISTA) [2] and its monotone version (MFISTA) [3] are accelerated first-order algorithms that are extensively used for large-scale data and do not require A to have nice structure. However, these accelerated first-order algorithms are hard to apply directly to ALASSO because of the non-smooth term ‖D*x‖₁.
Given a closed proper convex function f : R^n → R ∪ {∞}, the proximal operator of f is defined as

prox_f(v) = argmin_{x∈R^n} { f(x) + (1/2)‖x − v‖₂² }.

The proximal operator is commonly used to solve problems of the form

min_x F(x) + G(x),

where F is a smooth convex function with Lipschitz continuous gradient and G is a closed proper convex function. The core of FISTA and MFISTA is still the proximal gradient method, whose update is

x^{t+1} = prox_{(1/L)G}( x^t − (1/L)∇F(x^t) ).

Here, L is an upper bound on the Lipschitz constant of the gradient ∇F. Thus, directly applying MFISTA to ALASSO requires computing the proximal operator of λ‖D*x‖₁. Considering RALASSO instead, the idea is to introduce an auxiliary variable z so that we can avoid computing the proximal operator of λ‖D*x‖₁. Let F(x, z) = (1/2)‖Ax − b‖₂² + (ρ/2)‖z − D*x‖₂² and G(z) = λ‖z‖₁; then we only need to compute the proximal operator of λ‖z‖₁. Computing the proximal operator of λ‖D*x‖₁ is very difficult in general, while the proximal operator of λ‖z‖₁ has a simple closed form. This is the reason why a decomposition-based MFISTA (DFISTA) was proposed to solve this analysis sparse recovery problem in [23].
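For instance, the proximal operator of t‖·‖₁ is the soft-thresholding map, so one proximal-gradient update for the splitting above can be sketched as follows (an illustrative implementation under that splitting; the function names and the crude Lipschitz bound are ours, not the DFISTA of [23], which additionally uses momentum and a monotonicity check):

```python
import numpy as np

def soft_threshold(v, t):
    # Closed-form proximal operator of t*||.||_1 (soft thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ralasso_prox_grad_step(x, z, A, b, D, lam, rho, L):
    # One proximal-gradient update for the splitting used in the text:
    #   F(x, z) = 0.5*||A x - b||_2^2 + 0.5*rho*||z - D* x||_2^2,  G(z) = lam*||z||_1.
    # L must upper-bound the Lipschitz constant of grad F; for a tight frame
    # (||D||_2 = 1) the crude bound L = ||A||_2^2 + 2*rho is safe.
    coupling = z - D.T @ x
    grad_x = A.T @ (A @ x - b) - rho * (D @ coupling)
    grad_z = rho * coupling
    x_new = x - grad_x / L                           # smooth in x: plain step
    z_new = soft_threshold(z - grad_z / L, lam / L)  # prox handles lam*||z||_1
    return x_new, z_new
```

Iterating this pair of updates decreases F + G, and only the cheap componentwise prox of λ‖z‖₁ is ever needed, which is exactly the point of the decomposition.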
In this paper, based on the RALASSO model, we use new methods to obtain a better error bound and a weaker sufficient condition, addressing the inadequacies of the results in [23]. One result of this paper uses another method, the robust ℓ2 D-Null Space Property, to obtain the sparse or non-sparse solution of RALASSO and its error estimate, eliminating the term ‖D*D‖_{1,1} from the constants. Another result of the paper uses the D-RIP to obtain a new condition δ_{2k} < 0.3162, which is weaker than the condition δ_{2k} < 0.1907. It is easy to see that when ρ → ∞, RALASSO is equivalent to ALASSO, and our condition is also weaker than the comparable conditions δ_{3k} < 0.25 in [17] and δ_{2k} < 0.25 in [26].

2. Notations and organization. The following notation is used throughout this paper. We use capital italic bold letters to represent matrices and lower-case italic bold letters to represent vectors. Let D* be the conjugate transpose of D. The set of indices of the nonzero entries of a vector x is called the support of x and is denoted supp(x). Let T ⊂ {1, 2, ..., p} be an index set; for a given matrix D ∈ R^{n×p}, T^c is the complement of T, and D_T is the submatrix of D formed from the columns of D indexed by T, with all other columns set to zero. Accordingly, D*_T denotes the matrix that keeps the rows of D* indexed by T. We use x_i to represent the i-th element of x and A_i (A_j) to represent the i-th row (j-th column) of A. For a vector x, the ℓ1 and ℓ2 norms are denoted by ‖x‖₁ and ‖x‖₂. For a matrix A, ‖A‖₂ denotes the spectral norm of A, and ‖A‖_{p,q} denotes the operator norm of A from ℓp to ℓq, i.e., ‖A‖_{p,q} = max_{x≠0} ‖Ax‖_q / ‖x‖_p. The paper is organized as follows. In Section 3, we give the main results and their Gaussian-noise forms. Finally, in Section 4, we give the main lemmas used in the proofs and prove the main theorems.
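In particular, for p = q = 1 the maximum in ‖A‖_{1,1} is attained at a standard basis vector, so ‖A‖_{1,1} is simply the largest column ℓ1-norm. A two-line check (illustrative, reusing the toy frame D built in the introduction) of the quantity ‖D*D‖_{1,1} that appears in the constants of [23]:

```python
import numpy as np

def norm_1_1(M):
    # ||M||_{1,1} = max_x ||M x||_1 / ||x||_1 is attained at a standard basis
    # vector, i.e., it equals the largest column l1-norm of M.
    return np.abs(M).sum(axis=0).max()

print(norm_1_1(D.T @ D))                   # the term ||D*D||_{1,1} from [23]
```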
3. Main results. In [23], the authors used the D-RIP to obtain the solution of RALASSO when δ_{2k} < 0.1907. However, their error bound for the solution depends heavily on the ‖·‖_{1,1}-norm of D*D for the tight frame, so their error estimate for the solution of RALASSO is not optimal. We therefore use another method, the robust ℓ2 D-Null Space Property, to obtain the sparse or non-sparse solution of RALASSO and eliminate the term ‖D*D‖_{1,1}.

Definition 3.1 (robust ℓ2 D-Null Space Property, [15]). Let D be an n × p matrix. A matrix A ∈ R^{m×n} is said to obey the robust ℓ2 D-Null Space Property (ℓ2-DNSP) of order k with constants β > 0 and 0 < γ < 1/2 if for any index set T with |T| ≤ k and any x ∈ R^n we have

‖D*_T x‖₂ ≤ (γ/√k)‖D*_{T^c} x‖₁ + β‖Ax‖₂.   (3)

Theorem 3.2. Let D ∈ R^{n×p} be a tight frame, and let A ∈ R^{m×n} be a measurement matrix satisfying the ℓ2-DNSP of order k with constants (β, γ). Let y = Ax + w, where the noise w satisfies ‖D*A*w‖∞ ≤ λ/2. Then the solution x̂_RAL to RALASSO satisfies an error bound in which c₁ is a constant depending on β and γ, c₂ and c₃ are constants depending only on γ, and β > 0 and 0 < γ < 1/2 are constants depending on δ_{2k}. Compared to Theorem IV.1 in [23], we eliminate the term ‖D*D‖_{1,1} from the constant c₁.

When the noise w is Gaussian, we also obtain the error estimate for RALASSO in the following theorem.

Theorem 3.3. Let D ∈ R^{n×p} be a tight frame, and let A ∈ R^{m×n} be a measurement matrix satisfying the ℓ2-DNSP of order k with constants (β, γ). Assume that w ∼ N(0, σ²I) and that x̂_RAL is the solution of RALASSO with λ = 4σα√(log p), where α = max_{i∈[p]} ‖AD_i‖₂. Then, with probability at least 1 − 1/(√(2π log p)·p), the error bound of Theorem 3.2 holds.

Remark 1. When λ ≥ 4σα√(log p) is a fixed constant, the above conclusion still holds.
Since the ℓ2-DNSP is a relaxation of the D-RIP, we can build a connection between the ℓ2-DNSP and the D-RIP. Here we give a proposition from [15].
Proposition 1 ([15]). If A satisfies the D-RIP with δ_{2k} below a threshold determined by the smallest and largest singular values σ_min and σ_max of D, then the matrix A satisfies the ℓ2-DNSP of order k with constants (β, γ) depending on δ_{2k}, σ_min and σ_max. Based on Proposition 1, if σ_min = σ_max, then δ_{2k} < 1/9 is the largest bound that makes the matrix A satisfy the ℓ2-DNSP, and this condition is clearly very strong. We therefore also use the D-RIP directly to solve RALASSO and obtain the new bound δ_{2k} < 0.3162, stated in Theorem 3.4. Compared to the condition δ_{2k} < 0.1907 in [23], our result is better.
Theorem 3.4. Let D ∈ R^{n×p} be a tight frame, and let A ∈ R^{m×n} be a measurement matrix satisfying the D-RIP with δ_{2k} < 0.3162. Consider the measurement y = Ax + w, where the noise w satisfies ‖D*A*w‖∞ ≤ λ/2. Then the solution x̂_RAL to RALASSO satisfies an error bound in which c₄ is a constant depending on the D-RIP constant δ_{2k} and ‖D*D‖_{1,1}, and c₅ and c₆ are constants depending only on the D-RIP constant δ_{2k}.
Similarly, we provide the error estimate for RALASSO in the following theorem when the noise w is Gaussian and A satisfies the D-RIP.
Theorem 3.5. Let D ∈ R^{n×p} be a tight frame, and let A ∈ R^{m×n} be a measurement matrix satisfying the D-RIP with δ_{2k} < 0.3162. Assume that w ∼ N(0, σ²I) and that x̂_RAL is the solution of RALASSO with λ = 4σ√(2 log p). Then, with probability at least 1 − 1/(√(2π log p)·p), the error bound of Theorem 3.4 holds.

Remark 2. Similar to Remark 1, when λ ≥ 4σ√(2 log p) is a fixed constant, the above conclusion still holds.
When we choose ρ → ∞ in Theorem 3.4, the third term forces z = D*x and the problem is equivalent to ALASSO, so the result can be generalized directly to ALASSO. In [17], the comparable bound is δ_{3k} < 0.25, which is equivalent to δ_{2k} < 0.0833 by Corollary 3.4 in [21]; our result is clearly better. The condition δ_{2k} < 0.3162 is also weaker than δ_{2k} < 0.25, stated in Remark 11 of [26] for ALASSO.
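As a sanity check on the choice of λ in Theorem 3.3, the following Monte Carlo sketch (continuing the toy setup from the introduction, where A, D, m, p were defined; illustrative only) compares the empirical frequency of the event ‖D*A*w‖∞ ≤ λ/2 with the probability bound 1 − 1/(√(2π log p)·p) established in Section 4:

```python
import numpy as np

sigma = 0.1
alpha = np.linalg.norm(A @ D, axis=0).max()        # alpha = max_i ||A D_i||_2
lam = 4 * sigma * alpha * np.sqrt(np.log(p))       # lambda from Theorem 3.3
rng = np.random.default_rng(2)
trials, hits = 5000, 0
for _ in range(trials):
    w = sigma * rng.standard_normal(m)             # Gaussian noise N(0, sigma^2 I)
    hits += np.linalg.norm(D.T @ (A.T @ w), np.inf) <= lam / 2
print("empirical:", hits / trials,
      "bound:", 1 - 1 / (np.sqrt(2 * np.pi * np.log(p)) * p))
```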

4. Proofs. Let T be the set of the k largest coefficients of D*x in magnitude. We decompose the index set T^c into sets of size k, denoted T₁, T₂, ..., where T₁ indexes the k largest coefficients of D*_{T^c}x in magnitude, T₂ the next k largest, and so on. For simplicity, we write h = x̂_RAL − x, where x̂_RAL is the optimal solution of RALASSO.

4.1. Supporting lemmas.
Here we give some useful supporting lemmas that will be used in proving the main theorems.

Lemma 4.3. Let D ∈ R^{n×p} be a tight frame. Then for any x ∈ R^p we have ‖Dx‖₂ ≤ ‖x‖₂.

Lemma 4.4. The optimal solution x̂_RAL of RALASSO satisfies an inequality constraining h = x̂_RAL − x, used as (5) and (11) below.

4.2.1. A proof of Theorem 3.2.

Proof. From the definition of the sets T_j and standard properties of norms, inequality (4) holds. Based on Lemma 4.4, we have (5), and together with (4) this gives (6). Thus, applying (6), we obtain (7), so the error estimate reduces to bounding ‖D*_T h‖₂. Applying the ℓ2-DNSP, we have (8) and (9); inequality (8) holds because of (5), and (9) uses the Cauchy–Schwarz inequality. Thus we obtain (10). Based on Lemma 4.4 again, we have (11). Applying (10) and (11) to (7), we derive (12), which finishes the proof.

4.2.2. A proof of Theorem 3.4. Before giving the proof of Theorem 3.3, we first complete the proof of Theorem 3.4. The proof mainly follows the ideas in [6] and [27], but we still need some tricks to overcome the obstacles in the argument.
Proof. Based on Lemma 4.4, we have (13), where u_i ∈ R^n is (k − m)-sparse, together with (14). Let c and µ be constants to be determined; then Σ_{j=1}^N λ_j β_j − cβ_i − µD*h is also 2k-sparse. Based on Lemma 4.2, we set c = 1/2 and get (15). By Lemma 4.5 and m < k, we obtain (16). Since the vectors involved are 2k-sparse, we can use the definition of the D-RIP and add (16) to (15) to obtain (17). Combining this with equations (13) and (14), we arrive at (25), where the second inequality holds because of Lemma 4.3. The inequality (25) can then be rewritten as (26), which is a second-order inequality in X. To obtain an upper bound on X from (26), note that the coefficient of the quadratic term is negative; substituting into (26) and solving the resulting inequality, we get (27). Finally, applying (7) together with (27) yields the stated bound, which finishes the proof.

4.2.3. Proofs of Theorem 3.3 and Theorem 3.5. We first give the proof of Theorem 3.3.
Proof. Based on the result of Theorem 3.2, we only need to bound the probability P(‖D*A*w‖∞ ≤ λ/2). Let z_i = ⟨AD_i, w⟩ / (σ‖AD_i‖₂); then z_i has the standard Gaussian distribution N(0, 1). Using the union bound and α = max_{i∈[p]} ‖AD_i‖₂, with λ = 4σα√(log p) we have

P(‖D*A*w‖∞ > λ/2) ≤ Σ_{i=1}^p P(|z_i| > λ/(2σ‖AD_i‖₂)) ≤ p · P(|z_1| > 2√(log p)) ≤ p · e^{−2 log p}/(√(2π log p)) = 1/(√(2π log p) · p).

The last inequality follows from the Gaussian tail probability bound P(|z| > t) ≤ (2/(√(2π) t)) e^{−t²/2}. Thus P(‖D*A*w‖∞ ≤ λ/2) ≥ 1 − 1/(√(2π log p) · p), which finishes the proof.

Theorem 3.5 can be obtained similarly; the only difference is the bound on ‖AD_i‖₂. From the definition of the D-RIP, applied to the 1-sparse vector e_i ∈ R^p, we have

‖AD_i‖₂² ≤ (1 + δ_{2k})‖D_i‖₂² ≤ 1 + δ_{2k} < 2,

since ‖D_i‖₂ ≤ ‖D‖₂ = 1 for a tight frame. Repeating the steps in the proof of Theorem 3.3, we derive the result.

A proof of Lemma 4.2.

Proof. First, we expand the left-hand side and the right-hand side of the identity in Lemma 4.2, where (28) is the first term of the left-hand side and (29) is the second one. Therefore, we only need to prove (30). After a direct calculation, (30) follows, and the proof is completed.

A proof of Lemma 4.3.
Proof. Based on the fact that I = D*D + D*⊥D⊥, where D⊥ is chosen so that D*⊥D⊥ is the orthogonal projection complementary to D*D, we can get

‖x‖₂² = ‖D*Dx‖₂² + ‖D*⊥D⊥x‖₂² ≥ ‖D*Dx‖₂² = ‖Dx‖₂²,

where the last equality uses DD* = I. This proves Lemma 4.3.