Recovery of seismic wavefields by an $l_q$-norm constrained regularization method

Reconstruction of the seismic wavefield from sub-sampled data is an important problem in seismic image processing; this is partly due to limitations of the observations, which usually yield incomplete data. In essence, this is an ill-posed inverse problem. To solve the ill-posed problem, different kinds of regularization techniques can be applied. In this paper, we consider a novel regularization model, called the $l_2$-$l_q$ minimization model, to recover the original geophysical data from the sub-sampled data. Based on the lower bound of the local minimizers of the $l_2$-$l_q$ minimization model, a fast convergent iterative algorithm is developed to solve the minimization problem. Numerical results on random signals, synthetic and field seismic data demonstrate that the proposed approach is very robust in solving the ill-posed restoration problem and can greatly improve the quality of wavefield recovery.


1. Introduction. In seismology, due to limitations of the observations, the observed data are usually incomplete; e.g., some traces are lost, or the data are randomly (insufficiently) sampled to save the cost of seismic acquisition. In that situation, a key obstacle is how to invert the model using only incomplete, sub-sampled data [14,18,25]. Recovery of the original wavefield from incomplete observed data is generally an ill-posed problem. To solve it, two main issues are how to establish a proper mathematical model and how to solve the resulting minimization problem.
To recover a signal without error, the Nyquist-Shannon sampling theorem requires signal acquisition systems to sample at a rate of at least twice the maximum frequency of the signal. This requirement is hard to satisfy in practice.
As an alternative, compressive sensing (CS) has recently received a lot of attention in the signal and image processing community. Instead of relying on the bandwidth of the signal, CS rests on a basic assumption: sparsity. That means using sparsity-promoting techniques to produce "compressed modes" that are sparse and localized in space, and then solving l_q-norm regularized variational equations in mathematics and physics. Therefore, (approximately) sparse signals can be recovered efficiently from incomplete information. Sparsity leads to efficient estimation and compression of the signal via a linear transform, e.g., sine, cosine, wavelet, curvelet and beamlet transforms [41,15,21]. The method takes a relatively small number of non-traditional samples in the form of projections of the signal onto random basis elements or random vectors [8,1,2]. Therefore, if the signal has a sparse representation in some basis, it is possible to reconstruct the signal using few linear measurements. Nevertheless, this is a special case of Tikhonov regularization with a priori knowledge; the relationship between compressive sensing and Tikhonov regularization has been addressed recently [36].
To address the seismic recovery problem, let us start with a signal x in N-dimensional space. Suppose that we have M observations d_i = A_i x, i = 1, 2, ···, M, where each A_i is a row vector representing the impulse response of the i-th sensor. The product of A_i with x yields the i-th component of the data d. Let the matrix A be composed of the M row vectors A_i, i.e., A = [A_1^T, A_2^T, ···, A_M^T]^T, where A_i^T stands for the transpose of the row vector A_i; then the observations can be reformulated as d = Ax. The aim of compressive sensing is to use the limited observations d_i (i = 1, 2, ···, M) with M ≪ N to restore the input signal x [36]. There are many ways to choose an orthogonal transform matrix based on some orthogonal bases, e.g., sine, wavelet, curvelet and framelet bases, and so forth [15]. As we are mainly concerned with new methods for solving the linear system d = Ax in this paper, we choose a simple wavelet orthogonal basis to form the transform matrix Ψ. The wavelet we use in this paper is the Haar wavelet [19], which forms an orthonormal system for the space of square-integrable functions on the unit interval. Suppose x is the original wavefield, which can be expanded in a series of orthogonal basis functions Ψ_i(t). These bases for all i constitute an orthogonal transform matrix Ψ such that

(1) x = Σ_i m_i Ψ_i, where m_i = (x, Ψ_i).

In operator form, m = Ψ*x, where Ψ* denotes the adjoint of the operator Ψ. The vector m can be thought of as the sparse or compressive representation of the signal x. Letting L = AΨ, where A is the impulse response matrix mentioned above, the reconstruction problem for the sparse vector m reduces to solving the simple problem

(2) d = Lm.

If we regard m_i as the weight or coefficient of the linear combination representing the signal x, the reconstruction of the signal x reduces to finding the coefficient vector m.
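To make the sparse representation concrete, here is a minimal NumPy sketch (illustrative, not the paper's code) that builds an orthonormal Haar transform matrix H playing the role of Ψ* and shows that a piecewise-constant signal has a sparse coefficient vector m = Ψ*x:

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar transform matrix for n a power of two,
    built by the standard recursive construction."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    # averaging rows (coarse scale) stacked over differencing rows (detail)
    top = np.kron(h, [1.0, 1.0])
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])
    return np.vstack([top, bottom]) / np.sqrt(2.0)

H = haar_matrix(8)                       # rows are orthonormal
x = np.array([1.0, 1, 1, 1, 3, 3, 3, 3]) # piecewise-constant "wavefield"
m = H @ x                                # coefficients m = Psi* x
x_rec = H.T @ m                          # perfect reconstruction x = Psi m
```

For this x only two coefficients are nonzero (the global mean and one scale-1 detail), which is exactly the sparsity that the transform-domain model exploits.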
Let us begin with a compact problem where L is a measurement matrix of finite rank which maps m from the parameter space into the observation space. One may readily see that problem (2) can be regarded as a data recovery problem: m is the model, i.e., the representation of x in the transformed domain, and L is the generating matrix that yields the acquired data d. Problem (2) is usually ill-posed because the existence, uniqueness and stability of the solution may be violated. A conventional method for solving such an ill-posed problem is Tikhonov regularization [27,37].
In establishing the regularization model, there are different kinds of choices: smoothing or nonsmoothing. A norm ‖·‖ usually promotes the assumed structure of the solution. Sparsity is a special structure of the solution: we say that the solution m is sparse when most of its elements are zero. A natural model for finding sparse solutions of the linear system Lm = d (here we assume that the data d is noiseless) is the equality-constrained minimization model with the l_0 norm. Though the "l_0 counting norm" yields the sparsest solution, it is well known that the minimization of ‖m‖_{l_0} is an NP-hard problem [22]. The minimization model based on the l_1 norm approximates the l_0 model quite well while retaining sparsity [3,11,10]. However, l_1-norm constrained regularization may yield inconsistent selections when applied to variable selection in some situations, e.g., a conflict between optimal prediction and consistent variable selection may occur in statistical learning [39]; furthermore, it usually introduces extra bias in parameter estimation in linear regression, where the bias contribution of the estimator can be bounded but cannot be neglected [20], and it cannot restore a signal with the fewest measurements when applied to compressed sensing [4]. Hence, further modifications and improvements are necessary.
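A small numerical illustration (a NumPy sketch, not from the paper) of why quasi-norms with exponent below one promote sparsity more aggressively than the l_1 norm: two vectors with the same l_1 norm are separated by the l_q quasi-norm, which penalizes the spread-out one more heavily.

```python
import numpy as np

def lq_qnorm(m, q):
    """The l_q quasi-norm raised to the q-th power: sum_i |m_i|^q."""
    return float(np.sum(np.abs(m) ** q))

sparse = np.array([1.0, 0.0, 0.0, 0.0])  # one nonzero entry
spread = np.full(4, 0.25)                # same l_1 norm, four nonzeros

# l_1 cannot distinguish the two vectors:
print(lq_qnorm(sparse, 1.0), lq_qnorm(spread, 1.0))  # 1.0 1.0
# l_q with q = 0.5 strongly prefers the sparse one:
print(lq_qnorm(sparse, 0.5), lq_qnorm(spread, 0.5))  # 1.0 2.0
```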
As a special case of Tikhonov regularization, we consider a sparsity-constrained minimization model in this paper, where the sparsity is enforced by an l_q-norm (0 < q < 1) constraint on the solution. We work with the general l_p-l_q minimization model [33]

(3) J^α_{p,q}[m] := ‖Lm − d‖^p_{l_p} + α‖m‖^q_{l_q}.

For different choices of p and q, several mathematical techniques have been developed, e.g., the iterative reweighted algorithm for recovering sparse vectors [17], the split Bregman iteration for l_1-regularization [12], the basis pursuit denoising (BPDN) criterion using (orthogonal) matching pursuit [6,28] and the least absolute shrinkage and selection operator (LASSO) [26] for l_1-norm constrained minimization problems. Efficient optimization algorithms, including conjugate gradient methods with preconditioning [16] and gradient projection methods [7,31,11,10], can be applied. For the BPDN problem, with δ the upper bound of the l_2 norm of the misfit between the simulated data and the observations, a particular method called the interior point (IP) method can be employed [34]. However, IP solutions may be physically meaningless for some geophysical problems [32]; therefore this method must be used with care.
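The objective in (3) is straightforward to evaluate; a minimal NumPy sketch (parameter values are illustrative):

```python
import numpy as np

def J_alpha(m, L, d, alpha, p=2.0, q=0.5):
    """Objective J^alpha_{p,q}[m] = ||L m - d||_p^p + alpha * ||m||_q^q."""
    misfit = np.sum(np.abs(L @ m - d) ** p)   # data-fitting term
    penalty = alpha * np.sum(np.abs(m) ** q)  # sparsity-promoting term
    return float(misfit + penalty)
```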
Since geophysical inverse problems are usually large-scale, fast solving methods are most welcome. Therefore, in solving the l_q-norm constrained minimization problem, we develop a fast convergent iterative algorithm by incorporating a boundedness constraint on the solution. This method provides fast computation and yields a stable solution. Numerical experiments on signal processing and seismic wavefield restoration problems indicate the robustness and applicability of our algorithm.
2. Modeling and methodology. Inspecting the algorithms effective for l_0 and l_1 regularization problems, we are particularly interested in finding a first-order iterative algorithm for l_q-norm (q ∈ (0, 1)) constrained regularization. This is motivated not only by the fact that first-order iterative algorithms are adequate and efficient for high-dimensional problems (which is crucial for compressed sensing applications), but also by the advantage that it is relatively easy to specify the regularization parameter when implementing such algorithms. Therefore, as long as such a fast iterative algorithm can be developed, l_q-norm constrained regularization could be applied as powerfully as, or even more powerfully than, l_1-norm constrained regularization, which would advance an essential step towards a better solution of sparsity problems.
Our choice of the l_2-l_q minimization model is based on the following observations: 1. l_q-norm (q ∈ (0, 1)) regularization assuredly generates sparser solutions than l_1-norm regularization; 2. the l_q norm ‖·‖_{l_q} for 0 < q < 1 is neither convex nor Lipschitz continuous, yet solving the nonconvex, non-Lipschitz continuous minimization problem should yield solutions as sparse as those for q = 0 while retaining fast convergence.
2.1. The fast convergent solving method. To solve the l_2-l_q (q ∈ (0, 1)) minimization model with optimization techniques, we need to calculate the gradient and/or the Hessian of J^α_{2,q}[m], depending on whether gradient-type or Newton-type methods are used.
The gradient and the Hessian of the objective function J^α_{2,q}[m] can be expressed as

(4) ∇J^α_{2,q}[m] = 2L^T(Lm − d) + αq · diag(|m_1|^{q−2}, ···, |m_N|^{q−2}) m,

(5) ∇²J^α_{2,q}[m] = 2L^T L + αq(q − 1) · diag(|m_1|^{q−2}, ···, |m_N|^{q−2}),

valid for nonzero components m_i, where diag(·) denotes the diagonal matrix formed from a vector.
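A direct NumPy sketch of this gradient for p = 2 (illustrative, not the authors' code; the handling of zero entries, where |m_i|^{q−2} is undefined, is our own convention for the sketch):

```python
import numpy as np

def grad_J(m, L, d, alpha, q=0.5):
    """Gradient of J^alpha_{2,q}[m] = ||L m - d||_2^2 + alpha*||m||_q^q:
    2 L^T (L m - d) + alpha*q*diag(|m_i|^{q-2}) m.
    Zero entries of m are skipped, since |m_i|^{q-2} is undefined there;
    this is consistent with a scheme whose truncation step keeps iterates
    away from tiny nonzero values."""
    g = 2.0 * L.T @ (L @ m - d)
    nz = m != 0.0
    g[nz] += alpha * q * np.abs(m[nz]) ** (q - 2.0) * m[nz]
    return g
```

A finite-difference check on nonzero components confirms the formula.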
With the gradient and Hessian information, fast algorithms can be established. In the current paper we consider only the first-order method, i.e., only the gradient information is used to perform the recovery of the seismic wavefield. The basic iterative formula of the first-order method reads as follows: denoting by m_k the updated wavefield at the k-th iteration, the next update is

(6) m_{k+1} = m_k + τ_k s_k,

where s_k is a descent direction and the steplength τ_k satisfies the Armijo-Goldstein rule

(7) J^α_{2,q}[m_k + τ_k s_k] ≤ J^α_{2,q}[m_k] + b_1 τ_k (g_k, s_k),

(8) J^α_{2,q}[m_k + τ_k s_k] ≥ J^α_{2,q}[m_k] + b_2 τ_k (g_k, s_k),

where g_k = ∇J^α_{2,q}[m_k] and b_1 < b_2 are two positive parameters. Typical values of b_1 and b_2 in our calculations are b_1 = 0.4 and b_2 = 0.9. Under the assumption that the gradient of the objective function J^α_{2,q} is Lipschitz continuous with Lipschitz constant C, it can be proved that {J^α_{2,q}[m_k]} is a decreasing sequence and satisfies

J^α_{2,q}[m_k] − J^α_{2,q}[m_{k+1}] ≥ c ‖g_k‖² cos²⟨s_k, −g_k⟩

for some constant c > 0 depending only on b_1, b_2 and C, where ⟨s_k, −g_k⟩ refers to the angle between the search direction s_k and the negative gradient −g_k. To accelerate convergence of the iterative method, we consider lower bounds for the nonzero entries of the optimal solution of the minimization model (3) with p = 2 and q ∈ (0, 1). The lower bounds established in [5] for the data-fitting problem will be used for our seismic data recovery problem. Since J^α_{2,q}[m] ≥ α‖m‖^q_{l_q}, the objective function J^α_{2,q} is bounded below. Suppose M*_q consists of the local minimizers of (3) (p = 2, q ∈ (0, 1)), and let

B_i = ( αq(1 − q) / (2‖l_i‖²_{l_2}) )^{1/(2−q)},

where l_i denotes the i-th column of the matrix L; hence B_i has a fixed value for each i (i = 1, 2, ···, N). Then, for any m* ∈ M*_q, using the second-order necessary condition for a minimizer of J^α_{2,q}[m], i.e., positive semidefiniteness of the Hessian (5) at the minimizer, we have that for each i either m*_i = 0 or |m*_i| ≥ B_i. Therefore, the convergence of the iterative formula (6) can be accelerated: given the current iterate m_k, any component of m_k + τ_k s_k whose magnitude falls below B_i can be set to zero directly, so those components need not be updated further.
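Since each B_i depends only on α, q and the norm of the i-th column l_i of L, the bounds can be precomputed once. A NumPy sketch (the explicit formula is the bound of [5] as reconstructed here):

```python
import numpy as np

def lower_bounds(L, alpha, q=0.5):
    """B_i = (alpha*q*(1-q) / (2*||l_i||_2^2))^(1/(2-q)),
    one bound per column l_i of the measurement matrix L."""
    col_sq = np.sum(L ** 2, axis=0)  # squared l_2 norms of the columns
    return (alpha * q * (1.0 - q) / (2.0 * col_sq)) ** (1.0 / (2.0 - q))
```

For L with unit-norm columns, α = 0.001 and q = 0.5 this gives B_i = (1.25 × 10⁻⁴)^{2/3} = 0.0025, i.e., any entry of a local minimizer is either exactly zero or at least 0.0025 in magnitude.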
We outline a realistic fast convergent algorithm as follows:
1. Initialization: give an initial value m_0, the matrix L, the data d, a regularization parameter α > 0, a parameter q with 1 > q > 0 and a tolerance ε > 0; for i = 1, 2, ···, N, compute the lower bounds B_i.
2. At the k-th step, compute the gradient of J^α_{2,q}[m_k], denote it by g_k, and let s_k = −g_k.
3. Perform a line search using the Armijo-Goldstein rule (7) and (8) to get a steplength τ_k.
4. Compute m^temp_{k+1} = m_k + τ_k s_k, and set (m_{k+1})_i = (m^temp_{k+1})_i if |(m^temp_{k+1})_i| ≥ B_i and (m_{k+1})_i = 0 otherwise. If ‖m_{k+1} − m_k‖ ≤ ε, an optimal solution m* = m_{k+1} is found; stop. Otherwise, set k := k + 1 and go to Step 2.
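Putting the pieces together, the following NumPy sketch implements the outlined iteration on a toy problem. It is an illustration, not the authors' code: for simplicity the full Armijo-Goldstein rule (7)-(8) is replaced by backtracking on the Armijo condition alone, and all parameter defaults are illustrative.

```python
import numpy as np

def recover(L, d, alpha=1e-3, q=0.5, tol=1e-8, max_iter=1000, b1=0.4):
    """Lower-bound-truncated gradient iteration (a sketch) for
    min_m ||L m - d||_2^2 + alpha * ||m||_q^q."""
    # Step 1: precompute the lower bounds B_i from the column norms of L.
    B = (alpha * q * (1.0 - q) / (2.0 * np.sum(L ** 2, axis=0))) ** (1.0 / (2.0 - q))

    def J(m):
        r = L @ m - d
        return r @ r + alpha * np.sum(np.abs(m) ** q)

    def grad(m):
        g = 2.0 * L.T @ (L @ m - d)
        nz = m != 0.0  # |m_i|^{q-2} is undefined at zero entries
        g[nz] += alpha * q * np.abs(m[nz]) ** (q - 2.0) * m[nz]
        return g

    m = np.zeros(L.shape[1])
    for _ in range(max_iter):
        g = grad(m)                       # Step 2: gradient and direction
        s = -g
        tau, Jm, gs = 1.0, J(m), g @ s    # Step 3: backtracking line search
        while J(m + tau * s) > Jm + b1 * tau * gs and tau > 1e-16:
            tau *= 0.5
        m_new = m + tau * s               # Step 4: update and truncate
        m_new[np.abs(m_new) < B] = 0.0
        if np.linalg.norm(m_new - m) <= tol:
            return m_new
        m = m_new
    return m
```

On a trivial denoising-type problem (L the identity) the iteration keeps the true zeros exactly at zero and drives the nonzero entries close to their true values.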

2.2. Choosing the regularization parameter α. We remark that our method works for any choice of the parameter q ∈ (0, 1). Choosing the regularization parameter α is also a key issue. To choose a reasonable value of α, we look at B_i again. Since the denominator of B_i does not depend on α, we simply investigate the function φ(α, q) = (αq(1 − q))^{1/(2−q)}. The function φ(α, q) guides the choice of proper values of q and α. Since our aim is to control the lower bounds B_i so as to accelerate the convergence, φ(α, q) should be as small as possible. For a fixed value of α, it is straightforward to calculate the derivative of φ(α, q) with respect to q, yielding the stationary point q = 1/2. Therefore, in this paper we simply choose the value of q as 0.5. Once q is fixed, choosing α less than 0.05 is sufficient to keep φ(α, q), and hence the lower bounds B_i for i = 1, 2, ···, N, small. Therefore, in this paper we simply choose the value of α as 0.001. An illustration of this phenomenon is shown in Figure 1.
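A quick numerical check of this choice (a NumPy sketch, not from the paper): at the fixed value q = 1/2 the factor φ(α, q) grows monotonically with α, so a small α such as 0.001 keeps the truncation thresholds B_i small.

```python
import numpy as np

def phi(alpha, q):
    """phi(alpha, q) = (alpha*q*(1-q))^(1/(2-q)),
    the alpha-dependent factor of the lower bounds B_i."""
    return (alpha * q * (1.0 - q)) ** (1.0 / (2.0 - q))

alphas = np.array([1e-4, 1e-3, 1e-2, 5e-2])
vals = phi(alphas, 0.5)
# phi increases with alpha; at alpha = 0.001 it is already below 0.004
```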

2.3. Complexity analysis. Regarding the computational complexity of our method, the main cost is the calculation of the objective gradient. Since our method is a gradient method with a truncation technique, it possesses low computational cost and low storage requirements, and its convergence does not depend on the choice of the initial point. Under exact line search, our method has a linear convergence rate. Therefore, our method can be applied to large-scale problems.
3. Numerical results. To verify the feasibility of our algorithm, we consider four numerical examples. We start our simulation with a simple one-dimensional sparse signal reconstruction. Then we consider an example of reconstruction of seismic shot gathers; this synthetic example is similar to the one considered in [35]. Finally, two field data applications are performed.
3.1. Random signal reconstruction. In [35], the authors consider a sparse signal m ∈ R^N, which is measured (sensed) by a random measuring matrix L ∈ R^{M×N} (M < N). Then d = Lm ∈ R^M is the measurement vector. Every row of the matrix L can be seen as a measuring operator whose inner product with m is one measurement. M < N means the number of measurements is smaller than the length of the signal; thus the measurements are compressed. In their simulation, M is chosen as 140 and N equals 200, and the problem is to recover the original signal m from the measurement d. The true signal is marked by the legend "o" and the recovered signal by "+". We apply our algorithm to the same example; comparisons of the input signal with the restoration, and the difference between the true and restored signals, are shown in Figures 2 (a) and 2 (b), respectively.
Furthermore, we consider a harder signal reconstruction problem than that of [35]: instead of M equaling 140, we take M as 80. This is a severely ill-posed problem. Since the measuring matrix is random, the data are randomly recorded. The original sparse signal is shown in Figure 2 (c) with the legend "o".
Using our fast convergent iterative algorithm for the l_q-norm constrained regularization problem, the restoration result ("+" line) compared with the original signal is shown in Figure 2 (c). The difference between the true and restored signals is shown in Figure 2 (d). It is evident from the comparison that our algorithm is robust in the reconstruction of sparse signals.
This example shows that our method works for randomly generated data using a random measurement matrix. Therefore, it should be a reliable and stable method for signal reconstruction problems.
One may wonder whether other choices of the regularization parameter α yield similar results. At first glance, we can choose any value of α greater than 0. However, either too small or too large a value of α will yield unsatisfactory results. Again, we consider the second signal reconstruction problem above. Figure 3 (a) illustrates the original signal and its recovery for the smaller α = 1.0 × 10^{−4}, and the difference between the true and restored signals is shown in Figure 3 (b); Figure 3 (c) illustrates the original signal and its recovery for the larger α = 0.5, and the difference between the true and restored signals is shown in Figure 3 (d). This confirms our assertion above. For the parameter q, we remark that q = 1/2 is a representative value for q ∈ (0, 1). Since we have obtained the optimal value of the parameter q ∈ (0, 1), there is no need to try other values of q.

3.2. Reconstruction of shot gathers. Now we consider a seismogram generated from a seven-layer geologic velocity model [35], where the spatial sampling interval is 15 meters and the time sampling interval is 0.002 seconds; the shot gathers are shown in Figure 4. The data with missing traces are shown in Figure 5 (a). Using our l_q-norm constrained regularization model and the fast convergent iterative approach, the recovered wavefield is shown in Figure 5 (b). It is clear from the reconstruction that most of the details of the wavefield are preserved. To show the good performance of our method, we plot the frequency information of the sub-sampled data and of the restored data in Figures 5 (c) and 5 (d), respectively. It is clear that the aliasing (which behaves like noise) in the sub-sampled data is greatly reduced in the recovered data. The difference between the original data and the recovered data is illustrated in Figure 6 (a). Virtually all the initial seismic energy is recovered with minor errors. Though there are still artifacts such as vertical stripes, we consider that they might be caused by the ill-posed nature of the inversion process and by insufficient iterations. As a comparison with other interpolating techniques, we consider the Fourier transform based interpolation, which is well described in the literature, e.g., [24,9,40,41]. The reconstruction result, its frequency information and the difference from the original data are shown in Figures 6 (b)-6 (d), respectively.
Finally, to quantitatively describe the ability of our algorithm to restore the wavefield from sampled data, we define the signal-to-noise ratio as

(9) SNR = 10 log_{10}( ‖d_T‖²_{l_2} / ‖d_T − d_R‖²_{l_2} ),

where d_T refers to the input seismic wavefield (regarded as the true signal) and d_R refers to the restoration. We also define the relative error between the original data and the restored data as

(10) Err = ‖d_R − d_T‖_{l_2} / ‖d_T‖_{l_2}.

To show the speed of our method, we compare it with the standard spectral projected gradient method for l_1 minimization (SPGL1), which is well known in the literature [29]. Our calculations show that our method needs 15.8711 seconds to reach convergence, whereas the SPGL1 method requires 26.5988 seconds to reach convergence, with SNR equaling 25.1392 and Err equaling 0.4997.
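Both quality measures are simple to compute; a NumPy sketch using the standard normalizations:

```python
import numpy as np

def snr_db(d_true, d_rec):
    """Signal-to-noise ratio 10*log10(||d_T||^2 / ||d_T - d_R||^2) in dB."""
    num = np.sum(d_true ** 2)
    den = np.sum((d_true - d_rec) ** 2)
    return 10.0 * np.log10(num / den)

def rel_err(d_true, d_rec):
    """Relative error ||d_R - d_T||_2 / ||d_T||_2."""
    return np.linalg.norm(d_rec - d_true) / np.linalg.norm(d_true)
```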
3.3. Field data. Finally, we examine the efficiency of the new method on field data. The first seismic data set, from the Tarim oil field, is shown in Figure 7 (a). The sub-sampled gather is shown in Figure 7 (b), with half of the original traces randomly deleted. This sub-sampled gather was used to restore the original gather with suitable solution methods. With our l_q-norm constrained minimization model and the fast convergent algorithm, the restored data is shown in Figure 7 (c), and the frequency of the recovered data is shown in Figure 7 (d). Comparing the sub-sampled image with the original image, the restored image reconstructs most of the details. We also apply the Fourier transform based interpolating technique to this field data. The reconstruction result is shown in Figure 8 (a), and the frequency of the corresponding recovered data is shown in Figure 8 (b). Comparing the frequency content of the seismic data restored by our proposed method and by the Fourier transform based method reveals that our method yields a more accurate recovery.
Our next field data set is a marine shot gather, shown in Figure 9 (a), which consists of 256 traces with a spacing of 25 m and a time sampling interval of 2 ms. There are damaged traces in the gather. The sub-sampled gather is shown in Figure 9 (b), with half of the original traces randomly deleted. This sub-sampled gather was used to restore the original gather with suitable solution methods. With our l_q-norm constrained minimization model and the fast convergent algorithm, the restored data and its frequency are shown in Figures 9 (c) and 9 (d), respectively. Comparing the sub-sampled image with the original image, the restored image reconstructs most of the details. In addition, the damaged traces in the original gather were restored as good traces. Again, we apply the Fourier transform based interpolating technique to the field data; the reconstruction result and its frequency are shown in Figures 10 (a) and 10 (b), respectively.
Again, comparing the frequency content of the seismic data restored by our proposed method and by the Fourier transform based method indicates that our method yields a more accurate recovery.

4. Conclusion. Sparse optimization has broad applications in seismic data processing, such as time-frequency analysis, de-noising, multiple attenuation, interpolation and migration. This paper focuses on the data restoration (interpolation) problem. The major issues affecting the quality of restoration are the sparse transform, the sampling, the sparse modeling and the solving method. The sparse transform influences the restoration results: the sparser the data in the transform domain, the easier it is to restore the original data. Fast and effective solving methods are the key to sparse optimization, and proper sampling methods can greatly save acquisition time and cost. We mainly study sparse modeling and solving methods in this paper, while the sparse transform is based on the wavelet transform and the sampling technique is based on the piecewise random sub-sampling developed in [35].
We first reformulate the data restoration problem as an l_q-norm constrained minimization model, which is a nonconvex and non-Lipschitz continuous minimization problem. Based on the lower bound of the local minimizers of the l_2-l_q (0 < q < 1) minimization problem, we then develop a fast and effective first-order iterative algorithm for solving it. Numerical results on synthetic problems and the field data examples indicate the potential of our method for practical applications. We remark that we only consider the gradient descent method in the present paper. For seismic wavefield recovery using l_q-norm minimization, the (L-)BFGS method may also be applied to this kind of problem, which will be our next stage of investigation.