NON-LOCAL BLIND HYPERSPECTRAL IMAGE SUPER-RESOLUTION VIA 4D SPARSE TENSOR FACTORIZATION AND LOW-RANK


resolution. The latter has higher spatial but lower spectral resolution, while the former is the opposite. Almost all HSI super-resolution work assumes that the degradation operators relating the low-resolution image to its high-resolution counterpart are known, but in reality they are not available. It is therefore important to address blind super-resolution: the problem of simultaneously enhancing resolution and estimating the degradation operators.
Super-resolution, especially blind super-resolution, is a severely ill-posed problem. An appropriate choice of regularization plays an essential role in its success. Spatial/spectral similarity and sparsity have proven to be effective. Methods in the literature are based on either 2D matrix or 3D tensor modeling. Matrix-based methods unfold the 3D multispectral and hyperspectral images into 2D matrices so that, for instance, each row corresponds to one spectral band at all spatial locations. Matrix decomposition and sparsity are then adopted for super-resolution. In [13], Kawakami et al. introduced a sparse matrix factorization method, which reconstructs the HSI by multiplying a dictionary learned from the HR-MSI with sparse coefficients learned from the LR-HSI. Wycoff et al. [31] proposed a sparse non-negative matrix factorization method that exploits both the non-negativity and the sparsity of HSI. In [37], Yokoya et al. proposed a coupled non-negative matrix factorization (CNMF) method for HSI super-resolution that alternately applies non-negative matrix factorization (NMF) [17] unmixing to the LR-HSI and the HR-MSI; however, the sparsity of HSIs is not considered. In [24], Simões et al. presented a convex formulation for HSI super-resolution (known as HySure) based on vector total variation regularization, which promotes piecewise-smooth solutions with discontinuities along the spectral direction; in addition, the spatial and spectral responses of the sensors are both estimated from the observed images. In [32], an adaptive sparse matrix representation method was proposed by Wei et al. In [1], a non-parametric Bayesian sparse representation (BSR) method for the fusion of HSI and MSI was proposed by Akhtar et al. In [6], Dong et al. proposed a clustering-based non-negative structured sparse representation approach that exploits the spatial-spectral sparsity of the HSI. In [39], Zhang et al. developed a clustering manifold structure based HSI super-resolution framework; the manifold structure learned from the HR-MSI captures the spatial correlation of the target HR-HSI well.
Transforming 3D images into 2D matrices, however, loses spectral correlation; tensor modeling better preserves the data structure. Recently, tensor-based methods have been successfully applied to hyperspectral image processing, such as restoration [22,4,9,25,35,36], segmentation [40], matching [20], recognition [11,12], classification [41,42] and unmixing [3,19,23]. In [7], Dian et al. proposed a non-local sparse tensor factorization (NLSTF) model for HSI super-resolution, which induces spatial dictionaries and core tensors from the HR-MSI and spectral dictionaries from the LR-HSI. However, this method does not combine the LR-HSI and HR-MSI to estimate the dictionaries and the core tensor for each cluster. Chang et al. [4] presented a weighted low-rank tensor recovery model using the higher-order singular value decomposition; it updates core tensors iteratively while fixing the dictionaries along every mode. In [14], a coupled tensor factorization model was proposed by Kanatsoulis et al. to tackle the HSI and MSI fusion task when the spatial degradation operator is unknown or inaccurately estimated; this approach, however, does not exploit spatial non-local self-similarity or a tensor sparsity prior. Very recently, Li et al. [18] proposed a coupled sparse tensor factorization (CSTF) based HSI super-resolution model, albeit without considering non-local spatial similarity in the HSI. To incorporate non-local similarity and global correlation across the spectrum, cube-based methods are adopted: images are divided into full-band patches (FBPs) and grouped into clusters based on similarity. For each cluster, its FBPs are either organized into a third-order tensor by unfolding along the spectral dimension [4,36] or into a fourth-order tensor without any unfolding, which preserves the high-dimensional structure [22,7]; the computation is then done cluster by cluster. In this work, we adopt fourth-order tensor modeling to best preserve the data structure.
The majority of hyperspectral super-resolution work assumes that the spatial degradation operator is known or easy to estimate beforehand. In practice, it is usually unknown or hard to estimate accurately. In this paper, we propose a novel blind HSI super-resolution method for the case where the spatial degradation operators are completely unknown. The method combines non-local regularity and sparse tensor factorization into a unified framework. We first group similar 3D cubes into clusters using K-means++. The similar cubes in each cluster are organized into a fourth-order tensor and represented using the Tucker decomposition. The HSI super-resolution problem is then reduced to computing, for each cluster, the spatial and spectral dictionaries of the first three modes and the corresponding core tensor. Moreover, a new tensor sparsity measure is designed, which can easily be extended to many other problems. Furthermore, an effective proximal ADMM based algorithm is proposed to solve the blind HSI super-resolution problem. Numerical experiments show that the proposed approach provides impressive blind HSI super-resolution results.
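As a concrete sketch of the grouping step, the following toy NumPy code extracts overlapping cubes and stacks the members of each cluster along a fourth mode. A plain k-means stands in for K-means++, and the cube size, stride and number of clusters are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def extract_cubes(img, d=4, stride=2):
    """Extract overlapping 3D cubes of spatial size d x d (all bands kept)."""
    W, H, S = img.shape
    cubes = [img[i:i + d, j:j + d, :]
             for i in range(0, W - d + 1, stride)
             for j in range(0, H - d + 1, stride)]
    return np.stack(cubes)                    # shape (n, d, d, S)

def group_into_clusters(cubes, K=3, iters=10, seed=0):
    """Toy k-means on flattened cubes; returns one 4th-order tensor per
    cluster, with the cluster's cubes stacked along the last mode."""
    rng = np.random.default_rng(seed)
    flat = cubes.reshape(len(cubes), -1)
    centers = flat[rng.choice(len(flat), K, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((flat[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for k in range(K):
            if np.any(labels == k):
                centers[k] = flat[labels == k].mean(0)
    # stack each cluster's cubes along a 4th mode: (d, d, S, n_k)
    return [np.moveaxis(cubes[labels == k], 0, -1) for k in range(K)]

rng = np.random.default_rng(1)
Z = rng.random((16, 16, 3))                   # toy HR-MSI
cubes = extract_cubes(Z)
clusters = group_into_clusters(cubes, K=3)
```

Each element of `clusters` is exactly the kind of fourth-order tensor the method factorizes cluster by cluster.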
The rest of the paper is organized as follows. In Section 2, we briefly introduce basic notation and review some preliminaries on tensors. We present the proposed HSI super-resolution model in Section 3. In Section 4, we describe the blind algorithm and some implementation details. Experimental results are presented in Section 5, and conclusions follow in Section 6.
2. Notations and preliminaries. Let $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ be an $N$th-order real-valued tensor and let $\|\mathcal{X}\|_0$ denote the $\ell_0$ semi-norm counting the number of non-zero elements in $\mathcal{X}$.
The mode-$n$ vectors of a tensor $\mathcal{X}$ are those obtained by fixing every index except the $n$th one; for instance, the mode-1 vectors are $\mathcal{X}(:, i_2, \ldots, i_N)$. The mode-$n$ matricization (also known as the unfolding or flattening) of a tensor, denoted $X_{(n)}$, is obtained by arranging the mode-$n$ vectors to be the columns. The $n$-mode product of a tensor $\mathcal{X}$ with a matrix $A \in \mathbb{R}^{J \times I_n}$ is the tensor $\mathcal{X} \times_n A \in \mathbb{R}^{I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N}$. Besides, the $n$-mode product can also be calculated by matrix multiplication:
$$(\mathcal{X} \times_n A)_{(n)} = A X_{(n)}.$$
For distinct/same mode multiplication, we have
$$\mathcal{X} \times_m A \times_n B = \mathcal{X} \times_n B \times_m A \ (m \neq n), \qquad \mathcal{X} \times_n A \times_n B = \mathcal{X} \times_n (BA).$$

Definition 2.1 (Tucker Decomposition). The Tucker decomposition of an $N$th-order tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ can be written as
$$\mathcal{X} = \mathcal{C} \times_1 A_1 \times_2 A_2 \cdots \times_N A_N.$$
Here, $A_n \in \mathbb{R}^{I_n \times M_n}$ are the factor matrices along mode $n$ and $\mathcal{C} \in \mathbb{R}^{M_1 \times M_2 \times \cdots \times M_N}$ is the core tensor.

Let the vectorization of a tensor $\mathcal{X}$ be defined as $\mathrm{vec}(\mathcal{X}) \equiv \mathrm{vec}(X_{(1)}) \in \mathbb{R}^{I_1 I_2 \cdots I_N}$, i.e. the long vector obtained by stacking all the mode-1 vectors vertically. The mode-$n$ unfolding of the Tucker decomposition (3) is [15]
$$X_{(n)} = A_n C_{(n)} \big(A_N \otimes \cdots \otimes A_{n+1} \otimes A_{n-1} \otimes \cdots \otimes A_1\big)^T,$$
and its vector representation [5] is
$$\mathrm{vec}(\mathcal{X}) = \big(A_N \otimes A_{N-1} \otimes \cdots \otimes A_1\big)\,\mathrm{vec}(\mathcal{C}),$$
where the symbol $\otimes$ denotes the Kronecker product of matrices, which has the following properties that will prove useful in our discussions:
$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD), \qquad (A \otimes B)^T = A^T \otimes B^T, \qquad \mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X).$$

3. The proposed method. Let $\mathcal{X} \in \mathbb{R}^{W \times H \times S}$ be the target HR-HSI to be recovered, where $W$, $H$ and $S$ denote the dimensions of the width, height and spectral mode, respectively. The tensor $\mathcal{Y} \in \mathbb{R}^{w \times h \times S}$ denotes the corresponding LR-HSI, and $\mathcal{Z} \in \mathbb{R}^{W \times H \times s}$ denotes the HR-MSI of the same scene, where $w < W$, $h < H$ and $s < S$. The purpose of HSI super-resolution is to estimate the HR-HSI $\mathcal{X}$ by fusing the LR-HSI $\mathcal{Y}$ with the HR-MSI $\mathcal{Z}$, as illustrated in Figure 1. Reconstructing $\mathcal{X}$ from $\mathcal{Y}$ and $\mathcal{Z}$ in the presence of noise is an ill-posed inverse problem; when the degradation matrices are unknown, it is even more difficult.
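The Tucker and Kronecker identities from the preliminaries can be checked numerically. The following self-contained sketch uses the standard mode-$n$ ordering (earlier modes vary fastest along the columns of an unfolding):

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization: columns are the mode-n fibers (standard ordering)."""
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')

def mode_n_product(X, A, n):
    """n-mode product X x_n A, i.e. the fold of A @ X_(n)."""
    return np.moveaxis(np.tensordot(A, X, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
C = rng.random((2, 3, 4))                                  # core tensor
A1, A2, A3 = rng.random((5, 2)), rng.random((6, 3)), rng.random((7, 4))

# Tucker synthesis: X = C x_1 A1 x_2 A2 x_3 A3
X = mode_n_product(mode_n_product(mode_n_product(C, A1, 0), A2, 1), A3, 2)

# Mode-1 unfolding identity: X_(1) = A1 C_(1) (A3 kron A2)^T
lhs = unfold(X, 0)
rhs = A1 @ unfold(C, 0) @ np.kron(A3, A2).T
print(np.allclose(lhs, rhs))                               # True

# Vector form: vec(X) = (A3 kron A2 kron A1) vec(C)
vec = lambda T: unfold(T, 0).flatten(order='F')
print(np.allclose(vec(X), np.kron(A3, np.kron(A2, A1)) @ vec(C)))  # True
```

Both prints confirm the unfolding and vectorization formulas used throughout the derivations.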
Due to modeling accuracy and the large size of the data, the computation is usually not done on the entire domain but on clusters of small cubes. Clustering similar cubes together and conducting cluster-wise computation not only saves computation but also brings non-local regularity. A collection of similar high-dimensional cubes leads naturally to a higher-order tensor, and we adopt a high-order tensor decomposition to better model the similarity among cubes. We divide the HR-MSI $\mathcal{Z}$ into 3D cubes $\{\mathcal{Z}^{(k,j)}\}_{j=1}^{n_k}$, $k = 1, \ldots, K$, where $K$ is the number of clusters, $n_k$ is the number of cubes in the $k$th cluster and $\mathcal{Z}^{(k,j)}$ denotes the $j$th cube of the $k$th cluster. Furthermore, the similar cubes in the $k$th cluster can be organized as a fourth-order tensor $\mathcal{Z}^{(k)} \in \mathbb{R}^{d_W \times d_H \times s \times n_k}$, where the fourth mode indexes the cubes and $\mathcal{Z}^{(k)}(:,:,:,j) = \mathcal{Z}^{(k,j)}$. According to the spatial correspondence, we can group the cubes of the LR-HSI $\mathcal{Y}$ into $K$ clusters and then organize each cluster into a 4D tensor $\mathcal{Y}^{(k)} \in \mathbb{R}^{d_w \times d_h \times S \times n_k}$. The target HR-HSI $\mathcal{X}$ will be reconstructed cluster by cluster, with the $k$th cluster denoted by $\mathcal{X}^{(k)}$. For the $k$th cluster, when there is no noise, the acquired data satisfy
$$\mathcal{Y}^{(k)} = \mathcal{X}^{(k)} \times_1 P_1 \times_2 P_2,$$
where $P_1 \in \mathbb{R}^{d_w \times d_W}$ and $P_2 \in \mathbb{R}^{d_h \times d_H}$ are the blurring and downsampling matrices along the width (first) mode and height (second) mode, respectively. In this work, $P_1$ and $P_2$ are both completely unknown. Besides,
$$\mathcal{Z}^{(k)} = \mathcal{X}^{(k)} \times_3 P_3,$$
where $P_3 \in \mathbb{R}^{s \times S}$ is the downsampling matrix of the spectral (third) mode. Based on the Tucker decomposition, the $k$th cluster $\mathcal{X}^{(k)}$ of the HR-HSI $\mathcal{X}$ can be formulated as
$$\mathcal{X}^{(k)} = \mathcal{C}^{(k)} \times_1 W_k \times_2 H_k \times_3 S_k,$$
where the matrices $W_k$, $H_k$ and $S_k$ denote the dictionaries of the width, height and spectral mode, respectively, and $\mathcal{C}^{(k)}$ is the core tensor. The proposed tensor sparsity measure for $\mathcal{X}^{(k)}$ is defined as
$$\phi\big(\mathcal{X}^{(k)}\big) = \lambda_c \big\|\mathcal{C}^{(k)}\big\|_0 + \mu \,\mathrm{rank}\big(X^{(k)}_{(4)}\big),$$
where $X^{(k)}_{(4)}$ is the unfolding along mode 4, and $\lambda_c$ and $\mu$ are two positive parameters. The first term enforces sparsity under the Tucker decomposition; the second originates from the fact that the cubes in a cluster have self-similarity/correlation, which leads to low-rankness of $X^{(k)}_{(4)}$.
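To make the cluster-wise degradation model concrete, here is a toy NumPy sketch. The average-pooling $P_1$, $P_2$ and the random $P_3$ are illustrative assumptions (in the model, $P_1$ and $P_2$ are unknown), and the sizes are toy values:

```python
import numpy as np

def mode_prod(T, A, n):
    # n-mode product: contract the columns of A with mode n of T
    return np.moveaxis(np.tensordot(A, T, axes=(1, n)), 0, n)

rng = np.random.default_rng(0)
dW, dH, S, nk = 16, 16, 31, 5           # toy cluster sizes (assumed values)
a, s = 8, 3                              # scaling factor, number of MSI bands
Xk = rng.random((dW, dH, S, nk))         # k-th cluster of the (unknown) HR-HSI

# spatial degradation: average-pooling matrices standing in for blur+downsample
P1 = np.kron(np.eye(dW // a), np.ones((1, a)) / a)    # (dW/a, dW)
P2 = np.kron(np.eye(dH // a), np.ones((1, a)) / a)
P3 = rng.random((s, S))                               # spectral response

Yk = mode_prod(mode_prod(Xk, P1, 0), P2, 1)   # LR-HSI cluster
Zk = mode_prod(Xk, P3, 2)                     # HR-MSI cluster

# proposed sparsity measure: lam_c * ||C||_0 + mu * rank(unfold_4(X))
def sparsity_measure(C, Xk, lam_c=1e-4, mu=1e-3):
    X4 = Xk.reshape(-1, Xk.shape[3])   # mode-4 unfolding up to transpose (rank unaffected)
    return lam_c * np.count_nonzero(C) + mu * np.linalg.matrix_rank(X4)
```

The shapes of `Yk` and `Zk` match the two degradation equations above.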

Inverse Problems and Imaging
Volume 14, No. 2 (2020), 339-361

The minimization problem with the $\ell_0$ semi-norm and rank terms is non-convex. To simplify the computation, we replace the $\ell_0$ semi-norm by the $\ell_1$ norm and the rank by the nuclear norm $\|\cdot\|_*$, defined as the sum of the singular values of $X^{(k)}_{(4)}$.
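The two relaxed terms admit well-known proximal operators: elementwise soft thresholding for the $\ell_1$ norm and singular value thresholding for the nuclear norm. A minimal sketch:

```python
import numpy as np

def soft_threshold(T, lam):
    """Proximal operator of lam*||.||_1 (elementwise soft thresholding)."""
    return np.sign(T) * np.maximum(np.abs(T) - lam, 0.0)

def svt(M, lam):
    """Proximal operator of lam*||.||_* (singular value thresholding)."""
    U, sig, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(sig - lam, 0.0)) @ Vt

rng = np.random.default_rng(0)
M = rng.random((6, 4))
low_rank = svt(M, 1.0)
print(np.linalg.matrix_rank(low_rank) <= np.linalg.matrix_rank(M))  # True
```

These two operators are exactly the building blocks that appear in the subproblem updates of the blind algorithm below.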
4. Blind algorithm. Since the spatial degradation operators $P_1$ and $P_2$ are unknown in practice, we design a blind super-resolution algorithm to solve the proposed model. Representing $\mathcal{X}^{(k)}$ in terms of the Tucker decomposition, the proposed model is
$$\min_{\{W_k, H_k, S_k, \mathcal{C}^{(k)}\},\, P_1, P_2} \sum_{k=1}^{K} \Big( \big\|\mathcal{Y}^{(k)} - \mathcal{C}^{(k)} \times_1 (P_1 W_k) \times_2 (P_2 H_k) \times_3 S_k\big\|_F^2 + \lambda_1 \big\|\mathcal{Z}^{(k)} - \mathcal{C}^{(k)} \times_1 W_k \times_2 H_k \times_3 (P_3 S_k)\big\|_F^2 + \lambda_c \big\|\mathcal{C}^{(k)}\big\|_1 + \mu \big\|X^{(k)}_{(4)}\big\|_* \Big).$$
We solve the above minimization problem by introducing three auxiliary variables $\mathcal{M}^{(k)} = \mathcal{X}^{(k)}$, $W_k^* = P_1 W_k$ and $H_k^* = P_2 H_k$. Its augmented Lagrangian function then takes the form
$$\mathcal{L} = \sum_{k=1}^{K} \Big( \big\|\mathcal{Y}^{(k)} - \mathcal{C}^{(k)} \times_1 W_k^* \times_2 H_k^* \times_3 S_k\big\|_F^2 + \lambda_1 \big\|\mathcal{Z}^{(k)} - \mathcal{C}^{(k)} \times_1 W_k \times_2 H_k \times_3 (P_3 S_k)\big\|_F^2 + \lambda_c \big\|\mathcal{C}^{(k)}\big\|_1 + \mu \big\|M^{(k)}_{(4)}\big\|_* + \lambda_2 \big\|\mathcal{M}^{(k)} - \mathcal{X}^{(k)} - \mathcal{P}^{(k)}\big\|_F^2 \Big),$$
where $\mathcal{P}^{(k)}$ is the Lagrange multiplier and $\lambda_2$ is a positive parameter. For better convergence, we solve the problem under the proximal ADMM framework: we alternately solve several subproblems, and each subproblem can be updated as follows.
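The subproblems below repeatedly require solving linear systems whose matrix is a Kronecker-structured Gram matrix plus a multiple of the identity, $D^TD + \eta I$. The eigendecomposition trick used in [18] makes this inverse diagonal in a transformed basis. A numerical sketch with toy dimensions (the dense $D$ is formed only to verify the result; in practice it is never materialized):

```python
import numpy as np

rng = np.random.default_rng(0)
W, H, S = rng.random((6, 3)), rng.random((6, 3)), rng.random((8, 4))
eta = 0.5
r = rng.random(3 * 3 * 4)

# eigendecompositions of the small Gram matrices
w1, E1 = np.linalg.eigh(W.T @ W)
w2, E2 = np.linalg.eigh(H.T @ H)
w3, E3 = np.linalg.eigh(S.T @ S)

# apply (D^T D + eta I)^{-1} with D = S kron H kron W
E = np.kron(E3, np.kron(E2, E1))           # explicit here only to keep the demo short
d = np.kron(w3, np.kron(w2, w1)) + eta     # diagonal of Sigma3 x Sigma2 x Sigma1 + eta I
b_fast = E @ ((E.T @ r) / d)

# direct dense solve for comparison
D = np.kron(S, np.kron(H, W))
b_ref = np.linalg.solve(D.T @ D + eta * np.eye(D.shape[1]), r)
print(np.allclose(b_fast, b_ref))          # True
```

Because $\Sigma_3 \otimes \Sigma_2 \otimes \Sigma_1 + \eta I$ is diagonal, the expensive inverse reduces to an elementwise division.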
4.1. $\mathcal{C}^{(k)}$ subproblem. With the other parameters fixed, $\mathcal{C}^{(k)}$ can be updated by solving
$$\min_{\mathcal{C}^{(k)}} \big\|\mathcal{Y}^{(k)} - \mathcal{C}^{(k)} \times_1 W_k^* \times_2 H_k^* \times_3 S_k\big\|_F^2 + \lambda_1 \big\|\mathcal{Z}^{(k)} - \mathcal{C}^{(k)} \times_1 W_k \times_2 H_k \times_3 S_k^*\big\|_F^2 + \lambda_c \big\|\mathcal{C}^{(k)}\big\|_1 + \eta \big\|\mathcal{C}^{(k)} - \mathcal{C}^{(k)}_{\mathrm{pre}}\big\|_F^2,$$
where $\mathcal{C}^{(k)}_{\mathrm{pre}}$ represents the estimate of $\mathcal{C}^{(k)}$ from the previous iteration and $S_k^* = P_3 S_k$. Since $\mathcal{C}^{(k)}$ contains $n_k$ cubes, the problem can be reformulated cube by cube, and, by the relationship between the Tucker model and the Kronecker representation (equation (5)), written in vector form as
$$\min_{\{c^{(k,j)}\}} \sum_{j=1}^{n_k} \big\|y^{(k,j)} - D_1 c^{(k,j)}\big\|_2^2 + \lambda_1 \big\|z^{(k,j)} - D_2 c^{(k,j)}\big\|_2^2 + \lambda_c \big\|c^{(k,j)}\big\|_1 + \eta \big\|c^{(k,j)} - c^{(k,j)}_{\mathrm{pre}}\big\|_2^2,$$
where $c^{(k,j)}$, $y^{(k,j)}$ and $z^{(k,j)}$ are the vectors obtained by stacking vertically all the mode-1 vectors of $\mathcal{C}^{(k,j)}$, $\mathcal{Y}^{(k,j)}$ and $\mathcal{Z}^{(k,j)}$, respectively, and $D_1 = S_k \otimes H_k^* \otimes W_k^*$ and $D_2 = S_k^* \otimes H_k \otimes W_k$ are the dictionaries. To solve for $\mathcal{C}^{(k)}$, we add three auxiliary variables $B_1^{(k)}$, $B_2^{(k)}$, $B_3^{(k)}$ to substitute for $\mathcal{C}^{(k)}$, and apply the standard augmented Lagrangian method with alternating minimization. The update of $\mathcal{C}^{(k)}_{t+1}$ is solved by soft thresholding, and each $B_i^{(k)}$ update has a closed-form solution involving a matrix of the form $D_i^T D_i + \eta_{c_i} I$. Since $D^T D$ is a very large matrix, calculating it directly would increase the computation and slow down the algorithm; the problem is therefore separated into three subproblems, and the computation of the $B_2^{(k)}$ and $B_3^{(k)}$ subproblems can be further simplified by the method in [18]: writing $W_k^{*T} W_k^* = E_1 \Sigma_1 E_1^T$, $H_k^{*T} H_k^* = E_2 \Sigma_2 E_2^T$ and $S_k^T S_k = E_3 \Sigma_3 E_3^T$, where the $E_i$ are unitary matrices and the $\Sigma_i$ are non-negative diagonal matrices holding the eigenvectors and eigenvalues, we get
$$D_1^T D_1 + \eta_{c_1} I = (E_3 \otimes E_2 \otimes E_1)\big(\Sigma_3 \otimes \Sigma_2 \otimes \Sigma_1 + \eta_{c_1} I\big)(E_3 \otimes E_2 \otimes E_1)^T,$$
where $\Sigma_3 \otimes \Sigma_2 \otimes \Sigma_1 + \eta_{c_1} I$ is a diagonal matrix whose inverse is very easy to calculate. The analogous factorization with $S_k^* = P_3 S_k$ handles the $D_2$ term.

4.2. $W_k$ subproblem. With the other parameters fixed, $W_k$ can be updated by solving a quadratic problem. Rewriting it in mode-1, where $Z^{(k,j)}_{(1)}$, $O^{(k,j)}_{(1)}$ and $C^{(k,j)}_{(1)}$ are the mode-1 unfolding matrices of the tensors $\mathcal{Z}^{(k,j)}$, $\mathcal{O}^{(k,j)}$ and $\mathcal{C}^{(k,j)}$, respectively, and stacking these matrices over $j = 1, \ldots, n_k$, the problem becomes a least-squares problem in $W_k$. Using $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ and letting $w_k = \mathrm{vec}(W_k)$, it can be rewritten as an unconstrained quadratic program in $w_k$, which has a closed-form solution.

4.3. $H_k$ subproblem. With the other parameters fixed, $H_k$ can be updated analogously by rewriting the objective in mode-2.

Similarly, letting $h_k = \mathrm{vec}(H_k)$, the mode-2 formulation can be rewritten as a quadratic program in $h_k$, which has a closed-form solution.

4.4. $S_k$ subproblem. With the other parameters fixed, $S_k$ can be updated by solving the corresponding least-squares problem along the spectral mode.

4.5. $W_k^*$ subproblem. With the other parameters fixed, $W_k^*$ can be updated by solving a quadratic problem. Rewriting it in mode-1, where $Y^{(k,j)}_{(1)}$ and $C^{(k,j)}_{(1)}$ are the mode-1 unfolding matrices of the tensors $\mathcal{Y}^{(k,j)}$ and $\mathcal{C}^{(k,j)}$, respectively, and stacking them over $j$, the problem has a closed-form solution.

4.6. $H_k^*$ subproblem. With the other parameters fixed, $H_k^*$ can be updated similarly by rewriting the problem in mode-2; it also has a closed-form solution.
4.7. $P_1$ subproblem. With the other parameters fixed, $P_1$ can be updated by solving a least-squares problem, which has a closed-form solution.

4.8. $P_2$ subproblem. With the other parameters fixed, $P_2$ can be updated in the same way, again with a closed-form solution.

4.9. $\mathcal{M}^{(k)}$ subproblem. With the other parameters fixed, $\mathcal{M}^{(k)}$ can be updated by solving
$$\min_{\mathcal{M}^{(k)}} \mu \big\|M^{(k)}_{(4)}\big\|_* + \lambda_2 \big\|\mathcal{M}^{(k)} - \mathcal{L} - \mathcal{P}^{(k)}\big\|_F^2,$$
where $\mathcal{L}$ denotes the current estimate of $\mathcal{X}^{(k)}$. This problem has been proven to have the closed-form solution
$$M^{(k)}_{(4)} = V_1 \, S_{\lambda}(\Sigma) \, V_2^T,$$
where $V_1 \Sigma V_2^T$ is the SVD of $\mathrm{unfold}_4(\mathcal{L} + \mathcal{P}^{(k)})$ and $S_\lambda(\Sigma)$ is the soft thresholding operator on the diagonal matrix $\Sigma$ with parameter $\lambda$; the $ii$th entry of the diagonal matrix $S_\lambda(\Sigma)$ is defined as
$$\big(S_\lambda(\Sigma)\big)_{ii} = \max(\Sigma_{ii} - \lambda,\, 0).$$

4.10. $\mathcal{P}^{(k)}$ subproblem. The Lagrange multipliers are updated by
$$\mathcal{P}^{(k)} \leftarrow \mathcal{P}^{(k)} + \big(\mathcal{L} - \mathcal{M}^{(k)}\big).$$
Once the dictionaries $W_k$, $H_k$, $S_k$ and core tensors $\mathcal{C}^{(k)}$ ($k = 1, 2, \ldots, K$) are known, all overlapping HR-HSI cubes can be estimated using equation (6) and returned to their original places to reconstruct the target HR-HSI $\mathcal{X}$. We summarize the proposed algorithm in Algorithm 1.
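Several of the updates above ($W_k$, $H_k$, $W_k^*$, $H_k^*$, $P_1$, $P_2$) are linear least-squares problems with closed-form solutions via normal equations. A generic hedged sketch — the two-term objective $\min_W \lambda_1\|Z - WF\|_F^2 + \lambda_2\|O - WG\|_F^2$ and the matrices $F$, $G$ are illustrative stand-ins for the stacked unfoldings in the text:

```python
import numpy as np

rng = np.random.default_rng(0)
m, r, n1, n2 = 5, 3, 10, 12
W_true = rng.random((m, r))
F, G = rng.random((r, n1)), rng.random((r, n2))
Z, O = W_true @ F, W_true @ G              # noise-free synthetic observations
lam1, lam2 = 10.0, 1e-3

# normal equations: W (lam1 F F^T + lam2 G G^T) = lam1 Z F^T + lam2 O G^T
A = lam1 * (F @ F.T) + lam2 * (G @ G.T)
B = lam1 * (Z @ F.T) + lam2 * (O @ G.T)
W_hat = np.linalg.solve(A.T, B.T).T        # solves W A = B
print(np.allclose(W_hat, W_true))          # True
```

The same solution can equivalently be obtained by vectorizing with $\mathrm{vec}(AXB) = (B^T \otimes A)\mathrm{vec}(X)$, as the paper does; the matrix form above avoids forming the Kronecker product.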

Algorithm 1 (Blind Algorithm).
Input: LR-HSI $\mathcal{Y}$, HR-MSI $\mathcal{Z}$, $P_3$.
Output: HR-HSI $\mathcal{X}$.
1. Group the similar cubes of $\mathcal{Z}$ and $\mathcal{Y}$ into cluster tensors $\mathcal{Z}^{(k)}$ and $\mathcal{Y}^{(k)}$ by K-means++.
2. Initialize $W_k$, $H_k$ and $S_k$ by applying tensor factorization to $\mathcal{Z}^{(k)}$ and $\mathcal{Y}^{(k)}$, respectively; initialize $W_k^*$ and $H_k^*$ by averaging every $a$ rows ($a$ is the scaling factor) of $W_k$ and $H_k$.
3. Repeat: update $\mathcal{C}^{(k)}$, $W_k$, $H_k$, $S_k$, $W_k^*$, $H_k^*$, $P_1$, $P_2$, $\mathcal{M}^{(k)}$ and $\mathcal{P}^{(k)}$ by the subproblems of Section 4, until the stopping criterion is satisfied.
4. Estimate $\mathcal{X}^{(k)}$ by equation (6) and reassemble the cubes to obtain $\mathcal{X}$.
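The outer loop of Algorithm 1 — alternate the subproblem updates until a stopping criterion on the relative change is met — can be sketched generically. The contracting toy "update" below is a stand-in for the real subproblem solvers, and the tolerance value is an assumption:

```python
import numpy as np

def run_alternating(update_steps, init_vars, tol=1e-4, max_iter=50):
    """Generic alternating scheme: apply each update in turn until the
    relative change of the main variable falls below tol."""
    vars_ = dict(init_vars)
    for it in range(max_iter):
        prev = vars_['C'].copy()
        for step in update_steps:
            vars_ = step(vars_)
        rel = np.linalg.norm(vars_['C'] - prev) / max(np.linalg.norm(prev), 1e-12)
        if rel < tol:
            break
    return vars_, it

# toy demo: a single 'update' that contracts C toward a fixed point
target = np.ones((2, 2))
step = lambda v: {'C': 0.5 * (v['C'] + target)}
out, iters = run_alternating([step], {'C': np.zeros((2, 2))})
print(np.allclose(out['C'], target, atol=1e-3))   # True
```

In the actual algorithm, `update_steps` would hold the ten cluster-wise updates of Section 4 applied in sequence.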
Next, we state the convergence result of our algorithm for $\mathcal{C}^{(k)}$ in the following theorem. The basic convergence result can be found in several references, such as [27,34].
Theorem 4.1. Under suitable assumptions, the sequence generated by the above scheme converges to a minimizer of problem (13).

5. Numerical experiments. In this section, we numerically demonstrate the performance of the proposed blind HSI super-resolution model on simulated and real HSIs. In general, the number of clusters $K$ is set to $n/80$, with $n$ the total number of cubes. The sparsity regularization parameters $\lambda_c$ and $\mu$ of the proposed model are both tested over the range $10^d$, $d = -6, -5, -4, -3, -2, -1$, and we then choose $\lambda_c = 10^{-4}$ and $\mu = 10^{-3}$ for all the data for simplicity. In addition, the parameter $\lambda_1$ used to balance the two fidelity terms is empirically set to 10, and we simply set the algorithm parameter $\lambda_2 = 10^{-3}$. We compare the proposed method with several related HSI super-resolution methods: a subspace regularization method, HySure [24]; a coupled non-negative matrix factorization method, CNMF [37]; and a coupled blind tensor factorization method, STEREO [14]. For a fair comparison, all the methods are tested with the same given spectral downsampling matrix $P_3$, instead of estimating it as in HySure and CNMF. Moreover, the spatial downsampling location of HySure is given accurately, and the key parameters are optimally chosen in each method. All the experiments are run under Windows 7 and MATLAB R2017a with an Intel Core i5-5200U CPU@2.80GHz and 8GB memory.

5.1. Evaluation metrics. We adopt five quantitative metrics: peak signal-to-noise ratio (PSNR), structural similarity (SSIM), root mean square error (RMSE), spectral angle mapper (SAM) and erreur relative globale adimensionnelle de synthèse (ERGAS). PSNR is defined from the mean square error (MSE) as
$$\mathrm{PSNR} = \frac{1}{S} \sum_{k=1}^{S} 10 \log_{10} \frac{\max(X_k)^2}{\mathrm{MSE}(X_k, \hat{X}_k)},$$
where $X_k$ and $\hat{X}_k$ denote the $k$th band images of the ground-truth HSI $\mathcal{X} \in \mathbb{R}^{W \times H \times S}$ and the estimated HSI $\hat{\mathcal{X}} \in \mathbb{R}^{W \times H \times S}$, respectively, and $\mathrm{MSE}(X_k, \hat{X}_k)$ is the mean squared error between $X_k$ and $\hat{X}_k$. SSIM is used to predict the perceptual quality of the image. The SSIM of an HSI is designed as the sum over bands of
$$\mathrm{SSIM}_k = \frac{(2\mu_{X_k}\mu_{\hat{X}_k} + c_1)(2\sigma_{X_k \hat{X}_k} + c_2)}{(\mu_{X_k}^2 + \mu_{\hat{X}_k}^2 + c_1)(\sigma_{X_k}^2 + \sigma_{\hat{X}_k}^2 + c_2)},$$
where $\mu_{X_k}$ and $\mu_{\hat{X}_k}$ denote
the mean values of the elements in $X_k$ and $\hat{X}_k$, respectively; $\sigma_{X_k}^2$ and $\sigma_{\hat{X}_k}^2$ denote the variances of $X_k$ and $\hat{X}_k$, respectively; and $\sigma_{X_k \hat{X}_k}$ denotes the covariance between $X_k$ and $\hat{X}_k$. PSNR and SSIM are commonly used in imaging science to assess experimental results; the higher their values, the better the image quality. RMSE is used to measure the difference between $\mathcal{X}$ and $\hat{\mathcal{X}}$, and is defined as
$$\mathrm{RMSE} = \sqrt{\frac{\|\mathcal{X} - \hat{\mathcal{X}}\|_F^2}{WHS}}.$$
The lower the RMSE, the lower the reconstruction error. In addition, SAM is defined as
$$\mathrm{SAM} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \arccos \frac{\langle x_{ij:},\, \hat{x}_{ij:} \rangle}{\|x_{ij:}\|_2 \, \|\hat{x}_{ij:}\|_2},$$
where $x_{ij:}$ and $\hat{x}_{ij:}$ represent the spectral vectors at spatial pixel $(i,j)$ of $\mathcal{X}$ and $\hat{\mathcal{X}}$, respectively. SAM measures the spectral distortion between the estimated and ground-truth images, and small SAM values correspond to good spectral quality. ERGAS is defined as
$$\mathrm{ERGAS} = \frac{100}{a} \sqrt{\frac{1}{S} \sum_{k=1}^{S} \frac{\mathrm{MSE}(X_k, \hat{X}_k)}{\mu_{X_k}^2}},$$
where $a$ is the spatial downsampling factor; small ERGAS values indicate good fusion performance.
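The metrics above can be implemented in a few lines of NumPy. These are standard forms; the paper's exact normalizations (e.g. peak value and per-band averaging conventions) may differ slightly:

```python
import numpy as np

def psnr(X, Xh):
    """Mean per-band PSNR (peak taken as the band maximum of the reference)."""
    vals = []
    for k in range(X.shape[2]):
        mse = np.mean((X[:, :, k] - Xh[:, :, k]) ** 2)
        vals.append(10 * np.log10(X[:, :, k].max() ** 2 / mse))
    return np.mean(vals)

def rmse(X, Xh):
    return np.sqrt(np.mean((X - Xh) ** 2))

def sam(X, Xh, eps=1e-12):
    """Mean spectral angle (radians) over all pixels."""
    V, Vh = X.reshape(-1, X.shape[2]), Xh.reshape(-1, X.shape[2])
    cos = (V * Vh).sum(1) / (np.linalg.norm(V, axis=1) * np.linalg.norm(Vh, axis=1) + eps)
    return np.mean(np.arccos(np.clip(cos, -1.0, 1.0)))

def ergas(X, Xh, a):
    """Relative dimensionless global error; a is the spatial downsampling factor."""
    terms = [np.mean((X[:, :, k] - Xh[:, :, k]) ** 2) / (X[:, :, k].mean() ** 2 + 1e-12)
             for k in range(X.shape[2])]
    return (100.0 / a) * np.sqrt(np.mean(terms))

rng = np.random.default_rng(0)
X = rng.random((8, 8, 5)) + 0.5
print(rmse(X, X) == 0.0, sam(X, X) < 1e-5)   # True True
```

For identical inputs, RMSE, SAM and ERGAS all vanish (up to floating-point tolerance), which is a quick sanity check when wiring up an evaluation pipeline.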
5.2. Simulated HSI super-resolution. In this section, we test the effectiveness of the proposed method on the CAVE database [38], which contains 32 hyperspectral images, each with 31 spectral bands ranging from 400nm to 700nm at 10nm resolution and a spatial resolution of 512 × 512. We use the CAVE images as the ground truth $\mathcal{X}$, and simulate the LR-HSI $\mathcal{Y}$ by applying an 8 × 8 uniform blur or a 7 × 7 Gaussian blur with mean 0 and standard deviation 3 to $\mathcal{X}$ before downsampling with scaling factor 8 along each spatial direction. The HR-MSI $\mathcal{Z}$ is generated by degrading the hyperspectral image $\mathcal{X}$ using a spectral transform matrix $P_3$ based on the response of a Nikon D700 camera. The average values of each of the five quantitative measures for each compared HSI super-resolution method over all 32 scenes are listed in Table 1. It can be seen that the proposed method yields better quantitative results than all three other methods. For further comparison, we also show the SAM values of the 32 images with uniform blur for the different methods in Figure 2; the proposed method (red) has the lowest SAM values for all images in the CAVE database. In Figure 3, we compare visual performance on peppers (a cartoon-like image) and flowers (a textured image) from the CAVE dataset with Gaussian blur. The first and third columns of Figure 3 show the reconstructed images of the peppers at the 8th band and the flowers at the 14th band, respectively; their corresponding absolute error images are shown in the second and fourth columns, where blue corresponds to low error. We observe that all competing methods preserve spatial structures well, but the proposed method yields the lowest reconstruction errors and the best visual quality in both smooth areas and texture regions; more details can be observed in the zoomed-in patches in Figure 3. The proposed method thus performs better than the other tested methods both quantitatively and qualitatively. The quantitative results on the Indian Pines image are reported in Table 2.
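The degradation used to simulate the LR-HSI can be sketched as a separable spatial blur followed by decimation. The kernel size and σ follow the Gaussian case above; the zero-padding boundary handling and the toy image size are assumptions:

```python
import numpy as np

def gaussian_kernel(size=7, sigma=3.0):
    x = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur_downsample(X, kernel, factor=8):
    """Separable spatial blur (zero padding) then decimation, band by band."""
    out = np.empty_like(X)
    for k in range(X.shape[2]):
        tmp = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode='same'), 0, X[:, :, k])
        out[:, :, k] = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode='same'), 1, tmp)
    return out[::factor, ::factor, :]

rng = np.random.default_rng(0)
X = rng.random((64, 64, 4))                    # toy ground truth
Y = blur_downsample(X, gaussian_kernel(), factor=8)
print(Y.shape)                                 # (8, 8, 4)
```

The uniform-blur case is obtained by replacing the kernel with a constant `np.ones(8) / 8` vector.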
We can clearly see that the proposed method significantly outperforms the other methods with respect to all the quantitative measures, especially in the case of Gaussian blur. Figure 4 shows the reconstructed images and the corresponding absolute error images of all competing methods at the 60th and 130th bands (a bright one and a dark one). We observe that the proposed method again yields the best results among all tested methods. In addition, we report numerical evidence for the convergence of the proposed algorithm: the energy as a function of the iteration number for the first cluster with uniform blur is shown in Figure 5.
We can observe that it stabilizes within a few iterations.

In this section, we verify the robustness of the proposed model to Gaussian noise using the Pavia data [8]. The whole image contains 610 × 340 pixels and 115 spectral bands, acquired by the Reflective Optics System Imaging Spectrometer (ROSIS). In our experiments, a subimage of size 256 × 256 × 93 was chosen after removing the water vapor absorption bands. To generate the noisy LR-HSI $\mathcal{Y}$, the ground truth image $\mathcal{X}$ is first degraded by the uniform blur with scaling factor 8 as before, and Gaussian noise is then added. The noisy HR-MSI $\mathcal{Z}$ is simulated by degrading $\mathcal{X}$ using the IKONOS-like reflectance spectral response filter [33] and then adding Gaussian noise. We use SNR$_h$ and SNR$_m$ to denote the SNR of the simulated noisy LR-HSI and HR-MSI, respectively.

Figure 1. Illustration of the hyperspectral image super-resolution task.
Each cube of the HR-MSI $\mathcal{Z}$ has size $d_W \times d_H \times s$, with overlaps, where $d_W$ and $d_H$ denote the spatial width and height dimensions; the overlaps along the two spatial modes are $l_w = d_W/2$ and $l_h = d_H/2$, respectively; and $n = n_w n_h$ is the number of cubes, with $n_w = (W - l_w)/(d_W - l_w)$ and $n_h = (H - l_h)/(d_H - l_h)$. Similarly, we create another set of small 3D cubes $\{\mathcal{Y}_i\}_{i=1}^{n} \subset \mathbb{R}^{d_w \times d_h \times S}$ from the LR-HSI $\mathcal{Y}$ in the same way, where $d_w = d_W/a$, $d_h = d_H/a$ and $a$ is the scaling factor (in this work, we set $a = 8$, $d_W = d_H = 16$, $d_w = d_h = 2$). All the 3D cubes of the HR-MSI $\mathcal{Z}$ can then be grouped into $K$ clusters $\{\mathcal{Z}^{(k,j)}\}_{j=1}^{n_k}$.
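Plugging in the paper's settings, the cube-grid bookkeeping works out as follows (a quick arithmetic check of the formulas above, using the CAVE spatial size):

```python
# cube-grid bookkeeping from the formulas above
W = H = 512                      # CAVE spatial size
dW = dH = 16                     # HR cube size
lw, lh = dW // 2, dH // 2        # overlaps along the two spatial modes
nw = (W - lw) // (dW - lw)
nh = (H - lh) // (dH - lh)
n = nw * nh                      # total number of cubes
a = 8                            # scaling factor
dw, dh = dW // a, dH // a        # matching LR cube size
print(nw, nh, n, dw, dh)         # 63 63 3969 2 2
```

With 3969 cubes, the rule $K = n/80$ from the experiments section gives roughly 49 clusters for a CAVE image.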

Figure 2. The SAM values of all competing methods on the CAVE dataset with uniform blur.

Table 3 lists the quantitative results of all compared methods on the Pavia image under different Gaussian noise levels. The visual comparison results are shown in Figure 6. The first and second columns show the reconstructed images and the corresponding error images at the 45th band of the Pavia image with SNR$_h$ = 30dB and SNR$_m$ = 35dB, respectively. The third and fourth columns show the reconstructed images and the corresponding error images at the 75th band with SNR$_h$ = 35dB and SNR$_m$ = 40dB. It can be observed that the proposed model achieves lower reconstruction errors and better quantitative results than the other methods.

Figure 4. The reconstructed images and the corresponding error images of the Indian Pines image at the 60th and 130th bands with uniform blur and scaling factor 8.

Table 1. Comparison of the average values of each of the five quantitative measures on 32 scenes from the CAVE dataset with scaling factor 8.

Table 2. The performance comparison of the methods on the Indian Pines image with scaling factor 8.

Experimental results on remote sensing data corrupted by Gaussian noise.

Table 3. The performance comparison of noisy cases on the Pavia image with uniform blur and scaling factor 8.