LANTERN: learned analysis transform network for dynamic magnetic resonance imaging with a small dataset

This paper proposes to learn an analysis transform network for dynamic magnetic resonance imaging (LANTERN) with a small dataset. Integrating the strengths of CS-MRI and deep learning, the proposed framework is highlighted by three components: (i) the spatial and temporal domains are sparsely constrained by adaptively trained CNNs; (ii) we introduce an end-to-end framework to learn the parameters in LANTERN, which resolves the difficulty of parameter selection in traditional methods; (iii) compared with existing deep learning reconstruction methods, our method achieves better reconstruction accuracy when the amount of data is limited. Our model is able to fully exploit the spatial and temporal redundancy of dynamic MR images. We performed quantitative and qualitative analyses on cardiac datasets at different acceleration factors (2x-11x) and different undersampling patterns. Extensive experiments show that, in comparison with state-of-the-art methods, our method achieves consistently better reconstruction performance in terms of three quantitative metrics (PSNR, SSIM and HFEN) under different undersampling patterns and acceleration factors.


Introduction
Dynamic magnetic resonance imaging (MRI) is able to provide important anatomical and functional information in a spatial-temporal manner. However, a fundamental challenge of MRI is its slow imaging speed, i.e., long imaging time, which hinders its wide application. To address this challenge, researchers have devoted different efforts, ranging from hardware improvements to software developments, such as parallel imaging using phased array coils [1], fast imaging sequences [2], and reduced-scan techniques with advanced image reconstruction algorithms.
Our specific focus here is signal-processing-based MR image reconstruction from incomplete k-space data. Since the acquisition time is proportional to the amount of k-space data collected, this strategy accelerates the MR scan by undersampling, or partially sampling, k-space. Undersampling violates the Nyquist sampling theorem and may cause aliasing and blurring when the image is obtained by direct inverse Fourier transform [3], [4]. To solve these issues, prior knowledge is normally incorporated into the reconstruction formulation as regularization. Specifically, supported by the well-known compressed sensing (CS) theory, researchers have developed a series of dynamic image reconstruction methods exploiting spatial redundancy, temporal redundancy, or both, with different sampling patterns. For example, with a random k-t sampling pattern, k-t FOCUSS [5] was proposed, whose special cases include the celebrated k-t BLAST and k-t SENSE [6]. A k-t iterative support detection (k-t ISD) method was also proposed to improve CS dynamic MR imaging [7]. These methods, along with others, have explored different sparsifying transforms, such as total variation (TV) and wavelets in the spatial domain [8]-[10]; the Fourier transform [11], [12], finite differences [13], and principal component analysis [14], [15] in the temporal domain; and 3D transforms such as the wavelet-Fourier transform [16] or the 3D wavelet transform in the spatial-temporal domain [17]. Besides predefined transforms, dictionary learning has also been investigated. For example, temporal gradient sparsity was explored in [3] with an adaptively trained dictionary, and a patch-based 3D spatiotemporal dictionary was trained for sparse representation of the dynamic image sequence [18]. In addition to sparsifying transforms, low rank has also been utilized to complete missing or corrupted matrix entries.
Typical instances include the work of Bo Zhao et al., who proposed combining partially separable (PS) and sparsity constraints to improve MR reconstruction performance [19], as well as the L+S [20] and k-t SLR [13] methods. These methods have all made great contributions to dynamic MR imaging. Nevertheless, the prior knowledge utilized is still limited to a few samples or reference images [21]. Furthermore, the iterative reconstruction can be time-consuming, with parameters that are hard to tune.
Deep-learning-based MR image reconstruction is an emerging approach to accelerating MR scans. Model-based deep learning methods formulate the regularized iterative reconstruction process as a network learning process; typical examples include the variational network (VN-net) [22], the alternating direction method of multipliers network (ADMM-net) [23], and MoDL [24], etc. [21]. In addition to model-based methods, there are direct end-to-end learning techniques that identify the mapping between undersampled and fully sampled pairs. Instances include AUTOMAP [25], U-net [26], KIKI-net [27], the recursive dilated network [28], and so forth [29]. Among all of them, only a few end-to-end learning networks target dynamic MR imaging [4], [30]. These works directly learn the mapping relationship and have shown great experimental results. Nevertheless, their explanation is mostly empirical and does not take advantage of the theoretically explainable compressed sensing framework.
To bridge the gap between model-based dynamic imaging and empirical direct mapping learning, this work proposes a convolutional analysis transform network for dynamic magnetic resonance imaging, dubbed LANTERN. Specifically, we initialize the analysis filters with the discrete cosine transform (DCT) in the spatial domain and total variation (TV) in the temporal domain to fully exploit the redundancy of dynamic image sequences. To optimize the model, we use the alternating direction method of multipliers (ADMM), an effective variable-splitting method. Nevertheless, such iterative reconstruction is normally time-consuming, and its parameters have to be hand-tuned. Inspired by convolutional neural networks, we therefore use a CNN to learn all the parameters in the above formulation and then apply the trained model to previously unseen data. We only collect part of the k-space data for reconstruction, which reduces the acquisition time; moreover, our model is trained offline, so only a short reconstruction time is required. Our experiments demonstrate that the proposed scheme can effectively reconstruct dynamic MR images with high accuracy and fast speed. Compared with the state-of-the-art methods D5C5 [4] and k-t SLR [13], our method presents superior performance in both quantitative and qualitative analyses.

Dynamic imaging model
In dynamic MR imaging, the measured signal $y_t \in \mathbb{C}^{M}$ at time $t$ can be described as

$$y_t = A_t x_t + n_t,$$

where $A_t \in \mathbb{C}^{M \times N}$ is the measurement matrix and $n_t \in \mathbb{C}^{M}$ is the measurement noise for the $t$-th vectorized cardiac phase image $x_t \in \mathbb{C}^{N}$. Here $A_t = P_t F_{2D}$; $P_t$ is an $M \times N$ undersampling matrix whose rows are extracted from an $N \times N$ identity matrix according to the k-space sampling locations at time $t$ ($M \ll N$), and $F_{2D} \in \mathbb{C}^{N \times N}$ is the unitary matrix representing the 2D Fourier transform. Suppose a total of $Q$ cardiac phases are acquired; the entire acquisition process can then be described as

$$y = A x + n, \qquad A = \mathrm{diag}\{A_1, A_2, \ldots, A_Q\}.$$

Introducing an auxiliary variable $z$ for the analysis transform $D$ and the regularization function $\Phi(\cdot)$, the reconstruction can be written as the constrained problem

$$\min_{x,z} \; \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \Phi(z) \quad \text{s.t.} \quad z = Dx. \tag{4}$$

Adopting the augmented Lagrangian technique, the constrained problem in (4) can be transformed into the following unconstrained one:

$$L(x, z, \beta) = \tfrac{1}{2}\|Ax - y\|_2^2 + \lambda \Phi(z) + \tfrac{\rho}{2}\|Dx - z + \beta\|_2^2, \tag{5}$$

where $\beta$ denotes the (scaled) Lagrangian multipliers and $\rho$ represents the scaling factor. The above optimization problem can be further divided into three subproblems by the alternating direction method of multipliers:

$$\begin{cases} x^{(n)} = \arg\min_x \tfrac{1}{2}\|Ax - y\|_2^2 + \tfrac{\rho}{2}\|Dx - z^{(n-1)} + \beta^{(n-1)}\|_2^2, \\[2pt] z^{(n)} = \arg\min_z \lambda \Phi(z) + \tfrac{\rho}{2}\|Dx^{(n)} - z + \beta^{(n-1)}\|_2^2, \\[2pt] \beta^{(n)} = \beta^{(n-1)} + Dx^{(n)} - z^{(n)}. \end{cases} \tag{6}$$

2) Alternating direction minimization algorithm

(a) Subproblem $x$. Adopting least squares to solve the first equation in (6), we have

$$x^{(n)} = \left(A^H A + \rho D^H D\right)^{-1}\left(A^H y + \rho D^H\left(z^{(n-1)} - \beta^{(n-1)}\right)\right).$$

Then, further letting $x = F^H f$, with $P = \mathrm{diag}\{P_1, P_2, \ldots, P_t, \ldots, P_Q\} \in \mathbb{R}^{QM \times QN}$ and $F = \mathrm{diag}\{F_{2D}, F_{2D}, \ldots, F_{2D}\} \in \mathbb{C}^{QN \times QN}$, we have the solution

$$x^{(n)} = F^H \Lambda^{-1}\left(P^T y + \rho F D^H\left(z^{(n-1)} - \beta^{(n-1)}\right)\right), \qquad \Lambda = P^T P + \rho F D^H D F^H,$$

where $\Lambda$ is a diagonal matrix that can be quickly calculated, since the convolutional transform $D$ is diagonalized by the Fourier transform.

(b) Subproblem $z$. For the update of $z$, we adopt the gradient descent method with step size $\eta$:

$$z^{(n,k)} = z^{(n,k-1)} - \eta\left[\lambda\,\mathcal{H}\left(z^{(n,k-1)}\right) + \rho\left(z^{(n,k-1)} - Dx^{(n)} - \beta^{(n-1)}\right)\right],$$

where $\mathcal{H}$ denotes the derivative of the prior regularization function $\Phi$. Therefore, we have

$$z^{(n,k)} = \mu_1 z^{(n,k-1)} + \mu_2\left(Dx^{(n)} + \beta^{(n-1)}\right) - \tilde{\lambda}\,\mathcal{H}\left(z^{(n,k-1)}\right), \tag{7}$$

where $\mu_1 = 1 - \eta\rho$, $\mu_2 = \eta\rho$, and $\tilde{\lambda} = \eta\lambda$. Rewriting formula (7) in layered form, we split it into the $D$ layer, the $H_1$ layer, the $S$ layer, and the $H_2$ layer. In particular, we treat the filter $D$ as a convolution kernel, and $\mathcal{H}$ is approximated by learning a piecewise linear function $S(\cdot)$.
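To make the three updates in (6) concrete, the following is a minimal NumPy sketch for a single 2D frame. It is illustrative only: a circular horizontal finite difference stands in for the learned analysis filters $D$, the prior derivative $\mathcal{H}$ is taken as a smoothed sign function (an $\ell_1$ prior), and $\rho$, $\lambda$, $\eta$ are hand-picked rather than learned as in LANTERN.

```python
import numpy as np

def fft2c(x):
    """Orthonormal 2D Fourier transform (the F_2D of the model)."""
    return np.fft.fft2(x, norm="ortho")

def ifft2c(k):
    return np.fft.ifft2(k, norm="ortho")

# Analysis transform D: circular horizontal finite difference, standing in
# for the learned convolution filters; Dh is its adjoint.
def D(x):
    return x - np.roll(x, 1, axis=1)

def Dh(z):
    return z - np.roll(z, -1, axis=1)

def admm_recon(y, mask, rho=1.0, lam=0.05, eta=0.5, n_iter=30, k_iter=3):
    """Run the three ADMM updates in (6) for one 2D frame.

    y    : zero-filled k-space measurements (mask * fft2c(x_true))
    mask : binary sampling pattern P^T P laid out on the k-space grid
    """
    ny, nx = mask.shape
    # Fourier eigenvalues of D^H D (D is circulant, so it diagonalizes under F)
    h = np.zeros((ny, nx)); h[0, 0] = 1.0; h[0, 1] = -1.0
    dhat2 = np.abs(np.fft.fft2(h)) ** 2
    mu1, mu2, lam_t = 1.0 - eta * rho, eta * rho, eta * lam

    denom = mask + rho * dhat2
    denom[denom == 0] = 1.0   # unsampled freqs in D's null space stay zero

    x = ifft2c(y)             # zero-filled initialization
    z = np.zeros_like(x)
    beta = np.zeros_like(x)
    for _ in range(n_iter):
        # (a) x-subproblem: exact least-squares solve in the Fourier domain
        num = mask * y + rho * fft2c(Dh(z - beta))
        x = ifft2c(num / denom)
        # (b) z-subproblem: gradient descent; H(z) ~ derivative of l1 prior
        for _ in range(k_iter):
            Hz = z / (np.abs(z) + 1e-8)
            z = mu1 * z + mu2 * (D(x) + beta) - lam_t * Hz
        # (c) multiplier update
        beta = beta + D(x) - z
    return x
```

On a piecewise-constant phantom with a column-wise undersampling mask, this sketch improves on the zero-filled reconstruction; in LANTERN, the hand-set filters, $\mathcal{H}$, and scalars are replaced by trained parameters.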

LANTERN network architecture
In order to exploit the extensive temporal and spatial redundancy of dynamic magnetic resonance imaging, we propose the LANTERN architecture, loosely inspired by [23], which adaptively learns sparse convolution kernels and regularization parameters through convolutional neural networks without manual adjustment. The flow of the network is shown in Fig. 1(A). The input is undersampled k-space data; after N iterations, a reconstructed image is obtained. The reconstructed image and the label are used to calculate the mean square error, and backpropagation is then used to update the parameters in the network. In this way, a high-quality, artifact-free image close to the ground truth can be obtained. Fig. 1(B) shows the specific process of the nth iteration, with each layer corresponding to the forward propagation formulas below; the detailed process of the S layer is described in Fig. 1(C). The S layer includes H1, S and H2, where H1 and H2 learn filter operators that make the image sparse, and a piecewise linear function is used to approximate the derivative of the regularization function. Here, n denotes the nth iteration out of a total of N iterations, K denotes that the prior loop runs K times, and (n, k) means the kth prior loop within the nth iteration.
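The piecewise linear approximation of the prior derivative can be sketched as follows; the control-point positions are fixed, and only the values at those points are learned during training. This is an illustrative NumPy version (the grid of 11 points on [-1, 1] and the identity initialization are our assumptions, not the paper's exact settings).

```python
import numpy as np

# Fixed control-point positions; the values at these points are the
# learnable parameters of the S layer (identity initialization here).
positions = np.linspace(-1.0, 1.0, 11)
values = positions.copy()

def S(z, values):
    """Piecewise linear function approximating H, the derivative of the
    regularization function: linear interpolation between control points
    (np.interp clamps outside the grid in this sketch)."""
    return np.interp(z, positions, values)
```

With the identity initialization, S(z) = z on [-1, 1]; training bends the curve toward the derivative of whatever prior best fits the data (e.g., a soft-threshold-like shape for an l1 prior).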

1) The formula for the propagation of each layer is as follows
In Fig. 1(B) and (C), X stands for the reconstruction layer $X^{(n)}$; its inputs are the undersampled k-space data $y$ together with $z^{(n-1)}$ and $\beta^{(n-1)}$. A stands for the addition layer $A^{(n,k)}$, which performs a simple weighted summation; its inputs are $z^{(n,k-1)}$ and $x^{(n)}$. H1 and H2 stand for convolution layers, whose inputs are $z^{(n,k-1)}$ and $S^{(n,k)}$, respectively. We use 3D convolution in the spatial domain, mainly for feature extraction; however, the parameter count of 3D convolution is relatively large, so to reduce it we use 2D convolution for the temporal TV filter.

2) The process of back-propagation parameter update is as follows. As shown in Fig. 1, in back-propagation the gradients are calculated in the reverse order of the forward pass: starting from the gradient of the loss layer, the gradient of each layer with respect to its parameters (with N the total number of iterations) is computed to update them, and the gradient with respect to each layer's inputs is also computed and propagated to the preceding layer.
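As one concrete piece of the backward pass, the gradient of the S layer with respect to its learnable control-point values is simply the set of linear-interpolation weights. The sketch below assumes a fixed grid of control points on [-1, 1] (our illustrative choice, not the paper's exact setting) and verifies the hand-derived Jacobian against finite differences, the standard sanity check for layer gradients.

```python
import numpy as np

positions = np.linspace(-1.0, 1.0, 11)   # fixed control-point grid (assumed)

def S(z, values):
    """Forward pass of the piecewise linear S layer."""
    return np.interp(z, positions, values)

def S_grad_values(z):
    """Backward pass w.r.t. the learnable values: dS(z_j)/dv_i are the
    interpolation weights, a sparse (len(z) x len(positions)) Jacobian.
    Valid for z inside the control-point grid."""
    idx = np.clip(np.searchsorted(positions, z) - 1, 0, positions.size - 2)
    w = (z - positions[idx]) / (positions[idx + 1] - positions[idx])
    J = np.zeros((z.size, positions.size))
    J[np.arange(z.size), idx] = 1.0 - w
    J[np.arange(z.size), idx + 1] = w
    return J
```

The same pattern, analytic gradients checked against finite differences, applies to the gradients propagated to each layer's inputs.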

Dataset
We collected 101 fully sampled dynamic cardiac MR datasets using a 3T MRI system (SIEMENS MAGNETOM Trio) with a T1-weighted cine FLASH sequence. The scan parameters were: TR/TE = 2.58/49.14 ms, number of slices = 25, slice thickness = 8 mm, FOV = 280 mm, spatial resolution = 1.5 mm, and sampling matrix size = 192×100. We cut all 3D cardiac data into 126×126×16 volumes and performed Fourier transforms to obtain the k-space data. By applying the corresponding masks, images with different undersampling patterns were obtained (see Table 1 for details). Finally, 150 cardiac data volumes were generated, with 100 for network training and 50 for network testing.
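For illustration, undersampled acquisitions of the kind described above can be simulated as follows. The exact line-selection rule and the number of fully sampled center lines here are our assumptions, not the paper's mask files.

```python
import numpy as np

def random_1d_mask(ny, nx, accel=4, center=8, seed=0):
    """1D random Cartesian mask: keep nx // accel phase-encoding lines,
    always including `center` central lines (illustrative settings)."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((ny, nx))
    n_lines = nx // accel
    cen = np.arange(nx // 2 - center // 2, nx // 2 + center // 2)
    rest = rng.choice(np.setdiff1d(np.arange(nx), cen),
                      size=max(n_lines - center, 0), replace=False)
    mask[:, np.concatenate([cen, rest])] = 1.0
    return mask

# Simulate undersampled acquisition of one frame (mask is defined on the
# fftshifted k-space grid, i.e., DC at the center).
frame = np.ones((126, 126))
mask = random_1d_mask(126, 126, accel=4)
kspace = mask * np.fft.fftshift(np.fft.fft2(frame, norm="ortho"))
zero_filled = np.fft.ifft2(np.fft.ifftshift(kspace), norm="ortho")
```

Note that keeping 126 // 4 = 31 of 126 lines gives a net acceleration factor slightly above 4, which is why the paper reports net acceleration factors.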

Implementation
Considering that D5C5 is a method based on big datasets, training it with only 100 data volumes does not yield good reconstruction results, which our experiments also verified. Therefore, for the D5C5 method, we cut our data into 126×126×16 volumes to obtain 3200 data volumes, used 2900 for training and 300 for testing, and then calculated the average NMSE, PSNR and SSIM. The size of the applied filter is 3×3×9, where the first eight (3×3×8) are DCT filters and the last one (3×3×1) is a TV filter. The offline model training took 45 hours on an Intel Xeon(R) CPU E5-2640 v4 @ 2.40 GHz × 40 with 64 GB memory. Table 1 shows the experimental setup of this paper, with a maximum acceleration factor of 11x in the 1D random undersampling mode and a maximum acceleration factor of 15x in the 2D radial undersampling mode. It is worth noting that the acceleration factors in the table are net acceleration factors. Figure 2 and Figure 3 show the performance of the proposed method when the number of training images is increased from 50 to 120, and the average reconstruction metrics over the 50 test data are shown in Table 2. The proposed method does not require a large amount of data and can reconstruct good-quality images even with few training samples. The results show that the reconstruction is best when the amount of training data is 100; we therefore selected 100 images for the subsequent experiments.

Fig. 3. Comparison of the average PSNR over the 50 test data for different amounts of training data, based on 1D random sampling at an acceleration factor of 4.
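The DCT part of this initialization can be generated as follows: a minimal sketch in which the eight non-DC atoms of the 3×3 2D DCT-II basis serve as the spatial filters and a two-tap finite difference serves as the temporal TV filter (the exact arrangement into the 3×3×9 kernel is our reading of the text, not a verified detail).

```python
import numpy as np

def dct2_basis(k=3):
    """Return the k*k atoms (each k x k) of the orthonormal 2D DCT-II basis,
    built as outer products of the rows of the 1D DCT-II matrix."""
    n = np.arange(k)
    C = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * k))
    C[0] *= 1 / np.sqrt(k)
    C[1:] *= np.sqrt(2 / k)   # orthonormalize the rows
    return np.einsum('ia,jb->ijab', C, C).reshape(k * k, k, k)

atoms = dct2_basis(3)             # 9 atoms; atoms[0] is the DC atom
spatial_init = atoms[1:]          # eight non-DC DCT filters (the 3x3x8 part)
tv_init = np.array([1.0, -1.0])   # temporal finite-difference (TV) filter
```

Dropping the DC atom keeps only zero-mean filters, so every initial filter responds to structure rather than to the image's mean intensity.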

Effect of initializations: Random Gauss, DCT, DCT+TV
We initialized the proposed model with random Gaussian filters, DCT, and DCT+TV, and compared the experiments. Figure 4 shows the visual reconstruction results under these initialization methods. It can be seen that the reconstruction initialized with Gaussian noise is the worst, while the reconstruction under DCT+TV initialization is slightly better than that under DCT, and the PSNR value of DCT+TV is also the highest. Table 3 lists the average quantitative metrics of the reconstruction results with 100 training data under 1D random 4x acceleration, which also shows that DCT+TV initialization is superior to the other two initialization methods.

Comparison with state-of-the-art methods
To further verify the feasibility of the proposed model-based dynamic MR image reconstruction algorithm, we compare the performance of the proposed method with the compressed-sensing-based k-t SLR technique and the data-driven D5C5 algorithm. The results for 1D random 4x and 5x acceleration factors in Figure 5 and Figure 6 show that the images reconstructed by k-t SLR are somewhat blurred. The visual quality of the D5C5 reconstruction is close to that of the proposed method, but the proposed method achieves the highest PSNR. Table 4 compares the quantitative metrics at 4x, 7x, and 11x acceleration factors under 1D random sampling; the proposed method performs best on all metrics. We also plotted the quantitative metrics of the reconstruction results from 2x to 11x acceleration factors, as shown in Figure 7.
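For reference, PSNR, the primary metric in these comparisons, can be computed as below. Taking the peak as the maximum magnitude of the reference image is a common convention and an assumption here, since the paper does not spell out its exact normalization (SSIM and HFEN require structural and Laplacian-of-Gaussian filtering, respectively, and are omitted from this sketch).

```python
import numpy as np

def psnr(ref, rec):
    """Peak signal-to-noise ratio in dB between a fully sampled reference
    and a reconstruction (peak = max magnitude of the reference)."""
    mse = np.mean(np.abs(ref - rec) ** 2)
    return 10.0 * np.log10(np.max(np.abs(ref)) ** 2 / mse)
```

Higher is better; a reconstruction with smaller pointwise error always scores a higher PSNR under this definition.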
The performance of all methods declines as the acceleration factor increases; however, our approach consistently maintains the best performance. In addition, we also conducted an experimental comparison on the radial undersampling trajectory. Table 5 shows the average quantitative metrics of the test data with radial sampling at 7x, 11x, and 15x acceleration factors. Our method is far superior to the comparison algorithms in PSNR, SSIM and HFEN. The comparison algorithms D5C5 and k-t SLR each have their own advantages: the PSNR and HFEN values of D5C5 are better than those of k-t SLR, whereas the SSIM value of k-t SLR is relatively better. We also give a visual comparison of the reconstruction results with radial sampling at the 11x acceleration factor. The reconstructions of the k-t SLR and D5C5 algorithms are both blurry, while the proposed method is closest to the original image, as can also be seen from the error maps. Fig. 9 shows the variation of the average quantitative metrics of the test data for the compared methods as the acceleration factor increases; the results are consistent with those in Table 5.

Convergence analysis
The results in the previous section demonstrate that our method has the best test performance.
Because of the small number of training samples, over-fitting could occur. To show that our method converges without over-fitting, we give the training and validation error curves of the proposed model. As shown in Figure 10, a total of 400 epochs were trained, and both the training and validation errors decrease throughout, indicating that the proposed method does not over-fit.

Discussion
Our method simultaneously imposes sparse constraints on the spatial and temporal domains of dynamic magnetic resonance images and uses an end-to-end framework to learn the parameters. Comparing with state-of-the-art dynamic MR image reconstruction algorithms, as shown in Figures 4, 5 and 7, the proposed method reconstructs visually clearer results and recovers image details more accurately. Under the 4x acceleration factor with 1D random sampling, the comparison algorithm D5C5 is slightly better than k-t SLR, and under the 5x acceleration factor with 1D random sampling, D5C5 is significantly better than k-t SLR. Under the 11x acceleration factor with radial sampling, the two comparison algorithms each have advantages and disadvantages: although D5C5 achieves a higher PSNR, it over-smooths the images, while k-t SLR contains some noise-like artifacts. In comparison, the reconstruction accuracy of the proposed method is higher. Figures 6 and 8 further demonstrate that the proposed method is superior to the comparison algorithms in terms of the various quantitative metrics. Therefore, thanks to the sparse constraints in space and time and the combination of traditional constrained methods with deep learning, the proposed algorithm improves reconstruction quality and compensates for the limitation of pure deep learning algorithms that require big data: we can train a better reconstruction model with less sample data than the D5C5 algorithm, which requires a large amount of data. Regarding reconstruction time, the proposed method reconstructs an image in less than 3 s, the reconstruction time of k-t SLR is about 200 s, and that of D5C5 is within one second. Although our method is a little slower than D5C5, it is still very fast.

Conclusion
This work proposed a model-based convolutional dynamic MR imaging framework. The framework is able to fully exploit the spatial and temporal redundancy of dynamic MR images. The experimental results have shown that the proposed method can reconstruct better MR images than state-of-the-art algorithms in a shorter time under the same acceleration factors.