AN AUGMENTED LAGRANGIAN METHOD FOR SOLVING A NEW VARIATIONAL MODEL BASED ON GRADIENTS SIMILARITY MEASURES AND HIGH ORDER REGULARIZATION FOR MULTIMODALITY REGISTRATION

Abstract. In this work we propose a variational model for multi-modal image registration. It minimizes a new functional that uses reformulated normalized gradients of the images as the fidelity term and higher-order derivatives as the regularizer. We first present a theoretical analysis of the proposed model. Then, to solve the model numerically, we use an augmented Lagrangian method (ALM) to reformulate it into a few more amenable subproblems (each giving rise to an Euler-Lagrange equation that is discretized by finite difference methods) and solve the main linear systems iteratively by the fast Fourier transform; a multilevel technique is employed to speed up the initialisation and to avoid likely local minima of the underlying functional. Finally we show the convergence of the ALM solver and give numerical results for the new approach. Comparisons with some existing methods are presented to illustrate its effectiveness and advantages.


1. Introduction. Image registration consists in finding a reasonable spatial geometric transformation between two given images of the same object taken at different times or acquired with different devices. It is a challenging task required in diverse fields such as astronomy, optics, biology, chemistry and remote sensing, and particularly in medical imaging. For an overview of image registration methodology and approaches, we refer to [11,23,24,31]. Here, we focus on deformable image registration for multi-modality images using variational approaches, which belong to the class of widely used methods ([2,5,7,14,21,22,36]), and aim to find a better gradient-based model than the standard gradient models.
It is informative to introduce the notation of image registration modelling by considering a pair of mono-modal images: given a fixed image (also called the reference) and a moving image (also called the template), represented by scalar functions T, R : Ω ⊂ R^d → R, find a reasonable geometric transformation ϕ(u)(x) = x + u(x), with displacement u : R^d → R^d, such that

(1) T[ϕ(u)] = T(x + u(x)) = R.
This is an equation for the unknown displacement field u, which is sought in a properly chosen function space. The reconstruction model (1) is an ill-posed inverse problem and thus regularisation techniques are needed to overcome the ill-posedness. Generally, a regularisation technique turns an ill-posed problem such as (1) into a well-posed one which minimizes an energy comprised of a regularisation term (mostly a semi-norm of a function space that is fixed a priori) and a data fidelity term. In summary, the desired displacement u, in some appropriate space H, is a minimizer of the following joint energy functional:

(2) min_{u ∈ H} J(u) = S(u) + λ D(T(u), R).

This model may be used for registering both mono-modality and multi-modality images.
Here in (2), the first term S(u) is a regularisation term which controls the smoothness of u and reflects our expectations by penalising unlikely transformations. Many works have tackled the question of how to choose the "best" regularisation term, i.e. the one that gives the most plausible transformation. Various regularizers have been proposed, such as first-order models based on total variation [4,16], diffusion [9] and elastic regularization, and higher-order models based on linear curvature [10], mean curvature [6] and Gaussian curvature [17]; we also refer to [5,22,39,40,41].
The second term D(T(u), R) is a fidelity measure, which quantifies the distance or similarity between the transformed template image T(u) and the reference R, whereas λ is a positive weight which controls the trade-off between the two terms. In the case of mono-modal images, the fixed and the moving images have similar features and the same intensity range. Thus, either the L¹-distance (Sum of Absolute Differences) D = ‖T − R‖₁ or the well-known L²-distance (Sum of Squared Differences) between R and T(u), i.e. D = ‖T(u) − R‖₂² = ∫_Ω (T(u) − R)² dx, may be used as a similarity measure. Clearly such a measure only makes sense for mono-modal images.
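For concreteness, the two mono-modal measures can be written in a few lines of NumPy (a minimal sketch; discrete sums stand in for the integrals over Ω):

```python
import numpy as np

def sad(T, R):
    """Sum of Absolute Differences: the L1 distance ||T - R||_1."""
    return np.abs(T - R).sum()

def ssd(T, R):
    """Sum of Squared Differences: the L2 distance ||T - R||_2^2."""
    return ((T - R) ** 2).sum()

# A shifted copy of a binary square: 8 pixels differ, so both measures give 8.
R = np.zeros((8, 8)); R[2:6, 2:6] = 1.0
T = np.roll(R, 1, axis=0)
print(ssd(R, R), ssd(T, R))   # 0.0 8.0
```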
For a pair of multi-modal images T, R (generated by independent imaging techniques), unfortunately, one cannot minimize ‖T − R‖ since the values of T, R are not directly comparable. That is, only the patterns of T, R bear some resemblance to each other, not their (so-called intensity) values. Hence the intensities of the same object in different images are not similar, which makes the problem much harder than the mono-modality case, and many good models, such as those from [23] for mono-modal images and the elegant mathematical approach of optimal transport [8], cannot be used. For multi-modal images, various similarity measures have been proposed, including Mutual Information [20,26,33] and the Normalised Gradient Field [15,18,30]. Recently [3] proposed a cross-correlation similarity measure based on reproducing kernel Hilbert spaces and found advantages over Mutual Information. Below, we briefly review the two commonly used measures: mutual information and normalized gradient fields.
Mutual Information (MI). This measure originates from information theory and was first proposed for registration in [33]. Several variants of the MI approach have been proposed in recent years (see [20,26]), showcasing its great capability as well as its limitations. The basic idea behind MI is to compare the histograms of the two images instead of comparing their intensities. The mutual information between the two images is given by the quantity

(3) MI(T(u), R) = ∫ p_{T,R}(t, r) log( p_{T,R}(t, r) / (p_T(t) p_R(r)) ) dt dr,

where p_T, p_R are the probability distributions of the gray values in T and R, whereas p_{T,R} is the joint probability of the gray values, which can be derived from the joint histogram. As the MI measure involves histograms, its inherent disadvantages are how to choose the size of the bins and how to remedy the lack of spatial relationships to avoid mis-registrations. In addition, the measure also fails when features with different intensities in the first image have similar intensities in the second one [19], which is the case in perfusion imaging.

Normalised Gradient Field (NGF). The basic idea of the Normalised Gradient Field (NGF) [15,18,30] is to use information derived from the image intensity, namely the gradient. Similarity measures depending on the gradients or the geometry of the images, which naturally encode information about shape, can perform better. The key idea behind the NGF measure is to align the gradients ∇T(u) and ∇R by minimizing the cosine distance between them. More precisely, at each point x ∈ Ω one tries to find a displacement u(x) such that cos Θ = 1, where Θ is the angle between ∇T(x + u(x)) and ∇R(x). Therefore, the NGF approach consists in minimizing the energy

(4) D_NGF(T(u), R) = ∫_Ω 1 − (∇_n T(u) · ∇_n R)² dx,

where ∇_n T(u) = ∇T(u)/|∇T(u)| and ∇_n R = ∇R/|∇R| are normalised unit vectors. As the NGF uses the scalar product between the two vectors ∇_n R and ∇_n T(u), it will not work well when the gradients are null or very weak.
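A minimal NumPy sketch of the NGF distance (4); the regularizing constant eps in the gradient normalization is our implementation assumption (practical NGF variants use a similar edge parameter):

```python
import numpy as np

def normalized_gradient(I, eps=1e-3):
    """Pointwise unit gradient field of image I; eps guards the division
    where the gradient vanishes (the value of eps is an assumption)."""
    gy, gx = np.gradient(I)                 # derivatives along rows, columns
    norm = np.sqrt(gx**2 + gy**2 + eps**2)
    return gx / norm, gy / norm

def d_ngf(T, R):
    """Discrete NGF distance: sum over pixels of 1 - (n_T . n_R)^2."""
    tx, ty = normalized_gradient(T)
    rx, ry = normalized_gradient(R)
    dot = tx * rx + ty * ry
    return float((1.0 - dot**2).sum())

# A monotone intensity remap leaves the measure near zero (parallel
# gradients), while rotating the pattern by 90 degrees does not.
R = np.outer(np.arange(8.0), np.ones(8))    # vertical intensity ramp
print(d_ngf(3.0 * R + 5.0, R) < 0.01, d_ngf(R.T, R) > 10.0)   # True True
```

The first comparison illustrates why NGF suits multi-modal data: it is invariant to monotone intensity remappings that change values but not edge directions.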
In other words, suppose that in a large region of the image we have ∇T ⊥ ∇R; then 1 − (∇_n T · ∇_n R)² ≈ 1, which means that solving the optimization problem (2) amounts to only smoothing the deformation u in this region, whereas the similarity measure plays no role in the energy; this is not reliable. As an example, consider the images in Fig. 1(a-b), where ∇_n T · ∇_n R = 0 a.e. in Ω because one of ∇_n T, ∇_n R is zero. If we use the NGF in (2), there is no change in the template image for the reason just mentioned, so the T(u) obtained using the NGF measure, shown in Fig. 1(c), is not correct. Denoting by #N the ratio of the number of pixels where ∇_n T · ∇_n R = 0 over the total number of pixels, we have observed that the current NGF does not give a good registration result if #N ≥ 25%. In this work, believing in the elegance of geometric fitting, we aim to improve the above NGF for these cases. We are primarily motivated to explore the potential of normalised gradients beyond their standard form. Our question is whether or not a better normalised-gradient-based model than the well-known form [15,18,30] is possible.
The outline of the paper is as follows. In Section 2, we propose our variational model, which minimizes an energy with new similarity measures, and we prove by variational techniques the existence of a minimizer. Section 3 is dedicated to the numerical solution of the proposed model by an augmented Lagrangian approach and to the analysis of convergence. Finally, Section 4 concerns the implementation and presents several numerical examples testing the efficiency and robustness of the proposed approach.
2. The new multi-modality model. Since our formulation consists of two building blocks, a similarity measure D and a regularization term S, we now discuss our choices of regularizer and distance measure. Because our emphasis is on the latter, almost all regularizers suitable for variational registration models of mono-modal images may also be used here.

Figure 1. Example of reference and template images where ∇_n T · ∇_n R = 0 (or one of ∇_n T, ∇_n R is zero) a.e. in Ω.
Choice of similarity measure. To motivate our proposed measure D, consider the NGF example in Fig. 1. For this specific example, note that wherever ∇_n T · ∇_n R = 0 we still have ∇_n T − ∇_n R ≠ 0. This suggests a revised NGF model with a new measure D_GF replacing the standard NGF measure D_NGF:

    D_GF(T(u), R) = ∫_Ω |∇_n T(u) − ∇_n R|² dx.

As expected, such a model can solve the example of Fig. 1(a-b), with an acceptable registration result shown in Fig. 1(d). This suggests that a better choice of normalised gradients as a similarity measure is possible in the multi-modal registration scenario. Moreover, to reinforce this idea, we use Fig. 2 to show that aligning two vectors X = ∇T, Y = ∇R, from a large discrepancy on the left to a small discrepancy on the right, amounts to minimizing the distance |X| + |Y| − |X + Y| (which, like D_NGF, penalizes misalignment of the angle θ(X, Y)). Below we shall combine the ideas of minimizing both |X − Y| and |X| + |Y| − |X + Y|.

Figure 2. Three examples of the triangle inequality for triangles with sides X, Y and Z. The left example shows a case where |Z| is much less than the sum |X| + |Y| of the other two sides, and the right example shows a case where |Z| is only slightly less than |X| + |Y|.
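The link between the distance |X| + |Y| − |X + Y| and the angle θ(X, Y) can be made explicit via the law of cosines; a short derivation:

```latex
% Law of cosines for |X + Y|:
%   |X + Y|^2 = |X|^2 + |Y|^2 + 2|X||Y|\cos\theta(X,Y),
% hence
\[
d(X,Y) := |X| + |Y| - |X + Y|
        = |X| + |Y| - \sqrt{\,|X|^2 + |Y|^2 + 2\,|X|\,|Y|\cos\theta(X,Y)\,}.
\]
% For fixed |X|, |Y| > 0, d(X,Y) is strictly decreasing in cos(theta),
% and d(X,Y) = 0 exactly when cos(theta) = 1, i.e. when X and Y are
% parallel and point in the same direction.
```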
Choice of a regularizer. As mentioned, there is a large class of possible regularizers to choose from. Here we choose a robust regularizer that allows large and smooth deformations, comprised of both first-order and second-order derivatives of the deformation field. Based on the new measure, we propose to register the two functions R, T from different image modalities by solving the following minimization problem:

(5) min_u J(u) = (α/2) ∫_Ω |∇u|² dx + (α₁/2) ∫_Ω |∇²u|² dx + λ ∫_Ω [ GF(T(u), R) + TM(T(u), R) ] dx, subject to C(u) = det(I + ∇u) > 0,

where GF(T(u), R) = |∇_n T(u) − ∇_n R|² and TM(T(u), R) = (|∇T(u)| + |∇R| − |∇T(u) + ∇R|)². Here, in the term GF, we must use the normalized gradients rather than the raw gradients because the difference in the magnitudes of the gradients of R and T(u) is large in multi-modality images. Moreover, one can easily show that minimizing TM(T(u), R) = |∇T(u)| + |∇R| − |∇T(u) + ∇R| is equivalent to minimizing the angle θ between the vectors ∇T(u) and ∇R, which leads to the alignment of the edges of R and T(u). Note that an alternative to the above TM is TM_n(T(u), R) = |∇_n T(u)| + |∇_n R| − |∇_n T(u) + ∇_n R| based on normalized gradients; however, this leads to a more difficult problem to solve numerically due to higher nonlinearity. Our primary choice of regularization is the diffusion model [9], which uses first-order derivatives and promotes smoothness. As affine linear transformations are not contained in the kernel of the H¹-regularizer, we desire a regularizer which does not penalize such transformations. We therefore add a regularizer based on second-order derivatives (LLT) to the model, which removes the need for a pre-registration step with an affine transformation. The second-order derivatives also yield smooth transformations [41]. The constraint C(u) > 0 on the determinant in the minimization problem (5) guarantees that the resulting deformation ϕ = x + u suffers no mesh folding and is thus physically plausible; see also [12,13,28].
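A sketch of how the combined fidelity integrand GF + TM could be evaluated on a pixel grid (the eps smoothing and the use of np.gradient are our implementation assumptions, not prescribed by the model):

```python
import numpy as np

def gf_tm_fidelity(T, R, eps=1e-3):
    """Discrete sum of the two proposed terms:
       GF = |n_T - n_R|^2                 (normalized gradients)
       TM = (|dT| + |dR| - |dT + dR|)^2   (raw gradients)."""
    ty, tx = np.gradient(T)
    ry, rx = np.gradient(R)
    nt = np.sqrt(tx**2 + ty**2 + eps**2)   # eps avoids division by zero
    nr = np.sqrt(rx**2 + ry**2 + eps**2)
    gf = (tx/nt - rx/nr)**2 + (ty/nt - ry/nr)**2
    tm = (np.hypot(tx, ty) + np.hypot(rx, ry)
          - np.hypot(tx + rx, ty + ry))**2
    return float((gf + tm).sum())

# Perfectly aligned gradients give (numerically) zero fidelity.
R = np.outer(np.arange(8.0), np.ones(8))
print(gf_tm_fidelity(R, R) < 1e-12)   # True
```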
Different alternatives have been proposed to ensure invertibility by adding another regularisation term, depending on the determinant of the transformation, to the registration objective functional; see [2].
Mathematical analysis of the proposed model. Most registration models are non-convex with respect to u; consequently, if solutions exist, there may be several local minimizers and solutions are generally not unique. Below we prove the existence of a minimizer for problem (5). Before stating the main result, we first recall the concept of Carathéodory functions.
We will also use some results from the theory of integral functionals involving higher-order derivatives; this sets up the assumptions under which our optimisation problem (5) admits a minimiser.
To analyse the proposed model (5), it is convenient to rewrite the energy J(·) by merging all terms under one integral:

    J(u) = ∫_Ω f(x, u(x), ∇u(x), ∇²u(x)) dx.

To apply Lemma 2.2, we assume that |∇R| and |∇T(u)| are bounded almost everywhere by a constant c > 0. Then we have the following result.

Lemma 2.3. The energy functional J(·) is coercive and wlsc in W.
Proof. The coercivity can be easily obtained using the Poincaré inequality; the latter guarantees that the regularization terms define a norm on the space W. Using the positivity of D_GF(T(u), R) and D_TM(T(u), R), we obtain a lower bound for J(·) in terms of this norm, which directly gives the coercivity of J(·). For the weak lower semi-continuity, we verify that the function f(·) fulfils the assumptions of Lemma 2.2: i) Since the gradients ∇R and ∇T(u) of the fixed and the moving image are assumed to be continuous, f(·) is a Carathéodory function. ii) It is easy to check that f(x, u, ψ, Θ) is convex with respect to Θ, which clearly implies that it is quasi-convex.
iii) For condition (iii), we have |∇_n T(u)| ≤ 1 and |∇_n R| ≤ 1, which bounds the GF term; moreover, using the fact that |∇R| and |∇T(u)| are bounded almost everywhere by a constant c > 0, the TM term is bounded as well. Therefore, combining inequalities (9) and (10), the function f(·) fulfils condition (iii) of Lemma 2.2 with a(x) ≡ λc² + 2λ, which implies that the energy J(·) is wlsc in W.
We are now ready to prove the existence of a solution of the minimization model (5). Based on Lemma 2.2 and Lemma 2.3, we have the following result.

Theorem 2.4. The minimization problem (5) admits at least one minimizer u ∈ A.

Proof. Consider a minimizing sequence (u_n)_n ⊂ A of J(·). The coercivity of J(·) guarantees that the sequence (u_n)_{n∈N} is uniformly bounded in W. Thus, there exists a subsequence, still denoted (u_n)_{n∈N}, such that u_n ⇀ u weakly in W as n → ∞. Using the weak lower semi-continuity of J(·), we obtain that the limit u is a minimizer of J(·). It remains to prove that u fulfils the constraint C(u) > 0, for which we show that A is a weakly closed subset of W. Let (u_k)_k ⊂ A be a sequence converging weakly to u in W. From the definition of the space W, u_k converges weakly to u in W^{1,2}(Ω) and in W^{2,2}(Ω). Moreover, as the sets A₁ and A₂ are weakly closed in the W^{1,2}- and W^{2,2}-topologies, respectively (see [27]), we get that u ∈ A₁ and u ∈ A₂. Then the limit u belongs to the intersection A = A₁ ∩ A₂, and thus A is weakly closed. Therefore the minimizer u belongs to the set A, i.e., C(u) ≥ ε > 0, which finishes the proof.
3. Augmented Lagrangian method (ALM). The energy J(·) is highly nonlinear, and its numerical minimization is a non-trivial task. We therefore propose an Augmented Lagrangian Method (ALM), which is often used for solving constrained minimization problems by replacing the original problem with an unconstrained one. The method is similar to the penalty method in that the constraints are incorporated into the objective functional, and the resulting problem is solved by alternating minimization of sub-problems; see [1,29,32,35,43,44] for various successful applications.
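The structure of an ALM iteration can be illustrated on a one-dimensional toy problem (unrelated to the registration model itself): minimize f(x) = (x − 2)² subject to x − 1 = 0; here the x-update has a closed form.

```python
# Augmented Lagrangian L_r(x, lam) = (x-2)^2 + lam*(x-1) + (r/2)*(x-1)^2.
# Alternate an exact minimization in x with a multiplier ascent step.
r, lam, x = 10.0, 0.0, 0.0
for _ in range(30):
    x = (4.0 - lam + r) / (2.0 + r)   # solves dL_r/dx = 0
    lam += r * (x - 1.0)              # multiplier update
# x converges to the constrained minimizer 1, lam to the KKT multiplier 2.
print(round(x, 6), round(lam, 6))   # 1.0 2.0
```

The multiplier update is what distinguishes ALM from a pure penalty method: the constraint is satisfied exactly in the limit without sending r to infinity.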

3.2. Discretization and sub-problems. The images and the displacement fields are discretized on a uniform mesh using a vertex-centred discretization. We assume the image has l × c pixels, where l and c are the numbers of rows and columns, and write u_{i,j} = u(x_i, y_j), i = 1, ..., l, j = 1, ..., c, for the discrete solution. Other quantities are set up similarly. For the sake of simplicity, we use the generic notation u when discussing the discretization. For the discrete differential operators, we assume periodic boundary conditions for u. With periodic boundary conditions, the action of each discrete differential operator can be regarded as a circular convolution with u, which allows the use of the fast Fourier transform (see [25,34,38] for more details). The discrete gradient is an operator from R^{l×c} to (R^{l×c})², given by ∇u = (∂_x u, ∂_y u), where ∂_x and ∂_y are the forward difference operators (with periodic wrap-around)

    (∂_x u)_{i,j} = u_{i+1,j} − u_{i,j},   (∂_y u)_{i,j} = u_{i,j+1} − u_{i,j}.

The discrete divergence is an operator from (R^{l×c})² to R^{l×c}: for n = (n₁, n₂), it is given by div n = ←∂_x n₁ + ←∂_y n₂, where the backward difference operators are

    (←∂_x n)_{i,j} = n_{i,j} − n_{i−1,j},   (←∂_y n)_{i,j} = n_{i,j} − n_{i,j−1}.

Then the discrete Laplace operator is given by ∆u = div(∇u). Similarly, we define the (forward and backward) second-order discrete difference operators ∂_{xx}, ∂_{xy}, ∂_{yy} and ←∂_{xx}, ←∂_{xy}, ←∂_{yy}, where ∂_{yx}u = ∂_{xy}u and ←∂_{yx}u = ←∂_{xy}u. Based on the above operators, we define the fourth-order differential operator div²∇²u. Thus the first version of an ALM algorithm is shown in Algorithm 1.
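The circular-convolution property is easy to verify numerically; a minimal sketch for the one-dimensional forward difference under periodic boundary conditions:

```python
import numpy as np

n = 8
rng = np.random.default_rng(0)
u = rng.random(n)
# Forward difference with periodic wrap: (Dx u)_i = u_{i+1} - u_i.
Du = np.roll(u, -1) - u
# Dx is a circulant operator, hence diagonalized by the DFT with
# symbol exp(2*pi*i*k/n) - 1.
k = np.arange(n)
symbol = np.exp(2j * np.pi * k / n) - 1.0
Du_fft = np.real(np.fft.ifft(symbol * np.fft.fft(u)))
print(np.allclose(Du, Du_fft))   # True
```

The same diagonalization applies in 2-D (and to the backward, second-order and fourth-order operators), which is what makes the FFT-based solvers below possible.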
The u-subproblem. Fixing K^k, p^k, n^k, m^k and λ_i^k (i = 1, ..., 5), the u-subproblem consists in finding u^{k+1} by solving a minimization problem which admits at least one solution u = (u₁, u₂); its optimality conditions give the system of nonlinear PDEs (21) in Ω, with periodic boundary conditions on ∂Ω. To solve these nonlinear PDEs, we use a fast time-marching method: find u^{k+1} = (u₁^{k+1}, u₂^{k+1}) by semi-implicit time stepping, where dt is the time step and u_old^{k+1} is the solution at the previous time-marching iteration. To solve the resulting fourth-order equations at each time step, we use the 2-dimensional discrete Fourier transform. Indeed, we have

    L ⊙ F(u₁^{k+1}) = F(F₁(u_old^{k+1})),   L ⊙ F(u₂^{k+1}) = F(F₂(u_old^{k+1})),

where L = I + α dt F(∆·) + α₁ dt F(div²∇²·), F(·) is the Fourier transform and "⊙" denotes point-wise multiplication of matrices. Therefore the discrete solutions u₁ and u₂ are obtained by applying the inverse two-dimensional discrete Fourier transform:

    u₁^{k+1} = F⁻¹( F(F₁(u_old^{k+1})) ⊘ L ),   u₂^{k+1} = F⁻¹( F(F₂(u_old^{k+1})) ⊘ L ),

where "⊘" denotes point-wise division of matrices.
Remark 1. We emphasize that computing the determinant is a non-trivial task. A discretization which ensures that the map is diffeomorphic is discussed in [2,13] and is based on the finite element method. In our case, we do not use this discretization in the numerical computation, since we solve a system of PDEs defined only at the nodal points. However, this discretization is used for computing the determinant after obtaining the solution, to check whether the obtained map is diffeomorphic.
The K-subproblem. Fixing u^{k+1}, p^k, n^k, m^k and λ_i^k (i = 1, ..., 5), the K-subproblem involves the minimization of an energy that is quadratic in K. This minimization problem is solved through its optimality condition

(24) −r₂ ∆K^{k+1} + r₁ K^{k+1} = r₁ T(u^{k+1}) − r₂ div p^k − div λ₂^k + λ₁^k.

We again take advantage of the 2-dimensional discrete Fourier transform to compute K. Applying the Fourier transform to (24) gives LS ⊙ F(K^{k+1}) = F(RS), where "⊙" denotes point-wise multiplication of matrices, RS is the right-hand side of (24) and LS = −r₂ F(∆·) + r₁ I.
Therefore, the discrete solution is given by K^{k+1} = F⁻¹( F(RS) ⊘ LS ), where F⁻¹(·) is the inverse of the discrete two-dimensional Fourier transform and "⊘" denotes point-wise division of matrices.
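For instance, the K-update can be realized with a handful of FFT calls; a sketch assuming the standard 5-point Laplacian with unit grid spacing (the right-hand side RS is assembled elsewhere):

```python
import numpy as np

def solve_k_subproblem(RS, r1, r2):
    """Solve  -r2*Lap(K) + r1*K = RS  under periodic BCs via the 2-D FFT."""
    l, c = RS.shape
    ky = 2.0 * np.pi * np.arange(l) / l
    kx = 2.0 * np.pi * np.arange(c) / c
    # Fourier symbol of the 5-point Laplacian (circulant under periodic BCs)
    lap_sym = 2.0 * np.cos(ky)[:, None] + 2.0 * np.cos(kx)[None, :] - 4.0
    K_hat = np.fft.fft2(RS) / (r1 - r2 * lap_sym)   # denominator >= r1 > 0
    return np.real(np.fft.ifft2(K_hat))

def apply_op(K, r1, r2):
    """Direct application of  -r2*Lap + r1*I  for verification."""
    lap = (np.roll(K, 1, 0) + np.roll(K, -1, 0)
           + np.roll(K, 1, 1) + np.roll(K, -1, 1) - 4.0 * K)
    return -r2 * lap + r1 * K

rng = np.random.default_rng(0)
RS = rng.standard_normal((16, 16))
K = solve_k_subproblem(RS, r1=5.0, r2=10.0)   # r1, r2 as in the experiments
print(np.allclose(apply_op(K, 5.0, 10.0), RS))   # True
```

Since lap_sym ≤ 0, the denominator never vanishes, so the solve is well defined for any r₁ > 0.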
The p-subproblem. Fixing u^{k+1}, K^{k+1}, n^k, m^k and λ_i^k (i = 1, ..., 5), the p-subproblem consists in minimizing, w.r.t. p, an energy containing the non-differentiable term |p| inside a quadratic expression, which makes it challenging to solve. To alleviate this, we consider a fixed-point formulation by lagging |p^k| n^k at the k-th iteration instead of the constraint p = |p| n^k. A simple reformulation then rewrites the above problem as the equivalent minimization problem (27), where the quantity Res does not depend on p and β = −λ₃^k · n^k − λ(|m^k| − |∇R|). The minimization problem (27) has a closed-form solution which is given explicitly by a shrinkage-like formula.

The n-subproblem. Fixing u^{k+1}, K^{k+1}, p^{k+1} and λ_i^k (i = 1, ..., 5), the n-subproblem consists in solving a minimization problem which also has an explicit closed-form solution.

The m-subproblem. To find the optimal value of m^{k+1}, we solve an optimisation sub-problem equivalent to minimizing an energy in which both Res and C do not depend on m; its solution is again given explicitly in closed form.
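The generic form of such a shrinkage (soft-thresholding) formula, here written for a vector field q and threshold t (our notation, not the paper's exact β and r₃):

```python
import numpy as np

def shrink(q, t):
    """Closed-form minimizer of  t*|p| + 0.5*|p - q|^2  over p, applied
    pointwise to a vector field q of shape (..., 2)."""
    norm = np.linalg.norm(q, axis=-1, keepdims=True)
    scale = np.maximum(norm - t, 0.0) / np.maximum(norm, 1e-12)
    return scale * q

q = np.array([[3.0, 4.0]])     # |q| = 5
p1 = shrink(q, 1.0)            # scales q by (5 - 1)/5 = 0.8
p2 = shrink(q, 10.0)           # threshold exceeds |q|, so p2 = 0
```

The formula shortens q towards the origin by the threshold t and sets it to zero when |q| ≤ t, which is exactly the behaviour of the "shrinkage-like" update above.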

Lemma 3.1 ([37]). Let f : R → R be a closed, proper and convex function. Let (w_n)_{n∈N} be a sequence of distinct points in dom f converging to w* ∈ int(dom f), and let S_n ∈ ∂f(w_n). Then there exists a subsequence (S_{n_k})_{k∈N} that converges to a point S* with S* ∈ ∂f(w*).
In the sequel, we give a partial result about the limiting behaviour of the iterates generated by the ALM. Let us consider the space X.

Proposition 2. If the sequence (u^k, K^k, p^k, n^k, λ₁^k, λ₂^k, λ₃^k, λ₄^k, λ₅^k) ∈ X generated by the ALM converges to a point (u*, K*, p*, n*, λ₁*, λ₂*, λ₃*, λ₄*, λ₅*) ∈ X, then the limit point satisfies the first-order optimality conditions of model (5); in particular, u* is a stationary point of model (5).
Proof. By (15), (16), (17) and (18), the multiplier updates vanish in the limit. From (18), we get

(37) 0 = lim(λ₅^{k+1} − λ₅^k) = lim max{−λ₅^k, −σF(u^{k+1})} = min{λ₅*, σF(u*)}.

Returning to the optimality condition for the u-subproblem in (21), taking the limit over k and using equalities (34) and (36), we obtain the limiting equation for u*. Next, we consider the optimality condition for the K-subproblem and take the limit over k, where we use div∇K* = ∆K*. By the equalities (33) and (34), ∇K* − p* = 0 and K* − T(u*) = 0; then div λ₂* − λ₁* = 0. The optimality condition for the modified p-subproblem (27) involves S_p^{k+1} ∈ ∂|p^{k+1}|, with β given in (28). By Lemma 3.1, there exists a subsequence, still denoted S_p^k ∈ ∂|p^k|, converging to S_p* ∈ ∂|p*|. Taking the limit over k and taking into account equalities (34) and (35), we obtain the limiting condition for p*. For the n-subproblem, the optimality condition gives

    λ(n^{k+1} − ∇_n R) + r₃(p^{k+1} − |p^{k+1}| n^{k+1}) + λ₃^k = 0;

considering the limit over k and using (35), we get the limiting condition for n*. The same analysis applied to the optimality condition for the m-subproblem (31) yields the corresponding equality, which finishes the proof.

Finally, we remark on obtaining the initializations by a multiresolution technique, which also helps to avoid local minima and to speed up the registration. We use a scale-space approach: we resize the original images to a sequence of coarser ones, where computations are cheap, and register these smaller images (see Fig. 3). Then, starting from the coarsest level, we interpolate the obtained transformation fields to get a starting guess on the next finer level, until the original resolution on the finest level is reached.

4. Numerical experiments. In this section, we assess the performance of the proposed model (denoted by "New Model" below) and its algorithm.
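The coarse-to-fine driver just described can be sketched as follows; `register` stands for one single-level solve (a placeholder, not the paper's implementation), downsampling is 2×2 block averaging, and prolongation doubles both the grid and the displacement values:

```python
import numpy as np

def downsample(I):
    """Halve the resolution by 2x2 block averaging (one choice of scale space)."""
    return 0.25 * (I[0::2, 0::2] + I[1::2, 0::2] + I[0::2, 1::2] + I[1::2, 1::2])

def multilevel_register(T, R, register, levels=3):
    """Coarse-to-fine driver: solve at the coarsest level first, then prolong
    the displacement as the initial guess for the next finer level."""
    pyramid = [(T, R)]
    for _ in range(levels - 1):
        T, R = downsample(T), downsample(R)
        pyramid.append((T, R))
    u = np.zeros(pyramid[-1][0].shape + (2,))   # coarsest-level initial guess
    for Tl, Rl in reversed(pyramid):
        if u.shape[:2] != Tl.shape:             # prolong: upsample and rescale
            u = 2.0 * np.kron(u, np.ones((2, 2, 1)))
        u = register(Tl, Rl, u)
    return u

# With an identity "solver" the driver simply propagates shapes correctly.
u = multilevel_register(np.zeros((16, 16)), np.zeros((16, 16)),
                        lambda T, R, u0: u0)
print(u.shape)   # (16, 16, 2)
```

The factor 2 in the prolongation accounts for displacements being measured in pixels: a one-pixel shift on a coarse grid corresponds to a two-pixel shift on the next finer grid.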
We compare the proposed model with two other multi-modality models:
• an MI model (denoted by MI below) that combines the regularizer (6) and the MI similarity measure (3);
• an NGF model (denoted by NGF below) that combines the regularizer (6) and the standard NGF similarity measure (4).
To measure the quality of the registered images, the quantity

(38) GF_er = F(∇T(u), ∇R) / F₀

is used as the relative reduction of the dissimilarity, where F is the fidelity evaluated on the two gradient fields and F₀ = F(∇T(u), ∇R) for u = 0. As an additional criterion for the goodness of registration, we also use the relative normalized gradient fields (39). For all the numerical experiments presented here, we summarise the comparative results in a table reporting the errors computed by formulas (38)-(39). To measure mesh validity, we compute C(u) = det(I + ∇u) from (5) and monitor whether it is positive. In order to reduce the number of parameters to tune, we set r₁ = 5, r₂ = 10 and r₃ = r₄ = 100 in all numerical experiments unless stated otherwise. We take N_max = 70 as the maximum number of iterations of New Model in Algorithm 2, and we stop the iterations before reaching N_max if a stopping criterion with tolerance τ = 10⁻³ is satisfied, where l and c are the numbers of rows and columns in the image. Though we could use all equations from Algorithm 1 to stop the iterations, we find that a stopping criterion based on its 4th equation is sufficient, as it includes information about the gradients of both images; thus it can control the ALM iterations and the quality of registration at the same time. For each variable u₁ and u₂, we compute the residual via a finite difference approximation, and the global residual is taken as the sum. Example 1.
In this first example, we consider a synthetic image to illustrate the type of images for which the mutual information (MI) and normalized gradient field (NGF) models are at a disadvantage. We obtain a good result using New Model, as seen in Fig. 4. Here, the NGF and MI models were tested with different regularization parameters. The optimal choices were determined by running several tests in which we set α₁ = 1, α = 0.01α₁ and varied λ such that α₁/λ ∈ {10⁻⁵, 5×10⁻⁵, 10⁻⁴, 5×10⁻⁴, 10⁻³, 10⁻²} for MI and α₁/λ ∈ {2.5, 2, 1.5, 1, 0.5, 0.1} for NGF. The optimal parameters were α₁/λ = 10⁻⁴ and α₁/λ = 0.5 for MI and NGF, respectively. They were chosen such that the registered image is very close to the reference and the transformation does not suffer from mesh folding. For comparison, we used the Jaccard similarity coefficient (JSC), defined as

    JSC = |S_T ∩ S_R| / |S_T ∪ S_R|,

where S_T and S_R represent, respectively, the segmented regions of interest (with red contour) in the deformed template (after registration) and in the reference.
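The JSC can be computed directly from two binary segmentation masks; a minimal sketch:

```python
import numpy as np

def jsc(mask_T, mask_R):
    """Jaccard similarity coefficient |S_T ∩ S_R| / |S_T ∪ S_R|
    between two boolean segmentation masks."""
    inter = np.logical_and(mask_T, mask_R).sum()
    union = np.logical_or(mask_T, mask_R).sum()
    return inter / union if union else 1.0

A = np.zeros((8, 8), bool); A[2:6, 2:6] = True   # 16-pixel square
B = np.zeros((8, 8), bool); B[3:7, 3:7] = True   # same square, shifted
print(jsc(A, A), round(jsc(A, B), 3))   # 1.0 0.391
```

A value of 1 indicates perfect overlap of the two regions of interest; the shifted square overlaps on 9 of 23 pixels, giving 9/23 ≈ 0.391.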
Examples 2 and 3. In Figs. 5 and 6, we present the registration results for Examples 2 and 3.

Example 4. In Fig. 8, we present the result of registering two diffusion-MRI images of size 256×256 with high and low b-value diffusion, respectively. Since the intensity values for different b-values are not comparable, conventional mono-modality registration models (which match the images based on intensity values) will fail. We show the registration results of the 3 compared models in Fig. 8. We notice that the NGF and MI models give comparable results; however, our New Model gives the best result of the three, and visually, the reference and the transformed template are well aligned in all regions. Since C(u) > 0, all transformed grids have no mesh folding.

Example 5. In the next experiment, in Fig. 9, our aim is to investigate the capabilities of the proposed models for the registration of MRI-T1 and MRI-T2 images at a higher resolution of 512×512. We can observe from the overlay of the registered and reference images that all models produce acceptable registration results; however, New Model produces the best alignment in all parts and gives a better similarity value than NGF (here identical to MI). We also show the resulting transformed grids for all models, where there is no mesh folding since C(u) > 0. For the above 4 examples (Ex. 2-Ex. 5), in Fig. 10 we display the evolution of the error versus the ALM iterations towards the final solution. We also plot the evolution of the residual of the energy (5) as a function of the ALM iterations. Here we see that our ALM algorithm converges, though the convergence is not monotone.

Example 6. Example 6 tests the registration of an MRI image to a PET image with much noise. In Fig. 12, we present the results obtained using New Model, NGF and MI.

Figure 6. Example 3 zoomed in at the red squares: from left to right, zooms of the reference R and of the registered T(u) using New Model, NGF and MI, respectively.
Clearly, New Model and MI perform better than NGF in this case, and in particular New Model performs best (though only slightly better than the MI model). We display an overlay of the registered and reference images, which shows that New Model produces the best alignment.
Regularisation parameter dependence test. In Table 3, we examine the sensitivity of the proposed model with respect to the ratio α₁/λ. The model was tested on Example 2, where we set α₁ = 1, α = 0.01α₁ and varied λ across all experiments. We can see a clear trend in the relative error: the best error is obtained for α₁/λ = 0.017, and the error increases as the ratio decreases below 0.017.

5. Conclusions. Image registration is an increasingly important and often challenging image processing task with a broad range of applications in astronomy, optics, biology, chemistry and medical imaging. In this paper, to improve multi-modality registration, we proposed a new variational model which uses reformulated normalized-gradient similarity measures together with a regularization term which combines first- and second-order derivatives of the displacement. After showing the existence of solutions, we presented a fast ALM for its numerical implementation. Experimental tests confirm that our proposed model performs better in multi-modality image registration than the compared models. It is pleasing to see much improved results over established models within the same modelling framework. Future work will consider generalizations to 3 dimensions and the registration of images that do not have dominant gradients.

Table 2. Registration results of the different models for Examples 1-6. The errors are computed using formulas (38), (40) and (39). Here, #N is the ratio of the number of pixels where ∇_n T · ∇_n R = 0 over the total number of pixels, whereas #G is the ratio of the number of pixels where GF(T, R) + TM(T, R) = 0 over the total number of pixels.

Table 3. Registration results of the α₁/λ-dependence tests of New Model for Example 3. The relative errors are computed using the normalized gradient fitting formula (38). In all cases, we set α = 0.01α₁.