Quasiconformal model with CNN features for large deformation image registration

This work was supported in part by the National Science Foundation under Grant No. DMS-2002103 (to Gary P. T. Choi), and by HKRGC GRF under project ID 14305919 (to Lok Ming Lui).

  • Image registration has been widely studied over the past several decades, with numerous applications in science, engineering and medicine. Most conventional mathematical models for large deformation image registration rely on prescribed landmarks, which usually require tedious manual labeling. In recent years, there has been a surge of interest in the use of machine learning for image registration. In this paper, we develop a novel method for large deformation image registration by fusing quasiconformal theory with convolutional neural networks (CNNs). More specifically, we propose a quasiconformal energy model with a novel fidelity term that incorporates the features extracted using a pre-trained CNN, thereby allowing us to obtain meaningful registration results without any guidance from prescribed landmarks. Moreover, unlike many prior image registration methods, the bijectivity of the registration map produced by our method is guaranteed by quasiconformal theory. Experimental results are presented to demonstrate the effectiveness of the proposed method. More broadly, our work sheds light on how rigorous mathematical theories and practical machine learning approaches can be integrated to develop computational methods with improved performance.

    Mathematics Subject Classification: 65D18, 68U05, 68U10, 68T07.


  • Figure 1.  An illustration of how the Beltrami coefficient $ \mu $ determines the conformality distortion. Under a quasiconformal map $ f $, an infinitesimal circle around a point $ p $ is mapped to an infinitesimal ellipse centered at $ f(p) $, where the major axis length and the minor axis length are given by $ |f_z(p)|(1+|\mu(p)|) $ and $ |f_z(p)|(1-|\mu(p)|) $, respectively. Therefore, the maximal dilation of $ f $ is $ K(f) = \frac{1+||\mu||_{\infty}}{1-||\mu||_{\infty}} $. Also, the orientation change of the major axis of the ellipse is given by $ \arg(\mu(p))/2 $
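
As a quick numerical check of the quantities in the Figure 1 caption, the sketch below uses a simple affine map $ f(z) = az + b\bar{z} $, for which $ f_z = a $, $ f_{\bar{z}} = b $ and hence $ \mu = b/a $ is constant. The map, the values of `a` and `b`, and the function names are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def beltrami_affine(a: complex, b: complex) -> complex:
    """Beltrami coefficient mu = f_zbar / f_z of f(z) = a*z + b*conj(z)."""
    return b / a

def maximal_dilation(mu_sup: float) -> float:
    """K = (1 + ||mu||_inf) / (1 - ||mu||_inf), valid for ||mu||_inf < 1."""
    if not 0.0 <= mu_sup < 1.0:
        raise ValueError("a quasiconformal map requires ||mu||_inf < 1")
    return (1.0 + mu_sup) / (1.0 - mu_sup)

# Hypothetical example map (assumption): f_z = a, f_zbar = b.
a, b = 1.0 + 0.0j, 0.5j
mu = beltrami_affine(a, b)

# Push the unit circle through f and measure the resulting ellipse.
theta = np.linspace(0.0, 2.0 * np.pi, 361)
z = np.exp(1j * theta)
radii = np.abs(a * z + b * np.conj(z))

major, minor = radii.max(), radii.min()          # |a|(1+|mu|) and |a|(1-|mu|)
stretch_angle = theta[np.argmax(radii)] % np.pi  # direction of greatest stretch
K = maximal_dilation(abs(mu))

print(f"mu = {mu}, major = {major:.4f}, minor = {minor:.4f}, K = {K:.4f}")
print(f"stretch angle = {stretch_angle:.4f}, arg(mu)/2 = {np.angle(mu) / 2:.4f}")
```

For this constant $ \mu $, the ratio `major / minor` coincides with $ K(f) $, and the direction of greatest stretch agrees with $ \arg(\mu)/2 $.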

    Figure 2.  An illustration of a receptive field

    Figure 3.  The process of obtaining feature vectors from the images. The two images on the left are the moving image and the fixed image. We first partition both images into smaller patches, and then feed each patch into the truncated classification network to obtain a 3D array of size $ m \times n \times d $, where $ m $ and $ n $ are the numbers of receptive fields along the width and height of the input image, which depend on the stride, kernel size and padding, and $ d $ is the dimension of the feature vector, which depends on the architecture of the network. We can then flatten this 3D array into a single vector by stacking its entries. Here we partition each image into $ 3\times 3 = 9 $ patches for illustrative purposes, and hence 9 vectors in $ \mathbb{R}^{mnd} $ are produced for each of the two images as shown on the right. In practice, a finer partition is often used
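
The data flow described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_features` is a hypothetical stand-in that returns per-cell mean and standard deviation (so $ d = 2 $) in place of the truncated pre-trained CNN, and `num_receptive_fields` uses the standard output-size formula for a single convolution or pooling layer. The actual network and its hyperparameters are not specified in this excerpt.

```python
import numpy as np

def num_receptive_fields(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of ONE convolution/pooling layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

def partition(image: np.ndarray, rows: int = 3, cols: int = 3) -> list:
    """Split a 2D image into rows*cols equally sized patches (row-major order)."""
    ph, pw = image.shape[0] // rows, image.shape[1] // cols
    return [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
            for r in range(rows) for c in range(cols)]

def toy_features(patch: np.ndarray, m: int = 2, n: int = 2) -> np.ndarray:
    """HYPOTHETICAL stand-in for the truncated CNN (assumption, not the paper's network).

    Returns an (m, n, d) array with d = 2: the mean and standard deviation
    of each of the m*n cells of the patch.
    """
    ch, cw = patch.shape[0] // m, patch.shape[1] // n
    cells = patch[:m * ch, :n * cw].reshape(m, ch, n, cw)
    return np.stack([cells.mean(axis=(1, 3)), cells.std(axis=(1, 3))], axis=-1)

def patch_vectors(image: np.ndarray, rows: int = 3, cols: int = 3) -> list:
    """One flattened feature vector in R^{m*n*d} per patch."""
    return [toy_features(p).reshape(-1) for p in partition(image, rows, cols)]

# Synthetic 36x36 image, partitioned into 3x3 = 9 patches as in the caption.
image = np.arange(36 * 36, dtype=float).reshape(36, 36)
vectors = patch_vectors(image)
print(len(vectors), vectors[0].shape)
```

Applying `patch_vectors` to both the moving and the fixed image yields two lists of patch descriptors that a fidelity term could compare. Swapping `toy_features` for a real truncated network would change only $ m $, $ n $ and $ d $.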

    Figure 4.  An illustration of the multiresolution scheme. We first coarsen both the input source and target images, as shown in Fig. 4a and Fig. 4b, and run our proposed algorithm on this coarsened pair. Using the mapping obtained on this coarsest level, with linear interpolation, we warp the source image on the second level to yield Fig. 4c and register it against Fig. 4d using our proposed algorithm. Finally, we interpolate the mapping from the second level to warp the source image on the finest level, as shown in Fig. 4e, and register it against Fig. 4f. Refer to Fig. 6 for the final registration result
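
The coarse-to-fine loop described in the Figure 4 caption can be sketched as below. This is only an outline under stated assumptions: the map is stored as a pixel displacement field, coarsening is done by block averaging, and `register_level` is a hypothetical callback standing in for the proposed registration algorithm, which is not reproduced here. All function names and the level factors `(4, 2, 1)` are illustrative.

```python
import numpy as np

def coarsen(image, factor):
    """Downsample a 2D image by averaging factor x factor blocks."""
    h2, w2 = image.shape[0] // factor, image.shape[1] // factor
    blocks = image[:h2 * factor, :w2 * factor].reshape(h2, factor, w2, factor)
    return blocks.mean(axis=(1, 3))

def resize_linear(field, shape):
    """Separable linear interpolation of a 2D array onto a grid of the given shape."""
    h, w = field.shape
    ys, xs = np.linspace(0, h - 1, shape[0]), np.linspace(0, w - 1, shape[1])
    tmp = np.stack([np.interp(xs, np.arange(w), row) for row in field])
    return np.stack([np.interp(ys, np.arange(h), col) for col in tmp.T], axis=1)

def warp(image, disp):
    """Bilinearly sample image at (y + disp[..., 0], x + disp[..., 1])."""
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    y = np.clip(yy + disp[..., 0], 0, h - 1)
    x = np.clip(xx + disp[..., 1], 0, w - 1)
    y0, x0 = np.floor(y).astype(int), np.floor(x).astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = image[y0, x0] * (1 - wx) + image[y0, x1] * wx
    bottom = image[y1, x0] * (1 - wx) + image[y1, x1] * wx
    return top * (1 - wy) + bottom * wy

def multiresolution_register(moving, fixed, register_level, factors=(4, 2, 1)):
    """Coarse-to-fine registration; returns a displacement field on the finest grid."""
    disp, prev = None, None
    for f in factors:
        mov_f, fix_f = coarsen(moving, f), coarsen(fixed, f)
        if disp is None:
            disp = np.zeros(mov_f.shape + (2,))
        else:
            # Interpolate the previous map to this grid; pixel units grow by prev / f.
            disp = np.stack([resize_linear(disp[..., k], mov_f.shape) * (prev / f)
                             for k in range(2)], axis=-1)
        update = register_level(warp(mov_f, disp), fix_f)
        # Compose maps: new(x) = update(x) + disp(x + update(x)).
        disp = update + np.stack([warp(disp[..., k], update) for k in range(2)], axis=-1)
        prev = f
    return disp

# HYPOTHETICAL stand-in for the per-level solver: always reports a 1-pixel shift in x.
def shift_one(warped, fixed):
    return np.stack([np.zeros_like(fixed), np.ones_like(fixed)], axis=-1)

rng = np.random.default_rng(0)
moving, fixed = rng.random((16, 16)), rng.random((16, 16))
disp = multiresolution_register(moving, fixed, shift_one)
print(disp.shape)
```

With the constant stand-in, each refinement doubles the accumulated shift and then adds one more pixel, which makes the rescaling between levels easy to inspect. A real solver would return a spatially varying update at each level.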

    Figure 5.  The 'Z' to '2' example

    Figure 6.  The eagle example

    Figure 7.  The rabbit example

    Figure 8.  The first hand X-ray example

    Figure 9.  The second hand X-ray example

    Figure 10.  The lung CT example

    Figure 11.  The chest CT example

    Table 1.  The performance of different image registration methods for various synthetic and real medical images. Here, $ E_{\text{sim}} $ measures the accuracy of the registration mapping as described in Equation (37), $ E_{\text{smooth}} $ measures the smoothness of the mapping as described in Equation (38), $ E_{\text{total}} $ measures the overall quality of the mapping as described in Equation (39), and the number of flips reflects the bijectivity of the mapping. For each example and each measure, the best entry among all methods is in bold

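    Method        Example              $E_{\text{sim}}$  $E_{\text{smooth}}$  $E_{\text{total}}$  #Flips
    Our method    'Z' to '2' (Fig. 5)  0.3099            0.4476               0.7575              0
    Our method    Eagle (Fig. 6)       0.0916            0.1558               0.2474              0
    Our method    Rabbit (Fig. 7)      0.1953            0.1436               0.3389              0
    Our method    Hand 1 (Fig. 8)      0.1417            0.1075               0.2492              0
    Our method    Hand 2 (Fig. 9)      0.1317            0.1536               0.2853              0
    Our method    Lung (Fig. 10)       0.2435            0.4774               0.7210              0
    Our method    Chest (Fig. 11)      0.0368            0.1414               0.1783              0
    DDemons[47]   'Z' to '2' (Fig. 5)  2.0660            0.0481               2.1141              121
    DDemons[47]   Eagle (Fig. 6)       0.3724            0.2470               0.6194              3697
    DDemons[47]   Rabbit (Fig. 7)      0.1806            0.1278               0.3083              174
    DDemons[47]   Hand 1 (Fig. 8)      0.4528            0.2651               0.7180              6036
    DDemons[47]   Hand 2 (Fig. 9)      0.4273            0.2718               0.6991              8925
    DDemons[47]   Lung (Fig. 10)       0.6332            0.2271               0.8602              4191
    DDemons[47]   Chest (Fig. 11)      0.0725            0.3565               0.4290              16316
    LDDMM[5]      'Z' to '2' (Fig. 5)  2.0488            0.2941               2.3629              105
    LDDMM[5]      Eagle (Fig. 6)       0.3570            0.2388               0.5968              26
    LDDMM[5]      Rabbit (Fig. 7)      0.5973            0.0738               0.6711              1
    LDDMM[5]      Hand 1 (Fig. 8)      0.7018            0.1842               0.8860              0
    LDDMM[5]      Hand 2 (Fig. 9)      0.9563            0.1902               1.1465              6
    LDDMM[5]      Lung (Fig. 10)       0.6843            0.2568               0.9410              209
    LDDMM[5]      Chest (Fig. 11)      0.2446            0.0717               0.3164              0
    Elastix[30]   'Z' to '2' (Fig. 5)  3.6377            0.9185               4.5562              68280
    Elastix[30]   Eagle (Fig. 6)       0.1324            0.1779               0.3103              1158
    Elastix[30]   Rabbit (Fig. 7)      0.1808            0.1679               0.3487              0
    Elastix[30]   Hand 1 (Fig. 8)      0.1103            0.1513               0.2617              0
    Elastix[30]   Hand 2 (Fig. 9)      0.1856            0.1406               0.3263              0
    Elastix[30]   Lung (Fig. 10)       0.3432            0.3203               0.6635              3579
    Elastix[30]   Chest (Fig. 11)      0.0530            0.1408               0.1938              0
    DROP[19]      'Z' to '2' (Fig. 5)  2.0554            0.3084               2.3638              2515
    DROP[19]      Eagle (Fig. 6)       0.2509            0.1518               0.4027              0
    DROP[19]      Rabbit (Fig. 7)      0.1788            0.1634               0.3423              0
    DROP[19]      Hand 1 (Fig. 8)      0.3021            0.1186               0.4207              0
    DROP[19]      Hand 2 (Fig. 9)      1.2514            0.1406               1.3920              0
    DROP[19]      Lung (Fig. 10)       0.5065            0.6193               1.1259              3663
    DROP[19]      Chest (Fig. 11)      0.2415            0.1841               0.4256              0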
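
The "#Flips" column of Table 1 reflects bijectivity: a map folds wherever its Jacobian determinant is not positive. The sketch below counts such locations on a regular grid using forward differences. It is one plausible way to compute such a count and rests on assumptions: this excerpt does not state whether the paper counts flipped pixels, grid cells or mesh triangles, and the function name `count_flips` is hypothetical.

```python
import numpy as np

def count_flips(map_y: np.ndarray, map_x: np.ndarray) -> int:
    """Count grid cells where det(Jacobian) <= 0 for the map (y, x) -> (map_y, map_x).

    Derivatives are approximated by forward differences with unit grid spacing,
    so an (h, w) grid yields (h - 1) * (w - 1) cells.
    """
    dY_dy = np.diff(map_y, axis=0)[:, :-1]
    dY_dx = np.diff(map_y, axis=1)[:-1, :]
    dX_dy = np.diff(map_x, axis=0)[:, :-1]
    dX_dx = np.diff(map_x, axis=1)[:-1, :]
    det = dX_dx * dY_dy - dX_dy * dY_dx
    return int(np.sum(det <= 0))

# Identity map on a 5x5 grid: no folding.
yy, xx = np.mgrid[0:5, 0:5].astype(float)
flips_identity = count_flips(yy, xx)

# Pull one column of x-coordinates back behind its left neighbour: a local fold.
folded_x = xx.copy()
folded_x[:, 2] = 0.5
flips_folded = count_flips(yy, folded_x)

print(flips_identity, flips_folded)
```

In the folded example only the cells between the second and third columns reverse orientation, so the count isolates exactly the region where the map stops being one-to-one.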
