
In Focus - hybrid deep learning approaches to the HDC2021 challenge

  • *Corresponding author: Clemens Arndt
  • Abstract. In this work, we present our contribution to the Helsinki Deblur Challenge 2021 (HDC2021). The goal of the challenge was to recover images of sequences of letters from progressively out-of-focus photographs. While the blur model was unknown, a dataset of sharp and blurry image pairs was provided. We propose to tackle this problem in a two-step process: the blur model is first estimated from the provided dataset and then incorporated into the reconstruction. We present three ways of integrating the estimated model into learning-based methods: (i) an educated deep image prior that employs the estimated model in the loss function, (ii) a learned iterative approach that employs the estimated model directly in the architecture, and (iii) a fully learned approach in which the estimated model is used to simulate additional training data. These three models are improved versions of our original contributions to the challenge. We compare and benchmark them on the released test set of the HDC2021.

    Mathematics Subject Classification: Primary: 68T07, 45Q05.
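As a rough illustration of approach (i), the educated deep image prior keeps the usual DIP training loop but measures the loss through the estimated forward model. The sketch below shows only that data-fidelity term; the kernel, image sizes, and function names are our own stand-ins, not the authors' actual model.

```python
import numpy as np

def blur(x, k):
    """Circular convolution with kernel k (a stand-in for the estimated blur model)."""
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def dip_data_loss(x_est, y, k):
    """Data-fidelity term ||A x_est - y||^2 used inside an (E)DIP-style loss,
    where A is convolution with the estimated kernel k."""
    return float(np.sum((blur(x_est, k) - y) ** 2))

rng = np.random.default_rng(0)
x_true = rng.random((32, 32))                  # sharp image
k = np.zeros((32, 32)); k[:3, :3] = 1.0 / 9.0  # toy 3x3 box kernel as stand-in
y = blur(x_true, k)                            # simulated blurry measurement

loss_true = dip_data_loss(x_true, y, k)        # the true image explains the data
loss_rand = dip_data_loss(rng.random((32, 32)), y, k)
```

In the actual method this term is evaluated on the output of a network $ f_\theta(z) $ and minimized over the network weights $ \theta $.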

  • Figure 1.  Letters and digits of the fonts Times and Verdana. Both are shown in the same font size

    Figure 2.  Subfigure (a) is taken from [13], published under the CC BY 4.0 license (https://creativecommons.org/licenses/by/4.0/deed.en)

    Figure 3.  First row: Learned blurring kernels $ k $ for blurring steps $ 4, 9, 14 $, and $ 19 $. Second row: Visualization of the learned lens distortion model $ D $ for blurring steps $ 4, 9, 14 $, and $ 19 $
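The model-extraction step behind the learned kernels in Figure 3 can be illustrated with a toy sketch. This is our own simplification: circular convolution, no lens distortion $ D $, and a synthetic box kernel rather than the learned one. A kernel is then recoverable from one sharp/blurry pair by regularized division in the Fourier domain:

```python
import numpy as np

def estimate_kernel(x_sharp, y_blur, eps=1e-6):
    """Estimate a convolution kernel k from one sharp/blurry pair.

    Assumes y = k * x (circular convolution) and solves the per-frequency
    least-squares problem with Tikhonov damping eps."""
    X = np.fft.fft2(x_sharp)
    Y = np.fft.fft2(y_blur)
    K = Y * np.conj(X) / (np.abs(X) ** 2 + eps)
    return np.real(np.fft.ifft2(K))

rng = np.random.default_rng(1)
x = rng.random((64, 64))                        # synthetic sharp image
k_true = np.zeros((64, 64))
k_true[:5, :5] = 1.0 / 25.0                     # 5x5 box blur as stand-in
y = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k_true)))

k_est = estimate_kernel(x, y)                   # close to k_true
```

The actual challenge data additionally required modeling the lens distortion and fitting the kernel per blur step, which this sketch omits.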

    Figure 4.  Strategy of the educated DIP (EDIP)

    Figure 5.  An outline of the learned iterative architecture. After each update on scale $ l $ the output is upsampled and passed to the next scale
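The update on each scale of the learned iterative architecture resembles a gradient step on the data term. Below is a minimal single-scale sketch using plain gradient descent with a fixed step size; in the actual learned gradient descent a small network would map the current iterate and gradient to the update instead.

```python
import numpy as np

def conv(x, k):
    return np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(k)))

def conv_adjoint(r, k):
    # Adjoint of circular convolution: multiplication by conj(K) in Fourier space,
    # i.e. correlation with the kernel.
    return np.real(np.fft.ifft2(np.fft.fft2(r) * np.conj(np.fft.fft2(k))))

def gradient_step(x, y, k, step=0.5):
    """One plain gradient step on 0.5 * ||k * x - y||^2."""
    grad = conv_adjoint(conv(x, k) - y, k)
    return x - step * grad

rng = np.random.default_rng(2)
x_true = rng.random((32, 32))
k = np.zeros((32, 32)); k[:3, :3] = 1.0 / 9.0   # toy kernel, our assumption
y = conv(x_true, k)

x = np.zeros_like(y)
errs = []
for _ in range(20):
    x = gradient_step(x, y, k)
    errs.append(float(np.sum((conv(x, k) - y) ** 2)))
```

The multi-scale variant of Figure 5 runs such updates on coarse grids first and upsamples between scales, which stabilizes the reconstruction at strong blur levels.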

    Figure 6.  Reconstruction of the LGD for one image from the sanity check. Without pretraining on Urban100, the model extrapolates text into the reconstruction

    Figure 7.  Four examples with different blur levels from the Times and Verdana test set. From left to right: Ground truth, blurry measurement, U-Net/EDIP, LGD, and StepNet reconstruction

    Figure 8.  Reconstructions on one sanity test image. From left to right: Ground truth, blurry measurement, U-Net/EDIP, LGD, and StepNet reconstruction

    Table 1.  Results for U-Net, LGD, EDIP and StepNet on the test set for the two font types on four selected blur levels. The OCR accuracy is calculated w.r.t. the middle row of the reconstruction. We report the mean and standard deviation calculated over the 20 test images in each set

    Times
    Metric | Method  | Level 4       | Level 9       | Level 14      | Level 19
    OCR    | U-Net   | 91.00 ± 21.42 | 81.15 ± 15.43 | 38.55 ± 13.62 | 19.30 ± 11.45
    OCR    | LGD     | 85.05 ± 21.72 | 81.80 ± 14.43 | 71.15 ± 17.54 | 42.95 ± 17.66
    OCR    | EDIP    | 91.00 ± 21.42 | 81.15 ± 15.43 | 38.55 ± 13.62 | 19.30 ± 11.45
    OCR    | StepNet | 87.50 ±  8.96 | 65.00 ± 14.43 | 51.30 ± 13.27 | 24.15 ± 13.20
    OCRI   | U-Net   | 92.00 ± 21.70 | 98.00 ±  4.00 | 68.75 ± 13.69 | 21.40 ± 12.34
    OCRI   | LGD     | 96.75 ±  3.96 | 94.50 ±  6.69 | 85.95 ± 16.80 | 63.65 ± 14.98
    OCRI   | EDIP    | 92.00 ± 21.70 | 98.00 ±  4.00 | 68.75 ± 13.69 | 21.40 ± 12.34
    OCRI   | StepNet | 97.25 ±  5.12 | 94.75 ±  6.61 | 84.15 ± 14.31 | 23.85 ± 10.37

    Verdana
    Metric | Method  | Level 4       | Level 9       | Level 14      | Level 19
    OCR    | U-Net   | 91.25 ± 21.90 | 96.75 ±  4.55 | 68.65 ± 13.31 | 20.90 ± 11.34
    OCR    | LGD     | 96.00 ±  5.62 | 94.25 ±  5.76 | 86.45 ± 15.15 | 62.40 ± 15.22
    OCR    | EDIP    | 91.25 ± 21.90 | 96.75 ±  4.55 | 68.65 ± 13.31 | 20.90 ± 11.34
    OCR    | StepNet | 96.00 ±  6.44 | 94.00 ±  7.35 | 80.15 ± 16.09 | 22.95 ±  9.89
    OCRI   | U-Net   | 92.20 ± 21.70 | 98.00 ±  4.00 | 68.75 ± 13.69 | 21.40 ± 12.34
    OCRI   | LGD     | 96.75 ±  3.96 | 94.50 ±  6.69 | 85.95 ± 16.80 | 63.65 ± 14.98
    OCRI   | EDIP    | 92.20 ± 21.70 | 98.00 ±  4.00 | 68.75 ± 13.69 | 21.40 ± 12.34
    OCRI   | StepNet | 97.25 ±  5.12 | 94.75 ±  6.61 | 84.15 ± 14.31 | 23.85 ± 10.37

    Combined
    Metric | Method  | Level 4       | Level 9       | Level 14      | Level 19
    OCR    | U-Net   | 91.13 ± 21.66 | 88.95 ± 13.79 | 53.60 ± 20.08 | 20.10 ± 11.42
    OCR    | LGD     | 90.53 ± 16.78 | 88.03 ± 12.63 | 78.80 ± 18.09 | 52.68 ± 19.14
    OCR    | EDIP    | 91.13 ± 21.66 | 88.95 ± 13.79 | 53.60 ± 20.08 | 20.10 ± 11.42
    OCR    | StepNet | 91.75 ±  8.88 | 79.50 ± 18.48 | 65.73 ± 20.63 | 23.55 ± 11.68
    OCRI   | U-Net   | 92.50 ± 21.68 | 90.28 ± 13.20 | 54.90 ± 19.31 | 20.58 ± 12.02
    OCRI   | LGD     | 92.20 ± 16.72 | 88.43 ± 13.43 | 80.48 ± 18.11 | 53.75 ± 19.26
    OCRI   | EDIP    | 92.50 ± 21.68 | 90.28 ± 13.20 | 54.90 ± 19.31 | 20.58 ± 12.02
    OCRI   | StepNet | 94.20 ±  7.62 | 80.65 ± 18.03 | 68.53 ± 19.99 | 23.95 ± 11.81
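The OCR accuracy in Table 1 scores the recognized middle row of text against the ground-truth string (the challenge used Tesseract [14] for recognition). A common way to turn an OCR result into a percentage, sketched here under our own assumptions about the exact formula, is character accuracy based on the Levenshtein distance:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def ocr_accuracy(truth: str, recognized: str) -> float:
    """Character accuracy in percent: 100 * (1 - edits / len(truth)), clipped at 0."""
    return 100.0 * max(0.0, 1.0 - levenshtein(truth, recognized) / len(truth))
```

For example, one wrong character in a five-character ground-truth row yields an accuracy of 80%.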

    Table 2.  Mean and standard deviation for the U-Net, LGD, EDIP and StepNet on the test set (text images of both fonts) and on the sanity images. The PSNR and SSIM are calculated using a data range of $ 1 $

    Level 4  | Text PSNR    | Text SSIM     | Sanity PSNR  | Sanity SSIM
    Blurred  | 16.29 ± 0.40 | 0.403 ± 0.008 | 13.42 ± 0.98 | 0.188 ± 0.073
    U-Net    | 25.19 ± 2.10 | 0.630 ± 0.013 | 11.41 ± 2.56 | 0.320 ± 0.124
    LGD      | 23.12 ± 0.38 | 0.605 ± 0.004 | 13.44 ± 1.25 | 0.311 ± 0.101
    EDIP     | 25.19 ± 2.10 | 0.630 ± 0.013 | 14.21 ± 1.27 | 0.324 ± 0.115
    StepNet  | 23.40 ± 0.88 | 0.605 ± 0.007 | 13.91 ± 1.78 | 0.310 ± 0.116

    Level 9  | Text PSNR    | Text SSIM     | Sanity PSNR  | Sanity SSIM
    Blurred  | 14.38 ± 0.42 | 0.442 ± 0.008 | 11.98 ± 0.76 | 0.193 ± 0.077
    U-Net    | 22.18 ± 0.83 | 0.607 ± 0.006 | 12.40 ± 1.03 | 0.307 ± 0.098
    LGD      | 23.82 ± 0.88 | 0.612 ± 0.006 | 12.63 ± 1.01 | 0.299 ± 0.102
    EDIP     | 22.18 ± 0.83 | 0.607 ± 0.006 | 13.23 ± 0.92 | 0.312 ± 0.107
    StepNet  | 22.13 ± 0.71 | 0.596 ± 0.005 | 12.51 ± 1.04 | 0.262 ± 0.099

    Level 14 | Text PSNR    | Text SSIM     | Sanity PSNR  | Sanity SSIM
    Blurred  | 13.97 ± 0.37 | 0.458 ± 0.009 | 11.15 ± 0.80 | 0.193 ± 0.078
    U-Net    | 19.50 ± 0.57 | 0.597 ± 0.007 | 11.43 ± 1.01 | 0.290 ± 0.094
    LGD      | 21.72 ± 0.57 | 0.609 ± 0.006 | 11.72 ± 0.90 | 0.296 ± 0.097
    EDIP     | 19.50 ± 0.57 | 0.597 ± 0.007 | 12.21 ± 0.70 | 0.300 ± 0.097
    StepNet  | 20.41 ± 0.43 | 0.600 ± 0.007 | 11.66 ± 1.10 | 0.256 ± 0.099

    Level 19 | Text PSNR    | Text SSIM     | Sanity PSNR  | Sanity SSIM
    Blurred  | 12.48 ± 0.33 | 0.548 ± 0.008 |  8.65 ± 1.17 | 0.228 ± 0.103
    U-Net    | 17.61 ± 0.46 | 0.583 ± 0.007 |  9.83 ± 1.82 | 0.264 ± 0.105
    LGD      | 19.06 ± 0.68 | 0.599 ± 0.007 | 10.39 ± 1.47 | 0.279 ± 0.103
    EDIP     | 17.61 ± 0.46 | 0.583 ± 0.007 |  9.83 ± 1.82 | 0.264 ± 0.105
    StepNet  | 17.35 ± 0.44 | 0.582 ± 0.007 | 10.06 ± 1.40 | 0.248 ± 0.105
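With a data range of $ 1 $, the PSNR reported in Table 2 is $ 10 \log_{10}(1/\mathrm{MSE}) $. A minimal sketch of that metric follows (SSIM [25] requires a windowed computation and is omitted here; the image values are our own toy example):

```python
import numpy as np

def psnr(x, ref, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, data_range]."""
    mse = np.mean((np.asarray(x, float) - np.asarray(ref, float)) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

ref = np.zeros((8, 8))
noisy = ref + 0.1          # constant error of 0.1 -> MSE = 0.01 -> PSNR = 20 dB
```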

    Table 3.  Inference time of the different methods. For EDIP, the inference time is reported for the case that the data error of the initial U-Net was above the threshold. All computations were done on a GeForce RTX 3090 with 24 GB memory

    Method  | Level 4         | Level 9          | Level 14        | Level 19
    U-Net   | 2.345 ± 0.950 s
    LGD     | 0.198 ± 0.017 s
    EDIP    | 199.5 ± 2.5 min | 385.3 ± 12.6 min | 476.0 ± 5.7 min | 476.0 ± 5.7 min
    StepNet | 0.075 ± 0.045 s | 0.151 ± 0.042 s  | 0.437 ± 0.061 s | 0.869 ± 0.12 s
  • [1] J. Adler and O. Öktem, Solving ill-posed inverse problems using iterative deep neural networks, Inverse Problems, 33 (2017), 124007, 24 pp. doi: 10.1088/1361-6420/aa9581.
    [2] S. Arridge, P. Maass, O. Öktem and C.-B. Schönlieb, Solving inverse problems using data-driven models, Acta Numerica, 28 (2019), 1-174. doi: 10.1017/S0962492919000059.
    [3] D. O. Baguer, J. Leuschner and M. Schmidt, Computed tomography reconstruction using deep image prior and learned reconstruction methods, Inverse Problems, 36 (2020), 094004. doi: 10.1088/1361-6420/aba415.
    [4] R. Barbano, J. Leuschner, M. Schmidt, A. Denker et al., Is deep image prior in need of a good education? preprint, arXiv: 2111.11926, (2021).
    [5] A. Beck and M. Teboulle, Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems, IEEE Transactions on Image Processing, 18 (2009), 2419-2434.  doi: 10.1109/TIP.2009.2028250.
    [6] D. C. Brown, Decentering distortion of lenses, Photogrammetric Engineering, 32 (1966), 444-464.
    [7] A. W. Fitzgibbon, Simultaneous linear estimation of multiple view geometry and lens distortion, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), (2001). doi: 10.1109/CVPR.2001.990465.
    [8] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 3rd edition, Prentice-Hall, (2006).
    [9] P. C. Hansen, J. G. Nagy and D. P. O'Leary, Deblurring Images: Matrices, Spectra, and Filtering, Society for Industrial and Applied Mathematics, (2006). doi: 10.1137/1.9780898718874.
    [10] A. Hauptmann, J. Adler, S. Arridge and O. Öktem, Multi-scale learned iterative reconstruction, IEEE Transactions on Computational Imaging, 6 (2020), 843-856. doi: 10.1109/TCI.2020.2990299.
    [11] A. Hauptmann, B. Cox, F. Lucka, N. Huynh et al., Approximate k-Space models and deep learning for fast photoacoustic reconstruction, International Workshop on Machine Learning for Medical Image Reconstruction, Springer, Cham, 2018. doi: 10.1007/978-3-030-00129-2_12.
    [12] J.-B. Huang, A. Singh and N. Ahuja, Single image super-resolution from transformed self-exemplars, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2015), 5197-5206. doi: 10.1109/CVPR.2015.7299156.
    [13] M. Juvonen, S. Siltanen and F. S. de Moura, Helsinki Deblur Challenge 2021: Description of photographic data, preprint, arXiv: 2105.10233 (2021).
    [14] A. Kay, Tesseract: An open-source optical character recognition engine, Linux J., 2007 (2007).
    [15] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, arXiv preprint, arXiv: 1412.6980, (2014).
    [16] F. Knoll, J. Zbontar, A. Sriram, M. J. Muckley, et al., fastMRI: A publicly available raw k-Space and DICOM dataset of knee images for accelerated MR image reconstruction using machine learning, Radiology: Artificial Intelligence, 2 (2020), e190007. doi: 10.1148/ryai.2020190007.
    [17] J. Leuschner, M. Schmidt, D. O. Baguer and P. Maass, LoDoPaB-CT, a benchmark dataset for low-dose computed tomography reconstruction, Scientific Data, 8 (2021). doi: 10.1038/s41597-021-00893-z.
    [18] S. Lunz, A. Hauptmann, T. Tarvainen, C.-B. Schönlieb and S. Arridge, On learned operator correction in inverse problems, SIAM Journal on Imaging Sciences, doi: 10.1137/20M1338460.
    [19] O. Ronneberger, P. Fischer and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 9351 (2015), Springer. doi: 10.1007/978-3-319-24574-4_28.
    [20] L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259-268. doi: 10.1016/0167-2789(92)90242-F.
    [21] O. Scherzer, M. Grasmair, H. Grossauer, M. Haltmeier, et al., Variational Methods in Imaging, Springer, 2009. doi: 10.1007/978-0-387-69277-7.
    [22] M. I. Sezan, G. Pavlovic, A. M. Tekalp and A. T. Erdem, On modeling the focus blur in image restoration, ICASSP, 91 (1991), 2485-2488. doi: 10.1109/ICASSP.1991.150905.
    [23] E. Y. Sidky and X. Pan, Report on the AAPM deep-learning sparse-view CT (DL-sparse-view CT) Grand Challenge, Medical Physics, 49 (2022), 4935-4943.  doi: 10.1002/mp.15489.
    [24] D. Ulyanov, A. Vedaldi and V. Lempitsky, Deep image prior, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, 9446-9454.
    [25] Z. Wang, A. C. Bovik, H. R. Sheikh and E. P. Simoncelli, Image quality assessment: From error visibility to structural similarity, IEEE Transactions on Image Processing, 13 (2004), 600-612. doi: 10.1109/TIP.2003.819861.
    [26] Y. Wu and K. He, Group normalization, Proceedings of the European Conference on Computer Vision (ECCV), (2018), 3-19. doi: 10.1007/978-3-030-01261-8_1.
    [27] L. Xu and J. Jia, Two-phase kernel estimation for robust motion deblurring, European Conference on Computer Vision, (2010). doi: 10.1007/978-3-642-15549-9_12.
    [28] L. Xu, J. S. Ren, C. Liu and J. Jia, Deep convolutional neural network for image deconvolution, Advances in Neural Information Processing Systems, 27 (2014).
    [29] B. Xu, N. Wang, T. Chen and M. Li, Empirical evaluation of rectified activations in convolutional network, arXiv preprint, arXiv: 1505.00853, (2015).
    [30] J. Zbontar, F. Knoll, A. Sriram, T. Murrell, et al., fastMRI: An open dataset and benchmarks for accelerated MRI, preprint, arXiv: 1811.08839, (2019).
    [31] K. Zhang, W. Zuo, S. Gu, L. Zhang, et al., Learning deep CNN denoiser prior for image restoration, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 3929-3938. doi: 10.1109/CVPR.2017.300.
    [32] Z. Zhang, A flexible new technique for camera calibration, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (2000), 1330-1334.  doi: 10.1109/34.888718.
    [33] International Organization for Standardization, ISO 12232: 2019 - photography - digital still cameras - determination of exposure index, ISO speed ratings, standard output sensitivity, and recommended exposure index, ICS 37.040.10 Photographic Equipment. Projectors, 3 (2019).
