Deblurring photographs of characters using deep neural networks

*Corresponding author: Thomas Germer

Part of this work was done at Heinrich Heine University

Abstract
In this paper, we present our approach for the Helsinki Deblur Challenge (HDC2021). The task of this challenge is to deblur images of characters without knowing the point spread function (PSF). The organizers provided a dataset of pairs of sharp and blurred images. Our method consists of three steps: First, we estimate a warping transformation to align the sharp images with the blurred ones. Next, we estimate the PSF using a quasi-Newton method. The estimated PSF allows us to generate additional pairs of sharp and blurred images. Finally, we train a deep convolutional neural network to reconstruct the sharp images from the blurred ones. Our method successfully reconstructs images from the first 10 stages of the HDC2021 dataset. Our code is available at https://github.com/hhu-machine-learning/hdc2021-psfnn.

    Mathematics Subject Classification: Primary: 68U10; Secondary: 78A46.
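As a hedged illustration of the PSF estimation step from the abstract, the sketch below fits a PSF to a single pre-aligned sharp/blurry pair by quasi-Newton (L-BFGS [12]) least squares in PyTorch [16]. The names `estimate_psf`, `sharp`, `blurry` and `psf_size` are illustrative, not taken from the paper's code, and constraints such as nonnegativity of the PSF are omitted.

```python
import torch
import torch.nn.functional as F

def estimate_psf(sharp, blurry, psf_size=31, max_iter=100):
    # sharp, blurry: pre-aligned grayscale images as (1, 1, H, W) tensors
    psf = torch.zeros(1, 1, psf_size, psf_size, requires_grad=True)
    optimizer = torch.optim.LBFGS([psf], max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        # Valid convolution (F.conv2d is cross-correlation, which differs
        # from convolution only by a flip of the kernel)
        pred = F.conv2d(sharp, psf)
        k = psf_size // 2
        target = blurry[:, :, k:-k, k:-k]  # center crop to the valid region
        loss = F.mse_loss(pred, target)
        loss.backward()
        return loss

    optimizer.step(closure)  # L-BFGS runs its inner iterations inside step()
    return psf.detach()
```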

    Figure 1.  Simplified experimental setup reproduced from the HDC2021 description of photographic data [5], consisting of two cameras and a beamsplitter mirror that allows both cameras to capture images of the E Ink display. One camera is correctly focused with a low ISO setting, while the other is misfocused with a high ISO setting, resulting in noisy and blurry images

    Figure 2.  Visualization of warping operation

    Figure 3.  Difference between sharp and (warped) blurry image

    Figure 4.  Degree of polynomial features (Equation 2) versus $ \mathcal{L}_\text{warping} $ (Equation 5), averaged over the image pairs of the fifth stage of the HDC2021 dataset, with standard error bars. Note that the y-axis is offset to emphasize the difference between losses, which is small compared to the standard error
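For intuition, the degree-$ d $ polynomial features of Equation 2 could be built as below for 2D pixel coordinates; `poly_features` is a hypothetical helper name, and the paper's exact parametrization of the warp may differ.

```python
import numpy as np

def poly_features(x, y, degree):
    # All monomials x**i * y**j with i + j <= degree; for degree 2 this is
    # 1, y, y^2, x, x*y, x^2
    return np.stack([x**i * y**j
                     for i in range(degree + 1)
                     for j in range(degree + 1 - i)], axis=-1)
```

A polynomial warp then maps each pixel coordinate to `poly_features(x, y, d) @ w`, with the weights `w` chosen to minimize $ \mathcal{L}_\text{warping} $ (Equation 5).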

    Figure 5.  Estimated point spread functions for stages 0 through 10. For better readability, all PSFs have been scaled to the same display size. Their actual sizes range from $ 31 \times 31 $ for the smallest up to $ 261 \times 261 $ for stage 10

    Figure 6.  A sharp image $ V^S $ from the DIV2K dataset (middle left) is convolved with the point spread function $ P $ (left) and brightness-adjusted ($ \tau $) to form a blurry image $ V^B $ for training. Note that the blurry image is slightly smaller than the initial sharp image, since we only train on the valid convolution region. Before training, we crop the sharp image to the same size
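The operation described in Figure 6 can be sketched as follows, assuming a grayscale sharp image and an estimated PSF as NumPy arrays. Reducing the brightness adjustment $ \tau $ to a single scalar is a simplification made in this sketch, not necessarily the paper's model.

```python
import numpy as np
from scipy.signal import fftconvolve

def make_training_pair(v_sharp, psf, tau=1.0):
    # Valid convolution: the blurry image is smaller than the sharp one
    v_blurry = tau * fftconvolve(v_sharp, psf, mode="valid")
    # Crop the sharp image to the same valid region before training
    kh, kw = psf.shape
    v_sharp = v_sharp[kh // 2 : v_sharp.shape[0] - (kh - 1) // 2,
                      kw // 2 : v_sharp.shape[1] - (kw - 1) // 2]
    return v_sharp, v_blurry
```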

    Figure 7.  Feature maps of FBA-Net [2]. The output of the initial strided convolution (blue) of the input image (gray) is transformed with a max-pool layer (red), followed by 16 bottleneck layers (yellow), one pyramid pooling layer (green) and a mix of convolutions (blue) and upsampling operations (turquoise). The skip connections are indicated with arrows

    Figure 8.  Two $ 320 \times 320 $ pairs of cropped training samples from the HDC2021 dataset (left) and DIV2K dataset (right)

    Figure 9.  The padded blurry image is decomposed into overlapping tiles, which are then deblurred, cropped and reassembled (blue). The additional green tile with dashed outline shows that tiles in the blurry input image (left) overlap by 160 pixels, while tiles in the deblurred output image (right) do not

    Figure 10.  Comparison between different tile cropping and reassembling methods. A horizontal and vertical seam is clearly visible when neighboring tiles are deblurred individually. Cropping overlapped tiles greatly reduces those artifacts. No apparent seam is visible when blending overlapping tiles
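A minimal sketch of the overlap-and-blend strategy from Figures 9 and 10, assuming a `deblur` function that maps a blurry tile to a deblurred tile of the same size. The tile size and overlap match the 320 and 160 pixel values mentioned above, but padding of the input image and image sizes that are not multiples of the step are not handled here.

```python
import numpy as np

def deblur_tiled(image, deblur, tile=320, overlap=160):
    step = tile - overlap
    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros_like(out)
    # Pyramid-shaped weights so that overlapping tiles blend smoothly
    ramp = np.minimum(np.linspace(0, 1, tile), np.linspace(1, 0, tile))
    w = np.outer(ramp, ramp) + 1e-8
    for y in range(0, image.shape[0] - tile + 1, step):
        for x in range(0, image.shape[1] - tile + 1, step):
            out[y:y + tile, x:x + tile] += w * deblur(image[y:y + tile, x:x + tile])
            weight[y:y + tile, x:x + tile] += w
    return out / weight  # weighted average wherever tiles overlap
```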

    Figure 11.  Evolution of $ \mathcal{L}_\text{deblur}(\theta_s) $ and the corresponding OCR error for stage 9 of the HDC2021 dataset over 50,000 training batches
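For reference, one training step behind the curves in Figure 11 might look like the sketch below. The stand-in two-layer network and the L1 loss are assumptions, since the paper trains FBA-Net [2] with its own loss $ \mathcal{L}_\text{deblur} $; only the pairing of blurry and sharp crops (Figure 8) and the Adam optimizer [7], which the paper cites, come from the source.

```python
import torch
import torch.nn as nn

# Stand-in network; the paper uses FBA-Net [2] (see Figure 7)
net = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)

def train_step(blurry, sharp):
    # blurry, sharp: (N, 1, H, W) tensors of matching training crops
    optimizer.zero_grad()
    loss = (net(blurry) - sharp).abs().mean()  # L1 loss, an assumption here
    loss.backward()
    optimizer.step()
    return loss.item()
```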

    Figure 12.  Evolution of deblurred images from our test dataset after training for a certain number of batches

    Figure 13.  Qualitative comparison of deconvolution methods not based on deep learning for stage 9 of the HDC2021 dataset
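Among the baselines in Figure 13 and Table 1, Richardson-Lucy deconvolution [18,14] is available off the shelf in scikit-image. The snippet below runs it on synthetic data; the 5 x 5 box PSF and the iteration count are arbitrary stand-ins, not the paper's settings.

```python
import numpy as np
from scipy.signal import fftconvolve
from skimage import restoration

rng = np.random.default_rng(0)
sharp = rng.random((128, 128))      # stand-in for a sharp character image
psf = np.ones((5, 5)) / 25          # stand-in for the estimated PSF
blurry = fftconvolve(sharp, psf, mode="same")

deblurred = restoration.richardson_lucy(blurry, psf, num_iter=30)
```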

    Figure 14.  Blurry images, deblurred images and sharp images

    Table 1.  Comparison of different networks and training methods for stage 9 of the HDC2021 dataset. Variations marked with $ ^\dagger $ were trained on alternative datasets that include synthetically generated text images. Three methods not based on deep learning are included for reference

    | Variation | OCR score | Parameters |
    |---|---|---|
    | U-Net 1 | 35.85 | $ 7.76 \times 10^6 $ |
    | U-Net 2 | 4.15 | $ 7.78 \times 10^6 $ |
    | U-Net 3 | 3.66 | $ 31.04 \times 10^6 $ |
    | IndexNet | 57.38 | $ 3.69 \times 10^6 $ |
    | FBA-Net | 75.34 | $ 34.67 \times 10^6 $ |
    | FBA-Net without noise aug. | 66.48 | $ 34.67 \times 10^6 $ |
    | FBA-Net without warping | 29.84 | $ 34.67 \times 10^6 $ |
    | FBA-Net$ ^\dagger $, synthetic text & DIV2K | 84.57 | $ 34.67 \times 10^6 $ |
    | FBA-Net$ ^\dagger $, syn. & real text & DIV2K | 83.62 | $ 34.67 \times 10^6 $ |
    | FBA-Net$ ^\dagger $, only synthetic text | 83.56 | $ 34.67 \times 10^6 $ |
    | Richardson-Lucy [18,14] | 34.36 | |
    | Frequency-based [11] | 5.77 | |
    | Sparse [11] | 38.23 | |

    Table 2.  Comparison of different cropping strategies for tiled deblurring of the Times font images of stage 9 of the HDC2021 dataset

    | Strategy | OCR score |
    |---|---|
    | No overlap | 65.37 |
    | Overlap & crop | 70.27 |
    | Overlap & blend | 71.98 |

    Table 3.  OCR scores for stages 0 to 10 of the HDC2021 dataset

    | Stage | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
    |---|---|---|---|---|---|---|---|---|---|---|---|
    | Score | 96.28 | 95.28 | 95.50 | 96.30 | 96.40 | 97.03 | 94.33 | 91.97 | 85.92 | 73.80 | 70.17 |
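The OCR scores above compare Tesseract [22] output with the ground-truth text via the Levenshtein distance [10]. A hedged sketch of such a score is shown below; the exact normalization used by the HDC2021 evaluation may differ.

```python
import Levenshtein  # pip install python-Levenshtein
import pytesseract  # requires a local Tesseract installation

def ocr_score(deblurred_image, true_text):
    recognized = pytesseract.image_to_string(deblurred_image).strip()
    distance = Levenshtein.distance(recognized, true_text)
    # Map to a 0-100 scale, where 100 means a perfect match
    return 100.0 * max(0.0, 1.0 - distance / max(len(true_text), 1))
```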
References

    [1] E. Agustsson and R. Timofte, NTIRE 2017 challenge on single image super-resolution: Dataset and study, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, (2017), 126-135. doi: 10.1109/CVPRW.2017.150.
    [2] M. Forte and F. Pitié, F, B, alpha matting, preprint, arXiv: 2003.07711.
    [3] K. He, X. Zhang, S. Ren and J. Sun, Deep residual learning for image recognition, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2016), 770-778. doi: 10.1109/CVPR.2016.90.
    [4] M. Hirsch, S. Sra, B. Schölkopf and S. Harmeling, Efficient filter flow for space-variant multiframe blind deconvolution, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, (2010), 607-614. doi: 10.1109/CVPR.2010.5540158.
    [5] M. Juvonen, S. Siltanen and F. Silva de Moura, Helsinki Deblur Challenge 2021: Description of photographic data, preprint, arXiv: 2105.10233.
    [6] J. Kim, J. K. Lee and K. M. Lee, Accurate image super-resolution using very deep convolutional networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), 1646-1654. doi: 10.1109/CVPR.2016.182.
    [7] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, preprint, arXiv: 1412.6980.
    [8] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin and J. Matas, DeblurGAN: Blind motion deblurring using conditional adversarial networks, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2018), 8183-8192. doi: 10.1109/CVPR.2018.00854.
    [9] O. Kupyn, T. Martyniuk, J. Wu and Z. Wang, DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better, Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), (2019), 8878-8887. doi: 10.1109/ICCV.2019.00897.
    [10] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions, and reversals, Soviet Physics Dokl., 10 (1965), 707-710. 
    [11] A. Levin, R. Fergus, F. Durand and W. T. Freeman, Image and depth from a conventional camera with a coded aperture, ACM Transactions on Graphics, 26 (2007), 70-es. doi: 10.1145/1275808.1276464.
    [12] D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming, 45 (1989), 503-528.  doi: 10.1007/BF01589116.
    [13] H. Lu, Y. Dai, C. Shen and S. Xu, Indices matter: Learning to index for deep image matting, Proceedings of the IEEE/CVF International Conference on Computer Vision, (2019), 3266-3275. doi: 10.1109/ICCV.2019.00336.
    [14] L. B. Lucy, An iterative technique for the rectification of observed distributions, The Astronomical Journal, 79 (1974), 745-754.  doi: 10.1086/111605.
    [15] S. Nah, T. H. Kim and K. M. Lee, Deep multi-scale convolutional neural network for dynamic scene deblurring, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), 257-265. doi: 10.1109/CVPR.2017.35.
    [16] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems 32 (NeurIPS), (2019), 8024-8035.
    [17] S. Qiao, H. Wang, C. Liu, W. Shen and A. Yuille, Micro-batch training with batch-channel normalization and weight standardization, preprint, arXiv: 1903.10520.
    [18] W. H. Richardson, Bayesian-based iterative method of image restoration, Journal of the Optical Society of America, 62 (1972), 55-59.  doi: 10.1364/JOSA.62.000055.
    [19] O. Ronneberger, P. Fischer and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, Medical Image Computing and Computer-Assisted Intervention (MICCAI), (2015), 234-241. doi: 10.1007/978-3-319-24574-4_28.
    [20] C. J. Schuler, H. C. Burger, S. Harmeling and B. Schölkopf, A machine learning approach for non-blind image deconvolution, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2013), 1067-1074. doi: 10.1109/CVPR.2013.142.
    [21] C. J. Schuler, M. Hirsch, S. Harmeling and B. Schölkopf, Learning to deblur, IEEE Transactions on Pattern Analysis and Machine Intelligence, 38 (2015), 1439-1451. doi: 10.1109/TPAMI.2015.2481418.
    [22] R. Smith, An overview of the Tesseract OCR engine, Proceedings of the Ninth International Conference on Document Analysis and Recognition (ICDAR), 2 (2007), 629-633.  doi: 10.1109/ICDAR.2007.4376991.
    [23] J. Sun, W. Cao, Z. Xu and J. Ponce, Learning a convolutional neural network for non-uniform motion blur removal, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 769-777.
    [24] Y. Wu and K. He, Group normalization, Proceedings of the European Conference on Computer Vision (ECCV), (2018), 3-19. doi: 10.1007/978-3-030-01261-8_1.
    [25] L. Xu, J. S. Ren, C. Liu and J. Jia, Deep convolutional neural network for image deconvolution, Advances in Neural Information Processing Systems, 27 (2014), 1790-1798.
    [26] S. W. Zamir, A. Arora, S. Khan, M. Hayat, F. S. Khan, M.-H. Yang and L. Shao, Multi-stage progressive image restoration, IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), (2021), 14821-14831. doi: 10.1109/CVPR46437.2021.01458.
    [27] K. Zhang, W. Zuo and L. Zhang, FFDNet: Toward a fast and flexible solution for CNN-based image denoising, IEEE Transactions on Image Processing, 27 (2018), 4608-4622. doi: 10.1109/TIP.2018.2839891.
    [28] H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia, Pyramid scene parsing network, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, (2017), 2881-2890. doi: 10.1109/CVPR.2017.660.