Restoring severely out-of-focus blurred text images with Deep Image Prior

  • *Corresponding author: Leonardo A. Ferreira
  • Deblurring is a classical image processing problem with several established solution techniques. However, increasingly complex methods are required as the degradation increases. Meanwhile, Deep Image Prior (DIP) is a technique based on artificial neural networks that does not depend on training data and presents promising results in several image processing tasks. In this work, we propose combining DIP with a supervised convolutional neural network to deblur severely out-of-focus blurred text images from the Helsinki Deblur Challenge (HDC2021) dataset. We evaluated the deblurred text images using optical character recognition results and obtained satisfactory performance up to the 16th highest blur category. We also deblurred natural images from the same dataset to characterize our method as a general-purpose deblurring algorithm, recovering moderate details up to the 13th highest blur category. The experimental results were competitive with other state-of-the-art methods, showing the potential and robustness of our method. However, its high computational demands may hinder real-time applications.

    Mathematics Subject Classification: 68T07, 68U10.

  • Figure 1.  Scheme of the DIP algorithm for deblurring. A noise tensor ($ \mathbf{N} $) is processed by a CNN ($ \phi_{\boldsymbol{\theta}}(\cdot) $), whose output is passed to the blur model ($ \varphi_{\boldsymbol{\alpha}}(\cdot) $) to generate a blurred image. In the first stage, a loss function $ L_1(\cdot, \cdot) $ compares this blurred output to the input image of the deblurring algorithm ($ \mathbf{Y} $). An optimizer updates the parameters of the CNN and the blur model to minimize the loss value. Consequently, the CNN output becomes closer to the desired sharp image
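
Read as an optimization problem, the first stage described in the Figure 1 caption can be summarized as below. This is a sketch inferred from the caption only: the hat notation and the symbol $ \hat{\mathbf{X}} $ for the restored image are introduced here for illustration, and the exact form of $ L_1 $ is the one the paper gives in its Equation (11), which is not reproduced on this page.

\begin{equation}
(\hat{\boldsymbol{\theta}}, \hat{\boldsymbol{\alpha}}) \in \arg\min_{\boldsymbol{\theta},\, \boldsymbol{\alpha}} L_1\big( \varphi_{\boldsymbol{\alpha}}( \phi_{\boldsymbol{\theta}}(\mathbf{N}) ),\, \mathbf{Y} \big), \qquad \hat{\mathbf{X}} = \phi_{\hat{\boldsymbol{\theta}}}(\mathbf{N})
\end{equation}

The noise tensor $ \mathbf{N} $ stays fixed in this reading; only the CNN parameters and the blur model parameters are optimized.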

    Figure 2.  (a) The initial uniform disk kernel. (b) Rings area, displayed by the regions between two consecutive dashed circles, totaling four rings beyond the kernel border and six rings within. (c) The remaining white circle is the core, and the dotted circle marks the kernel boundary. Optimization updates only the rings and the core of the kernel
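
As a rough illustration of the initial kernel in Figure 2(a), the sketch below builds a uniform disk on a square grid. The function name `disk_kernel`, the grid size of `2 * radius + 1`, and the normalization to unit sum are assumptions made here for illustration; the caption does not state them. The rings and core of panels (b) and (c) are not modeled.

```python
from typing import List


def disk_kernel(radius: int) -> List[List[float]]:
    """Uniform disk of the given radius on a (2r+1) x (2r+1) grid.

    Pixels whose centre lies within `radius` of the grid centre share the
    same weight; the weights are scaled to sum to 1 (an assumption).
    """
    size = 2 * radius + 1
    mask = [
        [1.0 if (i - radius) ** 2 + (j - radius) ** 2 <= radius ** 2 else 0.0
         for j in range(size)]
        for i in range(size)
    ]
    total = sum(map(sum, mask))
    return [[value / total for value in row] for row in mask]


# Radius of 13 pixels is the estimate listed for blur category 8 in Table 2.
kernel = disk_kernel(13)
```

Normalizing the kernel keeps the mean intensity of the blurred image equal to that of the sharp image, which is the usual convention for blur kernels.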

    Figure 3.  Scheme of the proposed supervised CNN inference phase. First, the input full-size image is divided into patches (purple square). The patch is given as input to the supervised CNN and a central square (green dashed lines) is extracted from the corresponding output. Then, the final image is composed of this extracted square (green solid lines). By repeating this procedure throughout the image, with steps of $ 64 $ pixels between the extracted patches, the whole image is restored
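
The tiling in Figure 3 can be sketched as index arithmetic. The patch size of 96 comes from Table 3 and the step of 64 from the caption; that the extracted central square is also 64 pixels wide (so the squares tile the image without gaps) and that the image is padded at its borders are assumptions made here. The names `patch_grid` and `coverage` are hypothetical helpers, not part of the authors' code.

```python
from typing import Iterator, List, Tuple

PATCH = 96                    # supervised CNN input size (Table 3)
STEP = 64                     # step between extracted patches (Figure 3)
CORE = 64                     # assumed side of the central square that is kept
MARGIN = (PATCH - CORE) // 2  # border of each output that is discarded


def patch_grid(height: int, width: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (patch_top, patch_left, core_top, core_left) in image coordinates.

    Negative patch coordinates mean the patch reaches into border padding.
    """
    for core_top in range(0, height, STEP):
        for core_left in range(0, width, STEP):
            yield core_top - MARGIN, core_left - MARGIN, core_top, core_left


def coverage(height: int, width: int) -> List[List[int]]:
    """Count how many central squares write to each output pixel."""
    counts = [[0] * width for _ in range(height)]
    for _, _, core_top, core_left in patch_grid(height, width):
        for r in range(core_top, min(core_top + CORE, height)):
            for c in range(core_left, min(core_left + CORE, width)):
                counts[r][c] += 1
    return counts
```

Keeping only the centre of each output discards the region where the network sees the least context, which is a common way to avoid seams between patches.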

    Figure 4.  Scheme of the third stage. At iteration $ 175 $, we calculate the average image from the last $ 50 $ DIP network outputs. The resulting image is the input to the supervised CNN and $ \tilde{\mathbf{X}}_3 $ is obtained. Further $ 50 $ iterations of optimizing $ L_3 $ are performed, in which the loss function compares the result of the supervised CNN to the DIP network output, and the final result is the mean of the method output observed during these 50 iterations
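
The averaging step in Figure 4 can be sketched with a fixed-length buffer that keeps only the most recent 50 outputs. The stand-in "images" below are 1 x 1 lists whose single value equals the iteration number, purely so the result is easy to check; `mean_image` is a hypothetical helper, and the actual DIP and supervised CNN calls are omitted.

```python
from collections import deque
from typing import Deque, List, Sequence

Image = List[List[float]]
WINDOW = 50        # number of DIP outputs averaged (Figure 4)
AVERAGE_AT = 175   # iteration at which the average is taken (Figure 4)


def mean_image(images: Sequence[Image]) -> Image:
    """Pixel-wise mean of equally sized images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]


recent: Deque[Image] = deque(maxlen=WINDOW)  # older outputs drop out automatically
for iteration in range(1, AVERAGE_AT + 1):
    dip_output = [[float(iteration)]]  # placeholder for the DIP network output
    recent.append(dip_output)

averaged = mean_image(list(recent))  # would be the supervised CNN input
```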

    Figure 5.  Deblurred images from category $ 8 $: Times New Roman

    Figure 6.  Deblurred images from category $ 8 $: Verdana

    Figure 7.  Deblurred images from category $ 8 $: Natural

    Figure 8.  Blur Category $ 12 $ examples. Rows from top to bottom: Times New Roman, Verdana, and Natural

    Figure 9.  Blur Category $ 15 $ examples. Rows from top to bottom: Times New Roman, Verdana, and Natural

    Figure 10.  Blur Category $ 19 $ examples. Rows from top to bottom: Times New Roman, Verdana, and Natural

    Figure 11.  Blur Category $ 14 $ examples. Rows from top to bottom: Times New Roman, Verdana, and Natural

    Figure 12.  Results for blur category $ 8 $ with the initial radius of the blur kernel set 2 pixels greater than the estimated value

    Figure 13.  QR Code examples. From top to bottom: Sharp, blurred and DIP-I reconstructions

    Table 1.  Neural network architecture used for the DIP ($ \phi_{\boldsymbol{\theta}} $)

    | Parameter | DIP |
    |---|---|
    | Input size | $ 320 \times 512 \times 32 $ |
    | Output size | $ 320 \times 512 \times 1 $ |
    | No. of encoder layers | 8 |
    | No. of decoder layers | 8 |
    | No. of filters in each encoder layer | 128 (all) |
    | No. of filters in each decoder layer | 128 (all) |
    | Size of filters in each encoder/decoder layer | $ 3 \times 3 $ (all) |
    | No. of skip layers | 8 |
    | No. of filters in each skip layer | 32 |
    | Size of filters in each skip layer | $ 1 \times 1 $ |
    | Activation function | ReLU |
    | Activation function (last layer) | Sigmoid |
    | Loss function ($ L_1 $) | See Equation (11) |
    | Optimizer | Adam |

    Table 2.  Values of the estimated radius for each blur category used in this study

    | Blur category | Radius (pixels) |
    |---|---|
    | 8 | 13 |
    | 12 | 20 |
    | 15 | 26 |
    | 19 | 44 |

    Table 3.  Neural network architecture used for the supervised CNN ($ \phi^{sup}_{\boldsymbol{\theta}} $)

    | Parameter | Supervised CNN |
    |---|---|
    | Input size | $ 96 \times 96 \times 1 $ |
    | Output size | $ 96 \times 96 \times 1 $ |
    | No. of encoder layers | 5 |
    | No. of decoder layers | 5 |
    | No. of filters in each encoder layer | [64, 64, 128, 128, 256] |
    | No. of filters in each decoder layer | [128, 128, 64, 64, 1] |
    | Size of filters in each encoder/decoder layer | $ 3 \times 3 $ (all) |
    | No. of skip layers | 4 |
    | Activation function | ReLU |
    | Activation function (last layer) | ReLU |
    | Loss function ($ L_2 $) | 1 - SSIM |
    | Optimizer | Adam |

    Table 4.  Values of the parameters used for training the CNN

    | Parameter | Value |
    |---|---|
    | Batch size | 64 |
    | Batches per epoch | 500 |
    | Max no. of epochs | 30 |
    | Tolerance (early stopping) | $ 1 \times 10^{-6} $ |
    | No. of epochs (early stopping) | 4 |
    | L1 regularization | $ 1 \times 10^{-9} $ |
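
One plausible reading of the early-stopping settings in Table 4 is sketched below: training stops once the monitored loss has failed to improve by more than the tolerance for 4 consecutive epochs, or after 30 epochs at most. Which quantity is monitored and how "improvement" is measured are assumptions here, and `epochs_run` is a hypothetical helper rather than the authors' training code.

```python
from typing import Sequence

MAX_EPOCHS = 30    # Max no. of epochs (Table 4)
TOLERANCE = 1e-6   # Tolerance (early stopping) (Table 4)
PATIENCE = 4       # No. of epochs (early stopping) (Table 4)


def epochs_run(losses: Sequence[float]) -> int:
    """Return how many epochs are executed for a given per-epoch loss history."""
    best = float("inf")
    stale = 0  # consecutive epochs without sufficient improvement
    for epoch, loss in enumerate(losses[:MAX_EPOCHS], start=1):
        if best - loss > TOLERANCE:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= PATIENCE:
                return epoch
    return min(len(losses), MAX_EPOCHS)


# Loss plateaus after epoch 3, so training stops 4 epochs later.
plateau_history = [1.0, 0.5, 0.4] + [0.4] * 10
stopped_at = epochs_run(plateau_history)
```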

    Table 5.  Quantity of correctly identified characters using OCR in the resulting images of each method. Presented as mean $ \pm $ standard deviation

    | Method / Category | 8 | 12 | 15 | 19 |
    |---|---|---|---|---|
    | Ground truth | $ 93.8 \pm 9.0 $ | $ 95.5 \pm 7.0 $ | $ 96.5 \pm 6.7 $ | $ 96.0 \pm 5.2 $ |
    | DIP-O | $ 58.9 \pm 25.9 $ | $ 32.0 \pm 25.6 $ | $ 45.6 \pm 23.0 $ | $ 0.0 \pm 0.0 $ |
    | DIP-I | $ 80.9 \pm 18.7 $ | $ 79.7 \pm 16.6 $ | $ 73.4 \pm 21.5 $ | $ 37.8 \pm 18.9 $ |
    | DIP-M | $ 81.9 \pm 19.1 $ | $ 72.3 \pm 24.3 $ | $ 78.3 \pm 14.9 $ | $ 31.5 \pm 16.7 $ |
    | DIP-P | $ \mathbf{91.7 \pm 12.1} $ | $ \mathbf{87.8 \pm 11.0} $ | $ \mathbf{82.9 \pm 13.7} $ | $ \mathbf{39.8 \pm 18.6} $ |

    Table 6.  FR-IQA measures in blur category $ 8 $

    | Method | Text SSIM | Text PSNR | Natural SSIM | Natural PSNR |
    |---|---|---|---|---|
    | DIP-O | $ 0.500 \pm 0.021 $ | $ 18.8 \pm 0.3 $ | $ 0.260 \pm 0.105 $ | $ 12.2 \pm 1.7 $ |
    | SelfDeblur | $ 0.523 \pm 0.012 $ | $ 17.0 \pm 0.4 $ | $ 0.262 \pm 0.102 $ | $ 11.7 \pm 1.3 $ |
    | DeepRED | $ 0.575 \pm 0.008 $ | $ 18.0 \pm 0.3 $ | $ \mathbf{0.292 \pm 0.108} $ | $ 12.0 \pm 1.5 $ |
    | DIP-I | $ 0.613 \pm 0.007 $ | $ 20.2 \pm 0.4 $ | $ 0.287 \pm 0.112 $ | $ \mathbf{12.2 \pm 1.6} $ |
    | DIP-M | $ 0.613 \pm 0.007 $ | $ 20.5 \pm 0.4 $ | $ 0.284 \pm 0.112 $ | $ \mathbf{12.2 \pm 1.6} $ |
    | DIP-P | $ \mathbf{0.617 \pm 0.006} $ | $ \mathbf{22.5 \pm 0.6} $ | $ 0.261 \pm 0.110 $ | $ 8.5 \pm 2.2 $ |
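
For readers unfamiliar with the PSNR columns of Table 6, the sketch below computes the standard peak signal-to-noise ratio, $ 10 \log_{10}(\mathrm{peak}^2 / \mathrm{MSE}) $, between a reference and an estimate. The pixel range of $ [0, 1] $ (peak of 1.0) and the list-of-lists image representation are assumptions for illustration; the page does not state how the authors scaled their images.

```python
import math
from typing import List

Image = List[List[float]]


def psnr(reference: Image, estimate: Image, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for ref_px, est_px in zip(ref_row, est_row):
            squared_error += (ref_px - est_px) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Every pixel is off by 0.1, so the MSE is 0.01 and the PSNR is about 20 dB.
sharp = [[0.0, 1.0], [1.0, 0.0]]
restored = [[0.1, 0.9], [0.9, 0.1]]
value = psnr(sharp, restored)
```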