Let's enhance: A deep learning approach to extreme deblurring of text images

  • *Corresponding author: Theophil Trippe

1 This work was done while at Utrecht University, The Netherlands.
2 This work was done prior to joining Amazon.

Abstract
  • This work presents a novel deep-learning-based pipeline for the inverse problem of image deblurring, leveraging augmentation and pre-training with synthetic data. Our results build on our winning submission to the recent Helsinki Deblur Challenge 2021, whose goal was to explore the limits of state-of-the-art deblurring algorithms in a real-world data setting. The task of the challenge was to deblur out-of-focus images of random text and thereby maximize an optical-character-recognition-based score function in a downstream task. A key step of our solution is the data-driven estimation of the physical forward model describing the blur process. This enables a stream of synthetic data, generating pairs of ground-truth and blurry images on-the-fly, which is used for an extensive augmentation of the small amount of challenge data provided. The actual deblurring pipeline consists of an approximate inversion of the radial lens distortion (determined by the estimated forward model) and a U-Net architecture, which is trained end-to-end. Our algorithm was the only one to pass the hardest challenge level, achieving over $ 70\% $ character recognition accuracy. Our findings are well in line with the paradigm of data-centric machine learning, and we demonstrate its effectiveness in the context of inverse problems. Apart from a detailed presentation of our methodology, we also analyze the importance of several design choices in a series of ablation studies. The code of our challenge submission is available at https://github.com/theophil-trippe/HDC_TUBerlin_version_1.

    Mathematics Subject Classification: Primary: 94A08, 68T07; Secondary: 68T20.

  • Figure 1.  Examples from the HDC dataset with different blur levels (left and right column). The severity of blurring increases with each level (20 in total, ranging from 0 to 19). The center column shows the corresponding reconstructions with our deep-learning-based pipeline (examples taken from the validation set)

    Figure 2.  Schematic depiction of our deblurring pipeline. Top left (green): Data is synthesized for augmenting the available HDC training data (the details of the background removal step described in Section 3.1 are omitted here; see Fig. 6 for a precise depiction of that step). Center (purple): The reconstruction pipeline consists of an inverse lens distortion and a modified U-Net architecture. Bottom: The modified architecture differs from the vanilla U-Net (cf. [41]) by adding an introductory pooling layer and additional down- and up-sampling levels, to increase the overall receptive field
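
    The architecture modification described in Fig. 2 (an introductory pooling layer in front of a U-Net with extra down- and up-sampling levels) can be sketched as follows. This is a minimal PyTorch illustration, not the authors' exact implementation; the channel counts, block design, and single-channel input/output are assumptions.

```python
import torch
import torch.nn as nn


def conv_block(c_in, c_out):
    # Two 3x3 convolutions with ReLU: the basic U-Net building block.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )


class PooledUNet(nn.Module):
    """U-Net preceded by an introductory pooling layer; `depth` counts the
    down-/up-sampling levels. Input height/width must be divisible by 2**(depth+1)."""

    def __init__(self, depth=6, base_channels=32):
        super().__init__()
        self.intro_pool = nn.AvgPool2d(2)             # introductory pooling layer
        chans = [base_channels * 2 ** min(i, 4) for i in range(depth + 1)]
        self.encoders, self.downs = nn.ModuleList(), nn.ModuleList()
        c_prev = 1                                    # assumes grayscale input
        for c in chans[:-1]:
            self.encoders.append(conv_block(c_prev, c))
            self.downs.append(nn.MaxPool2d(2))        # down-sampling step
            c_prev = c
        self.bottleneck = conv_block(c_prev, chans[-1])
        self.ups, self.decoders = nn.ModuleList(), nn.ModuleList()
        for c_skip, c_up in zip(reversed(chans[:-1]), reversed(chans[1:])):
            self.ups.append(nn.ConvTranspose2d(c_up, c_skip, 2, stride=2))
            self.decoders.append(conv_block(2 * c_skip, c_skip))
        self.out = nn.Sequential(
            nn.Conv2d(chans[0], 1, 1),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
        )                                             # undo the introductory pooling

    def forward(self, x):
        x = self.intro_pool(x)
        skips = []
        for enc, down in zip(self.encoders, self.downs):
            x = enc(x)
            skips.append(x)                           # skip connection
            x = down(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.ups, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))
        return self.out(x)
```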

    Figure 3.  Visualization of the coordinate deformation caused by radial lens distortion
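
    The caption of Fig. 3 does not spell out the distortion parameterization. As a point of reference, a common single-parameter radial model (in the spirit of [11, 51]) maps coordinates $ (x, y) $, taken relative to the distortion center, to distorted coordinates

    $ \begin{pmatrix} x_d \\ y_d \end{pmatrix} = \left(1 + \kappa \, r^2\right) \begin{pmatrix} x \\ y \end{pmatrix}, \qquad r^2 = x^2 + y^2, $

    where the sign and magnitude of $ \kappa $ control the direction and strength of the deformation; the model estimated in the paper may include additional higher-order terms.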

    Figure 4.  Illustration of the HDC experimental setup taken from [46]. Two identical photo cameras target the same e-ink display with the help of a half-transparent beamsplitter mirror

    Figure 5.  Examples of HDC training data pairs of sharp and blurred images showing text (top row) and a calibration target (second row). Examples of the test data (third row) and sanity check data (bottom row) were unknown to the participants before the submission deadline

    Figure 6.  Illustration of the forward blur model with background removal and addition. The background (bg) $ \mathbf{x}_0 $ and its corresponding blurred version $ \mathbf{y}_0 $ are estimated from the provided calibration target showing a single central point. They are subtracted and added before and after the discrete convolution with the estimated blur kernel, respectively
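
    A minimal sketch of the background-corrected convolution in Fig. 6, i.e. $ \mathbf{y} = \mathbf{k}_B * (\mathbf{x} - \mathbf{x}_0) + \mathbf{y}_0 $, is given below. The lens-distortion component of the forward model is omitted here, and the tensor shapes and padding mode are assumptions.

```python
import torch
import torch.nn.functional as F


def blur_forward(x, kernel, x_bg, y_bg):
    """Apply y = k_B * (x - x_0) + y_0 with a single (odd-sized) blur kernel.

    x, x_bg, y_bg: tensors of shape (1, 1, H, W); kernel: (1, 1, kH, kW).
    """
    pad = (kernel.shape[-2] // 2, kernel.shape[-1] // 2)  # keep spatial size
    residual = x - x_bg                                   # remove sharp background
    blurred = F.conv2d(residual, kernel, padding=pad)     # convolve with estimated kernel
    return blurred + y_bg                                 # add blurred background back
```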

    Figure 7.  Estimated kernels $ \mathbf{k}_B $ for blur levels 4, 9, 14, and 19 with enhanced contrast. The octagonal shape is typical for the polygonal shutter blades of modern apertures, indicating that these estimated kernels reflect the underlying physical reality
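
    The kernels shown in Fig. 7 are estimated in a data-driven way. A hedged sketch of one way to realize this, fitting the kernel of the forward model above to a sharp/blurred calibration pair by gradient descent with Adam [24], is given below; the kernel size, loss, and optimizer settings are assumptions and may differ from the actual procedure.

```python
import torch
import torch.nn.functional as F


def estimate_kernel(x_sharp, y_blur, x_bg, y_bg, ksize=101, steps=2000, lr=1e-2):
    """Fit a blur kernel to one calibration pair by minimizing a pixel-wise loss."""
    kernel = torch.zeros(1, 1, ksize, ksize, requires_grad=True)
    optimizer = torch.optim.Adam([kernel], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        y_pred = blur_forward(x_sharp, kernel, x_bg, y_bg)  # see sketch above
        loss = F.mse_loss(y_pred, y_blur)
        loss.backward()
        optimizer.step()
    return kernel.detach()
```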

    Figure 8.  Examples of sharp and blurred image pairs from the (a) original HDC data, (b) synthetic HDC data, and (c) synthetic sanity check data. All blurred images correspond to the same blur level
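
    The synthetic data stream illustrated in Fig. 8 renders random text on the fly and passes it through the estimated forward model. The following sketch shows the idea; the image size, font size, and character set are illustrative assumptions (the HDC images use the fonts Times New Roman and Verdana), and `blur_forward` refers to the sketch above.

```python
import random
import string

import numpy as np
import torch
from PIL import Image, ImageDraw, ImageFont


def synthesize_pair(kernel, x_bg, y_bg, size=(512, 256), font_path="Verdana.ttf"):
    """Generate one (sharp, blurred) training pair from random text."""
    img = Image.new("L", size, color=230)              # light gray background
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, 40)
    for i in range(3):                                 # a few lines of random characters
        line = "".join(random.choices(string.ascii_uppercase + string.digits, k=12))
        draw.text((20, 20 + 70 * i), line, fill=20, font=font)
    sharp = torch.from_numpy(np.array(img)).float()[None, None] / 255.0
    blurred = blur_forward(sharp, kernel, x_bg, y_bg)  # apply estimated forward model
    return sharp, blurred
```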

    Figure 9.  Example reconstruction results (right column) together with the sharp ground-truth images (left column) and blurry input images (middle column) for the blur levels 4, 9, 14, and 19. For the levels 14 and 19, both fonts (upper: Times New Roman, lower: Verdana) are shown. The right column shows the OCR Score for these particular samples, and for the sake of completeness, also the standard evaluation metrics SSIM and PSNR are reported; see Fig. 14, Fig. 19 and Fig. 20 for corresponding average scores
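
    The OCR score reported in Fig. 9 (and used as the challenge metric, cf. Fig. 11) is based on recognizing the reconstructed text with Tesseract [1] and comparing it to the ground-truth string via the Levenshtein distance [29]. A hedged sketch is given below; the exact preprocessing and normalization used by the challenge organizers may differ, and the third-party `pytesseract` and `Levenshtein` packages are assumed to be available.

```python
import pytesseract
import Levenshtein


def ocr_score(reconstruction, true_text):
    """reconstruction: a PIL image; true_text: the displayed ground-truth string."""
    predicted = pytesseract.image_to_string(reconstruction).strip()
    dist = Levenshtein.distance(predicted, true_text)
    # Normalize to [0, 1]: 1.0 means a perfect character-level match.
    return max(0.0, 1.0 - dist / max(len(true_text), 1))
```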

    Figure 10.  Example reconstructions of two out-of-distribution images from the HDC sanity check data, shown for blur levels 4, 9, 14, and 19

    Figure 11.  Summary of results from the HDC. The figure plots the blur level against the average OCR scores achieved by each participating team on the HDC test set. Our winning submission is highlighted as the bold line. Note that some teams have submitted multiple methods, and we took the most accurate one in each case

    Figure 12.  Comparison of a blurry image from the HDC dataset with the results of two simulated forward models: a simple one, using only a convolution with a single spatially invariant blur kernel, and the one from Section 3.1, using a spatially variant blur kernel that accounts for radial lens distortion

    Figure 13.  Visualization of the estimated radial lens distortion and the corresponding inverse distortion, shown for the blur levels 4, 9, 14, and 19
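
    The distortion fields of Fig. 13 are applied by resampling the image on radially displaced coordinates. A minimal sketch with bilinear resampling is shown below; the single-parameter model is an illustrative assumption (cf. the model sketched after Fig. 3), and in practice the inverse field would be derived from the estimated forward distortion.

```python
import torch
import torch.nn.functional as F


def radial_warp(img, kappa):
    """Resample img (shape (1, 1, H, W)) on coordinates scaled by (1 + kappa * r^2).

    Positive kappa samples farther from the center, negative kappa closer, so the
    sign of kappa switches between a radial deformation and its rough first-order undoing.
    """
    _, _, H, W = img.shape
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    r2 = xs ** 2 + ys ** 2
    grid = torch.stack(((1 + kappa * r2) * xs, (1 + kappa * r2) * ys), dim=-1)
    return F.grid_sample(img, grid.unsqueeze(0), mode="bilinear", align_corners=True)
```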

    Figure 14.  Summary of OCR scores of all ablation studies presented in Section 4.3. Analogously to Fig. 11, the OCR scores are computed as the average over the challenge test set. Each line corresponds to a different modification of our final challenge submission, which is highlighted as the bold line. For analogous plots reporting SSIM and PSNR, see Fig. 19 and Fig. 20 in Appendix A.1

    Figure 15.  Comparison of deblurring networks trained only on synthetic data ( = only pre-training), trained only on the original HDC data ( = only fine-tuning), and trained as in Section 3.4 ( = both)

    Figure 16.  Comparison of deblurring U-Nets without initial inverse radial distortion (middle column) and with initial inverse radial distortion as described in Section 3.3 (right column); see Fig. 17 for a zoom of the results of level 19

    Figure 17.  Zoom of level 19 from Fig. 16, comparing the deblurring results of U-Nets without (left) and with (right) inverse radial distortion as an initial step

    Figure 18.  Comparison of deblurring U-Nets of different depths ( = number of down- and up-sampling steps)

    Figure 19.  Average SSIM scores for our ablation studies from Section 4.3; cf. Fig. 14

    Figure 20.  Average PSNR scores for our ablation studies from Section 4.3; cf. Fig. 14

    Figure 21.  Out-of-distribution performance on examples from the test dataset. The images are taken from the blur levels $ i = 4 , 9 , 14 , 17 $, and 19, while the reconstruction pipelines used were trained on the easier levels $ i-1 $ and $ i-2 $ (right column = pipeline of level $ i $ for reference)

    Figure 22.  Out-of-distribution performance on examples from the test dataset. The images are taken from the blur levels $ i = 4 , 9 , 14 $, and 17, while the reconstruction pipelines used were trained on the harder levels $ i+1 $ and $ i+2 $ (right column = pipeline of level $ i $ for reference). Note that for level $ i = 19 $, no corresponding pipelines exist, which is why we have included level 17 instead

    Figure 23.  Average OCR scores on the test data for our final pipelines when applied to adjacent blur levels

  • [1] Tesseract OCR, URL: https://github.com/tesseract-ocr/tesseract, 2022.
    [2] N. Adaloglou, Understanding the receptive field of deep convolutional networks, URL: https://theaisummer.com/receptive-field/, 2020.
    [3] J. Adler and O. Öktem, Learned primal-dual reconstruction, IEEE Trans. Med. Imag., 37 (2018), 1322-1332.  doi: 10.1109/TMI.2018.2799231.
    [4] H. K. Aggarwal, M. P. Mani and M. Jacob, MoDL: Model-based deep learning architecture for inverse problems, IEEE Trans. Med. Imag., 38 (2018), 394-405. doi: 10.1109/TMI.2018.2865356.
    [5] J. R. Alvim, K. N. Filho, M. L. B. Junior, R. D. B. Brotto, R. da Rocha Lopes, T. A. P. P. Teixeira and V. C. Lima, HDC Submission From São Paulo Group, URL: https://github.com/vclima/deblur_submit, 2021.
    [6] J. M. M. Anderson, A deblurring algorithm for impulse based forward-looking ground penetrating radar images reconstructed using the delay-and-sum algorithm, In IEEE Radar Conference (RadarConf18), 2018, 1377-1382. doi: 10.1109/RADAR.2018.8378765.
    [7] S. Arridge, P. Maass, O. Öktem and C.-B. Schönlieb, Solving inverse problems using data-driven models, Acta Numerica, 28 (2019), 1-174. doi: 10.1017/S0962492919000059.
    [8] Y. Bahat, N. Efrat and M. Irani, Non-uniform blind deblurring by reblurring, In IEEE International Conference on Computer Vision (ICCV), 2017, 3306-3314. doi: 10.1109/ICCV.2017.356.
    [9] B. Bascle, A. Blake and A. Zisserman, Motion deblurring and super-resolution from an image sequence, In Lecture Notes in Computer Science, Springer Berlin Heidelberg, 1996, 571-582. doi: 10.1007/3-540-61123-1_171.
    [10] J. M. Bioucas-Dias, M. A. T. Figueiredo and J. P. Oliveira, Total variation-based image deconvolution: A majorization-minimization approach, In IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2006.
    [11] D. C. Brown, Decentering distortion of lenses, Photogrammetric Engineering, 32 (1966), 444-462. 
    [12] A. Bub, S. Gondrom, M. Maisl, N. Uhlmann and W. Arnold, Image blur in a flat-panel detector due to Compton scattering at its internal mountings, Measurement Science and Technology, 18 (2007), 1270-1277. doi: 10.1088/0957-0233/18/5/013.
    [13] W. Demtröder, Electrodynamics and Optics, Springer Cham, 2019.
    [14] G. Wang, D. L. Snyder and M. W. Vannier, Local computed tomography via iterative deblurring, Scanning, 18 (1996), 582-588. doi: 10.1002/sca.4950180808.
    [15] A. W. Fitzgibbon, Simultaneous linear estimation of multiple view geometry and lens distortion, In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2001. doi: 10.1109/CVPR.2001.990465.
    [16] H. Gao, X. Tao, X. Shen and J. Jia, Dynamic scene deblurring with parameter selective sharing and nested skip connections, In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019. doi: 10.1109/CVPR.2019.00397.
    [17] M. Genzel, I. Gühring, J. Macdonald and M. März, Near-exact recovery for tomographic inverse problems via deep learning, In K. Chaudhuri, S. Jegelka, L. Song, C. Szepesvari, G. Niu, and S. Sabato, editors, Proceedings of the 39th International Conference on Machine Learning (ICML), 2022, 7368-7381.
    [18] M. Genzel, J. Macdonald and M. März, Solving inverse problems with deep neural networks – Robustness included?, IEEE Trans. Pattern Anal. Mach. Intell., 2022. doi: 10.1109/TPAMI.2022.3148324.
    [19] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville and Y. Bengio, Generative adversarial networks, Communications of the ACM, 63 (2020), 139-144. doi: 10.1145/3422622.
    [20] K. Gregor and Y. LeCun, Learning fast approximations of sparse coding, In Proceedings of the 27th International Conference on Machine Learning (ICML), 2010, 399-406.
    [21] K. Hammernik, J. Schlemper, C. Qin, J. Duan, R. M. Summers and D. Rueckert, Systematic evaluation of iterative deep neural networks for fast parallel MRI reconstruction with sensitivity-weighted coil combination, Magn. Reson. Med., 86 (2021), 1859-1872. doi: 10.1002/mrm.28827.
    [22] P. C. Hansen, J. G. Nagy and D. P. O'Leary, Deblurring Images, Society for Industrial and Applied Mathematics (SIAM), 2006. doi: 10.1137/1.9780898718874.
    [23] G. Hinton, O. Vinyals and J. Dean, Distilling the knowledge in a neural network, Preprint, arXiv: 1503.02531, 2015.
    [24] D. P. Kingma and J. L. Ba, Adam: A method for stochastic optimization, Preprint, arXiv: 1412.6980, 2014.
    [25] D. P. Kingma and M. Welling, Auto-encoding variational Bayes, Preprint, arXiv: 1312.6114, 2013.
    [26] M. Knudsen, F. L.-S. Pedersen and K. Scheel, HDC submission from DTU group 1, URL: https://github.com/KennethScheel/HDC_2021_team_DTU_1, 2021.
    [27] O. Kupyn, V. Budzan, M. Mykhailych, D. Mishkin and J. Matas, DeblurGAN: Blind motion deblurring using conditional adversarial networks, In IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR), 2018. doi: 10.1109/CVPR.2018.00854.
    [28] O. Kupyn, T. Martyniuk, J. Wu and Z. Wang, DeblurGAN-v2: Deblurring (orders-of-magnitude) faster and better, In IEEE/CVF International Conference on Computer Vision (ICCV), 2019, 8877-8886. doi: 10.1109/ICCV.2019.00897.
    [29] V. I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Physics Doklady, 10 (1966), 707-710. 
    [30] A. Levin, Y. Weiss, F. Durand and W. T. Freeman, Understanding and evaluating blind deconvolution algorithms, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, 1964-1971. doi: 10.1109/CVPR.2009.5206815.
    [31] J. Li, Z. Liu and Y. Yao, Defocus blur detection and estimation from imaging sensors, Sensors, 18 (2018), 1135. doi: 10.3390/s18041135.
    [32] V. Monga, Y. Li and Y. C. Eldar, Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing, IEEE Signal Process. Mag., 38 (2021), 18-44. doi: 10.1109/MSP.2020.3016905.
    [33] S. Nah, T. H. Kim and K. M. Lee, Deep multi-scale convolutional neural network for dynamic scene deblurring, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. doi: 10.1109/CVPR.2017.35.
    [34] A. Ng, A chat with Andrew on MLOps: From model-centric to data-centric AI, URL: https://youtu.be/06-AZXmwHjo, 2021.
    [35] S. I. Nikolenko, Synthetic Data for Deep Learning, Preprint, arXiv: 1909.11512, 2019. doi: 10.1007/978-3-030-75178-4.
    [36] T. M. Nimisha, A. K. Singh and A. N. Rajagopalan, Blur-invariant deep learning for blind-deblurring, In IEEE International Conference on Computer Vision (ICCV), 2017, 4762-4770. doi: 10.1109/ICCV.2017.509.
    [37] G. Ongie, A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis and R. Willett, Deep learning techniques for inverse problems in imaging, IEEE Journal on Selected Areas in Information Theory, 1 (2020), 39-56. doi: 10.1109/JSAIT.2020.2991563.
    [38] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga and A. Lerer, Automatic differentiation in PyTorch, Contribution to the NIPS 2017 Autodiff Workshop, available online: https://openreview.net/forum?id=BJJsrmfCZ, 2017.
    [39] F. H. Pedersen, M. E. Dahlgaard, M. T. R. Henriksen and R. O. Ochoa, HDC Submission From DTU Group 2, https://github.com/raulorteg/HDC2021_Team_DTU, 2021.
    [40] D. M. Pelt, HDC Submission From Leiden University Group, URL: https://github.com/dmpelt/hdc2021_pelt, 2021.
    [41] O. Ronneberger, P. Fischer and T. Brox, U-Net: Convolutional networks for biomedical image segmentation, In International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Springer, 2015, 234-241. doi: 10.1007/978-3-319-24574-4_28.
    [42] G. Ros, L. Sellart, J. Materzynska, D. Vazquez and A. M. Lopez, The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes, In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016. doi: 10.1109/CVPR.2016.352.
    [43] L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259-268. doi: 10.1016/0167-2789(92)90242-F.
    [44] C. J. Schuler, M. Hirsch, S. Harmeling and B. Schölkopf, Learning to deblur, IEEE Trans. Pattern Anal. Mach. Intell., 38 (2015), 1439-1451. doi: 10.1109/TPAMI.2015.2481418.
    [45] S. Siltanen, M. Juvonen and F. Moura, Helsinki Deblur Challenge 2021, URL: https://www.aapm.org/GrandChallenge/DL-sparse-view-CT/, 2021.
    [46] S. Siltanen, M. Juvonen and F. Moura, Helsinki Deblur Challenge 2021 - Results, URL: https://zenodo.org/record/4916176, 2021.
    [47] S. Siltanen, M. Juvonen and F. Moura, Helsinki Deblur Challenge 2021 - Results, URL: https://www.fips.fi/HDCresults.php#anchor1, 2021.
    [48] X. Tao, H. Gao, X. Shen, J. Wang and J. Jia, Scale-recurrent network for deep image deblurring, In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018. doi: 10.1109/CVPR.2018.00853.
    [49] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems, Wiley, 1977.
    [50] T. Trippe, M. Genzel, M. März and J. Macdonald, HDC Submission From TU Berlin Group, URL: https://github.com/theophil-trippe/HDC_TUBerlin_version_1, 2021.
    [51] A. Wang, T. Qiu and L. Shao, A simple method of radial distortion correction with centre of distortion estimation, Journal of Mathematical Imaging and Vision, 35 (2009), 165-172. doi: 10.1007/s10851-009-0162-1.
    [52] L. Yuan, J. Sun, L. Quan and H.-Y. Shum, Image deblurring with blurred/noisy image pairs, ACM Transactions on Graphics, 26 (2007), 1-es. doi: 10.1145/1275808.1276379.
    [53] K. Zhang, W. Luo, Y. Zhong, L. Ma, B. Stenger, W. Liu and H. Li, Deblurring by realistic blurring, In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020. doi: 10.1109/CVPR42600.2020.00281.
    [54] K. Zhang, W. Ren, W. Luo, W.-S. Lai, B. Stenger, M.-H. Yang and H. Li, Deep image deblurring: A survey, International Journal of Computer Vision, 130 (2022), 2103-2130. doi: 10.1007/s11263-022-01633-5.
