Non-local dependency is an important prior for many image segmentation tasks. Convolutional operations are building blocks that process one local neighborhood at a time, so convolutional neural networks (CNNs) usually do not make explicit use of the non-local prior in image segmentation tasks. Although pooling and dilated convolution can enlarge the receptive field and thereby bring some non-local information into the feature extraction step, current CNN architectures have no non-local prior in the feature classification step. In this paper, we present a non-local total variation (TV) regularized softmax activation function for semantic image segmentation tasks. The proposed method can be integrated into CNN architectures. Because the non-smoothness of the non-local TV makes back-propagation difficult, we develop a primal-dual hybrid gradient method to realize back-propagation through the non-local TV in CNNs. Experimental evaluations of the non-local TV regularized softmax layer on a series of image segmentation datasets demonstrate its good performance. Many CNNs can benefit from the proposed method on image segmentation tasks.
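To make the idea of a TV regularized softmax more concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes the layer output maximises `<A, o> - eps * <A, log A> - lam * NLTV(A)` over per-pixel probability vectors, uses an anisotropic weighted TV over a fixed set of neighbor offsets with periodic boundaries, and replaces the paper's primal-dual hybrid gradient scheme with a simpler projected dual ascent whose primal step is a closed-form softmax. The names `tv_softmax`, `grad_w`, `grad_w_adjoint`, `o`, `w`, `offsets`, `lam`, `eps` and the step size rule are all assumptions made for illustration.

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the class axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def grad_w(u, w, offsets):
    """Weighted differences w_d * (u[x + d] - u[x]) for each offset d.

    u: (C, H, W) class maps, w: (D, H, W) weights, periodic boundary.
    Returns an array of shape (D, C, H, W).
    """
    return np.stack([
        w[k] * (np.roll(u, (-dy, -dx), axis=(-2, -1)) - u)
        for k, (dy, dx) in enumerate(offsets)
    ])

def grad_w_adjoint(p, w, offsets):
    """Adjoint of grad_w, so that <grad_w(u), p> == <u, grad_w_adjoint(p)>."""
    out = np.zeros_like(p[0])
    for k, (dy, dx) in enumerate(offsets):
        q = w[k] * p[k]
        out += np.roll(q, (dy, dx), axis=(-2, -1)) - q
    return out

def tv_softmax(o, w, offsets, lam=1.0, eps=1.0, n_iter=50):
    """Sketch of a TV regularized softmax on logits o of shape (C, H, W).

    Assumed scheme: projected gradient ascent on the dual variable p,
    followed by a closed-form softmax update of the primal variable a.
    """
    n_off = len(offsets)
    # Conservative dual step, assuming all weights lie in [0, 1].
    tau = eps / (4.0 * n_off * lam ** 2) if lam > 0 else 0.0
    p = np.zeros((n_off,) + o.shape)
    a = softmax(o / eps)
    for _ in range(n_iter):
        # Dual step: ascend, then project onto the box |p| <= 1.
        p = np.clip(p + tau * lam * grad_w(a, w, offsets), -1.0, 1.0)
        # Primal step: softmax of the logits shifted by the dual term.
        a = softmax((o - lam * grad_w_adjoint(p, w, offsets)) / eps)
    return a
```

With `lam=0` the function reduces to a plain softmax of `o / eps`; larger values of `lam` put more weight on the weighted TV term. In a network, the iterations would have to be written with a differentiable framework so that gradients can flow through them, which this NumPy sketch does not attempt.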
Figure 1. An example of segmentation results obtained by applying the algorithm of [34] and our proposed method to an image from BSD500. With the 4 geometric nearest neighbors, the weights are set to 1; the resulting segmentation is quite smooth and misses details (Figure 1(b)). When W is computed with Eq. (11), the segmentation shows more detail and better accuracy.
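The two weight choices contrasted in Figure 1 can be illustrated with a small NumPy sketch. Eq. (11) is not reproduced in this excerpt, so the Gaussian intensity-similarity weight below is only a commonly used stand-in and should not be read as the paper's formula; the function names, the `sigma` parameter, and the periodic boundary handling are likewise assumptions.

```python
import numpy as np

# The 4 geometric nearest neighbors as (dy, dx) offsets.
FOUR_NEIGHBORS = [(0, 1), (0, -1), (1, 0), (-1, 0)]

def uniform_weights(shape, offsets):
    """Weight 1 for every pixel and every neighbor offset."""
    return np.ones((len(offsets),) + tuple(shape))

def similarity_weights(image, offsets, sigma=0.1):
    """Assumed stand-in for image-dependent weights:
    w_d[x] = exp(-|f[x + d] - f[x]|^2 / sigma^2), periodic boundary.
    """
    image = np.asarray(image, dtype=float)
    weights = []
    for dy, dx in offsets:
        neighbor = np.roll(image, (-dy, -dx), axis=(0, 1))
        diff2 = (neighbor - image) ** 2
        if diff2.ndim == 3:
            # Colour image: sum squared differences over channels.
            diff2 = diff2.sum(axis=2)
        weights.append(np.exp(-diff2 / sigma ** 2))
    return np.stack(weights)
```

Uniform weights penalise every jump between neighboring pixels equally, which tends to smooth across object boundaries. Image-dependent weights become small where neighboring intensities differ, so jumps at likely edges are penalised less.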
Figure 2.
Given an input
Figure 4. An enlarged view of segmentation results from Figure 3
Figure 7. An enlarged view of segmentation results from Figure 6 column 2
Figure 8. An enlarged view of segmentation results from Figure 6 column 1
Table 1. Results of Unet, RUnet and NLUnet trained on WBC Dataset
Table 2. Results of AUnet, RAUnet and NLAUnet trained on WBC Dataset
| Method | AUnet [23] | RAUnet | NLAUnet |
|---|---|---|---|
| mIoU | 90.75 | 91.01 | 91.69 |
| Accuracy | 97.35 | 97.40 | 97.57 |
| RE | 1.43 | 1.41 | 1.43 |
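For readers who want to reproduce scores of the kind reported in the table, here is a short sketch of how mIoU and pixel accuracy are commonly computed from predicted and ground-truth label maps. The excerpt does not give the paper's exact definitions, so the standard confusion-matrix definitions are assumed; the function names are hypothetical, and the RE row is not covered because its definition does not appear here.

```python
import numpy as np

def confusion_matrix(pred, target, n_classes):
    """Count pixels by (true class, predicted class); rows are true classes."""
    pred = np.asarray(pred).ravel().astype(np.int64)
    target = np.asarray(target).ravel().astype(np.int64)
    idx = target * n_classes + pred
    counts = np.bincount(idx, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)

def pixel_accuracy(cm):
    """Fraction of pixels whose predicted label equals the true label."""
    return np.trace(cm) / cm.sum()

def mean_iou(cm):
    """Mean over classes of intersection / union, skipping absent classes."""
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    valid = union > 0
    return (tp[valid] / union[valid]).mean()
```

Both functions return fractions in [0, 1]; multiplying by 100 gives percentages on the scale that the table values appear to use.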
[1] R. Adams and L. Bischof, Seeded region growing, IEEE Transactions on Pattern Analysis and Machine Intelligence, 16 (1994), 641–647. doi: 10.1109/34.295913.
[2] M. Z. Alom, M. Hasan, C. Yakopcic, T. M. Taha and V. K. Asari, Recurrent residual convolutional neural network based on u-net (r2u-net) for medical image segmentation, arXiv: 1802.06955.
[3] V. Badrinarayanan, A. Kendall and R. Cipolla, Segnet: A deep convolutional encoder-decoder architecture for image segmentation, arXiv: 1511.00561. doi: 10.1109/TPAMI.2016.2644615.
[4] L. Barghout and L. Lee, Perceptual information processing system, US Patent App. 10/618,543, (2004).
[5] M. Benning, C. Brune, M. Burger and J. Müller, Higher-order tv methods–enhancement via bregman iteration, Journal of Scientific Computing, 54 (2013), 269–310. doi: 10.1007/s10915-012-9650-3.
[6] H. Birkholz, A unifying approach to isotropic and anisotropic total variation denoising models, Journal of Computational and Applied Mathematics, 235 (2011), 2502–2514. doi: 10.1016/j.cam.2010.11.003.
[7] J. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 8 (1986), 679–698. doi: 10.1016/B978-0-08-051581-6.50024-6.
[8] G. Gilboa and S. Osher, Nonlocal operators with applications to image processing, Multiscale Modeling & Simulation, 7 (2008), 1005–1028. doi: 10.1137/070698592.
[9] K. He, X. Zhang, S. Ren and J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2015, 1026–1034. doi: 10.1109/ICCV.2015.123.
[10] F. Jia, J. Liu and X. Tai, A regularized convolutional neural network for semantic image segmentation, Analysis and Applications, (2020), 1–19.
[11] M. Johnson-Roberson, C. Barto, R. Mehta, S. N. Sridhar, K. Rosaen and R. Vasudevan, Driving in the matrix: Can virtual worlds replace human-generated annotations for real world tasks?, preprint, arXiv: 1610.01983. doi: 10.1109/ICRA.2017.7989092.
[12] M. Kass, A. Witkin and D. Terzopoulos, Snakes: Active contour models, International Journal of Computer Vision, 1 (1988), 321–331. doi: 10.1007/BF00133570.
[13] P. Krähenbühl and V. Koltun, Efficient inference in fully connected crfs with gaussian edge potentials, Advances in Neural Information Processing Systems, (2011), 109–117.
[14] A. Krizhevsky, I. Sutskever and G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in Neural Information Processing Systems, (2012), 1097–1105. doi: 10.1145/3065386.
[15] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard and L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural Computation, 1 (1989), 541–551. doi: 10.1162/neco.1989.1.4.541.
[16] G. Lin, C. Shen, A. V. D. Hengel and I. Reid, Efficient piecewise training of deep structured models for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2016, 3194–3203. doi: 10.1109/CVPR.2016.348.
[17] J. Long, E. Shelhamer and T. Darrell, Fully convolutional networks for semantic segmentation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2015, 3431–3440. doi: 10.1109/CVPR.2015.7298965.
[18] M. Lysaker, A. Lundervold and X.-C. Tai, Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time, IEEE Transactions on Image Processing, 12 (2003), 1579–1590. doi: 10.1109/TIP.2003.819229.
[19] D. R. Martin, C. C. Fowlkes and J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26 (2004), 530–549. doi: 10.1109/TPAMI.2004.1273918.
[20] K. Mikula, A. Sarti and F. Sgallari, Co-volume level set method in subjective surface based medical image segmentation, in Handbook of Biomedical Image Analysis, Springer, 2005, 583–626. doi: 10.1007/0-306-48551-6_11.
[21] D. Mumford and J. Shah, Optimal approximations by piecewise smooth functions and associated variational problems, Communications on Pure and Applied Mathematics, 42 (1989), 577–685. doi: 10.1002/cpa.3160420503.
[22] H. Noh, S. Hong and B. Han, Learning deconvolution network for semantic segmentation, in Proceedings of the IEEE International Conference on Computer Vision, IEEE, 2015, 1520–1528. doi: 10.1109/ICCV.2015.178.
[23] O. Oktay, et al., Attention u-net: Learning where to look for the pancreas, preprint, arXiv: 1804.03999.
[24] N. Otsu, A threshold selection method from gray-level histograms, IEEE Transactions on Systems, Man and Cybernetics, 9 (1979), 62–66. doi: 10.1109/TSMC.1979.4310076.
[25] O. Ronneberger, P. Fischer and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, 234–241. doi: 10.1007/978-3-319-24574-4_28.
[26] L. I. Rudin, S. Osher and E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D: Nonlinear Phenomena, 60 (1992), 259–268. doi: 10.1016/0167-2789(92)90242-F.
[27] B. Schölkopf, K. Tsuda and J.-P. Vert, Support Vector Machine Applications in Computational Biology, MIT Press, 2004.
[28] L. Shapiro and G. C. Stockman, Computer Vision, Prentice Hall, 2001.
[29] J. Shi and J. Malik, Normalized cuts and image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (2000), 888–908.
[30] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, preprint, arXiv: 1409.1556.
[31] M. Unger, T. Mauthner, T. Pock and H. Bischof, Tracking as segmentation of spatial-temporal volumes by anisotropic weighted tv, in International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, Springer, 2009, 193–206. doi: 10.1007/978-3-642-03641-5_15.
[32] P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou and G. Cottrell, Understanding convolution for semantic segmentation, in 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), IEEE, 2018, 1451–1460. doi: 10.1109/WACV.2018.00163.
[33] K. Wei, K. Yin, X.-C. Tai and T. F. Chan, New region force for variational models in image segmentation and high dimensional data clustering, preprint, arXiv: 1704.08218. doi: 10.4310/AMSA.2018.v3.n1.a8.
[34] K. Yin and X.-C. Tai, An effective region force for some variational models for learning and clustering, Journal of Scientific Computing, 74 (2018), 175–196. doi: 10.1007/s10915-017-0429-4.
[35] F. Yu and V. Koltun, Multi-scale context aggregation by dilated convolutions, preprint, arXiv: 1511.07122.
[36] L. Zelnik-Manor and P. Perona, Self-tuning spectral clustering, Advances in Neural Information Processing Systems, (2005), 1601–1608.
[37] X. Zheng, Y. Wang, G. Wang and J. Liu, Fast and robust segmentation of white blood cell images by self-supervised learning, Micron, 107 (2018), 55–71. doi: 10.1016/j.micron.2018.01.010.