
Expression recognition method combining convolutional features and Transformer
1. School of Electronic Information Engineering, Beihang University, China
2. Elite Digital Technology Co., Beijing, China
Facial expression recognition has long been an important research direction in psychology, with applications in traffic, medicine, security, and criminal investigation, because facial expressions convey human feelings through the muscles around the corners of the mouth, the eyes, and the rest of the face. Most existing work uses convolutional neural networks (CNNs) to classify expressions from face images. This achieves good results, but CNNs are limited in their ability to extract global features. The Transformer is well suited to global feature extraction, but it is more computationally intensive and requires a large amount of training data. In this paper, we therefore use a hierarchical Transformer, the Swin Transformer, for the expression recognition task, which greatly reduces the computational cost. We also fuse it with a CNN model, proposing a network architecture that combines the Transformer and a CNN; to the best of our knowledge, this is the first work to combine the Swin Transformer with a CNN for expression recognition. We evaluate the proposed method on several publicly available expression datasets and obtain competitive results.
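The claim that a hierarchical Swin Transformer reduces computation can be made concrete with the complexity expressions given in the original Swin Transformer paper (Liu et al., 2021): for an h × w map of C-channel tokens, global multi-head self-attention (MSA) costs 4hwC² + 2(hw)²C, while self-attention restricted to non-overlapping M × M windows (W-MSA) costs 4hwC² + 2M²hwC, with the softmax ignored in both. The Python sketch below only evaluates these two expressions; the example sizes (a 56 × 56 token map, C = 96, M = 7, roughly the first stage of Swin-T) are illustrative assumptions, not settings reported for the method in this paper.

```python
def msa_cost(h: int, w: int, c: int) -> int:
    """Cost of global multi-head self-attention over h*w tokens (softmax ignored)."""
    n = h * w
    # 4hwC^2: the Q, K, V and output projections.
    # 2(hw)^2 C: computing QK^T and applying the attention weights to V.
    return 4 * n * c * c + 2 * n * n * c


def wmsa_cost(h: int, w: int, c: int, m: int) -> int:
    """Cost of self-attention computed inside non-overlapping m x m windows."""
    n = h * w
    # Each token attends only to the m*m tokens of its own window,
    # so the second term grows linearly (not quadratically) with n.
    return 4 * n * c * c + 2 * m * m * n * c


if __name__ == "__main__":
    h = w = 56     # assumed token map: 224x224 input split into 4x4 patches
    c, m = 96, 7   # assumed channel width and window size
    g = msa_cost(h, w, c)
    l = wmsa_cost(h, w, c, m)
    print(f"MSA:   {g:,}")
    print(f"W-MSA: {l:,}")
    print(f"ratio: {g / l:.1f}x")
```

With these sizes the script prints 2,003,828,736 for MSA and 145,108,992 for W-MSA, about a 13.8x reduction; the gap widens as the token map grows, since only the MSA term is quadratic in hw.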





| Method | CNN block | FERPlus | CK+ | AffectNet-8 | RAF-DB |
|---|---|---|---|---|---|
| Swin | no | 0.855 | 0.975 | 0.587 | 0.855 |
| Swin+CNN (ours) | yes | 0.874 | 0.982 | 0.607 | 0.878 |
| Method | Face alignment | FERPlus | CK+ | AffectNet-8 | RAF-DB |
|---|---|---|---|---|---|
| Swin+CNN (ours) | no | 0.860 | 0.965 | 0.594 | 0.864 |
| Swin+CNN (ours) | yes | 0.874 | 0.982 | 0.607 | 0.878 |
| Method | Pre-training | FERPlus | CK+ | AffectNet-8 | RAF-DB |
|---|---|---|---|---|---|
| Swin+CNN (ours) | no | 0.862 | 0.977 | 0.589 | 0.868 |
| Swin+CNN (ours) | yes | 0.874 | 0.982 | 0.607 | 0.878 |
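Each ablation table compares the full model with a variant lacking one component (for the CNN block, the variant is the plain Swin baseline), so the contribution of that component on a dataset is the difference between the two rows. The short Python sketch below recomputes those differences from the three ablation tables and reports which component shows the largest gap per dataset. The dictionary layout and function names are illustrative choices; since every ablation is measured against the same full model, the gaps are not necessarily additive.

```python
DATASETS = ("FERPlus", "CK+", "AffectNet-8", "RAF-DB")

# (scores without the component, scores with it), copied from the ablation tables
ABLATIONS = {
    "CNN block": ((0.855, 0.975, 0.587, 0.855), (0.874, 0.982, 0.607, 0.878)),
    "face alignment": ((0.860, 0.965, 0.594, 0.864), (0.874, 0.982, 0.607, 0.878)),
    "pre-training": ((0.862, 0.977, 0.589, 0.868), (0.874, 0.982, 0.607, 0.878)),
}


def gains(ablations: dict) -> dict:
    """Per-dataset improvement from adding each component, rounded to 3 decimals."""
    return {
        name: {d: round(f - b, 3) for d, b, f in zip(DATASETS, base, full)}
        for name, (base, full) in ablations.items()
    }


def largest_gain(table: dict) -> dict:
    """For each dataset, the component whose removal costs the most."""
    return {d: max(table, key=lambda name: table[name][d]) for d in DATASETS}


if __name__ == "__main__":
    table = gains(ABLATIONS)
    for name, row in table.items():
        print(name, row)
    print(largest_gain(table))
```

On these numbers the CNN block gives the largest gain on FERPlus, AffectNet-8 and RAF-DB (0.023 on RAF-DB), while face alignment matters most on CK+ (0.017).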
| Method | GPU memory usage |
|---|---|
| Swin Transformer | 5705 MiB |
| Swin+CNN (ours) | 6505 MiB |
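The memory comparison can be read the same way: the cost of adding the CNN branch is the difference between the two rows. The Python sketch below does that arithmetic with the two reported figures; it assumes both runs used the same batch size and input resolution, which the table does not state, and the helper name is an illustrative choice.

```python
def memory_overhead(baseline_mib: float, model_mib: float) -> tuple:
    """Return (absolute overhead in MiB, relative overhead in percent)."""
    extra = model_mib - baseline_mib
    return extra, 100.0 * extra / baseline_mib


if __name__ == "__main__":
    # Swin Transformer vs. Swin+CNN, figures taken from the memory table
    extra, pct = memory_overhead(5705, 6505)
    print(f"+{extra:.0f} MiB ({pct:.1f}% more than Swin alone)")
```

This gives +800 MiB, about 14% above the plain Swin Transformer, which is the memory price paid for the gains in the CNN-block ablation.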