Eliminating other-race effect for multi-ethnic facial expression recognition

It has been noticed that the performance of multi-ethnic facial expression recognition is significantly affected by the other-race effect. Though this phenomenon has been observed by psychologists and computer vision researchers for decades, the mechanism of the other-race effect is still unknown, and little work has been done to compensate for or remove it. This work proposes an ICA-based method to eliminate the other-race effect in automatic 3D facial expression recognition. Firstly, depth features are extracted from local 3D facial patches, and independent component analysis is applied to project them into a subspace in which the projected features are mutually independent. The ethnic-related features and expression-related features are supposed to be separated in the ICA subspace. Ethnic-sensitive features are then determined by an entropy-based feature selection method and discarded to suppress their influence on facial expression recognition. The proposed method is evaluated on the benchmark BU-3DFE database, and the experimental results reveal that the influence caused by the other-race effect can be suppressed effectively with the proposed method.

1. Introduction. Facial expressions play an important role in human nonverbal communication, since they provide an effective way to express intentions and convey emotions. Automatic facial expression recognition has been studied intensively in the past decades, due to its potential applications in computer animation, human-computer interaction, and psychology. Psychological studies have found that multi-ethnic facial expression recognition performance is significantly affected by facial ethnic features. The work in [15] revealed the existence of the other-race effect (ORE) in human face perception and processing, i.e. people show better performance in memorizing and recognizing the facial features of their own race or ethnicity than those of other races or ethnicities. The existence of ORE not only interferes with cross-ethnicity or cross-culture communication [9], but also brings difficulties to facial expression recognition algorithms [17]. In fact, face-based biometric recognition techniques are always affected by the demographic information carried in facial images, such as ethnicity, age, and gender. Such information normally mixes together, which makes effective feature extraction difficult. Though the other-race effect has been noticed by computer vision scientists for decades, very few works focus on how to remove or ease its influence on facial expression recognition algorithms.
The other-race effect was first reported in [8] in 1914; it refers to humans' better ability to recognize faces of their own race than those of other races. It has been an active research topic in psychology ever since the first systematic report [15] was published in 1969. Psychological studies on ORE usually aim to verify the universality of human face perception and to interpret this phenomenon in social cognition. At the same time, many psychological researchers have paid attention to the influence of the other-race effect on facial expression understanding and recognition. The existence of ORE in facial expression recognition has been verified by many works [3,1,23,18], and the neuromechanism of ORE generation is discussed accordingly. These pioneering studies in psychology shed light on the investigation of ORE in computer vision models.
This paper proposes an ICA-based method to eliminate the other-race effect from 3D facial expression recognition by removing the ethnic-related features from face images. The 3D face images from the BU-3DFE database [24] are used in the evaluation experiments because they are not only invariant to illumination changes and pose variations, but also rich in shape and deformation [25]. Firstly, the depth features of 3D faces are extracted from local facial patches and projected into a subspace constructed by independent component analysis (ICA), in which the projected features are supposed to be mutually independent. The independence of the features in the ICA subspace facilitates ethnic-related feature selection, in which the ethnic-related features are determined by a feature learning algorithm wrapped with ethnicity classification. Finally, after the ethnic-related features are removed, the remaining features, free of ethnicity information, are used to conduct facial expression recognition. The experimental results reveal that multi-ethnic facial expression recognition performance can be improved significantly with the proposed method.
2. Related works. The cross-ethnicity differences in facial expressions are the critical reason why the other-race effect exists in multi-ethnic expression recognition. Indeed, the universality of facial expressions of emotion has remained one of the most controversial questions in the biological and social sciences ever since Darwin's seminal works [4]. The universality hypothesis claims that all humans, across cultures, ethnicities, and races, use the same facial muscle movements to communicate six basic internal emotions [7,19]. Based on this hypothesis, Ekman [6] proposed the facial action coding system (FACS) to encode facial features of emotions, and six prototypic expressions (happiness, surprise, fear, disgust, anger, and sadness) are described accordingly. Ekman's definition of the six prototypic expressions has been widely adopted in facial expression recognition research in both psychology and computer vision. However, recent studies have found that facial expressions vary among ethnicities and cultures, which undermines the assumed universality. The work [11] revealed that individuals of different races use various expressions to communicate the same emotions. Another work [12] analyzed several basic expressions and facial action units of Caucasians and Asians by clustering, and the results indicate that the six prototypic expressions are not universal. These findings suggest that facial expression features differ across races. In order to explore the mechanism of the other-race effect and ease its influence on expression recognition, it is straightforward to determine what kinds of facial features are highly related to ethnicity, and then investigate how these features affect facial expression recognition performance in computational models.
Early research on the other-race effect was mainly conducted by psychologists and focused on the factors that may cause the effect. The progress achieved by psychological investigation has inspired computer vision researchers to study the other-race effect from a computational perspective, and several computational models have been proposed to inspect the mechanism of ORE in face-image-based recognition tasks. O'Toole et al. [16] reviewed the works on ORE in face recognition algorithms from the perspective of feature learning, and discussed the contribution of training data and feature learning strategy to generating the other-race effect, as well as the conditions under which the other-race effect can be simulated in computational models. Another work [21] simulated the other-race effect in 3D facial expression recognition using the expression images of Caucasian and East Asian individuals in the BU-3DFE database: facial depth features are extracted to represent expressions for each ethnic group, and cross-validation results show that 3D facial expression recognition performance is significantly affected by the other-race effect. Meanwhile, Fu et al. [9] studied the influence of facial ethnicity information in multi-race facial feature analysis, and pointed out that the other-race effect in face recognition and facial expression recognition is mainly caused by training data; it is worthwhile to construct a training database with equal samples from all races, or to embed imbalanced learning into the biometric recognition framework, to solve the issues caused by the other-race effect. In [3], a neural-network-based computational model, EMPATH [2], is designed to encode the differences in expression perception between two cultural groups. The results show that an 'expression dialect' exists across cultural groups, which causes the own-group recognition advantage.
The above works focused on either verifying the existence of ORE or establishing the conditions to simulate ORE in computational models of facial expression recognition. There is scarcely any work that seeks to ease the influence of the other-race effect on facial expression recognition, except that [9] points out that imbalanced learning may help to alleviate it. In fact, the only work focused on racial bias reduction is [20], in which a deep information maximization adaptation network (IMAN) was proposed to narrow down the racial bias in face recognition. Inspired by the methods adopted in blind signal source separation, this work proposes an ICA-based other-race effect elimination method, which decomposes the mixed demographic information in human faces by projecting face images onto independent components. An ethnic-related facial feature selection is then designed based on mutual information evaluation. The ethnic-related features are removed according to their relevance scores, and finally expression recognition is performed on the remaining features. The experimental results show that the proposed method eases the influence of the other-race effect and improves the performance of multi-ethnic facial expression recognition.

3. The proposed method. The appearance of a human face is determined by two types of attributes: internal ones such as ethnicity, age, and gender, and external ones such as pose and illumination. All these kinds of information mix together, which affects face-based computational recognition tasks. Unlike the internal attributes, the external factors such as pose and illumination variations can be controlled at the stage of capturing face images. In this work, facial ethnicity always co-exists with the expression features; thus, it is necessary to separate facial expression features from ethnic features before recognition.
Inspired by the idea of blind signal source separation, this work embeds independent component analysis into facial expression feature extraction to separate ethnicity information from expression images. Independent component analysis has been applied to facial expression recognition for decades, although mainly to learn a subspace for face image representation. In our work, by contrast, ICA is utilized to identify which features are related to facial ethnicity information, which facilitates other-race effect elimination. Theoretically, the features in the ICA subspace are mutually independent, which helps to ease the influence of the other-race effect once the ethnic-related features are removed.
3.1. ICA-based facial expression feature decomposition. As a non-convex 3D object, the human face deforms non-rigidly under different expressions. In order to encode facial expression features sufficiently, many kinds of features [26,14,13] have been proposed to represent 3D facial expressions. In this work, the raw scan of each face is recorded as a 3D point cloud. A smooth facial surface is fitted to the point cloud by grid fitting [5], 30 facial landmarks are detected, and local depth features are extracted around them. Finally, the expression on each 3D face is represented by the local depth features around the 30 landmarks [22].
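As an illustration of this patch-based representation, the following sketch extracts and concatenates local depth patches around given landmarks. The depth map, landmark coordinates, and patch radius are hypothetical stand-ins; the paper's actual pipeline uses grid fitting and 30 detected landmarks on the fitted surface.

```python
import numpy as np

def extract_depth_features(depth_map, landmarks, radius=8):
    """Concatenate the flattened local depth patches around each landmark
    into a single feature vector for one 3D face."""
    patches = []
    for row, col in landmarks:
        patch = depth_map[row - radius:row + radius, col - radius:col + radius]
        # Subtract the local mean depth so the feature encodes surface
        # deformation rather than absolute distance to the scanner.
        patches.append((patch - patch.mean()).ravel())
    return np.concatenate(patches)

# Toy usage: a synthetic 64x64 depth map with two "landmarks".
depth = np.random.rand(64, 64)
feat = extract_depth_features(depth, [(20, 20), (40, 40)], radius=4)  # 2 patches of 8x8 -> length 128
```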
The influence on facial expression recognition caused by the other-race effect could be eased if the ethnic-related features can be identified and removed. However, a human face carries different kinds of information, including ethnicity, gender, and age, which usually mix together after feature extraction. Obviously, discarding any of these features directly may cause the loss of discriminative expression information, which will lower the recognition performance. Hence, this work proposes to decompose facial features into statistically independent components and then use their combination to represent facial expressions. The decomposition is supposed to decorrelate the mixed information and facilitate the elimination of race-sensitive features. Suppose there are N observed faces x_i (i = 1, 2, ..., N) in the training set, and that human facial features are affected by m mutually independent factors s_j (j = 1, 2, ..., m). That is to say, the factors that affect facial features are assumed to be independent, and facial features can be represented by a linear combination of these factors:

X = AS,    (1)

where X = [x_1, x_2, ..., x_N]^T is the matrix of observed training images, S = [s_1, s_2, ..., s_m]^T is the matrix composed of the independent factors, and A is an N × m mixing matrix. Equation (1) is the basic model of ICA, which describes how the observed signals, i.e. the faces x_i, are composed from the independent factors s_j. Generally, neither the independent factors s_j nor the mixing matrix A can be observed; the only information available is the training data. Hence, the aim of facial feature decomposition is to estimate the mixing matrix A and the independent components s_j from the observed training face images. However, it is infeasible to estimate the independent components s_j directly from the training images X without knowing the mixing matrix A.
An alternative method is to find a demixing matrix W using ICA; the independent components Ŝ can then be obtained by

Ŝ = WX,    (2)

in which Ŝ approximates the real independent factors S. Once the demixing matrix is obtained, the training images can be projected into the ICA subspace, and 3D facial features can be represented based on the independent components. The demixing matrix can be estimated by the fastICA algorithm [10]. Using the demixing matrix W, the independent components, or basis, are calculated as

E = WX,    (3)

where E = [e_1, e_2, ..., e_M]^T is the estimated independent basis and each e_k is one statistically independent basis vector. Given a face image f, it can be represented by a linear combination of E:

f = Σ_{k=1}^{M} a_k e_k,    (4)

where the a_k are the projection coefficients.
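The projection into the ICA subspace can be sketched with scikit-learn's FastICA implementation of the fastICA algorithm. The matrix sizes and variable names below are illustrative assumptions, not the paper's actual dimensions.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
# Hypothetical training matrix X: N faces, each a d-dimensional depth feature vector.
N, d, m = 200, 50, 10
X = rng.rand(N, d)

ica = FastICA(n_components=m, random_state=0, max_iter=1000)
S_hat = ica.fit_transform(X)   # estimated independent components, one row per face
W = ica.components_            # demixing matrix (m x d)

# A new face f is projected onto the independent basis to get its coefficients a_k.
f = rng.rand(d)
a = ica.transform(f.reshape(1, -1))[0]
```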

3.2. Ethnic-related feature selection and elimination. The facial features in the ICA subspace are supposed to be statistically mutually independent. Hence, the influence caused by the other-race effect can be suppressed by filtering the ethnic-related features out. In this work, we propose an ethnic-related feature selection method that identifies the most discriminative features for ethnicity classification: the feature combination achieving the best ethnicity classification performance is considered ethnic-related. In multi-ethnic facial expression recognition, the training samples carry both ethnicity and expression labels. The training data X for expression recognition are split into two subsets for ethnicity classification, denoted X_gallery and X_probe. The ethnic-related features are identified by an incremental feature learning process that considers both the relevance and the redundancy among the features. Since the ethnicity labels e in the training set X_gallery and the feature set F = {f_1, f_2, ..., f_i, ..., f_n} are known, the mutual information between feature f_i and the ethnicity e can be calculated to measure the relevance:

I(f_i; e) = Σ_{f_i} Σ_{e} p(f_i, e) log [ p(f_i, e) / ( p(f_i) p(e) ) ],    (5)

where p(f_i) and p(e) are the probability distributions of feature f_i and ethnicity e, and p(f_i, e) is their joint probability distribution. Additionally, redundancy exists among the ethnic-related features, which can be measured by the mutual information between features f_i and f_j:

I(f_i; f_j) = Σ_{f_i} Σ_{f_j} p(f_i, f_j) log [ p(f_i, f_j) / ( p(f_i) p(f_j) ) ].    (6)

The combination of the most relevant features does not necessarily form the optimal feature set for ethnicity classification, due to the redundancy among the most relevant features. Therefore, the feature learning criterion is defined to maximize the relevance while minimizing the redundancy:

max_{f_i ∈ F} [ I(f_i; e) − (1/|F|) Σ_{f_j ∈ F} I(f_i; f_j) ],    (7)

where |F| is the size of the depth feature set F.
The most discriminative features for ethnicity classification could be obtained by maximizing this feature learning criterion.
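A minimal sketch of the relevance and redundancy measures, using a simple histogram estimate of mutual information; the binning scheme and function names are our own assumptions, not the paper's estimator.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram estimate of I(x; y) = sum p(x,y) * log[p(x,y) / (p(x) p(y))]."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x)
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mrmr_score(selected, candidate, ethnicity):
    """Relevance-minus-redundancy score of one candidate feature:
    I(f_i; e) - (1/|F|) * sum over f_j in F of I(f_i; f_j)."""
    relevance = mutual_information(candidate, ethnicity)
    redundancy = np.mean([mutual_information(candidate, f) for f in selected])
    return relevance - redundancy
```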

Algorithm 1 Forward Feature Selection for Ethnic-related Features
Require: Original features in ICA subspace S
Ensure: Optimal ethnic-related features S*
Initialize: ethnicity classification error E = 1, cnt = 1; candidate feature set F = S; S* = ∅

As shown in Algorithm 1, the optimal ethnic-related features are identified one by one in a forward process according to the criterion in Equation (7). Given the original facial feature set S and the optimal ethnic-related feature set S*, the feature set S_deORE = S − S* is obtained by discarding the features in S*. The remaining features in S_deORE are used for multi-ethnic facial expression recognition in the proposed method. The features belonging to this set have statistically the least correlation with facial ethnicity and are expected to alleviate the influence caused by the other-race effect. In the learning process, the features are sorted by their mutual information with ethnicity, and in each iteration the one that lowers the classification error most is added to the selected feature set. The learning process continues until the classification error stops decreasing or all candidate features have been added. The features in the final feature set S_deORE are statistically insensitive to ethnicity variations and can achieve the best expression recognition performance; hence, they are considered the best features for multi-ethnic facial expression recognition.
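The forward selection loop can be sketched as follows. The k-nearest-neighbour classifier and cross-validation settings are stand-ins, since the paper does not specify the wrapped ethnicity classifier.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def forward_select(X, y_ethnicity, cv=3):
    """Greedy forward selection wrapped around an ethnicity classifier:
    in each iteration, add the candidate feature that lowers the
    cross-validated classification error the most, and stop as soon
    as the error no longer decreases."""
    clf = KNeighborsClassifier(n_neighbors=3)   # stand-in classifier
    remaining = list(range(X.shape[1]))
    selected, best_err = [], 1.0                # error initialized to 1
    while remaining:
        errs = [1.0 - cross_val_score(clf, X[:, selected + [i]],
                                      y_ethnicity, cv=cv).mean()
                for i in remaining]
        best_i = remaining[int(np.argmin(errs))]
        if min(errs) < best_err:                # error still decreasing
            best_err = min(errs)
            selected.append(best_i)
            remaining.remove(best_i)
        else:
            break
    return selected
```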

4. Experiments.

4.1. Experiment setup. The proposed method is evaluated on the BU-3DFE database [24], which was originally collected for 3D facial expression recognition. The database contains 3D face scans of 100 individuals, each showing the 6 prototypic expressions at 4 intensities plus the neutral state. As shown in Table 1, the individuals in the BU-3DFE database come from 6 different ethnicities, but the sample size of each ethnicity is extremely unbalanced: the majority of the individuals are White (51 out of 100), and the rest are East-Asian, Black, Hispanic-Latino, Indian, and Middle-East Asian. In order to evaluate the proposed method in multi-race facial expression recognition, the 3D faces of the Black, Hispanic-Latino, Indian, and Middle-East Asian individuals are combined as the training data, while the White individuals and East-Asian individuals are used for testing. This setup also meets the commonly used person-independent protocol for facial expression recognition. The proposed method is evaluated on the White and East-Asian individuals respectively, and the recognition performance with and without other-race effect elimination is compared to show the effectiveness of the proposed method.
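The person-independent split described above can be sketched as boolean masks over per-scan ethnicity labels; the toy label array below does not reproduce the real BU-3DFE proportions.

```python
import numpy as np

# Toy per-scan ethnicity labels (equal counts, unlike the real database).
ethnicity = np.array(["White", "East-Asian", "Black", "Indian",
                      "Hispanic-Latino", "Middle-East Asian"] * 10)

# Training set: the four minority ethnicities combined.
train_mask = np.isin(ethnicity, ["Black", "Hispanic-Latino",
                                 "Indian", "Middle-East Asian"])
# Two disjoint test sets, evaluated separately.
test_white = ethnicity == "White"
test_east_asian = ethnicity == "East-Asian"
```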

4.2. Results on East-Asian individuals. To evaluate the effectiveness of the proposed method, the individuals from the Indian, Middle-East Asian, Black, and Hispanic-Latino ethnicities are combined as the training samples, while the East-Asian individuals are used for testing. The depth features of the local 3D facial patches are extracted around the 30 landmarks, and independent component analysis is applied to construct a subspace in which the features are mutually independent. The ethnic-related features are then identified using mutual-information-based feature selection. To ease the influence caused by the other-race effect, the selected ethnic-related features are removed from the ICA subspace, and the remaining features without ethnic information are utilized to perform the final facial expression recognition. For comparison, the expression recognition performance of the ICA features before removing ethnic information is also obtained. The features in the ICA subspace are sorted in descending order of relevance with ethnicity and removed one by one in each iteration of feature selection. The experiments are repeated 500 times and the average recognition rates are recorded. Figure 1 plots the facial expression recognition rates as the ethnic-related features are removed. It can be seen that the recognition rate increases gradually, with fluctuations, as the ethnic information in the remaining features decreases, and it stays relatively steady after about 100 ethnic-related features are removed. The confusion matrices before and after the ethnic-related feature elimination are illustrated in Figure 2(a) and 2(b). The comparison of these two figures shows that the recognition rates of most expressions are improved significantly by eliminating ethnic-related features. The average recognition rate is improved from 50% to 59.55%.
Specifically, the recognition performance of all the expressions is improved to different extents, except anger. The recognition rate of anger decreases slightly, because the removed ethnic-related features may also carry some anger-related information. It is worth noting that the recognition rate of sadness is improved from 30.21% to 65.63% when the proposed method is applied, which is quite significant. This improvement suggests the effectiveness of the proposed method.

4.3. Results on White individuals. The same procedure is applied to the White individuals: the ethnic-related features are removed from the ICA subspace, and the remaining features without ethnic information are utilized to perform the final facial expression recognition. For comparison, the expression recognition performance of the ICA features before removing ethnic information is also obtained. Figure 3 shows the facial expression recognition rates as the ethnic-related features are removed one by one, in descending order of relevance with ethnicity. It can be seen that the recognition rate increases rapidly as ethnic information is gradually removed. Similarly, the recognition rate reaches its peak and becomes relatively steady when about 100 ethnic-related features are removed, and then decreases quickly when more features are discarded. The confusion matrices before and after the ethnic-related feature elimination are illustrated in Figure 4(a) and 4(b). The comparison of these two figures shows that the recognition rates of anger, disgust, fear, sadness, and happiness are improved significantly by eliminating ethnic-related features. The average recognition rate is improved from 56.25% to 67.88%, and the recognition rates of anger, disgust, fear, and sadness are improved by 17.7%, 12.5%, 14.58%, and 19.79%, respectively. This significant margin suggests that the proposed method is effective in other-race effect elimination.

5. Conclusion. The influence of the other-race effect on facial expression recognition has been studied by psychologists for decades, and computer vision scientists have found that this effect also lowers the performance of facial expression recognition algorithms. Hence, it is necessary to remove its influence from current algorithms. Inspired by the idea of blind signal source separation, this paper proposes an ICA-based other-race effect elimination method for facial expression recognition. The expression features are extracted from local patches on 3D faces and projected into an ICA subspace, in which the feature components are statistically mutually independent. The ethnic-related features are then determined by feature selection aimed at ethnicity classification. Finally, the ethnic-related features are removed according to their relevance with ethnicity, and the remaining features are used to perform the facial expression classification. The experimental results show that the proposed method is effective in other-race effect elimination, and that the performance of multi-race facial expression recognition can be improved significantly.