PREDICTING 72-HOUR REATTENDANCE IN EMERGENCY DEPARTMENTS USING DISCRIMINANT ANALYSIS VIA MIXED INTEGER PROGRAMMING WITH ELECTRONIC MEDICAL RECORDS

Abstract. The proportion of patients who reattend the emergency department (ED) within 72 hours is an important indicator of quality of care. This study develops a practical framework, informed by a clinical perspective, to predict which patients will reattend the ED within 72 hours. We analyze 328,733 ED patients from 1 January 2011 to 31 December 2013, with an average reattendance rate of 4.6%. We consider over 100 factors, including demographics, diagnosis, patient acuity, chief complaints, selected laboratory tests, and summarized vital signs. Using univariate analysis, a pool of risk variables is selected for subsequent factor selection. We then apply filter methods to derive a set of candidate factors. With these factors, in combination with suggestions from ED clinicians, a mixed integer programming model based on discriminant analysis is proposed to determine a classification rule for 72-hour reattendance. In numerical experiments, various small subsets of risk factors are used for classification and prediction. The results show that favorable predictive performance can be achieved in both training and test sets.

1. Introduction. Emergency department (ED) attendances throughout the world have been steadily increasing [22,23,26,35,37]. This puts a strain on healthcare infrastructure and frontline staff. The proportion of patients who reattend the ED within 72 hours is an important indicator of quality of care [13,28,36]. ED reattendance can be related to the nature of the disease, medical errors, and shortcomings in care during the initial treatment [13,27,36]. An in-depth review of the literature on ED discharge processes [5] suggested elements of high-quality discharges and risk factors for suboptimal discharges. Early reattendances to the ED may involve patients from a high-risk population [1,12,15]. However, other factors, such as high ED utilization and ED overcrowding that reduces efficiency, may contribute to this issue as well [24]. For hospital managers and clinicians, one of the main concerns is to reduce these early reattendances as much as possible.
In the past decades, a number of studies on ED reattendance have been conducted in different ED settings and from different perspectives; see, for example, [1,3,9,10,14,16,21,25,29,33,40]. Risk factors studied in the literature include diagnosis, patient chief complaints, patient demographic factors, and clinical measures. Some studies focused on specific areas, such as pediatric mental health care [25], acute pulmonary embolism [3], chronic obstructive pulmonary disease (COPD) exacerbations [9], and chest pain [29]. Most were retrospective observational studies using historical patient administration data, while some were prospective observational studies [29] or based on a population survey [16]. In addition to analyzing potential risk factors for reattendance, it is also important to predict early reattendance from the patient characteristics associated with the identified factors of interest. This study seeks to investigate these two issues based on historical ED patient administration data in the local context.
Decision trees and regression analysis have been the most commonly used methods for predicting early ED reattendance; see, for example, [11,31,34]. In particular, studies using linear regression models indicated that errors in medical care or patient education might increase the risk of early reattendance. However, as pointed out in [19], such models may fail to predict a reattendance visit accurately, since the association between the binary reattendance outcome and the risk factors is unlikely to be a simple linear relationship. Recently, Lee et al. [19] designed a clinical decision tool using discriminant analysis via mixed integer programming (DAMIP) to predict patient reattendance within 72 hours in two pediatric emergency departments (PEDs) in the United States (U.S.). Using patient administration data from the two PEDs, their numerical results demonstrated that this approach remarkably outperformed the traditional methods in predicting early reattendance. As a machine learning model, DAMIP establishes a classification rule on a training set that can achieve much higher blind predictive accuracy on an independent set of patients. This favorable feature was further elaborated in a recent multi-site evidence-based study [17] involving more than 700 healthcare sites in the U.S. Classification in this setting is performed via the DAMIP approach first developed by Gallagher et al. [8], which realizes the optimal parameters of Anderson's classification model [2]. It seeks to derive a rule for classifying entities into several groups, each being known a priori. A classification rule is formed from data collected on a sample of entities with known group classifications, and it can be generated by solving an optimization problem such as a mixed integer program (MIP) or a linear program (LP) [8,20]. With the obtained rule, a new entity can be classified to a certain group.
In the literature, such optimization-based prediction and classification methods have been extensively applied to various types of medical and biological data [18]. As stated in [19], a classification rule generated from only a small set of risk factors (around 10 factors) can perform very well in prediction. For the problem under consideration, this feature is particularly attractive and important, as it helps clinicians and managers to better understand a few key factors of interest and to improve current treatment and care processes. By contrast, regression analysis usually relies on a relatively larger number of risk factors in prediction, while decision trees are often criticized for their instability and complexity.
These limitations undermine, to some extent, the translation of these methods into practice. On the other hand, the clinical decision tool proposed in [19] focused specifically on pediatric emergency care. In the literature, there has been little work on predicting early ED reattendance for the adult population using the DAMIP approach, or on using it to further improve general emergency care. Motivated by the above, the present study aims to fill this gap.
This study is a research collaboration with clinicians from the emergency department of a tertiary urban hospital. The data under investigation are very broad, consisting of patient demographic information; patient medical information such as medical history, diagnosis, vital signs, and lab tests; and patient social information. Slightly differently from the framework presented in [19] and from conventional classification methods, in this paper we propose a practical framework to predict 72-hour reattendance in combination with clinical perspectives. Following advice from ED clinicians, one purpose of this study is to help clinicians analyze clinical factors that they might be unaware of or neglect in routine emergency care. Clinicians would like to understand the potential role of these factors in early reattendance, rather than that of commonly known ED risk factors such as patient acuity category, gender, and age. Under the current emergency care procedure, clinicians generally already pay the necessary attention and make extra efforts when treating patients with such known risk factors. As such, clinicians are more interested in exploring any potential to improve emergency care by studying early reattendance with these less familiar risk factors. Hence, this study introduces a practical framework to predict early reattendance with a set of patient features consisting of risk factors obtained through feature selection and clinical factors recommended by clinicians. The feature selection process is, to some extent, a hybrid of heuristic and actual methods.
Specifically, we first conducted univariate analysis on the electronic medical records (EMR) of all ED attendances from 1 January 2011 to 31 December 2013 and chose a pool of potential risk factors for subsequent analysis. Second, for feature selection, we applied commonly used filter methods [6,32] to choose candidate risk factors from this pool. The factors were chosen according to their occurrence frequency and their rankings under the filter methods. Third, using a set of selected factors that incorporated advice from ED clinicians, we applied the DAMIP model to determine a classification rule for predicting which patients are likely to reattend within 72 hours. Predictions on a training set and a test set were conducted to evaluate the performance of the proposed model. The prediction results showed that the DAMIP approach compares favorably with logistic regression. Of particular interest, a sensitivity of around 40% was achieved using a small number of risk factors (around 10), which, following clinicians' suggestions, did not include some known key risk factors. This result is satisfying and remarkable because the overall 72-hour reattendance rate was very low (about 4.6%). The results may inform ED clinicians and hospital managers in further improving the quality of emergency patient care. This study is an exercise in the use of operations research in emergency care management from a clinical perspective. We believe this work will enrich and complement applications of operations research in health services research.
The rest of this paper is organized as follows. Section 2 presents model development of the study in detail, including the setting, filter methods, and mixed integer programming for discriminant analysis. Numerical experiments are presented in Section 3, followed by discussions in Section 4. Section 5 concludes the paper.
2. Model development. In this section, we first detail the data preparation process and then introduce a practical framework for feature selection, classification, and reattendance prediction that employs filter methods and the DAMIP approach. At the end of this section, we present a tractable model reformulation for computational implementation.
2.1. Setting. The hospital in this study is a tertiary urban hospital in Singapore with about 1,500 beds. We collected 328,733 ED attendances during the period from 1 January 2011 to 31 December 2013, and defined a reattendance as a repeat visit within 72 hours of the index visit. Patients were identified from the ED patient information system; among them, 15,171 patients (about 4.6%) reattended the ED within 72 hours. The patients were classified into two groups: patients who reattended the ED within 72 hours, and patients who were discharged but did not revisit the ED within 72 hours.
The electronic medical records contained about 140 factors for each patient, including patient demographic information; chief complaints; ICD-9-CM (International Classification of Diseases, 9th Revision, Clinical Modification) descriptions [7]; medical history; lab tests; handover; vital signs; and patient information at triage such as language, mobility, time, date, patient acuity category (PAC), and social issues.
Patient demographic information included age, sex, race, nationality, and residential status. Patient chief complaints were classified into 18 categories on the advice of clinicians. In Singapore, there are four levels of patient acuity category, with PAC1 being the most serious and PAC4 the least. We considered the number of handovers between doctors that a patient had (e.g., when a doctor leaves a shift) as a potential risk factor. Medical history was defined as the total number of known medical conditions a patient had. Social issues and national service were also included in the analysis. Social issues follow ICD-9-CM descriptions and cover, for example, patients with generalised weakness, severe social and financial issues, no home, or social care needs due to family violence. National service, a compulsory military draft, may be a feature unique to patients in Singapore compared with other countries or regions. All male citizens and second-generation permanent residents of Singapore are required to serve 2 years of full-time national service. We were interested in exploring whether this factor plays a role in early reattendance and therefore included it in the study data.

2.2. A practical framework to predict ED reattendance. We present a practical framework to investigate early ED reattendance, consisting of four steps: pre-selection of risk factors, factor (feature) selection, classification, and prediction, as follows.
Step 1. Pre-selection of risk factors. Use univariate analysis to derive statistically significant factors associated with ED 72-hour reattendance.
Step 2. Factor selection. For the group of factors derived in Step 1, use filter methods, which will be elaborated in Section 2.2.1, to generate the corresponding sets of risk factors. Choose a set of candidate factors consisting of the factors that occur most frequently among the top rankings in the sets of selected factors, together with some clinical factors recommended by ED clinicians.
Step 3. Classification. Derive a classification rule by solving the DAMIP model with the set of discriminatory factors obtained in Step 2.
Step 4. Prediction. Using the classification rule obtained in Step 3, predict whether a new patient will reattend the ED within 72 hours.
In what follows, we discuss factor selection using filter methods and the development of DAMIP model.

2.2.1. Filter methods. We analyzed the set of risk factors derived from univariate analysis by applying feature selection techniques. Note that the set of factors available for classification and prediction forms a high-dimensional data set, which is a significant challenge in data mining. As pointed out by Brown et al. [6], in this circumstance some of the factor information might be redundant in the context of other factors, resulting in important issues such as over-fitting or the computational burden of processing many similar features. Hence, it is important and necessary to identify a meaningful and smaller subset of these factors through feature selection [6,30,32,38,39].
In this paper, we used the filter methods for feature selection discussed in Brown et al. [6]. The underlying filter criteria are JMI (joint mutual information), MIFS (mutual information feature selection), CMIM (conditional mutual information maximisation), MRMR (max-relevance min-redundancy), ICAP (interaction capping), CIFE (conditional infomax feature extraction), DISR (double input symmetrical relevance), CMI (conditional mutual information), and ConDred (conditional redundancy). For each filter method, using the patient EMR data restricted to the potential risk factors derived in Step 1, the top 15 factors were chosen with the MATLAB toolbox FEAST developed by Brown et al., which gave 9 corresponding sets of selected factors. We then selected a subset of factors that occurred with high frequency across these 9 sets. This subset was used as the set of candidate discriminatory factors for classification and prediction. In this paper, we chose 10 candidate factors. We also included some factors recommended by clinicians, whose advice was usually given from the perspective of clinical treatment together with the management of the care process. This factor selection process differs from traditional feature selection in machine learning in that it is flexible enough to include factors based on expert input. Although the process is nonstandard, it appears to be useful and important from a practical perspective. In particular, where the care process is already performed effectively in areas related to important but known risk factors such as patient acuity category and age, clinicians may wish to direct their attention to risk factors of interest that were not selected as top risk factors by the usual filter methods. We believe that a factor selection process incorporating clinical perspectives is important and has potentially high impact in practice, although it is usually neglected in academia.
We adopted 9 filter methods to determine a set of candidate risk factors, although any filter method of interest may be used for feature selection. Following similar arguments in Lee et al. [19], we chose 10 factors generated by the filter methods. The number of factors depends on the choice of the modelers or decision-makers from a practical perspective; one may choose the top 15 or 20 factors for analysis. However, choosing too many factors for prediction, say over 30 risk factors, might not be useful, or might even be impractical, for improving hospital care operations. Given the limited capacity and resources of an emergency department, it is much more difficult, and may be infeasible, to improve the emergency care process with respect to too many risk factors.
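To make the idea of a filter criterion concrete, the following minimal Python sketch ranks discrete factors by their empirical mutual information with the outcome, which is the relevance term shared by criteria such as JMI and MRMR. It is not the FEAST implementation used in this study: the function names and the toy data are hypothetical, and the redundancy terms that distinguish the nine criteria are deliberately omitted.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in nats, of two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log(c * n / (px[x] * py[y]))
    return mi


def rank_by_relevance(features, outcome, top=15):
    """Return the names of the `top` factors with the largest I(factor; outcome)."""
    scores = {name: mutual_information(col, outcome) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical toy data: 1 = reattended within 72 hours, 0 = did not.
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "factor_a": [1, 1, 1, 1, 0, 0, 0, 0],  # identical to the outcome
    "factor_b": [1, 1, 0, 0, 1, 1, 0, 0],  # independent of the outcome
    "factor_c": [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
}
ranking = rank_by_relevance(features, outcome, top=2)
```

In this toy example the fully informative factor ranks first and the independent factor scores zero. A criterion such as MRMR would additionally penalize a candidate that is redundant with factors already selected.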
2.2.2. DAMIP model. In this section, we present a general model for discriminant analysis and its approximation using mixed integer programming. A detailed discussion can be found in [18,19,20]. Given a training sample of entities, each having several features (factors) and a known group classification, the classification model seeks a function that classifies entities into groups based on the selected set of features. For ease of reading and for completeness, the notation used in the model, such as parameters and variables, and the necessary mathematical exposition are included in the Appendix.
Let $R_g$ denote the region containing group $g$ entities, $g = 1, \dots, G$, and let $R_0$ denote the reserved-judgment region. The model under consideration is based on Anderson's method [2] for finding a partition $\{R_0, R_1, \dots, R_G\}$ that maximizes the probability of correct classification subject to constraints on the misclassification probability limits. Since it is difficult or even impossible to solve this primary problem in practice, we resort to its approximation counterpart for numerical computation under some standard assumptions in discriminant analysis. That is, assuming multivariate normal group-conditional distributions with a common covariance matrix and equal prior probabilities for all groups, we estimate the conditional density functions $f_g$ and prior probabilities $\pi_g$, $g = 1, \dots, G$, from the given training data. As shown in the Appendix, the classifier is defined using the discriminant function $L_g(x)$, whose coefficients are derived by solving the underlying mixed integer program. These coefficients are in fact the unique Lagrangian multipliers of the corresponding optimization problem. We can then derive a classifier for prediction using the DAMIP approach. There is a substantial literature on classification and pattern recognition, for example the excellent work on linear support vector machines by Bennett and Mangasarian [4], in which a linear classifier is obtained by solving a linear programming problem.
For the ED reattendance problem under consideration, we choose a training data set of ED patients with selected discriminatory factors and two known group classifications, i.e., 72-hour return and non-return. We derive the corresponding MIP model (3) with the underlying $k$ factors for classification. The model can be solved efficiently using well-developed commercial software such as CPLEX, which is widely used in the numerical optimization and operations research communities. In addition, the input parameters involved in (3), such as $\hat{p}_1$ and $\hat{p}_2$, the sample covariance matrix, and its inverse, can be derived readily using software such as MATLAB.

3. Results.
3.1. Data preparation and model implementation. The patient data of this study were extracted from the ED Web system of a public hospital in Singapore from 1 January 2011 to 31 December 2013. There were more than 100 factors involved in this study. The factors included diagnosis, PAC, chief complaints, demographics, vital signs (such as average pulse, average temperature, and diastolic blood pressure range), laboratory parameters (for example albumin, basophils, creatinine, red blood cell count, white blood cell count, sodium, potassium, and urea, each within the normal, borderline, or abnormal range), medical history (hypertension, hyperlipidemia, diabetes, asthma, heart problem, stroke, gastric problem, etc.), and handover, together with derived factors such as national service and social issues. We excluded the factor of radiology since no results were available in the data. Respiratory rate was also excluded because the recorded values were based on estimations by the staff.
During data preparation, we merged all information from various sources, such as tables of vital signs, laboratory results, handover forms, and diagnoses, into a single table in an Excel spreadsheet according to a deidentified patient identifier in the ED Web system.
The numerical experiments were performed by implementing codes in SPSS 17.0, MATLAB 7.8.0, and CPLEX 12.4. Following the proposed integrated framework, we first conducted univariate analysis using SPSS and derived a set of potential risk factors. After that, we applied the filter methods by running the FEAST toolbox in MATLAB to select a group of risk factors for classification. Finally, we used the CPLEX built-in solver to solve the DAMIP model (3) and determine the classifier.

3.2. Univariate analysis. Using univariate analysis, we derived a total of 47 potential risk factors (p < 0.001) as shown in Table 1, each being assigned an index for ease of reference.
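As an illustration of this pre-selection step for a single binary factor, the sketch below computes a Pearson chi-square test on a 2×2 table and applies the p < 0.001 threshold. The source does not state which univariate test was run in SPSS, so the choice of the chi-square test is an assumption, and the function name and all counts are hypothetical.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > s) = erfc(sqrt(s / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts, not taken from the study data.
# Rows: factor present / absent. Columns: reattended / did not reattend.
stat, p = chi_square_2x2(120, 880, 380, 8620)
keep_factor = p < 0.001  # same significance threshold as in the text
```

A continuous factor such as average pulse would need a different test (for example a two-sample test of means), which is outside the scope of this sketch.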

3.3. Factor selection. We used filter methods to select a subset of the risk factors listed in Table 1. The filter methods were JMI, MIFS, CMIM, MRMR, ICAP, CIFE, DISR, CMI, and ConDred, which are commonly used in feature selection. We implemented each filter method using the FEAST toolbox in MATLAB on the data consisting of the 47 factors for each patient shown in Table 1. The 15 top-ranked factors generated by each filter method were then selected for further analysis. These factors are summarized in Table 2. We calculated their frequency of occurrence across the 9 filter methods; that is, the frequency of a factor is the number of filter methods whose top 15 include it, divided by 9 and expressed as a percentage. For example, PAC status was among the top 15 factors under all filter methods, so its occurrence frequency was 100%.
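The occurrence-frequency calculation can be sketched in a few lines of Python. The ranked lists below are hypothetical toy inputs with three methods and three factors each, rather than the nine top-15 lists of Table 2. Breaking ties by mean rank, and the `excluded` argument standing in for the clinicians' exclusions, are assumptions made for illustration.

```python
from collections import defaultdict


def occurrence_frequency(top_lists):
    """top_lists maps a filter method to its ranked factor list (best first).
    Returns {factor: (frequency in percent, mean rank where it appears)}."""
    n_methods = len(top_lists)
    ranks = defaultdict(list)
    for ranked in top_lists.values():
        for position, factor in enumerate(ranked, start=1):
            ranks[factor].append(position)
    return {f: (100.0 * len(r) / n_methods, sum(r) / len(r)) for f, r in ranks.items()}


def choose_candidates(top_lists, n_keep, excluded=()):
    """Keep the n_keep most frequent factors, skipping the excluded ones."""
    freq = occurrence_frequency(top_lists)
    eligible = [f for f in freq if f not in excluded]
    # Higher frequency first, then better (smaller) mean rank, then name.
    eligible.sort(key=lambda f: (-freq[f][0], freq[f][1], f))
    return eligible[:n_keep]


# Hypothetical toy rankings (three methods, top 3 each).
top_lists = {
    "JMI": ["PAC status", "age", "average pulse"],
    "MIFS": ["PAC status", "average pulse", "sodium"],
    "CMIM": ["age", "PAC status", "urea"],
}
freq = occurrence_frequency(top_lists)
candidates = choose_candidates(top_lists, n_keep=2, excluded=("PAC status", "age"))
```

Here `PAC status` appears in all three lists and so has a frequency of 100%, matching the example in the text, yet it is left out of `candidates` because it is passed as an excluded, already familiar factor.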
In this study, we chose the top 10 factors by considering their occurrence frequencies and rankings. As recommended by ED clinicians, we also included clinical measures such as average temperature, average pulse, and diastolic BP range for predicting early reattendance. This gave a total of 13 factors, denoted by $S_0$, as follows, where the first ten factors are arranged according to their rankings from the highest to the lowest. As discussed previously, and following the suggestions from the ED, some risk factors familiar to clinicians, such as PAC status, age, and gender, were not included although they had high occurrence frequencies and top rankings. Note that these factors are not directly associated with the treatment and care process, unlike the clinical factors in $S_0$.
3.4. Classification and prediction. Discriminant analysis was then applied to patient data with the risk factors in $S_0$ together with the known patient classification groups (i.e., return and non-return). As we shall see later, we also used the following subsets of $S_0$ to derive different classification rules for prediction.
where $|S|$ denotes the cardinality of a set $S$, i.e., the number of elements of $S$. In our analysis, the training set consisted of 109,674 patients, and the remaining patient data were used for blind prediction. We included both 10-fold cross-validation and blind prediction results to reflect the consistency of the predictive power of the underlying classification rules. The input parameters of the DAMIP model (3) were chosen as $\alpha = \alpha_{12} = \alpha_{21} = 0.15$, $M = 100$, $\varepsilon = 0.01$, $\beta = 2$, and $\gamma = 3$. The numerical study was conducted by implementing codes in CPLEX 12.4 to solve the MIP problem (3).
Further data analysis indicated that the 72-hour reattendance rate for patients with social issues was as high as 65.6%, compared with the average reattendance rate of 4.6%. However, the total number of patients in this group was very small, accounting for only 0.33% of all patients in the study. This implied that the factor of social issues might not play a key role in ED 72-hour reattendance overall. We then conducted further analysis without the factor of social issues, using the four sets of risk factors $S_1$, $S_2$, $S_3$, and $S_4$ defined above. The purpose was to achieve a satisfying prediction result using as few risk factors as possible. In other words, we sought the smallest possible set of risk factors with which good prediction results could still be attained. This is different from classical prediction methods such as decision trees and regression. Prediction results are reported in Tables 3 and 4. We also compared them with prediction results obtained using logistic regression, as shown in Table 5.
Table 3. Prediction Results Including Factor of Social Issues
Table 5. Prediction Results Using Logistic Regression
4. Discussion. This paper introduces a practical framework that combines univariate analysis, filter methods, and discriminant analysis with clinicians' advice to predict ED patient 72-hour reattendance. Briefly, filter methods were applied to select risk factors of interest, and discriminant analysis was then employed to determine the classification rule. Unlike traditional methods such as decision trees and regression, the discriminant analysis considered here is based on the Bayesian discriminant rule with some prior distributional assumptions, and it incorporates clinical perspectives. We used a mixed integer programming model to approximate the associated discriminant analysis problem for a given sample of ED patients.
In this study, a relatively small number of discriminatory factors, ranging from 9 to 13, was used to predict early ED reattendance. As discussed previously, these factors were mostly related to patient diagnosis and treatment, except for the factor of social issues. Clinicians can pay particular attention to these factors when needed in patient treatment and care. Table 4 indicated that the proposed model predicted 72-hour reattendance with a satisfying overall accuracy of more than 72.6%. Of particular interest, sensitivity, which describes the prediction accuracy for patients who would reattend the ED within 72 hours, was about 40% in the four scenarios under consideration. Given that the overall reattendance rate was as low as 4.6% and that the risk factors used for prediction did not include some known key risk factors, the obtained results were favorable. Also, the respective sensitivity and specificity results for the training and test sets were very close in all scenarios. This indicates that the proposed model appears to be stable and consistent. We also performed prediction analysis using logistic regression, as shown in Table 5. Logistic regression can achieve a very high overall prediction accuracy of more than 90%; in particular, specificity was over 96% in most cases under consideration. However, this approach performed poorly on sensitivity, which was as low as 1.7% in the training set and 1.0% in the test set. Moreover, only 4.6% predictive accuracy was achieved in the training set when a 50% cutoff probability was chosen. A sensitivity of over 50% could be attained, but only at the cost of choosing a 5% cutoff.
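The interplay between overall accuracy, sensitivity, and the cutoff probability when the outcome is rare can be illustrated with a small Python sketch. The predicted probabilities below are hypothetical and are not taken from Tables 3 to 5; only the 4.6% prevalence mirrors the study. The function names are likewise illustrative.

```python
def classify_at_cutoff(probabilities, cutoff):
    """Predict reattendance (1) when the predicted probability reaches the cutoff."""
    return [1 if p >= cutoff else 0 for p in probabilities]


def confusion_metrics(y_true, y_pred):
    """Sensitivity, specificity and overall accuracy for binary labels.
    Assumes both classes occur in y_true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Hypothetical cohort of 1,000 patients with 46 reattendances (4.6%).
y_true = [1] * 46 + [0] * 954
# Hypothetical predicted probabilities from some fitted model.
probs = [0.08] * 20 + [0.03] * 26 + [0.06] * 200 + [0.02] * 754

at_50 = confusion_metrics(y_true, classify_at_cutoff(probs, 0.50))
at_05 = confusion_metrics(y_true, classify_at_cutoff(probs, 0.05))
```

With the 50% cutoff nobody is flagged, so accuracy equals the share of non-returning patients (95.4%) while sensitivity is zero. Lowering the cutoff to 5% raises sensitivity at the expense of specificity and overall accuracy, which is why overall accuracy alone is a weak yardstick for this problem.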
On the other hand, the sensitivity and overall prediction results in Lee et al. [19] were higher than those derived in this study; their prediction results were 70.5% and 82.2% for the two PEDs under consideration. There may be a couple of reasons for this difference in prediction performance. One main reason could be that our analysis included clinical factors suggested by clinicians but, for practical reasons, excluded well-known factors such as PAC status, age, and gender, which inevitably affected the prediction performance of the DAMIP approach. Another reason might be the nature of the different ED settings. In this paper, we studied an adult emergency department with approximately 500 patient visits per day, while Lee's study focused on pediatric emergency departments with relatively fewer daily visits (about 280 attendances per day). Patient conditions and care in adult emergency departments appear to be much more complicated than in pediatric emergency departments, which could make early reattendance more challenging to predict and thereby affect prediction accuracy as well.

4.1. Limitations. This work was a retrospective study, and the data collection may be incomplete. However, as the data were collected from multiple sources, this issue might be minimized. We anticipate that we may not have any variables that completely fulfil the criteria to be a potential predictor or to derive a clinical decision rule. In this situation, we will highlight the variables as being associated with the outcome and seek to validate them in future studies.

5. Conclusion. In this paper, we proposed an integrated framework to study 72-hour reattendance in the emergency department from clinical perspectives, including pre-selection of factors using univariate analysis, factor selection using filter methods, and classification and prediction using the DAMIP model. A relatively small number of discriminatory factors was used in prediction. The obtained prediction results were satisfying in both training and test sets. In the future, we are interested in exploring the implementation of the model in practice. Furthermore, we would like to apply the DAMIP approach to investigate early reattendance in specific groups of ED patients, for example patients with a diagnosis of chronic obstructive pulmonary disease, since this condition is in general a large contributor to ED reattendances [9].
[...]ers for their constructive suggestions and comments, which help to improve the presentation of the paper.
Appendix.
Discriminant analysis. First, we list some notations such as parameters and variables used in the model development.
• $N$: number of entities of the training sample
• $G$: number of groups
• $k$: number of feature attributes of each group
• $n_g$: total number of entities in group $g$ of the training sample, $g = 1, \dots, G$
• $f_g$: group conditional density function, $g = 1, \dots, G$
• $\pi_g$: prior probability, $g = 1, \dots, G$
• $x_{gj} \in \mathbb{R}^k$: $k$ attributes of entities in group $g$, $g = 1, \dots, G$, $j = 1, \dots, n_g$
• $\alpha_{hg} \in (0, 1)$: misclassification limit concerning the $n_g$ entities of group $g$; that is, the proportion of group $g$ training entities classified to region $R_h$ should be no more than $\alpha_{hg}$, $h, g = 1, \dots, G$, $h \neq g$
• $R_g$: region containing group $g$ entities, $g = 1, \dots, G$
• $R_0$: reserved-judgment region
The underlying model is based on Anderson's method [2] for finding a partition $\{R_0, R_1, \dots, R_G\}$ that maximizes the probability of correct classification subject to constraints on the misclassification probability limits as follows.
$$\max \; \sum_{g=1}^{G} \pi_g \int_{R_g} f_g(x)\,dx \quad \text{s.t.} \quad \int_{R_h} f_g(x)\,dx \le \alpha_{hg}, \quad h, g = 1, \dots, G, \; h \neq g. \tag{1}$$
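It is known [2] that under some conditions there exist $\lambda_{ig} \ge 0$, $i, g \in \{1, \dots, G\}$, $i \neq g$, such that an optimal partition is given by
$$R_g = \Big\{ x \in \mathbb{R}^k : L_g(x) = \max_{h \in \{0, 1, \dots, G\}} L_h(x) \Big\}, \quad g = 0, 1, \dots, G,$$
where
$$L_0(x) = 0, \qquad L_g(x) = \pi_g f_g(x) - \sum_{i=1,\, i \neq g}^{G} \lambda_{ig} f_i(x), \quad g = 1, \dots, G.$$
As we shall see later, the nonnegative coefficients $\lambda_{ig}$ can be derived by solving the subsequent DAMIP model. For a given set of solutions $\lambda_{ig}$ and a new entity with feature attribute vector $x \in \mathbb{R}^k$, let $h := \arg\max_g \{L_g(x) : g = 0, 1, \dots, G\}$. Then the entity is classified to group $h$, where $h = 0$ means that judgment is reserved. In practice, it is difficult or even impossible to solve problem (1) directly. In what follows, we discuss its approximation counterpart for numerical implementation.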
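The classification rule can be sketched in Python as follows. The sketch assumes the standard Anderson form of the discriminant functions, namely $L_0(x) = 0$ and $L_g(x) = \pi_g f_g(x) - \sum_{i \neq g} \lambda_{ig} f_i(x)$, and takes the density values at $x$ as given. The multipliers used here are hypothetical placeholders; in the paper they are obtained by solving the DAMIP model.

```python
def discriminant_scores(densities, priors, lam):
    """Return [L_0, L_1, ..., L_G] for one entity.

    densities[g] : f_{g+1}(x), the group-conditional density value at x
    priors[g]    : pi_{g+1}
    lam[i][g]    : lambda_{i+1, g+1} >= 0 for i != g (diagonal entries unused)
    """
    n_groups = len(densities)
    scores = [0.0]  # L_0 = 0 is the reserved-judgment score
    for g in range(n_groups):
        penalty = sum(lam[i][g] * densities[i] for i in range(n_groups) if i != g)
        scores.append(priors[g] * densities[g] - penalty)
    return scores


def classify(densities, priors, lam):
    """Index of the largest score; 0 means judgment is reserved.
    Ties are resolved in favour of the smallest index."""
    scores = discriminant_scores(densities, priors, lam)
    return max(range(len(scores)), key=scores.__getitem__)


# Two groups with equal priors and hypothetical multipliers.
priors = [0.5, 0.5]
lam = [[0.0, 0.6],
       [0.6, 0.0]]

clear_group_1 = classify([0.9, 0.1], priors, lam)
ambiguous = classify([0.5, 0.5], priors, lam)
clear_group_2 = classify([0.1, 0.9], priors, lam)
```

When the two densities are equal, both $L_1$ and $L_2$ are negative, so the entity falls into the reserved-judgment region rather than being forced into a group.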
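The DAMIP approximation. Under standard assumptions in discriminant analysis, namely multivariate normal group-conditional distributions with a common covariance matrix and equal prior probabilities for all groups, we estimated the conditional density functions $f_g$ and prior probabilities $\pi_g$, $g = 1, \dots, G$, from the given training data. Specifically, we used the normal group-conditional density $\hat{f}_g$ as an estimate,
$$\hat{f}_g(x) = (2\pi)^{-k/2}\, |S|^{-1/2} \exp\Big( -\tfrac{1}{2} (x - \bar{x}_g)^T S^{-1} (x - \bar{x}_g) \Big),$$
where $\bar{x}_g \in \mathbb{R}^k$ denotes the sample mean vector for group $g$ and $S \in \mathbb{R}^{k \times k}$ is the pooled sample covariance matrix
$$S = \frac{1}{N - G} \sum_{g=1}^{G} (n_g - 1)\, S_g,$$
where $S_g \in \mathbb{R}^{k \times k}$ is the sample covariance matrix for group $g$,
$$S_g = \frac{1}{n_g - 1} \sum_{j=1}^{n_g} (x_{g,j} - \bar{x}_g)(x_{g,j} - \bar{x}_g)^T, \qquad \bar{x}_g = \frac{1}{n_g} \sum_{j=1}^{n_g} x_{g,j},$$
and $\{x_{g,1}, \dots, x_{g,n_g}\}$ is the $k$-attribute sample of entities in group $g$. Accordingly, we derived the following approximation, $\hat{p}_g(x)$, of $p_g(x) = f_g(x) / \sum_{h=1}^{G} f_h(x)$, $g = 1, \dots, G$:
$$\hat{p}_g(x) = \frac{\hat{f}_g(x)}{\sum_{h=1}^{G} \hat{f}_h(x)}.$$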
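The estimates described above can be sketched in plain Python. The sketch computes group means and the pooled covariance matrix, and then the normalized normal densities, assuming that $\hat{p}_g(x)$ is the estimated density of group $g$ divided by the sum over all groups, as in the DAMIP literature. That normalized form is an assumption here, as are the function names and the toy data. Under a common covariance matrix the normalizing constant of the normal density cancels, so only Mahalanobis distances are needed.

```python
from math import exp


def mean_vector(rows):
    """Componentwise sample mean of a list of k-dimensional samples."""
    n, k = len(rows), len(rows[0])
    return [sum(r[c] for r in rows) / n for c in range(k)]


def pooled_covariance(groups):
    """Group means and pooled covariance S = sum_g (n_g - 1) S_g / (N - G)."""
    k = len(groups[0][0])
    total = sum(len(g) for g in groups)
    means = [mean_vector(g) for g in groups]
    S = [[0.0] * k for _ in range(k)]
    for rows, m in zip(groups, means):
        for x in rows:
            d = [x[c] - m[c] for c in range(k)]
            for a in range(k):
                for b in range(k):
                    S[a][b] += d[a] * d[b]
    denom = total - len(groups)
    return means, [[v / denom for v in row] for row in S]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting
    (A is assumed to be nonsingular)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def normalized_densities(x, means, S):
    """Estimated density of each group at x divided by the sum over groups."""
    dist2 = []
    for m in means:
        d = [xi - mi for xi, mi in zip(x, m)]
        z = solve(S, d)  # z = S^{-1} d
        dist2.append(sum(di * zi for di, zi in zip(d, z)))  # squared Mahalanobis distance
    base = min(dist2)  # shift exponents for numerical stability
    weights = [exp(-0.5 * (v - base)) for v in dist2]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical two-factor training data for two groups.
group_1 = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
group_2 = [[6.0, 5.0], [7.0, 7.0], [8.0, 6.0]]
means, S = pooled_covariance([group_1, group_2])
p_at_mean_1 = normalized_densities(means[0], means, S)
p_midpoint = normalized_densities([4.5, 4.0], means, S)
```

In this toy example a point at the group 1 mean is assigned almost all of the normalized density for group 1, while the midpoint between the two means is split evenly.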
In addition, we used $\hat{\pi}_g = 1/G$ to estimate $\pi_g$ for $g = 1, \dots, G$. We also introduced a 0-1 binary decision variable, $u_{hgj}$, to indicate whether or not the $j$th entity from group $g$ is allocated to group $h$. Then, model (1) can be solved by the following approximation (Lee, 2007).
Model reformulation. We chose a training data set of ED patients with selected discriminatory factors and two known group classifications, i.e., 72-hour return and non-return. Then $G = 2$. Let $g = 1$ represent the group of ED 72-hour reattendance and $g = 2$ otherwise. Let $n_1 = m$ and $n_2 = N - m$, so that among the total of $N$ patients, $m$ patients were in group 1 while $N - m$ patients were in group 2. We assume there are $k$ factors selected for classification. Without loss of generality, we arranged the training data matrix $X$ with two classified groups in the following way.
$$X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix} \in \mathbb{R}^{N \times k}, \qquad X_1 = [x_1, \cdots, x_m]^T, \qquad X_2 = [x_{m+1}, \cdots, x_N]^T,$$
where $x_i \in \mathbb{R}^k$ is a column vector representing the values of the $k$ factors of the $i$th patient in the training data, $i = 1, \dots, N$. For brevity in description, we introduce the following notations. It is evident that $u, v, y, \mu, \eta \in \mathbb{R}^N$. The model (2) can then be reformulated as problem (3). Problem (3) is a mixed integer program, which can be solved efficiently by well-developed software such as CPLEX. The inputs needed in model (3), such as $\hat{p}_1$, $\hat{p}_2$, the sample covariance matrix, and its inverse, can be derived readily using software such as MATLAB.