
# Predictive analytics for 30-day hospital readmissions

The first author is supported by MTSU FRCAC grant 2019.

The 30-day hospital readmission rate is the percentage of patients who are readmitted within 30 days of their last hospital discharge. Hospitals with high readmission rates must pay penalties to the Centers for Medicare & Medicaid Services (CMS). Predicting readmissions can help a hospital allocate its resources to reduce the readmission rate. In this research, we use a data set of 71724 admissions to a hospital in North Carolina from 2011 to 2016. We aim to provide a predictive model that helps related entities, including hospitals, health insurance actuaries, and Medicare, reduce costs and improve the clinical outcomes of the healthcare system. We used R to process the data and applied clustering, a generalized linear model (GLM), and LASSO regression to predict 30-day readmissions. The patient's age turns out to be the most important factor affecting hospital readmission. This research can help hospitals and CMS reduce costly readmissions.
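As a rough illustration of the definition above, the following Python sketch (the study itself used R) computes the gap in days between each discharge and the same patient's next admission, then the share of patients with at least one gap of 30 days or less. The record layout, the function names, and the toy dates are all hypothetical, and measuring the gap from discharge to the next admission is an assumption about how the rate is counted.

```python
from datetime import date

def readmission_gaps(stays):
    """Map each patient to the days between a discharge and the next admission.

    `stays` is a list of (patient_id, admit_date, discharge_date) tuples
    (a hypothetical layout, not the hospital's actual schema).
    """
    by_patient = {}
    for pid, admit, discharge in stays:
        by_patient.setdefault(pid, []).append((admit, discharge))
    gaps = {}
    for pid, visits in by_patient.items():
        visits.sort()  # chronological order by admission date
        gaps[pid] = [(nxt[0] - cur[1]).days for cur, nxt in zip(visits, visits[1:])]
    return gaps

def readmission_rate(stays, window=30):
    """Percentage of patients with at least one readmission within `window` days."""
    gaps = readmission_gaps(stays)
    readmitted = sum(1 for g in gaps.values() if any(d <= window for d in g))
    return 100.0 * readmitted / len(gaps)

# Toy data: patient A returns after 15 days, B after 91 days, C never returns.
stays = [
    ("A", date(2015, 1, 1), date(2015, 1, 5)),
    ("A", date(2015, 1, 20), date(2015, 1, 22)),
    ("B", date(2015, 3, 1), date(2015, 3, 2)),
    ("B", date(2015, 6, 1), date(2015, 6, 3)),
    ("C", date(2015, 2, 1), date(2015, 2, 4)),
]
gaps = readmission_gaps(stays)
rate = readmission_rate(stays)
```

In this toy data only patient A counts as a 30-day readmission, so one of three patients is flagged.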

Mathematics Subject Classification: Primary: 62P10, 62J05; Secondary: 91C20.


Figure 1.  Histogram of days between the admissions (DBA)

Figure 2.  Histogram of the log(DBA+1)
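The transform used in the figures, log(DBA+1), and its inverse can be sketched in a few lines of Python; the function names are illustrative only. Adding 1 before taking the log keeps a same-day readmission (DBA = 0) well defined, and the inverse lets values reported on the log scale be read back as days. For example, the median of 2.07944 reported in Table 8 for Patient.Days 20-30 corresponds to about 7 days.

```python
import math

def to_log_scale(dba):
    """log(DBA + 1); log1p is accurate for small DBA and maps 0 to 0."""
    return math.log1p(dba)

def from_log_scale(value):
    """Inverse transform: exp(value) - 1, giving days between admissions."""
    return math.expm1(value)

same_day = to_log_scale(0)             # DBA of 0 stays at 0 on the log scale
median_days = from_log_scale(2.07944)  # median from Table 8, back in days
```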

Figure 3.  Box plot of patient marital status vs log(DBA+1)

Figure 4.  The cluster dendrogram of $\texttt{Doctor.Number}$. Each vertical line at height 0 represents one doctor. The doctors inside the same red box are clustered into one group
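A dendrogram like the one in Figure 4 comes from agglomerative (bottom-up) clustering. The page does not state which doctor-level features or linkage rule were used, so the sketch below makes two assumptions: each doctor is summarised by a single number (a hypothetical mean of log(DBA+1) over that doctor's patients), and clusters are merged by average linkage. The doctor IDs and values are invented.

```python
def agglomerate(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    `points` maps an item name to a 1-D feature value. Distance between
    clusters is the average pairwise absolute difference (average linkage).
    """
    clusters = [[name] for name in points]

    def dist(a, b):
        total = sum(abs(points[x] - points[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters

# Hypothetical per-doctor means of log(DBA+1).
doctor_means = {"d1": 4.1, "d2": 4.2, "d3": 3.0, "d4": 3.1, "d5": 5.5}
groups = agglomerate(doctor_means, n_clusters=3)
```

Cutting the tree at a given number of clusters plays the role of the red boxes in the figure: doctors that end up in the same list would share one level of a grouped factor.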

Figure 5.  Box plot of PATIENT.SEX.CODE vs log(DBA+1) split by inpatient (I) and outpatient (O)

Figure 6.  Box plot of Age vs log(DBA+1) split by inpatient (I) and outpatient (O)

Figure 7.  Residuals vs fitted values. The left panel is for the GLM with the features selected in the section above; the right panel is for OLS with all predictors

Figure 8.  Q-Q plots of GLM (left) vs ordinary least squares regression (OLS)

Table 1.  The mean and median of seven levels of PATIENT.MARITAL.STATUS

| PATIENT.MARITAL.STATUS | mean | median | n |
|---|---|---|---|
| D | 4.06432 | 4.20469 | 8122 |
| M | 4.31861 | 4.48864 | 26568 |
| P | 4.84009 | 5.01728 | 21 |
| S | 4.42558 | 4.67283 | 22164 |
| U | 3.41953 | 2.83321 | 120 |
| W | 4.05355 | 4.11087 | 7196 |
| X | 4.05785 | 4.07754 | 1492 |
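A per-level summary of this kind can be produced with a simple group-by. The Python sketch below assumes the summarised quantity is log(DBA+1), as in the box plot of Figure 3; the function name and the toy records are hypothetical and do not reproduce the figures in Table 1.

```python
import math
from statistics import mean, median

def summarize_by_group(records):
    """Return mean, median and count of log(DBA+1) for each factor level.

    `records` is a list of (level, dba) pairs.
    """
    groups = {}
    for level, dba in records:
        groups.setdefault(level, []).append(math.log1p(dba))
    return {
        level: {"mean": mean(vals), "median": median(vals), "n": len(vals)}
        for level, vals in groups.items()
    }

# Toy records: (marital status code, days between admissions).
records = [("M", 90), ("M", 120), ("M", 30), ("U", 16), ("U", 5)]
summary = summarize_by_group(records)
```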

Table 2.  The largest loadings of related predictors in the first principal component (PC1)

| Variable Name | PC1 |
|---|---|
| ICD.PROCEDURE.CODE1 | -0.38498161 |
| HOSPITAL.SERVICE.CODEG | -0.38284437 |
| ICD9_DIAGNOST_CODE3 | -0.29296143 |
| PATIENT.DDRG1 | -0.19955252 |
| ICD9_DIAGNOST_CODE2 | 0.38297941 |
| ICD.PROCEDURE.CODE5 | 0.38210699 |
| HOSPITAL.SERVICE.CODEB | 0.36536618 |
| PATIENT.DRG4 | 0.24330584 |
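To show what a PC1 loading vector is, here is a minimal pure-Python sketch that finds the first principal component of a small dummy-coded matrix by power iteration on the sample covariance matrix. This is a swapped-in textbook method, not necessarily how the R analysis computed it, and the three toy columns are hypothetical indicator variables. Note that the sign of a principal component is arbitrary, so only the relative signs of the loadings are meaningful.

```python
import math

def first_principal_component(rows, n_iter=500):
    """Return (loadings, scores) of the first principal component.

    Uses power iteration on the sample covariance matrix; assumes the
    all-ones starting vector is not orthogonal to the leading eigenvector.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # keep the vector at unit length
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores

# Toy dummy matrix: columns 0 and 1 are complementary indicators.
rows = [[1, 0, 1], [1, 0, 1], [0, 1, 0], [0, 1, 0], [1, 0, 0]]
loadings, scores = first_principal_component(rows)
```

Because the first two toy columns are exact complements, their loadings come out equal in magnitude and opposite in sign, the same pattern of paired positive and negative loadings visible in Table 2. The per-row `scores` are what would serve as a single combined predictor.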

Table 3.  The results of GLM on training data generated by R

| Coefficients | Estimate | Std. Error | t value | p-value | Significant Code |
|---|---|---|---|---|---|
| (Intercept) | 5.99381 | 0.1633 | 36.7 | < 2e-16 | *** |
| DOCTOR.NUMBER2 | -0.60542 | 0.06983 | -8.67 | < 2e-16 | *** |
| DOCTOR.NUMBER3 | -0.1012 | 0.04012 | -2.52 | 0.01166 | * |
| DOCTOR.NUMBER4 | 0.31089 | 0.03528 | 8.81 | < 2e-16 | *** |
| DOCTOR.NUMBER5 | 0.52643 | 0.35486 | 1.48 | 0.13795 | · |
| ServCode_DRG_ICD | -0.20195 | 0.00603 | -33.48 | < 2e-16 | *** |
| Age10-20 | -0.26285 | 0.05948 | -4.42 | 9.90E-06 | *** |
| Age100+ | -1.86377 | 0.39675 | -4.7 | 2.60E-06 | *** |
| Age20-30 | -0.61615 | 0.05559 | -11.08 | < 2e-16 | *** |
| Age30-40 | -0.66055 | 0.05605 | -11.78 | < 2e-16 | *** |
| Age40-50 | -0.63653 | 0.05632 | -11.3 | < 2e-16 | *** |
| Age50-60 | -0.63357 | 0.05637 | -11.24 | < 2e-16 | *** |
| Age60-70 | -0.63019 | 0.05726 | -11.01 | < 2e-16 | *** |
| Age70-80 | -0.60506 | 0.05855 | -10.33 | < 2e-16 | *** |
| Age80-90 | -0.62985 | 0.06173 | -10.2 | < 2e-16 | *** |
| Age90-100 | -0.6078 | 0.08093 | -7.51 | 6.00E-14 | *** |
| Surgeon2 | -0.27524 | 0.02279 | -12.08 | < 2e-16 | *** |
| Surgeon3 | -0.18455 | 0.06706 | -2.75 | 0.00592 | ** |
| Surgeon4 | -0.36174 | 0.04333 | -8.35 | < 2e-16 | *** |
| Surgeon5 | 0.53341 | 0.34909 | 1.53 | 0.12652 | · |
| PATIENT.MARITAL.STATUSM | 0.16705 | 0.01716 | 9.73 | < 2e-16 | *** |
| PATIENT.MARITAL.STATUSPS | 0.05911 | 0.02037 | 2.9 | 0.00371 | ** |
| Nur.StatMS_CCU_ER_PC | 0.02249 | 0.03593 | 0.63 | 0.53137 | · |
| Nur.StatWS | 0.2277 | 0.06175 | 3.69 | 0.00023 | *** |
| DISCHARGE.STATUSR | 0.39629 | 0.0448 | 8.85 | < 2e-16 | *** |
| DISCHARGE.STATUSY | 0.43827 | 0.05653 | 7.75 | 9.20E-15 | *** |
| income_levellow | -0.11351 | 0.02611 | -4.35 | 1.40E-05 | *** |
| income_levelmeduim | -0.09598 | 0.01506 | -6.37 | 1.80E-10 | *** |
| Patient.Days20-30 | -0.28387 | 0.04367 | -6.5 | 8.10E-11 | *** |
| Patient.Days30+ | -0.48626 | 0.32127 | -1.51 | 0.13015 | · |
| PATIENT.RACE.CODEBO | -0.18993 | 0.14455 | -1.31 | 0.18887 | · |
| PATIENT.RACE.CODEDH | -0.99651 | 0.20834 | -4.78 | 1.70E-06 | *** |
| PATIENT.RACE.CODEWX | -0.11298 | 0.14369 | -0.79 | 0.43171 | · |
| IO_CODEO | -0.18799 | 0.04251 | -4.42 | 9.80E-06 | *** |

Significant codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Table 4.  The R output of the generalized linear model (GLM) on all data

| Coefficients | Estimate | Std. Error | t value | p-value | Significant Code |
|---|---|---|---|---|---|
| (Intercept) | 5.84694 | 0.06366 | 91.84 | < 2e-16 | *** |
| DOCTOR.GROUP2 | -0.58933 | 0.06102 | -9.66 | < 2e-16 | *** |
| DOCTOR.GROUP3 | -0.1105 | 0.03487 | -3.17 | 0.00153 | ** |
| DOCTOR.GROUP4 | 0.28857 | 0.03085 | 9.36 | < 2e-16 | *** |
| ServCode_DRG_ICD | -0.201 | 0.00528 | -38.1 | < 2e-16 | *** |
| Age10-20 | -0.30656 | 0.05169 | -5.93 | 3.00E-09 | *** |
| Age100+ | -1.16735 | 0.35101 | -3.33 | 0.00088 | *** |
| Age20-30 | -0.63787 | 0.0482 | -13.23 | < 2e-16 | *** |
| Age30-40 | -0.69793 | 0.04861 | -14.36 | < 2e-16 | *** |
| Age40-50 | -0.67707 | 0.04885 | -13.86 | < 2e-16 | *** |
| Age50-60 | -0.67009 | 0.04887 | -13.71 | < 2e-16 | *** |
| Age60-70 | -0.66848 | 0.04966 | -13.46 | < 2e-16 | *** |
| Age70-80 | -0.63585 | 0.05076 | -12.53 | < 2e-16 | *** |
| Age80-90 | -0.63634 | 0.0536 | -11.87 | < 2e-16 | *** |
| Age90-100 | -0.66369 | 0.07005 | -9.47 | < 2e-16 | *** |
| Surgeon2 | -0.28085 | 0.01977 | -14.21 | < 2e-16 | *** |
| Surgeon3 | -0.22311 | 0.0586 | -3.81 | 0.00014 | *** |
| Surgeon4 | -0.3631 | 0.03772 | -9.63 | < 2e-16 | *** |
| PATIENT.MARITAL.STATUSM | 0.18697 | 0.01491 | 12.54 | < 2e-16 | *** |
| PATIENT.MARITAL.STATUSPS | 0.06817 | 0.01772 | 3.85 | 0.00012 | *** |
| Nur.StatWS | 0.20997 | 0.04929 | 4.26 | 2.10E-05 | *** |
| DISCHARGE.STATUSR | 0.41057 | 0.0387 | 10.61 | < 2e-16 | *** |
| DISCHARGE.STATUSY | 0.45851 | 0.04926 | 9.31 | < 2e-16 | *** |
| income_levellow | -0.08182 | 0.02262 | -3.62 | 0.0003 | *** |
| income_levelmeduim | -0.09786 | 0.01307 | -7.49 | 7.20E-14 | *** |
| Patient.Days20-30 | -0.22549 | 0.03781 | -5.96 | 2.50E-09 | *** |
| PATIENT.RACE.CODEDH | -0.57857 | 0.13194 | -4.39 | 1.20E-05 | *** |
| PATIENT.RACE.CODEWX | 0.07868 | 0.01669 | 4.71 | 2.40E-06 | *** |
| IO_CODEO | -0.21764 | 0.02759 | -7.89 | 3.10E-15 | *** |

Significant codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
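To make the coefficient table concrete, the sketch below adds up a few Table 4 estimates into a linear predictor and back-transforms it to days. It rests on assumptions that this page does not confirm: that the fitted response is log(DBA+1) with an identity link, that every factor not listed sits at its base level, and that the ServCode_DRG_ICD score is 0. The function name and the example patient profile are hypothetical.

```python
import math

# A subset of the Table 4 estimates (fit on all data).
COEF = {
    "(Intercept)": 5.84694,
    "ServCode_DRG_ICD": -0.201,
    "Age20-30": -0.63787,
    "PATIENT.MARITAL.STATUSM": 0.18697,
    "IO_CODEO": -0.21764,
}

def linear_predictor(active_dummies, serv_score=0.0):
    """Intercept + continuous term + sum of the active dummy coefficients."""
    eta = COEF["(Intercept)"] + COEF["ServCode_DRG_ICD"] * serv_score
    return eta + sum(COEF[name] for name in active_dummies)

# Hypothetical profile: married inpatient aged 20-30, all else at base level.
eta = linear_predictor(["Age20-30", "PATIENT.MARITAL.STATUSM"])
predicted_dba = math.expm1(eta)  # back to days, under the log(DBA+1) assumption
```

Under these assumptions the predicted gap for this profile is roughly 220 days, well outside the 30-day window.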

Table 5.  Interpretation of the GLM results

| Feature | Coefficient (β) | exp(β)-1 | Interpretation |
|---|---|---|---|
| DOCTOR.GROUP = 2 | -0.58933 | -0.45 | 45% decrease in DBA compared to the base group DOCTOR.GROUP = 1. This means the doctors in group 2 are less effective at improving the DBA (or reducing 30-day readmission) than those in group 1. |
| DOCTOR.GROUP = 3 | -0.11050 | -0.10 | 10% decrease in DBA compared to the group DOCTOR.GROUP = 1. |
| DOCTOR.GROUP = 4 | 0.28857 | 0.33 | 33% increase in DBA compared to the group DOCTOR.GROUP = 1. Therefore the doctors in group 4 are more effective at improving the DBA than those in group 1. |
| ServCode_DRG_ICD | -0.20100 | -0.18 | 18% decrease in DBA for every 1.0 increase in the feature ServCode_DRG_ICD, the new artificial feature built with PCA from the predictors PATIENT.DRG, HOSPITAL.SERVICE.CODE, ICD.PROCEDURE.CODE, and ICD9_DIAGNOST_CODE. |
| Age = 10-20 | -0.30656 | -0.26 | 26% decrease in DBA compared to Age 0-10. This makes sense because younger people are less likely to be readmitted. |
| Age = 20-30 | -0.63787 | -0.47 | 47% decrease compared to Age 0-10. |
| … | | | |
| Age = 100+ | -1.16735 | -0.69 | 69% decrease compared to Age 0-10. |
| Surgeon = 2 | -0.28085 | -0.24 | 24% decrease in DBA compared with Surgeon = 1. |
| … | | | |
| PATIENT.MARITAL.STATUS = M | 0.18697 | 0.21 | 21% increase in DBA for married patients compared with the base level MARITAL.STATUS = DUWXNA, the group for divorced, widowed, or unknown status. Married patients are usually better cared for and thus in better health. |
| PATIENT.MARITAL.STATUS = PS | 0.06817 | 0.07 | 7% increase in DBA for patients with a domestic partner (P) or single (S) compared with the base level MARITAL.STATUS = DUWXNA. |
| income_level = low | -0.08182 | -0.08 | 8% decrease in DBA for low-income patients compared to high-income patients. This makes sense because low-income patients may not be able to afford enough healthcare to maintain good health. |
| income_level = medium | -0.09786 | -0.09 | 9% decrease in DBA compared to high-income patients. Medium-income patients have an even shorter DBA than low-income patients, possibly because of longer working hours and higher mental pressure. |
| Patient.Days = 20-30 | -0.22549 | -0.20 | 20% decrease compared to the base level, the visits whose Patient.Days is smaller than 20 or greater than 30. This suggests that visits with 20-30 inpatient days are the most likely to be followed by a readmission after a short interval. |
| PATIENT.RACE.CODE = WX | 0.07868 | 0.08 | 8% increase in DBA when PATIENT.RACE.CODE is W or X compared to the base level PATIENT.RACE.CODE = AIMNT. |
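The exp(β)-1 column follows from the log scale of the model: a coefficient β adds β to the log of the response, which multiplies the response by exp(β), a relative change of exp(β)-1. The short Python check below recomputes that column from a few of the Table 4 estimates; the function name is illustrative, and the rounding to two decimals mirrors the table.

```python
import math

def relative_change(beta):
    """Relative change in the response implied by a log-scale coefficient."""
    return math.exp(beta) - 1.0

# Estimates copied from Table 4.
betas = {
    "DOCTOR.GROUP2": -0.58933,
    "DOCTOR.GROUP4": 0.28857,
    "Age100+": -1.16735,
    "PATIENT.MARITAL.STATUSM": 0.18697,
    "income_levellow": -0.08182,
}
changes = {name: round(relative_change(b), 2) for name, b in betas.items()}
```

A positive coefficient such as the one for DOCTOR.GROUP4 gives a positive relative change (a longer DBA), and the Table 4 estimate of 0.18697 for married patients gives a relative change of about 0.21.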

Table 6.  Data Dictionary

| Variable Name | Definition | Data type and values |
|---|---|---|
| PATIENT.DRG | Patient diagnosis-related group | Integer 0-999 |
| NurStat | Nurses station | Letter code representing the type of nurses station, with the majority of values missing. |
| DOCTOR.NUMBER | ID of the doctor | Integer |
| Surgeon | ID of the surgeon | Integer |
| IO_CODE | Inpatient or outpatient | I: inpatient; O: outpatient |
| HOSPITAL.SERVICE.CODE | A code representing the type of healthcare service | Letter code. No missing values. |
| ADMIT.SOURCE | The code indicating the source of the referral for the admission or visit. | 1: Physician Referral; 2: Clinic Referral; 3: HMO Referral; 4: Transfer from a Hospital; 6: Transfer from Another Health Care Facility; 8: Court/Law Enforcement; 9: Information Not Available |
| DISCHARGE.STATUS | Patient discharge status | Integer code |
| PATIENT.SEX.CODE | A code indicating the sex of the patient. | F: Female; M: Male; U: Unknown |
| PATIENT.MARITAL.STATUS | Marital status | D: divorced; S: single; M: married; W: widowed; U: unknown; P: partnered; X: legally separated |
| PATIENT.RACE.CODE | Code indicating the racial or ethnic background of a person. | A: Asian or Pacific Islander; B: Black; D: Subcontinent Asian American; H: Hispanic; I: American Indian or Alaskan Native; N: Black (Non-Hispanic); O: White (Non-Hispanic); W: widow; X: legally separated |
| ICD.PROCEDURE.CODE | ICD-10 Procedure Coding | Integer code |
| ICD9_DIAGNOST_CODE | ICD-9-CM Diagnosis Codes | Integer code |
| PATIENT ZIP | Patient zip code | Integer code |
| DBA | Days between the admissions | Non-negative integer |

Table 7.  Levels combined for predictors

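| Variable Name | Levels before combining | Levels after combining |
|---|---|---|
| Nur.Stat | isNA, MS, CCU, ER, PC | isNA |
| | WS | WS |
| Patient.Days | 0-19, 30+ | 0-19or30+ |
| | 20-30 | 20-30 |
| HOSPITAL.SERVICE.CODE | OPSCTHNB | Y |
| | DXMEDOBSOPBOPVEOB | R |
| | WNDIVTSOWWCNP | B |
| | ER | G |
| ADMIT.SOURCE | 9 | 9 |
| | 16 | 16 |
| | 8 | 8 |
| | 4 | 4 |
| | 25 | 25 |
| PATIENT.SEX.CODE | MU | M |
| | F | F |
| PATIENT.MARITAL.STATUS | M | M |
| | PS | PS |
| | DUWXNA | DUWXNA |
| PATIENT.RACE.CODE | WX | WX |
| | AIMNTBO | AIMNT |
| | DH | DH |
| DISCHARGE.STATUS | 1730 | R |
| | 25436263657072 | Y |
| | 346921505164818283 | G |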
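Combining factor levels amounts to a lookup from raw codes to a smaller set of labels. The Python sketch below does this for PATIENT.MARITAL.STATUS, using the groups M, PS, and DUWXNA from the table. Representing a missing value (the "NA" in DUWXNA) as `None` and the function name are assumptions made for illustration.

```python
# Raw marital status codes mapped to the combined levels M, PS and DUWXNA.
MARITAL_LEVELS = {
    "M": "M",
    "P": "PS",
    "S": "PS",
    "D": "DUWXNA",
    "U": "DUWXNA",
    "W": "DUWXNA",
    "X": "DUWXNA",
    None: "DUWXNA",  # assumption: missing values are stored as None
}

def combine_marital_status(code):
    """Return the combined level; raises KeyError for an unexpected code."""
    return MARITAL_LEVELS[code]

raw = ["M", "S", "W", None, "P"]
combined = [combine_marital_status(code) for code in raw]
```

Raising on unknown codes, rather than silently assigning a default level, makes data-entry problems visible before the model is fit.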

Table 8.  The mean and median of log(DBA+1) within three levels of Patient.Days

| Patient.Days | Mean of log(DBA+1) | Median of log(DBA+1) |
|---|---|---|
| 0-19 | 4.34939 | 4.54329 |
| 30+ | 3.86552 | 3.98744 |
| 20-30 | 2.59868 | 2.07944 |
