
Issues using logistic regression with class imbalance, with a case study from credit risk modelling
Department of Mathematics, Imperial College London, London, SW7 2AZ, UK
The class imbalance problem arises in two-class classification problems when the less frequent (minority) class is observed much less often than the majority class. This characteristic is endemic in many problems, such as modeling default or fraud detection. Recent work by Owen [
References:
[1] E. I. Altman and G. Sabato, Modelling credit risk for SMEs: Evidence from the US market, Abacus, 43 (2007), 332-357. doi: 10.1111/j.1467-6281.2007.00234.x.
[2] G. E. Batista, R. C. Prati and M. C. Monard, A study of the behavior of several methods for balancing machine learning training data, ACM SIGKDD Explorations Newsletter, 6 (2004), 20-29. doi: 10.1145/1007730.1007735.
[3] C. Bravo, L. C. Thomas and R. Weber, Improving credit scoring by differentiating defaulter behaviour, Journal of the Operational Research Society, 66 (2015), 771-781. doi: 10.1057/jors.2014.50.
[4] N. V. Chawla, K. W. Bowyer, L. O. Hall and W. P. Kegelmeyer, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16 (2002), 321-357. doi: 10.1613/jair.953.
[5] T. M. Clauretie, A note on mortgage risk: Default vs. loss rates, Real Estate Economics, 18 (1990), 202-206. doi: 10.1111/1540-6229.00517.
[6] Cornell Law School, Definition of default, date of default, and requirement of notice of default, URL https://www.law.cornell.edu/cfr/text/24/203.467.
[7] E. R. DeLong, D. M. DeLong and D. L. Clarke-Pearson, Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach, Biometrics, 44 (1988), 837-845. doi: 10.2307/2531595.
[8] B. Efron and T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, Institute of Mathematical Statistics (IMS) Monographs, 5. Cambridge University Press, New York, 2016. doi: 10.1017/CBO9781316576533.
[9] T. Fawcett, An introduction to ROC analysis, Pattern Recognition Letters, 27 (2006), 861-874. doi: 10.1016/j.patrec.2005.10.010.
[10] D. J. Hand, Reject inference in credit operations, Credit Risk Modeling: Design and Application, 181-190.
[11] A. E. Hoerl and R. W. Kennard, Ridge regression: Biased estimation for nonorthogonal problems, Technometrics, 12 (1970), 55-67.
[12] G. King and L. Zeng, Logistic regression in rare events data, Political Analysis, 9 (2001), 137-163.
[13] G. Krempl and V. Hofer, Classification in presence of drift and latency, in Data Mining Workshops (ICDMW), 2011 IEEE 11th International Conference on, IEEE, 2011, 596-603. doi: 10.1109/ICDMW.2011.47.
[14] J. Laurikkala, Improving identification of difficult small classes by balancing class distribution, Artificial Intelligence in Medicine, 2101 (2001), 63-66. doi: 10.1007/3-540-48229-6_9.
[15] X.-Y. Liu, J. Wu and Z.-H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39 (2009), 539-550.
[16] F. J. Massey Jr, The Kolmogorov-Smirnov test for goodness of fit, Journal of the American Statistical Association, 46 (1951), 68-78.
[17] F. Murtagh and P. Contreras, Algorithms for hierarchical clustering: An overview, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2 (2012), 86-97.
[18] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course, Applied Optimization, 87. Kluwer Academic Publishers, Boston, MA, 2004. doi: 10.1007/978-1-4419-8853-9.
[19] A. B. Owen, Infinitely imbalanced logistic regression, Journal of Machine Learning Research, 8 (2007), 761-773.
[20] O. Pons, Bootstrap of means under stratified sampling, Electronic Journal of Statistics, 1 (2007), 381-391. doi: 10.1214/07-EJS033.
[21] R. Rockafellar, Convex Analysis, Princeton University Press, Princeton, N.J., 1970.
[22] C. Seiffert, T. M. Khoshgoftaar, J. Van Hulse and A. Napolitano, Resampling or reweighting: A comparison of boosting implementations, in 2008 20th IEEE International Conference on Tools with Artificial Intelligence, IEEE, 1 (2008), 445-451. doi: 10.1109/ICTAI.2008.59.
[23] M. J. Silvapulle, On the existence of maximum likelihood estimators for the binomial response models, Journal of the Royal Statistical Society. Series B (Methodological), 43 (1981), 310-313. doi: 10.1111/j.2517-6161.1981.tb01676.x.
[24] Student, The probable error of a mean, Biometrika, 6 (1908), 1-25.
[25] L. C. Thomas, Consumer Credit Models: Pricing, Profit and Portfolios, Oxford, 2009.
[26] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), 58 (1996), 267-288. doi: 10.1111/j.2517-6161.1996.tb02080.x.
[27] R. Tibshirani, The lasso problem and uniqueness, Electronic Journal of Statistics, 7 (2013), 1456-1490. doi: 10.1214/13-EJS815.
[28] H. Wang, Q. Xu and L. Zhou, Large unbalanced credit scoring using lasso-logistic regression ensemble, PLoS ONE, 10 (2015), e0117844. doi: 10.1371/journal.pone.0117844.
[29] W. N. van Wieringen, Lecture notes on ridge regression, arXiv preprint, arXiv: 1509.09169.
[30] G. Zeng, On the existence of maximum likelihood estimates for weighted logistic regression, Communications in Statistics-Theory and Methods, 46 (2017), 11194-11203. doi: 10.1080/03610926.2016.1260742.
[31] M. Zhu, W. Su and H. A. Chipman, Lago: A computationally efficient approach for statistical detection, Technometrics, 48 (2006), 193-205. doi: 10.1198/004017005000000643.
Logistic Regression | Ridge Penalized Logistic Regression | ||||||
100 | 1.1215 | 41.7805 | -0.5247 | 0.5917 | 59.1750 | 0.6879 | 1.9896 |
1000 | 0.5656 | 65.3495 | -2.4591 | 0.0855 | 85.5127 | 0.2454 | 1.2782 |
10000 | 0.5013 | 68.3830 | -4.6289 | 0.0098 | 97.6581 | 0.0450 | 1.0460 |
100000 | 0.5007 | 68.6940 | -6.9102 | 0.0010 | 99.7516 | 0.0049 | 1.0050 |
1000000 | 0.5001 | 68.7254 | -9.2106 | 0.0001 | 99.9750 | 0.0005 | 1.0005 |
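The table above compares plain and ridge-penalized logistic regression as the sample grows, but its column headers were lost in extraction, so the following is only a rough sketch of the general phenomenon and not a reproduction of the table. It assumes a fixed minority sample, a growing majority sample, one Gaussian covariate, and a hand-written Newton solver; the function names, sample sizes, and the ridge strength are all illustrative choices, not the authors' setup.

```python
import numpy as np


def fit_logistic(X, y, ridge=0.0, n_iter=50, tol=1e-10):
    """Fit logistic regression by Newton-Raphson.

    The optional ridge penalty (ridge / 2) * ||slopes||^2 is applied to the
    slope coefficients only; the intercept is left unpenalized.
    """
    n, d = X.shape
    A = np.hstack([np.ones((n, 1)), X])      # design matrix with intercept column
    beta = np.zeros(d + 1)
    P = ridge * np.eye(d + 1)
    P[0, 0] = 0.0                            # do not penalize the intercept
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        w = p * (1.0 - p)
        grad = A.T @ (y - p) - P @ beta      # gradient of penalized log-likelihood
        hess = (A * w[:, None]).T @ A + P    # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def imbalance_demo(n_minority=50, majority_sizes=(100, 1000, 10000),
                   ridge=0.0, seed=0):
    """Return (N, intercept, slope) for a fixed minority sample and growing N."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(1.0, 1.0, size=(n_minority, 1))            # minority class
    x0 = rng.normal(0.0, 1.0, size=(max(majority_sizes), 1))   # majority class
    rows = []
    for N in majority_sizes:
        X = np.vstack([x1, x0[:N]])
        y = np.concatenate([np.ones(n_minority), np.zeros(N)])
        b = fit_logistic(X, y, ridge=ridge)
        rows.append((N, float(b[0]), float(b[1])))
    return rows


if __name__ == "__main__":
    for N, intercept, slope in imbalance_demo():
        print(f"N={N:>6}  intercept={intercept:8.4f}  slope={slope:7.4f}")
```

Under these assumptions the fitted intercept keeps decreasing as the majority class grows while the slope stays bounded, which is the qualitative behaviour analysed in Owen's paper [19]. Passing a positive `ridge` value shrinks the slope toward zero.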
Logistic Regression | Ridge Penalized Logistic Regression | ||||||
100 | 2.2347 | 16.2756 | -1.0602 | 0.3464 | 34.6374 | 1.2598 | 3.5246 |
1000 | 3.2033 | 8.4214 | -3.4516 | 0.0317 | 31.6947 | 1.6478 | 5.1958 |
10000 | 4.6591 | 2.8035 | -4.9902 | 0.0068 | 68.0441 | 0.7112 | 2.0364 |
100000 | 6.3475 | 0.7238 | -6.9521 | 0.0010 | 95.6659 | 0.0878 | 1.0918 |
1000000 | 8.1866 | 0.1524 | -9.2148 | 0.0001 | 99.5517 | 0.0090 | 1.0090 |
Fixture | Logistic Regression | Ridge | Lasso |
certain value, |
n | n | |
certain value, |
0 | 0 |
0.0190 | 0 | 0 | 0 | 0 | 0 |
0.0168 | 0.1650 | 0 | 0 | 0 | 0 |
0.0153 | 0.3106 | 0.1148 | 0 | 0 | 0 |
0.0139 | 0.4388 | 0.2416 | 0.0377 | 0 | 0 |
0.0116 | 0.6435 | 0.4445 | 0.2392 | 0.0471 | 0 |
0.0087 | 0.8621 | 0.6581 | 0.4525 | 0.2547 | 0.0516 |
Logistic Regression | Multinomial Logistic Regression | |||||
Coefficients | Estimate | Cluster | Coefficients | Estimate | ||
Intercept | -5.7705 | Intercept | -7.6828 | |||
1.1384 | Intercept | -7.6828 | ||||
1.1287 | 0.0818 | 0.6045 | ||||
2.2775 | ||||||
2.3532 | ||||||
0.0953 | 0.5412 |
Variable | Type | Description |
Default | Categorical | Dependent variable: 1 if the borrower is more than 180 days past due on monthly installments; 0 otherwise. |
Score | Continuous | A number, prepared by third parties, summarizing the borrower's creditworthiness, which may indicate how likely the borrower is to repay future obligations on time. |
DTI | Continuous | Original Debt-To-Income Ratio. |
UPB | Continuous | Unpaid Principal Balance. |
LTV | Continuous | Original Loan-To-Value. |
OIR | Continuous | Original Interest Rate. |
Number of Borrowers | Categorical | The number of borrower(s) who are obligated to repay the mortgage note secured by the mortgaged property. 1 = one borrower; 2 = more than one borrower. |
Seller | Categorical | The entity acting in its capacity as a seller of mortgages to Freddie Mac at the time of acquisition. |
Servicer | Categorical | The entity acting in its capacity as the servicer of mortgages to Freddie Mac as of the last period for which loan activity is reported in the Dataset. |
First Time Homebuyer | Categorical | Y = Yes; N = No. |
Number of Units | Categorical | Denotes whether the mortgage is a one-, two-, three-, or four-unit property. |
Occupancy Status | Categorical | O = Owner Occupied; I = Investment Property; S = Second Home; Space = Unknown. |
Channel | Categorical | R = Retail; B = Broker; C = Correspondent; T = TPO Not Specified; Space = Unknown. |
PPM | Categorical | Denotes whether the mortgage is a Prepayment Penalty Mortgage. Y = PPM; N = Not PPM. |
Property Type | Categorical | CO = Condo; LH = Leasehold; PU = PUD; MH = Manufactured Housing; SF = 1-4 Fee Simple; CP = Co-op; Space = Unknown. |
Loan Purpose | Categorical | P = Purchase; C = Cash-out Refinance; N = No Cash-out Refinance; Space = Unknown. |
Training set year | 2000 | 2001 | |
Default collection year | 2001 2002 | 2002 2003 | |
Testing set year | 2003 | 2004 |
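The layout in the table above (train on one origination year, collect defaults over the next two years, test on the year after that) can be written down as a small helper. This is a sketch only: the function name `out_of_time_split` and the `collection_years` parameter are hypothetical, and the rule is inferred from the two columns shown, not taken from the authors' code.

```python
def out_of_time_split(train_year, collection_years=2):
    """Return (training year, default-collection years, testing year).

    Assumes defaults are observed over the `collection_years` years that
    follow the training year, and that testing uses the first year after
    that window (an assumption read off the table, not a stated rule).
    """
    collection = tuple(train_year + k for k in range(1, collection_years + 1))
    test_year = collection[-1] + 1
    return train_year, collection, test_year


if __name__ == "__main__":
    for year in (2000, 2001):
        print(out_of_time_split(year))
```

With the default window of two years, training year 2000 maps to collection years 2001 and 2002 and testing year 2003, matching the first column of the table.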
Time | With Relabelling | Without Relabelling | ||||||
AUC | DeLong | Bootstrap | Stratified | AUC | DeLong | Bootstrap | Stratified | |
2003 Q1 | 0.879 | 0.033 | 0.035 | 0.032 | 0.873 | 0.032 | 0.028 | 0.033 |
2003 Q2 | 0.880 | 0.025 | 0.024 | 0.024 | 0.878 | 0.026 | 0.026 | 0.025 |
2003 Q3 | 0.839 | 0.035 | 0.033 | 0.031 | 0.824 | 0.039 | 0.037 | 0.038 |
2003 Q4 | 0.872 | 0.025 | 0.025 | 0.025 | 0.872 | 0.026 | 0.028 | 0.026 |
2004 Q1 | 0.808 | 0.042 | 0.041 | 0.041 | 0.804 | 0.043 | 0.041 | 0.039 |
2004 Q2 | 0.804 | 0.053 | 0.056 | 0.053 | 0.795 | 0.052 | 0.046 | 0.050 |
2004 Q3 | 0.636 | 0.067 | 0.063 | 0.067 | 0.634 | 0.075 | 0.067 | 0.073 |
2004 Q4 | 0.806 | 0.046 | 0.045 | 0.046 | 0.796 | 0.054 | 0.056 | 0.051 |
2005 Q1 | 0.865 | 0.025 | 0.024 | 0.027 | 0.805 | 0.042 | 0.045 | 0.043 |
2005 Q2 | 0.841 | 0.026 | 0.025 | 0.026 | 0.758 | 0.038 | 0.037 | 0.036 |
2005 Q3 | 0.849 | 0.021 | 0.020 | 0.022 | 0.799 | 0.033 | 0.032 | 0.033 |
2005 Q4 | 0.814 | 0.022 | 0.022 | 0.021 | 0.776 | 0.027 | 0.028 | 0.029 |
2006 Q1 | 0.817 | 0.017 | 0.016 | 0.016 | 0.797 | 0.020 | 0.021 | 0.019 |
2006 Q2 | 0.803 | 0.015 | 0.016 | 0.016 | 0.795 | 0.017 | 0.017 | 0.017 |
2006 Q3 | 0.789 | 0.016 | 0.015 | 0.015 | 0.776 | 0.018 | 0.018 | 0.018 |
2006 Q4 | 0.776 | 0.012 | 0.012 | 0.012 | 0.769 | 0.013 | 0.013 | 0.013 |
2007 Q1 | 0.697 | 0.013 | 0.013 | 0.014 | 0.713 | 0.013 | 0.012 | 0.012 |
2007 Q2 | 0.704 | 0.010 | 0.010 | 0.010 | 0.720 | 0.009 | 0.009 | 0.009 |
2007 Q3 | 0.725 | 0.008 | 0.008 | 0.008 | 0.727 | 0.008 | 0.008 | 0.008 |
2007 Q4 | 0.720 | 0.006 | 0.006 | 0.007 | 0.738 | 0.006 | 0.006 | 0.005 |
2008 Q1 | 0.837 | 0.004 | 0.004 | 0.004 | 0.838 | 0.004 | 0.005 | 0.005 |
2008 Q2 | 0.832 | 0.005 | 0.005 | 0.005 | 0.833 | 0.005 | 0.006 | 0.005 |
2008 Q3 | 0.830 | 0.006 | 0.006 | 0.007 | 0.831 | 0.006 | 0.006 | 0.007 |
2008 Q4 | 0.857 | 0.008 | 0.008 | 0.008 | 0.856 | 0.008 | 0.008 | 0.008 |
2009 Q1 | 0.804 | 0.024 | 0.023 | 0.022 | 0.805 | 0.024 | 0.023 | 0.023 |
2009 Q2 | 0.811 | 0.018 | 0.019 | 0.017 | 0.807 | 0.018 | 0.017 | 0.018 |
2009 Q3 | 0.757 | 0.013 | 0.013 | 0.013 | 0.758 | 0.013 | 0.012 | 0.013 |
2009 Q4 | 0.738 | 0.023 | 0.025 | 0.022 | 0.742 | 0.023 | 0.022 | 0.023 |
2010 Q1 | 0.825 | 0.033 | 0.034 | 0.032 | 0.829 | 0.032 | 0.029 | 0.031 |
2010 Q2 | 0.793 | 0.038 | 0.039 | 0.037 | 0.798 | 0.037 | 0.034 | 0.039 |
2010 Q3 | 0.826 | 0.034 | 0.031 | 0.034 | 0.830 | 0.033 | 0.029 | 0.033 |
2010 Q4 | 0.769 | 0.036 | 0.038 | 0.034 | 0.779 | 0.037 | 0.035 | 0.037 |
2011 Q1 | 0.789 | 0.039 | 0.037 | 0.035 | 0.780 | 0.039 | 0.043 | 0.039 |
2011 Q2 | 0.780 | 0.042 | 0.041 | 0.039 | 0.773 | 0.043 | 0.041 | 0.042 |
2011 Q3 | 0.740 | 0.048 | 0.048 | 0.044 | 0.733 | 0.049 | 0.048 | 0.046 |
2011 Q4 | 0.782 | 0.050 | 0.043 | 0.047 | 0.783 | 0.049 | 0.050 | 0.046 |
2012 Q1 | 0.861 | 0.034 | 0.032 | 0.033 | 0.868 | 0.031 | 0.031 | 0.031 |
2012 Q2 | 0.776 | 0.043 | 0.045 | 0.038 | 0.778 | 0.042 | 0.046 | 0.039 |
2012 Q3 | 0.771 | 0.045 | 0.043 | 0.045 | 0.784 | 0.045 | 0.046 | 0.043 |
2012 Q4 | 0.771 | 0.038 | 0.036 | 0.034 | 0.766 | 0.039 | 0.038 | 0.040 |
2013 Q1 | 0.769 | 0.039 | 0.037 | 0.039 | 0.772 | 0.040 | 0.039 | 0.041 |
2013 Q2 | 0.738 | 0.029 | 0.028 | 0.029 | 0.739 | 0.030 | 0.028 | 0.026 |
2013 Q3 | 0.730 | 0.040 | 0.039 | 0.041 | 0.735 | 0.042 | 0.043 | 0.041 |
2013 Q4 | 0.754 | 0.033 | 0.031 | 0.032 | 0.750 | 0.033 | 0.032 | 0.032 |
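The table above reports an AUC per quarter together with three columns labelled DeLong, Bootstrap, and Stratified, which appear to be uncertainty estimates for the AUC. As a rough illustration of how such quantities can be computed, the sketch below implements the AUC via the Mann-Whitney statistic, the DeLong variance estimate for a single AUC, and an ordinary versus class-stratified bootstrap. It is an assumption that these correspond to the paper's columns; the function names, the number of bootstrap replicates, and the simulated scores are illustrative and not the authors' data or code.

```python
import numpy as np


def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    avg_rank = np.cumsum(counts) - (counts - 1) / 2.0   # 1-based average ranks
    ranks = avg_rank[inv]
    n1 = int(labels.sum())
    n0 = len(labels) - n1
    u = ranks[labels == 1].sum() - n1 * (n1 + 1) / 2.0
    return float(u / (n1 * n0))


def delong_se(scores, labels):
    """DeLong standard error of a single AUC (O(n1 * n0) memory)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = scores[labels == 1], scores[labels == 0]
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    v10 = psi.mean(axis=1)   # one structural component per positive case
    v01 = psi.mean(axis=0)   # one structural component per negative case
    return float(np.sqrt(v10.var(ddof=1) / len(pos) + v01.var(ddof=1) / len(neg)))


def bootstrap_se(scores, labels, n_boot=500, stratified=False, seed=0):
    """Bootstrap standard error of the AUC, optionally resampling within class."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    rng = np.random.default_rng(seed)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    stats = []
    for _ in range(n_boot):
        if stratified:
            # Keep the class sizes fixed in every replicate.
            idx = np.concatenate([rng.choice(pos, size=len(pos)),
                                  rng.choice(neg, size=len(neg))])
        else:
            idx = rng.integers(0, len(labels), size=len(labels))
            if labels[idx].min() == labels[idx].max():
                continue     # replicate contains one class only; AUC undefined
        stats.append(auc(scores[idx], labels[idx]))
    return float(np.std(stats, ddof=1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.concatenate([np.ones(60, dtype=int), np.zeros(1940, dtype=int)])
    s = np.concatenate([rng.normal(1.5, 1.0, 60), rng.normal(0.0, 1.0, 1940)])
    print("AUC        ", round(auc(s, y), 3))
    print("DeLong     ", round(delong_se(s, y), 3))
    print("Bootstrap  ", round(bootstrap_se(s, y), 3))
    print("Stratified ", round(bootstrap_se(s, y, stratified=True), 3))
```

The stratified variant keeps the number of minority cases fixed in each replicate, which avoids replicates with very few or no defaults when the default rate is low.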
train year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 |
test year | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 |
without relabelling | 0.435 | 0.885 | 0.980 | 1.000 | 1.000 | 0.800 |
with relabelling | 0.420 | 0.679 | 0.398 | 0.842 | 0.855 | 0.289 |
train year | 2006 | 2007 | 2008 | 2009 | 2010 | |
test year | 2009 | 2010 | 2011 | 2012 | 2013 | |
without relabelling | 0.993 | 0.900 | 0.985 | 0.890 | 0.990 | |
with relabelling | 0.930 | 0.930 | 0.983 | 0.827 | 0.795 |
Train year | Training Default rate | Test year | Test Default rate | AUC difference |
2000 | 0.41% | 2003 | 0.06% | 0.0057 |
2001 | 0.20% | 2004 | 0.07% | 0.0063 |
2002 | 0.10% | 2005 | 0.18% | 0.0578 |
2003 | 0.06% | 2006 | 0.89% | 0.0119 |
2004 | 0.07% | 2007 | 4.26% | -0.0133 |
2005 | 0.18% | 2008 | 3.15% | -0.0005 |
2006 | 0.89% | 2009 | 0.30% | -0.0003 |
2007 | 4.26% | 2010 | 0.09% | -0.0055 |
2008 | 3.15% | 2011 | 0.08% | 0.0055 |
2009 | 0.30% | 2012 | 0.06% | -0.0041 |
2010 | 0.09% | 2013 | 0.10% | -0.0011 |