Prediction intervals are an important guide for financial organizations when pricing loan rates. In this paper, we combined four regression models with the bootstrap technique to calculate prediction intervals. Two datasets were used for the study, and $5$-fold cross-validation was used to estimate performance. The classical and Huber regression models perform similarly, and both produce narrow intervals. Although the RANSAC model attains a slightly higher coverage rate, it produces the widest intervals. When coverage rates are comparable, the model with the narrower interval is preferred. We therefore recommend the classical and Huber regression models, combined with the bootstrap method, for calculating prediction intervals.
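As an illustration of the procedure described above, the following is a minimal sketch of a percentile bootstrap prediction interval at a single new point, using scikit-learn's HuberRegressor as the fitted model. The function name, the residual-resampling scheme, and the defaults are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

def bootstrap_prediction_interval(X, y, x_new, R=3000, alpha=0.05, seed=0):
    """Percentile bootstrap prediction interval at a single point x_new (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    preds = np.empty(R)
    for r in range(R):
        idx = rng.integers(0, n, size=n)              # resample rows with replacement
        model = HuberRegressor().fit(X[idx], y[idx])  # refit on the bootstrap sample
        resid = y[idx] - model.predict(X[idx])        # residuals of this bootstrap fit
        # A point prediction plus a resampled residual mimics a future observation.
        preds[r] = model.predict(x_new.reshape(1, -1))[0] + rng.choice(resid)
    lo, hi = np.percentile(preds, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi
```

Swapping HuberRegressor for LinearRegression, RANSACRegressor, or TheilSenRegressor yields the other three intervals; the interval width is hi - lo, and R plays the role of the bootstrap replication counts in Tables 2 and 3.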
Table 1. Default parameters for the robust models

| Model | Parameters |
| --- | --- |
| Huber | epsilon=1.35, max_iter=100, alpha=0.0001, tol=0.00001 |
| RANSAC | max_trials=100, stop_probability=0.99, loss='absolute_error' |
| Theil Sen | max_subpopulation=10000, max_iter=300, tol=0.001 |
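The parameter names in Table 1 follow scikit-learn's estimators. As a sketch, the four models could be instantiated as below; using LinearRegression as the classical baseline is our assumption about the setup.

```python
from sklearn.linear_model import (LinearRegression, HuberRegressor,
                                  RANSACRegressor, TheilSenRegressor)

# The four models, with the Table 1 parameters spelled out explicitly.
models = {
    "Classical": LinearRegression(),  # assumed baseline for "classical regression"
    "Huber": HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001, tol=0.00001),
    "RANSAC": RANSACRegressor(max_trials=100, stop_probability=0.99,
                              loss="absolute_error"),
    "Theil Sen": TheilSenRegressor(max_subpopulation=10000, max_iter=300, tol=0.001),
}
```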
Table 3. Running time for the two datasets

| R value | Model | 1st dataset (seconds) | 2nd dataset (seconds) |
| --- | --- | --- | --- |
| R=3000 | Classical | 103 | 104 |
| R=3000 | Huber | 1176 | 1213 |
| R=3000 | RANSAC | 4027 | 4069 |
| R=3000 | Theil Sen | 2482 | 2720 |
| R=5000 | Classical | 170 | 172 |
| R=5000 | Huber | 1981 | 2009 |
| R=5000 | RANSAC | 6715 | 6779 |
| R=5000 | Theil Sen | 4187 | 4587 |
| R=7000 | Classical | 238 | 238 |
| R=7000 | Huber | 2737 | 2808 |
| R=7000 | RANSAC | 9491 | 9480 |
| R=7000 | Theil Sen | 5949 | 6507 |
| R=9000 | Classical | 304 | 322 |
| R=9000 | Huber | 3517 | 3668 |
| R=9000 | RANSAC | 12287 | 12289 |
| R=9000 | Theil Sen | 7824 | 8549 |
Table 2. Coverage rate for the two datasets

| R value | Model | Coverage rate (1st dataset) | Coverage rate (2nd dataset) |
| --- | --- | --- | --- |
| R=3000 | Classical | 95.18% | 94.88% |
| R=3000 | Huber | 95.05% | 94.93% |
| R=3000 | RANSAC | 97.89% | 97.91% |
| R=3000 | Theil Sen | 95.05% | 94.93% |
| R=5000 | Classical | 95.14% | 94.84% |
| R=5000 | Huber | 94.91% | 95.01% |
| R=5000 | RANSAC | 97.89% | 97.99% |
| R=5000 | Theil Sen | 95.23% | 94.97% |
| R=7000 | Classical | 95.14% | 94.93% |
| R=7000 | Huber | 94.95% | 94.84% |
| R=7000 | RANSAC | 97.84% | 97.91% |
| R=7000 | Theil Sen | 94.86% | 94.97% |
| R=9000 | Classical | 95.09% | 94.93% |
| R=9000 | Huber | 95.09% | 94.84% |
| R=9000 | RANSAC | 97.94% | 98.08% |
| R=9000 | Theil Sen | 95.18% | 94.88% |
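For clarity, a coverage rate such as those in Table 2 is the fraction of held-out observations whose true values fall inside their prediction intervals. A minimal sketch, where y_test, lower, and upper are illustrative array names:

```python
import numpy as np

def coverage_rate(y_test, lower, upper):
    """Fraction of held-out targets falling inside their prediction intervals."""
    return np.mean((y_test >= lower) & (y_test <= upper))
    # e.g. a value of 0.9518 corresponds to the 95.18% reported in Table 2
```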
Table 4. Tukey test of the mean interval widths for the first dataset

| R value | Comparison | Difference of means | P-value | $95\%$ confidence interval |
| --- | --- | --- | --- | --- |
| R=3000 | Classical vs Huber | -0.0103 | 0.3012 | (-0.0255, 0.0049) |
| R=3000 | RANSAC vs Huber | 0.6364 | 0.0010 | (0.6212, 0.6516) |
| R=3000 | Theil Sen vs Huber | 0.0862 | 0.0010 | (0.0710, 0.1014) |
| R=3000 | RANSAC vs Classical | 0.6467 | 0.0010 | (0.6315, 0.6619) |
| R=3000 | Theil Sen vs Classical | 0.0966 | 0.0010 | (0.0813, 0.1118) |
| R=3000 | Theil Sen vs RANSAC | -0.5501 | 0.0010 | (-0.5653, -0.5349) |
| R=5000 | Classical vs Huber | -0.0099 | 0.3076 | (-0.0246, 0.0048) |
| R=5000 | RANSAC vs Huber | 0.6341 | 0.0010 | (0.6194, 0.6488) |
| R=5000 | Theil Sen vs Huber | 0.0871 | 0.0010 | (0.0724, 0.1018) |
| R=5000 | RANSAC vs Classical | 0.6440 | 0.0010 | (0.6293, 0.6587) |
| R=5000 | Theil Sen vs Classical | 0.0970 | 0.0010 | (0.0823, 0.1117) |
| R=5000 | Theil Sen vs RANSAC | -0.5470 | 0.0010 | (-0.5617, -0.5323) |
| R=7000 | Classical vs Huber | -0.0134 | 0.0866 | (-0.0280, 0.0012) |
| R=7000 | RANSAC vs Huber | 0.6348 | 0.0010 | (0.6201, 0.6494) |
| R=7000 | Theil Sen vs Huber | 0.0833 | 0.0010 | (0.0687, 0.0979) |
| R=7000 | RANSAC vs Classical | 0.6482 | 0.0010 | (0.6335, 0.6628) |
| R=7000 | Theil Sen vs Classical | 0.0967 | 0.0010 | (0.0821, 0.1113) |
| R=7000 | Theil Sen vs RANSAC | -0.5515 | 0.0010 | (-0.5661, -0.5368) |
| R=9000 | Classical vs Huber | -0.0151 | 0.0365 | (-0.0295, -0.0007) |
| R=9000 | RANSAC vs Huber | 0.6352 | 0.0010 | (0.6207, 0.6496) |
| R=9000 | Theil Sen vs Huber | 0.0843 | 0.0010 | (0.0699, 0.0987) |
| R=9000 | RANSAC vs Classical | 0.6503 | 0.0010 | (0.6358, 0.6647) |
| R=9000 | Theil Sen vs Classical | 0.0994 | 0.0010 | (0.0850, 0.1138) |
| R=9000 | Theil Sen vs RANSAC | -0.5509 | 0.0010 | (-0.5653, -0.5364) |
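The pairwise comparisons in Tables 4 and 5 follow the pattern of Tukey's HSD test. A sketch using statsmodels' pairwise_tukeyhsd on stand-in interval widths; the synthetic numbers below are illustrative, not the paper's data.

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
# Stand-in interval widths per model; in the study these come from bootstrap runs.
widths = {
    "Classical": rng.normal(1.00, 0.05, 200),
    "Huber":     rng.normal(1.01, 0.05, 200),
    "RANSAC":    rng.normal(1.64, 0.05, 200),
    "Theil Sen": rng.normal(1.10, 0.05, 200),
}
values = np.concatenate(list(widths.values()))
groups = np.repeat(list(widths.keys()), [len(v) for v in widths.values()])
# The summary reports the difference of means, the p-value, and the 95%
# confidence interval for each pair, matching the columns of Tables 4 and 5.
print(pairwise_tukeyhsd(values, groups, alpha=0.05).summary())
```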
Table 5. Tukey test of the mean interval widths for the second dataset

| R value | Comparison | Difference of means | P-value | $95\%$ confidence interval |
| --- | --- | --- | --- | --- |
| R=3000 | Classical vs Huber | -0.0062 | 0.6138 | (-0.0195, 0.0071) |
| R=3000 | RANSAC vs Huber | 0.5572 | 0.0010 | (0.5439, 0.5704) |
| R=3000 | Theil Sen vs Huber | 0.0485 | 0.0010 | (0.0353, 0.0618) |
| R=3000 | RANSAC vs Classical | 0.5633 | 0.0010 | (0.5501, 0.5766) |
| R=3000 | Theil Sen vs Classical | 0.0547 | 0.0010 | (0.0414, 0.0680) |
| R=3000 | Theil Sen vs RANSAC | -0.5086 | 0.0010 | (-0.5219, -0.4954) |
| R=5000 | Classical vs Huber | -0.0002 | 0.9000 | (-0.0131, 0.0127) |
| R=5000 | RANSAC vs Huber | 0.5600 | 0.0010 | (0.5471, 0.5729) |
| R=5000 | Theil Sen vs Huber | 0.0484 | 0.0010 | (0.0355, 0.0613) |
| R=5000 | RANSAC vs Classical | 0.5602 | 0.0010 | (0.5473, 0.5731) |
| R=5000 | Theil Sen vs Classical | 0.0487 | 0.0010 | (0.0357, 0.0616) |
| R=5000 | Theil Sen vs RANSAC | -0.5116 | 0.0010 | (-0.5245, -0.4986) |
| R=7000 | Classical vs Huber | -0.0017 | 0.9000 | (-0.0144, 0.0111) |
| R=7000 | RANSAC vs Huber | 0.5609 | 0.0010 | (0.5482, 0.5736) |
| R=7000 | Theil Sen vs Huber | 0.0466 | 0.0010 | (0.0338, 0.0593) |
| R=7000 | RANSAC vs Classical | 0.5625 | 0.0010 | (0.5498, 0.5753) |
| R=7000 | Theil Sen vs Classical | 0.0482 | 0.0010 | (0.0355, 0.0609) |
| R=7000 | Theil Sen vs RANSAC | -0.5143 | 0.0010 | (-0.5270, -0.5016) |
| R=9000 | Classical vs Huber | -0.0030 | 0.9000 | (-0.0156, 0.0096) |
| R=9000 | RANSAC vs Huber | 0.5624 | 0.0010 | (0.5498, 0.5750) |
| R=9000 | Theil Sen vs Huber | 0.0455 | 0.0010 | (0.0329, 0.0581) |
| R=9000 | RANSAC vs Classical | 0.5654 | 0.0010 | (0.5528, 0.5780) |
| R=9000 | Theil Sen vs Classical | 0.0485 | 0.0010 | (0.0359, 0.0611) |
| R=9000 | Theil Sen vs RANSAC | -0.5169 | 0.0010 | (-0.5295, -0.5043) |
Figure: Histogram of the noterate.