# American Institute of Mathematical Sciences

doi: 10.3934/jimo.2020128

## Two penalized mixed-integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models

Faculty of Mathematics, Statistics and Computer Science, Semnan University, P.O. Box 35195-363, Semnan, Iran

* Corresponding author: Mahdi Roozbeh

Received: September 2019. Revised: May 2020. Published: August 2020.

In classical regression analysis, ordinary least-squares estimation is the best strategy when the essential assumptions are met: normality and independence of the error terms, as well as negligible multicollinearity among the covariates. If any of these assumptions is violated, the results may be misleading. In particular, outliers violate the assumption of normally distributed residuals in least-squares regression. In this situation, robust estimators are widely used because of their insensitivity to outlying data points. Multicollinearity is another common problem in multiple regression models, with adverse effects on the least-squares estimators. It is therefore important to use estimation methods designed to tackle these problems. Robust regressions are among the popular methods for analyzing data contaminated with outliers. Along these lines, we suggest two mixed-integer nonlinear optimization models whose solutions can serve as appropriate estimators when outliers and multicollinearity appear simultaneously in the data set. The models, which can be solved effectively by metaheuristic algorithms, are designed based on penalization schemes with the ability to down-weight or ignore unusual data and multicollinearity effects. We establish that our models are computationally advantageous from the perspective of the flop count. We also deal with a robust ridge methodology. Finally, three real data sets are analyzed to examine the performance of the proposed methods.
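The core idea of the abstract can be illustrated with a small sketch: binary indicators $z_i$ select which observations enter a ridge-penalized least-squares fit, and the continuous coefficients and the integer subset are optimized alternately. This is only an illustrative heuristic under our own assumptions (the names `fit_trimmed_ridge`, `h`, and `k` are ours), not the authors' exact mixed-integer formulation or their metaheuristic solver:

```python
import numpy as np

def trimmed_ridge_objective(X, y, beta, z, k):
    """Penalized objective: squared errors of the kept observations
    (z_i = 1) plus a ridge penalty k * ||beta||^2 on the coefficients."""
    r = y - X @ beta
    return float(z @ (r ** 2) + k * beta @ beta)

def fit_trimmed_ridge(X, y, h, k=0.1, iters=200, seed=0):
    """Crude alternating heuristic: given a kept subset z, solve the
    ridge problem in closed form; given beta, keep the h observations
    with the smallest squared residuals. Repeat until the subset is
    stable (in the spirit of trimmed-squares "concentration" steps)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    z = np.zeros(n)
    z[rng.choice(n, h, replace=False)] = 1.0  # random initial subset
    beta = np.zeros(p)
    for _ in range(iters):
        idx = z.astype(bool)
        Xs, ys = X[idx], y[idx]
        # Closed-form ridge estimate on the kept observations
        beta = np.linalg.solve(Xs.T @ Xs + k * np.eye(p), Xs.T @ ys)
        # Re-select the h best-fitting observations
        r2 = (y - X @ beta) ** 2
        z_new = np.zeros(n)
        z_new[np.argsort(r2)[:h]] = 1.0
        if np.array_equal(z_new, z):
            break
        z = z_new
    return beta, z
```

A random-restart or population-based metaheuristic would simply run many such local searches from different initial subsets and keep the best objective value.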

Citation: Mahdi Roozbeh, Saman Babaie-Kafaki, Zohre Aminifard. Two penalized mixed-integer nonlinear programming approaches to tackle multicollinearity and outliers effects in linear regression models. Journal of Industrial & Management Optimization, doi: 10.3934/jimo.2020128
##### Figures:

- The diagnostic plots of the model (18)
- The diagram of ${\rm GCV}(k,z)$ versus the ridge parameter for the bridge projects data set
- The diagnostic plots for the model (20)
- The diagram of ${\rm GCV}(k,z)$ versus the ridge parameter for the electricity data
- The diagnostic plots for the model (21)
- The diagram of ${\rm GCV}(k,z)$ versus the ridge parameter for the CPS data
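The GCV curves referenced in the figures come from the standard generalized cross-validation criterion for choosing a ridge parameter; a minimal sketch of the plain-ridge version is below (function names and the grid are ours; the paper's criterion additionally depends on the trimming variable $z$):

```python
import numpy as np

def gcv(X, y, k):
    """Generalized cross-validation score for ridge parameter k:
    GCV(k) = n * ||(I - H_k) y||^2 / (n - tr(H_k))^2,
    where H_k = X (X'X + k I)^{-1} X' is the ridge hat matrix."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X + k * np.eye(p), X.T)
    r = y - H @ y
    return n * float(r @ r) / (n - np.trace(H)) ** 2

def best_ridge_k(X, y, grid):
    """Return the grid value with the smallest GCV score; plotting
    gcv over the grid reproduces a curve like those in the figures."""
    return min(grid, key=lambda k: gcv(X, y, k))
```

Since $\operatorname{tr}(H_k) \le p < n$, the denominator is always positive, so the score is well defined for any $k > 0$.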
Evaluation of the proposed estimators for the bridge projects data set
| Coefficients | OLS | RLTS | MLTSCM | UBDMLTSCM1 | UBDMLTSCM2 | LSVR | NSVR | NNR |
|---|---|---|---|---|---|---|---|---|
| $Intercept$ | 2.3317 | 1.91363 | 2.0304 | 1.8278 | 1.9140 | -0.0125 | - | -7.8431 |
| $\log(CCost)$ | 0.1483 | 0.33718 | 0.3056 | 0.2923 | 0.2360 | 0.4152 | - | 0.4236 |
| $\log(Dwgs)$ | 0.8356 | 0.58002 | 0.6210 | 0.7829 | 0.8914 | 0.3933 | - | 2.8061 |
| $\log(Spans)$ | 0.1963 | 0.06662 | 0.0657 | 0.0241 | 0.0467 | 0.1176 | - | 0.5110 |
| ${\rm SSE}$ | 3.8692 | 1.9788 | 1.9778 | 1.0577 | 1.1504 | 4.0131 | 2.7834 | 1.7108 |
| ${\rm R}^2$ | 0.7747 | 0.8579 | 0.8600 | 0.9147 | 0.9020 | 0.7663 | 0.8379 | 0.9004 |
The most effective subgroup of predictor variables based on the ${\rm R}^2_{adj}$ and AIC criteria for the electricity data set
| Subset size | Predictor variables | ${\rm R}^2_{adj}$ | AIC |
|---|---|---|---|
| 1 | $Temp$ | 0.5523 | -1067.814 |
| 2 | $Temp,LREG$ | 0.5781 | -1077.339 |
| 3 | ${\bf Temp,LREG,LI}$ | 0.5892 | -1081.063 |
| 4 | $Temp,LREG,LI,x_{9}$ | 0.5891 | -1080.057 |
| 5 | $Temp,LREG,LI,x_{9},x_{10}$ | 0.5882 | -1078.709 |
| 6 | $Temp,LREG,LI,x_{9},x_{10},x_{11}$ | 0.5875 | -1077.427 |
| 7 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1}$ | 0.5858 | -1075.734 |
| 8 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3}$ | 0.5837 | -1073.897 |
| 9 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5}$ | 0.5812 | -1071.907 |
| 10 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4}$ | 0.5789 | -1069.987 |
| 11 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7}$ | 0.5764 | -1067.997 |
| 12 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7},x_{2}$ | 0.5740 | -1064.098 |
| 13 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7},x_{2},x_{6}$ | 0.5718 | -1064.281 |
| 14 | $Temp,LREG,LI,x_{9},x_{10},x_{11},x_{1},x_{3},x_{5},x_{4},x_{7},x_{2},x_{6},x_{8}$ | 0.5709 | -1063.014 |
Evaluation of the proposed estimators for the electricity data set
| Coefficients | OLS | RLTS | MLTSCM | UBDMLTSCM1 | UBDMLTSCM2 | LSVR | NSVR | NNR |
|---|---|---|---|---|---|---|---|---|
| $Intercept$ | 4.4069 | 5.1693 | 4.9881 | 5.2039 | 4.0907 | 0.0881 | - | 2.6215 |
| $LI$ | 0.1925 | 0.0989 | 0.1146 | 0.0956 | 0.2225 | 0.1545 | - | 1.2806 |
| $LREG$ | -0.0778 | -0.0939 | -0.1054 | -0.0956 | -0.0940 | -0.1322 | - | -3.7418 |
| $Temp$ | -0.0002 | -0.0002 | -0.0003 | -0.0003 | -0.0003 | -0.7508 | - | -0.8067 |
| ${\rm SSE}$ | 0.3765 | 0.2637 | 0.1982 | 0.1296 | 0.1413 | 0.3881 | 0.2629 | 0.4240 |
| ${\rm R}^2$ | 0.5962 | 0.6742 | 0.7399 | 0.7559 | 0.7468 | 0.5838 | 0.7181 | 0.5452 |
Evaluation of the proposed estimators for the CPS data
| Coefficients | OLS | RLTS | MLTSCM | UBDMLTSCM1 | UBDMLTSCM2 | LSVR | NSVR | NNR |
|---|---|---|---|---|---|---|---|---|
| $Intercept$ | 1.0786 | 0.7498 | 1.1963 | 0.9257 | 0.9038 | 0.0054 | - | -5.5913 |
| $education$ | 0.1794 | 0.1482 | 0.2576 | 0.2018 | 0.1974 | 0.4997 | - | 0.6978 |
| $south$ | -0.1024 | -0.1208 | -0.1109 | -0.1174 | -0.0916 | -0.1141 | - | -0.4331 |
| $sex$ | -0.2220 | -0.2851 | -0.2776 | -0.2665 | -0.2416 | -0.2638 | - | -0.9731 |
| $experience$ | 0.0958 | 0.0613 | 0.1630 | 0.1090 | 0.1011 | 0.2573 | - | 0.2991 |
| $union$ | 0.2005 | 0.1939 | 0.1987 | 0.1427 | 0.1791 | 0.1511 | - | 1.0483 |
| $age$ | -0.0854 | -0.0473 | -0.1510 | -0.0960 | -0.0888 | 0.0420 | - | -0.2590 |
| $race$ | 0.0504 | 0.0674 | 0.0482 | 0.0749 | 0.0515 | 0.0930 | - | 0.2437 |
| $occupation$ | -0.0074 | -0.0122 | 0.0072 | -0.0126 | -0.0140 | -0.0526 | - | 0.0004 |
| $sector$ | 0.0915 | 0.0614 | 0.0411 | 0.0965 | 0.0810 | 0.0918 | - | 0.3258 |
| $married$ | 0.0766 | 0.0590 | 0.1937 | 0.0924 | 0.1216 | 0.0524 | - | 0.4156 |
| ${\rm SSE}$ | 101.17 | 76.3827 | 50.5810 | 49.8101 | 49.2827 | 102.5847 | 79.0911 | 84.2234 |
| ${\rm R}^2$ | 0.3185 | 0.4049 | 0.4146 | 0.4123 | 0.4279 | 0.3089 | 0.4672 | 0.4326 |
