Article Contents
Article Contents

# A difference of convex optimization algorithm for piecewise linear regression

• * Corresponding author: Adil Bagirov
The first and second authors are supported by Australian Research Council's Discovery Projects funding scheme (Project No. DP140103213).
• The problem of finding a continuous piecewise linear function approximating a regression function is considered. This problem is formulated as a nonconvex nonsmooth optimization problem where the objective function is represented as a difference of convex (DC) functions. Subdifferentials of DC components are computed and an algorithm is designed based on these subdifferentials to find piecewise linear functions. The algorithm is tested using some synthetic and real world data sets and compared with other regression algorithms.

Mathematics Subject Classification: 65K05, 90C90.

 Citation:

• Figure 1.  Illustration of the PWLREG algorithm: (a) (1, 1); (b) (2, 1); (c) (2, 2); (d) (3, 2); (e) (4, 2); (f) (5, 2)

Figure 2.  1 input variable and 500 instances; (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

Figure 3.  1 input variable and 5000 instances; (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

Figure 4.  (a) 2 input variables and 5000 instances (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

Figure 5.  (a) 2 input variables and 5000 instances (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

Figure 6.  (a) CPU time vs the number of input variables; (b) CPU time vs the number of instances

Table 1.  RMSE results and CPU time for PWLREG on synthetic data sets

 $m$ $n$ Train Test CPU $m$ $n$ Train Test CPU $5\cdot 10^2$ 1 1.089 1.084 0.09 $3\cdot 10^3$ 1 1.138 1.113 0.37 $5\cdot 10^2$ 2 0.890 0.856 0.14 $3\cdot 10^3$ 2 0.596 0.587 5.90 $5\cdot 10^2$ 3 0.982 1.146 0.87 $3\cdot 10^3$ 3 1.063 1.101 3.37 $5\cdot 10^2$ 4 1.143 1.395 0.59 $3\cdot 10^3$ 4 1.168 1.289 4.32 $5\cdot 10^2$ 7 1.307 1.409 0.39 $3\cdot 10^3$ 7 1.310 1.353 2.12 $5\cdot 10^2$ 10 1.311 1.406 0.28 $3\cdot 10^3$ 10 1.314 1.355 1.64 $1\cdot 10^3$ 1 1.061 1.101 0.14 $5\cdot 10^3$ 1 1.120 1.118 0.62 $1\cdot 10^3$ 2 0.665 0.725 1.70 $5\cdot 10^3$ 2 0.706 0.692 9.10 $1\cdot 10^3$ 3 1.003 1.077 1.22 $5\cdot 10^3$ 3 1.067 1.038 6.96 $1\cdot 10^3$ 4 1.312 1.225 0.59 $5\cdot 10^3$ 4 1.263 1.271 7.07 $1\cdot 10^3$ 7 1.368 1.257 0.59 $5\cdot 10^3$ 7 1.337 1.323 2.56 $1\cdot 10^3$ 10 1.363 1.263 0.92 $5\cdot 10^3$ 10 1.337 1.322 4.10

Table 2.  Results for PWLREG algorithm with the different levels of noise

 (1, 1) (2, 1) (2, 2) (3, 2) (4, 2) (5, 2) $\sigma=0$ RMSE 1.094 1.094 0.063 0.058 0.055 0.055 MAE 0.842 0.842 0.016 0.016 0.015 0.015 MSE 1.198 1.198 0.004 0.003 0.003 0.003 CE 0.013 0.013 0.997 0.997 0.998 0.998 r 0.114 0.114 0.998 0.999 0.999 0.999 $\sigma=0.5$ RMSE 1.221 1.221 0.503 0.503 0.503 0.503 MAE 0.937 0.937 0.404 0.404 0.404 0.404 MSE 1.491 1.492 0.254 0.254 0.254 0.254 CE 0.011 0.011 0.832 0.832 0.832 0.832 r 0.105 0.105 0.912 0.912 0.912 0.912 $\sigma=1$ RMSE 1.498 1.498 1.242 1.000 1.000 1.000 MAE 1.169 1.169 0.993 0.801 0.801 0.801 MSE 2.246 2.247 1.546 1.003 1.004 1.005 CE 0.008 0.008 0.318 0.558 0.558 0.558 r 0.087 0.087 0.564 0.747 0.747 0.747

Table 3.  The brief description of real world data sets

 Data set $m$ $n$ 1. Airfoil Self-noise 1503 5 2. Red Wine Quality 1599 11 3. White Wine Quality 4898 11 4. Combined Cycle Power Plant 9568 4 5. Physicochemical Properties 45730 9 of Protein Tertiary Structure

Table 4.  Prediction performances of algorithms for real world data sets

 PWLREG MLR SVM(Lin) SVM(RBF) MARS ANN Airfoil Self-noise RMSE 4.445 5.024 5.037 4.710 5.672 5.499 MAE 3.404 3.905 3.877 3.327 4.614 4.342 MSE 24.641 25.750 25.885 22.941 32.822 31.059 CE 0.653 0.558 0.556 0.611 0.437 0.471 r 0.875 0.761 0.760 0.825 0.704 0.709 Red Wine Quality RMSE 0.639 0.643 0.657 0.735 0.658 0.636 MAE 0.489 0.490 0.489 0.566 0.503 0.482 MSE 0.480 0.430 0.450 0.569 0.450 0.423 CE 0.319 0.312 0.282 0.101 0.280 0.326 r 0.582 0.570 0.537 0.394 0.555 0.585 White Wine Quality RMSE 0.637 0.690 0.717 0.675 0.713 0.693 MAE 0.508 0.532 0.551 0.522 0.527 0.545 MSE 0.449 0.482 0.521 0.463 0.515 0.487 CE 0.330 0.216 0.153 0.249 0.164 0.209 r 0.616 0.494 0.467 0.521 0.537 0.520 Combined Cycle Power Plant RMSE 4.163 4.484 4.503 3.907 4.187 4.283 MAE 3.276 3.572 3.559 2.920 3.295 3.396 MSE 17.702 20.159 20.341 15.377 17.577 18.411 CE 0.942 0.933 0.932 0.949 0.942 0.939 r 0.970 0.966 0.966 0.975 0.971 0.969 Physicochemical Properties Protein RMSE 4.802 5.203 5.302 4.321 5.044 5.783 MAE 3.840 4.374 4.240 3.018 4.146 4.987 MSE 23.061 27.101 28.145 18.699 25.470 33.487 CE 0.390 0.285 0.257 0.507 0.328 0.116 r 0.625 0.534 0.528 0.719 0.573 0.341
•  [1] L. T. H. An and P. D. Tao, The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems, Annals of Operations Research, 133 (2005), 23-46.  doi: 10.1007/s10479-004-5022-1. [2] K. Bache and M. Lichman, UCI Machine Learning Repository, University of California, Irvine, School of Information and Computer Sciences, http://archive.ics.uci.edu/ml, 2013. [3] A. M. Bagirov, C. Clausen and M. Kohler, Estimation of a regression function by maxima of minima of linear functions, IEEE Transactions on Information Theory, 55 (2009), 833-845.  doi: 10.1109/TIT.2008.2009835. [4] A. M. Bagirov, C. Clausen and M. Kohler, An algorithm for the estimation of a regression function by continuous piecewise linear functions, Computational Optimization and Applications, 45 (2010), 159-179.  doi: 10.1007/s10589-008-9174-9. [5] A. M. Bagirov, C. Clausen and M. Kohler, An l2-boosting algorithm for estimation of a regression function, IEEE Transactions on Information Theory, 56 (2010), 1417-1429.  doi: 10.1109/TIT.2009.2039161. [6] A. M. Bagirov, B. Karasözen and M. Sezer, Discrete gradient method: A derivative-free method for nonsmooth optimization, Journal of Optimization Theory and Applications, 137 (2008), 317-334.  doi: 10.1007/s10957-007-9335-5. [7] A. M. Bagirov, N. Karmitsa and M. Mäkelä, Introduction to Nonsmooth Optimization, Cham, Springer, 2014. doi: 10.1007/978-3-319-08114-4. [8] A. M. Bagirov, S. Taheri and J. Ugon, Nonsmooth DC programming approach to the minimum sum-of-squares clustering problems, Pattern Recognition, 53 (2016), 12-24.  doi: 10.1016/j.patcog.2015.11.011. [9] S. G. Bartels, L. Kuntz and S. Scholtes, Continuous selections of linear functions and nonsmooth critical point theory, Nonlinear Analysis: Theory, Methods & Applications, 24 (1995), 385-407.  doi: 10.1016/0362-546X(95)91645-6. [10] L. Breiman, J. H. Friedman, C. J. Stone and R. A. Olshen, Classification and Regression Trees, CRC press, 1984. [11] R. Collobert and S. Bengio, SVMTorch: Support vector machines for large-scale regression problems, Journal of Machine Learning Research, 1 (2001), 143-160.  doi: 10.1162/15324430152733142. [12] V. F. Demyanov and A. M. Rubinov, Constructive Nonsmooth Analysis, Springer, 1995. [13] J. H. Fridedman, Multivariate adaptive regression splines (with discussion), Annals of Statistics, 19 (1991), 79-141.  doi: 10.1214/aos/1176347963. [14] J. Friedman, T. Hastie and R. Tibshirani, The Elements of Statistical Learning, Springer, Berlin, 2001. doi: 10.1007/978-0-387-21606-5. [15] V. V. Gorokhovik, O. I. Zorko and G. Birkhoff, Piecewise affine functions and polyhedral sets, Optimization, 31 (1994), 209-221.  doi: 10.1080/02331939408844018. [16] L. Györfi, M. Kohler, A. Krzyźak and H. Walk, A Distribution-Free Theory of Nonparametric Regression, Springer Series in Statistics. Springer, Heldelberg, 2002. doi: 10.1007/b97848. [17] R. Horst and N. V. Thoai, DC programming: Overview, Journal of Optimization Theory and Applications, 103 (1999), 1-43.  doi: 10.1023/A:1021765131316. [18] K. Joki, A. M. Bagirov, N. Karmitsa and M. M. Mäkelä, A proximal bundle method for nonsmooth DC optimization utilizing nonconvex cutting planes, Journal of Global Optimization, 68 (2017), 501-535.  doi: 10.1007/s10898-016-0488-3. [19] D. Meyer, E. Dimitriadou, K. Hornik, A. Weingessel, F. Leisch, C. C. Chang and C. C. Lin, Misc functions of the department of statistics, probability theory group, TU Wien, Package "e1071", http://cran.r-project.org/web/packages/e1071/e1071.pdf, 2017. [20] S. Milborrow, Multivariate adaptive regression splines, Package "earth", http://cran.r-project.org/web/packages/earth/earth.pdf, 2017. [21] J. Nash and J. Sutcliffe, River flow forecasting through conceptual models part I-A discussion of principles, Journal of Hydrology, 10 (1970), 282-290.  doi: 10.1016/0022-1694(70)90255-6. [22] B. Ripley and W. Venables, Feed-forward neural networks and multinomial log-linear models, Package "nnet", http://cran.r-project.org/web/packages/nnet/nnet.pdf, 2016. [23] A. J. Smola and B. Schölkopf, A tutorial on support vector regression, Statistics and Computing, 14 (2004), 199-222.  doi: 10.1023/B:STCO.0000035301.49549.88. [24] P. D. Tao and L. T. H. An, Convex analysis approach to DC programming: Theory, algorithms and applications, Acta Mathematica Vietnamica, 22 (1997), 289-355. [25] H. Tuy, Convex Analysis and Global Optimization, Nonconvex Optimization and Its Applications, Vol. 22. Kluwer, Dordrecht, 1998. doi: 10.1007/978-1-4757-2809-5. [26] P. H. Wolfe, Finding the nearest point in a polytope, Mathematical Programming, 11 (1976), 128-149.  doi: 10.1007/BF01580381.

Figures(6)

Tables(4)