A difference of convex optimization algorithm for piecewise linear regression

* Corresponding author: Adil Bagirov

The first and second authors are supported by the Australian Research Council's Discovery Projects funding scheme (Project No. DP140103213).
Abstract
The problem of finding a continuous piecewise linear function approximating a regression function is considered. This problem is formulated as a nonconvex nonsmooth optimization problem in which the objective function is represented as a difference of convex (DC) functions. Subdifferentials of the DC components are computed, and an algorithm based on these subdifferentials is designed to find piecewise linear functions. The algorithm is tested on synthetic and real-world data sets and compared with other regression algorithms.
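As a brief sketch of the max-min representation commonly used for continuous piecewise linear estimators of this kind (the function names, the grouping structure, and the particular DC split below are illustrative assumptions, not code from the paper): a continuous piecewise linear function can be written as $f(x) = \max_j \min_{i \in I_j} (a_i^T x + b_i)$, and this form admits an explicit DC decomposition $f = f_1 - f_2$ with both components convex, via the identity $\min_i v_i = \sum_i v_i - \max_k \sum_{i \neq k} v_i$.

```python
import numpy as np

def pwl_maxmin(x, A, b, groups):
    """Evaluate f(x) = max_j min_{i in groups[j]} (A[i] @ x + b[i]),
    a continuous piecewise linear (max-min of affine) function."""
    vals = A @ x + b                        # values of all affine pieces at x
    return max(vals[list(g)].min() for g in groups)

def pwl_dc_split(x, A, b, groups):
    """An explicit DC decomposition f = f1 - f2 of the max-min function,
    using min_i v_i = sum_i v_i - max_k sum_{i != k} v_i.
    Both f1 (a maximum of sums of affine pieces) and f2 (a sum of maxima
    of affine pieces) are convex."""
    vals = A @ x + b
    sums = [vals[list(g)].sum() for g in groups]            # sum per min-block
    loo = [max(s - vals[i] for i in g)                      # leave-one-out max
           for s, g in zip(sums, groups)]
    f2 = sum(loo)
    f1 = max(s + f2 - m for s, m in zip(sums, loo))
    return f1, f2                                           # f1 - f2 == f(x)
```

For example, with pieces $x$ and $-x$ in separate blocks the function is $\max(x, -x) = |x|$, while placing them in one block gives $\min(x, -x) = -|x|$; in both cases `pwl_dc_split` returns convex components whose difference reproduces `pwl_maxmin`.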

    Mathematics Subject Classification: 65K05, 90C90.

  • Figure 1.  Illustration of the PWLREG algorithm: (a) (1, 1); (b) (2, 1); (c) (2, 2); (d) (3, 2); (e) (4, 2); (f) (5, 2)

    Figure 2.  1 input variable and 500 instances; (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

    Figure 3.  1 input variable and 5000 instances; (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

Figure 4.  2 input variables and 5000 instances; (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

Figure 5.  2 input variables and 5000 instances; (a) PWLREG; (b) SVM(Lin); (c) SVM(RBF)

    Figure 6.  (a) CPU time vs the number of input variables; (b) CPU time vs the number of instances

Table 1.  RMSE results (Train/Test) and CPU time for PWLREG on synthetic data sets ($m$: number of instances, $n$: number of input variables)

    $m$ $n$ Train Test CPU   $m$ $n$ Train Test CPU
    $5\cdot 10^2$ 1 1.089 1.084 0.09   $3\cdot 10^3$ 1 1.138 1.113 0.37
    $5\cdot 10^2$ 2 0.890 0.856 0.14   $3\cdot 10^3$ 2 0.596 0.587 5.90
    $5\cdot 10^2$ 3 0.982 1.146 0.87   $3\cdot 10^3$ 3 1.063 1.101 3.37
    $5\cdot 10^2$ 4 1.143 1.395 0.59   $3\cdot 10^3$ 4 1.168 1.289 4.32
    $5\cdot 10^2$ 7 1.307 1.409 0.39   $3\cdot 10^3$ 7 1.310 1.353 2.12
    $5\cdot 10^2$ 10 1.311 1.406 0.28   $3\cdot 10^3$ 10 1.314 1.355 1.64
    $1\cdot 10^3$ 1 1.061 1.101 0.14   $5\cdot 10^3$ 1 1.120 1.118 0.62
    $1\cdot 10^3$ 2 0.665 0.725 1.70   $5\cdot 10^3$ 2 0.706 0.692 9.10
    $1\cdot 10^3$ 3 1.003 1.077 1.22   $5\cdot 10^3$ 3 1.067 1.038 6.96
    $1\cdot 10^3$ 4 1.312 1.225 0.59   $5\cdot 10^3$ 4 1.263 1.271 7.07
    $1\cdot 10^3$ 7 1.368 1.257 0.59   $5\cdot 10^3$ 7 1.337 1.323 2.56
    $1\cdot 10^3$ 10 1.363 1.263 0.92   $5\cdot 10^3$ 10 1.337 1.322 4.10

Table 2.  Results for the PWLREG algorithm with different levels of noise $\sigma$

    (1, 1) (2, 1) (2, 2) (3, 2) (4, 2) (5, 2)
    $\sigma=0$
    RMSE 1.094 1.094 0.063 0.058 0.055 0.055
    MAE 0.842 0.842 0.016 0.016 0.015 0.015
    MSE 1.198 1.198 0.004 0.003 0.003 0.003
    CE 0.013 0.013 0.997 0.997 0.998 0.998
    r 0.114 0.114 0.998 0.999 0.999 0.999
    $\sigma=0.5$
    RMSE 1.221 1.221 0.503 0.503 0.503 0.503
    MAE 0.937 0.937 0.404 0.404 0.404 0.404
    MSE 1.491 1.492 0.254 0.254 0.254 0.254
    CE 0.011 0.011 0.832 0.832 0.832 0.832
    r 0.105 0.105 0.912 0.912 0.912 0.912
    $\sigma=1$
    RMSE 1.498 1.498 1.242 1.000 1.000 1.000
    MAE 1.169 1.169 0.993 0.801 0.801 0.801
    MSE 2.246 2.247 1.546 1.003 1.004 1.005
    CE 0.008 0.008 0.318 0.558 0.558 0.558
    r 0.087 0.087 0.564 0.747 0.747 0.747

Table 3.  Brief description of the real-world data sets

    Data set $m$ $n$
    1. Airfoil Self-noise 1503 5
    2. Red Wine Quality 1599 11
    3. White Wine Quality 4898 11
    4. Combined Cycle Power Plant 9568 4
5. Physicochemical Properties of Protein Tertiary Structure 45730 9

Table 4.  Prediction performance of the algorithms on the real-world data sets

    PWLREG MLR SVM(Lin) SVM(RBF) MARS ANN
    Airfoil Self-noise
    RMSE 4.445 5.024 5.037 4.710 5.672 5.499
    MAE 3.404 3.905 3.877 3.327 4.614 4.342
    MSE 24.641 25.750 25.885 22.941 32.822 31.059
    CE 0.653 0.558 0.556 0.611 0.437 0.471
    r 0.875 0.761 0.760 0.825 0.704 0.709
    Red Wine Quality
    RMSE 0.639 0.643 0.657 0.735 0.658 0.636
    MAE 0.489 0.490 0.489 0.566 0.503 0.482
    MSE 0.480 0.430 0.450 0.569 0.450 0.423
    CE 0.319 0.312 0.282 0.101 0.280 0.326
    r 0.582 0.570 0.537 0.394 0.555 0.585
    White Wine Quality
    RMSE 0.637 0.690 0.717 0.675 0.713 0.693
    MAE 0.508 0.532 0.551 0.522 0.527 0.545
    MSE 0.449 0.482 0.521 0.463 0.515 0.487
    CE 0.330 0.216 0.153 0.249 0.164 0.209
    r 0.616 0.494 0.467 0.521 0.537 0.520
    Combined Cycle Power Plant
    RMSE 4.163 4.484 4.503 3.907 4.187 4.283
    MAE 3.276 3.572 3.559 2.920 3.295 3.396
    MSE 17.702 20.159 20.341 15.377 17.577 18.411
    CE 0.942 0.933 0.932 0.949 0.942 0.939
    r 0.970 0.966 0.966 0.975 0.971 0.969
Physicochemical Properties of Protein Tertiary Structure
    RMSE 4.802 5.203 5.302 4.321 5.044 5.783
    MAE 3.840 4.374 4.240 3.018 4.146 4.987
    MSE 23.061 27.101 28.145 18.699 25.470 33.487
    CE 0.390 0.285 0.257 0.507 0.328 0.116
    r 0.625 0.534 0.528 0.719 0.573 0.341
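The evaluation measures reported in Tables 2 and 4 can be computed as below. This is a generic sketch: the exact definitions used in the paper are not reproduced here, and in particular treating CE as the Nash-Sutcliffe coefficient of efficiency and r as the Pearson correlation coefficient are assumptions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Common regression accuracy measures (generic textbook definitions;
    CE is assumed to be the Nash-Sutcliffe coefficient of efficiency,
    r the Pearson correlation coefficient)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    mse = np.mean(err ** 2)                 # mean squared error
    rmse = np.sqrt(mse)                     # root mean squared error
    mae = np.mean(np.abs(err))              # mean absolute error
    # CE: 1 minus residual sum of squares over variability around the mean
    ce = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    r = np.corrcoef(y_true, y_pred)[0, 1]   # Pearson correlation
    return {"RMSE": rmse, "MAE": mae, "MSE": mse, "CE": ce, "r": r}
```

Under these definitions a perfect predictor gives RMSE = MAE = MSE = 0 and CE = r = 1, while a predictor no better than the constant mean gives CE = 0, which matches the near-zero CE values in the high-noise rows of Table 2.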