Supervised distance preserving projection using alternating direction method of multipliers

The research of the author was supported by the Commonwealth Scholarship Commission, UK (BDCS-2012-44).

  • Supervised Distance Preserving Projection (SDPP) is a supervised dimension reduction method proposed recently by Zhu et al. in [43]. The method learns a linear mapping from the input space to a reduced feature space. While SDPP shows very promising results on regression tasks, its performance on classification problems is not satisfactory: preserving the distance relations with neighboring points forces the data to project very close to one another in the reduced space, irrespective of class, which results in low classification rates. To avoid this crowding, we propose a modification of SDPP that handles both regression and classification problems and significantly improves on SDPP. We incorporate into the SDPP problem the total variance of the projected covariates, which is maximized in order to preserve the global structure. This approach not only retains the efficient regression behavior of SDPP but also separates the data into their respective classes. We formulate the proposed optimization problem as a Semidefinite Least Squares (SLS) SDPP problem and develop a two-block Alternating Direction Method of Multipliers to learn the transformation matrix by solving SLS-SDPP; the learned mapping easily handles out-of-sample data. An illustrative sketch of this type of objective and of the generic two-block ADMM iteration is given below.

    Mathematics Subject Classification: Primary: 62H30, 68T10; Secondary: 90C25.

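    The following is only an illustrative sketch, not the authors' exact formulation. Writing $ W $ for the linear map, $ N(i) $ for the input-space neighborhood of $ x_i $, $ d_{ij} $ for the distance between the corresponding responses as in SDPP [43], and $ \Sigma $ for the sample covariance of the inputs, an objective of the kind described in the abstract combines the SDPP distance-matching term with the total variance of the projected covariates:

    \begin{equation} \min_{W} \ \sum_{i}\sum_{j\in N(i)}\Big(\|W^{\top}x_i-W^{\top}x_j\|^{2}-d_{ij}^{2}\Big)^{2}\;-\;\lambda\,\mathrm{tr}\big(W^{\top}\Sigma W\big), \end{equation}

    where the trade-off parameter $ \lambda $ and the semidefinite variable used in the SLS reformulation follow the full text. The two-block ADMM mentioned in the abstract is, in its generic form [3], applied to a splitting $ \min_{x,z}\{f(x)+g(z): Ax+Bz = c\} $ with augmented Lagrangian $ L_{\rho} $:

    \begin{equation} x^{k+1} = \arg\min_{x} L_{\rho}(x,z^{k},y^{k}), \quad z^{k+1} = \arg\min_{z} L_{\rho}(x^{k+1},z,y^{k}), \quad y^{k+1} = y^{k}+\rho\,(Ax^{k+1}+Bz^{k+1}-c). \end{equation}

    Once $ W $ is learned, an out-of-sample point $ x_{\mathrm{new}} $ is projected simply as $ W^{\top}x_{\mathrm{new}} $.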
  • Figure 1.  (a) SDPP: solid lines indicate connections between neighbors. (b)-(c) The scheme by which SDPP preserves the local geometry

    Figure 2.  Average Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), with error bars, for test-set prediction on the Parkinson's Telemonitoring data set obtained by SLS-SDPP, SDPP, PLS, SPCA and KDR. The bar diagram shows nearly identical performance for all methods in terms of RMSE, while in terms of MAE, SLS-SDPP outperforms all other methods
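    For reference, the two accuracy measures reported here and in Tables 2 and 3 are the usual root mean squared error and mean absolute error of the test-set predictions. A minimal sketch, independent of any particular projection method:

    import numpy as np

    def rmse(y_true, y_pred):
        # root mean squared error between observed and predicted responses
        return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

    def mae(y_true, y_pred):
        # mean absolute error between observed and predicted responses
        return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))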

    Figure 3.  Continuity measure with respect to different $ k $ and $ k_r $ for (a) Parkinson's Telemonitoring data: the figure suggests choosing the neighborhood size $ k = 8 $, where the highest continuity measure is obtained; (b) Concrete Compressive Strength data: the highest continuity measure is obtained at $ k = 10 $, so $ k = 10 $ is chosen as the neighborhood size
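    The continuity measure used for this model selection is the neighborhood-preservation criterion of Venna and Kaski [39]. The sketch below shows one plausible way to compute such a curve over candidate neighborhood sizes; it assumes the standard input-space-versus-projected-space form of the measure (the second parameter $ k_r $ in the figure, presumably a response-side neighborhood size, is not modeled), and the function names are illustrative only.

    import numpy as np
    from scipy.spatial.distance import cdist

    def continuity(X, Z, k):
        # Venna-Kaski continuity: penalize points that are among the k nearest
        # neighbors of x_i in the original space X but drop out of its
        # k-neighborhood in the projected space Z.
        n = X.shape[0]
        # rank[i, j] = rank of point j by distance from point i (0 = the point itself)
        rank_X = np.argsort(np.argsort(cdist(X, X), axis=1), axis=1)
        rank_Z = np.argsort(np.argsort(cdist(Z, Z), axis=1), axis=1)
        penalty = 0.0
        for i in range(n):
            nn_X = np.where((rank_X[i] >= 1) & (rank_X[i] <= k))[0]
            nn_Z = np.where((rank_Z[i] >= 1) & (rank_Z[i] <= k))[0]
            missing = np.setdiff1d(nn_X, nn_Z)          # neighbors lost in the projection
            penalty += np.sum(rank_Z[i, missing] - k)   # their (too large) projected ranks
        return 1.0 - 2.0 * penalty / (n * k * (2.0 * n - 3.0 * k - 1.0))

    # choosing the neighborhood size with the highest continuity, as in Figure 3:
    # best_k = max(range(4, 16), key=lambda k: continuity(X, X @ W, k))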

    Figure 4.  Average RMSE and MAE for test-set prediction on the Concrete Compressive Strength data set across different projection dimensions, obtained by SLS-SDPP, SDPP, PLS, SPCA and KDR. The best performance is achieved by SLS-SDPP at $ D = 5 $. The small error bars indicate the stability of the SLS-SDPP method regardless of the training data

    Figure 5.  Classification error rates for different projection dimensions computed by SLS-SDPP, SDPP, SPCA, KDR and FDA on (a) CTG data, (b) Seismic bump data, (c) Diabetic Retinopathy data and (d) Mushroom data. The figures illustrate that the best performance is achieved by SLS-SDPP, whose error rate remains consistently lower than that of the other methods

    Table 1.  List of datasets used in this article and their sources:

    Task Dataset Dim Classes No. of instances Source
    Classification Seismic bump 19 2 2584 UCI Repository
    Classification Cardiotocography 21 3 2126 UCI Repository
    Classification Diabetic Retinopathy 19 2 1115 UCI Repository
    Classification Mushroom 22 2 8124 UCI Repository
    Regression Parkinson's Telemonitoring 16 - 5875 UCI Repository
    Regression Concrete Compressive Strength 8 - 1030 UCI Repository

    Table 2.  Average RMSE and MAE for test-set prediction on the Parkinson's Telemonitoring data set

    Method RMSE (mean$ \pm $std) MAE (mean$ \pm $std)
    SLS-SDPP 10.6781$ \pm $1.1481 8.3503$ \pm $0.8525
    SDPP 10.7934$ \pm $1.2371 8.7459$ \pm $0.8378
    PLS 10.8133$ \pm $1.2806 8.7822$ \pm $0.8883
    SPCA 10.8006$ \pm $ 1.2449 8.7714 $ \pm $0.8555
    KDR 10.8478$ \pm $ 1.3032 8.8008$ \pm $0.9139

    Table 3.  Average RMSE and MAE (mean$ \pm $std) for the test set prediction on Concrete Compressive Strength Data Set

    Error Dim SLS-SDPP SDPP PLS SPCA KDR
    RMSE 1 10.4649$ \pm $1.2072 10.5241$ \pm $1.4220 12.7666$ \pm $2.1539 12.8090$ \pm $2.0429 13.7423$ \pm $2.7046
    2 10.4540$ \pm $1.1356 10.4075$ \pm $1.5903 11.8629$ \pm $1.1837 12.8712$ \pm $1.9074 11.6399$ \pm $2.6641
    3 10.4629$ \pm $1.1648 10.4079$ \pm $1.5915 10.9379$ \pm $1.1096 12.8370$ \pm $1.9316 11.5163$ \pm $2.2178
    4 10.4450$ \pm $1.2848 10.5890$ \pm $1.3303 10.5108$ \pm $1.1943 12.8674$ \pm $1.8353 11.0320$ \pm $1.7599
    5 10.2932$ \pm $0.7866 10.7247$ \pm $1.0038 10.4432$ \pm $1.1883 12.9647$ \pm $1.5969 10.2933$ \pm $1.6597
    6 10.9910$ \pm $0.7200 10.4893$ \pm $0.7228 10.4359$ \pm $1.1480 10.4770$ \pm $1.1142 10.4473$ \pm $1.1485
    7 10.4495$ \pm $0.7052 10.5313$ \pm $0.7228 10.4480$ \pm $1.0746 10.4520$ \pm $1.0604 10.4514$ \pm $1.0952
    8 10.4349$ \pm $1.1134 10.4342$ \pm $1.0034 10.4342$ \pm $1.0034 10.4342$ \pm $1.0034 16.3019$ \pm $1.8078
    MAE 1 8.3879$ \pm $1.0882 8.3687$ \pm $1.0994 10.3995$ \pm $1.7948 10.4451$ \pm $1.6868 10.8884$ \pm $2.3565
    2 8.2339$ \pm $1.2642 8.2338$ \pm $1.2918 9.3977$ \pm $0.8294 10.4484$ \pm $1.5463 9.2285$ \pm $2.3954
    3 8.2288$ \pm $1.3385 8.2342$ \pm $1.3380 8.3771$ \pm $0.6452 10.4125$ \pm $1.5656 8.9818$ \pm $1.6882
    4 8.5898$ \pm $1.1262 8.3935$ \pm $1.1461 8.2199$ \pm $0.7943 10.4553$ \pm $1.4677 8.6622$ \pm $1.3565
    5 8.0177$ \pm $0.7413 8.5133$ \pm $0.8507 8.1563$ \pm $0.8538 10.5221$ \pm $1.3204 8.0167$ \pm $1.2904
    6 8.3434$ \pm $0.7236 8.2713$ \pm $0.5728 8.1612$ \pm $0.8389 8.1860$ \pm $0.7925 8.1659$ \pm $0.8225
    7 8.2747$ \pm $0.5815 8.2820$ \pm $0.5787 8.1869$ \pm $0.7809 8.1872$ \pm $0.7735 8.1868$ \pm $0.7903
    8 8.1852$ \pm $0.7413 8.1842$ \pm $0.7477 8.1842$ \pm $0.7477 8.1842$ \pm $0.7477 13.1814$ \pm $1.5359

    Table 4.  Average classification error rate on the test set for CTG data

    Error Dim SLS-SDPP SDPP SPCA KDR FDA
    Error Rate 2 0.2251 0.2865 0.2739 0.2593 0.208
    3 0.1940 0.2827 0.2661 0.2992 0.208
    4 0.1949 0.2943 0.3314 0.2427 0.208
    5 0.1969 0.2661 0.3372 0.2437 0.208
    6 0.1969 0.2749 0.2710 0.2115 0.208
    7 0.1988 0.2827 0.2768 0.2193 0.208
    8 0.1979 0.2768 0.2817 0.2300 0.208
    9 0.1988 0.2700 0.2612 0.2315 0.208
    10 0.1949 0.2690 0.2515 0.2412 0.208

    Table 5.  Average classification error rate on the test set for Seismic bump data

    Error Dim SLS-SDPP SDPP SPCA KDR FDA
    Error Rate 1 0.0328 0.0328 0.0328 0.0328 0.0328
    2 0.0701 0.0707 0.1004 0.0688 0.0328
    3 0.0701 0.0669 0.0694 0.0669 0.0328
    4 0.0701 0.0720 0.0676 0.0720 0.032
    5 0.0701 0.0720 0.0732 0.0726 0.0328
    6 0.0701 0.0713 0.0789 0.0728 0.0328
    7 0.0701 0.0713 0.0789 0.0727 0.0328
    8 0.0701 0.0713 0.0795 0.0729 0.0328
    9 0.0701 0.0713 0.0795 0.0730 0.0328
    10 0.0701 0.0713 0.0795 0.0730 0.0328

    Table 6.  Average classification error rate on the test set for Diabetic Retinopathy data

    Error Dim SLS-SDPP SDPP SPCA KDR FDA
    Error Rate 1 0.6382 0.6382 0.6382 0.6382 0.6382
    2 0.2678 0.3390 0.2963 0.2934 0.6382
    3 0.2678 0.3105 0.3618 0.2963 0.6382
    4 0.2678 0.3191 0.2877 0.3048 0.6382
    5 0.2678 0.2934 0.2906 0.3134 0.6382
    6 0.2678 0.2849 0.2877 0.3048 0.6382
    7 0.2735 0.3191 0.2963 0.3048 0.6382
    8 0.2707 0.3191 0.2934 0.3134 0.6382
    9 0.2707 0.2906 0.2906 0.3134 0.6382
    10 0.2678 0.3020 0.3077 0.3048 0.6382

    Table 7.  Average classification error rate on the test set for Mushroom data

    Error Dim SLS-SDPP SDPP SPCA KDR FDA
    Error Rate 1 0.2486 0.2486 0.2486 0.2486 0.2486
    2 0.3555 0.1318 0.2735 0.3037 0.2486
    3 0.2891 0.1266 0.1836 0.2741 0.2486
    4 0.1516 0.2076 0.2020 0.2018 0.2486
    5 0.1648 0.2524 0.1911 0.2311 0.2486
    6 0.1667 0.2524 0.3785 0.3815 0.2486
    7 0.1723 0.2693 0.3653 0.3544 0.2486
    8 0.1728 0.2255 0.3630 0.2587 0.2486
    9 0.1186 0.2655 0.1756 0.1615 0.2486
    10 0.1427 0.2665 0.3545 0.2812 0.2486
  • [1] E. Barshan, A. Ghodsi, Z. Azimifar and M. Z. Jahromi, Supervised principal component analysis: Visualization, classification and regression on subspaces and submanifolds, Pattern Recognit, 44 (2010), 1357-1371.  doi: 10.1016/j.patcog.2010.12.015.
    [2] J. Borwein and A. S. Lewis, Convex Analysis and Non Linear Optimization: Theory and Examples, Springer, New York, 2006. doi: 10.1007/978-0-387-31256-9.
    [3] S. Boyd, N. Parikh, E. Chu, B. Peleato and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Machine Learning, 3 (2010), 1-122.
    [4] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, 2004.  doi: 10.1017/CBO9780511804441.
    [5] M. R. Brito, E. L. Chávez, A. J. Quiroz and J. E. Yukich, Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection, Stat. Probabil. Lett., 35 (1997), 33-42.  doi: 10.1016/S0167-7152(96)00213-1.
    [6] I. Cheng Yeh, Modeling of strength of high performance concrete using artificial neural networks, Cement and Concrete Research, 28 (1998), 1797-1808.  doi: 10.1016/S0008-8846(98)00165-3.
    [7] F. Corona, Z. Zhu, A. H. d. Souza Jr., M. Mulas, E. Muru, L. Sassu, G. Barreto and R. Baratti, Supervised Distance Preserving Projections: Applications in the quantitative analysis of diesel fuels and light cycle oils from NIR spectra, Journal of Process Control, 30 (2015), 10-21.  doi: 10.1016/j.jprocont.2014.11.005.
    [8] J. Eckstein and D. P. Bertsekas, On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators, Mathematical Programming, 55 (1992), 293-318.  doi: 10.1007/BF01581204.
    [9] J. Eckstein and M. Fukushima, Some reformulations and applications of the alternating direction method of multipliers, Large Scale Optimization: State of the Art, (1993), 115–134.
    [10] J. Eckstein and W. Yao, Understanding the Convergence of the Alternating Direction Method of Multipliers: Theoretical and Computational Perspectives, RUTCOR Research Report, 2014.
    [11] R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7 (1936), 179-188.  doi: 10.1111/j.1469-1809.1936.tb02137.x.
    [12] M. Fortin and R. Glowinski, Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, North-Holland Publishing Co., Amsterdam, 1983.
    [13] M. Fortin and R. Glowinski, On Decomposition-Coordination Methods Using an Augmented Lagrangian, Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, North-Holland: Amsterdam, 1983.
    [14] K. Fukumizu, F. R. Bach and M. Jordan, Kernel dimension reduction in regression, Annals of Statistics, 37 (2009), 1871-1905.  doi: 10.1214/08-AOS637.
    [15] D. Gabay, Applications of the method of multipliers to variational inequalities, in Augmented Lagrangian Methods: Applications to the Numerical Solution of Boundary-Value Problems, studies in Mathematics, 15 (1983), 299-331. 
    [16] D. Gabay and B. Mercier, A dual algorithm for the solution of nonlinear variational problems via finite element approximation, Computers and Mathematics with Applications, 2 (1976), 17-40.
    [17] R. Glowinski, Lectures on Numerical Methods for Nonlinear Variational Problem, Tata Institute of Fundamental Research Lectures on Mathematics and Physics, Tata Institute of Fundamental Research, Bombay, Notes by M. G. Vijayasundaram and M. Adimurthi, 1980.
    [18] R. Glowinski and A. Marrocco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Revue Française d'Automatique, Informatique et Recherche Opérationnelle, 9 (1975), 41–76.
    [19] R. Glowinski and P. L. Tallec, Augmented Lagrangian Methods for the Solution of Variational Problems, Studies in Applied and Numerical Mathematics, 1989. doi: 10.1137/1.9781611970838.ch3.
    [20] J. Han, M. Kamber and J. Pei, Data Mining: Concepts and Techniques, Springer-Verlag, Berlin, 2011. doi: 10.1007/978-3-642-19721-5.
    [21] B. He, H. Yang and S. Wang, Alternating direction method with self-adaptive penalty parameters for monotone variational inequalities, Journal of Optimization Theory and Applications, 106 (2000), 337–356. doi: 10.1023/A:1004603514434.
    [22] S. Jahan and H. D. Qi, Regularized Multidimensional Scaling with Radial Basis Functions, Journal of Industrial and Management Optimization, 12 (2016), 543-563.  doi: 10.3934/jimo.2016.12.543.
    [23] K. Jiang, D. Sun and K.-C. Toh, Solving nuclear norm regularized and semidefinite matrix least squares problems with linear equality constraints, Discrete Geometry and Optimization, 69 (2013), 133-162.  doi: 10.1007/978-3-319-00200-2_9.
    [24] J. Lee and M. Verleysen, Nonlinear Dimensionality Reduction, Springer, New York, 2007. doi: 10.1007/978-0-387-39351-3.
    [25] K. Li, Sliced inverse regression for dimension reduction, J Am Stat Assoc, 86 (1991), 316-342.  doi: 10.1080/01621459.1991.10475035.
    [26] X. Li, D. Sun and K.-C. Toh, A Schur complement based semi-proximal ADMM for convex quadratic conic programming and extensions, Math. Program., 155 (2016), Ser. A, 333–373. doi: 10.1007/s10107-014-0850-5.
    [27] L. J. P. Maaten, E. Postma and H. V. D. Herik, Dimensionality Reduction: A Comparative Review, Technical Report TiCC-TR 2009–005, Tilburg University Technical, Tilburg, 2009.
    [28] S. Mika, G. Ratsch, J. Weston, B. Scholkopf and K. Mullers, Fisher discriminant analysis with kernels, Neural Networks for Signal Processing IX, Proceedings of the IEEE Signal Processing Society Workshop, IEEE, Piscataway, (2002), 41–48. doi: 10.1109/NNSP.1999.788121.
    [29] H.-D. Qi and D. Sun, An augmented Lagrangian dual approach for the H-weighted nearest correlation matrix problem, IMA Journal of Numerical Analysis, 31 (2011), 491-511.  doi: 10.1093/imanum/drp031.
    [30] H.-D. Qi, N. H. Xiu and X. M. Yuan, A Lagrangian dual approach to the single source localization problem, IEEE Transactions on Signal Processing, 61 (2013), 3815-3826.  doi: 10.1109/TSP.2013.2264814.
    [31] R. T. Rockafellar, Augmented lagrangians and applications of the proximal point algorithm in convex programming, Mathematics of Operations Research, 1 (1976), 97-116.  doi: 10.1287/moor.1.2.97.
    [32] S. Roweis and L. Saul, Nonlinear dimensionality reduction by locally linear embedding, Science, 290 (2000), 2323-2326. 
    [33] B. Schölkopf, A. Smola and K. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10 (1998), 1299-1319.
    [34] D. Sun, K.-C. Toh and L. Yang, A convergent 3-block semi-proximal alternating direction method of multipliers for conic programming with 4-type of constraints, SIAM Journal on Optimization, 25 (2015), 882-915.  doi: 10.1137/140964357.
    [35] J. Tenenbaum, V. Silva and J. Langford, A global geometric framework for nonlinear dimensionality reduction, Science, 290 (2000), 2319-2323.
    [36] S. Theodoridis and K. Koutroumbas, An Introduction to Pattern Recognition, A MATLAB approach, Elsevier Inc., 2010.
    [37] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Elsevier Inc., 2009.
    [38] A. Tsanas, M. A. Little, P. E. McSharry and L. O. Ramig, Accurate telemonitoring of Parkinson's disease progression by non-invasive speech tests, IEEE Transactions on Biomedical Engineering, 57 (2010), 884-893.  doi: 10.1109/TBME.2009.2036000.
    [39] J. Venna and S. Kaski, Comparison of visualization methods for an atlas of gene expression data sets, Inf Vis, 6 (2007), 139-154.  doi: 10.1057/palgrave.ivs.9500153.
    [40] H. Wold, Soft modeling by latent variables: The nonlinear iterative partial least squares approach, Perspectives in Probability and Statistics, Papers in Honour of MS Bartlett, 1975,117–142. doi: 10.1017/s0021900200047604.
    [41] H. Wold, Partial Least Squares, Encyclopedia of Statistical Sciences, 2009.
    [42] Y. Yeh, S. Huang and Y. Lee, Nonlinear dimension reduction with kernel sliced inverse regression, IEEE Trans Knowl Data Eng, 21 (2009), 1590-1603.
    [43] Z. Zhu, T. Similä and F. Corona, Supervised distance preserving projection, Neural Processing Letters, 38 (2013), 445-463.  doi: 10.1007/s11063-013-9285-x.