## Discriminant analysis of regularized multidimensional scaling

 1 Department of Mathematics, University of Dhaka, Bangladesh 2 School of Mathematics, University of Southampton, UK

Received  October 2019 Revised  January 2020 Published  March 2020

Fund Project: This research was supported by the Commonwealth Scholarship BDCS-2012-44

Regularized Multidimensional Scaling with Radial basis functions (RMDS) is a nonlinear variant of classical Multidimensional Scaling (cMDS). A key issue addressed in RMDS is the effective selection of the centers of the radial basis functions, which plays a very important role in reducing the dimension while preserving the structure of the data in the higher-dimensional space. RMDS operates in an unsupervised setting; that is, it does not use any prior information about the dataset. This article is concerned with the supervised setting: we incorporate the class information of some members of the data into the RMDS model. The class separability term significantly improves RMDS, and numerical experiments show that the resulting method also outperforms other discriminant analysis methods such as Linear Discriminant Analysis (LDA).

Citation: Sohana Jahan. Discriminant analysis of regularized multidimensional scaling. Numerical Algebra, Control & Optimization, doi: 10.3934/naco.2020024
Fig. 1(a) shows the separation of the two nonseparable classes of Iris data by the support vector machine (SVM) algorithm. One class is represented by "+" and the other by "$\diamond$". Over $100$ runs, the SRMDS model yielded on average $1$ misclassified point, while the corresponding number for RMDS was $3$ to $5$. Fig. 1(b) shows SVM applied to the 2-dimensional projection of Cancer data. Over $100$ runs, the SRMDS model yielded $1$ to $3$ misclassified points. In each image, support vectors are circled ("O") and misclassified points are enclosed by "$\square$"
SVM on Seeds data projected into 2-dimensional space by $\text{RMDS}$ is shown in these figures, where the separation of the classes is shown using a multiclass classifier
Projected 2-dimensional Iris data, consisting of $3$ classes. One class, represented by 'red +', is completely separated from the other two. Training points of the two nonseparable classes are represented by 'green +' and 'blue +', whereas the testing points projected by SRMDS are represented by 'pink o' and 'red o'. Fig. 3(a) $\lambda = 0.1$. Fig. 3(b) $\lambda = 0.5$. Fig. 3(c) $\lambda = 0.9$, over $100$ random runs. An increased value of $\lambda$ puts more weight on preserving the structure of the data
Projected 2-dimensional Cancer data, consisting of $2$ classes. Training points of the classes are represented by 'red +' and 'blue +', whereas the testing points of the respective classes projected by SRMDS are represented by 'green o' and 'pink o'. Fig. 4(a) $\lambda = 0.1$. Fig. 4(b) $\lambda = 0.5$. Fig. 4(c) $\lambda = 0.9$, over $100$ random runs. An increased value of $\lambda$ puts more weight on preserving the structure of the data
Average stress of the datasets for different values of $\lambda$ over $100$ random runs. An increased value of $\lambda$ reduces the stress
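To make the notion of stress concrete, the following is a minimal sketch that measures how well a low-dimensional embedding preserves pairwise Euclidean distances. It assumes the common raw-stress definition (sum of squared differences between original and embedded distances); the exact stress used in the article is not stated here, and the function names `pairwise_distances` and `raw_stress` and the toy data are illustrative only.

```python
import numpy as np


def pairwise_distances(points):
    """Return the matrix of Euclidean distances between all rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def raw_stress(original, embedded):
    """Sum of squared differences between original and embedded distances.

    Assumption: this is the generic raw stress, not necessarily the
    definition used in the article.
    """
    d_high = pairwise_distances(original)
    d_low = pairwise_distances(embedded)
    upper = np.triu_indices(len(original), k=1)  # count each pair once
    return float(((d_high[upper] - d_low[upper]) ** 2).sum())


# Toy data (hypothetical): three points in 3-D, embedded by dropping
# the last coordinate.
X = np.array([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [0.0, 0.0, 1.0]])
Y = X[:, :2]

print(raw_stress(X, X))  # a perfect embedding has zero stress
print(raw_stress(X, Y))  # dropping a coordinate distorts some distances
```

A lower value indicates that the embedding preserves the original distances more faithfully.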
| Dataset | Dimension | Classes | No. of instances | Source |
|---|---|---|---|---|
| Iris | 4 | 3 | 150 | UCI Repository |
| Cancer | 9 | 2 | 683 | UCI Repository |
| Seeds | 7 | 3 | 210 | UCI Repository |
Numerical results obtained by applying SVM on three datasets projected using discriminant analysis
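| Dataset | Measure | MDS | RMDS | SRMDS | Improvement over RMDS |
|---|---|---|---|---|---|
| Iris | Support vectors | 18 | 13 | 5 | |
| | Misclassified points | 6 | 3 | 0 | 66 % |
| Cancer | Support vectors | 64 | 54 | 47 | |
| | Misclassified points | 9 | 5 | 1 | 80 % |
| Seeds C1 | Support vectors | 42 | 35 | 23 | |
| | Misclassified points | 12 | 10 | 5 | 50 % |
| Seeds C2 | Support vectors | 20 | 16 | 10 | |
| | Misclassified points | 5 | 3 | 2 | 33 % |
| Seeds C3 | Support vectors | 24 | 20 | 9 | |
| | Misclassified points | 8 | 5 | 2 | 60 % |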
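The improvement column appears to be the relative reduction in misclassified points from RMDS to SRMDS, truncated to a whole percent. The sketch below assumes that definition (it is not stated in the source) and reproduces the Cancer and Seeds entries; the helper name `improvement_percent` is illustrative.

```python
def improvement_percent(rmds_errors, srmds_errors):
    """Relative reduction in misclassified points, truncated to a whole percent.

    Assumption: improvement = (RMDS - SRMDS) / RMDS, as inferred from the table.
    """
    if rmds_errors <= 0:
        raise ValueError("RMDS error count must be positive")
    return 100 * (rmds_errors - srmds_errors) // rmds_errors


# (RMDS, SRMDS) misclassified counts taken from the table above.
rows = {
    "Cancer": (5, 1),
    "Seeds C1": (10, 5),
    "Seeds C2": (3, 2),
    "Seeds C3": (5, 2),
}

for name, (rmds, srmds) in rows.items():
    print(f"{name}: {improvement_percent(rmds, srmds)} %")
# Cancer: 80 %
# Seeds C1: 50 %
# Seeds C2: 33 %
# Seeds C3: 60 %
```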
Misclassified points obtained by k-nn (3-nn) classifier on three datasets projected by SRMDS and LDA
| Dataset | LDA | SRMDS |
|---|---|---|
| Iris | 0 | 0 |
| Cancer | 4 | 2 |
| Seeds | 20 | 9 |
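For readers unfamiliar with the evaluation, a count of misclassified points under a 3-nn classifier can be sketched as follows. This is a generic majority-vote nearest-neighbour rule on already projected points, not the article's code; the helper names and the toy 2-dimensional data are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest training points."""
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(int(train_y[i]) for i in nearest)
    return votes.most_common(1)[0][0]


def count_misclassified(train_x, train_y, test_x, test_y, k=3):
    """Number of test points whose predicted label differs from the true label."""
    return sum(
        int(knn_predict(train_x, train_y, q, k) != int(t))
        for q, t in zip(test_x, test_y)
    )


# Toy 2-D "projected" data (hypothetical): two well-separated clusters.
train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array([0, 0, 0, 1, 1, 1])

# The third test point carries label 1 but lies inside the class-0 cluster.
test_x = np.array([[0.5, 0.5], [5.5, 5.5], [1.0, 1.0]])
test_y = np.array([0, 1, 1])

print(count_misclassified(train_x, train_y, test_x, test_y))  # 1
```

With an odd `k` and two classes the vote cannot tie; with more classes, ties fall back to whichever tied label `Counter` encountered first.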
