# American Institute of Mathematical Sciences

doi: 10.3934/naco.2020024

## Discriminant analysis of regularized multidimensional scaling

Sohana Jahan: 1. Department of Mathematics, University of Dhaka, Bangladesh; 2. School of Mathematics, University of Southampton, UK

Received: October 2019. Revised: January 2020. Published: March 2020.

Fund Project: This research was supported by the Commonwealth Scholarship BDCS-2012-44

Regularized Multidimensional Scaling with radial basis functions (RMDS) is a nonlinear variant of classical Multidimensional Scaling (cMDS). A key issue addressed in RMDS is the effective selection of the centers of the radial basis functions, which plays a very important role in reducing the dimension while preserving the structure of the data in the higher-dimensional space. RMDS operates in an unsupervised setting, meaning that it does not use any prior information about the dataset. This article is concerned with the supervised setting: we incorporate the class information of some members of the data into the RMDS model. The resulting class-separability term improves RMDS significantly, and the method also outperforms other discriminant analysis methods such as linear discriminant analysis (LDA), as documented through numerical experiments.
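To make the construction concrete, the following is a minimal illustrative sketch of a supervised RMDS-style objective: data are mapped through a layer of radial basis functions, and a distance-preservation (stress) term is traded off against a within-class scatter term through a weight $\lambda$. The Gaussian kernel, the particular loss, and the names `rbf_features` and `srmds_loss` are assumptions made here for illustration, not the paper's exact formulation.

```python
# Illustrative sketch only: a supervised RMDS-style objective with an
# assumed Gaussian RBF map and an assumed within-class scatter penalty.
import numpy as np

def rbf_features(X, centers, sigma=1.0):
    """Gaussian RBF feature map: Phi[i, k] = exp(-||x_i - c_k||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def srmds_loss(W, X, y, centers, lam=0.5, sigma=1.0):
    """Toy objective: lam * stress + (1 - lam) * within-class scatter.

    A larger lam puts more weight on preserving pairwise distances;
    a smaller lam puts more weight on pulling same-class points together.
    """
    Z = rbf_features(X, centers, sigma) @ W                    # n x 2 embedding
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # input distances
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)  # embedded distances
    stress = ((D - d) ** 2).sum() / 2.0                        # structure preservation
    within = sum(((Z[y == c] - Z[y == c].mean(axis=0)) ** 2).sum()
                 for c in np.unique(y))                        # class separability
    return lam * stress + (1.0 - lam) * within
```

Minimizing such a loss over `W` (for example with `scipy.optimize.minimize`) produces a 2-dimensional embedding; values of $\lambda$ near $1$ favor structure preservation and values near $0$ favor class separation, matching the trade-off illustrated in the figures below.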

Citation: Sohana Jahan. Discriminant analysis of regularized multidimensional scaling. Numerical Algebra, Control & Optimization, doi: 10.3934/naco.2020024
##### References:
[1] A. Argyriou, T. Evgeniou and M. Pontil, Multi-task feature learning, in Advances in Neural Information Processing Systems (eds. B. Schoelkopf, J. Platt and T. Hoffman), MIT Press, 2007.
[2] A. Argyriou, T. Evgeniou and M. Pontil, Convex multi-task feature learning, Machine Learning, Special Issue on Inductive Transfer Learning, 73 (2008), 243-272.
[3] J. Bénasséni, Partial additive constant, J. Statist. Comput. Simul., 49 (1994), 179-193.
[4] I. Borg and P. J. F. Groenen, Modern Multidimensional Scaling: Theory and Applications, 2nd edition, Springer Series in Statistics, Springer, 2005.
[5] F. Cailliez, The analytical solution of the additive constant problem, Psychometrika, 48 (1983), 305-308. doi: 10.1007/BF02294026.
[6] H. G. Chew and C. C. Lim, On regularisation parameter transformation of support vector machines, Journal of Industrial and Management Optimization, 5 (2009), 403-415. doi: 10.3934/jimo.2009.5.403.
[7] L. G. Cooper, A new solution to the additive constant problem in metric and multidimensional scaling, Psychometrika, 37 (1972), 311-321.
[8] T. F. Cox and M. A. Cox, Multidimensional Scaling, 2nd edition, Chapman and Hall/CRC, 2002.
[9] T. F. Cox and G. Ferry, Discriminant analysis using nonmetric multidimensional scaling, Pattern Recognition, 26 (1993), 145-153.
[10] J. de Leeuw, Applications of convex analysis to multidimensional scaling, in Recent Developments in Statistics (eds. J. Barra, F. Brodeau, G. Romier and B. van Cutsem), North-Holland Publishing Company, Amsterdam, The Netherlands, 1977, 133-145.
[11] J. de Leeuw, Block relaxation algorithms in statistics, in Information Systems and Data Analysis (eds. H. H. Bock et al.), Springer, Berlin, 1994, 308-325.
[12] W. Glunt, T. L. Hayden, S. Hong and J. Wells, An alternating projection algorithm for computing the nearest Euclidean distance matrix, SIAM J. Matrix Anal. Appl., 11 (1990), 589-600. doi: 10.1137/0611042.
[13] W. Glunt, T. L. Hayden and R. Raydan, Molecular conformations from distance matrices, J. Computational Chemistry, 14 (1993), 114-120.
[14] J. C. Gower, Some distance properties of latent root and vector methods in multivariate analysis, Biometrika, 53 (1966), 315-328. doi: 10.1093/biomet/53.3-4.325.
[15] Y. Hao and F. Meng, A new method on gene selection for tissue classification, Journal of Industrial and Management Optimization, 3 (2007), 739-748. doi: 10.3934/jimo.2007.3.739.
[16] W. L. G. Koontz and K. Fukunaga, A nonlinear feature extraction algorithm using distance information, IEEE Transactions on Computers, 21 (1972), 56-63.
[17] J. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29 (1964), 1-27. doi: 10.1007/BF02289565.
[18] T. Li, S. Zhu and M. Ogihara, Using discriminant analysis for multi-class classification: an experimental investigation, Knowledge and Information Systems, 10 (2006), 453-472. doi: 10.1007/s10115-006-0013-y.
[19] D. Lowe, Novel topographic nonlinear feature extraction using radial basis functions for concentration coding in the artificial nose, IEEE International Conference on Artificial Neural Networks, (1993), 95-99.
[20] K. V. Mardia, J. T. Kent and J. M. Bibby, Multivariate Analysis, 10th printing, Academic Press, 1995.
[21] A. M. Martinez and A. C. Kak, PCA versus LDA, IEEE Transactions on Pattern Analysis and Machine Intelligence, 23 (2001), 228-233. doi: 10.1109/34.908974.
[22] S. J. Messick and R. P. Abelson, The additive constant problem in multidimensional scaling, Psychometrika, 21 (1956), 1-15.
[23] E. Pękalska and R. P. W. Duin, The Dissimilarity Representation for Pattern Recognition: Foundations and Applications, Series in Machine Perception and Artificial Intelligence 64, World Scientific, 2005.
[24] H.-D. Qi, A semismooth Newton method for the nearest Euclidean distance matrix problem, SIAM Journal on Matrix Analysis and Applications, 34 (2013), 67-93. doi: 10.1137/110849523.
[25] H.-D. Qi and N. Xiu, A convex quadratic semidefinite programming approach to the partial additive constant problem in multidimensional scaling, Journal of Statistical Computation and Simulation, 82 (2012), 1317-1336. doi: 10.1080/00949655.2011.579970.
[26] H.-D. Qi, N. H. Xiu and X. M. Yuan, A Lagrangian dual approach to the single source localization problem, IEEE Transactions on Signal Processing, 61 (2013), 3815-3826. doi: 10.1109/TSP.2013.2264814.
[27] H.-D. Qi and X. M. Yuan, Computing the nearest Euclidean distance matrix with low embedding dimensions, Mathematical Programming, Ser. A, 147 (2014), 351-389. doi: 10.1007/s10107-013-0726-0.
[28] K. Schittkowski, Optimal parameter selection in support vector machines, Journal of Industrial and Management Optimization, 1 (2005), 465-476. doi: 10.3934/jimo.2005.1.465.
[29] I. J. Schoenberg, Remarks to Maurice Fréchet's article "Sur la définition axiomatique d'une classe d'espaces vectoriels distanciés applicables vectoriellement sur l'espace de Hilbert", Ann. Math., 36 (1935), 724-732. doi: 10.2307/1968654.
[30] S. Jahan and H.-D. Qi, Regularized multidimensional scaling with radial basis functions, Journal of Industrial and Management Optimization, 12 (2016), 543-563. doi: 10.3934/jimo.2016.12.543.
[31] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Elsevier Inc., 2009.
[32] S. Theodoridis and K. Koutroumbas, An Introduction to Pattern Recognition: A MATLAB Approach, Elsevier Inc., 2010.
[33] W. S. Torgerson, Theory and Methods of Scaling, Wiley, New York, 1958.
[34] A. R. Webb, Multidimensional scaling by iterative majorization using radial basis functions, Pattern Recognition, 28 (1995), 753-759.
[35] A. R. Webb, Nonlinear feature extraction with radial basis functions using a weighted multidimensional scaling stress measure, Pattern Recognition, IEEE Conference Publications, 4 (1996), 635-639.
[36] A. R. Webb, An approach to nonlinear principal components analysis using radially symmetric kernel functions, Statistics and Computing, 6 (1996), 159-168.
[37] G. Young and A. S. Householder, Discussion of a set of points in terms of their mutual distances, Psychometrika, 3 (1938), 19-22. doi: 10.1007/BF02288560.
[38] Y. Yuan, W. Fan and D. Pu, Spline function smooth support vector machine for classification, Journal of Industrial and Management Optimization, 3 (2007), 529-542. doi: 10.3934/jimo.2007.3.529.


Fig. 1(a) shows the separation of the two nonseparable classes of Iris data by the support vector machine (SVM) algorithm. One class is represented by "+" and the other by "$\diamond$". Over $100$ runs, the SRMDS model yielded on average $1$ misclassified point, while the corresponding number for RMDS (Webb's model) was $3$ to $5$. Fig. 1(b) shows the SVM applied to the 2-dimensional projection of the Cancer data; over $100$ runs, the SRMDS model yielded $1$ to $3$ misclassified points. In each image, support vectors are enclosed by "O" and misclassified points by "$\square$".
These figures show the SVM applied to the Seeds data projected into 2-dimensional space by $\text{RMDS}$, with the separation of the classes shown using a multiclass classifier.
Projected 2-dimensional Iris data, consisting of $3$ classes. One class, represented by 'red +', is completely separated from the other two. Training points of the two nonseparable classes are represented by 'green +' and 'blue +', whereas the testing points projected by SRMDS are represented by 'pink o' and 'red o'. Fig. 3(a): $\lambda = 0.1$; Fig. 3(b): $\lambda = 0.5$; Fig. 3(c): $\lambda = 0.9$, over $100$ random runs. An increased value of $\lambda$ puts more weight on preserving the structure of the data.
Projected 2-dimensional Cancer data, consisting of $2$ classes. Training points of the classes are represented by 'red +' and 'blue +', whereas the testing points of the respective classes projected by SRMDS are represented by 'green o' and 'pink o'. Fig. 4(a): $\lambda = 0.1$; Fig. 4(b): $\lambda = 0.5$; Fig. 4(c): $\lambda = 0.9$, over $100$ random runs. An increased value of $\lambda$ puts more weight on preserving the structure of the data.
Average stress of each dataset for different values of $\lambda$ over $100$ random runs. An increased value of $\lambda$ reduces the stress.
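For reference, the stress plotted here quantifies how well pairwise distances survive the projection. The paper's exact stress measure is not restated on this page, so the sketch below uses the common raw (squared-error) form as an assumption:

```python
# Assumed raw-stress form; the paper's exact stress measure may differ.
import numpy as np

def raw_stress(X_high, Z_low):
    """Squared mismatch between pairwise distances before (X_high)
    and after (Z_low) projection, with each pair counted once."""
    D = np.linalg.norm(X_high[:, None, :] - X_high[None, :, :], axis=2)
    d = np.linalg.norm(Z_low[:, None, :] - Z_low[None, :, :], axis=2)
    return ((D - d) ** 2).sum() / 2.0
```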
Datasets used in the numerical experiments

| Dataset | Dim. | Classes | No. of instances | Source |
| --- | --- | --- | --- | --- |
| Iris | 4 | 3 | 150 | UCI Repository |
| Cancer | 9 | 2 | 683 | UCI Repository |
| Seeds | 7 | 3 | 210 | UCI Repository |
Numerical results obtained by applying an SVM to three datasets projected using discriminant analysis
| Dataset | Measure | MDS | RMDS | SRMDS | Improvement over RMDS |
| --- | --- | --- | --- | --- | --- |
| Iris | Support vectors | 18 | 13 | 5 | |
| Iris | Misclassified points | 6 | 3 | 0 | 66% |
| Cancer | Support vectors | 64 | 54 | 47 | |
| Cancer | Misclassified points | 9 | 5 | 1 | 80% |
| Seeds C1 | Support vectors | 42 | 35 | 23 | |
| Seeds C1 | Misclassified points | 12 | 10 | 5 | 50% |
| Seeds C2 | Support vectors | 20 | 16 | 10 | |
| Seeds C2 | Misclassified points | 5 | 3 | 2 | 33% |
| Seeds C3 | Support vectors | 24 | 20 | 9 | |
| Seeds C3 | Misclassified points | 8 | 5 | 2 | 60% |
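A minimal sketch of the evaluation step behind the table above: fit an SVM on the 2-dimensional projected training points, then count support vectors and misclassified test points. The use of scikit-learn's `SVC` and the linear kernel are assumptions; the paper does not specify its SVM implementation.

```python
# Hedged sketch: scikit-learn SVC with an assumed linear kernel.
from sklearn.svm import SVC

def evaluate_projection(Z_train, y_train, Z_test, y_test):
    """Count support vectors and misclassified points for a 2-D projection."""
    clf = SVC(kernel="linear")           # kernel choice is an assumption
    clf.fit(Z_train, y_train)
    n_support = len(clf.support_)        # total number of support vectors
    n_miss = int((clf.predict(Z_test) != y_test).sum())
    return n_support, n_miss
```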
Misclassified points obtained by the k-NN (3-NN) classifier on three datasets projected by SRMDS and LDA
| Dataset | LDA | SRMDS |
| --- | --- | --- |
| Iris | 0 | 0 |
| Cancer | 4 | 2 |
| Seeds | 20 | 9 |
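The 3-NN comparison above can be sketched the same way, assuming scikit-learn's `KNeighborsClassifier`; the paper does not name its k-NN implementation.

```python
# Hedged sketch of the 3-NN evaluation on projected data.
from sklearn.neighbors import KNeighborsClassifier

def knn_misclassified(Z_train, y_train, Z_test, y_test, k=3):
    """Fit a k-NN classifier on projected training points and count test errors."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(Z_train, y_train)
    return int((knn.predict(Z_test) != y_test).sum())
```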
