doi: 10.3934/fods.2021002

The rankability of weighted data from pairwise comparisons

1. 

Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo, CA, USA

2. 

Department of Mathematics and Computer Science, Davidson College, Davidson, NC, USA

3. 

Mathematics Department, College of Charleston, Charleston, SC, USA

* Corresponding author: Langville

Received  August 2020 Revised  December 2020 Published  January 2021

In prior work [4], Anderson et al. introduced a new problem, the rankability problem, which refers to a dataset's inherent ability to produce a meaningful ranking of its items. Ranking is a fundamental data science task with numerous applications that include web search, data mining, cybersecurity, machine learning, and statistical learning theory. Yet little attention has been paid to the question of whether a dataset is suitable for ranking. As a result, when a ranking method is applied to a dataset with low rankability, the resulting ranking may not be reliable.

Rankability paper [4] and its methods studied unweighted data for which the dominance relations are binary, i.e., an item either dominates or is dominated by another item. In this paper, we extend rankability methods to weighted data for which an item may dominate another by any finite amount. We present combinatorial approaches to a weighted rankability measure and apply our new measure to several weighted datasets.

Citation: Paul E. Anderson, Timothy P. Chartier, Amy N. Langville, Kathryn E. Pedings-Behling. The rankability of weighted data from pairwise comparisons. Foundations of Data Science, doi: 10.3934/fods.2021002
References:
[1]

T. Achterberg, Scip: Solving constraint integer programs, Mathematical Programming Computation, 1 (2009), 1-41.  doi: 10.1007/s12532-008-0001-1.  Google Scholar

[2]

N. Ailon, M. Charikar and A. Newman, Aggregating inconsistent information: Ranking and clustering, Journal of the ACM, 55 (2008), 27 pp. doi: 10.1145/1411509.1411513.  Google Scholar

[3]

I. AliW. D. Cook and M. Kress, On the minimum violations ranking of a tournament, Management Science, 32 (1986), 660-672.  doi: 10.1287/mnsc.32.6.660.  Google Scholar

[4]

P. Anderson, T. Chartier and A. Langville, The rankability of data, SIAM Journal on the Mathematics of Data Science, (2019), 121–143. doi: 10.1137/18M1183595.  Google Scholar

[5]

P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, IGARDS technical report, 2019. Available from: https://igards.github.io/research/IGARDS_Technical_Report_November_2019.pdf. Google Scholar

[6]

P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, Revisiting rankability of unweighted data technical report, 2020. Available from: https://igards.github.io/research/Revisiting_Rankability_Unweighted_Data.pdf. Google Scholar

[7]

R. D. ArmstrongW. D. Cook and L. M. Seiford, Priority ranking and consensus formation: The case of ties, Management Science, 28 (1982), 638-645.  doi: 10.1287/mnsc.28.6.638.  Google Scholar

[8]

J. Brenner and P. Keating, Chaos kills, ESPN, The Magazine, (2016), 20–23. Google Scholar

[9]

S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems, 33 (1998), 107–117. doi: 10.1016/S0169-7552(98)00110-X.  Google Scholar

[10]

S. Brin, L. Page, R. Motwami and T. Winograd, The PageRank citation ranking: Bringing order to the web, Tech. Report 1999-0120, Computer Science Department, Stanford University, 1999. Google Scholar

[11]

T. R. CameronA. N. Langville and H. C. Smith, On the graph Laplacian and the rankability of data, Linear Algebra and its Applications, 588 (2020), 81-100.  doi: 10.1016/j.laa.2019.11.026.  Google Scholar

[12]

C. R. CassadyL. M. Maillart and S. Salman, Ranking sports teams: A customizable quadratic assignment approach, INFORMS: Interfaces, 35 (2005), 497-510.  doi: 10.1287/inte.1050.0171.  Google Scholar

[13]

T. P. Chartier, E. Kreutzer, A. N. Langville and K. E. Pedings, Sensitivity of ranking vectors, SIAM Journal on Scientific Computing, 33 (2011), 1077–1102. doi: 10.1137/090772745.  Google Scholar

[14]

B. J. Coleman, Minimizing game score violations in college football rankings, INFORMS: Interfaces, 35 (2005), 483-496.  doi: 10.1287/inte.1050.0172.  Google Scholar

[15]

W. D. Cook and L. M. Seiford, Priority ranking and consensus formation, Management Science, 24 (1978), 1721-1732.  doi: 10.1287/mnsc.24.16.1721.  Google Scholar

[16]

T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10 (1998), 1895-1923.  doi: 10.1162/089976698300017197.  Google Scholar

[17]

S. DuttaS. H. Jacobson and J. J. Sauppe, Identifying ncaa tournament upsets using balance optimization subset selection, Journal of Quantitative Analysis in Sports, 13 (2017), 79-93.   Google Scholar

[18]

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.  Google Scholar

[19]

F. M. Harper and J. A. Konstan, The movielens datasets: History and context, Acm Transactions on Interactive Intelligent Systems (TIIS), 5 (2015), 1-19.  doi: 10.1145/2827872.  Google Scholar

[20]

X. JiangL.-H. LimY. Yao and Y. Ye, Statistical ranking and combinatorial Hodge theory, Mathematical Programming, 127 (2011), 203-244.  doi: 10.1007/s10107-010-0419-x.  Google Scholar

[21]

P. Keating, personal communication. Google Scholar

[22]

Y. Kondo, Triangulation of input-output tables based on mixed integer programs for inter-temporal and inter-regional comparison of production structures, Journal of Economic Structures, 3 (2014), 1-19.  doi: 10.1186/2193-2409-3-2.  Google Scholar

[23] A. N. Langville and C. D. Meyer, Who's #1? The Science of Rating and Ranking Items, Princeton University Press, Princeton, 2012.   Google Scholar
[24]

A. N. Langville, K. Pedings and Y. Yamamoto, A minimum violations ranking method, Optimization and Engineering, (2011), 1–22. doi: 10.1007/s11081-011-9135-5.  Google Scholar

[25]

A. S. Lee and D. R. Shier, A method for ranking teams based on simple paths, UMAP Journal, 39 (2018), 353-371.   Google Scholar

[26]

W. W. Leontief, Input-Output Economics, 2nd edition, Oxford University Press, 1986. doi: 10.1038/scientificamerican1051-15.  Google Scholar

[27]

M. J. Lopez and G. J. Matthews, Building an NCAA men's basketball predictive model and quantifying its success, Journal of Quantitative Analysis in Sports, 11 (2015), 5-12.  doi: 10.1515/jqas-2014-0058.  Google Scholar

[28]

R. Marti and G. Reinelt, The linear ordering problem: Exact and heuristic methods in combinatorial optimization, AMS, 2011. doi: 10.1007/978-3-642-16729-4.  Google Scholar

[29]

S. Mehrotra and Y. Ye, Finding an interior point in the optimal face of linear programs, 62 (1993), 497–515. doi: 10.1007/BF01585180.  Google Scholar

[30]

J. Park, On minimum violations ranking in paired comparisons, 2005. Google Scholar

[31]

G. Reinelt, The Linear Ordering Problem: Algorithms and Applications, Heldermann Verlag, 1985.  Google Scholar

[32]

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin, Heidelberg, 1995. doi: 10.1007/978-1-4757-2440-0.  Google Scholar

[33]

S. Wartenberg, How many people will fill out March Madness brackets?, The Columbus Dispatch, (2015). Google Scholar

show all references

References:
[1]

T. Achterberg, Scip: Solving constraint integer programs, Mathematical Programming Computation, 1 (2009), 1-41.  doi: 10.1007/s12532-008-0001-1.  Google Scholar

[2]

N. Ailon, M. Charikar and A. Newman, Aggregating inconsistent information: Ranking and clustering, Journal of the ACM, 55 (2008), 27 pp. doi: 10.1145/1411509.1411513.  Google Scholar

[3]

I. AliW. D. Cook and M. Kress, On the minimum violations ranking of a tournament, Management Science, 32 (1986), 660-672.  doi: 10.1287/mnsc.32.6.660.  Google Scholar

[4]

P. Anderson, T. Chartier and A. Langville, The rankability of data, SIAM Journal on the Mathematics of Data Science, (2019), 121–143. doi: 10.1137/18M1183595.  Google Scholar

[5]

P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, IGARDS technical report, 2019. Available from: https://igards.github.io/research/IGARDS_Technical_Report_November_2019.pdf. Google Scholar

[6]

P. Anderson, T. Chartier, A. Langville and K. Pedings-Behling, Revisiting rankability of unweighted data technical report, 2020. Available from: https://igards.github.io/research/Revisiting_Rankability_Unweighted_Data.pdf. Google Scholar

[7]

R. D. ArmstrongW. D. Cook and L. M. Seiford, Priority ranking and consensus formation: The case of ties, Management Science, 28 (1982), 638-645.  doi: 10.1287/mnsc.28.6.638.  Google Scholar

[8]

J. Brenner and P. Keating, Chaos kills, ESPN, The Magazine, (2016), 20–23. Google Scholar

[9]

S. Brin and L. Page, The anatomy of a large-scale hypertextual web search engine, Computer Networks and ISDN Systems, 33 (1998), 107–117. doi: 10.1016/S0169-7552(98)00110-X.  Google Scholar

[10]

S. Brin, L. Page, R. Motwami and T. Winograd, The PageRank citation ranking: Bringing order to the web, Tech. Report 1999-0120, Computer Science Department, Stanford University, 1999. Google Scholar

[11]

T. R. CameronA. N. Langville and H. C. Smith, On the graph Laplacian and the rankability of data, Linear Algebra and its Applications, 588 (2020), 81-100.  doi: 10.1016/j.laa.2019.11.026.  Google Scholar

[12]

C. R. CassadyL. M. Maillart and S. Salman, Ranking sports teams: A customizable quadratic assignment approach, INFORMS: Interfaces, 35 (2005), 497-510.  doi: 10.1287/inte.1050.0171.  Google Scholar

[13]

T. P. Chartier, E. Kreutzer, A. N. Langville and K. E. Pedings, Sensitivity of ranking vectors, SIAM Journal on Scientific Computing, 33 (2011), 1077–1102. doi: 10.1137/090772745.  Google Scholar

[14]

B. J. Coleman, Minimizing game score violations in college football rankings, INFORMS: Interfaces, 35 (2005), 483-496.  doi: 10.1287/inte.1050.0172.  Google Scholar

[15]

W. D. Cook and L. M. Seiford, Priority ranking and consensus formation, Management Science, 24 (1978), 1721-1732.  doi: 10.1287/mnsc.24.16.1721.  Google Scholar

[16]

T. G. Dietterich, Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, 10 (1998), 1895-1923.  doi: 10.1162/089976698300017197.  Google Scholar

[17]

S. DuttaS. H. Jacobson and J. J. Sauppe, Identifying ncaa tournament upsets using balance optimization subset selection, Journal of Quantitative Analysis in Sports, 13 (2017), 79-93.   Google Scholar

[18]

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, 1979.  Google Scholar

[19]

F. M. Harper and J. A. Konstan, The movielens datasets: History and context, Acm Transactions on Interactive Intelligent Systems (TIIS), 5 (2015), 1-19.  doi: 10.1145/2827872.  Google Scholar

[20]

X. JiangL.-H. LimY. Yao and Y. Ye, Statistical ranking and combinatorial Hodge theory, Mathematical Programming, 127 (2011), 203-244.  doi: 10.1007/s10107-010-0419-x.  Google Scholar

[21]

P. Keating, personal communication. Google Scholar

[22]

Y. Kondo, Triangulation of input-output tables based on mixed integer programs for inter-temporal and inter-regional comparison of production structures, Journal of Economic Structures, 3 (2014), 1-19.  doi: 10.1186/2193-2409-3-2.  Google Scholar

[23] A. N. Langville and C. D. Meyer, Who's #1? The Science of Rating and Ranking Items, Princeton University Press, Princeton, 2012.   Google Scholar
[24]

A. N. Langville, K. Pedings and Y. Yamamoto, A minimum violations ranking method, Optimization and Engineering, (2011), 1–22. doi: 10.1007/s11081-011-9135-5.  Google Scholar

[25]

A. S. Lee and D. R. Shier, A method for ranking teams based on simple paths, UMAP Journal, 39 (2018), 353-371.   Google Scholar

[26]

W. W. Leontief, Input-Output Economics, 2nd edition, Oxford University Press, 1986. doi: 10.1038/scientificamerican1051-15.  Google Scholar

[27]

M. J. Lopez and G. J. Matthews, Building an NCAA men's basketball predictive model and quantifying its success, Journal of Quantitative Analysis in Sports, 11 (2015), 5-12.  doi: 10.1515/jqas-2014-0058.  Google Scholar

[28]

R. Marti and G. Reinelt, The linear ordering problem: Exact and heuristic methods in combinatorial optimization, AMS, 2011. doi: 10.1007/978-3-642-16729-4.  Google Scholar

[29]

S. Mehrotra and Y. Ye, Finding an interior point in the optimal face of linear programs, 62 (1993), 497–515. doi: 10.1007/BF01585180.  Google Scholar

[30]

J. Park, On minimum violations ranking in paired comparisons, 2005. Google Scholar

[31]

G. Reinelt, The Linear Ordering Problem: Algorithms and Applications, Heldermann Verlag, 1985.  Google Scholar

[32]

V. N. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, Berlin, Heidelberg, 1995. doi: 10.1007/978-1-4757-2440-0.  Google Scholar

[33]

S. Wartenberg, How many people will fill out March Madness brackets?, The Columbus Dispatch, (2015). Google Scholar

Figure 1.  Cityplot of $ 8 \times 8 $ data matrix with original ordering and hillside reordering
Figure 2.  Cityplots of $ n = 8 $ college football data matrices with the original ordering (left) and the optimal hillside reordering (right). The top row is the 2008 season, a less rankable season with hillside $ \delta = 155 $ and $ \rho = 6 $. The bottom row is the 2005 season, a more rankable season with hillside $ \delta = 92 $ and $ \rho = 4 $
Figure 3.  Approximate fractional matrix $ {\bf X}^*({\bf r}, {\bf r}) $ for Example 1 obtained by the Interior Point solver of the linear programming relaxation
Figure 4.  Spaghetti plots and summary of diversity of $ P $ sets for Examples 1 and 2
Figure 5.  Two maximally discordant optimal solutions for Examples 1 and 2
Figure 6.  $ X^* $ color-coded visualization of years 2009, 2013, and 2016 NCAA March Madness using the top performing LOP formulation
Figure 7.  $ X^* $ color-coded visualization of years 2009, 2013, and 2016 NCAA March Madness using the top performing hillside formulation
Figure 8.  $ X^* $ color-coded visualization for each year of NCAA March Madness using the top performing LOP formulation
Figure 9.  $ X^* $ color-coded visualization for each year of NCAA March Madness using the top performing hillside formulation
Table 1.  Sample Input/Output economic data based on Japan 2005 [22] (A) and its graphical representation in (B)
Table 2.  Sample of movie rating data based on MovieLens [19] (A) and graphical representation of user rating data transformed into pairwise comparisons (B)
Table 3.  Sample of NCAA Men's Basketball games is shown in (A) and the graphical representation of the aggregate dominance information, i.e., $ {\bf D} $ is shown in (B)
Table 4.  5 fold cross-validation results for predicting the upset measure for NCAA Men's March Madness 2002-2018
Dominance Method Parameters Method MAE
1 Direct+Indirect dt=0, st=1, wi=1 Hillside 3.148221
2 Direct+Indirect dt=0, st=0, wi=1 Hillside 3.148221
3 Direct+Indirect dt=1, st=0, wi=1 LOP 3.172793
4 Direct+Indirect dt=1, st=1, wi=1 LOP 3.172793
5 Direct+Indirect dt=0, st=0, wi=0.25 Hillside 3.213978
6 Direct+Indirect dt=0, st=1, wi=0.25 Hillside 3.213978
7 Direct dt=2 Hillside 3.309455
8 Direct+Indirect dt=2, st=1, wi=1 LOP 3.311588
9 Direct+Indirect dt=2, st=0, wi=1 LOP 3.311588
10 Direct+Indirect dt=2, st=2, wi=0.5 Hillside 3.331698
Dominance Method Parameters Method MAE
1 Direct+Indirect dt=0, st=1, wi=1 Hillside 3.148221
2 Direct+Indirect dt=0, st=0, wi=1 Hillside 3.148221
3 Direct+Indirect dt=1, st=0, wi=1 LOP 3.172793
4 Direct+Indirect dt=1, st=1, wi=1 LOP 3.172793
5 Direct+Indirect dt=0, st=0, wi=0.25 Hillside 3.213978
6 Direct+Indirect dt=0, st=1, wi=0.25 Hillside 3.213978
7 Direct dt=2 Hillside 3.309455
8 Direct+Indirect dt=2, st=1, wi=1 LOP 3.311588
9 Direct+Indirect dt=2, st=0, wi=1 LOP 3.311588
10 Direct+Indirect dt=2, st=2, wi=0.5 Hillside 3.331698
[1]

Luke Finlay, Vladimir Gaitsgory, Ivan Lebedev. Linear programming solutions of periodic optimization problems: approximation of the optimal control. Journal of Industrial & Management Optimization, 2007, 3 (2) : 399-413. doi: 10.3934/jimo.2007.3.399

[2]

Braxton Osting, Jérôme Darbon, Stanley Osher. Statistical ranking using the $l^{1}$-norm on graphs. Inverse Problems & Imaging, 2013, 7 (3) : 907-926. doi: 10.3934/ipi.2013.7.907

[3]

Feng Luo. A combinatorial curvature flow for compact 3-manifolds with boundary. Electronic Research Announcements, 2005, 11: 12-20.

[4]

Ardeshir Ahmadi, Hamed Davari-Ardakani. A multistage stochastic programming framework for cardinality constrained portfolio optimization. Numerical Algebra, Control & Optimization, 2017, 7 (3) : 359-377. doi: 10.3934/naco.2017023

[5]

Diana Keller. Optimal control of a linear stochastic Schrödinger equation. Conference Publications, 2013, 2013 (special) : 437-446. doi: 10.3934/proc.2013.2013.437

[6]

Guillaume Bal, Wenjia Jing. Homogenization and corrector theory for linear transport in random media. Discrete & Continuous Dynamical Systems - A, 2010, 28 (4) : 1311-1343. doi: 10.3934/dcds.2010.28.1311

[7]

Nizami A. Gasilov. Solving a system of linear differential equations with interval coefficients. Discrete & Continuous Dynamical Systems - B, 2021, 26 (5) : 2739-2747. doi: 10.3934/dcdsb.2020203

[8]

Hong Seng Sim, Wah June Leong, Chuei Yee Chen, Siti Nur Iqmal Ibrahim. Multi-step spectral gradient methods with modified weak secant relation for large scale unconstrained optimization. Numerical Algebra, Control & Optimization, 2018, 8 (3) : 377-387. doi: 10.3934/naco.2018024

[9]

Abdulrazzaq T. Abed, Azzam S. Y. Aladool. Applying particle swarm optimization based on Padé approximant to solve ordinary differential equation. Numerical Algebra, Control & Optimization, 2021  doi: 10.3934/naco.2021008

[10]

Mohsen Abdolhosseinzadeh, Mir Mohammad Alipour. Design of experiment for tuning parameters of an ant colony optimization method for the constrained shortest Hamiltonian path problem in the grid networks. Numerical Algebra, Control & Optimization, 2021, 11 (2) : 321-332. doi: 10.3934/naco.2020028

[11]

Reza Lotfi, Yahia Zare Mehrjerdi, Mir Saman Pishvaee, Ahmad Sadeghieh, Gerhard-Wilhelm Weber. A robust optimization model for sustainable and resilient closed-loop supply chain network design considering conditional value at risk. Numerical Algebra, Control & Optimization, 2021, 11 (2) : 221-253. doi: 10.3934/naco.2020023

[12]

Alexander A. Davydov, Massimo Giulietti, Stefano Marcugini, Fernanda Pambianco. Linear nonbinary covering codes and saturating sets in projective spaces. Advances in Mathematics of Communications, 2011, 5 (1) : 119-147. doi: 10.3934/amc.2011.5.119

[13]

W. Cary Huffman. On the theory of $\mathbb{F}_q$-linear $\mathbb{F}_{q^t}$-codes. Advances in Mathematics of Communications, 2013, 7 (3) : 349-378. doi: 10.3934/amc.2013.7.349

[14]

Arunima Bhattacharya, Micah Warren. $ C^{2, \alpha} $ estimates for solutions to almost Linear elliptic equations. Communications on Pure & Applied Analysis, , () : -. doi: 10.3934/cpaa.2021024

[15]

Misha Bialy, Andrey E. Mironov. Rich quasi-linear system for integrable geodesic flows on 2-torus. Discrete & Continuous Dynamical Systems - A, 2011, 29 (1) : 81-90. doi: 10.3934/dcds.2011.29.81

[16]

Fumihiko Nakamura. Asymptotic behavior of non-expanding piecewise linear maps in the presence of random noise. Discrete & Continuous Dynamical Systems - B, 2018, 23 (6) : 2457-2473. doi: 10.3934/dcdsb.2018055

[17]

Hirofumi Notsu, Masato Kimura. Symmetry and positive definiteness of the tensor-valued spring constant derived from P1-FEM for the equations of linear elasticity. Networks & Heterogeneous Media, 2014, 9 (4) : 617-634. doi: 10.3934/nhm.2014.9.617

[18]

Jong Yoon Hyun, Yoonjin Lee, Yansheng Wu. Connection of $ p $-ary $ t $-weight linear codes to Ramanujan Cayley graphs with $ t+1 $ eigenvalues. Advances in Mathematics of Communications, 2021  doi: 10.3934/amc.2020133

 Impact Factor: 

Article outline

Figures and Tables

[Back to Top]