
# The rankability of weighted data from pairwise comparisons

* Corresponding author: Langville
In prior work, Anderson et al. introduced a new problem, the rankability problem, which refers to a dataset's inherent ability to produce a meaningful ranking of its items. Ranking is a fundamental data science task with numerous applications, including web search, data mining, cybersecurity, machine learning, and statistical learning theory. Yet little attention has been paid to the question of whether a dataset is suitable for ranking. As a result, when a ranking method is applied to a dataset with low rankability, the resulting ranking may not be reliable.

That prior work and its methods studied unweighted data, for which the dominance relations are binary, i.e., an item either dominates or is dominated by another item. In this paper, we extend rankability methods to weighted data, for which an item may dominate another by any finite amount. We present combinatorial approaches to a weighted rankability measure and apply our new measure to several weighted datasets.
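To make the distinction between binary and weighted dominance concrete, the following minimal sketch builds both kinds of dominance matrix from a handful of pairwise results. The game list, the function name `dominance_matrices`, and the choice of point differential as the weight are illustrative assumptions, not the construction used in the paper.

```python
def dominance_matrices(games, n):
    """Return (binary, weighted) n x n dominance matrices as nested lists.

    games: iterable of (i, j, score_i, score_j), item indices in range(n).
    Assumption: the weight of a win is the point differential; the paper
    may define weights differently.
    """
    binary = [[0] * n for _ in range(n)]
    weighted = [[0.0] * n for _ in range(n)]
    for i, j, score_i, score_j in games:
        if score_i == score_j:
            continue  # ties carry no dominance information in this sketch
        winner, loser = (i, j) if score_i > score_j else (j, i)
        binary[winner][loser] = 1  # unweighted: winner dominates loser
        weighted[winner][loser] += abs(score_i - score_j)  # weighted: by how much
    return binary, weighted


# Hypothetical toy data: three items, three pairwise comparisons.
games = [(0, 1, 30, 20), (1, 2, 14, 7), (0, 2, 21, 24)]
B, D = dominance_matrices(games, 3)
```

In the binary matrix every win looks the same, whereas the weighted matrix records that item 0 beat item 1 by a wider margin than item 2 beat item 0. That extra information is what a weighted rankability measure can take into account.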

Mathematics Subject Classification: Primary: 90C08, 90C10, 52B12; Secondary: 90C35.

Figure 1.  Cityplot of $8 \times 8$ data matrix with original ordering and hillside reordering

Figure 2.  Cityplots of $n = 8$ college football data matrices with the original ordering (left) and the optimal hillside reordering (right). The top row is the 2008 season, a less rankable season with hillside $\delta = 155$ and $\rho = 6$. The bottom row is the 2005 season, a more rankable season with hillside $\delta = 92$ and $\rho = 4$

Figure 3.  Approximate fractional matrix ${\bf X}^*({\bf r}, {\bf r})$ for Example 1 obtained by the Interior Point solver of the linear programming relaxation

Figure 4.  Spaghetti plots and summary of diversity of $P$ sets for Examples 1 and 2

Figure 5.  Two maximally discordant optimal solutions for Examples 1 and 2

Figure 6.  $X^*$ color-coded visualization of years 2009, 2013, and 2016 NCAA March Madness using the top performing LOP formulation

Figure 7.  $X^*$ color-coded visualization of years 2009, 2013, and 2016 NCAA March Madness using the top performing hillside formulation

Figure 8.  $X^*$ color-coded visualization for each year of NCAA March Madness using the top performing LOP formulation

Figure 9.  $X^*$ color-coded visualization for each year of NCAA March Madness using the top performing hillside formulation

Table 1.  Sample Input/Output economic data based on Japan 2005 (A) and its graphical representation in (B)

Table 2.  Sample of movie rating data based on MovieLens (A) and graphical representation of user rating data transformed into pairwise comparisons (B)

Table 3.  Sample of NCAA Men's Basketball games is shown in (A) and the graphical representation of the aggregate dominance information, i.e., ${\bf D}$ is shown in (B)

Table 4.  5 fold cross-validation results for predicting the upset measure for NCAA Men's March Madness 2002-2018

|    | Dominance Method | Parameters | Method | MAE |
|----|------------------|------------|--------|-----|
| 1  | Direct+Indirect | dt=0, st=1, wi=1 | Hillside | 3.148221 |
| 2  | Direct+Indirect | dt=0, st=0, wi=1 | Hillside | 3.148221 |
| 3  | Direct+Indirect | dt=1, st=0, wi=1 | LOP | 3.172793 |
| 4  | Direct+Indirect | dt=1, st=1, wi=1 | LOP | 3.172793 |
| 5  | Direct+Indirect | dt=0, st=0, wi=0.25 | Hillside | 3.213978 |
| 6  | Direct+Indirect | dt=0, st=1, wi=0.25 | Hillside | 3.213978 |
| 7  | Direct | dt=2 | Hillside | 3.309455 |
| 8  | Direct+Indirect | dt=2, st=1, wi=1 | LOP | 3.311588 |
| 9  | Direct+Indirect | dt=2, st=0, wi=1 | LOP | 3.311588 |
| 10 | Direct+Indirect | dt=2, st=2, wi=0.5 | Hillside | 3.331698 |