Rank aggregation methods for the Web

Cynthia Dwork1, Ravi Kumar2, Moni Naor3, D. Sivakumar2
1: Compaq Systems Research Center, Palo Alto, CA, USA.
2: IBM Almaden Research Center, San Jose, CA, USA.
3: Weizmann Institute of Science, Rehovot, Israel.
(Visiting IBM Almaden and Stanford University)
 

Abstract

We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. We develop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that can effectively combat "spam," a serious problem in Web searches. Experiments show that our methods are simple, efficient, and effective.

Keywords: rank aggregation, ranking functions, meta-search, multi-word queries, spam


1 Introduction

The task of ranking a list of several alternatives based on one or more criteria is encountered in many situations. One of the underlying goals of this endeavor is to identify the best alternatives, either to simply declare them to be the best (e.g., in sports) or to employ them for some purpose. When there is just a single criterion (or "judge") for ranking, the task is relatively easy, and is simply a reflection of the judge's opinions and biases. (If simplicity were the only desideratum, dictatorship would prevail over democracy.) In contrast, this paper addresses the problem of computing a "consensus" ranking of the alternatives, given the individual ranking preferences of several judges. We call this the rank aggregation problem. Specifically, we study the rank aggregation problem in the context of the Web, where it is complicated by a plethora of issues. We begin by underscoring the importance of rank aggregation for Web applications and clarifying the various characteristics of this problem in the context of the Web. We provide the theoretical underpinnings for stating criteria for "good" rank aggregation techniques and evaluating specific proposals, and we offer novel algorithmic solutions. Our experiments provide initial evidence for the success of our methods, which we believe will significantly improve a variety of search applications on the Web.

1.1 Motivation

As of February 2001, there were at least 24 general-purpose search engines (see Search Engine Watch), as well as numerous special-purpose search engines. The very fact that there are so many choices is an indication that no single search engine has proven to be satisfactory for all Web users. There are a number of good reasons why this is the case, even if we restrict attention to search engines that are meant to be "general purpose." Two fairly obvious reasons are that no one ranking algorithm can be considered broadly acceptable and no one search engine is sufficiently comprehensive in its coverage of the Web. The issues, however, are somewhat deeper.

Firstly, there is the question of "spam" --- devious manipulation by authors of Web pages in an attempt to achieve undeservedly high rank. No single ranking function can be trusted to perform well for all queries. A few years ago, query term frequency was the single main heuristic in ranking Web pages; since the influential work of Kleinberg [Kleinberg 1998, 1999] and Brin and Page [Brin and Page 1998], link analysis has come to be identified as a very powerful technique in ranking Web pages and other hyperlinked documents. Several other heuristics have been added, including anchor-text analysis [Chakrabarti et al. 1998], page structure (headers, etc.) analysis, the use of keyword listings and the url text itself, etc. These well-motivated heuristics exploit a wealth of information, but are often prone to manipulation by devious parties.

Secondly, in a world governed by (frequently changing) commercial interests and alliances, it is not clear that users have any form of protection against the biases/interests of individual search engines. As a case in point, note that paid placement and paid inclusion (see [Article in Search Engine Watch]) appear to be gaining popularity among search engines.

In some cases, individual ranking functions are inadequate for a more fundamental reason: the data being ranked are simply not amenable to simple ranking functions. This is the case with querying about multimedia documents, e.g. "find a document that has information about Greek islands with pictures of beautiful blue beaches." This is a problem conventionally studied in database middleware (see [Fagin 1999]). Several novel approaches have been invented for this purpose, but this problem cannot be considered well-solved by any measure. Naturally, these problems fall under the realm of rank aggregation.

Thus, our first motivation for studying rank aggregation in the context of the Web is to provide users a certain degree of robustness of search, in the face of various shortcomings and biases --- malicious or otherwise --- of individual search engines. That is, to find robust techniques for meta-search.

There is a second very broad set of scenarios where rank aggregation is called for. Roughly described, these are the cases where the user preference includes a variety of criteria, and the logic of classifying a document as acceptable or unacceptable is too complicated or too nebulous to encode in any simple query form. As prototypical examples, we list some cases that Web users experience frequently. Broadly, these can be classified as multi-criteria selection and word association queries. Examples of multi-criteria selection arise when trying to choose a product from a database of products, such as restaurants or travel plans. Examples of word association queries arise when a user wishes to search for a good document on a topic; the user knows a list of keywords that collectively describe the topic, but isn't sure that the best document on the topic necessarily contains all of them. (See Section 5 for specific examples of both categories.) This is a very familiar dilemma for Web search users: when we supply a list of keywords to a search engine, do we ask for documents that contain all the keywords, or do we ask for documents that contain any of the keywords? Notice that the former may produce no useful document, or too few of them, while the latter may produce an enormous list of documents where it is not clear which one to choose as the best. We propose the following natural approach to this problem:

Associations Ranking
Rank the database with respect to several small subsets of the queries, and aggregate these rankings.

1.2 Challenges

The ideal scenario for rank aggregation is when each judge (search engine in the case of meta-search, individual criterion for multi-criteria selection, and subsets of queries in the case of word association queries) gives a complete ordering of all the alternatives in the universe of alternatives. This, however, is far too unrealistic for two main reasons.

The first reason is a particularly acute problem in doing meta-search: the coverage of the various search engines differs; it is unlikely that all search engines will (eventually) be capable of ranking the entire collection of pages on the Web, which is growing at a very high rate. Secondly, search engines routinely limit access to roughly the first few hundred pages in their rank-ordering. This is done both to ensure the confidentiality of their ranking algorithms, and in the interest of efficiency. The issue of efficiency is also a serious bottleneck in performing rank aggregation for multi-criteria selection and word association queries.

Therefore, any method for rank aggregation for Web applications must be capable of dealing with the fact that only the top few hundred entries of each ranking are available. Of course, if there is absolutely no overlap among these entries, there isn't much any algorithm can do; the challenge is to design rank aggregation algorithms that work when there is limited but non-trivial overlap among the top few hundreds or thousands of entries in each ranking. Finally, in light of the amount of data, it is implicit that any rank aggregation method has to be computationally efficient.

1.3 Our results

We provide a mathematical setting in which to study the rank aggregation problem, and propose several algorithms. By drawing on the literature from social choice theory, statistics, and combinatorial optimization, we formulate precisely what it means to compute a good consensus ordering of the alternatives, given several (partial) rankings of the alternatives. Specifically, we identify the method of Kemeny, originally proposed in the context of social choice theory, as an especially desirable approach, since it minimizes the total disagreement (formalized below) between the several input rankings and their aggregation. Unfortunately, we show that computing optimal solutions based on Kemeny's approach is NP-hard, even when the number of rankings to be aggregated is only 4. Therefore, we provide several heuristic algorithms for rank aggregation and evaluate them in the context of Web applications. Besides the heuristics, we identify a crucial property of Kemeny optimal solutions that is particularly useful in combating spam, and provide an efficient algorithm for minimally modifying any initial aggregation so as to enjoy this property. This property is called the "extended Condorcet criterion," and we call the efficient process that is guaranteed to achieve it "local Kemenization."

Our algorithms for initial aggregation are based on two broad principles. The first principle is to achieve optimality not with respect to the Kemeny guidelines, but with respect to a different, closely related, measure, for which it is possible to find an efficient solution. The second principle is through the use of Markov chains as a means of combining partial comparison information --- derived from the individual rankings --- into a total ordering. While there is no guarantee on the quality of the output, the latter methods are extremely efficient, and usually match or outperform the first method.

We report experiments and quantitative measures of quality for the meta-search problem, and give several illustrations of our methods applied for the problems of spam resistance and word association queries.

1.4 Organization

We describe our framework, including the notions of ranking, distance measures, and optimal aggregation, in Section 2. This section also contains a brief description of the concepts from graph theory and Markov chains that we need in this paper. Section 3 discusses spam, the extended Condorcet principle, and local Kemenization. Section 4 describes various rank aggregation methods, including the well-known Borda method and several new methods. Section 5 presents five major applications of our methods and Section 6 presents an experimental study of some of them. Finally, Section 7 concludes the paper with some remarks on future work.

2 Preliminaries

2.1 Ranking

Given a universe U, an ordered list (or simply, a list) t with respect to U is an ordering (aka ranking) of a subset S of U, i.e., t = [x1 > x2 > ... > xd], with each xi in S, where > is some ordering relation on S. Also, if i in U is present in t, let t(i) denote the position or rank of i (a highly ranked or preferred element has a low-numbered position in the list). For a list t, let |t| denote the number of elements in t. By assigning a unique identifier to each element in U, we may assume without loss of generality that U = {1, 2, ..., |U|}.

Depending on the kind of information present in t, three situations arise:

(1) If t contains all the elements in U, then it is said to be a full list. Full lists are in fact total orderings (permutations) of U. For instance, if U is the set of all pages indexed by a search engine, it is easy to see that a full list emerges when we rank pages (say, with respect to a query) according to a fixed algorithm.

(2) There are situations where full lists are not convenient or even possible. For instance, let U denote the set of all Web pages in the world. Let t denote the results of a search engine in response to some fixed query. Even though the query might induce a total ordering of the pages indexed by the search engine, since the index set of the search engine is almost surely only a subset of U, we have a strict inequality |t| < |U|. In other words, there are pages in the world which are unranked by this search engine with respect to the query. Such lists that rank only some of the elements in U are called partial lists.

(3) A special case of partial lists is the following. If S is the set of all the pages indexed by a particular search engine and if t corresponds to the top 100 results of the search engine with respect to a query, clearly the pages not present in the list t can be assumed to be ranked below 100 by the search engine. Such lists, which rank only a subset of S and where it is implicit that each ranked element is above all unranked elements, are called top d lists, where d is the size of the list. A natural operation of projection will be useful. Given a list t and a subset T of the universe U, the projection of t with respect to T (denoted t|T) is a new list that contains only the elements of T. Notice that if t happens to contain all the elements in T, then t|T is a full list with respect to T.

2.1.1 Distance Measures

How do we measure the distance between two full lists with respect to a set S? Two popular distance measures (see [Diaconis 1988]) are: (1) The Spearman footrule distance is the sum, over all elements i in S, of the absolute difference between the rank of i according to the two lists. Formally, given two full lists s and t, their Spearman footrule distance is given by
F(s, t) = Σ_{i in S} |s(i) - t(i)|.
After dividing this number by its maximum possible value, (1/2)|S|^2, one obtains a normalized value of the footrule distance, which is always between 0 and 1. The footrule distance between two lists can be computed in linear time.

(2) The Kendall tau distance counts the number of pairwise disagreements between two lists; that is, the distance between two full lists s and t is

K(s, t) = |{(i, j) : i < j, s(i) < s(j) but t(i) > t(j)}|.
Dividing this number by its maximum possible value, (1/2)|S|(|S| - 1), we obtain a normalized version of the Kendall distance. The Kendall distance for full lists is the "bubble sort" distance, i.e., the number of pairwise adjacent transpositions needed to transform one list into the other. The Kendall distance between two lists of length n can be computed in O(n log n) time using simple data structures.
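To make these two measures concrete, here is a minimal sketch (illustrative, not from our implementation). A full list is represented as a Python list of elements ordered best-first, and positions are 1-based; the Kendall computation below is the straightforward quadratic one, favoring clarity over the O(n log n) method mentioned above.

```python
from itertools import combinations

def positions(lst):
    """Map each element to its 1-based rank (best = 1)."""
    return {x: r for r, x in enumerate(lst, start=1)}

def footrule(s, t):
    """Spearman footrule: sum over all elements of |s(i) - t(i)|."""
    ps, pt = positions(s), positions(t)
    return sum(abs(ps[i] - pt[i]) for i in ps)

def kendall(s, t):
    """Kendall tau: number of pairs that s and t order differently."""
    ps, pt = positions(s), positions(t)
    return sum(1 for i, j in combinations(ps, 2)
               if (ps[i] < ps[j]) != (pt[i] < pt[j]))

s, t = ["a", "b", "c"], ["b", "a", "c"]
print(footrule(s, t), kendall(s, t))   # 2 1; note K <= F <= 2K (Proposition 1 below)
```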

The above measures are metrics and extend in a natural way to several lists. Given several full lists s, t1, ..., tk, for instance, the normalized Footrule distance of s to t1, ..., tk is given by

F(s, t1, ..., tk) = (1/k) Σ_i F(s, ti).
One can define generalizations of these distance measures to partial lists. If t1, ..., tk are partial lists, let U denote the union of elements in t1, ..., tk, and let s be a full list with respect to U. Now, given s, the idea is to consider the distance between ti and the projection of s with respect to ti. Then, for instance, we have the induced footrule distance:
F(s, t1, ..., tk) = (1/k) Σ_i F(s|ti, ti).
In a similar manner, induced Kendall tau distance can be defined. Finally, we define a third notion of distance that measures the distance between a full list and a partial list on the same universe:

(3) Given one full list and a partial list, the scaled footrule distance weights contributions of elements based on the length of the lists they are present in. More formally, if s is a full list and t is a partial list, then:

SF(s, t) = Σ_{i in t} | s(i)/|s| - t(i)/|t| |.
We will normalize SF by dividing by |t|/2.
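Under the same list representation as before, a sketch of the scaled footrule distance between a full list s and a partial list t (illustrative, following the definition above):

```python
def scaled_footrule(s, t):
    """Scaled footrule distance SF(s, t), normalized by |t|/2."""
    ps = {x: r for r, x in enumerate(s, start=1)}   # full list positions
    pt = {x: r for r, x in enumerate(t, start=1)}   # partial list positions
    raw = sum(abs(ps[i] / len(s) - pt[i] / len(t)) for i in pt)
    return raw / (len(t) / 2)
```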

Note that these distances are not necessarily metrics.

To a large extent, our interpretations of the experimental results will be in terms of these distance measures. While these distance measures seem natural, the question of why they are good measures is open to debate. We do not delve into such discussions here; the interested reader can find such arguments in the books by Diaconis [Diaconis 1988], Critchlow [Critchlow 1985], or Marden [Marden].

2.1.2 Optimal rank aggregation

In the generic context of rank aggregation, the notion of "better" depends on what distance measure we strive to optimize. Suppose we wish to optimize the Kendall distance; the question then is: given (full or partial) lists t1, ..., tk, find a full list s with respect to the union of the elements of t1, ..., tk that minimizes K(s, t1, ..., tk). The aggregation obtained by optimizing the Kendall distance is called Kemeny optimal aggregation and, in a precise sense, corresponds to the geometric median of the inputs. We show that computing a Kemeny optimal aggregation is NP-hard even when k = 4 (see Appendix B). (Note that in contrast to the social choice scenario, where there are many voters and relatively few candidates, in the Web aggregation scenario we have many candidates (pages) and relatively few voters (the search engines).)
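Since the optimization is NP-hard, exhaustive search over all permutations is feasible only for a handful of alternatives; still, the following sketch (illustrative only, not a practical algorithm) makes the optimization criterion concrete:

```python
from itertools import combinations, permutations

def kendall(s, t):
    """Kendall distance between two full lists over the same elements."""
    ps = {x: r for r, x in enumerate(s)}
    pt = {x: r for r, x in enumerate(t)}
    return sum(1 for i, j in combinations(ps, 2)
               if (ps[i] < ps[j]) != (pt[i] < pt[j]))

def kemeny_optimal(lists):
    """Brute-force Kemeny optimal aggregation of full lists."""
    universe = set().union(*lists)
    return min(permutations(universe),
               key=lambda s: sum(kendall(s, t) for t in lists))

print(kemeny_optimal([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]))
# ('a', 'b', 'c'): total Kendall distance 2 to the three input lists
```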

Kemeny optimal aggregations have a maximum likelihood interpretation. Suppose there is an underlying "correct" ordering s of S, and each of the rankings t1, ..., tk is obtained from s by swapping pairs of elements with some probability less than 1/2. Thus, the t's are "noisy" versions of s. A Kemeny optimal aggregation of t1, ..., tk is one that is maximally likely to have produced the t's (it need not be unique) [Young 1988]. Viewed differently, Kemeny optimal aggregation has the property of eliminating noise from various different ranking schemes. Furthermore, Kemeny optimal aggregations are essentially the only ones that simultaneously satisfy natural and important properties of rank aggregation functions, called neutrality and consistency in the social choice literature, and the so-called Condorcet property [Young and Levenglick 1978]. Indeed, Kemeny optimal aggregations satisfy the extended Condorcet criterion. In Section 3 we establish a strong connection between satisfaction of the extended Condorcet criterion and fighting search engine "spam."

Given that Kemeny optimal aggregation is useful, but computationally hard, how do we compute it? The following relation shows that Kendall distance can be approximated very well via the Spearman footrule distance.

Proposition 1 [Diaconis-Graham]
For any two full lists s, t: K(s, t) ≤ F(s, t) ≤ 2K(s, t).
This leads us to the problem of footrule optimal aggregation. This is the same as before, except that the optimizing criterion is the footrule distance. In Section 4 we exhibit a polynomial time algorithm to compute optimal footrule aggregation (scaled footrule aggregation for partial lists). Therefore we have:
Proposition 2
If s is the Kemeny optimal aggregation of full lists t1, ..., tk, and s' optimizes the footrule aggregation, then K(s', t1, ..., tk) ≤ 2K(s, t1, ..., tk).
Later, in Section 4, we develop rank aggregation methods that do not optimize any obvious criteria, but turn out to be very effective in practice.

2.2 Basic Notions

Readers familiar with the notions in graph theory and Markov chains can skip this section.

2.2.1 Some concepts from graph theory

A graph G = (V, E) consists of a set of nodes V and a set of edges E. Each element e in E is an unordered pair (u, v) of incident nodes, representing a connection between nodes u and v. A graph is connected if its node set cannot be partitioned into two non-empty parts such that no edge has one incident node in each part.

A bipartite graph G = (U, V, E) consists of two disjoint sets of nodes U, V such that each edge e in E has one node from U and the other node from V. A bipartite graph is complete if each node in U is connected to every node in V. A matching is a subset of edges such that no two edges in the matching share a node. A maximum matching is a matching of largest cardinality. A weighted graph is a graph with a (non-negative) weight for every edge e. Given a weighted graph, the minimum weight maximum matching is a maximum matching of minimum weight. The minimum weight maximum matching problem for bipartite graphs can be solved in time O(n^2.5), where n is the number of nodes.

A directed graph consists of nodes and edges, but this time an edge is an ordered pair of nodes (u, v), representing a connection from u to v. A directed path from u to v is a sequence of nodes u = w0, ..., wk = v such that (wi, wi+1) is an edge for all i = 0, ..., k - 1. A directed cycle is a non-trivial directed path from a node to itself. A strongly connected component of a graph is a maximal set of nodes such that for every pair of nodes in the component, there is a directed path from one to the other. A directed acyclic graph (DAG) is a directed graph with no directed cycles. In a DAG, a sink node is one with no outgoing edges (equivalently, no directed path to any other node).

2.2.2 Markov chains

A (homogeneous) Markov chain for a system is specified by a set of states S = {1, 2, ..., n} and an n by n non-negative, stochastic (i.e., the sum of each row is 1) matrix M. The system begins in some start state in S and at each step moves from one state to another state. This transition is guided by M: at each step, if the system is in state i, it moves to state j with probability Mij. If the current state is given as a probability distribution, the probability distribution of the next state is given by the product of the vector representing the current state distribution and M. In general, the start state of the system is chosen according to some distribution x (usually, the uniform distribution) on S. After t steps, the state of the system is distributed according to xM^t. Under some niceness conditions on the Markov chain (whose details we will not discuss), irrespective of the start distribution x, the system eventually reaches a unique fixed point where the state distribution does not change. This distribution is called the stationary distribution. It can be shown that the stationary distribution is given by the principal left eigenvector y of M, i.e., yM = y (for a stochastic matrix, the principal eigenvalue is 1). In practice, a simple power-iteration algorithm can quickly obtain a reasonable approximation to y.
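For concreteness, here is a generic power-iteration sketch (assuming numpy; the matrix below is an arbitrary stochastic example, not one of the chains proposed later):

```python
import numpy as np

def stationary(M, iters=100):
    """Approximate the stationary distribution y (yM = y) by power iteration."""
    y = np.ones(M.shape[0]) / M.shape[0]    # start from the uniform distribution
    for _ in range(iters):
        y = y @ M                           # one step of the chain
    return y

M = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.0, 0.4, 0.6]])             # rows sum to 1 (stochastic)
y = stationary(M)
print(np.argsort(-y))                       # states sorted by stationary mass
```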

An important observation here is that the entries in y define a natural ordering on S. We call such an ordering the Markov chain ordering of M. A technical point to note while using Markov chains for ranking is the following. A Markov chain M defines a weighted graph with n nodes such that the weight on edge (u, v) is given by Muv. The strongly connected components of this graph form a DAG. If this DAG has a sink node, then the stationary distribution of the chain will be entirely concentrated in the strongly connected component corresponding to the sink node. In this case, we only obtain an ordering of the alternatives present in this component; if this happens, the natural extended procedure is to remove these states from the chain and repeat the process to rank the remaining nodes. Of course, if this component has sufficiently many alternatives, one may stop the aggregation process and output a partial list containing some of the best alternatives. If the DAG of connected components is (weakly) connected and has more than one sink node, then we will obtain two or more clusters of alternatives, which we could sort by the total probability mass of the components. If the DAG has several weakly connected components, we will obtain incomparable clusters of alternatives. Thus, when we refer to a Markov chain ordering, we refer to the ordering obtained by this extended procedure.

3 Spam resistance and Condorcet Criteria

In 1785, Marie J. A. N. Caritat, Marquis de Condorcet, proposed that if there is some element of S, now known as the Condorcet alternative, that defeats every other in pairwise simple majority voting, then this element should be ranked first [Condorcet 1785]. A natural extension, due to Truchon [Truchon 1998] (see also [Smith 1973]), mandates that if there is a partition (C, D) of S such that for any x in C and y in D the majority prefers x to y, then every such x must be ranked above every such y. This is called the extended Condorcet criterion (ECC). We will show that not only can the ECC be achieved efficiently, but it also has excellent "spam-fighting" properties when used in the context of meta-search.

Intuitively, a search engine has been spammed by a page in its index, on a given query, if it ranks the page "too highly" with respect to other pages in the index, in the view of a "typical" user. Indeed, in accord with this intuition, search engines are both rated (see [Media Metrix]) and trained by human evaluators. This approach to defining spam: (1) permits an author to raise the rank of her page by improving the content; (2) puts ground truth about the relative value of pages into the purview of the users --- in other words, the definition does not assume the existence of an absolute ordering that yields the "true" relative value of a pair of pages on a query; (3) does not assume unanimity of users' opinions or consistency among the opinions of a single user; and (4) suggests some natural ways to automate training of engines to incorporate useful biases, such as geographic bias.

We believe that reliance on evaluators in defining spam is unavoidable. (If the evaluators are human, the typical scenario during the design and training of search engines, then the eventual product will incorporate the biases of the training evaluators.) We model the evaluators by the search engine ranking functions. That is, we make the simplifying assumption that for any pair of pages, the relative ordering by the majority of the search engines comparing them is the same as the relative ordering by the majority of the evaluators. Our intuition is that if a page spams all or even most search engines for a particular query, then no combination of these search engines can defeat the spam. This is reasonable: Fix a query; if for some pair of pages a majority of the engines is spammed, then the aggregation function is working with overly bad data --- garbage in, garbage out. On the other hand, if a page spams strictly fewer than half the search engines, then a majority of the search engines will prefer a "good" page to a spam page. In other words, under this definition of spam, the spam pages are the Condorcet losers, and will occupy the bottom partition of any aggregated ranking that satisfies the extended Condorcet criterion. Similarly, assuming that good pages are preferred by the majority to mediocre ones, these will be the Condorcet winners, and will therefore be ranked highly.

Many of the existing aggregation methods (see Section 4) do not ensure the election of the Condorcet winner, should one exist. Our aim is to obtain a simple method of modifying any initial aggregation of input lists so that the Condorcet losers (spam) will be pushed to the bottom of the ranking during this process. This procedure is called local Kemenization and is described next.

3.1 Local Kemenization

We introduce the notion of a locally Kemeny optimal aggregation, a relaxation of Kemeny optimality that ensures satisfaction of the extended Condorcet principle and yet remains computationally tractable. As the name implies, local Kemeny optimality is a "local" notion that possesses some of the properties of a Kemeny optimal aggregation.

A full list p is a locally Kemeny optimal aggregation of partial lists t1, t2, ..., tk if there is no full list p' that can be obtained from p by performing a single transposition of an adjacent pair of elements and for which K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk). In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.

Every Kemeny optimal aggregation is also locally Kemeny optimal, but the converse is false. Nevertheless, we show that a locally Kemeny optimal aggregation satisfies the extended Condorcet property and can be computed (see Appendix A) in time O(kn log n).

We have discussed the value of the extended Condorcet criterion in increasing resistance to search engine spam and in ensuring that elements in the top partitions remain highly ranked. However, specific aggregation techniques may add considerable value beyond simple satisfaction of this criterion; in particular, they may produce good rankings of alternatives within a given partition (as noted above, the extended Condorcet criterion gives no guidance within a partition). We now show how, using any initial aggregation m of partial lists t1, t2, ..., tk --- one that is not necessarily Condorcet --- we can efficiently construct a locally Kemeny optimal aggregation of the t's that is in a well-defined sense maximally consistent with m. For example, if the t's are full lists then m could be the Borda ordering on the alternatives (see Section 4.1 for Borda's method). Even if a Condorcet winner exists, the Borda ordering may not rank it first. However, by applying our "local Kemenization" procedure (described below), we can obtain a ranking that is maximally consistent with the Borda ordering but in which the Condorcet winners are at the top of the list.

A local Kemenization (LK) of a full list m with respect to t1, ..., tk is a procedure that computes a locally Kemeny optimal aggregation of t1, ..., tk that is (in a precise sense) maximally consistent with m. Intuitively, this approach also preserves the strengths of the initial aggregation m. Thus:

(1) the Condorcet losers receive low rank, while the Condorcet winners receive high rank (this follows from local Kemeny optimality).

(2) the result disagrees with m on the order of any given pair (i,j) of elements only if a majority of those t's expressing opinions disagrees with m on (i,j).

(3) for every d between 1 and |m|, the length d prefix of the output is a local Kemenization of the top d elements in m.

Thus, if m is an initial meta-search result, and we have some faith that the top, say, 100 elements of m contain enough good pages, then we can build a locally Kemeny optimal aggregation of the projections of the t's onto the top 100 elements in m.

The local Kemenization procedure is a simple inductive construction. Without loss of generality, let m = (1, 2, ..., |m|). Assume inductively that we have constructed p, a local Kemenization of the projection of the t's onto the elements 1, ..., l-1. Insert element x = l into the lowest-ranked "permissible" position in p: just below the lowest-ranked element y in p such that (a) no majority among the (original) t's prefers x to y, and (b) for every successor z of y in p, some majority prefers x to z. In other words, we try to insert x at the end (bottom) of the list p, and we bubble it up toward the top of the list as long as a majority of the t's insists that we do.
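In code, the construction looks roughly as follows (an illustrative sketch, not our implementation; it runs in O(kn^2) time rather than the O(kn log n) achievable with more care). Lists are best-first, and a partial list simply omits the elements it does not rank:

```python
def majority_prefers(x, y, lists):
    """True iff a majority of the lists ranking both x and y place x above y."""
    above = below = 0
    for t in lists:
        if x in t and y in t:
            if t.index(x) < t.index(y):
                above += 1
            else:
                below += 1
    return above > below

def local_kemenize(m, lists):
    """Locally Kemeny optimal aggregation maximally consistent with m."""
    p = []
    for x in m:                 # insert elements in the order given by m
        p.append(x)             # start at the bottom of p ...
        i = len(p) - 1
        # ... and bubble x up while a majority insists it beats the element above
        while i > 0 and majority_prefers(x, p[i - 1], lists):
            p[i - 1], p[i] = p[i], p[i - 1]
            i -= 1
    return p
```

For instance, if borda_order is an initial Borda aggregation of lists, then local_kemenize(borda_order, lists) post-processes it so that Condorcet winners rise to the top.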

A rigorous treatment of local Kemeny optimality and local Kemenization is given in Appendix A, where we also show that the local Kemenization of an aggregation is unique. On the strength of these results we suggest the following general approach to rank aggregation:

Given t1, ..., tk, use your favorite aggregation method to obtain a full list m.
Output the (unique) local Kemenization of m with respect to t1, ..., tk.

4 Rank aggregation methods

4.1 Borda's method

Borda's method [Borda 1781] is a "positional" method, in that it assigns a score corresponding to the positions in which a candidate appears within each voter's ranked list of preferences, and the candidates are sorted by their total score. A primary advantage of positional methods is that they are computationally very easy: they can be implemented in linear time. They also enjoy the properties called anonymity, neutrality, and consistency in the social choice literature [Young 1974]. However, they cannot satisfy the Condorcet criterion. In fact, it is possible to show that no method that assigns weights to each position and then sorts the results by applying a function to the weights associated with each candidate satisfies the Condorcet criterion (see [Young 1974]).

Full lists. Given full lists t1, ..., tk, for each candidate c in S and each list ti, Borda's method first assigns a score Bi(c) = the number of candidates ranked below c in ti; the total Borda score is B(c) = Σ_i Bi(c). The candidates are then sorted in decreasing order of total Borda score.
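A straightforward sketch of Borda's method for full lists (illustrative; positions are 0-based here, so the per-list score of a candidate at position pos is n - 1 - pos):

```python
def borda(lists):
    """Sort candidates by total Borda score."""
    score = {c: 0 for c in lists[0]}        # full lists share the same elements
    for t in lists:
        for pos, c in enumerate(t):
            score[c] += len(t) - 1 - pos    # number of candidates ranked below c
    return sorted(score, key=lambda c: -score[c])

print(borda([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]))
# ['a', 'b', 'c'] with scores a = 5, b = 3, c = 1
```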

We remark that Borda's method can be thought of as assigning a k-element position vector to each candidate (the positions of the candidate in the k lists), and sorting the candidates by the L1 norm of these vectors. Of course, there are plenty of other possibilities with such position vectors: sorting by Lp norms for p > 1, sorting by the median of the k values, sorting by the geometric mean of the k values, etc. This intuition leads us to several Markov chain based approaches, described in Section 4.3.

Partial lists. It has been proposed (e.g., in a recent article that appeared in The Economist [Saari, 2000]) that the right way to extend Borda to partial lists is by apportioning all the excess score equally among all unranked candidates. This idea stems from the goal of being unbiased; however, it is easy to show that for any method of assigning scores to unranked candidates, there are partial information cases in which undesirable outcomes occur.

4.2 Footrule and scaled footrule

Since the footrule optimal aggregation is a good approximation of Kemeny optimal aggregation (by Proposition 2), it merits investigation.

Full lists. Footrule optimal aggregation is related to the median of the values in a position vector:

Proposition 3
Given full lists t1, ..., tk, if the median positions of the candidates in the lists form a permutation, then this permutation is a footrule optimal aggregation.
Now we obtain an algorithm for footrule optimal aggregation via the following proposition:
Proposition 4
Footrule optimal aggregation of full lists can be computed in polynomial time, specifically, the time to find a minimum cost perfect matching in a bipartite graph.
Proof (Sketch):
Let the union of t1, ..., tk be S, with n elements. Now, we define a weighted complete bipartite graph (C, P, W) as follows. The first set of nodes C = {1, ..., n} denotes the set of elements to be ranked (pages). The second set of nodes P = {1, ..., n} denotes the n available positions. The weight W(c, p) is the total footrule distance (from the ti's) of a ranking that places element c at position p, given by W(c, p) = Σ_i |ti(c) - p|. It can be shown that a permutation minimizing the total footrule distance to the ti's is given by a minimum cost perfect matching in this bipartite graph.
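A sketch of this reduction (illustrative, assuming scipy's linear_sum_assignment routine for the minimum cost perfect matching):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def footrule_optimal(lists):
    """Footrule optimal aggregation of full lists via min-cost matching."""
    elems = lists[0]                       # full lists over the same element set
    n = len(elems)
    pos = [{x: r for r, x in enumerate(t, start=1)} for t in lists]
    # W[c][p]: total footrule cost of placing element c at position p+1
    W = np.array([[sum(abs(pi[c] - (p + 1)) for pi in pos) for p in range(n)]
                  for c in elems])
    rows, cols = linear_sum_assignment(W)  # minimum cost perfect matching
    order = [None] * n
    for c, p in zip(rows, cols):
        order[p] = elems[c]
    return order

print(footrule_optimal([["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]))
# ['a', 'b', 'c']
```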
Partial lists. The computation of a footrule-optimal aggregation for partial lists is more problematic. In fact, it can be shown to be equivalent to the NP-hard problem of computing the minimum number of edges to delete to convert a directed graph into a DAG.

Keeping in mind that footrule optimal aggregation for full lists can be recast as a minimum cost bipartite matching problem, we now describe a method that retains the computational advantages of the full list case and is reasonably close to it in spirit. We define the bipartite graph as before, except that the weights are defined differently. The weight W(c, p) is the scaled footrule distance (from the ti's) of a ranking that places element c at position p, given by W(c, p) = Σ_i | ti(c)/|ti| - p/n |. As before, we solve the minimum cost maximum matching problem on this bipartite graph to obtain a footrule aggregation algorithm for partial lists. We call this method scaled footrule aggregation (SFO).
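Only the cost matrix changes for SFO. A sketch of the scaled weight, with lists the partial lists, n the size of their union, and positions 1-based as before (illustrative, following the definition above; plugging this into the matching sketch above, with elems taken as the union of the partial lists, yields the SFO aggregation):

```python
def sfo_weight(c, p, lists, n):
    """Scaled footrule cost of placing element c at position p (1-based)."""
    return sum(abs((t.index(c) + 1) / len(t) - p / n)
               for t in lists if c in t)
```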

4.3 Markov chain methods

We propose a general method for obtaining an initial aggregation of partial lists, using Markov chains. The states of the chain correspond to the n candidates to be ranked, the transition probabilities depend in some particular way on the given (partial) lists, and the Markov chain ordering is the aggregated ordering. There are several motivations for using Markov chains:

Handling partial lists and top d lists
Rather than require every pair of pages (candidates) i and j to be compared by every search engine (voter), we may now use the available comparisons between i and j to determine the transition probability between i and j, and exploit the connectivity of the chain to (transitively) "infer" comparison outcomes between pairs that were not explicitly ranked by any of the search engines. The intuition is that Markov chains provide a more holistic viewpoint of comparing all n candidates against each other --- significantly more meaningful than ad hoc and local inferences like "if a majority prefers A to B and a majority prefers B to C, then A should be better than C."

Handling uneven comparisons
If a Web page P appears in the bottom half of about 70% of the lists, and is ranked Number 1 by the other 30%, how important is the quality of the pages that appear on the latter 30% of the lists? If these pages all appear near the bottom on the first set of 70% of the lists and the winners in these lists were not known to the other 30% of the search engines that ranked P Number 1, then perhaps we shouldn't consider P too seriously. In other words, if we view each list as a tournament within a league, we should take into account the strength of the schedule of matches played by each player. The Markov chain solutions we discuss are similar in spirit to the approaches considered in the mathematical community for this problem (eigenvectors of linear maps, fixed points of nonlinear maps, etc.).

Enhancements of other heuristics
Heuristics for combining rankings are motivated by some underlying principle. For example, Borda's method is based on the idea "more wins is better." This gives some figure of merit for each candidate. It is natural to extend this and say "more wins against good players is even better," and so on, and iteratively refine the ordering produced by a heuristic. In the context of Web searching, the HITS algorithm of Kleinberg [Kleinberg 1998, 1999] and the PageRank algorithm of Brin and Page [Brin and Page 1998] are motivated by similar considerations. As we will see, some of the chains we propose are natural extensions (in a precise sense) of Borda's method, sorting by geometric mean, and sorting by majority.

Computational efficiency
In general, setting up one of these Markov chains and determining its stationary probability distribution takes about Θ(n^2 k + n^3) time. However, in practice, if we explicitly compute the transition matrix in O(n^2 k) time, a few iterations of the power method will allow us to compute the stationary distribution. In fact, we suggest an even faster method for practical purposes. For all of the chains that we propose, after about O(nk) (linear in input size) preprocessing time, it is usually possible to simulate one step of the chain in O(k) time; thus by simulating the Markov chain for about O(n) steps, we should be able to sample from the stationary distribution pretty effectively. This is usually sufficient to identify the top few candidates in the stationary distribution in O(nk) time, perhaps considerably faster in practice.

We now propose some specific Markov chains, denoted MC1, MC2, MC3, and MC4. For each of these chains, we specify the transition matrix and give some intuition as to why such a definition is reasonable. In all cases, the state space is the union of the sets of pages ranked by the various search engines.

MC1
If the current state is page P, then the next state is chosen uniformly from the multiset of all pages that were ranked higher than (or equal to) P by some search engine that ranked P, that is, from the multiset of all pages Q such that ti(Q) ≤ ti(P) for some list ti containing P. The main idea is that in each step, we move from the current page to a better page, allowing about 1/j probability of staying at the same page, where j is roughly the average rank of the current page.
MC2
If the current state is page P, then the next state is chosen by first picking a ranking t uniformly from all the partial lists t1, ..., tk containing P, and then picking a page Q uniformly from the set of all pages Q such that t(Q) ≤ t(P).
This chain takes into account the fact that we have several lists of rankings, not just a collection of pairwise comparisons among the pages. As a consequence, MC2 is arguably the most representative of minority viewpoints of sufficient statistical significance; it also protects specialist views. In fact, MC2 generalizes the geometric mean analogue of Borda's method. For full lists, if the initial state is chosen uniformly at random, after one step of MC2, the distribution induced on its states produces a ranking of the pages such that P is ranked higher than (preferred to) Q iff the geometric mean of the ranks of P is lower than the geometric mean of the ranks of Q.
MC3
If the current state is page P, then the next state is chosen as follows: first pick a ranking t uniformly from all the partial lists t1, ..., tk containing P, then uniformly pick a page Q that was ranked by t. If t(Q) < t(P) then go to Q, else stay in P.
This chain is a generalization of Borda's method. For full lists, if the initial state is chosen uniformly at random, after one step of MC3, the distribution induced on its states produces a ranking of the pages such that P is ranked higher than Q iff the Borda score of P is higher than the Borda score of Q. This is natural, considering that in any state P, the probability of staying in P is roughly the fraction of pairwise contests (with all other pages) that P won, which is a very Borda-like measure.
MC4
If the current state is page P, then the next state is chosen as follows: first pick a page Q uniformly from the union of all pages ranked by the search engines. If t(Q) < t(P) for a majority of the lists t that ranked both P and Q, then go to Q, else stay in P.
This chain generalizes Copeland's suggestion of sorting the candidates by the number of pairwise majority contests they have won [Copeland 1951].
There are examples that differentiate the behavior of these chains. One can also show that the Markov ordering implied by these chains need not satisfy the extended Condorcet principle.
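As one concrete instance, here is a sketch of the MC4 transition matrix (an illustrative reading of the definition above, not our implementation); combined with the power-iteration routine from Section 2.2.2, it yields the MC4 ordering:

```python
import numpy as np

def mc4_matrix(lists):
    """Transition matrix of MC4 over the union of the (partial) input lists."""
    pages = sorted(set().union(*lists))
    idx = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    M = np.zeros((n, n))
    for P in pages:
        for Q in pages:
            if P == Q:
                continue
            both = [t for t in lists if P in t and Q in t]
            wins = sum(1 for t in both if t.index(Q) < t.index(P))
            if both and wins > len(both) / 2:  # majority ranks Q above P
                M[idx[P], idx[Q]] = 1.0 / n    # move to Q with probability 1/n
        M[idx[P], idx[P]] = 1.0 - M[idx[P]].sum()  # otherwise stay at P
    return pages, M
```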

5 Applications

We envisage several applications of our rank aggregation methods in the context of searching and retrieval in general, and the Web in particular. We present five major applications of our techniques in the following sections.

5.1 Meta-search

Meta-search is the problem of constructing a meta-search engine, which uses the results of several search engines to produce a collated answer. Several meta-search engines exist (e.g., [Metacrawler]) and many Web users build their own meta-search engines. As we observed earlier, the problem of constructing a good meta-search engine is tantamount to obtaining a good rank aggregation function for partial and top d lists. Given the different crawling strategies, indexing policies, and ranking functions employed by different search engines, meta-search engines are useful in many situations.

The actual success of a meta-search engine directly depends on the aggregation technique underlying it. Since the techniques proposed in Section 4 work on partial lists and top d lists, they can be applied to build a meta-search engine. The idea is simple: given a query, obtain the top (say) 100 results from many search engines, apply the rank aggregation function with the universe being the union of pages returned by the search engines, and return the top (say) 100 results of the aggregation. We illustrate this scheme in Section 6.2.1 and examine the performance of our methods.

5.2 Aggregating ranking functions

Given a collection of documents, the problem of indexing is: store the documents in such a manner that given a search term, those most relevant to the search term can be retrieved easily. This is a classic information retrieval problem and reasonably well-understood for static documents (see [Salton 1989]). When the documents are hypertext documents, however, indexing algorithms could exploit the latent relationship between documents implied by the hyperlinks. On the Web, such an approach has already proved tremendously successful [Kleinberg 1998, 1999], [Chakrabarti et al. 1998], [Brin and Page 1998].

One common technique for indexing is to construct a ranking function. With respect to a query, a ranking function can operate in two ways: (i) it can give an absolute score to a document indicating the relevance of the document to the query (score-based) or (ii) it can take two documents and rank order them with respect to the query (comparison-based). Based on the underlying methodology used, many competing ranking functions can be obtained. For instance, term-counting yields a simple ranking function. Another ranking function might be the consequence of applying the vector-space model and an appropriate distance measure to the document collection. Yet other ranking functions might be the ones implied by PageRank [Brin and Page 1998] and Clever [Kleinberg 1998, 1999], [Chakrabarti et al. 1998]. It is important to note that if the ranking function is score-based, the ordering implied by the scores makes more sense than the actual scores themselves, which are often either meaningless or inaccurate. And, for a particular ranking function and a query, it is often easier to return the top few documents relevant to the query than to rank the entire document base.

Given many ranking functions for a single document base, we have the case of top d lists, where d is the number of documents returned by each of the ranking functions. Our techniques can be applied to obtain a good aggregation of these ranking functions. Notice that we give equal weight to all the ranking functions, but this could be easily modified if necessary.

Such rank aggregation may be useful in other domains as well: many airline reservation systems suffer from lack of ability to express preferences. If the system is flexible enough to let the user specify various preference criteria (travel dates/times, window/aisle seating, number of stops, frequent-flier preferences, refundable/non-refundable nature of ticket purchase, and of course, price), it can rank the available flight plans based on each of the criteria, and apply rank aggregation methods to give better quality results to the user. Similarly, in the choice of restaurants from a restaurant database, users might rank restaurants based on several different criteria (cuisine, driving distance, ambiance, star-rating, dollar-rating, etc.). In both examples, users might be willing to compromise one or more of these criteria, provided there is a clear benefit with respect to the others. In fact, very often there is not even a clear order of importance among the criteria. A good aggregation function is a very effective way to make a selection in such cases.

5.3 Spam reduction

As we discussed earlier, the extended Condorcet principle is a reasonable cure for spam. Using the technique of local Kemenization, it is easy to take any rank aggregation method and tweak its output to make it satisfy the extended Condorcet principle. In fact, we suggest this as a general technique to reduce spam in search engines or meta-search engines: apply a favorite rank aggregation to obtain an initial ranking and then apply local Kemenization. This extra step is inexpensive in terms of computation cost, but has the benefit of reducing spam by ranking Condorcet losers below Condorcet winners. Again, we illustrate this application in Section 6.2.2 by examples.

5.4 Word association techniques

Different search engines and portals have different (default) semantics of handling a multi-word query. For instance, Altavista seems to use the OR semantics (it is enough for a document to contain one of the given query terms to be considered) while Google seems to use the AND semantics (it is mandatory for all the query words to appear in a document for it to be considered). As discussed in Section 1.2, both these scenarios are inconvenient in many situations.

Many of these tasks can be accomplished by a complicated Boolean query (via an advanced query interface), but we feel that it is unreasonable to expect an average Web user to do this. Note also that simply asking for documents that contain as many of the keywords as possible is not necessarily a good solution: the best document on the topic might contain only three of the keywords, while a spam document might well contain four. As a specific motivating example, consider searching for the job of a software engineer in an on-line job database. The user lists a number of skills and a number of potential keywords in the job description, for example, "Silicon Valley C++ Java CORBA TCP-IP algorithms start-up pre-IPO stock options". It is clear that the "AND" rule might produce no document, and the "OR" rule is equally disastrous.

We propose a word association scheme to handle these situations. Given a set of query words w1, ..., wl, we propose to construct several (say, k) sub-queries which are subsets of the original query words. We query the search engine with these k sub-queries (using the AND semantics) and obtain k top d (say, d = 100) results for each of the sub-queries. Then, we can use the methods in Section 3 and Section 4 to obtain a locally Kemenized aggregation of the top d lists and output this as the final answer corresponding to the multi-word query. By examples, we illustrate this application in Section 6.2.3.
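A sketch of the sub-query construction for pairs of terms (illustrative; fetch_top100 is a hypothetical stand-in for querying a search engine with AND semantics):

```python
from itertools import combinations

def fetch_top100(q):
    """Hypothetical: return the top 100 results for query q (AND semantics)."""
    raise NotImplementedError          # replace with a real search engine call

query = "riemann goldbach fermat poincare".split()
subqueries = [" ".join(pair) for pair in combinations(query, 2)]
# 6 sub-queries; each would yield a top-100 list, to be aggregated
# (e.g., with SFO or MC4) and then locally Kemenized.
```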

Where do the words come from?
One way to obtain such a set of query words is to prompt the user to associate as many terms as possible with the desired response. This might be too taxing for a typical user. A less demanding way is to let the user highlight some words in a current document; the search terms are then extracted from the "anchor text," i.e., the words around the selected words.

5.5 Search engine comparison

Our methods also imply a natural way to compare the performance of various search engines. The main idea is that a search engine can be called good when it behaves like a least noisy expert for a query. In other words, a good search engine is one that is close to the aggregated ranking. This agrees with our earlier notion of what an expert is and how to deal with noisy experts. Thus, the procedure to rank the search engines themselves (with respect to a query) is as follows: obtain a rank aggregation of the results from various search engines and rank the search engines based on their (Kendall or footrule) distance to the aggregated ranking.

6 Experiments and Results

6.1 Infrastructure

We conducted three types of experiments. The first experiment is to build a meta-search engine using different aggregation methods (Section 4) and compare their performances. The second experiment is to illustrate the effect of our techniques in combating spam. The third experiment is to illustrate the technique of word association for multi-word queries. While we provide numerical values for the first experiment, we provide actual examples for the second and third experiments.

We use the following seven search engines: Altavista (AV), Alltheweb (AW), Excite (EX), Google (GG), Hotbot (HB), Lycos (LY), and Northernlight (NL). For each of the search engines, we focused only on the top 100 results. Our distance measurements are with respect to the union of the top 100 results from these search engines.

For measuring the performance of our methods (first experiment), we selected 38 general queries, listed in Section 6.2.1 (these queries are a superset of the 28 queries used in several earlier papers [Bharat and Henzinger 1998], [Chakrabarti et al. 1998]). For the second experiment, we picked some queries that were spammed in popular search engines. For the third experiment, we picked multi-word queries that perform poorly with existing search engines. Our notion of two URLs being identical is purely syntactic (up to some canonical form); we do not use the content of a page to determine whether two URLs are identical.

6.2 Results

6.2.1 Meta-Search

The queries we picked for our experiment are:
affirmative action, alcoholism, amusement parks, architecture, bicycling, blues, cheese, citrus groves, classical guitar, computer vision, cruises, Death Valley, field hockey, gardening, graphic design, Gulf war, HIV, java, Lipari, lyme disease, mutual funds, National parks, parallel architecture, Penelope Fitzgerald, recycling cans, rock climbing, San Francisco, Shakespeare, stamp collecting, sushi, table tennis, telecommuting, Thailand tourism, vintage cars, volcano, zen buddhism, and Zener.
The overlap among the top 100 results of the seven search engines is given in Table 1, which shows the average number of pages as a function of the number of search engines in which they are present. For instance, the fourth column of the table means that, on average, 27.2 pages were present in exactly three of the search engine results. The second column indicates that around 284 pages were present in only one search engine, while the last column indicates that fewer than 2 pages were present in all the search engines.
 
 
Table 1: 
Overlap among 7 search engine results.
 
# engines      1      2      3      4      5      6      7
# pages      284.5   84.0   27.2   12.9    8.1    4.7    1.8
 
 The results of our first experiment are presented in Table 2. The performance is calculated in terms of the three distance measures described in Section 2.1.1. Each row corresponds to a method presented in Section 4. Local Kemenization (LK) was applied to the result of each of these methods.
 
 
Table 2: 
Performance of various rank aggregation methods for meta-search. 
LK denotes Local Kemenization. 
 
           Kendall tau        Induced footrule   Scaled footrule
           No LK    With LK   No LK    With LK   No LK    With LK
Borda      0.221    0.214     0.353    0.345     0.440    0.438
SFO        0.112    0.111     0.168    0.167     0.137    0.137
MC1        0.133    0.130     0.216    0.213     0.292    0.291
MC2        0.131    0.128     0.213    0.210     0.287    0.286
MC3        0.116    0.114     0.186    0.183     0.239    0.239
MC4        0.105    0.104     0.151    0.149     0.181    0.181
 
 

6.2.2 Spam reduction

In the following we present anecdotal evidence of spam reduction by our methods. We use the following queries: Feng Shui, organic vegetables, gardening. For each of these queries, we look at the (top) pages that we consider spam. Notice that our definition of spam does not mean evil! --- it is just that, in our opinion, these pages obtained an undeservedly high rank from one or more search engines. It is easy to find URLs that spammed a single search engine. On the other hand, we were interested in URLs that spammed at least two search engines --- given that the overlap among search engines was not very high, this proved to be a challenging task. Table 3 presents our examples: the entries are the ranks within the individual search engines' lists. A blank entry in the table indicates that the URL was not returned among the top 100 by that search engine. Based on the results from Section 6.2.1, we restrict our attention to SFO and MC4 with local Kemenization.
 
 
Table 3: 
Ranks of "spam" pages for the queries: 
Feng Shui, organic vegetables and gardening.
 
url AV AW GG HB LY NL SFO MC4 
www.lucky-bamboo.com  4 43     41   144 63
www.cambriumcrystals.com    9 51   5   31 59
www.luckycat.com  11 14 26   13   49 36
www.davesorganics.com  84 19 1   17   77 93
www.frozen.ch    9   63 11   49 121
www.eonseed.com    18   6 16   23 66
www.augusthome.com  26 16   27 12 16 57 54
www.taunton.com    25     21   78 67
www.egroups.com    34     29   108 101
 
 

6.2.3 Word associations

We use Google to perform our experiments on word associations. As noted earlier, Google uses AND semantics and hence, for many interesting multi-word queries, the number or the quality of the pages returned is not very high. On the other hand, the fact that it uses AND semantics is convenient to work with when we supply small subsets of a multi-word query, in accordance with the word association rule described earlier. The queries, the top 5 results from Google, and some of the top results from SFO and MC4 (after local Kemenization) are shown in Table 4, Table 5, and Table 6. We chose every pair of terms in the multi-word query to construct several lists and then applied rank aggregation (SFO and MC4) to these lists.
 
 
Table 4: 
Results for: riemann goldbach fermat poincare
(mathematicians associated with famous conjectures) 
 
Google 
raphael.math.uic.edu/jeremy/poetry.htm 
www-groups.dcs.st-and.ac.uk/history/Search/historysearch.cgi? 
www.galstar.com/ichudov/ppl/ap/File1993_11-12.html 
www.mathe.tu-freiberg.de/hebisch/cafe/lebensdaten.html 
www.math.snu.ac.kr/mhkim/a-fermat2.html 
SFO with LK 
www.ams.org/bookstore 
www.mathsoft.com/asolve/constant/hrdyltl/goldbach.html 
www.omega23.com/mathematics_physics_papers/Riemann.html 
www.omega23.com/mathematics_physics_papers/Poincare.html 
www.informatik.uni-giessen.de/staff/richstein/ca/Goldbach.html 
MC4 with LK 
www.omega23.com/mathematics_physics_papers/Poincare.html 
www.ams.org/bookstore 
www.mbay.net/cgd/flt/flt01.htm 
mathpuzzle.com/riemann.html 
rendezvous.com/tangledweb/conferences/fermat/romance.html 
 
 
 
Table 5: 
Results for: madras madurai coimbatore vellore
(cities in the state of Tamil Nadu, India) 
 
Google 
www.mssrf.org/Fris9809/location-tamilnadu.html 
www.indiaplus.com/Info/schools.html 
www.focustamilnadu.com/tamilnadu/Policy%20Note ...Forests.html 
www.tn.gov.in/policy/environ.htm 
www.indiacolleges.com/Tamil_Nadu.htm 
SFO with LK 
www.madurai.com 
www.ozemail.com.au/clday/locations.htm 
www.utoledo.edu/homepages/speelam/coimbatore.html 
www.ozemail.com.au/clday/madras.htm 
www.madurai.com/around.htm 
www.indiatraveltimes.com/tamilnadu/tamil1.html 
MC4 with LK 
www.madurai.com 
www.surfindia.com/omsakthi/tourism.htm 
www.indiatraveltimes.com/tamilnadu/tamil1.html 
www.indiatraveltimes.com/tamilnadu/tamil2.html 
www.indiatravels.com/forts/vellore_fort.htm 
www.india-tourism.de/english/south/tamil_nadu.html 
 
 
 
Table 6: 
Results for: aeolian lipari stromboli ferries hydrofoils
(related to travel in Italian/Greek islands) 
 
Google 
www.nautica.it/charter/eolie__e.htm 
rozzo.tripod.com/filicudi1.htm 
www.corfu1.com/b33.htm 
circle.kenyon.edu/publications/bulletin/19_4/kcabmill.htm 
www.netnet.it/english/Salina/index 
www.initaly.com/regions/seaside/diving.htm 
SFO with LK 
www.cs.brown.edu/people/sml/lipari.scenes.html 
www.gogreece.about.com/cs/hydrofoilsferries 
pharma.unime.it/rdpa/Link/HowReachlipar.htm 
www.travel-italy.com/library/feature_articles/aeolian_islands_lipari.html 
www.travel-italy.com/library/feature_articles/aeolian_islands_stromboli.html 
www.gardenroute.co.za/strombolis 
www.travel-italy.com/library/feature_articles/aeolian_island_an_intro.html 
MC4 with LK 
www.cs.brown.edu/people/sml/lipari.scenes.html 
www.gogreece.about.com/cs/hydrofoilsferries 
www.travel-italy.com/library/feature_articles/aeolian_islands_stromboli.html 
www.gate1travel.com/Destinations/Greece/Misc/gre_ferry.htm 
www.travel-italy.com/library/feature_articles/aeolian_islands_lipari.html 
www.participez.com/reports/recits/vulcano.htm 
www.travel-italy.com/library/feature_articles 
 
 

6.3 Discussion

MC4 outperforms all the other methods; in fact, it beats Borda by a huge margin. This is noteworthy, since Borda's method is the usual choice for aggregation, and perhaps the most natural. Scaled footrule and MC3 (a generalization of Borda) appear to be on par. Recall that the footrule procedure for partial lists was only a heuristic modification of the footrule procedure for full lists; the experimental evidence above suggests that this heuristic is very good. MC1 and MC2 are consistently worse than the other Markov chains, but they are strictly better than Borda.

In general, local Kemenization improves the results by around 1-3% in terms of the distance measures. It can be shown formally that local Kemenization never hurts, in the sense that the Kendall distance never deteriorates after local Kemenization. Interestingly, this also held for the footrule and scaled footrule distances in our experiments (although we do not know whether it holds in general). We conclude that the local Kemenization procedure is always worth applying: either the improvement is large, or the time spent is small.

Examining the results in Section 6.2.2, we see that SFO and MC4 are quite effective in combating spam. While we do not claim that our methods eliminate spam completely, our study shows that they reduce it in general.

The results in Section 6.2.3 show that our technique of word association, combined with rank aggregation, can improve the quality of search results for multi-word queries. In each of the three examples, Google typically produced a total of only around 10--15 pages, and the top 5 results were often very poor (a direct consequence of the AND semantics applied to a long list of terms). In sharp contrast, the URLs produced by the rank aggregation methods contained a wealth of information about the topic of the query.

7 Conclusions and further work

We have developed the theoretical groundwork for describing and evaluating rank aggregation methods, and have proposed and tested several such techniques. Our methods have the advantage of being applicable in a variety of contexts and of using as much information as is available. They are simple to implement, incur little computational overhead, and outperform popular classical methods such as Borda's. We have established the value of the extended Condorcet criterion in the context of meta-search, and have described a simple procedure, local Kemenization, for ensuring that this criterion is satisfied.

Further work includes obtaining a qualitative understanding of why the Markov chain methods perform so well. It would also be interesting to measure the efficacy of our methods on a document base with several competing ranking functions. Finally, this work originated in conversations with Helen Nissenbaum on bias in searching; a formal treatment of bias seems difficult but alluring.

References

1. Search Engine Watch. www.searchenginewatch.com
2. Search Engine Watch article. www.searchenginewatch.com/sereport/00/11-inclusion.html
3. Metacrawler. www.metacrawler.com
4. K. Bharat and M. Henzinger. Improved algorithms for topic distillation in a hyperlinked environment. Proc. ACM SIGIR, pages 104--111, 1998.
5. J. J. Bartholdi, C. A. Tovey, and M. A. Trick. Voting schemes for which it can be difficult to tell who won the election. Social Choice and Welfare, 6(2):157--165, 1989.
6. J. C. Borda. Memoire sur les elections au scrutin. Histoire de l'Academie Royale des Sciences, 1781.
7. S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks, 30(1-7):107--117, 1998.
8. S. Chakrabarti, B. Dom, D. Gibson, R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins. Experiments in topic distillation. Proc. ACM SIGIR Workshop on Hypertext Information Retrieval on the Web, 1998.
9. M.-J. Condorcet. Essai sur l'application de l'analyse a la probabilite des decisions rendues a la pluralite des voix, 1785.
10. A. H. Copeland. A reasonable social welfare function. Mimeo, University of Michigan, 1951.
11. D. E. Critchlow. Metric Methods for Analyzing Partially Ranked Data. Lecture Notes in Statistics 34, Springer-Verlag, 1985.
12. P. Diaconis. Group Representations in Probability and Statistics. IMS Lecture Series 11, Institute of Mathematical Statistics, 1988.
13. P. Diaconis and R. Graham. Spearman's footrule as a measure of disarray. Journal of the Royal Statistical Society, Series B, 39(2):262--268, 1977.
14. G. Even, J. Naor, B. Schieber, and M. Sudan. Approximating minimum feedback sets and multicuts in directed graphs. Algorithmica, 20(2):151--174, 1998.
15. R. Fagin. Combining fuzzy information from multiple systems. Journal of Computer and System Sciences, 58(1):83--99, 1999.
16. J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604--632, 1999. A preliminary version appeared in Proc. ACM-SIAM Symposium on Discrete Algorithms, 1998.
17. J. I. Marden. Analyzing and Modeling Rank Data. Monographs on Statistics and Applied Probability 64, Chapman & Hall, 1995.
18. Media Metrix search engine ratings. www.searchenginewatch.com/reports/mediametrix.html
19. D. G. Saari. The mathematics of voting: Democratic symmetry. The Economist, p. 83, March 4, 2000. Article available at jmvidal.ece.sc.edu/822/papers/econ-voting.html
20. G. Salton. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley, 1989.
21. J. H. Smith. Aggregation of preferences with variable electorate. Econometrica, 41(6):1027--1041, 1973.
22. M. Truchon. An extension of the Condorcet criterion and Kemeny orders. Cahier 98-15, Centre de Recherche en Economie et Finance Appliquees, 1998.
23. H. P. Young. An axiomatization of Borda's rule. Journal of Economic Theory, 9:43--52, 1974.
24. H. P. Young. Condorcet's theory of voting. American Political Science Review, 82:1231--1244, 1988.
25. H. P. Young and A. Levenglick. A consistent extension of Condorcet's election principle. SIAM Journal on Applied Mathematics, 35(2):285--300, 1978.
 

Appendix A: Local Kemenization

We begin with a formal definition:
Definition 5
A permutation p is a locally Kemeny optimal aggregation of partial lists t1, t2, ..., tk if there is no permutation p' that can be obtained from p by transposing a single adjacent pair of elements and for which K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk). In other words, it is impossible to reduce the total distance to the t's by flipping an adjacent pair.
Note that this definition is not equivalent to requiring that no flip of an arbitrary (not necessarily adjacent) pair decreases the sum of the distances to the t's.

Example 1. Let p = (1,2,3), t1 = (1,2), t2 = (2,3), t3 = t4 = t5 = (3,1).
Then p satisfies Definition 5 and K(p, t1, t2, ..., t5) = 3, but transposing 1 and 3 decreases the sum to 2.

Every Kemeny optimal permutation is also locally Kemeny optimal, but the converse does not hold (cf. Example 1). Furthermore, a locally Kemeny optimal permutation is not necessarily a good approximation of the optimum. For example, if the t's are of the form in Example 1, but the number of (3,1) partial lists is very large while each of the partial lists (1,2) and (2,3) occurs only once, then (1,2,3) is still locally Kemeny optimal, yet the ratio (of the K-distances) to the optimum may be arbitrarily large. Nevertheless, the important observations, proved next, are that a locally Kemeny optimal aggregation satisfies the extended Condorcet property and can be computed efficiently.
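The distances in Example 1 are easy to verify mechanically. The following quick sketch (ours; elements assumed hashable) computes the total Kendall distance from a permutation to a collection of partial lists by counting, for each list, the pairs it orders differently:

def K(p, partial_lists):
    """Total Kendall distance from permutation p to the partial lists:
    for each list, count the pairs (of elements it contains) that p inverts."""
    pos = {x: i for i, x in enumerate(p)}
    return sum(1
               for t in partial_lists
               for a in range(len(t))
               for b in range(a + 1, len(t))
               if pos[t[a]] > pos[t[b]])

ts = [(1, 2), (2, 3), (3, 1), (3, 1), (3, 1)]
print(K((1, 2, 3), ts))  # 3: locally Kemeny optimal (no adjacent flip helps)
print(K((3, 2, 1), ts))  # 2: obtained by the non-adjacent transposition of 1 and 3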

Convention. Recall our convention that p ranks x above y (i.e., prefers x to y) whenever p(x) < p(y).

Lemma 6
Let p, a permutation on alternatives {1, ..., n}, be a locally Kemeny optimal aggregation for partial lists t1, t2, ..., tk. Then p satisfies the extended Condorcet criterion with respect to t1, t2, ..., tk.
Proof
If the lemma is false, then there exist partial lists t1, t2, ..., tk, a locally Kemeny optimal aggregation p, and a partition (T, U) of the alternatives such that for all a in T and b in U the majority among t1, t2, ..., tk prefers a to b, and yet there are c in T and d in U with p(d) < p(c). Let (d, c) be a closest (in p) such pair, and let e be the immediate successor of d in p. If e = c, then c is adjacent to d in p, and transposing this adjacent pair produces a p' with K(p', t1, t2, ..., tk) < K(p, t1, t2, ..., tk), contradicting the assumption that p is a locally Kemeny optimal aggregation of the t's. If e is not equal to c, then either e is in T, in which case (d, e) is a closer pair in p than (d, c) that also violates the extended Condorcet condition, or e is in U, in which case (e, c) is a closer pair than (d, c) that violates the condition. Both cases contradict the choice of (d, c).
The set t1, t2, ..., tk of partial lists defines a directed majority graph G on the n alternatives, with an edge (x,y) from x to y if a majority of the t's that contain both x and y rank x above y.
Lemma 7
Locally Kemeny optimal aggregations of k partial lists can be computed in O(kn log n) time.
Proof
It is not surprising that locally Kemeny optimal aggregations can be found in polynomial time, since they are only local minima. A straightforward approach requires O(n^2) time; we describe a technique requiring only O(kn log n) time (generally, we are interested in the case in which k is much smaller than n).

Consider the majority graph G for t1, t2, ..., tk, with anti-parallel edges in the case of a tie. The problem of finding a locally Kemeny optimal aggregation of t1, t2, ..., tk is now equivalent to finding a Hamiltonian path in this graph. Owing to the density of the edges, such a path can be found with O(n log n) probes to the edges of G using, for instance, a mergesort-like algorithm (the advantage of mergesort is that the issue of inconsistent answers never arises, which simplifies the execution of the algorithm). Note that G need not be constructed explicitly. Each probe costs k accesses to the partial lists (to determine whether there is a majority), so the resulting complexity is O(kn log n).
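The following sketch (ours, under the stated conventions) implements the mergesort-style search for a Hamiltonian path of the majority tournament. Each comparison is a single probe costing one majority vote over the k partial lists, and merging two Hamiltonian paths yields a Hamiltonian path, because consecutive output elements were either adjacent in an input path or directly compared during the merge.

def make_beats(partial_lists):
    """Majority tournament probe: beats(x, y) holds if at least as many
    of the lists containing both x and y rank x above y (a tie yields
    anti-parallel edges, so either direction may be used)."""
    ranks = [{x: i for i, x in enumerate(t)} for t in partial_lists]
    def beats(x, y):
        votes = sum(1 if r[x] < r[y] else -1
                    for r in ranks if x in r and y in r)
        return votes >= 0
    return beats

def merge(a, b, beats):
    """Merge two Hamiltonian paths of the tournament into one."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if beats(a[i], b[j]):
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def locally_kemeny_optimal(alternatives, partial_lists):
    """Mergesort on the majority tournament: O(n log n) probes, hence
    O(kn log n) accesses to the k partial lists overall."""
    beats = make_beats(partial_lists)
    def sort(items):
        if len(items) <= 1:
            return list(items)
        mid = len(items) // 2
        return merge(sort(items[:mid]), sort(items[mid:]), beats)
    return sort(list(alternatives))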

We next turn to the details of the local Kemenization procedure. Recall that the value of local Kemenization is that, given an aggregation m of several rankings, it produces a ranking p that achieves the best of both worlds: p satisfies the extended Condorcet criterion, and p is maximally consistent with m. We begin by formalizing our notion of consistency.
Definition 8
Given partial lists t1, t2, ..., tk, and a total order m we say that p is consistent with m and t1, t2, ..., tk if p(i) < p(j) implies that either
      (a) m(i) < m(j)
or
      (b) a majority of t1, t2, ..., tk prefer i to j (more prefer i over j than j over i, but not necessarily an absolute majority).
In other words, the order of two elements differs between m and p only if a majority of the t's support the change (however, consistency does not mandate a switch).
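As an illustrative sketch (ours; p and m are full lists over the same elements, and "majority" is taken among the lists containing both elements), consistency in the sense of Definition 8 can be checked pair by pair:

from itertools import combinations

def is_consistent(p, mu, partial_lists):
    """Definition 8 check: whenever p reverses the order mu gives to a
    pair (i, j), strictly more of the lists containing both i and j
    must prefer p's order."""
    ranks = [{x: k for k, x in enumerate(t)} for t in partial_lists]
    pos_m = {x: k for k, x in enumerate(mu)}
    for i, j in combinations(p, 2):        # p ranks i above j here
        if pos_m[i] > pos_m[j]:            # ...but mu ranks j above i
            votes = sum(1 if r[i] < r[j] else -1
                        for r in ranks if i in r and j in r)
            if votes <= 0:                 # no strict majority for i over j
                return False
    return True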

Note that if p is consistent with m and t1, t2, ..., tk, then

K(p, t1, t2, ..., tk) <= K(m, t1, t2, ..., tk),

since the only changes allowed (relative to m) decrease the distance to the t's.

The proof of the next lemma is straightforward from Definition 8.

Lemma 9
If p is consistent with m and t1, t2, ..., tk, then for any 1 <= l <= n, if S is the set of the l alternatives ranked most highly by m, the projection of p onto S is consistent with the projections of m and t1, t2, ..., tk onto S.
As we will see, for any partial lists t1, t2, ..., tk, and order m there is a permutation p that is (i) locally Kemeny optimal and (ii) consistent with m. (Such a p is not necessarily unique.) We will focus particularly on m-consistent locally Kemeny optimal aggregations that, when restricted to subsets S of the most highly ranked elements in m, retain their local Kemeny optimality (Definition 10 below). This is desirable whenever we are more sure of the significance of the top results in m than the bottom ones. In this case the solution is unique (Theorem 11).
Definition 10
Given partial lists t1, t2, ..., tk and a total order m on alternatives {1, 2, ..., n}, we say that p is a local Kemenization of m with respect to t1, t2, ..., tk if (1) p is consistent with m, and (2) for every 1 <= l <= n, the projection of p onto the set S of the l most highly ranked alternatives in m is a locally Kemeny optimal aggregation of the projections of t1, t2, ..., tk onto S.
Theorem 11
For any partial lists t1, t2, ..., tk and order m on alternatives {1, ..., n}, there exists a unique local Kemenization of m with respect to t1, t2, ..., tk.
Proof
We prove the theorem by induction on n, the number of alternatives. The base case n = 1 is trivial. Assume the statement for n-1; we prove it for n. Let x be the last (lowest-ranked) element in m and let S = {1, ..., n} - {x}. Since |S| = n-1, by induction there is a unique permutation p_{n-1} on the elements of S satisfying the conditions of the theorem. Now insert the removed element x into the lowest-ranked "permissible" position in p_{n-1}: just below the lowest-ranked element y such that (a) no majority among the (original) t's prefers x to y, and (b) for every successor z of y (i.e., p_{n-1}(y) < p_{n-1}(z)) there is a majority that prefers x to z. Clearly no two elements of m are switched unnecessarily, and the resulting solution p is locally Kemeny optimal, by the local Kemeny optimality of p_{n-1} and the majority properties. Note that the consistency condition requires x to be as low in p as local Kemeny optimality permits, so given p_{n-1} there is only one place in which to insert x.

Suppose now that m and t1, t2, ..., tk contradict uniqueness: there are two different local Kemenizations of m with respect to t1, t2, ..., tk; call them p and p'. If we drop the last element x of m and let S be as above, then (by property (2) of local Kemenization) the resulting permutations p_{n-1} and p'_{n-1} must each be local Kemenizations of the restrictions of the t's to S, and (by property (1) and Lemma 9) they must be consistent with the restriction of m to S. By the induction hypothesis, p_{n-1} = p'_{n-1}. As argued above, there is only one place to insert x into this list.

The algorithm suggested by this proof may take O(n^2 k) time in the worst case (say, a transitive tournament where m is the anti-transitive order). In general, however, it requires time proportional to the Kendall distance between m and the solution. We do not expect m to be uncorrelated with the solution, and therefore anticipate better performance in practice.
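The insertion procedure from the proof translates directly into code. Here is a minimal sketch (ours, assuming hashable alternatives): it processes m top-down and bubbles each new element up from the bottom exactly as far as strict majorities demand.

def local_kemenize(mu, partial_lists):
    """Local Kemenization of the aggregation mu with respect to the
    partial lists: process mu top-down, inserting each element into the
    lowest position permitted by the majority constraints."""
    ranks = [{x: i for i, x in enumerate(t)} for t in partial_lists]
    def majority_prefers(x, y):
        # strict majority among the lists ranking both x and y
        votes = sum(1 if r[x] < r[y] else -1
                    for r in ranks if x in r and y in r)
        return votes > 0
    pi = []
    for x in mu:                    # next-highest element of mu
        pos = len(pi)               # start at the bottom of the current list
        while pos > 0 and majority_prefers(x, pi[pos - 1]):
            pos -= 1                # move up past each element x majority-beats
        pi.insert(pos, x)
    return pi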

Appendix B: Complexity of Kemeny optima

In this section, we study the complexity of finding a Kemeny optimal permutation. We show that computing a Kemeny optimal permutation is NP-hard, even when the input consists of four full lists t1, t2, t3, t4. For partial lists of length 2, finding a Kemeny optimal solution is exactly the problem of finding a minimum feedback arc set, and hence is NP-hard (see [Even et al. 1998] for approximation results). The problem is also known to be NP-hard for an unbounded number of complete lists [Bartholdi et al. 1989].

We remark that computing a Kemeny optimal permutation for two lists is trivial --- simply output one of the input lists. The complexity of computing a Kemeny optimal permutation for three full lists is open; we show later in this section that this problem is reducible to the problem of finding minimum feedback edge sets on tournament graphs, which, as far as we know, is open as well.

Computing a Kemeny optimal permutation for an unbounded number of partial lists is easily seen to be NP-hard by a straightforward encoding of the feedback edge set problem: for each edge (i,j), create a partial list of two elements: i followed by j.

Theorem 12
The problem of computing a Kemeny optimal permutation for a given collection of k full lists, for even integers k >= 4, is NP-hard. The corresponding decision problem is NP-complete.
Proof
The reduction is from the feedback edge set problem: given a directed graph G = (V, E) and an integer L >= 0, the question is whether there exists a subset F of E such that |F| <= L and (V, E-F) is acyclic. Let n = |V| and m = |E|. Given G, we first produce a graph G' = (V', E') by "splitting" each edge of G into two edges; formally, let V' be the union of V and the set {v_e : e in E}, and let E' = {(i, v_{i,j}), (v_{i,j}, j) : (i,j) in E}. The easy fact that we will use later is that G has a feedback edge set of size L if and only if G' does.

Arbitrarily order all the vertices of G' so that the vertices in V receive the numbers 1, ..., n (and the vertices of the form v_e receive the numbers n+1, ..., n+m); denote this ordering by Z. For a vertex i in V, let out(i) denote a listing of the out-neighbors of i in G' in the order prescribed by Z; similarly, let in(i) denote the in-neighbors of i in G' in the order prescribed by Z. Note that none of the lists out(i) or in(i) contains any vertex of the original graph G. We now define four full lists on the set V'. For a list L, the notation L^r denotes the reversal of the list.

t1 = 1, out(1), 2, out(2), ..., n, out(n)
t2 = n, out(n)^r, n-1, out(n-1)^r, ..., 1, out(1)^r
t3 = in(1), 1, in(2), 2, ..., in(n), n
t4 = in(n)^r, n, in(n-1)^r, n-1, ..., in(1)^r, 1
The idea is that in t1, each vertex in V precedes all of its out-neighbors in G', while the ordering of the out-neighbors of a vertex, as well as the ordering of the vertex-neighbor groups, is arbitrary (according to Z). The list t2 "cancels" the effect of this arbitrariness in the ordering of the neighbors of a vertex and of the vertex-neighbor groups, while "reinforcing" the placement of each vertex of V above its out-neighbors in G'. Similarly, in t3 and t4, each vertex of the original vertex set V is preceded by its in-neighbors in G', with suitably arranged cancellations of the artificial ordering among the other pairs.
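For concreteness, the four lists can be generated mechanically from G. The sketch below is our illustration: each split vertex v_{i,j} is represented as the tuple (i, j), and the sorted order of vertices and edges plays the role of the arbitrary ordering Z.

from collections import defaultdict

def reduction_lists(n, edges):
    """Build the full lists t1..t4 of the reduction from a digraph on
    vertices 1..n with edge set `edges`; split vertices v_{i,j} are
    represented as tuples (i, j)."""
    out_nbrs, in_nbrs = defaultdict(list), defaultdict(list)
    for (i, j) in sorted(edges):        # sorted order plays the role of Z
        out_nbrs[i].append((i, j))      # edge i -> v_{i,j} in G'
        in_nbrs[j].append((i, j))       # edge v_{i,j} -> j in G'
    t1, t2, t3, t4 = [], [], [], []
    for i in range(1, n + 1):
        t1 += [i] + out_nbrs[i]         # each vertex above its out-neighbors
        t3 += in_nbrs[i] + [i]          # each vertex below its in-neighbors
    for i in range(n, 0, -1):
        t2 += [i] + out_nbrs[i][::-1]   # reversed blocks cancel arbitrariness
        t4 += in_nbrs[i][::-1] + [i]
    return t1, t2, t3, t4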

The main claim is that G has a feedback edge set of size L if and only if there is a permutation p such that sum_r K(p, t_r) <= L', where L' = 2L + 2(n(n-1)/2 + m(m-1)/2 + m).

First suppose that G has a feedback edge set F of size L. It is easy to see that the set F' = {(i, v_{i,j}) : (i,j) in F} is a feedback edge set of G', and |F'| = L. The graph (V', E'-F') is acyclic, so by topologically sorting its vertices we obtain an ordering p of the vertices of V' such that for every (i,j) in E'-F', i is placed before j in p. We claim that p satisfies sum_r K(p, t_r) <= L'.

Note that regardless of how p was obtained, the last three terms of L' are inevitable:

(1) For each pair i, j in V, exactly one of t1 and t2 places i above j and the other places j above i, so the pair contributes 1 to K(p, t1) + K(p, t2); similarly, it contributes 1 to K(p, t3) + K(p, t4). This accounts for the term 2(n(n-1)/2).

(2) A similar argument holds for pairs v_e, v_{e'}; there are m(m-1)/2 such pairs, accounting for the term 2(m(m-1)/2).

(3) A similar argument holds for the pairs v_{i,j}, j with respect to t1 and t2, and for the pairs i, v_{i,j} with respect to t3 and t4. The total number of such pairs is 2m.

The only remaining contribution to the total distance of p from the t's comes from the pairs i, v_{i,j} with respect to t1 and t2 (where i precedes v_{i,j} in both lists), and the pairs v_{i,j}, j with respect to t3 and t4 (where v_{i,j} precedes j in both lists). Such a pair contributes 2 to the total distance sum_r K(p, t_r) precisely if it occurs as a "back edge" with respect to the topological ordering p of the vertices of G'; since (V', E'-F') is acyclic, the total number of such back edges is at most |F'| = L.

Conversely, suppose that there exists a permutation p that achieves a total Kemeny distance of at most L' = 2L + 2(n(n-1)/2 + m(m-1)/2 + m). We have already argued (in items (1), (2), and (3) above) that p must incur a distance of 2(n(n-1)/2 + m(m-1)/2 + m) with respect to the t's, so the only extra distance between p and the t's comes from pairs of the form i, v_{i,j} in t1 and t2, and of the form v_{i,j}, j in t3 and t4. Once again, each such pair contributes either 0 or 2 to the total distance, so at most L pairs contribute 2. Consider the pairs that contribute 2, and let the corresponding set of edges in E' be denoted F'. Now, (V', E'-F') is acyclic, since every edge remaining in E'-F', by definition, respects the ordering p. Thus F' is a feedback edge set of G' of size at most L, and the set F = {(i,j) : (i, v_{i,j}) in F' or (v_{i,j}, j) in F'} is a feedback edge set of G of size at most L.

This completes the proof that computing a Kemeny optimal permutation is NP-hard even when the input consists of four full lists. The proof for even k > 4 is a simple extension: first produce the four lists as above, then add (k-4)/2 pairs of lists s, s^r, where s is an arbitrary permutation. This addition clearly preserves Kemeny optimal solutions; the distance parameter increases by an additive term of (k-4)(n+m)(n+m-1)/4.