From social choice to statistics to coding theory, rankings are found to be a useful vehicle for storing and presenting information in modern data systems. Often, in order to process the information, an appropriately defined distance on rankings is required or at least helpful. For example, in social choice, the quality of the results of distance-based voting rules depends almost entirely on the chosen distance function; in statistics, distances between rankings are used to measure correlation and to model data; and in coding theory, in the context of rank modulation, designing codes with a given minimum distance is required for preserving data integrity.It is however well-known that conventional distances are not adequate for many applications. Motivated by several problems from different disciplines, including problems in social choice and bioinformatics, we axiomatically introduce two novel classes of distances on rankings to address the shortcomings of conventional distances. In addition, we propose several algorithms for computing or approximating these distances. The algorithms are based on graph-search techniques, Viterbi-type algorithms, and dynamic programming. Furthermore, we present algorithms for rank aggregation using the proposed distances. One algorithm is based on finding a minimum weight bipartite matching and another is a PageRank-type algorithm in which the transition probabilities of a Markov chain model yield the aggregate ranking.In the context of coding theory, we introduce permutation codes in the Ulam metric that were not previously reported in the literature. Compared to known codes, the proposed codes can handle more diverse types of errors, including arbitrary charge-drop errors as well as other errors that affect a single cell, such as read disturb or write disturb errors. We present capacity achieving codes that can correct a constant number of errors and asymptotically good codes that can correct a linear number of errors in the length of the code. Our results also include derivation of the asymptotic capacity of permutation codes in the Ulam metric and simple decoding methods for one class of constructed codes. As part of our exposition, we also highlight the close connections between the new code family and permutations with short common subsequences, deletion and insertion error-correcting codes, and substitution error-correcting codes.
【 预 览 】
附件列表
Files
Size
Format
View
Distances on rankings: from social choice to flash memories