Journal article details
BMC Medical Research Methodology
An R package for analyzing and modeling ranking data
Philip LH Yu [1], Paul H Lee [1]
[1] Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, Hong Kong
Keywords: Weighted distance; Visualization; Multidimensional preference analysis; Luce model; Distance-based model
DOI  :  10.1186/1471-2288-13-65
Received 2012-09-25, accepted 2013-04-25, published 2013
【 Abstract 】

Background

In medical informatics, psychology, market research and many other fields, researchers often need to analyze and model ranking data. However, there is no statistical software that provides tools for the comprehensive analysis of ranking data. Here, we present pmr, an R package that bundles tools for analyzing and modeling ranking data. The pmr package provides descriptive statistics (mean rank, pairwise frequencies, and marginal matrix), Analytic Hierarchy Process models (with Saaty’s and Koczkodaj’s inconsistencies), probability models (Luce model, distance-based model, and rank-ordered logit model), and the visualization of ranking data with multidimensional preference analysis.
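To make the three descriptive statistics named above concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the pmr API; the function names (`mean_rank`, `pairwise_frequencies`, `marginal_matrix`) and the toy data are assumptions made for illustration, and a rank of 1 is assumed to mean "most preferred".

```python
from itertools import combinations

# Each row is one respondent; row[j] is the rank given to item j (1 = most preferred).
# The toy data below are invented for illustration only.
toy = [
    [1, 2, 3],
    [1, 3, 2],
    [2, 1, 3],
    [1, 2, 3],
]

def mean_rank(rankings):
    """Average rank received by each item (lower means more preferred)."""
    n = len(rankings)
    k = len(rankings[0])
    return [sum(r[j] for r in rankings) / n for j in range(k)]

def pairwise_frequencies(rankings):
    """freq[a][b] = number of respondents who ranked item a ahead of item b."""
    k = len(rankings[0])
    freq = [[0] * k for _ in range(k)]
    for r in rankings:
        for a, b in combinations(range(k), 2):
            if r[a] < r[b]:
                freq[a][b] += 1
            elif r[b] < r[a]:
                freq[b][a] += 1
    return freq

def marginal_matrix(rankings):
    """m[item][rank - 1] = number of respondents who gave `item` that rank."""
    k = len(rankings[0])
    m = [[0] * k for _ in range(k)]
    for r in rankings:
        for item, rank in enumerate(r):
            m[item][rank - 1] += 1
    return m

means = mean_rank(toy)
pairs = pairwise_frequencies(toy)
marginals = marginal_matrix(toy)
```

In this toy example the first item has the lowest mean rank, so it would be read as the most preferred item, in the same way the abstract reads the mean ranks of the seven incentives.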

Results

Examples of the use of package pmr are given using a real ranking dataset from medical informatics, in which 566 Hong Kong physicians ranked the top five incentives (1: competitive pressures; 2: increased savings; 3: government regulation; 4: improved efficiency; 5: improved quality care; 6: patient demand; 7: financial incentives) to the computerization of clinical practice. The mean rank showed that item 4 is the most preferred item and item 3 is the least preferred item, and a significant difference was found between physicians’ preferences with respect to their monthly income. A multidimensional preference analysis identified two dimensions that explain 42% of the total variance. The first dimension can be interpreted as the overall preference of the seven items (labeled as “internal/external”), and the second dimension can be interpreted as their overall variance (labeled as “push/pull factors”). Various statistical models were fitted, and the best were found to be weighted distance-based models with Spearman’s footrule distance.
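As a rough illustration of what a distance-based model with Spearman’s footrule looks like, the sketch below uses the generic unweighted form, in which the probability of a ranking decays exponentially with its footrule distance from a modal ranking. This is an assumption-laden toy, not the fitted model from the paper and not pmr code: the weighted variant reported as best is not reproduced, the names `footrule` and `dbm_probabilities` are hypothetical, the dispersion value `lam=0.5` is arbitrary, and the normalizing constant is computed by brute force, which is only feasible for a handful of items.

```python
from itertools import permutations
from math import exp

def footrule(r1, r2):
    """Spearman's footrule: sum of absolute rank differences, item by item."""
    return sum(abs(a - b) for a, b in zip(r1, r2))

def dbm_probabilities(modal, lam):
    """P(r) proportional to exp(-lam * footrule(r, modal)) over all rankings of k items.

    `modal` is the central ranking and `lam` >= 0 is the dispersion parameter;
    both are illustrative inputs, not estimates from the physician data.
    """
    k = len(modal)
    all_rankings = list(permutations(range(1, k + 1)))
    weights = {r: exp(-lam * footrule(r, modal)) for r in all_rankings}
    total = sum(weights.values())  # brute-force normalizing constant
    return {r: w / total for r, w in weights.items()}

modal = (1, 2, 3)
probs = dbm_probabilities(modal, lam=0.5)
```

Under this form the modal ranking is always the most probable one, and rankings farther from it in footrule distance receive smaller probability; larger values of `lam` concentrate the distribution more tightly around the modal ranking.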

Conclusions

In this paper, we presented the R package pmr, the first package for analyzing and modeling ranking data. The package provides insight to users through descriptive statistics of ranking data. Users can also visualize ranking data by applying multidimensional preference analysis. Various probability models for ranking data are also included, allowing users to choose the one most suitable to their specific situation.

【 License 】

   
2013 Lee and Yu; licensee BioMed Central Ltd.
