Journal article details
BMC Medical Research Methodology
Comparison of confidence interval methods for an intra-class correlation coefficient (ICC)
Kevin K Dobbin [3], Lisa M McShane [1], Mei-Yin C Polley [1], Alexei C Ionan [2]
[1] Biometric Research Branch, National Cancer Institute, Rockville, MD, USA; [2] Department of Statistics, University of Georgia, Athens, GA, USA; [3] Department of Epidemiology and Biostatistics, University of Georgia, Athens, GA, USA
Keywords: Modified large sample; Intraclass correlation coefficient; Generalized confidence interval; Credible interval; Confidence interval
DOI  :  10.1186/1471-2288-14-121
Received 2014-05-06, accepted 2014-10-27, published 2014
【 Abstract 】

Background

The intraclass correlation coefficient (ICC) is widely used in biomedical research to assess the reproducibility of measurements between raters, labs, technicians, or devices. For example, in an inter-rater reliability study, a high ICC value means that noise variability (between-raters and within-raters) is small relative to variability from patient to patient. A confidence interval or Bayesian credible interval for the ICC is a commonly reported summary. Such intervals can be constructed employing either frequentist or Bayesian methodologies.
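As a rough formalization of the ratio described above, the ICC can be written in terms of variance components. The symbols are notation assumed here for illustration and do not appear in the abstract: \sigma_s^2 is the patient-to-patient variance, \sigma_r^2 the between-rater variance, and \sigma_e^2 the within-rater (error) variance.

```latex
\[
  \mathrm{ICC} \;=\; \frac{\sigma_s^2}{\sigma_s^2 + \sigma_r^2 + \sigma_e^2}
\]
```

Under this definition the ICC approaches 1 when the two noise components are small relative to \sigma_s^2, and approaches 0 when they dominate.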

Methods

This study examines the performance of three different methods for constructing an interval in a two-way, crossed, random effects model without interaction: the Generalized Confidence Interval method (GCI), the Modified Large Sample method (MLS), and a Bayesian method based on a noninformative prior distribution (NIB). Guidance is provided on interval construction method selection based on study design, sample size, and normality of the data. We compare the coverage probabilities and widths of the different interval methods.
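To make the model concrete, the following is a minimal sketch of an ANOVA (method-of-moments) point estimate of the ICC, assuming the two-way crossed random effects model without interaction takes the usual form y_ij = mu + s_i + r_j + e_ij, with one measurement per subject-rater cell. It is not an implementation of the GCI, MLS, or NIB intervals, and it is not the authors' R software; the function name `icc_two_way`, the variable names, and the choice to truncate negative variance estimates at zero are all assumptions made for illustration.

```python
def icc_two_way(data):
    """ANOVA point estimate of the ICC from an n-by-k table.

    data[i][j] is the single measurement of subject i by rater j.
    Assumed model (illustrative): y_ij = mu + s_i + r_j + e_ij,
    with independent random effects and no interaction term.
    """
    n = len(data)                      # number of subjects (rows)
    k = len(data[0]) if n else 0       # number of raters (columns)
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a complete table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(data[i][j] for i in range(n)) / n for j in range(k)]

    # Mean squares for subjects, raters, and residual error.
    ms_subject = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_rater = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ms_error = sum(
        (data[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    ) / ((n - 1) * (k - 1))

    # Moment estimates of the variance components, using
    # E[MS_subject] = var_e + k * var_s and E[MS_rater] = var_e + n * var_r.
    # Negative estimates are truncated at zero (an assumption of this sketch).
    var_s = max(0.0, (ms_subject - ms_error) / k)
    var_r = max(0.0, (ms_rater - ms_error) / n)
    var_e = ms_error

    total = var_s + var_r + var_e
    if total == 0:
        raise ValueError("all measurements are identical; ICC is undefined")
    return var_s / total
```

For example, `icc_two_way([[0, 1], [2, 3], [4, 5]])` treats three subjects measured by two raters. The interval methods compared in the study quantify the uncertainty around a point estimate of this kind, which can be substantial when n or k is small.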

Results

We show that, for the two-way, crossed, random effects model without interaction, care is needed in interval method selection because the interval estimates do not always have the properties that the user expects. While the different methods generally perform well when each factor has a large number of levels, large differences between the methods emerge when the number of levels of one or more factors is limited. In addition, all methods are shown to lack robustness to certain hard-to-detect violations of normality when the sample size is limited.

Conclusions

Decision rules and software programs for interval construction are provided for practical implementation in the two-way, crossed, random effects model without interaction. All interval methods perform similarly when the data are normal and there are sufficient numbers of levels of each factor. The MLS and GCI methods outperform the NIB when one of the factors has a limited number of levels and the data are normally distributed or nearly normally distributed. None of the methods works well if the number of levels of a factor is limited and the data are markedly non-normal. The software programs are implemented in the popular R language.

【 License 】

   
2014 Ionan et al.; licensee BioMed Central Ltd.
