BMC Medical Research Methodology
Observer agreement paradoxes in 2x2 tables: comparison of agreement measures
Shrikant I Bangdiwala [1], Viswanathan Shankar [2]
[1] Department of Biostatistics, Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC 27599, USA
[2] Division of Biostatistics, Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY 10461, USA
Keywords: AC1-index; Delta; B-statistic; Aickin’s alpha; Cohen’s kappa; 2x2 table; Rater agreement
DOI: 10.1186/1471-2288-14-100
Received: 2014-06-23; Accepted: 2014-08-12; Published: 2014
【 Abstract 】
Background
Various measures of observer agreement have been proposed for 2x2 tables. We examine and compare the behavior of these alternative measures.
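For reference (the notation here is ours): for a 2x2 table with cell proportions $p_{ij}$ and marginal proportions $p_{i\cdot}$ and $p_{\cdot j}$, Cohen's kappa corrects the observed agreement for the agreement expected by chance under independent marginals:

\[
\kappa = \frac{p_o - p_e}{1 - p_e},
\qquad
p_o = p_{11} + p_{22},
\qquad
p_e = p_{1\cdot}\,p_{\cdot 1} + p_{2\cdot}\,p_{\cdot 2}.
\]

Both paradoxes trace back to $p_e$: it depends only on the marginals, so imbalanced marginals can push $p_e$ close to $p_o$ no matter how the agreement is distributed over the table.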
Methods
The alternative measures of observer agreement and the corresponding agreement chart were calculated under various scenarios of marginal distributions (symmetrical or not, balanced or not) and of degree of diagonal agreement, and their behaviors were compared. In particular, two paradoxes previously identified for kappa were examined: (1) low kappa values despite high observed agreement when symmetric marginals are highly imbalanced, and (2) higher kappa values under asymmetrically imbalanced marginal distributions.
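To make the two paradoxes concrete, here is a minimal sketch (ours, not code from the paper) that computes observed agreement, Cohen's kappa, and Bangdiwala's B for the classic illustrative tables of Feinstein and Cicchetti. The 2x2 form of B used here, $B = \sum_i n_{ii}^2 / \sum_i r_i c_i$ with $r_i$ and $c_i$ the marginal totals, is our transcription of the agreement-chart statistic.

```python
import numpy as np

def agreement_stats(table):
    """Observed agreement p_o, Cohen's kappa, and Bangdiwala's B for a 2x2 table."""
    t = np.asarray(table, dtype=float)
    p = t / t.sum()
    row, col = p.sum(axis=1), p.sum(axis=0)  # marginal proportions
    p_o = np.trace(p)                        # observed agreement
    p_e = (row * col).sum()                  # chance agreement from the marginals
    kappa = (p_o - p_e) / (1 - p_e)
    # B: squared diagonal counts over the sum of (row total x column total)
    b = (np.diag(t) ** 2).sum() / (t.sum(axis=1) * t.sum(axis=0)).sum()
    return p_o, kappa, b

# Paradox 1: same observed agreement (0.85), but kappa drops sharply when
# the symmetric marginals become highly imbalanced.
# Paradox 2: at the same observed agreement (0.60), asymmetrically
# imbalanced marginals yield the higher kappa.
tables = {
    "balanced":   [[40,  9], [ 6, 45]],
    "sym-imbal":  [[80, 10], [ 5,  5]],
    "symmetric":  [[45, 15], [25, 15]],
    "asymmetric": [[25, 35], [ 5, 35]],
}
for name, tab in tables.items():
    p_o, k, b = agreement_stats(tab)
    print(f"{name:>10}: p_o={p_o:.2f}  kappa={k:.2f}  B={b:.2f}")
```

With these tables, kappa falls from about 0.70 to 0.32 at a constant observed agreement of 0.85 (paradox 1) and rises from about 0.13 to 0.26 under asymmetric imbalance at a constant 0.60 (paradox 2), while B stays closer to the observed agreement in each pair.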
Results
Kappa and alpha behaved similarly and were more affected by the marginal distributions than the B-statistic, AC1-index, and delta. Delta and kappa gave similar values when the marginal totals were asymmetrically imbalanced, or symmetrical but not excessively imbalanced. The AC1-index and the B-statistic gave closer results when the marginal distributions were symmetrically imbalanced and the observed agreement was greater than 50%. The B-statistic and the AC1-index were also closer to the observed agreement when the subjects were classified mostly in one of the diagonal cells. Finally, the B-statistic was consistent and more stable than kappa under both paradoxes studied.
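For comparison, Gwet's AC1 uses a different chance-correction term; in the two-category case it reduces to the following (our transcription, with $\hat{\pi}$ the average of the two raters' marginal proportions for the first category):

\[
\mathrm{AC1} = \frac{p_o - p_e^{(\gamma)}}{1 - p_e^{(\gamma)}},
\qquad
p_e^{(\gamma)} = 2\hat{\pi}\,(1 - \hat{\pi}),
\qquad
\hat{\pi} = \frac{p_{1\cdot} + p_{\cdot 1}}{2}.
\]

When the marginals are symmetrically but heavily imbalanced, $\hat{\pi}$ is near 0 or 1, so $p_e^{(\gamma)}$ shrinks and AC1 stays close to $p_o$, consistent with the pattern reported above.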
Conclusions
The B-statistic behaved better than the other measures under all scenarios studied, as well as under varying prevalences, sensitivities, and specificities. We therefore recommend the B-statistic, along with its corresponding agreement chart, as an alternative to kappa when assessing agreement in 2x2 tables.
【 License 】
2014 Shankar and Bangdiwala; licensee BioMed Central Ltd.