Frontiers in Psychology | 卷:5 |
How to assess and compare inter-rater reliability, agreement and correlation of ratings: an exemplary analysis of mother-father and parent-teacher expressive vocabulary rating pairs | |
Margarita eStolarova1  Tanja eRinker2  Corinna eWolf2  Aenne Annelie Brielmann2  | |
[1] Rhine-Waal University of Applied Sciences; | |
[2] University of Konstanz; | |
关键词: Language assessment; Correlation analysis; lexical development; inter-rater reliability; Inter-rater agreement; expressive vocabulary; | |
DOI : 10.3389/fpsyg.2014.00509 | |
来源: DOAJ |
【 摘 要 】
This report has two main purposes. First, we combine well-known analytical approaches to conduct a comprehensive assessment of agreement and correlation of rating-pairs and to dis-entangle these often confused concepts, providing a best-practice example on concrete data and a tutorial for future reference. Second, we explore whether a screening questionnaire deve-loped for use with parents can be reliably employed with daycare teachers when assessing early expressive vocabulary. A total of 53 vocabulary rating pairs (34 parent-teacher and 19 mother-father pairs) collected for two-year-old children (12 bilingual) are evaluated. First, inter-rater reliability both within and across subgroups is assessed using the intra-class correlation coefficient (ICC). Next, based on this analysis of reliability and on the test-retest reliability of the employed tool, inter-rater agreement is analyzed, magnitude and direction of rating differences are considered. Finally, Pearson correlation coefficients of standardized vocabulary scores are calculated and compared across subgroups. The results underline the necessity to distinguish between reliability measures, agreement and correlation. They also demonstrate the impact of the employed reliability on agreement evaluations. This study provides evidence that parent-teacher ratings of children’s early vocabulary can achieve agreement and correlation comparable to those of mother-father ratings on the assessed vocabulary scale. Bilingualism of the evaluated child decreased the likelihood of raters’ agreement. We conclude that future reports of agree-ment, correlation and reliability of ratings will benefit from better definition of terms and stricter methodological approaches. The methodological tutorial provided here holds the potential to increase comparability across empirical reports and can help improve research practices and knowledge transfer to educational and therapeutic settings.
【 授权许可】
Unknown