BMC Medical Research Methodology
Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R
Marianne Huebner [1], Carsten Oliver Schmidt [2], Stephan Struckmann [2], Adrian Richter [2], Jürgen Stausberg [3], Börge Schmidt [3], Cornelia Enzenbach [4], Willi Sauerbrei [5], Achim Reineke [6], Stefan Damerow [7]
[1] Department of Statistics and Probability, Michigan State University, East Lansing, MI, USA; [2] Institute for Community Medicine, Department SHIP-KEF, University Medicine Greifswald, Greifswald, Germany; [3] Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), Faculty of Medicine, University of Duisburg-Essen, Duisburg, Germany; [4] Institute for Medical Informatics, Statistics, and Epidemiology, University of Leipzig, Leipzig, Germany; [5] Institute of Medical Biometry and Statistics, Faculty of Medicine and Medical Center, University of Freiburg, Freiburg, Germany; [6] Leibniz Institute for Prevention Research and Epidemiology – BIPS, Bremen, Germany; [7] Robert Koch Institute, Department of Epidemiology and Health Monitoring, Berlin, Germany
Keywords: Data quality; Observational health studies; Data quality indicators; Data quality monitoring; Initial data analysis; R
DOI: 10.1186/s12874-021-01252-7
Source: Springer
【 Abstract 】
Background: No standards exist for the handling and reporting of data quality in health research. This work introduces a data quality framework for observational health research data collections with supporting software implementations to facilitate harmonized data quality assessments.

Methods: Developments were guided by the evaluation of an existing data quality framework and literature reviews. Functions for the computation of data quality indicators were written in R. The concept and implementations are illustrated based on data from the population-based Study of Health in Pomerania (SHIP).

Results: The data quality framework comprises 34 data quality indicators. These target four aspects of data quality: compliance with pre-specified structural and technical requirements (integrity); presence of data values (completeness); inadmissible or uncertain data values and contradictions (consistency); unexpected distributions and associations (accuracy). R functions calculate data quality metrics based on the provided study data and metadata, and R Markdown reports are generated. Guidance on the concept and tools is available through a dedicated website.

Conclusions: The presented data quality framework is the first of its kind for observational health research data collections that links a formal concept to implementations in R. The framework and tools facilitate harmonized data quality assessments in pursuit of transparent and reproducible research. Application scenarios comprise data quality monitoring while a study is carried out as well as performing an initial data analysis before starting substantive scientific analyses, but the developments are also of relevance beyond research.
【 License 】
CC BY