BMC Bioinformatics
Dynamic model updating (DMU) approach for statistical learning model building with missing data
Rahi Jain [1], Wei Xu [2]
[1] Biostatistics Department, Princess Margaret Cancer Research Centre, Toronto, ON, Canada; Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
Keywords: Missing data; Bayesian regression; Hierarchical clustering; Model updating; Dynamic model updating
DOI: 10.1186/s12859-021-04138-z
Source: Springer
【Abstract】
Background: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. Existing strategies in the literature either remove the samples with missing values, as in complete case analysis (CCA), or impute the missing values, as in predictive mean matching (PMM), for example via MICE. These strategies are limited by information loss and by how closely the imputed values match the true missing values. Further, in scenarios with piecemeal medical data, these strategies must wait for data collection to finish before a complete dataset is available for statistical modelling.

Method and results: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to prepare the statistical models. It segments the original dataset into small complete datasets using hierarchical clustering, then fits a Bayesian regression on each of the small complete datasets. Predictor estimates are updated using the posterior estimates from each dataset. DMU is evaluated on both simulated data and real studies, and it performs better than or on par with other approaches such as CCA and PMM.

Conclusion: The DMU approach provides an alternative to the existing approaches of information elimination and imputation for processing datasets with missing values. While the study applied the approach to continuous cross-sectional data, the approach can also be applied to longitudinal, categorical and time-to-event biological data.
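
The updating idea summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that are not in the abstract: rows are grouped by identical missingness pattern as a simple stand-in for the hierarchical clustering step, the regression has no intercept, the noise variance is treated as known, and only independent Gaussian priors (a mean and a variance per predictor) are carried from one block to the next. The names `conjugate_update` and `dmu_sketch` are hypothetical.

```python
import numpy as np


def conjugate_update(prior_mean, prior_var, X, y, noise_var):
    """Gaussian conjugate update for linear regression coefficients.

    Assumes independent Gaussian priors (prior_var is a vector) and a
    known noise variance. Returns the posterior mean and the marginal
    posterior variance of each coefficient.
    """
    prior_prec = np.diag(1.0 / prior_var)
    post_prec = prior_prec + X.T @ X / noise_var
    post_cov = np.linalg.inv(post_prec)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, np.diag(post_cov)


def dmu_sketch(X, y, noise_var=1.0, prior_var=100.0):
    """Update coefficient estimates block by block over complete subsets.

    X may contain NaN for missing predictor values; y is assumed complete.
    """
    n_predictors = X.shape[1]
    mean = np.zeros(n_predictors)
    var = np.full(n_predictors, prior_var)
    observed = ~np.isnan(X)
    # Assumption: grouping rows by identical missingness pattern stands in
    # for the hierarchical clustering step named in the abstract.
    for pattern in np.unique(observed, axis=0):
        cols = np.flatnonzero(pattern)
        if cols.size == 0:
            continue  # no predictor is observed in this block
        rows = (observed == pattern).all(axis=1)
        block = X[rows][:, cols]  # complete by construction
        # The posterior from this block becomes the prior for later blocks.
        mean[cols], var[cols] = conjugate_update(
            mean[cols], var[cols], block, y[rows], noise_var
        )
    return mean, var
```

Calling `dmu_sketch(X, y)` on a predictor matrix with `NaN` entries returns one posterior mean and variance per predictor. Because each block is fitted only on the predictors it observes, the block-wise estimates are directly comparable only when the predictors are roughly uncorrelated; the sketch does not address that case.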
【License】
CC BY