Building risk models from multiple sources of data allows researchers to incorporate the best available information on key model parameters. In this thesis, we develop and apply methodology for optimally combining information from multiple data sources in two main contexts.

In the first, motivated by the need to build subtype-specific absolute risk models for breast cancer, we develop and apply methodology for combining information from analytic cohort or case-control studies and from population-based registries. We address the statistical challenges of handling different types of missing information in this context. We derive variance estimators for the risk predictions produced by such models, accounting for different sources of uncertainty. We apply the methods to two large consortia to build absolute risk models for overall breast cancer and for subtypes of breast cancer defined by estrogen receptor status. We show how the absolute risk models can be used to project distributions of breast cancer risk for the US population and to evaluate the potential impact of population-wide modification of breast cancer risk factors.

In the second, we consider how to effectively incorporate external information when building new or updated risk models, again with the goal of combining data sources to produce models that are more efficient and more representative of the underlying population. In particular, we explore a regression calibration approach that uses a method from the sample-survey literature, traditionally applied to increase the efficiency of parameter estimation from a given survey by leveraging information from external data sources. We examine the performance of the estimator in a setting that has not previously been studied, where the sample and the external data are representative of different populations. We derive theoretical conditions under which the calibrated estimator produces meaningful estimates that are calibrated to the external population, and we corroborate our analytic results with numerical simulations. Our work also identifies weaknesses in the methodology and promising avenues for further research in this important area.
Synthesizing Data Sources to Develop and Update Risk Models