Electronic health records are an increasingly important source of data for research, allowing for large-scale longitudinal studies on the same population that is being treated. Unlike in controlled studies, though, these data vary widely in quality, quantity, and structure. In order to know whether algorithms can accurately uncover new knowledge from these records, or whether findings can be extrapolated to new populations, they must be validated. One approach is to conduct the same study in multiple sites and compare results, but it is a challenge to determine whether differences are due to artifacts of the medical process, population differences, or failures of the methods used. In this paper we describe the results of replicating a data-driven experiment to infer possible causes of congestive heart failure and their timing using data from two medical systems and two patient populations. We focus on the difficulties faced in
Lessons Learned in Replicating Data-Driven Experiments in Multiple Medical Systems and Patient Populations