Journal article details
BMC Bioinformatics
A framework for feature extraction from hospital medical data with applications in risk prediction
Truyen Tran [3], Wei Luo [2], Dinh Phung [2], Sunil Gupta [2], Santu Rana [2], Richard Lee Kennedy [1], Ann Larkins [4], Svetha Venkatesh [2]
[1] School of Medicine, Deakin University, Geelong, VIC, Australia
[2] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong 3220, VIC, Australia
[3] Department of Computing, Curtin University, Perth, WA, Australia
[4] Barwon Health, Geelong, VIC, Australia
Keywords: Hospital data; Risk prediction; Feature extraction
Others  :  1114582
DOI  :  10.1186/s12859-014-0425-8
Received 2014-06-06, accepted 2014-12-11, published 2014
【 Abstract 】

Background

Feature engineering is a time-consuming component of predictive modeling. We propose a versatile platform that automatically extracts features for risk prediction, based on a pre-defined and extensible entity schema. The extraction is independent of disease type and of the risk prediction task. We contrast auto-extracted features with baselines generated from the Elixhauser comorbidities.

Results

Hospital medical records were transformed into event sequences, to which filters were applied to extract feature sets capturing diversity in temporal scales and data types. The features were evaluated on a readmission prediction task and compared with baseline feature sets generated from the Elixhauser comorbidities. The prediction model was logistic regression with elastic net regularization. Prediction horizons of 1, 2, 3, 6 and 12 months were considered for four diverse diseases: diabetes, COPD, mental disorders and pneumonia, with derivation and validation cohorts defined on non-overlapping data-collection periods.
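
As a rough illustration of turning an event sequence into features at several temporal scales, the sketch below counts events per time window before the prediction point. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not specify the filters, so the window boundaries, the `extract_features` function, the event tuple layout and the sample records are all hypothetical. The codes J44 and E11 are ICD-10 categories for COPD and type 2 diabetes, used here only as example values.

```python
from collections import Counter

# Hypothetical window boundaries, in days before the prediction point.
# They are illustrative only (roughly 1, 3, 6 and 12 months back).
WINDOWS = [(0, 30), (30, 90), (90, 180), (180, 365)]


def extract_features(events, windows=WINDOWS):
    """Count events per (event type, code, time window).

    events: iterable of (days_before, event_type, code) tuples, where
    days_before is how long before the prediction point the event occurred.
    Returns a Counter keyed by "type:code@start-endd".
    """
    features = Counter()
    for days_before, event_type, code in events:
        for start, end in windows:
            if start <= days_before < end:
                features[f"{event_type}:{code}@{start}-{end}d"] += 1
                break  # windows are disjoint, so at most one can match
    return features


# Made-up event sequence for one patient.
patient_events = [
    (5, "diagnosis", "J44"),
    (12, "diagnosis", "J44"),
    (40, "admission", "emergency"),
    (200, "diagnosis", "E11"),
    (400, "diagnosis", "E11"),  # older than every window, so it is ignored
]
features = extract_features(patient_events)
```

Each key in `features` becomes one column of a sparse feature vector; a regularized model such as elastic net logistic regression can then select among the many resulting columns.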

For unplanned readmissions, the auto-extracted feature set, built from socio-demographic information and medical records, outperformed baselines derived from socio-demographic information and the Elixhauser comorbidities across 20 settings (5 prediction horizons over 4 diseases). In particular, for 30-day prediction, the AUCs were: COPD: baseline 0.60 (95% CI: 0.57, 0.63), auto-extracted 0.67 (0.64, 0.70); diabetes: baseline 0.60 (0.58, 0.63), auto-extracted 0.67 (0.64, 0.69); mental disorders: baseline 0.57 (0.54, 0.60), auto-extracted 0.69 (0.64, 0.70); pneumonia: baseline 0.61 (0.59, 0.63), auto-extracted 0.70 (0.67, 0.72).
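
For readers unfamiliar with the metric, the AUC can be read as the probability that a randomly chosen readmitted patient receives a higher predicted risk than a randomly chosen non-readmitted patient. The sketch below computes it directly from that pairwise definition. The abstract does not say how the AUCs or their confidence intervals were computed, so the `auc` function is only an assumed, textbook formulation, and the labels and scores are made-up values.

```python
def auc(labels, scores):
    """Area under the ROC curve from its pairwise definition.

    Returns the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = readmitted, 0 = not readmitted.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.9]
example_auc = auc(labels, scores)  # 3.5 winning pairs out of 6
```

The double loop is quadratic in cohort size; for large cohorts a rank-based formulation gives the same value more efficiently.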

Conclusions

The advantages of auto-extracting standard features from complex medical records in a disease- and task-agnostic manner were demonstrated. Auto-extracted features have good predictive power over multiple time horizons. Such feature sets have the potential to form the foundation of complex automated analytic tasks.

【 License 】

   
2014 Tran et al.; licensee BioMed Central.
