Journal of Biomedical Semantics
Development and validation of a classification approach for extracting severity automatically from electronic health records
George Hripcsak [2], Nicholas P Tatonetti [1], Mary Regina Boland [2]
[1] Department of Medicine, Columbia University, New York, NY, USA; Observational Health Data Sciences and Informatics (OHDSI), Columbia University, 622 West 168th Street, PH-20, New York, NY, USA
Keywords: Outcome assessment (Health Care); Data mining; Health status indicators; Phenotype; Electronic Health Records
Others: 1151609; DOI: 10.1186/s13326-015-0010-8
Received 2014-11-03, accepted 2015-03-03, published 2015
【 Abstract 】
Background
Electronic Health Records (EHRs) contain a wealth of information useful for studying clinical phenotype-genotype relationships. Severity is important for distinguishing among phenotypes; however, other severity indices classify patient-level severity (e.g., mild vs. acute dermatitis) rather than phenotype-level severity (e.g., acne vs. myocardial infarction). Phenotype-level severity is independent of the individual patient's state and is defined relative to other phenotypes, so it does not change from one patient to the next. For example, acne is mild at the phenotype level relative to other phenotypes. A given patient may have a severe form of acne (the patient-level severity), but this does not affect the overall designation of acne as a mild phenotype at the phenotype level.
Methods
We present a method for classifying severity at the phenotype level that uses the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT). Our method is called the Classification Approach for Extracting Severity Automatically from Electronic Health Records (CAESAR). CAESAR combines multiple severity measures: number of comorbidities, medications, procedures, cost, treatment time, and a proportional index term. CAESAR employs a random forest algorithm and these severity measures to discriminate between severe and mild phenotypes.
Results
Using a random forest algorithm with these severity measures as input, CAESAR differentiates between severe and mild phenotypes (sensitivity = 91.67%, specificity = 77.78%) when compared to a manually evaluated reference standard (k = 0.716).
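For readers less familiar with the two reported metrics, the following minimal Python sketch shows how sensitivity and specificity are computed from a binary confusion matrix, treating "severe" as the positive class. The abstract does not report the underlying counts, so the counts below are hypothetical. They were chosen only because they happen to produce the same percentages and should not be read as the study's actual sample sizes. The function name `sensitivity_specificity` is likewise an illustrative assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) as percentages.

    tp: severe phenotypes correctly classified as severe
    fn: severe phenotypes misclassified as mild
    tn: mild phenotypes correctly classified as mild
    fp: mild phenotypes misclassified as severe
    """
    sensitivity = 100.0 * tp / (tp + fn)  # share of severe phenotypes detected
    specificity = 100.0 * tn / (tn + fp)  # share of mild phenotypes detected
    return sensitivity, specificity


# Hypothetical counts (NOT taken from the paper), for illustration only.
sens, spec = sensitivity_specificity(tp=11, fn=1, tn=7, fp=2)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this framing, high sensitivity means few severe phenotypes are mislabeled as mild, while specificity tracks how often mild phenotypes are correctly kept out of the severe group.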
Conclusions
CAESAR enables researchers to measure phenotype severity from EHRs to identify phenotypes that are important for comparative effectiveness research.
【 License 】
2015 Boland et al.; licensee BioMed Central.