Journal of Biomedical Semantics
A common type system for clinical natural language processing
Christopher G Chute [1], Hongfang Liu [1], Guergana K Savova [4], Wendy W Chapman [2], Lee Becker [3], Pei Chen [4], James J Masanz [1], Dmitriy Dligach [4], Vinod C Kaggal [1], Stephen T Wu [1]
[1] Mayo Clinic, Rochester, MN, USA; [2] University of California, San Diego, San Diego, CA, USA; [3] University of Colorado at Boulder, Boulder, CO, USA; [4] Children's Hospital Boston and Harvard Medical School, Boston, MA, USA
Keywords: Common type system; Clinical Element Models; Clinical information extraction; Standards and interoperability; Natural Language Processing
Others: 812613; DOI: 10.1186/2041-1480-4-1
Received 2012-06-26, accepted 2012-12-23, published 2013
【 Abstract 】
Background
One challenge in reusing clinical data stored in electronic medical records is that these data are heterogeneous. Clinical Natural Language Processing (NLP) plays an important role in transforming information in clinical text into a standard representation that is comparable and interoperable. Information can be processed and shared when a type system specifies the allowable data structures. We therefore aim to define a common type system for clinical NLP that enables interoperability between structured and unstructured data generated in different clinical settings.
Results
We describe a common type system for clinical NLP that has an end target of deep semantics based on Clinical Element Models (CEMs), thus interoperating with structured data and accommodating diverse NLP approaches. The type system has been implemented in UIMA (Unstructured Information Management Architecture) and is fully functional in a popular open-source clinical NLP system, cTAKES (clinical Text Analysis and Knowledge Extraction System) versions 2.0 and later.
Conclusions
We have created a type system that targets deep semantics, allowing NLP systems to encapsulate knowledge from text and share it alongside heterogeneous clinical data sources. Rather than the surface semantics that are typically the end product of NLP algorithms, CEM-based semantics explicitly build in deep clinical semantics as the point of interoperability with more structured data types.
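As a rough illustration of what it means for a type system to specify the allowable data structures, the following Python sketch declares a small hierarchy of types with typed features and checks candidate annotations against it. It is only a toy model under stated assumptions: it does not use the UIMA API, and the names `TypeDef`, `TypeSystem`, `Annotation`, and `MedicationMention` and their features are hypothetical, not the types defined in cTAKES or in the Clinical Element Models.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TypeDef:
    """A type declaration: a name, an optional parent type, and typed features."""
    name: str
    parent: Optional[str] = None
    features: Dict[str, type] = field(default_factory=dict)


class TypeSystem:
    """Toy registry of types (hypothetical; not the UIMA type system API)."""

    def __init__(self) -> None:
        self._types: Dict[str, TypeDef] = {}

    def declare(self, type_def: TypeDef) -> None:
        # A parent must be declared before any type that inherits from it.
        if type_def.parent is not None and type_def.parent not in self._types:
            raise KeyError(f"unknown parent type: {type_def.parent}")
        self._types[type_def.name] = type_def

    def allowed_features(self, name: str) -> Dict[str, type]:
        # Walk up the inheritance chain, then merge from the root downwards
        # so that a subtype's declaration takes precedence.
        chain: List[TypeDef] = []
        current: Optional[str] = name
        while current is not None:
            type_def = self._types[current]
            chain.append(type_def)
            current = type_def.parent
        merged: Dict[str, type] = {}
        for type_def in reversed(chain):
            merged.update(type_def.features)
        return merged

    def validate(self, name: str, values: Dict[str, Any]) -> bool:
        # An annotation is valid if every feature is declared and well typed.
        allowed = self.allowed_features(name)
        return all(
            key in allowed and isinstance(value, allowed[key])
            for key, value in values.items()
        )


# Hypothetical types for illustration only.
ts = TypeSystem()
ts.declare(TypeDef("Annotation", features={"begin": int, "end": int}))
ts.declare(
    TypeDef(
        "MedicationMention",
        parent="Annotation",
        features={"drug_name": str, "negated": bool},
    )
)

ok = ts.validate(
    "MedicationMention",
    {"begin": 0, "end": 7, "drug_name": "aspirin", "negated": False},
)
bad = ts.validate("MedicationMention", {"begin": 0, "end": 7, "dose": "81 mg"})
```

Here `ok` is true because every feature is declared on `MedicationMention` or inherited from `Annotation`, while `bad` is false because `dose` is not a declared feature. Two components that agree on such declarations can exchange annotations without knowing each other's internals, which is the interoperability role the abstract assigns to a common type system.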
【 License 】
2013 Wu et al.; licensee BioMed Central Ltd.