Journal of Biomedical Semantics
Mining characteristics of epidemiological studies from Medline: a case study in obesity
Goran Nenadic [2], Iain Buchan [1], George Karystianis [2]
[1] Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester, UK; Health e-Research Centre, Manchester, UK
Keywords: Rule-based methodology; Key characteristics; Epidemiology; Text mining
DOI: 10.1186/2041-1480-5-22
Received 2014-04-09; accepted 2014-04-15; published 2014.
Abstract

Background

The health sciences literature includes a relatively large subset of epidemiological studies that focus on population-level findings, including various determinants, outcomes and correlations. Extracting structured information about these characteristics would support a more complete understanding of diseases, as well as meta-analyses and systematic reviews.

Results

We present an information extraction approach that enables users to identify key characteristics of epidemiological studies from MEDLINE abstracts. It extracts six types of epidemiological characteristics: study design, the population studied, exposure, outcome, covariates and effect size. We have developed a generic rule-based approach, designed according to semantic patterns observed in text, and tested it in the domain of obesity. Identified exposure, outcome and covariate concepts are clustered into health-related groups of interest. On a manually annotated test corpus of 60 epidemiological abstracts, the system achieved precision, recall and F-score of 79-100%, 80-100% and 82-96%, respectively. We also report the results of applying the method to a large-scale epidemiological corpus related to obesity.
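The rule-based extraction and the precision/recall/F-score evaluation described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and written for illustration only: the `PATTERNS` dictionary, its regular expressions, and the `extract` and `precision_recall_f1` functions are not the authors' rules or code (the abstract does not list those). The sketch covers only two of the six characteristic types (study design and effect size) and assumes exact-match, set-based scoring against gold-standard annotations.

```python
import re

# Hypothetical patterns for two of the six characteristic types.
# Illustrative only; these are NOT the rules used in the paper.
PATTERNS = {
    # Study design: a design term followed by "study", "analysis" or "survey".
    "design": re.compile(
        r"\b(?:cross-sectional|case-control|cohort|longitudinal)\s+(?:study|analysis|survey)\b",
        re.IGNORECASE,
    ),
    # Effect size: a measure name (abbreviations matched case-sensitively
    # so the word "or" is not mistaken for an odds ratio) followed by a number.
    "effect_size": re.compile(
        r"\b(?:OR|RR|HR|[Oo]dds ratio|[Rr]elative risk|[Hh]azard ratio)\s*[=:]?\s*\d+(?:\.\d+)?"
    ),
}


def extract(text):
    """Return a set of (type, lowercased matched text) pairs found in an abstract."""
    found = set()
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.add((label, match.group(0).lower()))
    return found


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 (harmonic mean of the two)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Invented example sentence, not taken from the paper's corpus.
    abstract = (
        "In this prospective cohort study of 2,000 adults, obesity was "
        "associated with type 2 diabetes (OR 2.5, 95% CI 1.8-3.4)."
    )
    predicted = extract(abstract)
    # Hypothetical gold annotations; "population" has no rule above, so it is missed.
    gold = {
        ("design", "cohort study"),
        ("effect_size", "or 2.5"),
        ("population", "2,000 adults"),
    }
    print(sorted(predicted))
    print(precision_recall_f1(predicted, gold))
```

In this toy run both predictions are correct, so precision is 1.0, but the population mention has no matching rule, so recall falls to 2/3 and the F-score to about 0.8. This illustrates why per-characteristic scores can differ, as in the ranges reported above.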

Conclusions

The experiments suggest that the proposed approach can identify key epidemiological characteristics associated with a complex clinical problem from related abstracts. When integrated over the literature, the extracted data can provide a more complete picture of epidemiological efforts and thus support understanding via meta-analyses and systematic reviews.

License

© 2014 Karystianis et al.; licensee BioMed Central Ltd.

Figures

Figure 1.
