Electronic Medical Records (EMRs) are valuable resources for clinical observational studies. The smoking status of a patient is a key factor in many diseases, but it is often embedded in narrative text. Natural language processing (NLP) systems have been developed for this specific task, such as the smoking status detection module in the clinical Text Analysis and Knowledge Extraction System (cTAKES). This study examined the transportability of the smoking module in cTAKES on the Vanderbilt University Hospital’s EMR data. Our evaluation demonstrated that a modest amount of modification is necessary to achieve desirable performance. We modified the system by filtering notes, annotating new data for training the machine learning classifier, and adding rules to the rule-based classifiers. Our results showed that the customized module achieved significantly higher F-measures at all levels of classification
A Study of Transportability of an Existing Smoking Status Detection Module across Institutions