Thesis details
Medical Document Filtering Using Rule-Based Natural Language Processing Techniques (규칙기반 자연어처리 기술을 이용한 의료문서 필터링)
Keywords: Natural language processing; Rule-based classifier; Decision Tree; Medical document filtering; 660
College of Engineering, Interdisciplinary Program in Bioengineering
University: Seoul National University Graduate School
Others: http://s-space.snu.ac.kr/bitstream/10371/122436/1/000000021017.pdf
United States | English
Source: Seoul National University Open Repository
【 Abstract 】

Physicians encounter ultrasound reports of thyroid neoplasms every day. These reports fall into three types: RECUR, a report for a patient whose cancer has recurred; INTER, a report used when it is uncertain whether the cancer has recurred; and NED, which stands for No Evidence of Disease. The three types are not equally common: NED reports are more likely than RECUR or INTER reports. Nevertheless, physicians have to review every report manually. Because physicians want to examine the recurrence reports in detail, filtering out reports that show no evidence of disease is important and can reduce human workload. These documents are clinical texts, so classifying a RECUR document as NED is unacceptable. We developed a rule-based classifier in Java that detects keywords in the reports and uses the resulting patterns to classify each report into one of the three categories. In the evaluation, it achieved 92.34% accuracy across the three types. A crucial result of this work is a precision of 1.0 for the NED class, meaning that every document classified as NED was actually an NED document. To put the rule-based classifier in context, we also experimented with a decision tree and a machine learning technique, both run in WEKA with 80 keywords as the feature set. The overall accuracy of the decision tree was 0.74% higher than that of the rule-based classifier, but it misclassified three documents. Precision for NED was 0.962 with the decision tree and 0.945 with the machine learning technique, both lower than the 1.0 achieved by the rule-based classifier.
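
The keyword-and-pattern approach and the NED precision figure described in the abstract can be illustrated with a minimal Java sketch. It is not the thesis's rule set: the class name `ReportFilterSketch`, the keyword lists, and the rule order are all hypothetical, chosen only to show how a rule-based filter might label a report NED solely on explicit evidence and how per-class precision (true positives divided by all predictions of that class) could be computed.

```java
import java.util.Locale;

/** Illustrative sketch only: keywords and rule order are assumptions, not the thesis's rules. */
final class ReportFilterSketch {
    enum Label { RECUR, INTER, NED }

    // Hypothetical keyword lists; a real rule set would be curated with clinicians.
    private static final String[] NED_PHRASES = {"no evidence of recurrence", "no evidence of disease"};
    private static final String[] RECUR_KEYWORDS = {"recurrence", "metastatic"};
    private static final String[] INTER_KEYWORDS = {"indeterminate", "suspicious"};

    static Label classify(String report) {
        String text = report.toLowerCase(Locale.ROOT);

        // Record explicit negative findings, then strip them so that
        // "no evidence of recurrence" does not trigger the "recurrence" keyword.
        boolean hasNedPhrase = false;
        for (String phrase : NED_PHRASES) {
            if (text.contains(phrase)) {
                hasNedPhrase = true;
                text = text.replace(phrase, " ");
            }
        }

        if (containsAny(text, RECUR_KEYWORDS)) return Label.RECUR;
        if (containsAny(text, INTER_KEYWORDS)) return Label.INTER;

        // Conservative default: without an explicit NED phrase the report
        // stays INTER, so it is still reviewed by a physician.
        return hasNedPhrase ? Label.NED : Label.INTER;
    }

    private static boolean containsAny(String text, String[] keywords) {
        for (String keyword : keywords) {
            if (text.contains(keyword)) return true;
        }
        return false;
    }

    /** Precision for one class: correct predictions of cls / all predictions of cls. */
    static double precision(Label[] actual, Label[] predicted, Label cls) {
        int truePositives = 0;
        int predictedAsCls = 0;
        for (int i = 0; i < predicted.length; i++) {
            if (predicted[i] == cls) {
                predictedAsCls++;
                if (actual[i] == cls) truePositives++;
            }
        }
        // Undefined when the class was never predicted.
        return predictedAsCls == 0 ? Double.NaN : (double) truePositives / predictedAsCls;
    }
}
```

In this sketch, a report that matches no rule falls back to INTER rather than NED. That trades some overall accuracy for NED precision, which matches the priority the abstract states: a RECUR report must not be filtered out as NED.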
