Journal Article Details
BMC Medical Informatics and Decision Making
PubMed-supported clinical term weighting approach for improving inter-patient similarity measure in diagnosis prediction
Research Article
Tao Chan [1], SC Cesar Wong [2], KY Kwok [2], Thomas YH Lau [2], KF Lo [2], Helen KW Law [2], William YL Chan [2], SW Yeung [2], Andy PH Yeung [2], Lawrence WC Chan [2], Chi-Ren Shyu [3], Ying Liu [4]
[1] Department of Diagnostic Radiology, University of Hong Kong, Pokfulam, Hong Kong
[2] Department of Health Technology and Informatics, Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
[3] Informatics Institute and Department of Computer Science, University of Missouri, Columbia, MO, USA
[4] Institute of Mechanical and Manufacturing Engineering, School of Engineering, Cardiff University, CF24 3AA, Cardiff, UK
Keywords: Feature Vector; Conditional Probability; Similarity Score; Receiver Operating Characteristic; Electronic Health Record
DOI: 10.1186/s12911-015-0166-2
Received 2014-12-24, accepted 2015-05-22, published 2015
Source: Springer
【 Abstract 】

Background: Similarity-based retrieval of Electronic Health Records (EHRs) from large clinical information systems gives physicians evidence to support diagnosis or examination referral for suspected cases. Clinical terms in EHRs represent high-level conceptual information, and a similarity measure built on these terms reflects the chance of inter-patient disease co-occurrence. The assumption that all clinical terms are equally relevant to a disease is unrealistic and reduces prediction accuracy. Here we propose a term weighting approach supported by the PubMed search engine to address this issue.

Methods: We collected and studied 112 abdominal computed tomography imaging examination reports from four hospitals in Hong Kong. Clinical terms, namely the image findings related to hepatocellular carcinoma (HCC), were extracted from the reports. Through two systematic PubMed search methods, generic and specific term weightings were established by estimating the conditional probabilities of clinical terms given HCC. Each report was characterized by an ontological feature vector, giving 6216 vector pairs in total. We optimized the modified direction cosine (mDC) with respect to a regularization constant embedded into the feature vector. Equal, generic and specific term weighting approaches were applied to measure the similarity of each pair, and their performance in predicting inter-patient co-occurrence of HCC diagnoses was compared using Receiver Operating Characteristic (ROC) analysis.

Results: The areas under the ROC curves (AUROCs) of the similarity scores based on equal, generic and specific term weighting approaches were 0.735, 0.728 and 0.743 respectively (p < 0.01). Compared with equal term weighting, performance was significantly improved by specific term weighting (p < 0.01) but not by generic term weighting. The clinical terms “Dysplastic nodule”, “nodule of liver” and “equal density (isodense) lesion” were found to be the top three image findings associated with HCC in PubMed.

Conclusions: Our findings suggest that the optimized similarity measure with specific term weighting applied to EHRs can significantly improve the accuracy of predicting inter-patient co-occurrence of diagnosis when compared with equal and generic term weighting approaches.
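
The pipeline summarized in the abstract (weighted term vectors, pairwise similarity, ROC comparison) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not give the mDC formula, so `weighted_similarity` below is an ordinary weighted cosine, and appending `reg` as an extra constant component of each vector is only one assumed way of embedding a regularization constant. The function names, the binary term vectors and the Mann-Whitney form of the AUROC are all illustrative choices.

```python
import math


def pair_count(n):
    """Number of unordered pairs among n reports: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def weighted_similarity(u, v, weights, reg=0.0):
    """Weighted cosine similarity between two term vectors.

    ASSUMPTION: `reg` is treated as one extra constant component shared by
    both vectors. The paper's actual mDC definition may differ.
    """
    dot = sum(w * a * b for w, a, b in zip(weights, u, v)) + reg * reg
    norm_u = math.sqrt(sum(w * a * a for w, a in zip(weights, u)) + reg * reg)
    norm_v = math.sqrt(sum(w * b * b for w, b in zip(weights, v)) + reg * reg)
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def auroc(scores, labels):
    """AUROC as the probability that a positive pair outscores a negative
    pair (Mann-Whitney statistic), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

With equal weighting every entry of `weights` is 1; a generic or specific weighting would replace those entries with the PubMed-derived conditional probability estimates. `pair_count(112)` reproduces the 6216 vector pairs reported in the Methods.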

【 License 】

CC BY   
© Chan et al.; licensee BioMed Central. 2015
