Journal article details
CAAI Transactions on Intelligence Technology
Text segmentation of health examination item based on character statistics and information measurement
article
Hui An [1], Dahui Wang [1], Zhigeng Pan [1], Meiling Chen [1], Xinting Wang [1]
[1] DigitalMedia & Interaction Research Center, Hangzhou Normal University, Wenzhou People's Hospital; Department of Health Examination, Hangzhou Normal University; Institute of Industrial VR, Foshan University
Keywords: image segmentation; text analysis; data mining; graph theory; data analysis; text segmentation; health examination item; character statistics; information measurement; segmentation algorithm; item text data; single long length data; historical health examination data; connection tightness values; adjacent characters; position BP; balance weight BW; total segmentation indexes SIs; segmentation position Pos; optimal parameter values; frequently appeared text item; existing domain knowledge; single item; professional domain knowledge; historical data; text understanding; domain knowledge graph; intelligent method; automated method; automatic data analysis; information classification; health assessment; C1160 Combinatorial mathematics; C6130 Data handling techniques; C6170K Knowledge engineering techniques; C7330 Biology and medical computing
DOI: 10.1049/trit.2018.0005
Subject classification: Mathematics (general)
Source: Wiley
【 Abstract 】

This study explores a segmentation algorithm for item text data, especially single long-length data in health examination. In the specific implementation, a large amount of historical health examination data is analysed. Using character statistics, the connection tightness value T_AB between each pair of adjacent characters is calculated. Three parameters are set: the candidate number N, the best position BP, and the balance weight BW. The total segmentation indexes (SIs) are then calculated, and these determine the segmentation position Pos. The optimal parameter values are determined by the method of information measurement. Experimental results show that the accuracy rate is 78.6%, reaching 82.9% for the most frequently appearing text item. The complexity of the algorithm is O(n). Because it uses no existing domain knowledge, it is very simple and fast. When executed repeatedly, it conveniently yields the characteristics of each single item of text data and, furthermore, distinguishes the expression preferences of different physicians for the same item. The results support the assumption that, even without professional domain knowledge, a large amount of historical data can provide valuable clues for text understanding. The results of this research are being applied and verified in follow-up research in the field of health examination.
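
The abstract names the ingredients of the method (T_AB, N, BP, BW, SI, Pos) but does not give their formulas, so the following Python sketch only illustrates the general shape of such a procedure. Everything specific in it is an assumption rather than the authors' definition: the tightness formula (pair count divided by the smaller of the two character counts), the segmentation index (a BW-weighted mix of tightness and distance from BP, with BP expressed as a fraction of the text length), the default parameter values, and the function names `tightness_table` and `segment_position`.

```python
from collections import Counter
import heapq


def tightness_table(items):
    """Build a hypothetical connection-tightness table from historical item texts.

    Assumed formula (not from the paper): T_AB = count(AB) / min(count(A), count(B)).
    """
    chars = Counter()
    pairs = Counter()
    for text in items:
        chars.update(text)                  # single-character frequencies
        pairs.update(zip(text, text[1:]))   # adjacent-pair frequencies
    return {(a, b): n / min(chars[a], chars[b]) for (a, b), n in pairs.items()}


def segment_position(text, tightness, n_candidates=3, best_pos=0.5, balance_weight=0.5):
    """Return Pos so that text[:Pos] and text[Pos:] are the two segments.

    Returns None when the text is too short to split.
    """
    if len(text) < 2:
        return None
    # Gap i lies between text[i-1] and text[i]; unseen pairs are assumed
    # to have tightness 0.0, i.e. they are treated as the loosest gaps.
    gaps = [(tightness.get((text[i - 1], text[i]), 0.0), i)
            for i in range(1, len(text))]
    # Keep the N loosest gaps as candidates.
    candidates = heapq.nsmallest(n_candidates, gaps)

    def seg_index(candidate):
        t, i = candidate
        distance = abs(i / len(text) - best_pos)  # how far the gap is from BP
        # Assumed SI: lower is better, balancing tightness against position.
        return (1 - balance_weight) * t + balance_weight * distance

    return min(candidates, key=seg_index)[1]


history = ["abcd"] * 5 + ["abef"] * 5 + ["cdef"] * 5
table = tightness_table(history)
pos = segment_position("abcd", table)
print("abcd"[:pos], "abcd"[pos:])
```

In this toy history the pairs "ab", "cd" and "ef" always occur together while "bc" occurs in only a third of the items, so the loosest gap in "abcd" falls between "ab" and "cd". Selecting candidates with `heapq.nsmallest` keeps the per-item cost linear in the text length for a fixed N; whether this matches how the paper reaches its O(n) bound is not stated in the abstract.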

【 License 】

CC BY|CC BY-ND|CC BY-NC|CC BY-NC-ND   
