Journal article

【Abstract】

This study explores a segmentation algorithm for item text data in health examinations, especially for single items that contain long text. In the implementation, a large amount of historical health examination data is analysed. Using character statistics, the connection tightness value $T_{AB}$ between each pair of adjacent characters A and B is calculated. Three parameters are set: the candidate number N, the best position BP, and the balance weight BW. The total segmentation indexes (SIs) are then calculated, from which the segmentation position Pos is determined. The optimal parameter values are determined by information measurement. Experimental results show an accuracy rate of 78.6%, rising to 82.9% on the most frequently occurring text item. The complexity of the algorithm is O(n). Because it uses no existing domain knowledge, it is very simple and fast. When executed repeatedly, it makes it convenient to obtain the characteristics of each single item of text data and, furthermore, to distinguish the individual expression preferences of different physicians for the same item. This verifies the assumption that, even without professional domain knowledge, a large amount of historical data can provide valuable clues for text understanding. The results of this research are being applied and verified in follow-up research in the field of health examination.
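The abstract does not give the formulas for $T_{AB}$, BP, BW or SI, so the following is only a minimal sketch of the general idea of character-statistics segmentation, not the authors' method. It assumes that the tightness of two adjacent characters is their pointwise mutual information over a historical corpus (with crude add-one smoothing) and simply cuts a text at the N gaps with the lowest tightness. The functions `train_counts`, `tightness` and `segment` are hypothetical; the best-position and balance-weight parameters and the information-measurement tuning step are not modeled.

```python
import heapq
import math
from collections import Counter
from typing import List, Tuple

Stats = Tuple[Counter, Counter, int, int]


def train_counts(corpus: List[str]) -> Stats:
    """Count single characters and adjacent character pairs over a corpus."""
    chars: Counter = Counter()
    pairs: Counter = Counter()
    for text in corpus:
        chars.update(text)
        pairs.update(text[i:i + 2] for i in range(len(text) - 1))
    return chars, pairs, sum(chars.values()), sum(pairs.values())


def tightness(a: str, b: str, chars: Counter, pairs: Counter,
              n_chars: int, n_pairs: int) -> float:
    """Hypothetical stand-in for T_AB: pointwise mutual information of 'ab'.

    Assumption: crude add-one smoothing keeps unseen pairs finite."""
    p_ab = (pairs[a + b] + 1) / (n_pairs + 1)
    p_a = (chars[a] + 1) / (n_chars + 1)
    p_b = (chars[b] + 1) / (n_chars + 1)
    return math.log(p_ab / (p_a * p_b))


def segment(text: str, stats: Stats, n_candidates: int) -> List[str]:
    """Cut `text` at the n_candidates gaps with the lowest tightness."""
    chars, pairs, n_chars, n_pairs = stats
    # One (score, cut position) entry per gap between adjacent characters.
    gaps = [(tightness(text[i], text[i + 1], chars, pairs, n_chars, n_pairs), i + 1)
            for i in range(len(text) - 1)]
    # Weakest links become cut positions, restored to left-to-right order.
    cuts = sorted(pos for _, pos in heapq.nsmallest(n_candidates, gaps))
    pieces, start = [], 0
    for pos in cuts:
        pieces.append(text[start:pos])
        start = pos
    pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    stats = train_counts(["ab", "cd", "ab", "cd", "abcd"])
    print(segment("abcd", stats, n_candidates=1))  # → ['ab', 'cd']
```

Scoring the gaps is linear in the text length, and selecting the N weakest gaps with `heapq.nsmallest` costs O(n log N), so for a fixed N this sketch also stays linear. In real health examination data the corpus would be the historical item texts, and the characters would typically be Chinese characters rather than the toy letters used here.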

【License】

CC BY|CC BY-ND|CC BY-NC|CC BY-NC-ND

【Preview】

Attachments
File	Size	Format
RO202107100000091ZK.pdf	161 KB	PDF

CAAI Transactions on Intelligence Technology
Text segmentation of health examination item based on character statistics and information measurement
article
Hui An¹ Dahui Wang¹ Zhigeng Pan¹ Meiling Chen¹ Xinting Wang¹
[1] Digital Media & Interaction Research Center, Hangzhou Normal University, Wenzhou People's Hospital; Department of Health Examination, Hangzhou Normal University; Institute of Industrial VR, Foshan University
Keywords: image segmentation; text analysis; data mining; graph theory; data analysis; text segmentation; health examination item; character statistics; information measurement; segmentation algorithm; item text data; single long length data; historical health examination data; connection tightness values; adjacent characters; position BP; balance weight BW; total segmentation indexes SIs; segmentation position Pos; optimal parameter values; frequently appeared text item; existing domain knowledge; single item; professional domain knowledge; historical data; text understanding; domain knowledge graph; intelligent method; automated method; automatic data analysis; information classification; health assessment; C1160 Combinatorial mathematics; C6130 Data handling techniques; C6170K Knowledge engineering techniques; C7330 Biology and medical computing
DOI: 10.1049/trit.2018.0005
Subject classification: Mathematics (general)
Source: Wiley


