Conference Paper Details
2018 2nd annual International Conference on Cloud Technology and Communication Engineering
Research and Implementation of Chinese Text Automatic Proofreading System
Computer Science; Radio Electronics
Gong, Yonggang^1 ; Fu, Junying^1 ; Lian, Xiaoqin^1 ; Li, Yuying^1
Beijing Key Laboratory of Big Data Technology for Food Safety, School of Computer and Information Engineering, Beijing Technology and Business University, Beijing 100048, China^1
Keywords: Accuracy rate; Application fields; Error detection and correction; High-speed processing; Knowledge base; Model library; N-gram modeling; Sensitive information
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/466/1/012090/pdf
DOI  :  10.1088/1757-899X/466/1/012090
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】

News media platforms publish a huge volume of original news every day, so manually reviewing the text for typos is impractical. This paper designs and implements an automatic Chinese text proofreading system for large-scale text content and high-speed processing. The content to be proofread is first analyzed and classified into two categories: typos and sensitive information. The system first uses an n-gram model to statistically analyze the segmented corpus, forming a 2-gram model library and a context library. It then builds a typo confusion set and calculates the probability of the target word in the knowledge base to detect and correct errors in Chinese text automatically. The system has been successfully applied to error checking of content on many government news media platforms, and each server can handle one million articles per day. The results show that, on the processed articles, the recall rate is 78.9% and the accuracy rate is 85.1%. The system meets the demand for fast and accurate error processing of massive text and has important practical significance in its application fields.
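
The pipeline described in the abstract (a 2-gram model built from a segmented corpus, plus a typo confusion set used to score candidate words) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy corpus, the confusion set entries, the function names, and the add-one smoothing are all assumptions made for illustration, and the context library and sensitive-information check mentioned in the abstract are not modeled.

```python
from collections import Counter

# Toy pre-segmented corpus (assumption); a real system would segment a large news corpus.
CORPUS = [
    ["我们", "今天", "开会"],
    ["我们", "明天", "开会"],
    ["他们", "今天", "开会"],
]

# Hypothetical confusion set: suspicious word -> candidate replacements.
CONFUSION = {"开回": ["开会"], "今添": ["今天"]}


def build_bigram_counts(sentences):
    """Count unigrams and adjacent word pairs over segmented sentences."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + list(words) + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed estimate of P(word | prev); smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def context_score(left, cand, right, unigrams, bigrams):
    """Score a candidate by how well it fits between its left and right neighbours."""
    return (bigram_prob(left, cand, unigrams, bigrams)
            * bigram_prob(cand, right, unigrams, bigrams))


def proofread(words, unigrams, bigrams, confusion):
    """Return the corrected word list and a list of (index, original, replacement)."""
    padded = ["<s>"] + list(words) + ["</s>"]
    corrections = []
    for i in range(1, len(padded) - 1):
        original = padded[i]
        # The original word is listed first, so it wins any tie.
        candidates = [original] + confusion.get(original, [])
        best = max(
            candidates,
            key=lambda c: context_score(padded[i - 1], c, padded[i + 1],
                                        unigrams, bigrams),
        )
        if best != original:
            corrections.append((i - 1, original, best))
            padded[i] = best
    return padded[1:-1], corrections


if __name__ == "__main__":
    uni, bi = build_bigram_counts(CORPUS)
    fixed, changes = proofread(["我们", "今天", "开回"], uni, bi, CONFUSION)
    print(fixed, changes)
```

In this sketch a word is replaced only when a confusion-set candidate scores strictly higher in its two-sided bigram context, so words without confusion entries are never altered.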
