Conference Paper Details
2018 2nd International Conference on Artificial Intelligence Applications and Technologies
Improving Neural Chinese Word Segmentation Using Unlabeled Data
Computer Science
Zhang, Yanna^1 ; Xu, Jinan^1 ; Miao, Guoyi^1 ; Chen, Yufeng^1 ; Zhang, Yujie^1
School of Computer and Information Technology, Beijing Jiaotong University, Shangyuancun No.3, Beijing, China^1
Keywords: Attention mechanisms; Chinese word segmentation; Contextual information; Neural network model; Sampling strategies; Semantic similarity; Similarity algorithm; State-of-the-art methods
Others  :  https://iopscience.iop.org/article/10.1088/1757-899X/435/1/012032/pdf
DOI  :  10.1088/1757-899X/435/1/012032
Subject classification: Computer Science (General)
Source: IOP
【 Abstract 】
Supervised word segmentation relies heavily on large-scale, high-quality labeled data. However, building such a corpus is difficult, especially for domain-specific data. In this paper, we propose a novel semi-supervised Chinese word segmentation (CWS) method. Specifically, we select useful sample sentences from large-scale unlabeled data to extend the training data, using a sampling strategy based on character-based semantic similarity. The proposed similarity algorithm computes the similarity between unlabeled sentences and the training data, which helps identify the most helpful samples. In addition, we integrate an attention mechanism into our word segmentation model to focus on available contextual information. Experiments on the PKU, MSR and Weibo benchmark data sets show that our method outperforms previous neural network models and state-of-the-art methods.
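The abstract does not spell out the similarity algorithm, so the following is only a minimal sketch of what similarity-based sample selection could look like, not the authors' method. It assumes each character has a pre-trained embedding, represents a sentence as the average of its character vectors, and ranks unlabeled sentences by cosine similarity to the centroid of the training data. The names `sentence_vector`, `cosine`, `select_samples`, and the toy `char_vectors` table are all hypothetical.

```python
import math


def sentence_vector(sentence, char_vectors, dim):
    """Average the vectors of a sentence's characters (unknown characters are skipped)."""
    total = [0.0] * dim
    count = 0
    for ch in sentence:
        vec = char_vectors.get(ch)
        if vec is None:
            continue
        total = [t + v for t, v in zip(total, vec)]
        count += 1
    return [t / count for t in total] if count else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_samples(unlabeled, training, char_vectors, dim, k):
    """Return the k unlabeled sentences most similar to the training-set centroid.

    Assumption: similarity to the centroid stands in for 'similarity between
    unlabeled sentences and the training data'; the paper may define it differently.
    """
    train_vecs = [sentence_vector(s, char_vectors, dim) for s in training]
    centroid = [sum(col) / len(train_vecs) for col in zip(*train_vecs)]
    scored = [
        (cosine(sentence_vector(s, char_vectors, dim), centroid), s)
        for s in unlabeled
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


# Toy 2-dimensional character embeddings (illustrative values only).
char_vectors = {"北": [1.0, 0.0], "京": [1.0, 0.0], "天": [0.0, 1.0], "气": [0.0, 1.0]}
selected = select_samples(["天气", "京北", "北天"], ["北京"], char_vectors, dim=2, k=2)
print(selected)
```

In a semi-supervised setup of this kind, the selected sentences would then be automatically segmented and added to the training data; how the paper labels and weights them is not stated in the abstract.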