Journal Article Details
Data Science and Engineering
Scaling Word2Vec on Big Corpus
Affiliations: Tokyo Institute of Technology, Tokyo, Japan (grid.32197.3e); AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan; RIKEN Center for Computational Science, Kobe, Japan; Renmin University of China, Beijing, China (grid.24539.39)
Keywords: Machine learning; Natural language processing; High performance computing; Word embeddings
DOI: 10.1007/s41019-019-0096-6
Source: publisher
【 Abstract 】

Word embeddings are well established as an important feature in natural language processing (NLP). In particular, the Word2Vec model learns high-quality word embeddings and is widely used in various NLP tasks. Training Word2Vec on a CPU is sequential because of strong dependencies between word–context pairs. In this paper, we aim to scale Word2Vec on a GPU cluster. The main challenge is reducing dependencies inside a large training batch. We heuristically design a variation of Word2Vec which ensures that each word–context pair contains a non-dependent word and a uniformly sampled contextual word. During batch training, we “freeze” the context part and update only the non-dependent part to reduce conflicts. This variation also directly controls the number of training iterations by fixing the number of samples, and it treats high-frequency and low-frequency words equally. We conduct extensive experiments over a range of NLP tasks. The results show that our proposed model achieves a 7.5-fold acceleration on 16 GPUs without an accuracy drop. Moreover, by using the high-level Chainer deep learning framework, we can easily implement Word2Vec variations such as CNN-based subword-level models and achieve similar scaling results.
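To make the batched "frozen-context" update described above concrete, here is a minimal NumPy sketch, not the authors' Chainer implementation: each sample pairs a target word with one uniformly sampled context word, the context and negative rows are treated as frozen within the batch, and only the target-word rows receive gradient updates. All names (frozen_context_step, W_in, W_out) and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Sketch of batched skip-gram with negative sampling where the context
# side is frozen within a batch, so only target-word rows are written.
rng = np.random.default_rng(0)
vocab_size, dim, n_neg, lr = 10_000, 100, 5, 0.025

W_in = rng.normal(scale=0.01, size=(vocab_size, dim)).astype(np.float32)   # target (input) embeddings
W_out = rng.normal(scale=0.01, size=(vocab_size, dim)).astype(np.float32)  # context (output) embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frozen_context_step(targets, contexts):
    """One batched update: gradients flow only to the target-word rows."""
    negs = rng.integers(0, vocab_size, size=(len(targets), n_neg))          # negative samples
    ctx = np.concatenate([contexts[:, None], negs], axis=1)                 # (B, 1 + n_neg) word ids
    labels = np.zeros_like(ctx, dtype=np.float32)
    labels[:, 0] = 1.0                                                      # the true context pair is positive

    v_t = W_in[targets]                                                     # (B, dim)
    v_c = W_out[ctx]                                                        # (B, 1 + n_neg, dim), kept frozen
    scores = sigmoid(np.einsum("bd,bkd->bk", v_t, v_c))                     # pairwise logits -> probabilities
    grad_t = np.einsum("bk,bkd->bd", scores - labels, v_c)                  # gradient w.r.t. target vectors only

    np.add.at(W_in, targets, -lr * grad_t)                                  # context table W_out is not touched,
                                                                            # so batch samples do not conflict on it

# Toy usage: a batch of (target, uniformly sampled context) word-id pairs.
targets = rng.integers(0, vocab_size, size=1024)
contexts = rng.integers(0, vocab_size, size=1024)
frozen_context_step(targets, contexts)
```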

【 License 】

CC BY

【 Preview 】
Attachment list
Files                      Size     Format   View
RO201910108938975ZK.pdf    2288KB   PDF      download
Article metrics
Downloads: 0    Views: 14