Data Science and Engineering
Scaling Word2Vec on Big Corpus
Affiliations: Tokyo Institute of Technology, Tokyo, Japan; AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory, Tokyo, Japan; RIKEN Center for Computational Science, Kobe, Japan; Renmin University of China, Beijing, China
Keywords: Machine learning; Natural language processing; High performance computing; Word embeddings
DOI: 10.1007/s41019-019-0096-6
Source: publisher
【 Abstract 】
Word embedding has been well accepted as an important feature in the area of natural language processing (NLP). Specifically, the Word2Vec model learns high-quality word embeddings and is widely used in various NLP tasks. The training of Word2Vec is sequential on a CPU due to strong dependencies between word–context pairs. In this paper, we aim to scale Word2Vec on a GPU cluster. To do so, the main challenge is reducing dependencies inside a large training batch. We heuristically design a variation of Word2Vec that ensures each word–context pair contains a non-dependent word and a uniformly sampled contextual word. During batch training, we “freeze” the context part and update only the non-dependent part to reduce conflicts. This variation also directly controls the number of training iterations by fixing the number of samples, and it treats high-frequency and low-frequency words equally. We conduct extensive experiments over a range of NLP tasks. The results show that our proposed model achieves a 7.5-fold acceleration on 16 GPUs with no drop in accuracy. Moreover, by using the high-level Chainer deep learning framework, we can easily implement Word2Vec variations such as CNN-based subword-level models and achieve similar scaling results.
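To illustrate the batching idea described in the abstract, the following is a minimal sketch (not the authors' released code, and written in plain NumPy rather than Chainer for brevity): each sample pairs a non-dependent word with one sampled contextual word, the context-side vectors are held fixed during the batch, and only the word-side vectors receive accumulated updates. All names and hyperparameters (VOCAB, DIM, lr, the number of negatives) are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch of batched skip-gram with negative sampling where the
# context ("frozen") side is not updated within a batch; only the
# non-dependent word side accumulates gradients. Hyperparameters are assumed.
VOCAB, DIM, lr = 10_000, 100, 0.025
W_in = np.random.uniform(-0.5 / DIM, 0.5 / DIM, (VOCAB, DIM))  # word vectors (updated)
W_ctx = np.zeros((VOCAB, DIM))                                  # context vectors (frozen per batch)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_batch(words, contexts, negatives):
    """words, contexts: (B,) int arrays; negatives: (B, K) int array."""
    grad = np.zeros_like(W_in)
    for w, c, negs in zip(words, contexts, negatives):
        for target, label in [(c, 1.0)] + [(n, 0.0) for n in negs]:
            score = sigmoid(W_in[w] @ W_ctx[target])
            # Gradient only for the non-dependent (word) side; the context
            # side stays fixed until the batch has been processed.
            grad[w] += lr * (label - score) * W_ctx[target]
    W_in += grad  # apply the accumulated updates once per batch

# Example: one toy batch of two samples with three negatives each.
words = np.array([1, 2])
contexts = np.array([5, 7])
negatives = np.random.randint(0, VOCAB, size=(2, 3))
train_batch(words, contexts, negatives)
```

Because the frozen context side removes read-after-write conflicts inside a batch, the per-sample updates can be computed in parallel on GPUs, which is the property the paper exploits for scaling.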
【 License 】
CC BY
【 Preview 】
| Files | Size | Format | View |
|---|---|---|---|
| RO201910108938975ZK.pdf | 2288 KB | PDF | download |