Journal Article Details
IEEE Access
Sample Contribution Pattern Based Big Data Mining Optimization Algorithms
Yang Liu [1], Xiaodong Shi [2]
[1] Modern Educational Technology Center, Henan University of Economics and Law, Zhengzhou, China; School of E-Commerce and Logistics Management, Henan University of Economics and Law, Zhengzhou, China
Keywords: Big data mining; gradient descent; gradient reuse; sample contribution pattern
DOI: 10.1109/ACCESS.2021.3060785
Source: DOAJ
【 Abstract 】

In many big data mining scenarios with large-scale sample sets, heavy computation cost hinders the application of machine learning, whose training must iterate over the whole dataset without considering the role each sample plays in the computation. However, we argue that most of the samples that dominate computation resources contribute little to gradient-based model updates, particularly when the model is close to convergence. We define this observation as the Sample Contribution Pattern (SCP) in machine learning. This paper proposes two approaches that exploit SCP by detecting gradient characteristics and triggering the reuse of outdated gradients. In particular, this paper reports research results on (1) the definition and description of SCP, which reveals an intrinsic gradient contribution pattern across different samples; (2) a novel SCP-based optimization algorithm (SCPOA) that outperforms the alternative algorithms tested in terms of computation overhead; (3) a variant of SCPOA that incorporates discarding-recovering mechanisms to carefully trade off model accuracy against computation cost; (4) the implementation and evaluation of the two algorithms on popular distributed big data mining platforms running typical sample sets; and (5) an intuitive convergence proof for the algorithms. Our experimental results show that the proposed approaches can significantly reduce computation cost while maintaining competitive accuracy.
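The abstract describes reusing outdated gradients for samples that contribute little to the update. The sketch below illustrates that general idea only; it is not the authors' SCPOA or its discarding-recovering variant, whose exact detection rules are not given here. It assumes a linear model with squared loss, full-batch gradient descent, and a per-sample gradient cache. The names `train_with_gradient_reuse`, `threshold` (a hypothetical "low contribution" cut-off on the cached gradient norm), and `refresh_every` (a hypothetical period at which every gradient is recomputed) are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

Sample = Tuple[List[float], float]


def sample_grad(w: List[float], x: List[float], y: float) -> List[float]:
    """Gradient of 0.5 * (w.x - y)^2 with respect to w for one sample."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(vi * vi for vi in v))


def train_with_gradient_reuse(
    data: List[Sample],
    lr: float = 0.1,
    epochs: int = 50,
    threshold: float = 1e-3,  # hypothetical "low contribution" cut-off
    refresh_every: int = 10,  # hypothetical period of full recomputation
) -> Tuple[List[float], int]:
    dim = len(data[0][0])
    w = [0.0] * dim
    # Last computed gradient for each sample (None until first visit).
    cache: List[Optional[List[float]]] = [None] * len(data)
    fresh_computations = 0
    for epoch in range(epochs):
        full_refresh = epoch % refresh_every == 0
        total = [0.0] * dim
        for i, (x, y) in enumerate(data):
            g = cache[i]
            # Recompute on refresh epochs, on first visit, or when the
            # sample's last gradient was large; otherwise reuse the stale one.
            if full_refresh or g is None or norm(g) >= threshold:
                g = sample_grad(w, x, y)
                cache[i] = g
                fresh_computations += 1
            for j in range(dim):
                total[j] += g[j]
        # Full-batch step using the mix of fresh and reused gradients.
        w = [wj - lr * tj / len(data) for wj, tj in zip(w, total)]
    return w, fresh_computations
```

For instance, calling it with `lr=1.0` and `epochs=200` on eleven evenly spaced points of y = 2x + 1 (with a constant bias feature) should return weights close to [2, 1] while computing fewer per-sample gradients than plain full-batch descent would. The periodic refresh bounds how stale any reused gradient can become; raising `threshold` or `refresh_every` skips more computation at the cost of accuracy, which loosely mirrors the accuracy/cost trade-off the abstract mentions.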

【 License 】

Unknown   
