Journal article details
CAAI Transactions on Intelligence Technology
Decentralised federated learning with adaptive partial gradient aggregation
article
Jingyan Jiang [1]; Liang Hu [1]
[1] College of Computer Science and Technology, Jilin University
Keywords: learning (artificial intelligence); gradient methods; communication frequency; parameter server design; nodes-to-server bandwidths; stochastic gradient descent training; end-to-end training; real-world federated learning scenarios; adaptive partial gradient aggregation method; gradient partial level decentralised; partial gradient exchange mechanism; node-to-node bandwidth; communication time; adaptive model; training time; decentralised federated learning; machine learning model; geo-distributed workers; inherently communication; communication efficiency; conventional federated learning algorithms
DOI: 10.1049/trit.2020.0082
Subject classification: Mathematics (general)
Source: Wiley
【 Abstract 】

Federated learning aims to collaboratively train a machine learning model across possibly geo-distributed workers and is therefore inherently communication constrained. To achieve communication efficiency, conventional federated learning algorithms allow each worker to reduce the communication frequency by training the model locally for multiple iterations. The conventional federated learning architecture, inherited from the parameter server design, relies on a highly centralised topology and large node-to-server bandwidths, and its convergence depends on local stochastic gradient descent training, which usually causes large end-to-end training latency in real-world federated learning scenarios. Thus, in this study, the authors propose the adaptive partial gradient aggregation method (FedPGA), a gradient-partial-level decentralised federated learning scheme, to tackle this problem. In FedPGA, they propose a partial gradient exchange mechanism that makes full use of node-to-node bandwidth to speed up communication. Besides, an adaptive model updating method further accelerates convergence by adaptively increasing the step size along stable directions of gradient descent. The experimental results on various datasets demonstrate that the training time is reduced compared to baselines without accuracy degradation.
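To make the abstract's two mechanisms concrete, the following minimal Python sketch illustrates (i) a partial gradient exchange in which each node averages only a fraction of its gradient coordinates with a neighbour, and (ii) an adaptive step-size rule that enlarges the step along coordinates whose gradient sign has stayed stable. The ring topology, the exchange_fraction and stable_boost parameters, and the least-squares gradient are illustrative assumptions, not the exact FedPGA protocol from the paper.

    import numpy as np

    def local_gradient(w, data):
        # Stand-in for a stochastic gradient on a node's local data (least squares).
        X, y = data
        return X.T @ (X @ w - y) / len(y)

    def partial_exchange(grads, exchange_fraction=0.25, rng=None):
        # Each node averages only a random subset of gradient coordinates with its
        # ring neighbour, so only part of the gradient crosses each node-to-node link.
        rng = rng or np.random.default_rng(0)
        n_nodes, dim = len(grads), grads[0].size
        k = max(1, int(exchange_fraction * dim))
        mixed = [g.copy() for g in grads]
        for i in range(n_nodes):
            j = (i + 1) % n_nodes  # ring neighbour (assumed topology)
            idx = rng.choice(dim, size=k, replace=False)
            mixed[i][idx] = 0.5 * (grads[i][idx] + grads[j][idx])
        return mixed

    def adaptive_step(w, grad, prev_grad, base_lr=0.1, stable_boost=1.5):
        # Enlarge the step on coordinates whose gradient sign did not flip
        # (a "stable" descent direction); keep the base step elsewhere.
        stable = np.sign(grad) == np.sign(prev_grad)
        lr = np.where(stable, base_lr * stable_boost, base_lr)
        return w - lr * grad

    # One decentralised round: local gradients -> partial exchange -> adaptive update.
    rng = np.random.default_rng(1)
    dim, n_nodes = 10, 4
    datasets = [(rng.normal(size=(32, dim)), rng.normal(size=32)) for _ in range(n_nodes)]
    weights = [np.zeros(dim) for _ in range(n_nodes)]
    prev_grads = [np.zeros(dim) for _ in range(n_nodes)]
    for _ in range(20):
        grads = [local_gradient(w, d) for w, d in zip(weights, datasets)]
        grads = partial_exchange(grads)
        weights = [adaptive_step(w, g, p) for w, g, p in zip(weights, grads, prev_grads)]
        prev_grads = grads

In this sketch the per-link communication payload shrinks roughly in proportion to exchange_fraction, which is the intuition behind exchanging only part of the gradient over limited node-to-node bandwidth.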

【 License 】

CC BY|CC BY-ND|CC BY-NC|CC BY-NC-ND   

【 Preview 】
Attachment list
Files | Size | Format
RO202107100000022ZK.pdf | 368KB | PDF