As deep learning has become the dominant approach for a wide range of machine learning and artificial intelligence applications, network architectures have evolved accordingly. Modern deep learning applications often operate in an overparameterized regime, contrary to what conventional learning theory suggests. While deep neural networks are considered less vulnerable to overfitting despite their overparameterized architectures, this project observed that properly trained small-scale networks can indeed outperform their larger counterparts. The generalization ability of small-scale networks has been overlooked in much research and practice because of their extremely slow convergence. This project observed that imbalanced layer-wise gradient norms can hinder the overall convergence speed of neural networks, and that narrow networks are particularly vulnerable to this effect. This project investigates possible causes of convergence failure in small-scale neural networks and suggests a strategy to alleviate the problem.
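As a rough illustration of the gradient-imbalance issue described above, the sketch below builds a deliberately narrow multilayer perceptron in PyTorch and prints the gradient norm of each layer's weights after a single backward pass. The layer widths, synthetic data, and the diagnostic itself are illustrative assumptions, not the project's actual experimental setup.

```python
# Illustrative sketch (assumed setup): inspecting per-layer gradient norms
# in a narrow MLP. Large disparities between layers are the kind of
# imbalance that can slow a small network's overall convergence.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A deliberately narrow (small-scale) fully connected network.
model = nn.Sequential(
    nn.Linear(32, 8), nn.ReLU(),
    nn.Linear(8, 8), nn.ReLU(),
    nn.Linear(8, 1),
)

# Synthetic regression batch; any small dataset suffices for this diagnostic.
x = torch.randn(64, 32)
y = torch.randn(64, 1)

loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Report each parameter tensor's gradient norm; layers with much smaller
# gradients update slowly and can stall training of the whole network.
for name, p in model.named_parameters():
    if p.grad is not None:
        print(f"{name:20s} grad norm = {p.grad.norm().item():.4e}")
```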
Attached file: Trainability and generalization of small-scale neural networks