Picking Winning Tickets Before Training by Preserving Gradient Flow

Wang, Chaoqi; Zhang, Guodong; Grosse, Roger

Computer Science > Machine Learning

arXiv:2002.07376 (cs)

[Submitted on 18 Feb 2020 (v1), last revised 7 Aug 2020 (this version, v2)]

Abstract: Overparameterization has been shown to benefit both the optimization and generalization of neural networks, but large networks are resource hungry at both training and test time. Network pruning can reduce test-time resource requirements, but is typically applied to trained networks and therefore cannot avoid the expensive training process. We aim to prune networks at initialization, thereby saving resources at training time as well. Specifically, we argue that efficient training requires preserving the gradient flow through the network. This leads to a simple but effective pruning criterion we term Gradient Signal Preservation (GraSP). We empirically investigate the effectiveness of the proposed method with extensive experiments on CIFAR-10, CIFAR-100, Tiny-ImageNet and ImageNet, using VGGNet and ResNet architectures. Our method can prune 80% of the weights of a VGG-16 network on ImageNet at initialization, with only a 1.6% drop in top-1 accuracy. Moreover, our method achieves significantly better performance than the baseline at extreme sparsity levels.
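
The abstract does not state the pruning criterion itself, so the following is only a minimal sketch of one way to score weights by their effect on gradient flow, under stated assumptions. It assumes gradient flow is measured by the squared gradient norm g^T g, and uses the first-order fact that a perturbation delta changes that quantity by about 2 * delta^T (H g), where H is the Hessian. Removing weight i corresponds to delta_i = -theta_i, which gives a per-weight score of -theta_i * (H g)_i. The function names, the toy quadratic loss, and the finite-difference Hessian-vector product are all illustrative choices, not taken from the paper.

```python
import numpy as np


def gradient_flow_scores(theta, grad_fn, eps=1e-3):
    """Estimate how removing each weight changes the squared gradient norm.

    Assumption (not from the abstract): score_i = -theta_i * (H g)_i, the
    first-order change in g^T g (up to a factor of 2) when weight i is set
    to zero. H g is approximated by a central finite difference of the
    gradient along the direction g, so H is never formed explicitly.
    """
    g = grad_fn(theta)
    hg = (grad_fn(theta + eps * g) - grad_fn(theta - eps * g)) / (2 * eps)
    return -theta * hg


def prune_mask(scores, sparsity):
    """Return a boolean keep-mask that drops the highest-scoring weights.

    A high score means removal is estimated to reduce the gradient norm the
    least (or to increase it), so those weights are removed first.
    """
    n_remove = int(round(sparsity * scores.size))
    mask = np.ones(scores.shape, dtype=bool)
    if n_remove > 0:
        remove_idx = np.argsort(scores, kind="stable")[-n_remove:]
        mask[remove_idx] = False
    return mask


# Toy stand-in for a network: L(theta) = 0.5 * theta^T A theta - b^T theta,
# whose gradient is A theta - b and whose Hessian is A.
rng = np.random.default_rng(0)
M = rng.normal(size=(10, 10))
A = M @ M.T
b = rng.normal(size=10)


def grad_fn(theta):
    return A @ theta - b


theta0 = rng.normal(size=10)          # "weights at initialization"
scores = gradient_flow_scores(theta0, grad_fn)
mask = prune_mask(scores, sparsity=0.8)
pruned_theta = theta0 * mask          # 80% of the weights set to zero
```

In a real network the Hessian-vector product would normally come from automatic differentiation on a batch of data rather than from finite differences; the toy quadratic is used here only so the sketch runs without a deep learning framework.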

Comments:	Fix several typos
Subjects:	Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Cite as:	arXiv:2002.07376 [cs.LG]
	(or arXiv:2002.07376v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2002.07376
Journal reference:	In Proceedings of the 8th International Conference on Learning Representations (ICLR), 2020

Submission history

From: Chaoqi Wang
[v1] Tue, 18 Feb 2020 05:14:47 UTC (1,302 KB)
[v2] Fri, 7 Aug 2020 00:02:33 UTC (1,307 KB)
