Publication Detail
Powerpropagation: A sparsity inducing weight reparameterisation
Publication Type: Conference
Authors: Schwarz J, Jayakumar SM, Pascanu R, Latham PE, Teh YW
Publisher: NeurIPS
Publication date: 14/12/2021
Published proceedings: Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems
Status: Accepted
Name of conference: Thirty-fifth Conference on Neural Information Processing Systems 2021
Conference place: Virtual
Conference start date: 06/12/2021
Conference finish date: 14/12/2021
Keywords: stat.ML, cs.AI, cs.LG
Author URL:
Publisher URL:
Notes: Accepted at NeurIPS 2021
Abstract
The training of sparse neural networks is becoming an increasingly important
tool for reducing the computational footprint of models at training and
evaluation, as well as enabling the effective scaling up of models. Whereas much
work over the years has been dedicated to specialised pruning techniques,
little attention has been paid to the inherent effect of gradient-based
training on model sparsity. In this work, we introduce Powerpropagation, a new
weight-parameterisation for neural networks that leads to inherently sparse
models. Exploiting the behaviour of gradient descent, our method gives rise to
weight updates exhibiting a "rich get richer" dynamic, leaving low-magnitude
parameters largely unaffected by learning. Models trained in this manner
exhibit similar performance, but have a distribution with markedly higher
density at zero, allowing more parameters to be pruned safely. Powerpropagation
is general, intuitive, cheap, and straightforward to implement, and can readily
be combined with various other techniques. To highlight its versatility, we
explore it in two very different settings: Firstly, following a recent line of
work, we investigate its effect on sparse training for resource-constrained
settings. Here, we combine Powerpropagation with a traditional weight-pruning
technique as well as recent state-of-the-art sparse-to-sparse algorithms,
showing superior performance on the ImageNet benchmark. Secondly, we advocate
the use of sparsity in overcoming catastrophic forgetting, where compressed
representations allow accommodating a large number of tasks at fixed model
capacity. In all cases, our reparameterisation considerably increases the
efficacy of the off-the-shelf methods.
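As a concrete illustration of the reparameterisation described in the abstract, the sketch below implements a linear layer whose effective weight is w = theta * |theta|^(alpha - 1). Because dw/dtheta = alpha * |theta|^(alpha - 1), each gradient update is scaled by the parameter's own magnitude, which produces the "rich get richer" dynamic: low-magnitude parameters are left largely unaffected and can later be pruned safely. This is a minimal PyTorch sketch; the class name, initialisation, and default alpha are illustrative assumptions, not the authors' released implementation.

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PowerpropLinear(nn.Module):
        """Linear layer with the sign-preserving power reparameterisation
        w = theta * |theta|**(alpha - 1) (hypothetical class name).

        Gradient descent is performed on theta; since
        dw/dtheta = alpha * |theta|**(alpha - 1), small-magnitude
        parameters receive proportionally smaller updates.
        """

        def __init__(self, in_features, out_features, alpha=2.0):
            super().__init__()
            self.alpha = alpha
            self.theta = nn.Parameter(torch.empty(out_features, in_features))
            self.bias = nn.Parameter(torch.zeros(out_features))
            # Illustrative initialisation only; the paper initialises theta so
            # that the effective weights match a standard initialisation.
            nn.init.kaiming_uniform_(self.theta, a=math.sqrt(5))

        def effective_weight(self):
            # Sign-preserving power: w = theta * |theta|**(alpha - 1).
            return self.theta * self.theta.abs().pow(self.alpha - 1)

        def forward(self, x):
            return F.linear(x, self.effective_weight(), self.bias)

    # Usage: train as usual; magnitude pruning is then applied to the
    # effective weights, whose distribution has higher density at zero.
    layer = PowerpropLinear(64, 32, alpha=2.0)
    out = layer(torch.randn(8, 64))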