Abstract
We present an empirical dataset surveying the deep learning phenomenon on fully-connected networks, encompassing the training and test performance of numerous network topologies, sweeping across multiple learning tasks, depths, numbers of free parameters, learning rates, batch sizes, and regularization penalties. The dataset probes 178 thousand hyperparameter settings with an average of 20 repetitions each, totaling 3.5 million training runs, and records 20 performance metrics for each of the 13.1 billion training epochs observed. Accumulating this 671 GB dataset required 5,448 CPU core-years, 17.8 GPU-years, and 111.2 node-years. Additionally, we provide a preliminary analysis revealing patterns that persist across learning tasks and topologies. We aim to inspire work that empirically studies modern machine learning techniques as a catalyst for the theoretical discoveries needed to progress the field beyond energy-intensive and heuristic practices.
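The scale figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the inputs are the rounded values from the abstract, and the derived quantities (epochs per run, bytes per recorded value) are rough estimates computed here, not numbers reported by the authors. It also assumes that "GB" means decimal gigabytes.

```python
# Rounded figures as quoted in the abstract.
settings = 178_000          # hyperparameter settings probed
mean_repetitions = 20       # average repetitions per setting
reported_runs = 3.5e6       # training runs, as reported
reported_epochs = 13.1e9    # training epochs observed
metrics_per_epoch = 20      # performance metrics recorded per epoch
dataset_bytes = 671e9       # assumption: 1 GB = 10**9 bytes

# Settings times repetitions should land close to the reported run count.
implied_runs = settings * mean_repetitions
relative_gap = abs(implied_runs - reported_runs) / reported_runs

# Derived estimates (not stated in the abstract).
epochs_per_run = reported_epochs / reported_runs
metric_values = reported_epochs * metrics_per_epoch
bytes_per_value = dataset_bytes / metric_values

print(f"implied runs:       {implied_runs:,}")
print(f"gap vs. reported:   {relative_gap:.1%}")
print(f"mean epochs per run: {epochs_per_run:,.0f}")
print(f"metric values:      {metric_values:.3g}")
print(f"bytes per value:    {bytes_per_value:.2f}")
```

Under these assumptions the implied run count agrees with the reported 3.5 million to within rounding, and the average storage cost per recorded metric value comes out at a few bytes, which would be consistent with a compact or compressed on-disk format.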
Field | Value |
---|---|
Original language | American English |
Number of pages | 37 |
Journal | ArXiv.org |
State | Published - 2022 |
NREL Publication Number
- NREL/JA-2C00-83002
Keywords
- batch size
- dataset
- depth
- empirical study
- fully-connected networks
- generalization
- label noise
- learning rate
- neural architecture search
- optimization
- regularization
- shape
- topology