This guide covers deferring model, optimizer, and scheduler initialization in the Trainer class.
By the end of this guide, you will understand how to implement deferred initialization and the advantages it brings to your model training.
Prerequisites
Please ensure that you have read through the Trainer Overview beforehand. The rest of this page assumes that you already have at least a cursory understanding of what the Cerebras Model Zoo Trainer is and how to use the Python API.
Model Function
In Basic Usage, it was shown that you could pass any torch.nn.Module into the Trainer. However, to do this, you need a concrete torch.nn.Module object to pass into the Trainer. Because PyTorch executes eagerly, initializing a model can be very time-consuming, especially for extremely large models.
To improve your experience, we introduce a mechanism by which you can defer your model's weight initialization. To use it, pass the model argument of the Trainer a function that takes no arguments and returns a torch.nn.Module object.
For example, you can wrap the model's construction in a lambda function.
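To illustrate why a zero-argument function helps, here is a minimal pure-Python sketch of the deferral pattern. It does not use the Cerebras API: `LargeModel` is a hypothetical stand-in for an expensive torch.nn.Module, and `resolve_model` is a hypothetical helper that mimics how a trainer could accept either a built model or a factory. The point it shows is that defining the lambda builds nothing; construction happens only when the trainer-side code chooses to call it.

```python
from typing import Callable, Union


class LargeModel:
    """Hypothetical stand-in for an expensive torch.nn.Module."""

    build_count = 0  # class-level counter of how many models were constructed

    def __init__(self, hidden_size: int) -> None:
        LargeModel.build_count += 1
        # In PyTorch, this is where every weight tensor would be allocated eagerly.
        self.hidden_size = hidden_size


ModelOrFactory = Union[LargeModel, Callable[[], LargeModel]]


def resolve_model(model: ModelOrFactory) -> LargeModel:
    """Hypothetical helper: accept either a built model or a zero-argument factory."""
    # Check the type instead of using callable(): real nn.Module instances
    # define __call__, so callable() cannot tell a model apart from a factory.
    if isinstance(model, LargeModel):
        return model
    return model()


# Eager: the construction cost is paid here, before any trainer is involved.
eager_model = LargeModel(hidden_size=1024)

# Deferred: the lambda only describes how to build the model; nothing is built yet.
model_fn = lambda: LargeModel(hidden_size=1024)
builds_before_resolve = LargeModel.build_count

# The trainer-side code decides when (and in what context) to invoke the factory.
deferred_model = resolve_model(model_fn)
```

Because the factory is invoked by the trainer rather than by you, the trainer is free to set up any special initialization context before calling it.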
Deferring the model's construction in this way allows the Trainer to use the Efficient weight initialization mechanism built into the Cerebras PyTorch API.
Empirically, deferring model weight initialization can reduce the time-to-first-loss (the amount of time it takes to get the first loss value back from the Wafer-Scale Cluster) by over 50%. This means less time waiting and faster iteration overall.
Optimizer/Scheduler Functions
One question that may be on your mind is: what about the optimizer? The optimizer constructor takes in the model parameters, so if model initialization is deferred, how can the optimizer be constructed? The answer is that the Trainer can also accept a function for the optimizer argument, which is expected to take in a torch.nn.Module and return an Optimizer object.
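The ordering constraint can be sketched in plain Python. This is not the Cerebras implementation: `TinyModel`, `SGD`, and `build` are hypothetical stand-ins. The optimizer function receives the already-constructed model, so it can read the parameters at the moment they first exist.

```python
from typing import Callable, List, Tuple


class TinyModel:
    """Hypothetical stand-in for a torch.nn.Module with a parameters() method."""

    def __init__(self) -> None:
        self.weights: List[float] = [0.0, 0.0, 0.0]

    def parameters(self) -> List[float]:
        return self.weights


class SGD:
    """Hypothetical stand-in optimizer: it needs the parameters at construction."""

    def __init__(self, params: List[float], lr: float) -> None:
        self.params = params
        self.lr = lr


def build(
    model_fn: Callable[[], TinyModel],
    optimizer_fn: Callable[[TinyModel], SGD],
) -> Tuple[TinyModel, SGD]:
    """Hypothetical sketch of the order a trainer must follow."""
    model = model_fn()               # 1. build the model (deferred until now)
    optimizer = optimizer_fn(model)  # 2. the parameters exist only after step 1
    return model, optimizer


model, optimizer = build(
    model_fn=lambda: TinyModel(),
    optimizer_fn=lambda m: SGD(m.parameters(), lr=0.01),
)
```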
Similarly, the Trainer can also accept a function for the schedulers argument, which is expected to take in an Optimizer object and return a Scheduler object.
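The scheduler function follows the same pattern one level down: it is called with the optimizer once that optimizer exists. The sketch below is hypothetical and uses plain-Python stand-ins (`Optimizer`, `StepHalvingScheduler`, `build_scheduler`) rather than the Cerebras classes. The toy scheduler halves the learning rate every `step_size` steps.

```python
from typing import Callable, List


class Optimizer:
    """Hypothetical stand-in for an optimizer that exposes a learning rate."""

    def __init__(self, lr: float) -> None:
        self.lr = lr


class StepHalvingScheduler:
    """Hypothetical stand-in scheduler: halves the lr every `step_size` steps."""

    def __init__(self, optimizer: Optimizer, step_size: int) -> None:
        self.optimizer = optimizer
        self.step_size = step_size
        self.base_lr = optimizer.lr  # requires the optimizer to exist already
        self.num_steps = 0

    def step(self) -> None:
        self.num_steps += 1
        decay = 0.5 ** (self.num_steps // self.step_size)
        self.optimizer.lr = self.base_lr * decay


def build_scheduler(
    optimizer: Optimizer,
    scheduler_fn: Callable[[Optimizer], StepHalvingScheduler],
) -> StepHalvingScheduler:
    """Hypothetical trainer-side step, invoked only after the optimizer is built."""
    return scheduler_fn(optimizer)


optimizer = Optimizer(lr=0.1)
scheduler = build_scheduler(
    optimizer, lambda opt: StepHalvingScheduler(opt, step_size=2)
)

lrs: List[float] = []
for _ in range(4):
    scheduler.step()
    lrs.append(optimizer.lr)
```

Chaining the three functions (no arguments, then model, then optimizer) lets the trainer build every component in a valid order without you constructing any of them up front.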
Conclusion
That is all there is to deferring model, optimizer, and scheduler initialization! By wrapping these components in callables, you can substantially reduce iteration time and improve resource utilization, making training with the Cerebras Model Zoo more efficient.
Further Reading
To learn how to configure a Trainer instance using a YAML configuration file, you can check out:
- Trainer YAML Overview
To learn more about how you can use the Trainer in some core workflows, you can check out:
To learn more about how you can extend the capabilities of the Trainer class, you can check out: