Writing Custom Optimizers
Learn how to write a Cerebras-compliant custom optimizer.
Start by creating a subclass of cerebras.pytorch.optim.Optimizer. For example:
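The following is a minimal, illustrative sketch (the class name CustomSGD and its hyperparameters are invented for this example; the preinitialize and step bodies are filled in later in this section):

```python
import torch
import cerebras.pytorch as cstorch


class CustomSGD(cstorch.optim.Optimizer):
    """Illustrative SGD-with-momentum variant (not a Cerebras-provided class)."""

    def __init__(self, params, lr=0.01, momentum=0.9):
        if lr < 0.0:
            raise ValueError(f"Invalid learning rate: {lr}")

        # Param group defaults are passed to the base Optimizer class,
        # alongside the model parameters and the optional enable_global_step.
        defaults = dict(lr=lr, momentum=momentum)
        super().__init__(params, defaults, enable_global_step=False)

    def preinitialize(self):
        # All optimizer state must be created here (expanded below).
        ...

    def step(self, closure=None):
        # The parameter update rule goes here (expanded below).
        ...
```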
As seen in the above example, similar to torch.optim.Optimizer, the base Optimizer class expects three arguments: the model parameters, the param group defaults, and the optional enable_global_step, which, when enabled, defines a global step state variable for each parameter.
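Constructing the optimizer then looks the same as constructing any other torch optimizer. A short usage sketch, reusing the hypothetical CustomSGD class above with a tiny stand-in model:

```python
import torch

# A tiny stand-in model; any torch.nn.Module's parameters work the same way.
model = torch.nn.Linear(4, 2)

# Hyperparameters passed here become the param group defaults. Passing
# enable_global_step=True to the base class in __init__ would additionally
# track a global step state variable for each parameter.
optimizer = CustomSGD(model.parameters(), lr=0.01, momentum=0.9)
```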
In addition, there are two abstract methods that must be overridden: preinitialize and step.

preinitialize

This method is used to initialize any state variables that will be used by the optimizer. For example, SGD defines its momentum buffers in its preinitialize method.

To remain Cerebras-compliant, no optimizer state variables may be initialized outside of the preinitialize method.
For optimal performance, when initializing state tensors that are filled with some constant value, use the creation ops available in the cstorch package to lazily initialize them. These ops lazily initialize and fill the tensor, meaning they take up very little memory and can be initialized much more quickly than their torch counterparts when running on the cluster. See the source code for the optimizers in cerebras.pytorch for examples.
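Continuing the sketch above, preinitialize might create one momentum buffer per parameter using a cstorch creation op (cstorch.zeros_like is assumed here as the lazy counterpart of torch.zeros_like):

```python
def preinitialize(self):
    # Create one momentum buffer per parameter. Using a cstorch creation
    # op keeps the buffer lazy: it is filled with zeros on the cluster
    # rather than being materialized eagerly on the host.
    for group in self.param_groups:
        for p in group["params"]:
            self.state[p]["momentum_buffer"] = cstorch.zeros_like(p)
```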
step

This method is where the optimizer step is implemented. Note that, due to the nature of lazy tensor tracing and execution, no Python-level conditions or loops may be used to dynamically define the control flow; only torch ops (such as torch.where) may be used for data-dependent logic. However, static structures are allowed, for example a loop with a fixed number of iterations, or a Python conditional that involves only constant variables and no torch tensors.
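A step implementation for the same sketch could look like the following. The torch.where call stands in for any data-dependent logic that would otherwise be a Python if on tensor values; the elementwise clipping it performs is purely illustrative:

```python
@torch.no_grad()
def step(self, closure=None):
    loss = None
    if closure is not None:
        with torch.enable_grad():
            loss = closure()

    for group in self.param_groups:
        lr = group["lr"]
        momentum = group["momentum"]

        for p in group["params"]:
            # Checking for a missing gradient inspects graph structure,
            # not tensor values, so a Python condition is fine here.
            if p.grad is None:
                continue

            grad = p.grad
            # Data-dependent control flow must be expressed with torch ops.
            # Illustrative elementwise clip instead of a Python `if`:
            grad = torch.where(grad.abs() > 1.0, torch.sign(grad), grad)

            buf = self.state[p]["momentum_buffer"]
            buf.mul_(momentum).add_(grad)
            p.sub_(lr * buf)

    return loss
```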
Once you’ve written your custom optimizer, as long as it’s available in the global scope, you can use it in ModelZoo just as you would a built-in optimizer: set optimizer_type to the name of your custom optimizer class in your params YAML file.
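For instance, if the hypothetical class above is importable in the run’s global scope, the optimizer section of the params file might look roughly like this (the exact surrounding keys depend on your model’s ModelZoo params schema):

```yaml
optimizer:
    optimizer_type: CustomSGD  # must match the class name visible in the global scope
```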