Using Automatic Mixed Precision

Gradient scaling for mixed precision training is provided by cerebras.pytorch.amp.GradScaler. For example:
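A minimal sketch, assuming a model, loss_fn, optimizer, and input batch are already defined; the scale/backward/step/update flow mirrors the familiar torch.cuda.amp.GradScaler usage:

```python
import cerebras.pytorch as cstorch

# Construct a grad scaler with dynamic loss scaling.
grad_scaler = cstorch.amp.GradScaler(loss_scale="dynamic")

# Inside a training step: scale the loss before backward, then step the
# optimizer through the scaler and update the loss scale.
loss = loss_fn(model(inputs), targets)
grad_scaler.scale(loss).backward()
grad_scaler.step(optimizer)
grad_scaler.update()
```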
By default, automatic mixed precision uses float16. If you want to use cbfloat16 or bfloat16 instead of float16, call cerebras.pytorch.amp.set_half_dtype, e.g.
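A sketch of the call, assuming the dtype can be passed by name as a string:

```python
import cerebras.pytorch as cstorch

# Switch the 16-bit dtype used by automatic mixed precision from the
# default float16 to cbfloat16.
cstorch.amp.set_half_dtype("cbfloat16")
```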
Using a Helper Function for Gradient Scaling
We introduce an optional helper function, cerebras.pytorch.amp.optimizer_step, to take care of the details of gradient scaling. It is useful for quickly constructing typical examples that use gradient scaling without needing to spell out the details or worry about whether the grad scaler is being used correctly.

This helper is entirely optional and covers only the basic gradient scaling use case. For more complicated use cases, the grad scaler object must be used explicitly.
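For example, a sketch of a training step using the helper, with the same assumed model, loss_fn, optimizer, and grad_scaler as above; the max_gradient_norm keyword shown for gradient clipping is an assumption about the optional parameters:

```python
import cerebras.pytorch as cstorch

# One call replaces the explicit scale/backward/step/update sequence.
loss = loss_fn(model(inputs), targets)
cstorch.amp.optimizer_step(
    loss,
    optimizer,
    grad_scaler,
    max_gradient_norm=1.0,  # assumed optional gradient-clipping parameter
)
```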