GradScaler
GradScaler(loss_scale=None, init_scale=None, steps_per_increase=None, min_loss_scale=None, max_loss_scale=None, overflow_tolerance=0.0, max_gradient_norm=None)
state_dict(destination=None)
load_state_dict(state_dict)
scale(loss)
get_scale()
unscale_(optimizer)
step_if_finite(optimizer, *args, **kwargs)
clip_gradients_and_return_isfinite(*optimizers)
step(optimizer, *args, **kwargs)
Step carries out the following two operations:

1. Internally invokes unscale_(optimizer) (unless unscale_ was explicitly called for optimizer earlier in the iteration). As part of unscale_, gradients are checked for infs/NaNs.
2. Invokes optimizer.step() using the unscaled gradients, ensuring that the previous optimizer state and params carry over if NaNs are encountered in the gradients.

*args and **kwargs are forwarded to optimizer.step(). Returns the return value of optimizer.step(*args, **kwargs).

Parameters:
- optimizer (cerebras.pytorch.optim.Optimizer): Optimizer that applies the gradients.
- args: Any arguments forwarded to optimizer.step().
- kwargs: Any keyword arguments forwarded to optimizer.step().
update(new_scale=None)
Update the gradient scale after all optimizers have been stepped.
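To make the scale, step, and update flow concrete, here is a minimal sketch of one scaled training step. It assumes cerebras.pytorch is imported as cstorch, that the model, optimizer, and data are simple placeholders, and that loss_scale="dynamic" is an illustrative constructor argument rather than a documented requirement.

```python
import torch
import cerebras.pytorch as cstorch

# Placeholders; in practice these come from your training script.
model = torch.nn.Linear(16, 1)
optimizer = cstorch.optim.SGD(model.parameters(), lr=0.01)

# Dynamic loss scaling; the exact constructor arguments are an assumption.
grad_scaler = cstorch.amp.GradScaler(loss_scale="dynamic")

def training_step(batch, target):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(batch), target)
    # Scale the loss so small half-precision gradients do not underflow.
    grad_scaler.scale(loss).backward()
    # step() unscales the gradients, checks them for infs/NaNs, and only
    # applies optimizer.step() when the gradients are finite.
    grad_scaler.step(optimizer)
    # Adjust the gradient scale after all optimizers have been stepped.
    grad_scaler.update()
    return loss
```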
set_half_dtype
set_half_dtype(value)
By default, automatic mixed precision uses float16. If you want to use cbfloat16 or bfloat16 instead of float16, call this function.
Example usage:
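A minimal sketch, assuming cerebras.pytorch is imported as cstorch and that the dtype can be passed by name:

```python
import cerebras.pytorch as cstorch

# Use cbfloat16 as the 16-bit dtype for automatic mixed precision
# instead of the default float16.
cstorch.amp.set_half_dtype("cbfloat16")
```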
optimizer_step
optimizer_step(loss, optimizer, grad_scaler, max_gradient_norm=None, max_gradient_value=None)
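A hedged sketch of how this convenience helper might be called, assuming cerebras.pytorch is imported as cstorch and that the helper wraps the scale, backward, optional clipping, step, and update sequence shown above for GradScaler; the clipping threshold of 1.0 is illustrative only.

```python
import torch
import cerebras.pytorch as cstorch

# Placeholders; in practice these come from your training script.
model = torch.nn.Linear(16, 1)
optimizer = cstorch.optim.SGD(model.parameters(), lr=0.01)
grad_scaler = cstorch.amp.GradScaler(loss_scale="dynamic")

batch = torch.randn(8, 16)
target = torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(batch), target)

# Assumed to perform scale -> backward -> clip -> step -> update in one call.
cstorch.amp.optimizer_step(
    loss,
    optimizer,
    grad_scaler,
    max_gradient_norm=1.0,  # illustrative clipping threshold, not a default
)
```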