Top-level functions for creating and retrieving the Cerebras backend.

backend
CSX
Parameters:
- Default: $cwd/cerebras_logs
- Default: /opt/cerebras/cached_compile
- If True, configure the CSX backend only for compilation. All parameter data is immediately dropped, as it isn't required for compilation, and the data executor will not send an execution request to the Wafer Scale Cluster. This mode is intended to verify that the model can be compiled; as such, no system is required in this mode. Default: False
- If True, configure the CSX backend only for validation. All parameter data is immediately dropped, as it isn't required for validation, and the data executor will not send compile or execute requests to the Wafer Scale Cluster. This mode is intended to verify that the model can be traced; as such, no system is required in this mode. Default: False
- If True, all parameter data is immediately dropped even in a non-compile-only run. In that case, a checkpoint containing values for all stateful tensors must be loaded in order to run. Default: False
- If set, cstorch.save automatically keeps only the newest max_checkpoints checkpoints, removing the oldest once the count exceeds the specified number. Default: None
- If True, print logs during weight initialization to keep users updated on the current progress. Default: True
- If True, retrace the entire training/evaluation step every iteration. This makes it possible to check whether the graph changes between iterations, but the tracing overhead can hurt performance for certain models. Default: False
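The max_checkpoints pruning behavior described above can be illustrated with a small stand-alone sketch. This is plain Python mimicking the described policy, not the actual cstorch.save implementation; prune_checkpoints is a hypothetical helper name.

```python
import os
import tempfile

def prune_checkpoints(ckpt_dir, max_checkpoints):
    """Keep only the newest `max_checkpoints` checkpoint files, removing the oldest first.

    Hypothetical helper illustrating the policy; not part of cstorch.
    """
    ckpts = sorted(
        (f for f in os.listdir(ckpt_dir) if f.endswith(".mdl")),
        key=lambda f: os.path.getmtime(os.path.join(ckpt_dir, f)),
    )
    for stale in ckpts[:-max_checkpoints]:
        os.remove(os.path.join(ckpt_dir, stale))

with tempfile.TemporaryDirectory() as d:
    # Simulate five checkpoint saves with max_checkpoints=2
    for step in range(5):
        path = os.path.join(d, f"checkpoint_{step}.mdl")
        open(path, "w").close()
        os.utime(path, (step, step))      # give each file an increasing mtime
        prune_checkpoints(d, max_checkpoints=2)
    kept = sorted(os.listdir(d))          # only the two newest checkpoints remain
```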
CPU
Parameters:
- Default: $cwd/cerebras_logs
- If set, cstorch.save automatically keeps only the newest max_checkpoints checkpoints, removing the oldest once the count exceeds the specified number. Default: None
- If True, use the autocast context manager during the forward pass. Default: None
GPU
Parameters:
- Default: $cwd/cerebras_logs
- If True, configure the run to use torch.distributed. Note that the run must have been launched using torchrun. Default: False
- Default: 0
- Default: "nccl"
- Default: None
- Default: False
current_backend

current_torch_device()
Gets the torch device of the current backend. Returns torch.device('cpu') if no backend has been initialized yet.
use_cs

cerebras.pytorch.backends controls the behavior of the specific backends that the Cerebras PyTorch API supports. As of right now, the only backend for which there are configurable options is cerebras.pytorch.backends.csx.
cerebras.pytorch.backends.csx.precision.optimization_level: int
The precision optimization level. The value must be in the range [0, 3). Default: 1
cerebras.pytorch.backends.csx.performance.micro_batch_size: Union[None, int, Literal['auto', 'explore'], Dict[str, Dict[str, int]]]
The micro-batch size to use when compiling the model. The micro-batch size can affect the model's performance and memory usage.
Valid values include:
- int: use this exact micro-batch size.
- "auto": let the compiler automatically select a micro-batch size.
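The accepted value forms follow from the type annotation above. As an illustration only (not part of the cstorch API), a tiny validator for that union type can be sketched; the "micro_batch_size" key in the dict example is hypothetical.

```python
def is_valid_micro_batch_size(value):
    """Validate against Union[None, int, Literal['auto', 'explore'], Dict[str, Dict[str, int]]].

    Illustrative sketch of the documented type; not a cstorch function.
    """
    if value is None:
        return True
    if isinstance(value, bool):  # bool is an int subclass; reject it explicitly
        return False
    if isinstance(value, int):
        return True
    if value in ("auto", "explore"):
        return True
    if isinstance(value, dict):
        return all(
            isinstance(k, str)
            and isinstance(v, dict)
            and all(isinstance(k2, str) and isinstance(v2, int) for k2, v2 in v.items())
            for k, v in value.items()
        )
    return False
```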
cerebras.pytorch.backends.csx.performance.transfer_processes: int
The number of processes to use for transferring data to and from the Wafer Scale Cluster. Default: 5
cerebras.pytorch.backends.csx.debug.retrace_every_iteration: bool
Whether to retrace the training/validation graph every iteration. Default: False

cerebras.pytorch.backends.csx.debug.lazy_initialization: bool
Whether to use lazy weight initialization. Default: True

cerebras.pytorch.backends.csx.debug.debug_args: DebugArgs
Arguments to pass to the cluster, for cluster debugging purposes only. Default: None

cerebras.pytorch.backends.csx.debug.ini: DebugArgs
INI configuration flags, for cluster debugging purposes only. Default: None
cerebras.pytorch.backends.csx.debug.compile_crd_memory_gi: Optional[int]
The memory limit for the compile coordinator.

cerebras.pytorch.backends.csx.debug.execute_crd_memory_gi: Optional[int]
The memory limit for the execute coordinator.

cerebras.pytorch.backends.csx.debug.wrk_memory_gi: Optional[int]
The memory limit for the workers.
cerebras.pytorch.backends.csx.debug.act_memory_gi: Optional[int]
The memory limit for the activation hosts.

cerebras.pytorch.backends.csx.debug.cmd_memory_gi: Optional[int]
The memory limit for the command hosts.

cerebras.pytorch.backends.csx.debug.wgt_memory_gi: Optional[int]
The memory limit for the weight hosts.
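These debug flags are module-level attributes, so they would typically be assigned before the backend is constructed. A sketch using the flag paths documented above; the numeric values are illustrative only, and this is an assumption about typical usage rather than a prescribed configuration:

```python
import cerebras.pytorch as cstorch

# Cap memory for the coordinators and workers (values in GiB are illustrative).
# These are cluster-debugging knobs; leave them unset (None) for normal runs.
cstorch.backends.csx.debug.compile_crd_memory_gi = 60
cstorch.backends.csx.debug.execute_crd_memory_gi = 30
cstorch.backends.csx.debug.wrk_memory_gi = 8
```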
compile(model, backend=None)

trace
Parameters:
step_fn (callable) – The training/evaluation step function to be wrapped.
Returns: The wrapped training/evaluation step function.
Return type: callable

In addition, no tensor value may be eagerly evaluated at any point inside this training step. This means no tensor may be printed, fetched via a debugger, or used as part of a Python conditional. Any operation that requires knowing the value of a tensor inside the training step will result in an error stating that it is not allowed to read a tensor's contents outside of a step_closure.
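The "no eager evaluation" constraint can be mimicked with a toy lazy tensor. This is a stand-in for illustration, not the cstorch implementation: the toy records operations instead of computing values, and any attempt to read its value (e.g. in a Python conditional) raises, just as the traced step would error.

```python
class LazyTensor:
    """Toy stand-in for a traced tensor: records ops instead of computing values."""

    def __init__(self, op="input"):
        self.op = op

    def __add__(self, other):
        # Tracing builds a graph; no value is computed here
        return LazyTensor(op=f"add({self.op}, {getattr(other, 'op', other)})")

    def __bool__(self):
        # Mirrors the documented error: tensor contents cannot be read inside the step
        raise RuntimeError("Cannot read a tensor's contents outside of a step_closure")

loss = LazyTensor() + LazyTensor()

try:
    if loss:  # a Python conditional forces eager evaluation -> error
        pass
except RuntimeError as e:
    message = str(e)
```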
full(shape, value, dtype=None)
Returns a lazily initialized tensor filled with the provided value.
Parameters:

full_like(other, value, dtype=None)
Returns a lazily initialized full tensor with the same properties as the provided tensor.
Parameters:

ones(shape, dtype=None)
Returns a lazily initialized tensor filled with ones.
Parameters:

ones_like(other, dtype=None)
Returns a lazily initialized tensor full of ones with the same properties as the provided tensor.
Parameters:

zeros(shape, dtype=None)
Returns a lazily initialized tensor filled with zeros.
Parameters:

zeros_like(other, dtype=None)
Returns a lazily initialized tensor full of zeros with the same properties as the provided tensor.
Parameters:
save(obj, checkpoint_file)

load(checkpoint_file, map_location=None, **kwargs)

utils.data.DataLoader(*args, **kwargs)
load_state_dict()
utils.data.SyntheticDataset(*args, **kwargs)
A synthetic dataset that generates samples from a SampleSpec.

Constructs a SyntheticDataset instance.

A synthetic dataset can be used to generate samples on the fly with an expected dtype/shape, but without needing to create a full-blown dataset. This is especially useful for compile validation.
Parameters:
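The idea behind a synthetic dataset can be sketched with a minimal stdlib-only version. The real SyntheticDataset consumes a SampleSpec of dtypes and shapes; the {name: shape} spec format and class below are hypothetical simplifications for illustration.

```python
import random

class ToySyntheticDataset:
    """Generates nested-list samples matching a {name: shape} spec on the fly."""

    def __init__(self, sample_spec, num_samples=100, seed=0):
        self.sample_spec = sample_spec
        self.num_samples = num_samples
        self.rng = random.Random(seed)

    def _rand(self, shape):
        # Recursively build a nested list of random floats with the given shape
        if not shape:
            return self.rng.random()
        return [self._rand(shape[1:]) for _ in range(shape[0])]

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        return {name: self._rand(shape) for name, shape in self.sample_spec.items()}

ds = ToySyntheticDataset({"input_ids": (4, 8), "labels": (4,)}, num_samples=10)
sample = ds[0]
```

This is useful in the same way the real class is: samples with the right shape exist immediately, with no dataset on disk, which is all compile validation needs.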
utils.data.DataExecutor(*args, **kwargs)
When using a schedule object, the intervals must be zero-indexed.
utils.data.RestartableDataLoader(*args, **kwargs)

state_dict()
Returns the worker state as a DataLoaderCheckpoint dataclass.
Usage:
get_worker_state is well-defined only inside of the state_dict method; using it anywhere else will result in a RuntimeError exception. See the linked docs for more details.

load_state_dict(state_dict, strict=True)

aggregate_state_dict(worker_states)
Aggregates the state dicts collected from each worker's state_dict.
Returns: The consolidated state dict that will be saved in a checkpoint.
Return type: Dict[str, Any]
Usage:

deaggregate_state_dict(aggregated_state_dict, strict=True)
Deaggregates the state dict produced by aggregate_state_dict.
Parameters:
Returns: The worker-specific state dict that is passed to the load_state_dict method.
Return type: Dict[str, Any]
Usage:
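The aggregate/deaggregate round trip can be sketched in plain Python. This is illustrative logic only: the real protocol lets a custom dataloader decide how per-worker states combine, and the real deaggregate_state_dict takes (aggregated_state_dict, strict=True) rather than a worker_id argument.

```python
def aggregate_state_dict(worker_states):
    """Combine per-worker state dicts into one dict keyed by worker index."""
    return {f"worker_{i}": state for i, state in enumerate(worker_states)}

def deaggregate_state_dict(aggregated, worker_id):
    """Recover the slice of the aggregated state belonging to one worker.

    `worker_id` is a simplification for this sketch; see the note above.
    """
    return aggregated[f"worker_{worker_id}"]

# Two workers report how many samples they have streamed
states = [{"samples_seen": 100}, {"samples_seen": 96}]
combined = aggregate_state_dict(states)          # saved in the checkpoint
restored = deaggregate_state_dict(combined, 1)   # given back to worker 1 on restart
```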
utils.data.DataLoaderCheckpoint

DataLoaderCheckpoint
A dataclass holding dataloader worker state.

get_worker_state()
Returns: a DataLoaderCheckpoint instance holding worker state information at the checkpoint step, in the DataLoaderCheckpoint dataclass format.
This method is well-defined only inside state_dict of a dataloader implementing the RestartableDataLoader protocol, since state_dict is well-defined only at a checkpoint step.
- Use this method to save any of the aforementioned state info recorded by each worker when defining state_dict for custom implementations of restartable dataloaders.
- The state info captured by each worker is for the current run only, i.e. if you pause and restart a run, the counters gathering the information returned by this function will be reset.
utils.CSConfig

CSConfig
Parameter defaults: None, None, 1, 24, 1, 1, 5, None, None, None, None, None, 1.
numpy utilities

from_numpy(array)
Converts a numpy array to a torch tensor.

to_numpy(tensor)
Converts a torch tensor to a numpy array.
step_closure(closure)

checkpoint_closure(closure)

summarize_scalar(*args, **kwargs)
Has an effect only if a SummaryWriter was passed to the DataExecutor object.

cerebras.pytorch.summarize_tensor(*args, **kwargs)
Has an effect only if a SummaryWriter was passed to the DataExecutor object.

SummaryWriter(*args, **kwargs)
Thin wrapper around torch.utils.tensorboard.SummaryWriter.
Additional features include the ability to add a tensor summary.
Parameters:
Works in conjunction with the summarize_{scalar,tensor} functions.

add_tensor()
add_tensor()
#
class cerebras.pytorch.utils.tensorboard.SummaryReader
(*args, _kwargs_)**#
Class for reading summaries saved using the SummaryWriter.
Parameters:
reload()
#
read_scalar()
#
Scalars()
#
read_tensor()
#
scalar_names()
#
scalar_groups()
#
tensor_names()
#
text\_summary\_names()
#
read\_text\_summary()
#
Tags()
#
benchmark_dataloader(input_fn, num_epochs=None, steps_per_epoch=None, sampling_frequency=None, profile_activities=None, print_metrics=True)
Utility to benchmark a dataloader.
Parameters:
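The core of such a benchmark can be sketched with the stdlib alone. This simplified stand-in (note the _sketch suffix) times dataloader construction and per-batch iteration, mirroring the dataloader_build_time, total_steps, and global_rate metrics described below without the real utility's epoch handling or profiling.

```python
import time

def benchmark_dataloader_sketch(input_fn, num_steps=50):
    """Time dataloader construction and iteration; a simplified stand-in."""
    t0 = time.perf_counter_ns()
    loader = input_fn()                       # build the dataloader
    build_time_ns = time.perf_counter_ns() - t0

    steps = 0
    t0 = time.perf_counter_ns()
    for _batch in loader:                     # iterate a fixed number of batches
        steps += 1
        if steps >= num_steps:
            break
    iter_time_ns = time.perf_counter_ns() - t0

    # Steps per second over the whole iteration window
    global_rate = steps / (iter_time_ns / 1e9) if iter_time_ns else float("inf")
    return {
        "build_time_ns": build_time_ns,
        "total_steps": steps,
        "global_rate": global_rate,
    }

metrics = benchmark_dataloader_sketch(lambda: iter(range(1000)), num_steps=100)
```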
Metrics(dataloader_build_time=<factory>, epoch_metrics=<factory>, batch_specs=<factory>, total_time=<factory>, global_rate=0.0, is_partial=True, start_time_ns=<factory>, end_time_ns=0)
Metrics for a single dataloader experiment.
Parameters:
- dataloader_build_time: numpy.timedelta64
- epoch_metrics: List[cerebras.pytorch.utils.benchmark.utils.dataloader.EpochMetrics]
- batch_specs: Dict[cerebras.pytorch.utils.benchmark.utils.dataloader.BatchSpec, cerebras.pytorch.utils.benchmark.utils.dataloader.BatchSpecOccurence]
- total_time: numpy.timedelta64
- global_rate: float = 0.0
- is_partial: bool = True
- start_time_ns: int
- end_time_ns: int = 0
- An int property that returns the total number of steps across all epochs.
- An Optional[float] property.
EpochMetrics(iterator_creation=<factory>, iteration_time=<factory>, total_steps=0, batch_metrics=<factory>, start_time_ns=<factory>, end_time_ns=0)
Metrics for a single epoch of a dataloader experiment.
Parameters:
- iterator_creation: numpy.timedelta64
- iteration_time: numpy.timedelta64
- total_steps: int = 0
- batch_metrics: List[cerebras.pytorch.utils.benchmark.utils.dataloader.BatchMetrics]
- start_time_ns: int
- end_time_ns: int = 0
- A numpy.timedelta64 property.
BatchMetrics(epoch_step, global_step, local_rate, global_rate, profile_activities=<factory>, sampling_time_ns=<factory>)
Metrics for a single batch of a dataloader experiment.
Parameters:
- epoch_step: int