Convert CS Checkpoints for GPUs
Learn how to work with Cerebras-format checkpoints, including how to load, convert, and reuse them in your training workflows.
PyTorch Checkpoint Format
Our large-model-optimized checkpoint format is based on the standard HDF5 file format. At a high level, when saving a checkpoint, the Cerebras stack takes a PyTorch state dictionary, flattens it, and stores it in an HDF5 file.
For example, a nested state dictionary is flattened into a single level of keys, and each flattened entry is then stored in the H5 file.
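To make the flattening step concrete, here is a minimal sketch in plain Python. It is not the Cerebras implementation. It assumes nested dictionaries and lists are walked recursively and their keys joined with "/" (the HDF5 path separator), which avoids clashing with the dots already present in PyTorch parameter names such as "fc.weight". Plain floats stand in for tensors, and the function name flatten_state_dict is hypothetical.

```python
from typing import Any, Dict


def flatten_state_dict(obj: Any, prefix: str = "", sep: str = "/") -> Dict[str, Any]:
    """Flatten nested dicts/lists into a single-level dict with joined keys.

    Assumption for illustration: keys are joined with `sep` and list
    positions become "0", "1", ...; this is not the Cerebras on-disk scheme.
    """
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, (list, tuple)):
        items = enumerate(obj)
    else:
        # Leaf value (a tensor in a real state dict): keep it under its full key.
        return {prefix: obj}

    flat: Dict[str, Any] = {}
    for key, value in items:
        full_key = f"{prefix}{sep}{key}" if prefix else str(key)
        flat.update(flatten_state_dict(value, full_key, sep))
    return flat


if __name__ == "__main__":
    # Floats stand in for tensors so the example runs without PyTorch.
    example = {
        "model": {"fc.weight": 0.5, "fc.bias": 0.1},
        "optimizer": {"param_groups": [{"lr": 0.01, "momentum": 0.9}]},
    }
    for key, value in flatten_state_dict(example).items():
        print(key, value)
    # model/fc.weight 0.5
    # model/fc.bias 0.1
    # optimizer/param_groups/0/lr 0.01
    # optimizer/param_groups/0/momentum 0.9
```

Each flattened key could then map to one HDF5 dataset path, which is what makes a single-level layout a natural fit for the file format.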
A model or optimizer state dictionary can be saved in this checkpoint format using the cstorch.save method.
A checkpoint saved with cstorch.save can be loaded back into memory using the cstorch.load method.
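Conceptually, loading has to undo the flattening and rebuild the nested dictionary. The sketch below shows that inverse step on a plain dictionary of flattened keys. It is not how cstorch.load is implemented. The "/" separator and the rule that turns a dictionary with keys "0", "1", ... back into a list are assumptions for illustration, and unflatten_state_dict is a hypothetical name.

```python
from typing import Any, Dict


def _lists_from_index_keys(node: Any) -> Any:
    """Recursively turn dicts keyed exactly "0".."n-1" back into lists (assumed rule)."""
    if not isinstance(node, dict):
        return node
    converted = {key: _lists_from_index_keys(value) for key, value in node.items()}
    keys = list(converted)
    if keys and set(keys) == {str(i) for i in range(len(keys))}:
        return [converted[str(i)] for i in range(len(keys))]
    return converted


def unflatten_state_dict(flat: Dict[str, Any], sep: str = "/") -> Any:
    """Rebuild a nested structure from single-level keys joined with `sep`."""
    nested: Dict[str, Any] = {}
    for flat_key, value in flat.items():
        *parents, leaf = flat_key.split(sep)
        node = nested
        for part in parents:
            # Create intermediate dictionaries on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return _lists_from_index_keys(nested)


if __name__ == "__main__":
    flat = {
        "model/fc.weight": 0.5,
        "model/fc.bias": 0.1,
        "optimizer/param_groups/0/lr": 0.01,
    }
    print(unflatten_state_dict(flat))
```

This sketch cannot tell a dictionary whose keys happen to be "0" and "1" apart from a list, and it turns integer keys into strings. A format that must round-trip exactly needs to record container and key types alongside the values.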
Convert to Pickle Format
If cstorch.load is not a sufficient way to load the checkpoint into memory for your workflow, you can do a simple conversion to the pickle format.
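In a PyTorch workflow, one plausible way to do this conversion is to load the checkpoint with cstorch.load and write the result with torch.save, which uses PyTorch's pickle-based serialization. For example: torch.save(cstorch.load("checkpoint.mdl"), "checkpoint.pt"), where both file names are placeholders. The sketch below assumes the checkpoint has already been loaded into a plain dictionary. It uses Python's standard pickle module in place of torch.save so that it runs without the Cerebras stack or PyTorch installed. The helper names convert_to_pickle and load_pickle are hypothetical.

```python
import os
import pickle
import tempfile
from typing import Any, Dict


def convert_to_pickle(state_dict: Dict[str, Any], out_path: str) -> None:
    """Write an already-loaded state dict to `out_path` with pickle.

    `state_dict` stands in for the object a call such as cstorch.load(...)
    would return; in a PyTorch workflow torch.save would play this role.
    """
    with open(out_path, "wb") as f:
        pickle.dump(state_dict, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_pickle(path: str) -> Dict[str, Any]:
    """Read the converted checkpoint back. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    # Floats stand in for tensors so the example runs without PyTorch.
    loaded = {"model": {"fc.weight": 0.5, "fc.bias": 0.1}, "step": 100}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checkpoint.pkl")
        convert_to_pickle(loaded, path)
        assert load_pickle(path) == loaded
```

Because unpickling can execute arbitrary code, only load converted checkpoints that come from a trusted source.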