PyTorch Checkpoint Format
Our large-model-optimized checkpoint format is based on the standard HDF5 file format. At a high level, when saving a checkpoint, the Cerebras stack takes a PyTorch state dictionary, flattens it, and stores it in an HDF5 file, so that nested entries of the state dictionary end up as a single level of keys in the H5 file.
A state dictionary is saved in this format with the cstorch.save method.
A checkpoint saved this way is loaded with the cstorch.load method.
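To make the flattening step concrete, here is a minimal sketch in plain Python. It is not the Cerebras implementation: the helper name flatten_state_dict, the "." separator, and the example keys are all assumptions chosen for illustration, and plain lists stand in for tensors. The actual key layout inside the H5 file may differ.

```python
from collections.abc import Mapping


def flatten_state_dict(state, prefix="", sep="."):
    """Flatten a nested mapping into a single-level dict.

    Hypothetical helper: the "." separator and the key naming are
    assumptions, not the documented on-disk layout.
    """
    flat = {}
    for key, value in state.items():
        # Build the path of this entry from its parent path and its own key.
        path = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested dictionaries.
            flat.update(flatten_state_dict(value, path, sep))
        else:
            # Leaf value (a tensor in a real state dictionary).
            flat[path] = value
    return flat


# Illustrative nested state dictionary; lists stand in for tensors.
nested = {
    "model": {"fc.weight": [0.1, 0.2], "fc.bias": [0.0]},
    "optimizer": {"state": {0: {"step": 3}}},
}

flat = flatten_state_dict(nested)
for name in flat:
    print(name)
```

Each leaf ends up under one key that encodes its path, which is the kind of single-level layout that maps naturally onto named datasets in an HDF5 file. Note that this sketch silently drops empty nested dictionaries.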
Convert to Pickle Format
If using cstorch.load is not a sufficient solution for loading the checkpoint into memory, the checkpoint can be converted to the pickle format.
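One way such a conversion might look is sketched below: load the checkpoint once, then re-save the resulting state dictionary with a pickle-based writer. The function name convert_to_pickle, its load_fn and save_fn parameters, and the file paths are hypothetical. The default import path cerebras.pytorch for cstorch and the use of torch.save as the pickle writer are assumptions to check against the installed release.

```python
def convert_to_pickle(src_path, dst_path, load_fn=None, save_fn=None):
    """Load a checkpoint and re-save it in a pickle-based format.

    Hypothetical helper. load_fn and save_fn are injectable so the
    logic can be exercised without the Cerebras stack installed.
    """
    if load_fn is None:
        # Assumed import path for the cstorch module.
        import cerebras.pytorch as cstorch
        load_fn = cstorch.load
    if save_fn is None:
        # torch.save writes a pickle-based file.
        import torch
        save_fn = torch.save

    state_dict = load_fn(src_path)   # read the H5-based checkpoint
    save_fn(state_dict, dst_path)    # write the pickle-based checkpoint
    return state_dict


if __name__ == "__main__":
    # Placeholder paths; replace with real checkpoint locations.
    convert_to_pickle("path/to/checkpoint.mdl", "path/to/new/checkpoint.pt")
```

The converted file can then be read back with torch.load like any other pickle-based PyTorch checkpoint, at the cost of holding the full state dictionary in memory during the conversion.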