PyTorch Checkpoint Format
Our checkpoint format, optimized for large models, is based on the standard HDF5 file format. At a high level, when saving a checkpoint, the Cerebras stack takes a PyTorch state dictionary, flattens it, and stores it in an HDF5 file.
A checkpoint is saved in this format using the cbtorch.save method and loaded using the cbtorch.load method.
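The flattening step described at the start of this section can be sketched in plain Python. This is only an illustration, not the Cerebras implementation: the helper name flatten_state_dict, the "." key separator, and the choice to index list entries by position are all assumptions, and plain numbers stand in for tensors.

```python
from typing import Any, Dict


def flatten_state_dict(state: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts/lists/tuples into a single-level dict of leaf values.

    Hypothetical helper: the key layout here is an assumption, not the
    layout the Cerebras stack actually writes to HDF5.
    """
    flat: Dict[str, Any] = {}
    if isinstance(state, dict):
        items = ((str(k), v) for k, v in state.items())
    elif isinstance(state, (list, tuple)):
        items = ((str(i), v) for i, v in enumerate(state))
    else:
        # Leaf value (number, string, tensor, ...): store under the accumulated key.
        flat[prefix] = state
        return flat
    for key, value in items:
        child = f"{prefix}{sep}{key}" if prefix else key
        flat.update(flatten_state_dict(value, child, sep))
    return flat


example = {
    "model": {"fc": {"weight": [0.1, 0.2], "bias": 0.5}},
    "global_step": 100,
}
flat = flatten_state_dict(example)
print(flat)
# {'model.fc.weight.0': 0.1, 'model.fc.weight.1': 0.2,
#  'model.fc.bias': 0.5, 'global_step': 100}
```

Each flattened key could then map to one dataset in the HDF5 file. Note that this sketch drops empty containers and does not record whether a level was a dict or a list, so a real format would need extra metadata to restore the original structure.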
If you’re using the run.py scripts provided in ModelZoo, the configuration and setup mentioned earlier are already handled automatically by the built-in runners.
Converting Checkpoint Formats
If cbtorch.load is not sufficient for loading the checkpoint into memory, the checkpoint can be converted with a simple step to the pickle format that PyTorch uses.
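The shape of such a conversion can be sketched as a load followed by a re-save. The sketch below is an assumption-laden illustration rather than the documented procedure: convert_checkpoint, fake_load, and pickle_save are hypothetical names, and the loader and writer are passed in as callables so that the example runs without Cerebras software or PyTorch installed. In practice the loader would presumably be cbtorch.load and the writer torch.save, but the exact cbtorch.load signature is not shown in this section.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def convert_checkpoint(
    src_path: str,
    dst_path: str,
    load_fn: Callable[[str], Any],
    save_fn: Callable[[Any, str], None],
) -> Any:
    """Load a checkpoint fully into memory, then re-save it with another writer."""
    state_dict = load_fn(src_path)  # assumed in practice: cbtorch.load
    save_fn(state_dict, dst_path)   # assumed in practice: torch.save
    return state_dict


# Stand-ins (hypothetical) so the sketch is runnable on its own.
def fake_load(path: str) -> dict:
    # Pretends to read an HDF5 checkpoint and return a nested state dict.
    return {"model": {"weight": [0.1, 0.2]}, "global_step": 100}


def pickle_save(obj: Any, path: str) -> None:
    # Writes the object with the standard-library pickle module.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


with tempfile.TemporaryDirectory() as tmp:
    dst = os.path.join(tmp, "converted.pkl")
    convert_checkpoint("checkpoint.hdf5", dst, fake_load, pickle_save)
    with open(dst, "rb") as f:
        restored = pickle.load(f)
```

Because the whole state dictionary is held in memory between the load and the save, this approach assumes the checkpoint fits in host memory.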