Learn how to fix common errors.
Your run script needs an `if __name__ == "__main__"` guard. During execution, subprocesses may be created (e.g., for weight transfer or surrogate jobs), which can cause the entire module to run unintentionally. To prevent this, wrap your script's main logic inside an `if __name__ == "__main__"` block.
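A minimal sketch of the guarded script structure (the script and function names are illustrative):

```python
# train_launcher.py -- illustrative script structure.
# Subprocesses that re-import this module only execute the definitions
# above the guard; the launch logic inside the guard runs only in the
# main process, so the module cannot re-launch itself recursively.

def main():
    # Placeholder for the real launch logic (compile/train calls).
    return "launched"

if __name__ == "__main__":
    main()
```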
Error parsing metadata: error=invalid value key=content-type value=text/html
This error is caused by a bug in gRPC. It can occur when a subprocess re-imports the `__main__` module, leading to this error.
Even though checkpoints are present in `model_dir`, they aren't automatically loaded during runs. This happens when the checkpoint naming convention doesn't match the expected format.
The auto-load feature searches for files named `checkpoint_<step>.mdl` in your `model_dir`, loading the one with the highest `<step>` value. This feature is enabled by default but can be disabled by setting `runconfig.autoload_last_checkpoint` to `False` in your params YAML.
If your checkpoints don't follow the `checkpoint_<step>.mdl` format, you can load them explicitly by passing the `--checkpoint_path` flag.
You can avoid unnecessary host-to-device copies by creating new tensors directly on the backend device.
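As a generic PyTorch illustration of this pattern (not a Cerebras-specific API; `device` here is a CPU stand-in for the backend device):

```python
import torch

device = torch.device("cpu")  # stand-in for the backend device

# Two-step: the tensor is first materialized on the host,
# then copied to the target device.
x_copied = torch.ones(4, 4).to(device)

# One-step: the tensor is allocated directly on the target device,
# avoiding the intermediate host allocation and copy.
x_direct = torch.ones(4, 4, device=device)

assert torch.equal(x_copied, x_direct)
```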
Below are common `ModuleNotFoundError` errors you may encounter:
Core Python Module Errors
When trying to use certain built-in Python modules, users may receive errors about missing core modules (`bz2`, `sqlite3`). For example, importing `bz2` may raise a `ModuleNotFoundError`.
User-Installed Package Errors
You may also see a `ModuleNotFoundError` for Python packages that are installed on your local machine but unavailable in the Cerebras environment. Our Custom Worker Container Workflow attempts to import your dependencies into Cerebras appliances, with a fallback that mounts packages from your virtual environment.
For core module errors: install the missing development libraries (`bzip2-devel`, `sqlite-devel`) and rebuild Python, or use a pre-built Python binary instead.
For user-installed package errors: copy `venv/lib/python3.8/site-packages/<package_name>` to an NFS-mountable location. Only copy the custom packages, not the entire virtual environment. Then add this location to `--mount_dirs` and its parent to `--python_paths` when running jobs.
Decorate your custom loss with the `@autogen_loss` decorator, which enables AutoGen to handle the compilation of these custom losses efficiently.
Set the `micro_batch_size` parameter in the `train_input` or `eval_input` section of your model's YAML file (see working_with_microbatches).
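For illustration, a minimal params fragment setting the micro batch size might look like the following (the values are hypothetical):

```yaml
train_input:
  batch_size: 256        # global batch size
  micro_batch_size: 32   # hypothetical value chosen to fit in memory
```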
If compilation fails with your chosen `micro_batch_size` value, you can try compiling with a decreased `micro_batch_size`. The Using "explore" to Search for a Near-Optimal Microbatch Size flow can recommend performant micro batch sizes that will fit in memory.
For guidance on choosing `micro_batch_size` values, see our tutorial on automatic microbatching.
Note that the `batch_size` parameter set in the YAML configuration is the global batch size. This means that the batch size per CS-2 system is computed as the global batch size divided by the number of CS-2 systems used.
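As a worked example of this division (the numbers are hypothetical):

```python
# batch_size in the YAML is global; each CS-2 system receives an equal share.
global_batch_size = 256   # from the YAML configuration
num_cs2_systems = 4       # systems used for the run
per_system_batch_size = global_batch_size // num_cs2_systems
print(per_system_batch_size)  # 64
```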
`POL=0` should be applied to this specific layer.
This ensures the highest numerical precision for the final projection while maintaining the performance advantages of POL=1 throughout the rest of the model. This modification has already been incorporated into the Model Zoo variants of Cerebras large language models.