If you notice a throughput spike after saving checkpoints, this is a known artifact of reporting throughput on the user node, caused by the asynchronous nature of execution on Wafer-Scale Cluster. For more details and understanding of this behavior, please refer to Measure throughput of your model.Documentation Index
Fetch the complete documentation index at: https://training-docs.cerebras.ai/llms.txt
Use this file to discover all available pages before exploring further.