- You must have installed the Model Zoo.
- You must be familiar with the Trainer and YAML format.
- Please ensure you have read Checkpointing.
- Please ensure you have read LLaMA3 8B pre-training.
Configuring the Run
There are two main flows you can use to fine-tune a model:
- Use a YAML configuration file and the Model Zoo CLI.
- Use pure Python.
Fine-Tuning Using a Pre-trained Checkpoint
To perform fine-tuning, a checkpoint from a previous training run is required. Checkpoints can be generated from previous runs or downloaded from online databases. For more information on porting a checkpoint from HuggingFace, see Port a Hugging Face model to Cerebras Model Zoo. This tutorial assumes a checkpoint has already been generated after completing Pretraining with Upstream Validation. For simplicity, let's assume the checkpoint saved after the final step has the path:

./ckpts/checkpoint_10000.mdl
Configure Checkpoint State Loading
To enable fine-tuning, load only the model state from the checkpoint. Other checkpoint states, such as the optimizer state or the training step, should be reset. If using a YAML, configure which states to load from the checkpoint using the callbacks key. For Python, configure which states to load from the checkpoint by constructing a LoadCheckpointStates object as follows.
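For example, in the YAML flow the callback might be configured as shown below. This is a sketch: the nesting under trainer.init.callbacks and the load_checkpoint_states argument follow the Model Zoo Trainer conventions described in Checkpointing, but the exact keys in your release may differ.

```yaml
trainer:
  init:
    callbacks:
      # Load only the model weights from the checkpoint; optimizer state
      # and the global step are re-initialized for the fine-tuning run.
      - LoadCheckpointStates:
          load_checkpoint_states: "model"
```

In the Python flow, the equivalent is to construct the callback and pass it to the Trainer. The import path and argument name here are assumptions based on the Model Zoo layout; adjust them to match the pre-training tutorial for your release.

```python
# Assumed import path; adjust to match your Model Zoo release.
from cerebras.modelzoo.trainer.callbacks import LoadCheckpointStates

# Load only the model weights from the checkpoint; other states (optimizer,
# global step) are re-initialized for the fine-tuning run.
load_states = LoadCheckpointStates(load_checkpoint_states="model")

# Pass the callback to the Trainer alongside the rest of the configuration
# from the pre-training tutorial, e.g.:
#   trainer = Trainer(..., callbacks=[load_states])
```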
Load From a Checkpoint
Configure the trainer to load a checkpoint from a given path. If using a YAML, add the ckpt_path parameter to the fit key.
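For instance, extending the YAML from the pre-training tutorial (only the relevant key is shown; the path matches the checkpoint assumed above):

```yaml
trainer:
  fit:
    # Initialize the fine-tuning run from the pre-trained checkpoint.
    ckpt_path: ./ckpts/checkpoint_10000.mdl
```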
If using Python, specify ckpt_path in the Trainer's fit method as follows.
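A minimal sketch of the call, assuming the trainer and train_dataloader objects were constructed as in the pre-training tutorial (passing the training dataloader as the first argument is an assumption based on that tutorial):

```python
# Start the run from the pre-trained checkpoint. Only the model state is
# loaded (per the LoadCheckpointStates callback configured above); optimizer
# state and the global step start fresh.
trainer.fit(
    train_dataloader,
    ckpt_path="./ckpts/checkpoint_10000.mdl",
)
```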
Putting It All Together
After the above adjustments, you should have a configuration that looks like this:
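The outline below shows where the two additions fit in the overall YAML. Everything elided with comments is unchanged from the LLaMA3 8B pre-training configuration; take the exact keys from that tutorial rather than from this sketch.

```yaml
trainer:
  init:
    backend:
      backend_type: CSX
    model:
      # ... LLaMA3 8B model settings from the pre-training tutorial ...
    optimizer:
      # ... optimizer and scheduler settings from the pre-training tutorial ...
    callbacks:
      # New for fine-tuning: load only the model weights from the checkpoint.
      - LoadCheckpointStates:
          load_checkpoint_states: "model"
  fit:
    train_dataloader:
      # ... fine-tuning dataset / dataloader settings ...
    # New for fine-tuning: start from the pre-trained checkpoint.
    ckpt_path: ./ckpts/checkpoint_10000.mdl
```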
Start Fine-Tuning
Now that you have a fully configured Trainer, kick off the run and start fine-tuning:
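How you launch depends on the flow you chose. With pure Python, the trainer.fit(...) call shown above is itself the launch. With the YAML flow, launch the run through the Model Zoo CLI as in the pre-training tutorial; the invocation below is only a hypothetical sketch, where both the cszoo command and the configuration file name are assumptions — substitute the launcher and file name you actually used for pre-training.

```bash
# Hypothetical launch command; use the Model Zoo CLI invocation from the
# pre-training tutorial if it differs in your release.
cszoo fit ./configs/finetune_llama3_8b.yaml
```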
Monitor the Run
Once compilation finishes and the Wafer-Scale Cluster is programmed for execution, you should start seeing progress logs. The performance numbers you get will vary depending on how many Cerebras systems you are using and which generation of systems you are using.
