Prerequisites
Please ensure that you have installed the Cerebras Model Zoo package by going through the installation guide. Make sure you have read Trainer Overview and Trainer Configuration Overview, which provide a basic overview of how to run Model Zoo models. Please also read Pretraining with Upstream Validation, since this page builds directly on the walkthrough there. Lastly, please read Downstream Validation using Eleuther Eval Harness and Downstream Validation using BigCode Eval Harness. Specifically, this guide presupposes your understanding of the `EleutherEvalHarness` and `BigCodeEvalHarness` callbacks.
Configuring the Run
Similar to Pretraining with Upstream Validation, this page presents the YAML configuration file alongside the equivalent pure Python setup for ease of comparison. You will add downstream validation to the pre-training configuration set up in Pretraining with Upstream Validation for Llama-3-8B. Recall the full configuration you put together in that tutorial.
Configure EEH
Let’s add downstream validation on a single EEH multiple-choice task, `winogrande`, as part of the pre-training run. To do this, you will need to augment the configuration with the `EleutherEvalHarness` callback as such:
- YAML: Simply add the callback to the list of callbacks in the YAML.
- Python: Construct an `EleutherEvalHarness` callback object and pass it to the Trainer’s constructor, configured for the task `winogrande`.
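As a rough sketch, the callback entry might look like the following in the YAML. The argument names shown here (`eeh_args`, `tasks`, `num_fewshot`) are illustrative assumptions, not a verbatim schema; consult the `EleutherEvalHarness` callback reference for the exact field names.

```yaml
trainer:
  init:
    callbacks:
      # Run EEH validation on winogrande alongside pre-training.
      # Argument names are illustrative; check the callback reference.
      - EleutherEvalHarness:
          eeh_args:
            tasks: winogrande
            num_fewshot: 0
```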
- The `eval_frequency` specified as part of the trainer’s loop (YAML) or in the `TrainingLoop` object (Python) also controls the frequency of downstream validation; i.e., in the example above, validation on the EEH task `winogrande` will be run every 1K steps.
- Update the `tasks` argument to configure downstream validation for more EEH tasks. Note that only a single generative EEH task may be specified per callback.
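For instance, assuming the `tasks` argument accepts a comma-separated list of task names (as the Eleuther Eval Harness CLI does), adding more multiple-choice tasks to the same callback could be sketched as:

```yaml
# Illustrative sketch only: multiple multiple-choice tasks in one callback.
# Only a single generative task may appear per callback.
- EleutherEvalHarness:
    eeh_args:
      tasks: winogrande,hellaswag
```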
Configure BCEH
Configuring downstream validation using BCEH is no different than it is for EEH. For example, if you want to configure the pre-training run on the code generative task `humaneval`, please augment the YAML configuration file with the `BigCodeEvalHarness` callback as such:
- YAML: Simply add the callback to the list of callbacks in the YAML. Don’t forget to include the inference settings under model configuration!
- Python: Construct a `BigCodeEvalHarness` callback object and pass it to the Trainer’s constructor as follows. Note that the BCEH arguments are passed to the callback via the `BigCodeCLIArgs` object, comprising the list of supported BCEH command line arguments, configured here for the task `humaneval`.
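A hedged YAML sketch, mirroring the EEH setup: the `bigcode_args` field and the inference setting names under the model configuration are assumptions for illustration, so check the `BigCodeEvalHarness` reference for the exact names your release expects.

```yaml
trainer:
  init:
    model:
      # Generative tasks require inference settings here
      # (names illustrative; values elided).
      start_token: ...
      stop_sequences: ...
      max_tokens: 256
    callbacks:
      # Run BCEH validation on humaneval alongside pre-training.
      - BigCodeEvalHarness:
          bigcode_args:
            tasks: humaneval
```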
- Since only one generative eval harness task is supported per callback, please create a separate `BigCodeEvalHarness` callback to run downstream validation for more BCEH tasks.
- To obtain the final eval metrics for BCEH, please run the code execution and evaluation flow separately, as described in the Downstream Validation using BigCode Eval Harness guide.
Configure EEH and BCEH
Configuring downstream validation for both EEH and BCEH is also straightforward: simply use both the `EleutherEvalHarness` and `BigCodeEvalHarness` callbacks.
Let’s augment the full YAML configuration file to run downstream validation on the EEH tasks `hellaswag`, `gsm8k`, and `winogrande`, and the BCEH task `mbpp`, with the callbacks as follows:
- YAML: Simply add both callbacks to the list of callbacks in the YAML. Since you are running generative eval harness tasks, don’t forget to include the inference settings under model configuration!
- Python: Construct `EleutherEvalHarness` and `BigCodeEvalHarness` callback objects, respectively, and pass them to the Trainer’s constructor.
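Putting the two together, a sketch of the callbacks section might look like the following. As before, the nested argument names are assumptions for illustration, not the verified schema.

```yaml
trainer:
  init:
    callbacks:
      # EEH: hellaswag and winogrande are multiple-choice; gsm8k is the
      # single generative task allowed in this callback.
      - EleutherEvalHarness:
          eeh_args:
            tasks: hellaswag,gsm8k,winogrande
      # BCEH: mbpp is generative; recall that final metrics require the
      # separate code execution and evaluation flow.
      - BigCodeEvalHarness:
          bigcode_args:
            tasks: mbpp
```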
Start Pre-Training
Once you have a fully configured Trainer with your choice of downstream validation, all you need to do now is kick off the run and start pre-training.
- YAML: Let’s assume the YAML configuration you put together above is written to a file called `./pretrain_downstream_llama_8b.yaml`. Then, to run pre-training using the training script that comes packaged as part of Model Zoo, run the command below on the command line.
- Python: Let’s assume the Python code you put together above is written to a file called `./pretrain_downstream_llama_8b.py`. Then, to run pre-training, all there is to do is execute that Python script.