Set Up the Environment
To set up your environment for Model Zoo, follow the instructions in our getting started guide.
Running an Existing Model
To run an existing model from the Cerebras Model Zoo, follow these steps:
1. Use the CLI to query the registry and display the supported models.
2. Find the model’s YAML configuration files in the configs directory within the model’s folder. For example, GPT-2’s YAML files are in the configs directory inside the GPT-2 model folder.
3. Run the model with the run.py script, supplying the appropriate YAML file as an argument.
Editing Configurations
To modify how a run is configured for a specific Model Zoo model, first clone the Model Zoo repository so that you have write access to the YAML files. All reference configuration files in Model Zoo are located in the configs/ directory.
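As a minimal sketch of how you might find the reference configurations in a cloned copy of the repository, the helper below lists the YAML files in a model folder's configs/ directory. The function name list_config_files is hypothetical and not part of Model Zoo, and the sketch assumes only that each model folder contains a configs/ subdirectory, as described above.

```python
from pathlib import Path


def list_config_files(model_dir):
    """Return the YAML files under <model_dir>/configs, sorted by path.

    Hypothetical helper for illustration; not part of the Model Zoo API.
    """
    configs_dir = Path(model_dir) / "configs"
    if not configs_dir.is_dir():
        raise FileNotFoundError(f"No configs directory under {model_dir}")
    # Accept both common YAML file extensions.
    return sorted(
        path
        for path in configs_dir.iterdir()
        if path.is_file() and path.suffix in {".yaml", ".yml"}
    )
```

Calling list_config_files with the path to a model folder in your clone returns the candidate files to edit or pass to a run.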
Querying Additional Components
To see which data processors are provided in the reference examples, you can use the CLI.
Use Config Classes with an Existing Model
Every model in the Model Zoo has a corresponding Config class. When a Config class is associated with a model, the configuration is validated automatically in the backend, with no additional action required from you. For more detail on Config classes, see Model Zoo Config Classes.
Register a Component
To register a custom data processor (also known as a dataloader) or model, see Model Zoo Registry.
Evaluating Your Model
To evaluate your model during and after training, follow these guides:
Evaluating During Training
To assess your model’s performance during training, see the run-model/eval guide in the Cerebras Developer Documentation. It covers the steps and settings required to evaluate your model during training on the Wafer-Scale Cluster (WSC).
Using EleutherAI’s Evaluation Harness
If you’re working with Large Language Models (LLMs) in the Model Zoo, you can use EleutherAI’s Evaluation Harness (EEH) for a more in-depth evaluation. Our guide on downstream validation using EEH explains how to prepare your data and set up EEH for evaluating LLMs. EEH provides a structured way to assess model performance across a range of benchmarks and tasks, which can inform decisions about further model refinement and deployment.
Adding a New Dataset
To add a new dataset for your model in the Cerebras Model Zoo, specify it correctly in the model’s configuration and, for certain models such as language models, convert it into the appropriate format.
Specifying the Dataset in the YAML Configuration
1. Locate the YAML file: Identify the YAML configuration file associated with your model.
2. Update data_dir: Within the YAML file, under the train_input or eval_input section (depending on whether the dataset is for training or evaluation), specify the path to your dataset using the data_dir entry.
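The update in step 2 can be sketched in Python, assuming the YAML file has already been parsed into a nested dictionary (for example, with a YAML library of your choice). The helper name set_data_dir is hypothetical and not part of Model Zoo; only the train_input, eval_input, and data_dir keys come from the steps above.

```python
import copy

# The two sections named in the steps above.
VALID_SECTIONS = ("train_input", "eval_input")


def set_data_dir(config, section, data_dir):
    """Return a copy of config with config[section]["data_dir"] set.

    Hypothetical helper for illustration; not part of the Model Zoo API.
    Equivalent YAML shape after the update:

        train_input:
            data_dir: <path to your dataset>
    """
    if section not in VALID_SECTIONS:
        raise ValueError(f"section must be one of {VALID_SECTIONS}, got {section!r}")
    # Work on a deep copy so the caller's original config is left untouched.
    updated = copy.deepcopy(config)
    # Create the section if it is missing, then set the dataset path.
    updated.setdefault(section, {})["data_dir"] = str(data_dir)
    return updated
```

Returning a modified copy rather than editing in place keeps the reference configuration available for comparison; other keys in the section are preserved unchanged.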