Model Description

GPT-3 is a decoder-only transformer language model architecture designed for large-scale autoregressive pretraining. It closely follows GPT-2 but scales to far more parameters (up to 175B), and it alternates dense attention layers with locally banded sparse attention layers to reduce attention compute. This implementation, however, uses GPT-2-style dense attention in every layer.
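To make the dense versus banded distinction concrete, here is a minimal sketch of the two causal attention masks and an alternating per-layer schedule. The band width (`window`) and the choice of odd-indexed layers as the sparse ones are illustrative assumptions, not values taken from this implementation, which uses the dense mask everywhere (`sparse=False`).

```python
from typing import List

Mask = List[List[bool]]


def dense_causal_mask(seq_len: int) -> Mask:
    """True where query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def banded_causal_mask(seq_len: int, window: int) -> Mask:
    """Causal mask limited to the `window` most recent positions, including i itself."""
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]


def layer_masks(num_layers: int, seq_len: int, window: int, sparse: bool) -> List[Mask]:
    """Per-layer masks. With sparse=True, odd-indexed layers use the banded mask
    (an assumed alternation pattern); with sparse=False every layer is dense."""
    masks = []
    for layer in range(num_layers):
        if sparse and layer % 2 == 1:
            masks.append(banded_causal_mask(seq_len, window))
        else:
            masks.append(dense_causal_mask(seq_len))
    return masks
```

For a 4-token sequence the dense mask allows 10 query/key pairs, while a banded mask with `window=2` allows 7, which is where the compute saving comes from as sequences grow.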

The model is trained with a next-token prediction objective on large text corpora such as The PILE. Inputs are token sequences padded to a fixed maximum sequence length, with an attention mask marking which positions are padding.
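The training objective can be sketched as a cross-entropy loss averaged only over non-padded positions. This is a plain-Python illustration for a single sequence, assuming `labels[i]` already holds the token that should follow position `i` and `mask[i]` is 1 for real tokens and 0 for padding; it is not the loss code used by this implementation.

```python
import math
from typing import List, Sequence


def log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one vocabulary row."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def masked_lm_loss(
    logits: Sequence[Sequence[float]],  # [seq_len][vocab_size]
    labels: Sequence[int],              # [seq_len] next-token targets
    mask: Sequence[int],                # [seq_len] 1 = valid, 0 = padding
) -> float:
    """Mean negative log-likelihood over positions where mask == 1."""
    total, count = 0.0, 0
    for row, label, m in zip(logits, labels, mask):
        if m:
            total -= log_softmax(row)[label]
            count += 1
    # Guard against an all-padding sequence.
    return total / max(count, 1)
```

Padded positions contribute nothing, so the loss does not depend on whatever values the padding tokens happen to hold.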

Code Structure

The code for this model is located in the gpt3 directory within ModelZoo. Here’s how it’s organized:

  • configs/: Contains YAML configuration files for various GPT-3-sized models.
  • run.py: Training and evaluation entry point. Accepts CLI arguments for mode, config path, checkpointing, and output directories.

Our implementation of GPT-3 is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.

Available Configurations

The 1.3b (xl), 2.7b, 6.7b, and 13b configs show an example of setting the micro batch size explicitly in the train_input section of the config. Without this setting, the best micro batch size is searched for automatically during compilation, which can take a long time for larger models.
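Conceptually, a micro batch size splits each batch into smaller chunks that are processed one after another, with their contributions accumulated into a single update. The sketch below shows only that idea in plain Python; it does not model how the Cerebras compiler actually schedules micro-batches, and the requirement that the micro batch size divide the batch size evenly is an assumption of this sketch.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def split_into_micro_batches(batch: Sequence[T], micro_batch_size: int) -> List[List[T]]:
    """Split a batch into equal-sized micro-batches (assumed to divide evenly)."""
    if micro_batch_size <= 0 or len(batch) % micro_batch_size != 0:
        raise ValueError("micro_batch_size must evenly divide the batch size")
    return [list(batch[i:i + micro_batch_size]) for i in range(0, len(batch), micro_batch_size)]


def accumulated_mean_loss(
    batch: Sequence[T], micro_batch_size: int, per_sample_loss: Callable[[T], float]
) -> float:
    """Accumulate per-sample losses micro-batch by micro-batch, then average over
    the full batch, giving the same result as processing the batch at once."""
    total = 0.0
    for micro_batch in split_into_micro_batches(batch, micro_batch_size):
        total += sum(per_sample_loss(x) for x in micro_batch)
    return total / len(batch)
```

Because the accumulated result matches the full-batch result, the micro batch size mainly trades memory per step against the number of sequential steps, which is why the search over candidate sizes matters for large models.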

Model Input Tensor Specifications

Input Name      Shape                               Data Type     Description
input_ids       (batch_size, max_sequence_length)   torch.int32   Token IDs, padded to the full sequence length.
attention_mask  (batch_size, max_sequence_length)   torch.int32   1 for valid tokens, 0 for padding.
labels          (batch_size, max_sequence_length)   torch.int32   Language modeling targets: input_ids shifted left by one position (the next token at each position), padded to the full sequence length.

These tensors are produced by GptHDF5DataProcessor.py, which reads the .h5 files that preprocess_data.py creates from PILE-formatted datasets.
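The layout of one sample can be illustrated with a small helper. This is a hypothetical sketch, not code from GptHDF5DataProcessor.py or preprocess_data.py: `make_lm_sample` and `PAD_ID` are illustrative names, and it assumes labels are the inputs shifted by one token and that padding uses a single pad ID.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # hypothetical padding token ID; the real value depends on the tokenizer


def make_lm_sample(
    token_ids: Sequence[int], max_sequence_length: int, pad_id: int = PAD_ID
) -> Dict[str, List[int]]:
    """Build input_ids / attention_mask / labels for one sequence."""
    # Keep max_sequence_length + 1 tokens so both the inputs and the
    # shifted labels can fill the full sequence length.
    tokens = list(token_ids[: max_sequence_length + 1])
    input_ids = tokens[:-1]
    labels = tokens[1:]  # next token at each input position
    n_valid = len(input_ids)
    n_pad = max_sequence_length - n_valid
    return {
        "input_ids": input_ids + [pad_id] * n_pad,
        "attention_mask": [1] * n_valid + [0] * n_pad,
        "labels": labels + [pad_id] * n_pad,
    }
```

In a PyTorch pipeline these lists would then be stacked across the batch and converted to torch.int32 tensors of shape (batch_size, max_sequence_length).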

Workflow

For example workflows using language models from the Cerebras ModelZoo, see our tutorials on pretraining and fine-tuning.

For a complete list of Cerebras ModelZoo CLI commands, see the command reference.

Advanced Features

This implementation supports:

References