Model Description

The Falcon series consists of causal decoder-only transformer models with 7B, 40B, and 180B parameters, developed by the Technology Innovation Institute (TII). The models follow a GPT-style architecture with modifications aimed at efficient scaling and inference throughput:

  • Parallel attention and MLP layers within transformer blocks (see the sketch after this list).
  • Rotary positional embeddings (RoPE) and multigroup attention (a generalization of multiquery attention) for faster inference and better tensor parallelism.
  • GELU activations, no dropout, and z-loss regularization for stable training.
  • Context length of 2,048 tokens and a 65K vocabulary.
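To make the parallel formulation concrete, here is a minimal PyTorch sketch of a block in which the attention and MLP branches read the same normalized input and are summed into a single residual. It is illustrative only: the names (ParallelFalconBlock, d_model, n_heads, mlp_ratio) are hypothetical, and standard multi-head attention stands in for Falcon's RoPE plus multigroup attention to keep the example short; see model.py for the actual implementation.

```python
# Illustrative sketch only -- not the ModelZoo implementation. Standard
# multi-head attention stands in for Falcon's RoPE + multigroup attention.
import torch
import torch.nn as nn


class ParallelFalconBlock(nn.Module):
    def __init__(self, d_model, n_heads, mlp_ratio=4):
        super().__init__()
        # One layer norm feeds both branches here; some Falcon variants
        # normalize each branch separately.
        self.ln = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, mlp_ratio * d_model),
            nn.GELU(),  # GELU activation, no dropout
            nn.Linear(mlp_ratio * d_model, d_model),
        )

    def forward(self, x, attn_mask=None):
        h = self.ln(x)
        # Attention and MLP run on the same normalized input and their outputs
        # are summed into one residual, rather than being applied sequentially.
        attn_out, _ = self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)
        return x + attn_out + self.mlp(h)


# Quick shape check with a causal mask on random data.
block = ParallelFalconBlock(d_model=128, n_heads=8)
tokens = torch.randn(2, 16, 128)  # (batch, sequence, hidden)
causal_mask = nn.Transformer.generate_square_subsequent_mask(16)
print(block(tokens, attn_mask=causal_mask).shape)  # torch.Size([2, 16, 128])
```

The RoPE and multigroup attention pieces change how the attention scores are computed, but not this block-level wiring, which is what enables the single fused residual add per layer.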

Code Structure

The code for this model is located in the /falcon directory within ModelZoo. Here’s how it’s organized:

  • /configs: Contains YAML configuration files.
  • model.py: The implementation of the Falcon model.

Our implementation of Falcon is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.

Available Configurations

Configuration            Description
params_falcon_7b.yaml    Falcon model with 7B parameters.
params_falcon_40b.yaml   Falcon model with 40B parameters.
params_falcon_180b.yaml  Falcon model with 180B parameters.

Workflow

For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.

For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
