LLaMA

Model Description

The LLaMA family is a series of decoder-only transformer models designed for efficient, high-performance language modeling. Architecturally similar to GPT-2, the original LLaMA model uses RMSNorm instead of LayerNorm, SwiGLU activations, and rotary positional embeddings. LLaMA-2 improves on this with a larger training corpus, doubled context length, and grouped-query attention in its largest model. Code LLaMA specializes in programming tasks through continued pretraining on code-heavy data. LLaMA-3 introduces a more efficient 128K-token tokenizer, expands context to 8K tokens, and adopts grouped-query attention across all sizes. These models excel at text generation, summarization, reasoning, coding, and instruction following.

Code Structure

The code for this model is located in the /llama directory within ModelZoo. Here’s how it’s organized:

/configs: Contains YAML configuration files.
model.py: The implementation of the the LLaMA model.

Our implementation of LLaMA is built on top of our GPT-2 implementation. For more details, see gpt2_model.py.

Available Configurations

LLaMa 3

Configuration	Description
`params_llama3p1_70b_msl_128k.yaml`	A 70B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.1 blog.
`params_llama3p1_70b_msl_8k.yaml`	A 70B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.1 blog.
`params_llama3p1_8b_msl_128k.yaml`	A 8B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.1 blog.
`params_llama3p1_8b_msl_32k_swa_8k_sink_512.yaml`	A 8B parameter model with a maximum sequence length of 32K, SWA starting at 8K, and sink tokens set to 512. Configured as described in the LLaMa 3.1 blog.
`params_llama3p1_8b_msl_8k.yaml`	A 8B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.1 blog.

Configuration	Description
`params_llama3p1_70b_msl_128k.yaml`	A 70B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.1 blog.
`params_llama3p1_70b_msl_8k.yaml`	A 70B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.1 blog.
`params_llama3p1_8b_msl_128k.yaml`	A 8B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.1 blog.
`params_llama3p1_8b_msl_32k_swa_8k_sink_512.yaml`	A 8B parameter model with a maximum sequence length of 32K, SWA starting at 8K, and sink tokens set to 512. Configured as described in the LLaMa 3.1 blog.
`params_llama3p1_8b_msl_8k.yaml`	A 8B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.1 blog.

Configuration	Description
`params_llama3p2_1b_msl_8k.yaml`	A 1B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.2 blog.
`params_llama3p2_3b_msl_8k.yaml`	A 3B parameter model with a maximum sequence length of 8K, configured as described in the LLaMa 3.2 blog.

Configuration	Description
`params_llama3p3_70b_msl_128k.yaml`	A 70B parameter model with a maximum sequence length of 128K, configured as described in the LLaMa 3.3 model card.

LLaMa-2

Configuration	Description
`params_llama2_7b.yaml`	A 7B parameter model configured as described in the LLaMa-2 paper.
`params_llama2_13b.yaml`	A 13B parameter model configured as described in the LLaMa-2 paper.
`params_llama2_70b.yaml`	A 70B parameter model configured as described in the LLaMa-2 paper.

Code LLaMa

Configuration	Description
`params_code_llama_7b.yaml`	A 7B parameter model configured as described in the Code LLaMa paper.
`params_code_llama_70b.yaml`	A 70B parameter model configured as described in the Code LLaMa paper.

WizardLM

Configuration	Description
`params_wizardlm_13b.yaml`	A 13B parameter model configured as described in the WizardLM paper.

All configs are meant to be run on Weight Streaming mode using Appliance mode and Kubernetes flow.

Workflow

For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.

For a complete list of Cerebras ModelZoo CLI commands, see the command reference.

References

Radford, A. et al. (2019). Language Models are Unsupervised Multitask Learners.
Touvron, Hugo, et al. (2023). Llama: Open and efficient foundation language models
Touvron, Hugo, et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models
Rozière, Baptiste, et al. (2023). Code Llama: Open Foundation Models for Code
Meta AI (2024). Build the future of AI with Meta Llama 3

Get Started

Setup and Installation

Models

Data Preparation

Model Configuration

Training and Eval

Configure and Run Jobs

Monitoring and Troubleshooting

Convert and Port

Advanced Usage

Model Description

Code Structure

Available Configurations

Workflow

References

Get Started

Setup and Installation

Models

Data Preparation

Model Configuration

Training and Eval

Configure and Run Jobs

Monitoring and Troubleshooting

Convert and Port

Advanced Usage

​Model Description

​Code Structure

​Available Configurations

​Workflow

​References

Model Description

Code Structure

Available Configurations

Workflow

References