LLaMA
Series of decoder-only transformer LLMs from Meta.
Model Description
The LLaMA family is a series of decoder-only transformer models designed for efficient, high-performance language modeling. Architecturally similar to GPT-2, the original LLaMA model uses RMSNorm instead of LayerNorm, SwiGLU activations in place of GELU, and rotary positional embeddings instead of learned absolute position embeddings. LLaMA-2 improves on this with a larger training corpus, a doubled context length of 4K tokens, and grouped-query attention in its largest model. Code LLaMA specializes in programming tasks through continued pretraining on code-heavy data. LLaMA-3 introduces a more efficient tokenizer with a 128K-token vocabulary, expands the context length to 8K tokens, and adopts grouped-query attention across all model sizes. These models excel at text generation, summarization, reasoning, coding, and instruction following.
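To make these components concrete, the sketch below shows minimal PyTorch versions of RMSNorm, a SwiGLU feed-forward block, and rotary position embeddings. It is an illustrative stand-alone example under our own naming, not the ModelZoo implementation.

```python
# Illustrative LLaMA-style building blocks (not the ModelZoo code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """RMSNorm: rescale by the root-mean-square of the activations with a
    learned gain; unlike LayerNorm there is no mean subtraction and no bias."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class SwiGLUFeedForward(nn.Module):
    """SwiGLU MLP: a SiLU-gated projection multiplied elementwise with a
    second projection, then projected back down to the model dimension."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w_gate = nn.Linear(dim, hidden_dim, bias=False)
        self.w_up = nn.Linear(dim, hidden_dim, bias=False)
        self.w_down = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))


def apply_rotary_embeddings(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """Rotary position embeddings: rotate channel pairs of queries/keys by a
    position-dependent angle. x has shape (batch, seq, heads, head_dim)."""
    _, seq_len, _, head_dim = x.shape
    half = head_dim // 2
    freqs = 1.0 / (base ** (torch.arange(half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
    cos, sin = angles.cos()[None, :, None, :], angles.sin()[None, :, None, :]
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)


if __name__ == "__main__":
    x = torch.randn(2, 16, 512)                              # (batch, seq, hidden)
    x = x + SwiGLUFeedForward(512, 1376)(RMSNorm(512)(x))    # pre-norm residual MLP
    q = apply_rotary_embeddings(torch.randn(2, 16, 8, 64))   # (batch, seq, heads, head_dim)
    print(x.shape, q.shape)
```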
Code Structure
The code for this model is located in the /llama directory within ModelZoo. Here’s how it’s organized:
Our implementation of LLaMA is built on top of our GPT-2 implementation. For more details, see gpt2_model.py.
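Since the implementation extends GPT-2, the main attention-level change described above is grouped-query attention (GQA), where several query heads share each key/value head to shrink the KV cache. The sketch below is a minimal stand-alone illustration of the idea, not the ModelZoo or gpt2_model.py code; all names are our own, and it assumes PyTorch 2.x for scaled_dot_product_attention.

```python
# Illustrative grouped-query attention sketch (not the ModelZoo code).
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, n_q_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q_heads, self.n_kv_heads = n_q_heads, n_kv_heads
        self.head_dim = dim // n_q_heads
        self.q_proj = nn.Linear(dim, n_q_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_q_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_q_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so it is shared by a group of query heads.
        group = self.n_q_heads // self.n_kv_heads
        k = k.repeat_interleave(group, dim=1)
        v = v.repeat_interleave(group, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))


if __name__ == "__main__":
    attn = GroupedQueryAttention(dim=512, n_q_heads=8, n_kv_heads=2)
    y = attn(torch.randn(2, 16, 512))
    print(y.shape)  # torch.Size([2, 16, 512])
```

With 8 query heads sharing 2 key/value heads, the KV projections and KV cache are a quarter of the multi-head-attention size, which is the main motivation for GQA at large model scales.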
Available Configurations
All configs are meant to be run in Weight Streaming mode using Appliance mode and the Kubernetes flow.
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Radford, A., et al. (2019). Language Models are Unsupervised Multitask Learners.
- Touvron, H., et al. (2023). LLaMA: Open and Efficient Foundation Language Models.
- Touvron, H., et al. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models.
- Rozière, B., et al. (2023). Code Llama: Open Foundation Models for Code.
- Meta AI (2024). Build the future of AI with Meta Llama 3.