BLOOM
Multilingual decoder-only language model with ALiBi positional embeddings, designed to generalize across 46 natural and 13 programming languages.
Model Description
BLOOM is a decoder-only Transformer-based language model developed by the BigScience project. It supports multilingual training across 46 natural languages and 13 programming languages, with model sizes ranging up to 176B parameters.
Architecturally, BLOOM resembles GPT-2 but introduces two important differences:
- Tokenizer: BLOOM uses a tokenizer and vocabulary specifically designed for multilingual generalization, consisting of ~250K tokens.
- Position Embeddings: Instead of learnable absolute position embeddings (as in GPT-2), BLOOM uses ALiBi (Attention with Linear Biases), which allows extrapolation to longer sequence lengths and introduces a recency bias in attention computation.
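To make the ALiBi mechanism concrete, here is a minimal, self-contained PyTorch sketch of how per-head linear biases can be built and added to the attention scores before the softmax. This is an illustration of the technique from the ALiBi paper, not the Model Zoo implementation; the function names (`alibi_slopes`, `alibi_bias`) and the head/sequence sizes are assumptions chosen for the example.

```python
import math
import torch


def alibi_slopes(num_heads: int) -> torch.Tensor:
    """Per-head slopes from the ALiBi paper: a geometric sequence
    starting at 2**(-8/num_heads). Assumes num_heads is a power of two."""
    start = 2 ** (-8.0 / num_heads)
    return torch.tensor([start ** (i + 1) for i in range(num_heads)])


def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    """Additive attention bias of shape (num_heads, seq_len, seq_len):
    each query position penalizes keys linearly with their distance."""
    pos = torch.arange(seq_len)
    # distance[i, j] = j - i, which is non-positive for the causally visible keys
    distance = pos[None, :] - pos[:, None]
    slopes = alibi_slopes(num_heads)
    return slopes[:, None, None] * distance[None, :, :]


# Usage: add the bias to the raw attention scores before the softmax.
num_heads, seq_len, head_dim = 4, 8, 16  # arbitrary illustration sizes
q = torch.randn(num_heads, seq_len, head_dim)
k = torch.randn(num_heads, seq_len, head_dim)

scores = q @ k.transpose(-1, -2) / math.sqrt(head_dim)
scores = scores + alibi_bias(num_heads, seq_len)

# Standard causal mask; future positions are removed regardless of the bias.
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(causal_mask, float("-inf"))
attn = scores.softmax(dim=-1)
```

Because the bias depends only on the query-key distance (not on an absolute position index), the same slopes apply unchanged at sequence lengths longer than those seen during training, which is what enables the extrapolation behavior described above.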
Code Structure
The code for this model is implemented under the `bloom` directory.

- `configs/`: Contains YAML configuration files for various BLOOM variants.
- `model.py`: Shared model code reused from GPT-2, modified to support ALiBi embeddings.

Our implementation of BLOOM is built on top of our GPT-2 backbone. For more details, see `gpt2_model.py`.
Available Configurations
| Configuration | Description |
| --- | --- |
| `params_bloom_7b.yaml` | BLOOM-7B model with `hidden_size=4096`, `num_hidden_layers=30`, and `num_heads=32`. Uses ALiBi position embeddings. |
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
Implementation Notes
The core model logic reuses components from the GPT-2 implementation, but ALiBi position embeddings enable better extrapolation to longer sequence lengths during inference. To enable this in your config file:

- Set `model.position_embedding_type: alibi`
- Optionally, set `model.alibi_trainable_slopes: false` (recommended, based on the ALiBi paper's findings)
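As a quick illustration of where these keys sit, the following self-contained Python sketch parses a hypothetical config fragment with PyYAML and checks the two ALiBi-related settings. The fragment is not the full `params_bloom_7b.yaml`; the surrounding values are taken from the configuration table above for illustration only.

```python
import yaml  # PyYAML, assumed to be installed

# Hypothetical fragment illustrating the two ALiBi-related keys described above.
config_text = """
model:
  hidden_size: 4096
  num_hidden_layers: 30
  num_heads: 32
  position_embedding_type: alibi
  alibi_trainable_slopes: false
"""

config = yaml.safe_load(config_text)
model_cfg = config["model"]

# Fixed (non-trainable) slopes are the setting recommended above.
assert model_cfg["position_embedding_type"] == "alibi"
assert model_cfg["alibi_trainable_slopes"] is False
print(model_cfg)
```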