SantaCoder
Decoder-only language model for code generation by BigCode, trained on Java, JavaScript, and Python with support for fill-in-the-middle and multi-query attention.
Model Description
SantaCoder is a 1.1B parameter decoder-only transformer model developed by the BigCode community. It is trained on Java, JavaScript, and Python code from The Stack v1.1.
Architecturally, SantaCoder uses Multi-Query Attention (MQA) for efficient inference and supports Fill-in-the-Middle (FIM) generation, allowing the model to complete and infill code based on context. It employs a 49K BPE tokenizer trained on raw bytes and achieves strong performance on MultiPL-E and HumanEval benchmarks despite being significantly smaller than competing models.
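For intuition, here is a minimal sketch of the MQA idea in plain PyTorch (not the ModelZoo implementation): all query heads attend against a single shared key/value head, which shrinks the KV cache and speeds up autoregressive decoding.

```python
import torch
import torch.nn.functional as F

# Toy multi-query attention: 8 query heads share one key/value head.
batch, seq, n_heads, head_dim = 2, 16, 8, 64
q = torch.randn(batch, n_heads, seq, head_dim)  # per-head queries
k = torch.randn(batch, 1, seq, head_dim)        # single shared key head
v = torch.randn(batch, 1, seq, head_dim)        # single shared value head

# The size-1 head dimension of k/v broadcasts across all query heads.
scores = q @ k.transpose(-1, -2) / head_dim**0.5
out = F.softmax(scores, dim=-1) @ v             # (batch, n_heads, seq, head_dim)
```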
SantaCoder is well-suited for code completion, interactive development tools, and multilingual code generation.
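As an illustration of FIM prompting, below is a hedged sketch that uses the public Hugging Face checkpoint `bigcode/santacoder` (an assumption; this is separate from the ModelZoo workflow). The prefix and suffix are wrapped in FIM sentinel tokens and the model generates the missing middle; check the checkpoint's tokenizer for the exact sentinel strings.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/santacoder"  # assumed public checkpoint name
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

# Fill-in-the-middle: provide a prefix and a suffix, ask for the middle.
# Sentinel token names follow the public SantaCoder model card.
prompt = (
    "<fim-prefix>def greet():\n    <fim-suffix>\n    return greeting<fim-middle>"
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=32, pad_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0]))
```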
Code Structure
The code for this model is located in the /santacoder directory within ModelZoo. Our implementation of SantaCoder is built on top of our GPT-2 backbone; for details, see gpt2_model.py.
Available Configurations
| Configuration | Description |
| --- | --- |
| `params_santacoder_1b.yaml` | 1.1B parameter SantaCoder model. |
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Ben Allal, L., et al. (2023). SantaCoder: Don't Reach for the Stars!
- Bavarian, M., et al. (2022). Efficient Training of Language Models to Fill in the Middle.
- Shazeer, N. (2019). Fast Transformer Decoding: One Write-Head Is All You Need.