StarCoder
Decoder-only language models for code generation by BigCode, trained on permissively licensed code with support for fill-in-the-middle and multi-query attention.
Model Description
StarCoder is a family of decoder-only transformer models developed by the BigCode initiative, optimized for high-quality code generation. The flagship model, StarCoder (15.5B), was trained on roughly 1 trillion tokens of source code, with broad coverage of programming languages and particularly strong performance in Python.
Architecturally, StarCoder builds on the transformer decoder backbone with several enhancements: it uses multi-query attention (MQA) for fast inference, supports fill-in-the-middle (FIM) generation, and extends the context length to 8K tokens. Fine-tuned variants target specific use cases, such as SQL generation, instruction following trained on OctoPack-style data, and WizardCoder-style instruction tuning.
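Multi-query attention replaces the per-head key/value projections of standard multi-head attention with a single shared key/value head, which shrinks the KV cache and speeds up autoregressive decoding. The following is a minimal PyTorch sketch of the idea, not the ModelZoo implementation; the class and variable names are illustrative.

```python
import torch
import torch.nn as nn


class MultiQueryAttention(nn.Module):
    """Minimal multi-query attention sketch: all query heads share a single
    key/value head. Illustrative only, not the ModelZoo implementation."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        # One key head and one value head instead of num_heads of each.
        self.k_proj = nn.Linear(hidden_size, self.head_dim)
        self.v_proj = nn.Linear(hidden_size, self.head_dim)
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        # The single key/value head broadcasts across all query heads.
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Causal mask so each position attends only to earlier positions.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        attn = torch.softmax(scores + mask, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```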
These models are well-suited for tasks such as code completion, documentation generation, and interactive programming support.
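Fill-in-the-middle generation is driven by sentinel tokens in the StarCoder tokenizer (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`). The snippet below is a minimal sketch of FIM prompting, assuming the publicly released `bigcode/starcoder` checkpoint loaded through Hugging Face Transformers (access to the gated checkpoint is required); it is independent of the ModelZoo training code.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative FIM prompt: the model is asked to generate the code between
# the given prefix and suffix. Checkpoint name assumes the public release.
checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prefix = "def fibonacci(n):\n    "
suffix = "\n    return result\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```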
Code Structure
The code for this model is located in the /starcoder directory within ModelZoo. Our implementation of StarCoder is built on top of our GPT-2 backbone; for more details, see gpt2_model.py.
Available Configurations
| Configuration | Description |
|---|---|
| params_starcoder_15b.yaml | Base StarCoder model with 15.5B parameters. |
| params_octocoder_15b.yaml | StarCoder variant fine-tuned with OctoPack-style data. |
| params_sqlcoder_15b.yaml | StarCoder variant specialized for SQL generation. |
| params_wizardcoder_15b.yaml | StarCoder variant tuned for instruction-style prompts. |
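Before launching a run, it can be useful to confirm that the chosen YAML parses and matches the variant you expect. A minimal sketch, assuming the configuration files live under `starcoder/configs/` (the path is illustrative):

```python
import yaml  # pip install pyyaml

# Illustrative path; substitute the configuration you intend to run.
config_path = "starcoder/configs/params_starcoder_15b.yaml"
with open(config_path) as f:
    params = yaml.safe_load(f)

# Print the top-level sections to confirm the file parses as expected.
print(list(params.keys()))
```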
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Li, R., et al. (2023). StarCoder: May the Source Be With You!
- Ainslie, J., et al. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints