StarCoder
Decoder-only language models for code generation by BigCode, trained on permissively licensed code with support for fill-in-the-middle and multi-query attention.
Model Description
StarCoder is a family of decoder-only transformer models developed by the BigCode initiative, optimized for high-quality code generation. The flagship model, StarCoder (15.5B), was trained on roughly 1 trillion tokens of source code, with broad coverage of programming languages and particularly strong performance in Python.
Architecturally, StarCoder builds on the transformer decoder backbone with several enhancements: it uses multi-query attention (MQA) for fast inference, supports fill-in-the-middle (FIM) generation, and extends the context length to 8K tokens. Fine-tuned variants target specific use cases, such as SQL generation, instruction following trained on OctoPack-style data, and WizardCoder-style instruction tuning.
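Multi-query attention replaces the per-head key/value projections of standard multi-head attention with a single shared key/value head, which shrinks the KV cache and speeds up autoregressive decoding. The following is a minimal PyTorch sketch of the idea, not the ModelZoo implementation; the class and variable names are illustrative.

```python
import torch
import torch.nn as nn


class MultiQueryAttention(nn.Module):
    """Minimal multi-query attention sketch: all query heads share a single
    key/value head. Illustrative only, not the ModelZoo implementation."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.q_proj = nn.Linear(hidden_size, hidden_size)
        # One key head and one value head instead of num_heads of each.
        self.k_proj = nn.Linear(hidden_size, self.head_dim)
        self.v_proj = nn.Linear(hidden_size, self.head_dim)
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, 1, self.head_dim).transpose(1, 2)
        # The single key/value head broadcasts across all query heads.
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        # Causal mask so each position attends only to earlier positions.
        mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        attn = torch.softmax(scores + mask, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.out_proj(out)
```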
These models are well-suited for tasks such as code completion, documentation generation, and interactive programming support.
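Fill-in-the-middle generation is driven by sentinel tokens in the StarCoder tokenizer (`<fim_prefix>`, `<fim_suffix>`, `<fim_middle>`). The snippet below is a minimal sketch of FIM prompting, assuming the publicly released `bigcode/starcoder` checkpoint loaded through Hugging Face Transformers (access to the gated checkpoint is required); it is independent of the ModelZoo training code.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Illustrative FIM prompt: the model is asked to generate the code between
# the given prefix and suffix. Checkpoint name assumes the public release.
checkpoint = "bigcode/starcoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint)

prefix = "def fibonacci(n):\n    "
suffix = "\n    return result\n"
prompt = f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```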
Code Structure
The code for this model is located in the /starcoder directory within ModelZoo. Our implementation of StarCoder is built on top of our GPT-2 backbone; for more details, see gpt2_model.py.
Available Configurations
| Configuration | Description |
|---|---|
| params_starcoder_15b.yaml | Base StarCoder model with 15.5B parameters. |
| params_octocoder_15b.yaml | StarCoder variant fine-tuned with OctoPack-style data. |
| params_sqlcoder_15b.yaml | StarCoder variant specialized for SQL generation. |
| params_wizardcoder_15b.yaml | StarCoder variant tuned for instruction-style prompts. |
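Before launching a run, it can be useful to confirm that the chosen YAML parses and matches the variant you expect. A minimal sketch, assuming the configuration files live under `starcoder/configs/` (the path is illustrative):

```python
import yaml  # pip install pyyaml

# Illustrative path; substitute the configuration you intend to run.
config_path = "starcoder/configs/params_starcoder_15b.yaml"
with open(config_path) as f:
    params = yaml.safe_load(f)

# Print the top-level sections to confirm the file parses as expected.
print(list(params.keys()))
```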
Workflow
For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.
For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
References
- Li, R., et al. (2023). StarCoder: May the Source Be With You!
- Ainslie, J., et al. (2023). GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints