Model Description

SantaCoder is a 1.1B parameter decoder-only transformer model developed by the BigCode community. It is trained on Java, JavaScript, and Python code from The Stack v1.1.

Architecturally, SantaCoder uses Multi-Query Attention (MQA) for efficient inference and supports Fill-in-the-Middle (FIM) generation, allowing the model to complete and infill code based on context. It employs a 49K BPE tokenizer trained on raw bytes and achieves strong performance on MultiPL-E and HumanEval benchmarks despite being significantly smaller than competing models.
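As an illustration of how FIM prompting typically works, the sketch below queries the publicly released bigcode/santacoder checkpoint through Hugging Face transformers. The sentinel token names (<fim-prefix>, <fim-suffix>, <fim-middle>) follow that public release and are assumptions here; this is not part of the ModelZoo implementation.

```python
# Minimal Fill-in-the-Middle sketch against the public SantaCoder checkpoint.
# Assumes the BigCode sentinel tokens <fim-prefix>, <fim-suffix>, <fim-middle>.
from transformers import AutoModelForCausalLM, AutoTokenizer

checkpoint = "bigcode/santacoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# The public checkpoint ships custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

# The model is shown the prefix and suffix up front and generates the middle.
prefix = "def fibonacci(n):\n    "
suffix = "\n    return a\n"
prompt = f"<fim-prefix>{prefix}<fim-suffix>{suffix}<fim-middle>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
# Everything generated after <fim-middle> is the infilled span.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```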

SantaCoder is well-suited for code completion, interactive development tools, and multilingual code generation.

Code Structure

The code for this model is located in the /santacoder directory within ModelZoo. Here’s how it’s organized:

  • /configs: Contains YAML configuration files.
  • model.py: The implementation of the SantaCoder model.

Our implementation of SantaCoder is built on top of our GPT-2 backbone. For more details, see gpt2_model.py.
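To clarify what MQA changes relative to the standard multi-head attention in a GPT-2-style block, here is a minimal PyTorch sketch: each attention head keeps its own query projection, but a single key/value head is shared across all query heads. This is illustrative only and is not the gpt2_model.py implementation.

```python
# Illustrative Multi-Query Attention: per-head queries, one shared K/V head.
import math
import torch
import torch.nn as nn

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)           # per-head queries
        self.kv_proj = nn.Linear(d_model, 2 * self.d_head)  # single shared K/V head
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k, v = self.kv_proj(x).split(self.d_head, dim=-1)   # (b, t, d_head) each
        # The shared K/V head is broadcast across all query heads via batched matmul.
        scores = q @ k.unsqueeze(1).transpose(-2, -1) / math.sqrt(self.d_head)
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(causal, float("-inf"))
        out = scores.softmax(dim=-1) @ v.unsqueeze(1)        # (b, n_heads, t, d_head)
        return self.out_proj(out.transpose(1, 2).reshape(b, t, -1))

x = torch.randn(2, 16, 256)
print(MultiQueryAttention(256, 8)(x).shape)  # torch.Size([2, 16, 256])
```

Because keys and values are stored once per layer instead of once per head, the decoding cache shrinks by roughly the number of heads, which is the main inference benefit of MQA.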

Available Configurations

Configuration                 Description
params_santacoder_1b.yaml     1.1B parameter SantaCoder model.

Workflow

For example workflows using language models from the Cerebras Model Zoo, see our tutorials on pretraining and fine-tuning.

For a complete list of Cerebras ModelZoo CLI commands, see the command reference.
