Implementation of the original Transformer architecture introduced in "Attention Is All You Need". The code lives in the `transformer` directory and reuses shared infrastructure where possible, especially components from the T5 implementation.
- `configs/`: YAML configuration files for various Transformer model sizes and training setups.
- `data_preparation/nlp/transformer/`: Scripts for preprocessing the WMT16 English–German dataset.

| Configuration | Description |
| --- | --- |
| `transformer_base.yaml` | Base Transformer model with `d_kv=64`, `num_heads=8`, and `encoder_num_hidden_layers=6`. |
| `transformer_large.yaml` | Large Transformer model with `d_kv=64`, `num_heads=16`, and `encoder_num_hidden_layers=6`. |
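For orientation, a minimal sketch of what the model section of `transformer_base.yaml` might look like is shown below. Only the three parameters from the table above come from this document; the surrounding structure and key names are assumptions, not verbatim contents of the repository's config files.

```yaml
# Hypothetical sketch of transformer_base.yaml.
# Only d_kv, num_heads, and encoder_num_hidden_layers are taken from
# the table above; the enclosing layout is an illustrative assumption.
model:
  d_kv: 64                      # Dimension of each attention head's key/value projections
  num_heads: 8                  # Attention heads per layer (16 in transformer_large.yaml)
  encoder_num_hidden_layers: 6  # Encoder depth, matching the original paper
```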