The Model Zoo registry is the central source of truth for all model definitions and their associated data processors. It is a YAML file that records the path to each model implementation together with the paths of its compatible data processors, so that models are referenced consistently across the ecosystem.

The registry makes it easier to find which models are available, locate their implementations, and identify compatible data processors. For models such as BERT or GPT, it gives direct paths to both the implementation and its data processors, which speeds up setup and deployment.

The registry file is located in the Model Zoo repository.

The registry YAML file must be updated whenever models are added, moved, renamed, or removed from the Model Zoo to maintain accuracy and prevent broken references.

Registry Structure

The registry is defined in a YAML file that contains entries for each supported model. Each model entry requires three key components:

  • name: The identifier for the model

  • path: The full import path to the model’s implementation class. Note: the path includes both the module path and the class name.

  • data_processor_paths: A list of compatible data processor implementation paths

- name: name_of_the_model
  path: path.to.module.ModelClass
  data_processor_paths:
  - path.to.module.DataProcessorClass1
  - path.to.module.DataProcessorClass2
  ...
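Because each path combines a module path with a class name, a consumer of the registry can split on the last dot and import the class dynamically. The following is a minimal sketch of that idea, not the Model Zoo's own loader: the `resolve_class` helper and the entry contents are hypothetical, and standard-library classes stand in for real model and data processor classes so the example runs anywhere.

```python
import importlib


def resolve_class(dotted_path: str):
    """Import and return the object named by a 'module.path.ClassName' string."""
    # Everything before the last dot is the module; the rest is the class name.
    module_path, _, class_name = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"Expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Hypothetical entry, shaped like one parsed item of the registry YAML.
# The paths point at standard-library classes purely for illustration.
entry = {
    "name": "example_model",
    "path": "collections.OrderedDict",
    "data_processor_paths": ["json.JSONDecoder", "json.JSONEncoder"],
}

model_cls = resolve_class(entry["path"])
processor_classes = [resolve_class(p) for p in entry["data_processor_paths"]]
```

Splitting with `rpartition` keeps the whole module path intact regardless of how deeply the class is nested in packages.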

When adding new models to the Model Zoo, you must add a corresponding entry to the registry file. Similarly, when renaming, moving, or removing models, update the registry to reflect these changes.
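One way to catch stale entries after such a change is to check that every entry has the three required keys and that every listed path still imports. The sketch below assumes the registry has already been parsed into a list of dictionaries (for example with a YAML parser); the `find_registry_problems` function and the sample entries are hypothetical and are not part of the Model Zoo.

```python
import importlib

REQUIRED_KEYS = ("name", "path", "data_processor_paths")


def find_registry_problems(entries):
    """Return human-readable descriptions of missing keys and unresolvable paths."""
    problems = []
    for index, entry in enumerate(entries):
        label = entry.get("name", f"<entry {index}>")
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            problems.append(f"{label}: missing keys {missing}")
        # Collect the model path and all data processor paths present in the entry.
        paths = [entry["path"]] if "path" in entry else []
        paths += entry.get("data_processor_paths") or []
        for dotted in paths:
            module_path, _, class_name = dotted.rpartition(".")
            try:
                module = importlib.import_module(module_path)
                getattr(module, class_name)
            except (ImportError, AttributeError, ValueError) as exc:
                problems.append(
                    f"{label}: cannot resolve {dotted!r} ({type(exc).__name__})"
                )
    return problems


# Hypothetical entries; standard-library paths stand in for real model classes.
entries = [
    {
        "name": "good_model",
        "path": "collections.OrderedDict",
        "data_processor_paths": ["json.JSONDecoder"],
    },
    {
        "name": "stale_model",
        "path": "collections.NoSuchClass",
        "data_processor_paths": ["no_such_module_xyz.Processor"],
    },
]

problems = find_registry_problems(entries)
for problem in problems:
    print(problem)
```

A check like this could run in continuous integration so that a moved or renamed class is flagged before the broken reference reaches users.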