Self-supervised vision model that learns general-purpose visual features without labeled data and performs well on diverse image-level and pixel-level tasks.
DINOv2 data processing pipeline, from Oquab et al., 2023.
The code for this model is located in the `dino` directory within ModelZoo. Here’s how it’s organized:
- `configs/`: Contains YAML configuration files.
- `scripts/`: Contains scripts for various workflows, including checkpoint conversion and image resizing.
- `model.py`: The implementation of the DINOv2 model.
- `DinoImageDataProcessor.py`: Data processor for DINOv2.
Configuration | Description |
---|---|
params_dinov2_large_224_bs1024.yaml | Config for pretraining, batch size 1024 |
params_dinov2_large_eval_linear.yaml | Config for finetuning with downstream evaluation. |
params_dinov2_large_patch14_img224.yaml | Reference implementation config of DINOv2. |
Input Name | Shape | Data Type | Description |
---|---|---|---|
collated_masks | (batch_size, 2, 256) | torch.bool | Boolean mask indicating which patches are masked during training. |
global_view | (batch_size, 2, 3, 224, 224) | torch.float32 | Global image views (2 views per sample, 3-channel images of size 224x224). |
local_view | (batch_size, 8, 3, 98, 98) | torch.float32 | Local image views (8 views per sample, 3-channel images of size 98x98). |
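
The shapes in the table above can be cross-checked with a little arithmetic. The following is a minimal sketch, not part of ModelZoo: the helper name `dinov2_input_shapes` is hypothetical, and the patch size of 14 is an assumption taken from the `patch14` in the reference config name. Under that assumption, a 224x224 global view splits into 16 x 16 = 256 patches, which matches the last dimension of `collated_masks`.

```python
def dinov2_input_shapes(batch_size, global_size=224, local_size=98,
                        patch_size=14, num_global=2, num_local=8):
    """Return the expected shape of each model input as a dict of tuples.

    Hypothetical helper for illustration only. patch_size=14 is an
    assumption inferred from the config name, not read from the model.
    """
    if global_size % patch_size or local_size % patch_size:
        raise ValueError("image sizes must be multiples of the patch size")

    # One mask entry per patch of a global view: (224 // 14) ** 2 = 256.
    patches_per_side = global_size // patch_size
    num_patches = patches_per_side ** 2

    return {
        "collated_masks": (batch_size, num_global, num_patches),
        "global_view": (batch_size, num_global, 3, global_size, global_size),
        "local_view": (batch_size, num_local, 3, local_size, local_size),
    }


shapes = dinov2_input_shapes(batch_size=4)
for name, shape in shapes.items():
    print(name, shape)
```

Comparing the shapes of a real batch against this dictionary before training is a cheap way to catch a mismatch between the data processor settings and the model config.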
DINOv2 uses the `generic_image_encoders` architecture as its backbone. You can find the model architecture details in its directory.
Prerequisites and Setup
Data Preparation
For details on preparing a dataset for use with `torchvision`, please visit our guide here.
Once completed, your dataset directory should look as follows:
Set the `root` parameter under `dataset` in the model config to point to the desired dataset location.
Running the Model
To change the image size, use `change_image_size.py` (under `scripts/`) to modify the checkpoint and config.
`params_dinov2_large_224_bs1024_cszoov2.yaml`.
To run the script, provide the `input_file_name` and `output_file_name` flags.