The memory requirements of your model are too large to fit on the device. Potential workarounds include:
* On transformer models, compile again with the batch size set to 1 on a single CS-2 system to determine whether the specified maximum sequence length is feasible.
* Try a smaller batch size per device, or enable batch tiling (transformer models only) by setting the micro_batch_size parameter in the train_input or eval_input section of your model's YAML file (see working_with_microbatches).
  * If you already ran with batch tiling at a specific micro_batch_size value, try compiling with a smaller micro_batch_size. The flow described in Using "explore" to Search for a Near-Optimal Microbatch Size can recommend performant microbatch sizes that fit in memory.
* On CNN models, where batch tiling isn't supported, try manually decreasing the batch size and/or the image/volume size.
Note
For more information on working with batch tiling and selecting performant micro_batch_size values, visit working_with_microbatches.
Note
The batch_size parameter set in the YAML configuration is the global batch size. The batch size per CS-2 system is therefore the global batch size divided by the number of CS-2 systems used.
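The arithmetic in the note above can be sketched in a few lines of Python. This is only an illustration: the helper name per_system_batch_size and its arguments are hypothetical and not part of any Cerebras API, and rejecting uneven splits is an assumption of this sketch rather than documented behavior.

```python
def per_system_batch_size(global_batch_size: int, num_systems: int) -> int:
    """Return the batch size each system processes, given the global batch size.

    Hypothetical helper for illustration only; not part of any vendor API.
    """
    if global_batch_size <= 0 or num_systems <= 0:
        raise ValueError("global_batch_size and num_systems must be positive")

    per_system, remainder = divmod(global_batch_size, num_systems)
    if remainder:
        # Assumption of this sketch: only even splits are accepted.
        raise ValueError(
            f"global batch size {global_batch_size} does not divide evenly "
            f"across {num_systems} systems"
        )
    return per_system


if __name__ == "__main__":
    # A global batch_size of 256 spread over 4 systems.
    print(per_system_batch_size(256, 4))
```

When a model runs out of memory, the per-system value is the one to compare against earlier attempts, since adding systems lowers it even when batch_size in the YAML file is unchanged.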