Observed Error
Causes and Possible Solutions
The memory requirements of your model are too large to fit on the device. Potential workarounds include:- On transformer models, please compile again with the batch size set to 1 using one CS-2 system to determine if the specified maximum sequence length is feasible.
-
You can try a smaller batch size per device or enable batch tiling (only on transformer models) by setting the
micro_batch_size
parameter in thetrain_input
oreval_input
section of your model’s yaml file (see working_with_microbatches). * If you ran with batch tiling with a specificmicro_batch_size
value, you can try compiling with a decreasedmicro_batch_size
. The Using “explore” to Search for a Near-Optimal Microbatch Size flow can recommend performant micro batch sizes that will fit in memory. - On CNN models where batch tiling isn’t supported, try manually decreasing the batch size and/or the image/volume size.
NoteFor more information on working with batch tiling and selecting performant
micro_batch_size
values, visit working_with_microbatchesNoteThe
batch_size
parameter set on the yaml configuration is the global batch size. This means that the batch size per CS-2 system is computed as the global batch size divided by the number of CS-2s used.