How much vRAM do you actually need?

April 16, 2024

Inference

Let's start with the memory required for a single parameter, which is 4 bytes at FP32 precision.

1 Parameter (Weight) = 4 Bytes (FP32)

To calculate the memory required for 1 billion parameters, we multiply 4 bytes by one billion, which gives roughly 4 GB.

1 Billion Parameters = 4 Bytes * 10^9 ≈ 4 GB

The following table shows the memory requirements for different model precisions per 1 billion parameters.

|                      | Full Precision (32-bit) | Half Precision (16-bit) | 8-bit |
| -------------------- | ----------------------- | ----------------------- | ----- |
| 1 Billion Parameters | 4 GB                    | 2 GB                    | 1 GB  |

You can now multiply the vRAM figure from the table above by the model's parameter count in billions, according to its precision. The table below shows the minimum memory required to load each model for inference, without accounting for the additional memory consumed by requests hitting the model (such as activations and KV cache).

| Model Name   | Full Precision (32-bit) | Half Precision (16-bit) | 8-bit |
| ------------ | ----------------------- | ----------------------- | ----- |
| Falcon (7B)  | 28 GB                   | 14 GB                   | 7 GB  |
| Llama2 (7B)  | 28 GB                   | 14 GB                   | 7 GB  |
| Jais (13B)   | 52 GB                   | 26 GB                   | 13 GB |
| Jais (30B)   | 120 GB                  | 60 GB                   | 30 GB |
| Falcon (40B) | 160 GB                  | 80 GB                   | 40 GB |
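
As a quick sanity check, here is a minimal Python sketch that reproduces the table above. The model sizes and bytes-per-parameter values are simply the ones listed here, and the estimate covers the weights only, with no serving overhead.

```python
# Bytes used to store one weight at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts in billions, as listed in the table above.
MODELS = {"Falcon-7B": 7, "Llama2-7B": 7, "Jais-13B": 13, "Jais-30B": 30, "Falcon-40B": 40}

def weight_memory_gb(params_billions: float, precision: str) -> float:
    """GB (decimal) needed just to load the weights, with no request overhead."""
    return params_billions * 1e9 * BYTES_PER_PARAM[precision] / 1e9

for name, size in MODELS.items():
    print(name, {p: weight_memory_gb(size, p) for p in BYTES_PER_PARAM})
```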

How much more you need beyond this depends on your system requirements, such as concurrent user queries, caching, and so on. I believe stress testing is required to find out.

Finetuning

To fine-tune a model we need to load all of the following into memory, which amounts to roughly 6x the minimum memory required for inference. The table below shows the memory required for a full-precision model per 1 billion parameters.

| Model Component  | Full Precision Memory |
| ---------------- | --------------------- |
| Model Weights    | 4 GB                  |
| Optimizer States | 8 GB                  |
| Gradients        | 4 GB                  |
| Activations      | 8 GB                  |
| Total            | 24 GB                 |
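
A minimal sketch of the same bookkeeping, assuming the per-component costs from the table above (FP32 weights, Adam-style optimizer states at twice the weight size, FP32 gradients, and a rough activation budget of twice the weight size):

```python
# Per-billion-parameter costs in GB at full precision, matching the table above.
FULL_PRECISION_GB_PER_BILLION = {
    "weights": 4,            # 4 bytes per FP32 weight
    "optimizer_states": 8,   # e.g. Adam momentum + variance, both kept in FP32
    "gradients": 4,          # one FP32 gradient per weight
    "activations": 8,        # rough budget; depends on batch size and sequence length
}

def finetune_memory_gb(params_billions: float) -> float:
    """Rough full-precision fine-tuning footprint: about 6x the inference footprint."""
    return params_billions * sum(FULL_PRECISION_GB_PER_BILLION.values())

print(finetune_memory_gb(1))   # 24 GB per 1 billion parameters
print(finetune_memory_gb(7))   # 168 GB for a 7B model such as Falcon (7B)
```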

However, this makes fine-tuning very large models at full precision infeasible. It is recommended to use lower precision, either half precision or 8-bit, during fine-tuning.

The following table shows the minimum fine-tuning memory requirements for different precision types.

| Model Name   | Full Precision (32-bit) | Half Precision (16-bit) | 8-bit  | 4-bit  |
| ------------ | ----------------------- | ----------------------- | ------ | ------ |
| Falcon (7B)  | 168 GB                  | 84 GB                   | 42 GB  | 21 GB  |
| Llama2 (7B)  | 168 GB                  | 84 GB                   | 42 GB  | 21 GB  |
| Jais (13B)   | 312 GB                  | 156 GB                  | 78 GB  | 39 GB  |
| Jais (30B)   | 720 GB                  | 360 GB                  | 180 GB | 90 GB  |
| Falcon (40B) | 960 GB                  | 480 GB                  | 240 GB | 120 GB |
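
Putting the two ideas together, here is a small sketch that reproduces the fine-tuning table by scaling the 24 GB-per-billion full-precision estimate down by a precision factor. This assumes every component shrinks proportionally with precision, which is a simplification; in practice optimizer states and gradients are often kept at higher precision.

```python
# Full-precision fine-tuning cost per billion parameters (weights + optimizer states
# + gradients + activations), taken from the component table above.
FULL_PRECISION_FINETUNE_GB_PER_BILLION = 24

# How much smaller each precision is relative to FP32.
PRECISION_FACTOR = {"fp32": 1, "fp16": 2, "int8": 4, "int4": 8}

def finetune_memory_gb(params_billions: float, precision: str) -> float:
    """Rough fine-tuning footprint, assuming every component shrinks with precision."""
    return params_billions * FULL_PRECISION_FINETUNE_GB_PER_BILLION / PRECISION_FACTOR[precision]

for name, size in [("Falcon (7B)", 7), ("Jais (13B)", 13), ("Falcon (40B)", 40)]:
    print(name, {p: finetune_memory_gb(size, p) for p in PRECISION_FACTOR})
```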
Thanks for reading.