Model quantization

Goal

In this tutorial you will quantize a standard deep learning model into a format that operates on a lower-bit data representation to achieve better inference performance.

A bit of background

Quantization is the first step of the deployment process. Deep learning models typically operate on floating-point data; to improve inference performance, they are converted to a lower-bit representation during quantization. Vitis AI provides a quantizer tool inside the deployment container. You can still run the quantized model on a standard PC, however, it is optimized for deployment on the edge. Mind that this optimization step is mandatory for deploying the model to the target DPU in later steps.
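
As a rough illustration of what a lower-bit representation means, the minimal sketch below shows symmetric 8-bit quantization of a tensor with NumPy. This is only an illustration, not the exact scheme the Vitis AI quantizer applies:

    import numpy as np

    # Map float32 values onto signed 8-bit integers through a single scale factor,
    # then dequantize to see the approximation error introduced by quantization.
    x = np.array([0.02, -1.3, 0.75, 2.4], dtype=np.float32)
    scale = np.abs(x).max() / 127.0
    x_q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    x_dq = x_q.astype(np.float32) * scale
    print(x_q)   # integer values the low-bit hardware operates on
    print(x_dq)  # dequantized approximation of the original tensor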

Prerequisites

Provided outputs

The following files (Tutorial files) are associated with this tutorial:

  • ML deployment/02 Model quantization/quantization_test_preds.h5 - quantized model outputs (test predictions)

  • ML deployment/02 Model quantization/state_dict.pt - model weights

  • ML deployment/02 Model quantization/Unet_int.xmodel - quantized model

  • ML deployment/02 Model quantization/Unet.py - Python description of the model

Due to their size, the following files are available separately:

  • quantization_samples.h5.xz - calibration data (compressed)

Prepare for deployment Machine learning workstation

  1. The quantization process requires model weights and a subset (100 to 1000 samples) of the training dataset to calibrate the model. To avoid preprocessing the dataset for quantization inside the container, prepare the calibration samples and model weights in advance:

    Open and run the ~/sml-tutorials/ml-deployment/tools/03-quantize/deployment_preparation.ipynb Jupyter Notebook. At the end you should have two files:

    • output/03-quantize/quantization_samples.h5 - calibration data

    • output/03-quantize/state_dict.pt - model weights
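
    For reference, the quantization script later reads one dataset per sample from a "calibration" group (and, for evaluation, a "test" group) in this HDF5 file. Below is a minimal sketch of producing such a file, where sample_tensors is only a placeholder for your already preprocessed 3x512x512 float32 samples:

      import h5py
      import numpy as np

      # Placeholder: replace with real preprocessed samples from the training dataset.
      sample_tensors = {f"sample_{i:04d}": np.random.rand(3, 512, 512).astype(np.float32)
                        for i in range(100)}

      with h5py.File("output/03-quantize/quantization_samples.h5", "w") as f_out:
          for name, tensor in sample_tensors.items():
              # One dataset per sample, read back later as f_in["calibration/<name>"].
              f_out.create_dataset(f"calibration/{name}", data=tensor)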

  2. Enter the Vitis AI deployment container with the working directory volume mounted:

    customer@ml-workstation:~/sml-tutorials/ml-deployment$ docker run \
        -it \
        -v "$(pwd)":/workspace \
        -e UID="$(id -u)" -e GID="$(id -g)" \
        xilinx/vitis-ai-pytorch-cpu:ubuntu2004-3.5.0.306
    

Model quantization Vitis AI deployment container

Run the following commands in the container environment.

  1. Activate the desired conda environment for PyTorch model deployment:

    vitis-ai-user@vitis-ai-container-id:/workspace/tools$ conda activate vitis-ai-wego-torch2
    
  2. Install third-party modules required to run the model inside the container environment:

    (vitis-ai-wego-torch2) vitis-ai-user@vitis-ai-container-id:/workspace/tools$ pip install -r requirements-vitis-ai.txt
    
  3. Quantize the model using the Vitis AI Python libraries. The pytorch_nndct.apis.torch_quantizer function creates a quantizer which operates in one of two modes: "calib" and "test". The first one calibrates the model to work in lower-bit precision. The second one evaluates and exports the quantized model for further deployment.

    Perform quantization by running the following script (remember that the demo model works with 512 by 512 3-channel images and 7 output classes):

    (vitis-ai-wego-torch2) vitis-ai-user@vitis-ai-container-id:/workspace/tools$ python3 ./03-quantize/quantize_model.py \
         --input-size 3 512 512 \
         --num-classes 7 \
         --calib-batch-size 8 \
         --state-dict ../output/03-quantize/state_dict.pt \
         --quantization-samples ../output/03-quantize/quantization_samples.h5 \
         --test-samples ../output/03-quantize/quantization_test_preds.h5 \
         --output-dir ../output/03-quantize/
    

    The quantized model will appear in ~/sml-tutorials/ml-deployment/output/03-quantize/.

    Warning

    Mind that the quantization process is time-consuming.

    Note

    The quantization process includes evaluation of the quantized model. If you wish to effectively skip this step to speed up the process, pass extra flags that limit the number of calibration and test samples:

    (vitis-ai-wego-torch2) vitis-ai-user@vitis-ai-container-id:/workspace/tools$ python3 ./03-quantize/quantize_model.py --calib-samples-limit 1 --test-samples-limit 1 ...
    

    Walk through the quantization script to understand the process:

    1. Quantization first requires loading the model from the state_dict.pt file:

      model = Unet(num_classes=NUM_CLASSES)
      model.load_state_dict(torch.load(state_dict))
      
    2. Use the quantizer in "calib" mode to quantize the model. You have to pass a dummy sample with the proper input shape (in this case [batch_size, 3, 512, 512]) to initialize the quantizer:

      dummy_input = torch.randn(batch_size, *input_shape)
      quantizer = torch_quantizer("calib", model, (dummy_input), output_dir=str(output_dir))
      quant_model = quantizer.quant_model
      
    3. The script performs the calibration by passing the samples from quantization_samples.h5 to the quantized model in a loop:

      with h5py.File(data_h5_path, "r") as f_in:
          sample_names = list(f_in["calibration"].keys())[:samples_num_limit]
          for names_batch in tqdm(batched(sample_names, batch_size)):
              input_batch = torch.stack([torch.as_tensor(f_in[f"calibration/{name}"]) for name in names_batch])
              quant_model(input_batch)
      
    4. After calibration, export the computed quantization configuration using:

      quantizer.export_quant_config()
      
    5. However, Vitis AI requires the model to be serialized before it can undergo compilation. Set up the quantizer in "test" mode to enable model export. The test mode requires a batch size of 1:

      dummy_input = torch.randn(batch_size, *input_shape)
      quantizer = torch_quantizer("test", model, (dummy_input), output_dir=str(output_dir))
      quant_model = quantizer.quant_model
      
    6. The Vitis AI quantizer requires running inference on at least one sample in the test mode before saving the model. The script uses this step to also evaluate the quantized model: it runs the test samples through it and stores the predictions in output/03-quantize/quantization_test_preds.h5:

      with h5py.File(test_data_h5_path, "r") as f_in, h5py.File(test_samples, "w") as f_out:
          sample_names = list(f_in["test"].keys())[:samples_num_limit]
          for sample_name in tqdm(sample_names):
              input_image = torch.as_tensor(f_in[f"test/{sample_name}"])
              input_batch = input_image.unsqueeze(0)
              pred = quant_model(input_batch)
              f_out.create_dataset(sample_name, data=pred.detach())
      
    7. Once the model performs inference in the test mode, the quantizer can export it to the .xmodel format for further deployment:

      quantizer.export_xmodel(str(output_dir))
      
  4. Exit the Vitis AI container: exit.

Evaluate the quantized model metrics Machine learning workstation

  1. The quantization script saves the quantized model outputs in output/03-quantize/quantization_test_preds.h5. Optionally, you can evaluate metrics for these outputs and preview the results by running the ~/sml-tutorials/ml-deployment/tools/03-quantize/calc_quantized_metrics.ipynb notebook. A minimal sketch of loading these predictions is shown after the note below.

    Note

    Evaluation of the quantized model requires output/03-quantize/quantization_test_preds.h5 with all samples, which you might have skipped to save some time.
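
    A minimal sketch of loading the stored predictions, assuming the usual segmentation output layout of [1, num_classes, height, width] per sample; the actual metric computation against your ground-truth masks is left to the notebook:

      import h5py
      import numpy as np

      with h5py.File("output/03-quantize/quantization_test_preds.h5", "r") as f_in:
          for name in f_in:
              logits = np.asarray(f_in[name])       # e.g. [1, 7, 512, 512]
              class_map = logits.argmax(axis=1)[0]  # per-pixel predicted class
              # Compare class_map against the ground-truth mask here (e.g. per-class IoU).
              print(name, class_map.shape, np.bincount(class_map.ravel(), minlength=7))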