Model quantization

Goal

In this tutorial you will quantize a standard deep learning model into a format that operates on a lower-bit data representation to achieve better inference performance.

A bit of background

Quantization is the first step of the deployment process. Deep learning models typically operate on floating-point data; to improve inference performance, they are converted to a lower-bit representation during quantization. Vitis AI provides a quantizer tool inside the deployment container. You can still run the quantized model on a standard PC, however, it is optimized for deployment on the edge. Mind that this optimization step is mandatory for deploying the model to the target DPU in later steps.
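
As a rough illustration of what a lower-bit representation means, the minimal sketch below shows symmetric 8-bit quantization of a tensor with NumPy. This is only an illustration, not the exact scheme the Vitis AI quantizer applies:

    import numpy as np

    # Map float32 values onto signed 8-bit integers through a single scale factor,
    # then dequantize to see the approximation error introduced by quantization.
    x = np.array([0.02, -1.3, 0.75, 2.4], dtype=np.float32)
    scale = np.abs(x).max() / 127.0
    x_q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    x_dq = x_q.astype(np.float32) * scale
    print(x_q)   # integer values the low-bit hardware operates on
    print(x_dq)  # dequantized approximation of the original tensor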

Prerequisites

Provided outputs

The following files (Tutorial files) are associated with this tutorial:

  • ML deployment/02 Model quantization/quantization_test_preds.h5 - quantized model outputs (test predictions)

  • ML deployment/02 Model quantization/state_dict.pt - model weights

  • ML deployment/02 Model quantization/Unet_int.xmodel - quantized model

  • ML deployment/02 Model quantization/Unet.py - Python description of the model

Due to their size, the following files are available separately:

  • quantization_samples.h5.xz - calibration data (compressed)

Prepare for deployment Machine learning workstation

  1. The quantization process requires model weights and a subset (100 to 1000 samples) of the training dataset to calibrate the model. To avoid preprocessing the dataset for quantization inside the container, prepare the calibration samples and model weights in advance:

    Open and run the ~/sml-tutorials/ml-deployment/tools/03-quantize/deployment_preparation.ipynb Jupyter Notebook. At the end you should have two files:

    • output/03-quantize/quantization_samples.h5 - calibration data

    • output/03-quantize/state_dict.pt - model weights
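
    For reference, the quantization script later reads one dataset per sample from a "calibration" group (and, for evaluation, a "test" group) in this HDF5 file. Below is a minimal sketch of producing such a file, where sample_tensors is only a placeholder for your already preprocessed 3x512x512 float32 samples:

      import h5py
      import numpy as np

      # Placeholder: replace with real preprocessed samples from the training dataset.
      sample_tensors = {f"sample_{i:04d}": np.random.rand(3, 512, 512).astype(np.float32)
                        for i in range(100)}

      with h5py.File("output/03-quantize/quantization_samples.h5", "w") as f_out:
          for name, tensor in sample_tensors.items():
              # One dataset per sample, read back later as f_in["calibration/<name>"].
              f_out.create_dataset(f"calibration/{name}", data=tensor)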

  2. Enter the Vitis AI deployment container with the working directory volume mounted:

    customer@ml-workstation:~/sml-tutorials/ml-deployment$ docker run \
        -it \
        -v "$(pwd)":/workspace \
        -e UID="$(id -u)" -e GID="$(id -g)" \
        xilinx/vitis-ai-pytorch-cpu:ubuntu2004-3.5.0.306
    

Model quantization Vitis AI deployment container

Run the following commands in the container environment.

  1. Activate the desired conda environment for PyTorch model deployment:

    vitis-ai-user@vitis-ai-container-id:/workspace/tools$ conda activate vitis-ai-wego-torch2
    
  2. Install third-party modules required to run the model inside the container environment:

    (vitis-ai-wego-torch2) vitis-ai-user@vitis-ai-container-id:/workspace/tools$ pip install -r requirements-vitis-ai.txt
    
  3. Quantize the model using the Vitis AI Python libraries. The pytorch_nndct.apis.torch_quantizer function creates a quantizer which operates in one of two modes: "calib" and "test". The first one calibrates the model to work in lower-bit precision. The second one evaluates and exports the quantized model for further deployment.

    Perform quantization by running the following script (remember that the demo model works with 512 by 512 3-channel images and 7 output classes):

    (vitis-ai-wego-torch2) vitis-ai-user@vitis-ai-container-id:/workspace/tools$ python3 ./03-quantize/quantize_model.py \
         --input-size 3 512 512 \
         --num-classes 7 \
         --calib-batch-size 8 \
         --state-dict ../output/03-quantize/state_dict.pt \
         --quantization-samples ../output/03-quantize/quantization_samples.h5 \
         --test-samples ../output/03-quantize/quantization_test_preds.h5 \
         --output-dir ../output/03-quantize/
    

    The quantized model will appear in ~/sml-tutorials/ml-deployment/output/03-quantize/.

    Warning

    Mind that the quantization process is time-consuming.

    Note

    The quantization process includes evaluation of the quantized model. If you wish to effectively skip this step to speed up the process, pass extra flags that limit the number of calibration and test samples:

    (vitis-ai-wego-torch2) vitis-ai-user@vitis-ai-container-id:/workspace/tools$ python3 ./03-quantize/quantize_model.py --calib-samples-limit 1 --test-samples-limit 1 ...
    

    Walk through the quantization script to understand the process:

    1. Quantization first requires loading the model from the state_dict.pt file:

      model = Unet(num_classes=NUM_CLASSES)
      model.load_state_dict(torch.load(state_dict))
      
    2. Use the quantizer in "calib" mode to quantize the model. You have to pass a dummy sample with the proper input shape (in this case [batch_size, 3, 512, 512]) to initialize the quantizer:

      dummy_input = torch.randn(batch_size, *input_shape)
      quantizer = torch_quantizer("calib", model, (dummy_input), output_dir=str(output_dir))
      quant_model = quantizer.quant_model
      
    3. The script performs the calibration by passing the samples from quantization_samples.h5 to the quantized model in a loop:

      with h5py.File(data_h5_path, "r") as f_in:
          sample_names = list(f_in["calibration"].keys())[:samples_num_limit]
          for names_batch in tqdm(batched(sample_names, batch_size)):
              input_batch = torch.stack([torch.as_tensor(f_in[f"calibration/{name}"]) for name in names_batch])
              quant_model(input_batch)
      
    4. After calibration, export the computed quantization configuration using:

      quantizer.export_quant_config()
      
    5. However, Vitis AI requires the model to be serialized before it can undergo compilation. Set up the quantizer in "test" mode to enable model export. The test mode requires a batch size of 1:

      dummy_input = torch.randn(batch_size, *input_shape)
      quantizer = torch_quantizer("test", model, (dummy_input), output_dir=str(output_dir))
      quant_model = quantizer.quant_model
      
    6. The Vitis AI quantizer requires running inference on at least one sample in the test mode before saving the model. The script uses this step to also evaluate the quantized model: it runs the test samples through it and stores the predictions in output/03-quantize/quantization_test_preds.h5:

      with h5py.File(test_data_h5_path, "r") as f_in, h5py.File(test_samples, "w") as f_out:
          sample_names = list(f_in["test"].keys())[:samples_num_limit]
          for sample_name in tqdm(sample_names):
              input_image = torch.as_tensor(f_in[f"test/{sample_name}"])
              input_batch = input_image.unsqueeze(0)
              pred = quant_model(input_batch)
              f_out.create_dataset(sample_name, data=pred.detach())
      
    7. Once the model performs inference in the test mode, the quantizer can export it to the .xmodel format for further deployment:

      quantizer.export_xmodel(str(output_dir))
      
  4. Exit the Vitis AI container: exit.

Evaluate the quantized model metrics Machine learning workstation

  1. The quantization script saves the quantized model outputs in output/03-quantize/quantization_test_preds.h5. Optionally, you can evaluate metrics for these outputs and preview the results by running the ~/sml-tutorials/ml-deployment/tools/03-quantize/calc_quantized_metrics.ipynb notebook. A minimal sketch of loading these predictions is shown after the note below.

    Note

    Evaluation of the quantized model requires output/03-quantize/quantization_test_preds.h5 with all samples, which you might have skipped to save some time.
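
    A minimal sketch of loading the stored predictions, assuming the usual segmentation output layout of [1, num_classes, height, width] per sample; the actual metric computation against your ground-truth masks is left to the notebook:

      import h5py
      import numpy as np

      with h5py.File("output/03-quantize/quantization_test_preds.h5", "r") as f_in:
          for name in f_in:
              logits = np.asarray(f_in[name])       # e.g. [1, 7, 512, 512]
              class_map = logits.argmax(axis=1)[0]  # per-pixel predicted class
              # Compare class_map against the ground-truth mask here (e.g. per-class IoU).
              print(name, class_map.shape, np.bincount(class_map.ravel(), minlength=7))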