TensorRT/Int8CFAQ

From eLinux.org
Revision as of 03:22, 24 September 2019 by Nfeng (talk | contribs) (How to understand the principle of INT8 calibration?)

How to do INT8 calibration without using BatchStream?

Using BatchStream for calibration is too complicated to be practical.

Here we provide a helper class, BatchFactory, which uses OpenCV to pre-process the calibration data and simplifies the calibration procedure.

File:BatchFactory.zip


Then, when implementing IInt8EntropyCalibrator, we can call the helper class's loadBatch API to load the batch data directly.

         bool getBatch(void* bindings[], const char* names[], int nbBindings) override
         {
             float mean[3]{102.9801f, 115.9465f, 122.7717f}; // also in BGR order
             float *batchBuf = mBF.loadBatch(mean, 1.0f);

             // Indicates calibration data feeding done
             if (!batchBuf)
                 return false;

             CHECK(cudaMemcpy(mDeviceInput, batchBuf, mInputCount * sizeof(float), cudaMemcpyHostToDevice));

             assert(!strcmp(names[0], INPUT_BLOB_NAME0));
             bindings[0] = mDeviceInput;

             return true;
         }
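The pre-processing that a helper such as BatchFactory performs before the copy to the device can be pictured with the following minimal sketch. It is not the BatchFactory implementation: the function name hwcToChw, the interleaved 8-bit BGR input layout, and the order of operations (subtract the mean, then scale) are assumptions made for illustration, and it uses plain C++ instead of OpenCV.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of BatchFactory or TensorRT): converts one
// interleaved 8-bit BGR image (HWC) into planar float data (CHW), subtracting
// a per-channel mean and then applying a scale factor.
std::vector<float> hwcToChw(const std::vector<std::uint8_t>& bgr,
                            int height, int width,
                            const float mean[3], float scale)
{
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    assert(bgr.size() == plane * 3);

    std::vector<float> chw(plane * 3);
    for (std::size_t i = 0; i < plane; ++i)
    {
        for (int c = 0; c < 3; ++c)
        {
            // Source is pixel-major (B,G,R,B,G,R,...); destination is channel-major.
            chw[c * plane + i] = (static_cast<float>(bgr[i * 3 + c]) - mean[c]) * scale;
        }
    }
    return chw;
}
```

A calibrator would run such a conversion for every image of the batch, place the results back to back in one host buffer, and copy that buffer to the device as in getBatch above.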

Is an INT8 calibration table compatible across different TRT versions or HW platforms?

An INT8 calibration table is NOT compatible between different TRT versions, because the optimized network graph is likely to differ from one version to another. If you force TRT to use a table generated by another version, it may not find the scaling factor for a given tensor.
As long as the installed TensorRT version is identical, the INT8 calibration table is compatible across HW platforms. That means you can perform INT8 calibration on a faster platform, such as V100 or T4, and then deploy the calibration table to Tegra for INT8 inference, provided these platforms have the same TensorRT version installed (at least the same major and minor version, e.g. 5.1.5 and 5.1.6).
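The major/minor rule of thumb above could be checked in a deployment script with something like the sketch below. The helper sameMajorMinor is hypothetical and only compares version strings of the form "major.minor.patch"; it does not query TensorRT itself, so the strings are assumed to come from wherever you record the version used for calibration and the version installed on the target.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: parses "major.minor[.patch]" and reports whether two
// version strings share the same major and minor numbers.
// Returns false if either string cannot be parsed.
bool sameMajorMinor(const std::string& a, const std::string& b)
{
    int aMajor = 0, aMinor = 0, bMajor = 0, bMinor = 0;
    if (std::sscanf(a.c_str(), "%d.%d", &aMajor, &aMinor) != 2)
        return false;
    if (std::sscanf(b.c_str(), "%d.%d", &bMajor, &bMinor) != 2)
        return false;
    return aMajor == bMajor && aMinor == bMinor;
}
```

For example, sameMajorMinor("5.1.5", "5.1.6") is true, so under the rule above a table calibrated with 5.1.5 would be a candidate for reuse with 5.1.6, while a 5.1.x and 6.0.x pair would call for recalibration.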


How to do INT8 calibration for networks with multiple inputs?

TensorRT uses bindings to hold the input and output buffer pointers, arranged in order. Hence, if your network has multiple input nodes/layers, you can assign each input buffer pointer to its own entry in bindings (void **), as in the following example for a network that requires two inputs:

         bool getBatch(void* bindings[], const char* names[], int nbBindings) override
         {
             // Prepare the batch data (on GPU) for mDeviceInput and imInfoDev
             ...

             assert(!strcmp(names[0], INPUT_BLOB_NAME0));
             bindings[0] = mDeviceInput;

             assert(!strcmp(names[1], INPUT_BLOB_NAME1));
             bindings[1] = imInfoDev;

             return true;
         }

NOTE: If your calibration batch size is 10, then for each calibration cycle you need to fill each of your input buffers with the data for 10 images.
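To make the note concrete, here is a minimal host-side sketch of assembling one batch per input. The helper packBatch and the idea of keeping each input's samples in a std::vector are assumptions for illustration; how the data is actually loaded and copied to mDeviceInput and imInfoDev is left to your calibrator.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: concatenates `batchSize` samples, starting at sample
// index `first`, into one contiguous host buffer for a single network input.
// Returns an empty vector when fewer than `batchSize` samples remain, which a
// calibrator could treat as "no more calibration data".
std::vector<float> packBatch(const std::vector<std::vector<float>>& samples,
                             std::size_t first, std::size_t batchSize)
{
    if (first + batchSize > samples.size())
        return {};

    std::vector<float> batch;
    for (std::size_t i = first; i < first + batchSize; ++i)
    {
        // Every sample of one input must have the same element count.
        assert(samples[i].size() == samples[first].size());
        batch.insert(batch.end(), samples[i].begin(), samples[i].end());
    }
    return batch;
}
```

In a two-input calibrator you would call this once per input with the same first and batchSize, so that sample k of the image buffer and sample k of the second buffer always belong to the same image.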

How to understand the principle of INT8 calibration?

Refer to the slide for the specification of INT8 quantization. It is a post-training quantization method:

  • Weights use symmetric, per-channel quantization
  • Activations use symmetric, per-tensor quantization
  • KL divergence is used to measure the information loss between the original and the quantized activation distributions
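The two building blocks named above, symmetric quantization and KL divergence, can be sketched as follows. This is an illustration of the concepts only, not TensorRT's calibration code: the function names are hypothetical, the threshold is taken as given (TensorRT searches for it during calibration), and bins where either histogram is empty are simply skipped, which is a simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Symmetric quantization: maps [-threshold, threshold] onto [-127, 127] with
// a single scale and no zero point; values outside the range saturate.
std::int8_t quantizeSymmetric(float x, float threshold)
{
    assert(threshold > 0.0f);
    const float scale = threshold / 127.0f;
    const float q = std::round(x / scale);
    return static_cast<std::int8_t>(std::max(-127.0f, std::min(127.0f, q)));
}

// KL divergence D(P || Q) between two histograms with the same number of
// bins. Both histograms are normalized to sum to 1 internally.
// Simplification: bins where P or Q is zero contribute nothing.
double klDivergence(const std::vector<double>& p, const std::vector<double>& q)
{
    assert(p.size() == q.size());
    double pSum = 0.0, qSum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i)
    {
        pSum += p[i];
        qSum += q[i];
    }
    assert(pSum > 0.0 && qSum > 0.0);

    double kl = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i)
    {
        const double pi = p[i] / pSum;
        const double qi = q[i] / qSum;
        if (pi > 0.0 && qi > 0.0)
            kl += pi * std::log(pi / qi);
    }
    return kl;
}
```

For per-tensor activation quantization one threshold is shared by the whole tensor; for per-channel weight quantization the same formula would be applied with a separate threshold for each output channel. A smaller klDivergence between the original and the quantized activation histograms means less information is lost by the chosen threshold.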