TensorRT/AccuracyIssues
Revision as of 01:14, 14 October 2019
How to fix FP16 accuracy issue?
Refer to https://elinux.org/TensorRT/FP16_Accuracy.
How to fix INT8 accuracy issue?
Basically, you should be able to get an absolutely correct result in FP32 mode and a roughly correct result in INT8 mode after TensorRT auto calibration or after inserting external dynamic ranges. If the FP32 result is as expected while the INT8 result is totally wrong, it is probably due to an invalid calibration procedure or inaccurate dynamic ranges.
If you are leveraging the TensorRT auto calibration mechanism, please do the following checks to rule out calibration issues (refer to here regarding how to perform calibration without using the BatchStream approach).
IInt8Calibrator contains four virtual methods that need to be implemented, shown below; the most important and most error-prone one is getBatch():

virtual int getBatchSize() const = 0;
virtual bool getBatch(void* bindings[], const char* names[], int nbBindings) = 0;
virtual const void* readCalibrationCache(std::size_t& length) = 0;
virtual void writeCalibrationCache(const void* ptr, std::size_t length) = 0;
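A common getBatch() bug is never signaling exhaustion (returning true forever) or never advancing through the dataset. The sketch below models only that batching contract; the TensorRT types, GPU copies, and the class name BatchFeeder are illustrative stand-ins, not the real API:

```cpp
#include <cstddef>

// Hypothetical stand-in for the data-feeding half of IInt8Calibrator:
// advances through the calibration set one batch at a time and signals
// exhaustion, as getBatch() must (returning false ends calibration).
class BatchFeeder
{
public:
    BatchFeeder(std::size_t totalImages, std::size_t batchSize)
        : mTotal(totalImages), mBatch(batchSize), mNext(0) {}

    // Mirrors getBatch(): true while a full batch remains, false when done.
    bool nextBatch()
    {
        if (mNext + mBatch > mTotal)
            return false;      // no complete batch left: stop calibration
        mNext += mBatch;       // real code would copy this batch into the
                               // device buffers named by bindings[]/names[]
        return true;
    }

    std::size_t batchesServed() const { return mNext / mBatch; }

private:
    std::size_t mTotal, mBatch, mNext;
};
```

If this loop terminates too early (too few batches) the calibration histograms are built from too little data, which is one way an invalid calibration procedure arises.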
- Is the calibration input after preprocessing identical to the input used for FP32 inference? If you are not sure, just dump the buffer before feeding it into TensorRT and compare the two.
- Is the calibration dataset large enough? Ensure the calibration dataset is diverse and representative.
- Is a cached, incorrect calibration table being loaded unexpectedly?
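For the first check above, the dump-and-compare step amounts to finding the largest element-wise difference between the two preprocessed buffers; identical preprocessing should give (near) zero. A minimal sketch, with a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: largest absolute element-wise difference between the
// buffer fed to the calibrator and the buffer used for FP32 inference.
float maxAbsDiff(const std::vector<float>& calibInput,
                 const std::vector<float>& fp32Input)
{
    assert(calibInput.size() == fp32Input.size());
    float worst = 0.0f;
    for (std::size_t i = 0; i < calibInput.size(); ++i)
        worst = std::max(worst, std::fabs(calibInput[i] - fp32Input[i]));
    return worst;
}
```

A non-trivial difference here usually points to mismatched mean subtraction, scaling, or channel ordering between the calibration path and the FP32 inference path.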
Ultimately you should be able to get a roughly correct result in INT8 mode; then you can start evaluating its accuracy against your whole test dataset.
If you get poor classification or detection accuracy compared with FP32 mode (Q: which case counts as a 'poor' result? For example, we are able to see within 1% INT8 accuracy loss for popular classification CNNs, like AlexNet, VGG19 and ResNet50/101/152, and detection networks, like VGG16_FasterRCNN_500x375 and VGG16_SSD_300x300; if your accuracy loss is far larger than 1%, it might be a 'poor' case), then we would suggest trying the following approaches to fix it:
- Mixed-precision inference
Follow the approach of page to analyze the accuracy of all layers, and set a higher precision for any layer whose loss is far larger than that of the others:
virtual void setPrecision(DataType dataType) = 0;
NOTE: Don't forget to configure strict types for your network; otherwise, this precision setting may be overridden during network optimization.
builder->setStrictTypeConstraints(true);
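To decide which layers deserve the higher precision, it helps to rank layers by how much INT8 distorts their outputs relative to FP32. The metric below is one plausible choice (a mean relative error with a hypothetical name and epsilon), not something TensorRT provides; layers with an outlying score are the candidates for setPrecision():

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical ranking metric: mean relative error between a layer's FP32
// output and its INT8 (quantize-dequantized) output. A layer whose score is
// far above the rest is a candidate for FP32/FP16 precision.
float meanRelativeError(const std::vector<float>& fp32Out,
                        const std::vector<float>& int8Out)
{
    assert(fp32Out.size() == int8Out.size() && !fp32Out.empty());
    double sum = 0.0;
    for (std::size_t i = 0; i < fp32Out.size(); ++i)
        sum += std::fabs(fp32Out[i] - int8Out[i])
             / (std::fabs(fp32Out[i]) + 1e-6);   // epsilon avoids div-by-zero
    return static_cast<float>(sum / fp32Out.size());
}
```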
- TensorRT does provide an internal quantization path, but it is a post-training quantization approach and exposes little manipulation to users, so it cannot work for every network. If your model unluckily happens to be such a case, you should consider an external quantization methodology and insert the resulting dynamic ranges into TensorRT through the following API:
virtual bool setDynamicRange(float min, float max)
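From the dynamic range you insert, TensorRT derives an INT8 scale, roughly scale = max(|min|, |max|) / 127 for symmetric quantization. The sketch below shows that arithmetic and the quantize-dequantize round trip whose error an inaccurate dynamic range inflates; the function names are illustrative, not TensorRT API:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical sketch: scale implied by a symmetric INT8 dynamic range.
float scaleFromDynamicRange(float min, float max)
{
    return std::max(std::fabs(min), std::fabs(max)) / 127.0f;
}

// Quantize-dequantize round trip: the gap between the result and the
// original value is the quantization error for that activation.
float quantDequant(float x, float scale)
{
    int q = static_cast<int>(std::lround(x / scale));
    q = std::max(-127, std::min(127, q));   // clamp to the INT8 range
    return static_cast<float>(q) * scale;
}
```

This also shows why the range matters: a range set too wide makes the scale coarse everywhere, while a range set too narrow clamps large activations to the edge of the INT8 range.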
Further reading about quantization in other frameworks: TensorFlow Post-training Quantization, TensorFlow Quantization-aware Training, PyTorch Quantization.