= TensorRT =<br />
NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40x faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and finally deploy to hyperscale data centers, embedded, or automotive product platforms.<br />
<br><br />
<br />
== Introduction ==<br />
<br />
<br />
[https://developer.nvidia.com/tensorrt TensorRT Download]<br><br />
[https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html TensorRT Developer Guide]<br />
<br><br />
<br />
== FAQ ==<br />
<br />
<br />
=== Official FAQ ===<br />
[https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#troubleshooting TensorRT Developer Guide#FAQs]<br><br />
<br />
<br />
----<br />
=== Common FAQ ===<br />
You can find answers to some common questions about using TRT here.<br><br />
Refer to the page [https://elinux.org/TensorRT/CommonFAQ TensorRT/CommonFAQ]<br><br />
<br />
<br />
----<br />
=== TRT Accuracy FAQ ===<br />
If your FP16 or INT8 results are not as expected, the page below may help you fix the accuracy issue.<br><br />
Refer to the page [https://elinux.org/TensorRT/AccuracyIssues TensorRT/AccuracyIssues]<br><br />
<br />
<br />
----<br />
=== TRT Performance FAQ ===<br />
If inference performance with TRT is not as expected, the page below may help you optimize it.<br><br />
Refer to the page [https://elinux.org/TensorRT/PerfIssues TensorRT/PerfIssues]<br><br />
<br />
<br />
----<br />
<br />
=== TRT Int8 Calibration FAQ ===<br />
The page below presents some FAQs about TRT INT8 calibration.<br><br />
Refer to the page [https://elinux.org/TensorRT/Int8CFAQ TensorRT/Int8CFAQ]<br><br />
<br />
<br />
----<br />
<br />
=== TRT Plugin FAQ ===<br />
The page below presents some FAQs about TRT plugins.<br><br />
Refer to the page [https://elinux.org/TensorRT/PluginFAQ TensorRT/PluginFAQ]<br><br />
<br />
<br />
----<br />
=== How to fix some Common Errors ===<br />
If you encounter errors while using TRT, the page below may have the answer.<br><br />
Refer to the page [https://elinux.org/TensorRT/CommonErrorFix TensorRT/CommonErrorFix]<br><br />
<br />
<br />
----<br />
=== How to debug or analyze === <br />
The page below describes several ways to debug your inference.<br><br />
Refer to the page [https://elinux.org/TensorRT/How2Debug TensorRT/How2Debug]<br><br />
<br />
<br />
----<br />
<br />
=== TRT & YoloV3 FAQ ===<br />
Refer to the page [https://elinux.org/TensorRT/YoloV3 TensorRT/YoloV3]<br><br />
<br />
<br />
----<br />
<br />
=== TRT & YoloV4 FAQ ===<br />
Refer to the page [https://elinux.org/TensorRT/YoloV4 TensorRT/YoloV4]<br><br />
<br />
<br />
----<br />
<br />
=== TRT ONNXParser FAQ ===<br />
If you have questions about ONNX dynamic shapes or ONNX parsing issues, this page might be helpful.<br><br />
Refer to the page [https://elinux.org/TensorRT/ONNX TensorRT/ONNX]<br><br />
<br />
<br />
----<br />
<br />
=== The Usage of Polygraphy ===<br />
Polygraphy is a very useful debugging toolkit for TensorRT.<br><br />
Refer to the page [https://elinux.org/TensorRT/Polygraphy_Usage TensorRT/Polygraphy_Usage] <br><br />
<br />
<br />
----<br />
<br />
= TensorRT/Polygraphy Usage =<br />
Polygraphy is an open-source toolkit designed for debugging TensorRT issues during the build or runtime phase, as well as for modifying ONNX models. A more detailed introduction and usage can be found here: [https://github.com/NVIDIA/TensorRT/tree/main/tools/Polygraphy TensorRT_OSS/Polygraphy]<br />
<br />
This page summarizes the most frequently used command lines during debugging.<br />
=== Install & Setup ===<br />
Prerequisites:<br><br />
* CUDA, cuDNN and TensorRT should be installed correctly<br />
* The TensorRT Python bindings must be installed: '''sudo apt install -y python3-libnvinfer python3-libnvinfer-dev'''<br />
* Other dependencies:<br><br />
'''sudo apt install -y onnx-graphsurgeon python3-pip'''<br />
'''pip3 install onnx onnxruntime'''<br />
Quick Install:<br><br />
<pre><br />
python -m pip install colored polygraphy --extra-index-url https://pypi.ngc.nvidia.com<br />
export PATH=${HOME}/.local/bin:$PATH<br />
</pre><br />
Build From Source: <br />
<pre><br />
git clone https://github.com/NVIDIA/TensorRT.git<br />
cd TensorRT/tools/Polygraphy<br />
python setup.py bdist_wheel<br />
pip3 uninstall -y polygraphy<br />
cd .. <br />
python -m pip install Polygraphy/dist/polygraphy-0.47.1-py2.py3-none-any.whl<br />
export PATH=${HOME}/.local/bin:$PATH<br />
</pre><br />
<br><br />
<br />
=== Onnx Model Modification ===<br />
==== Constant-fold ====<br />
<pre><br />
polygraphy surgeon sanitize model.onnx -o model_sim.onnx --fold-constants<br />
</pre><br />
==== Extract a Subgraph ====<br />
<pre><br />
polygraphy surgeon extract model.onnx -o subgraph.onnx \<br />
--inputs x1:[1,3,224,224]:float32 \ # --inputs x1:auto:auto<br />
--outputs add_out:float32 # --outputs add_out:auto<br />
</pre><br />
==== Modify Input shape ====<br />
<pre><br />
polygraphy surgeon sanitize model.onnx -o model_new.onnx --override-input-shapes image:[1,3,224,224] other_input:[10]<br />
</pre><br />
<br><br />
<br />
=== Debugging Accuracy Issues ===<br />
==== Using '''debug precision''' Tool ====<br />
If TensorRT's inference results are unacceptable only when FP16 or INT8 mode is enabled, we can use '''polygraphy debug precision''' to iteratively mark some layers to run in higher precision. By raising the precision of part of the layers, we can locate the specific layer that produces unexpected outputs in lower precision.<br><br />
<pre><br />
Step 1. Generate input and golden output data from onnxrt:<br />
<br />
$ polygraphy run model.onnx --onnxrt --save-inputs=inputs.json --save-outputs golden_outputs.json<br />
<br />
Step 2. Verify that inference results in higher precision mode is good:<br />
e.g. check the results in FP16 mode comparing with onnxrt outputs<br />
<br />
$ polygraphy run model.onnx --trt --fp16 --load-inputs inputs.json --load-outputs golden_outputs.json --atol 0.5 --rtol 0.3<br />
<br />
Step 3. Reproduce the accuracy issue in lower precision mode:<br />
e.g. check the results in Int8 mode comparing with onnxrt outputs<br />
<br />
$ polygraphy run model.onnx --trt --int8 --calibration-cache calibtable --load-inputs inputs.json --load-outputs golden_outputs.json --atol 0.5 --rtol 0.3<br />
<br />
Step 4. Try to mark some layers running on high precision and check the accuracy with `polygraphy debug precision`<br />
e.g. Set the higher precision to float16 using '''-p''', set the mode to '''bisect''' and the direction to '''forward'''<br />
<br />
$ polygraphy debug precision model.onnx --int8 --fp16 --calibration-cache calibtable \<br />
--mode bisect --dir forward -p float16 \<br />
--check polygraphy run polygraphy_debug.engine --trt --load-inputs inputs.json \<br />
--load-outputs golden_outputs.json --atol 0.5 --rtol 0.3<br />
…<br />
[I] To achieve acceptable accuracy, try running the first 5 layer(s) in higher precision<br />
[I] Finished 4 iteration(s) | Passed: 2/4 | Pass Rate: 50.0%<br />
[I] PASSED | Runtime: 33.551s | Command: …<br />
<br />
Tips: Try different options with --mode bisect/linear --dir forward/reverse if it fails in some cases.<br />
<br />
</pre><br />
<br><br />
<br />
<br />
==== Using '''debug reduce''' Tool ====<br />
If TensorRT's inference results are unacceptable only when DLA or sparsity is enabled, we can debug such an accuracy issue using '''polygraphy debug reduce''' to find the smallest ONNX subgraph that still fails the accuracy check.<br />
<pre><br />
Step 1. Generate input and golden output data from onnxrt:<br />
<br />
$ polygraphy run model.onnx --onnxrt --save-inputs=inputs.json --save-outputs golden_outputs.json<br />
<br />
Step 2. Verify that inference results in normal case is good:<br />
e.g. check the GPU Int8 results comparing with onnxrt outputs<br />
<br />
$ polygraphy run model.onnx --trt --int8 --calibration-cache calibtable --load-inputs=inputs.json --load-outputs=golden_outputs.json --atol 0.1 --rtol 0.05<br />
[I] PASSED | Output: 'output' | Difference is within tolerance (rel=0.05, abs=0.1)<br />
[I] PASSED | All outputs matched | Outputs: ['output']<br />
<br />
Step 3. Reproduce the accuracy drop:<br />
e.g. check the sparsity inference results comparing with onnxrt outputs<br />
<br />
$ polygraphy run model.onnx --trt --int8 --calibration-cache calibtable --load-inputs=inputs.json --load-outputs=golden_outputs.json --atol 0.1 --rtol 0.05 --sparse-weights<br />
[E] FAILED | Output: 'output' | Difference exceeds tolerance (rel=0.05, abs=0.1)<br />
<br />
e.g. check the DLA Int8 inference results comparing with onnxrt outputs<br />
<br />
$ polygraphy run model.onnx --trt --int8 --calibration-cache calibtable --load-inputs=inputs.json --load-outputs=golden_outputs.json --atol 0.3 --rtol 0.05 --use-dla --allow-gpu-fallback<br />
[E] Accuracy Summary | trt-runner-N0-12/13/23-05:15:21 vs. trt-runner-N0-12/13/23-05:07:53 | Passed: 0/1 iterations | Pass Rate: 0.0%<br />
<br />
Step 4. Generate layer-wise input and output data:<br />
<br />
$ polygraphy run model.onnx --onnxrt --save-inputs=inputs.json --save-outputs=golden_lw_results.json \<br />
--onnx-outputs mark all<br />
$ polygraphy data to-input inputs.json golden_lw_results.json -o layerwise_inputs.json<br />
<br />
Step 5. Debug with the `reduce` subtool to find the minimal failing model; the last layer of initial_reduced.onnx is the culprit:<br />
<br />
$ polygraphy debug reduce model.onnx --no-reduce-inputs -o initial_reduced.onnx --mode=bisect \<br />
--check polygraphy run polygraphy_debug.onnx --trt --int8 --calibration-cache calibtable \<br />
--load-inputs layerwise_inputs.json --load-outputs golden_lw_results.json --atol 0.3 --rtol 0.05 \<br />
--use-dla --allow-gpu-fallback OR --sparse-weights<br />
<br />
[I] Saving ONNX model to: initial_reduced.onnx<br />
<br />
Tips: Try different options with --mode bisect/linear --no-reduce-inputs/--no-reduce-outputs if it fails in some cases.<br />
<br />
</pre><br />
<br><br />
<br />
==== By Saving and Loading Tactics ====<br />
If inference results for the same model in the same precision become unacceptable after upgrading the TensorRT version, we can check whether the results are still good when running with the same tactics.<br><br />
<pre><br />
Step 1. Save the tactics and output data in the environment you used before:<br />
<br />
$ polygraphy run model.onnx --trt --fp16 --save-outputs=85_results.json --save-tactics=85_tactics.json --data-loader-script data_loader.py<br />
<br />
Step 2. Load the tactics in the environment you are using now, to see if the output data is consistent:<br />
<br />
$ polygraphy run model.onnx --trt --fp16 --load-outputs=85_results.json --load-tactics=85_tactics.json --data-loader-script data_loader.py<br />
<br />
</pre><br />
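The --data-loader-script option points to a Python script defining a load_data() function that yields feed_dicts (input name to array), which Polygraphy uses to feed both runs with identical data. Below is a minimal sketch, assuming a single input named "input" with shape 1x3x224x224; match the name and shape to your model:<br><br />
<pre><br />
import numpy as np<br />
<br />
def load_data():<br />
    # Polygraphy iterates over this generator to obtain inference inputs<br />
    for _ in range(4):<br />
        yield {"input": np.random.rand(1, 3, 224, 224).astype(np.float32)}<br />
</pre><br />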
<br><br />
Feel free to file a bug with NVIDIA once you find a bad tactic.<br />
<br />
==== Limitation ====<br />
These tools can't work properly with models that include Q/DQ nodes.<br />
<br />
=== Advanced Usage ===<br />
Generate a netron-viewable network structure:<br><br />
'''polygraphy convert model.onnx --convert-to onnx-like-trt-network -o network.pb'''<br><br />
<br><br />
Convert ONNX models to FP16 with --fp-to-fp16; useful for checking whether TRT's FP16 error is the same:<br><br />
'''polygraphy convert model.onnx --fp-to-fp16 -o model_fp16.onnx'''<br><br />
<br><br />
Convert run command to script with --gen/--gen-script:<br><br />
'''polygraphy run model.onnx --trt --fp16 --load-outputs=golden_outputs.json --gen -'''<br />
<br />
= TensorRT/ONNX =<br />
'''This page shares some guidance on how to run inference with ONNX models, how to convert models to ONNX, and common FAQs about parsing ONNX models.'''<br />
<br />
----<br />
=== TRT Compatibility ===<br />
ONNX Operators: https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md (e.g. TensorRT 7.2 supports operators up to Opset 11)<br><br />
cuDNN/TF/Pytorch/ONNX: "Compatibility" section in TensorRT release note - https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html <br><br />
Protobuf: https://github.com/onnx/onnx-tensorrt#dependencies (e.g. Protobuf >= 3.0.x)<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0, the ONNX parser only supports networks with an explicit batch dimension. This part introduces how to run inference with an ONNX model that has either a fixed or a dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has a fixed shape (N, C, H, W >= 1), then you should be able to just set the explicit batch flag and use executeV2(), similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
You may get the warning below when trying to run inference with an ONNX model:<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
As the log says, your ONNX model is an explicit batch network; you need to set the EXPLICIT_BATCH flag as in this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
"Use execute without batch size instead" means that any batch size set through the TRT API will be ignored; TRT always executes inference with the network's explicit batch size. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has a dynamic shape (one of the dims == -1), then you should create an optimization profile for it and set that profile on your execution context. Before running inference, you'll also need to specify the actual input shape.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
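Below is a minimal Python sketch of this dynamic-shape workflow (assumptions: the input tensor is named "data", and the min/opt/max and inference shapes are only examples):<br><br />
 import tensorrt as trt<br />
 <br />
 TRT_LOGGER = trt.Logger(trt.Logger.WARNING)<br />
 builder = trt.Builder(TRT_LOGGER)<br />
 network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))<br />
 parser = trt.OnnxParser(network, TRT_LOGGER)<br />
 with open('model.onnx', 'rb') as f:<br />
     parser.parse(f.read())<br />
 <br />
 config = builder.create_builder_config()<br />
 profile = builder.create_optimization_profile()<br />
 # kMIN, kOPT, kMAX shapes for the dynamic input<br />
 profile.set_shape('data', (1, 3, 224, 224), (8, 3, 224, 224), (16, 3, 224, 224))<br />
 config.add_optimization_profile(profile)<br />
 engine = builder.build_engine(network, config)<br />
 <br />
 context = engine.create_execution_context()<br />
 # specify the actual input shape before running inference<br />
 context.set_binding_shape(0, (4, 3, 224, 224))<br />
 # ... allocate buffers for this shape, then call context.execute_v2(bindings)<br />
<br />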
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. To convert a Pytorch model, you can use the torch.onnx API (see also the sketch after this list); sample code:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. To convert a Tensorflow model, use the tf2onnx tool:<br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. To convert a Caffe model, use the caffe2onnx tool; it supports fewer operators than the others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
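For illustration, here is a minimal sketch of exporting a torchvision model with a dynamic batch dimension (the model choice, tensor names and opset below are assumptions, not taken from the linked gist):<br><br />
 import torch<br />
 import torchvision<br />
 <br />
 # export AlexNet with a dynamic batch dimension<br />
 model = torchvision.models.alexnet(pretrained=True).eval()<br />
 dummy = torch.randn(1, 3, 224, 224)<br />
 torch.onnx.export(model, dummy, 'alexnet_dynamic.onnx',<br />
                   input_names=['data'], output_names=['prob'],<br />
                   dynamic_axes={'data': {0: 'batch'}, 'prob': {0: 'batch'}},<br />
                   opset_version=11)<br />
<br />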
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you don't want to regenerate the ONNX model, you can just set the input dimensions after parsing:<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still get the error below after modifying the dimensions, you have to regenerate your ONNX model with a dynamic input shape. This error happens when you use dynamic shapes with a model that contains static tensors; TRT 7.1 supports this case so that you don't have to regenerate your model. Refer to the introduction of ONNX-GraphSurgeon below.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingDimensions()<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If your ONNX model was exported from TF with input "x:0", you can also run:<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use the OnnxParser to parse the ONNX model, then build the engine as usual. If you're not familiar with the OnnxParser and engine building, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You can also use trtexec to do the same thing with the command below:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you encounter errors converting ONNX to an engine ===<br />
If you encounter errors during parsing, add "--verbose" to the trtexec command line to see whether parsing of some node goes wrong, and check the following:<br><br />
1. Check the ONNX model using the checker function and see if it passes:<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, try onnx-simplifier on it (see the example command after this list): https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn't work, there could be an unsupported operator causing the parsing error. Check the OnnxParser supported operators list here: https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Also check whether anything looks off in Netron when viewing the failing nodes, and try a newer ONNX opset when converting the model.<br />
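For reference, a typical onnx-simplifier invocation (after installing it with pip3 install onnx-simplifier) looks like:<br />
 python3 -m onnxsim model.onnx model_sim.onnx<br />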
<br />
Tips:<br />
If you're converting a TF model to ONNX, you might try:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This can help avoid some conversion errors.<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size (bytes)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results show,<br><br />
1. The engine size increases when the engine is built with a dynamic shape and an OptimizationProfile; the bigger the shapes you set, the bigger the engine size, though the increase is small.<br><br />
2. Performance is best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon (ONNX-GS) is a tool that allows you to easily generate new ONNX graphs or modify existing ones. It is released as part of [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]; you can follow its readme to install it.<br />
This section introduces some use cases for modifying ONNX models with ONNX-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0, only explicit batch ONNX models are supported, so some users need to make the model input dynamic. The sample below shows how to use ONNX-GS to modify the input without regenerating the ONNX model in the training framework.<br />
<br />
Here is a sample that makes the model input and outputs dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still get an error while building the engine, find the node that has static dims, say a node named "OC2_DUMMY_1", and change its first dim like this:<br />
# Array is not writeable, need to copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is a sample changing the input and output tensor names:<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Prune the model with a certain output layer ====<br />
Sometimes we need to prune the model to narrow down an issue; we can use the script below to set any tensor you want as the output:<br />
import onnx_graphsurgeon as gs<br />
import numpy as np<br />
import sys<br />
import onnx<br />
<br />
 # Cut the model so that it ends at the specified node<br />
print("cut model: ", sys.argv[1], " to end with node ", sys.argv[2])<br />
graph = gs.import_onnx(onnx.load(sys.argv[1]))<br />
tensors = graph.tensors()<br />
graph.outputs = [tensors[str(sys.argv[2])].to_variable(dtype=np.float32)]<br />
<br />
# removing any unnecessary nodes or tensors, so that we are left with only the subgraph.<br />
graph.cleanup()<br />
<br />
om=str(sys.argv[1])<br />
new_onnx_model_name=om[:om.rfind(".")]+"_ended_node_"+sys.argv[2]+".onnx"<br />
<br />
onnx.save(gs.export_onnx(graph), new_onnx_model_name)<br />
<br />
==== 4. Add your Plugin ====<br />
ONNX-GS can also be used to modify a model to use a custom plugin.<br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and how to replace the original node with the plugin node.<br><br />
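In spirit, the node replacement boils down to a short ONNX-GS sketch like the one below (attribute and input rewiring are omitted; the op names are taken from this sample, the file names are placeholders):<br><br />
 import onnx<br />
 import onnx_graphsurgeon as gs<br />
 <br />
 graph = gs.import_onnx(onnx.load('model.onnx'))<br />
 for node in graph.nodes:<br />
     if node.op == 'DCNv2':<br />
         # TRT's OnnxParser will map this op name to the registered plugin<br />
         node.op = 'DCNv2_SS'<br />
 onnx.save(gs.export_onnx(graph.cleanup().toposort()), 'model_plugin.onnx')<br />
<br />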
The before and after graphs are shown below: the sample replaces "DCNv2" with the plugin named "DCNv2_SS":<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=552896TensorRT/ONNX2021-06-24T04:01:07Z<p>Lynettez: /* Introduce some use cases of onnx-graphsurgeon */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Compatibility ===<br />
ONNX Operators: https://github.com/onnx/onnx-tensorrt/blob/master/docs/operators.md (e.g. TensorRT 7.2 supports operators up to Opset 11)<br><br />
cuDNN/TF/Pytorch/ONNX: "Compatibility" section in TensorRT release note - https://docs.nvidia.com/deeplearning/tensorrt/release-notes/index.html <br><br />
Protobuf: https://github.com/onnx/onnx-tensorrt#dependencies (e.g. Protobuf >= 3.0.x)<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingDimensions()<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use the OnnxParser to parse the ONNX model, then build the engine as usual. If you're not familiar with the OnnxParser and engine building, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You can also use trtexec to do the same thing with the command below:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
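To also serialize the resulting engine to disk, append --saveEngine=your_model.engine (the same flag used in the dynamic-shape example above).<br />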
<br />
----<br />
=== If you meet errors when converting ONNX to an engine ===<br />
If you meet errors during parsing, add “--verbose” to the trtexec command line to see whether a particular node fails to parse, and check the following:<br><br />
1. Check the ONNX model using the checker function and see if it passes:<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, try onnx-simplifier on it (see the sketch after this list): https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn't work, there may be an unsupported operator causing the parsing error. Check the OnnxParser supported operators list here: https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md, or view the failing nodes in Netron to see if anything looks off. You could also try a newer ONNX opset when converting the model.<br />
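A minimal onnx-simplifier sketch (assuming the onnx-simplifier pip package; file names are illustrative):<br><br />
import onnx<br />
from onnxsim import simplify<br />
<br />
model = onnx.load("model.onnx")<br />
# simplify() folds constants and removes redundant nodes.<br />
model_simp, ok = simplify(model)<br />
assert ok, "simplified model failed the checker"<br />
onnx.save(model_simp, "model_simplified.onnx")<br />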
<br />
Tips:<br />
If you're converting a TF model to ONNX, you might try:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This helps to avoid some conversion errors.<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size (bytes)<br />
|-<br />
| Fixed shape [1, 3, 224, 224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time (ms)<br />
|-<br />
| Fixed shape [1, 3, 224, 224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results show:<br><br />
1. The engine size increases when the engine is built with a dynamic shape and an OptimizationProfile; the bigger the shapes you set, the bigger the engine, but the increase is small.<br><br />
2. Performance is best when the inference shape matches the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon (ONNX-GS) is a tool that allows you to easily generate new ONNX graphs or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow its readme to install it.<br />
This section introduces some use cases of modifying an ONNX model with ONNX-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0, only explicit batch ONNX models are supported, so some users need to make the model input dynamic. The sample below shows how to use ONNX-GS to modify the input without regenerating the ONNX model in the training framework.<br />
<br />
Here is a sample that makes the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still get an error while building the engine, locate the node that has static dims. Say it is a node named "OC2_DUMMY_1"; then change its first dim like this:<br />
import numpy as np<br />
# The values array is not writeable; copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is a sample that renames the input and output tensors:<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
ONNX-GS can also be used to modify a model to use a custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to pass the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
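In spirit, the replacement boils down to a rough ONNX-GS sketch like the one below (op names follow the sample; attribute handling is simplified, and the downloadable sample remains authoritative):<br><br />
import onnx<br />
import onnx_graphsurgeon as gs<br />
<br />
graph = gs.import_onnx(onnx.load("model.onnx"))<br />
node = [n for n in graph.nodes if n.op == "DCNv2"][0]<br />
# Create the plugin node, reusing the original node's tensors and attributes.<br />
plugin = gs.Node(op="DCNv2_SS", name=node.name + "_plugin", attrs=dict(node.attrs),<br />
                 inputs=list(node.inputs), outputs=list(node.outputs))<br />
graph.nodes.append(plugin)<br />
# Disconnect the original node so that cleanup() removes it.<br />
node.inputs.clear()<br />
node.outputs.clear()<br />
onnx.save(gs.export_onnx(graph.cleanup().toposort()), "model_plugin.onnx")<br />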
We can see below what the sample did; it replaces "DCNv2" with the plugin named "DCNv2_SS":<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]<br />
<br />
==== 4. Prune the model with a certain output layer ====<br />
Sometimes we need to prune the model to narrow down an issue. The script below lets you set a specific tensor as the model output:<br />
import onnx_graphsurgeon as gs<br />
import numpy as np<br />
import sys<br />
import onnx<br />
<br />
# Cut the model so that it ends at the given node's output tensor.<br />
print("cut model:", sys.argv[1], "to end with node", sys.argv[2])<br />
graph = gs.import_onnx(onnx.load(sys.argv[1]))<br />
tensors = graph.tensors()<br />
graph.outputs = [tensors[str(sys.argv[2])].to_variable(dtype=np.float32)]<br />
<br />
# Remove any unnecessary nodes or tensors, so that only the subgraph remains.<br />
graph.cleanup()<br />
<br />
om = str(sys.argv[1])<br />
new_onnx_model_name = om[:om.rfind(".")] + "_ended_node_" + sys.argv[2] + ".onnx"<br />
<br />
onnx.save(gs.export_onnx(graph), new_onnx_model_name)
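For example (file and tensor names are hypothetical), running python cut_model.py model.onnx Layer7_cov_Y writes model_ended_node_Layer7_cov_Y.onnx, a model whose output is the tensor Layer7_cov_Y.<br />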
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingDimensions()<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes. You also could try with newer onnx opset during the converting of onnx model.<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still got error during building the engine, you may find the node which has static dims, let's say it's node named "OC2_DUMMY_1", then you need to change its first dims like this.<br />
# Array is not writeable, need to copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below, it replace the "DCNv2" with the plugin named "DCNv2_SS":<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=542516TensorRT/ONNX2021-01-14T07:31:12Z<p>Lynettez: /* If you met some error during converting onnx to engine */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingDimensions()<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes.<br />
Or you could try with newer onnx opset during the converting of onnx model.<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still got error during building the engine, you may find the node which has static dims, let's say it's node named "OC2_DUMMY_1", then you need to change its first dims like this.<br />
# Array is not writeable, need to copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below, it replace the "DCNv2" with the plugin named "DCNv2_SS":<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=529671TensorRT/ONNX2020-09-25T02:33:48Z<p>Lynettez: /* How to use trtexec to run inference with dynamic shape? */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingDimensions()<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still got error during building the engine, you may find the node which has static dims, let's say it's node named "OC2_DUMMY_1", then you need to change its first dims like this.<br />
# Array is not writeable, need to copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below, it replace the "DCNv2" with the plugin named "DCNv2_SS":<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520516TensorRT/ONNX2020-07-03T02:29:55Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still got error during building the engine, you may find the node which has static dims, let's say it's node named "OC2_DUMMY_1", then you need to change its first dims like this.<br />
# Array is not writeable, need to copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below, it replace the "DCNv2" with the plugin named "DCNv2_SS":<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520511TensorRT/ONNX2020-07-03T02:22:38Z<p>Lynettez: /* Introduce some use cases of onnx-graphsurgeon */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
If you still got error during building the engine, you may find the node which has static dims, let's say it's node named "OC2_DUMMY_1", then you need to change its first dims like this.<br />
# Array is not writeable, need to copy it first.<br />
tensors["OC2_DUMMY_1"].values = np.array(tensors["OC2_DUMMY_1"].values)<br />
tensors["OC2_DUMMY_1"].values[0] = gs.Tensor.DYNAMIC<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
The figures below show what the sample did:<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettez
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model, refer to the below introduction of Onnx-GraghSurgeon<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520501TensorRT/ONNX2020-07-03T02:09:43Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520496TensorRT/ONNX2020-07-03T02:09:12Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
Original graph: <br><br />
[[File:Ori_model.jpeg|400px|thumb|left| original graph]]<br><br />
Graph with plugin(DCNv2_SS):<br><br />
[[File:Inserted_plugin_model.jpeg|400px|thumb|left|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520491TensorRT/ONNX2020-07-03T02:08:32Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
Original graph: <br><br />
[[File:Ori_model.jpeg|200px|thumb|left]]<br><br />
Graph with plugin(DCNv2_SS):<br><br />
[[File:Inserted_plugin_model.jpeg|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520486TensorRT/ONNX2020-07-03T02:07:24Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
Original graph: <br><br />
[[File:Ori_model.jpeg]]<br />
[[images/c/cd/Ori_model.jpeg|origin graph]]<br><br />
Graph with plugin(DCNv2_SS):<br><br />
[[images/a/aa/Inserted_plugin_model.jpeg|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520481TensorRT/ONNX2020-07-03T02:07:08Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change input/output names ====<br />
Here is a sample that renames the input and output tensors, reusing the tensors map from the previous example; export and save again afterwards as shown above.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS can also be used to modify a model so that it runs with a custom plugin.<br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer parameters to the plugin layer and replace the original node with the plugin node.<br><br />
The effect of the sample is shown below:<br><br />
Original graph:<br><br />
[[images/c/cd/Ori_model.jpeg|origin graph]]<br><br />
Graph with plugin (DCNv2_SS):<br><br />
[[images/a/aa/Inserted_plugin_model.jpeg|new graph]]<br />
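The downloadable sample contains the complete version; the following minimal sketch (the node name, plugin op name, and attributes are illustrative) shows the general replacement pattern with Onnx-GS:<br />
import onnx<br />
import onnx_graphsurgeon as gs<br />
<br />
graph = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
# Find the node to replace; the name is hypothetical -- look yours up in Netron<br />
old = [n for n in graph.nodes if n.name == "deform_conv_node"][0]<br />
# Create a node whose op matches the plugin name registered with TensorRT,<br />
# reusing the original node's inputs, outputs, and attributes<br />
plugin = gs.Node(op="DCNv2_SS", name=old.name + "_plugin", attrs=dict(old.attrs),<br />
                 inputs=list(old.inputs), outputs=list(old.outputs))<br />
graph.nodes.append(plugin)<br />
# Detach the old node so that cleanup() drops it<br />
old.inputs.clear()<br />
old.outputs.clear()<br />
onnx.save(gs.export_onnx(graph.cleanup().toposort()), "resnet10_plugin.onnx")<br />
</div>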
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
----<br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
Original graph: <br><br />
[[images/c/cd/Ori_model.jpeg|origin graph]]<br><br />
Graph with plugin(DCNv2_SS):<br><br />
[[images/a/aa/Inserted_plugin_model.jpeg|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520471TensorRT/ONNX2020-07-03T02:05:00Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
--- <br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. <br><br />
Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br><br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br><br />
We can see what the sample did as below:<br><br />
Original graph: <br><br />
[[images/c/cd/Ori_model.jpeg|origin graph]]<br><br />
Graph with plugin(DCNv2_SS):<br><br />
[[images/a/aa/Inserted_plugin_model.jpeg|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520466TensorRT/ONNX2020-07-03T02:03:59Z<p>Lynettez: /* 3. Add your Plugin */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
--- <br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. Download this [https://elinux.org/images/e/e6/Insert_dcn_plugin.zip sample].<br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br />
We can see what the sample did as below:<br />
Original graph: [[images/c/cd/Ori_model.jpeg|origin graph]]<br />
Graph with plugin(DCNv2_SS): [[images/a/aa/Inserted_plugin_model.jpeg|new graph]]</div>Lynettezhttps://elinux.org/index.php?title=File:Ori_model.jpeg&diff=520461File:Ori model.jpeg2020-07-03T02:03:37Z<p>Lynettez: </p>
<hr />
<div></div>Lynettezhttps://elinux.org/index.php?title=File:Inserted_plugin_model.jpeg&diff=520456File:Inserted plugin model.jpeg2020-07-03T02:01:24Z<p>Lynettez: </p>
<hr />
<div></div>Lynettezhttps://elinux.org/index.php?title=File:Insert_dcn_plugin.zip&diff=520451File:Insert dcn plugin.zip2020-07-03T01:54:18Z<p>Lynettez: </p>
<hr />
<div></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=520446TensorRT/ONNX2020-07-03T01:48:37Z<p>Lynettez: /* Some performance tests about dynamic shape with onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br><br />
<br />
--- <br />
=== Introduce some use cases of onnx-graphsurgeon ===<br />
ONNX GraphSurgeon(Onnx-GS) is a tool that allows you to easily generate new ONNX graphs, or modify existing ones. It was released with [https://github.com/NVIDIA/TensorRT/tree/master/tools/onnx-graphsurgeon TensorRT OSS]. You may follow the readme to install it.<br />
This section will introduce some use cases modifying the onnx model using Onnx-GS.<br />
==== 1. Make dynamic ====<br />
Since TRT 6.0 released, it only support explicit batch onnx model. So some users have to make the model input dynamic, below sample is showing how to use onnx-GS to modify the input without regenerating the onnx model in the training framework.<br />
<br />
Here is the sample to make the model input dynamic:<br />
import onnx_graphsurgeon as gs<br />
import onnx<br />
graph1 = gs.import_onnx(onnx.load("resnet10.onnx"))<br />
tensors = graph1.tensors()<br />
tensors["input"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_cov_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
tensors["Layer7_bbox_Y"].shape[0] = gs.Tensor.DYNAMIC<br />
onnx.save(gs.export_onnx(graph1.cleanup().toposort()), "resnet10_dynamic.onnx")<br />
<br />
==== 2. Change node's name ====<br />
Here is the sample changing input and output names.<br />
tensors["input"].name = "data"<br />
tensors["Layer7_cov_Y"].name = "Layer7_cov"<br />
tensors["Layer7_bbox_Y"].name = "Layer7_bbox"<br />
<br />
==== 3. Add your Plugin ====<br />
Onnx-GS also can be used for modifying the model with the custom plugin. Here is the sample:<br />
<br />
Note that this sample only shows how to apply the layer params to the plugin layer and replace the original node with the plugin node. <br />
We can see what the sample did as below:</div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513386TensorRT/ONNX2020-04-16T10:41:50Z<p>Lynettez: /* If you met some error during converting onnx to engine */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
Tips:<br />
If you’re converting tf model to onnx, you might have a try with:<br />
onnx_graph = tf2onnx.optimizer.optimize_graph(onnx_graph)<br />
This will help to avoid some converting error<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513166TensorRT/ONNX2020-04-15T02:41:52Z<p>Lynettez: /* TRT Inference with explicit batch onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. '''Fixed shape model'''<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. '''Dynamic shape model'''<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or using onnx API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still got below error after you modified the dimensions, that means you have to regenerate your onnx model with dynamic input shape. This error happened when you use dynamic shape for some model with static tensor, TRT 7.1 will support this case so that you don’t regenerate your model with the new version TRT.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \ # kMIN shape<br />
--optShapes=data:3x3x224x224 \ # kOPT shape<br />
--maxShapes=data:5x3x224x224 \ # kMAX shape<br />
--shapes=data:3x3x224x224 \ # Inference shape - this is like context->setBindingShape(3,3,224,224)<br />
--saveEngine=mobilenet_dynamic.engine<br />
<br />
If you have onnx exported from TF with input “x:0”, you also could run with<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use OnnxParser to parse the onnx model, and then build engine as usual, if you’re not familiar with onnxParser and building engine, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp<br><br />
<br />
You also could use trtexec to do the same thing with below cmd:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you met some error during converting onnx to engine ===<br />
If you met some error during parsing, please add “--verbose” into trtexec cmd line to see if there is anything wrong with parsing some node, and check below two things:<br><br />
1. Check ONNX model using checker function and see if it passes?<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, maybe try onnx-simplifier on it. https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t work, there could be some unsupported operator casung the parsing error. Please check the OnnxParser supported operators list here, https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. Or you need to see if anything looks off in Netron when viewing the failing nodes<br />
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size(bit)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not setting, default to [1, 3, 224 ,224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time(ms)<br />
|-<br />
| Fixed shape [1, 3, 224 ,224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results showed,<br><br />
1. The engine size will increase if it is built with dynamic shape and OptimizationProfile, the bigger shape you set, the bigger engine size. but it will not increase much.<br><br />
2. The performance will be the best when the inference shape is the same as the optShape you set.<br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513161TensorRT/ONNX2020-04-15T02:40:57Z<p>Lynettez: /* Some performance tests about dynamic shape with onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
----<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. Fixed shape model<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and TRT use "without batch size instead", it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. Dynamic shape model<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br><br />
<br />
----<br />
=== How to convert your model to onnx? ===<br />
1. Convert Pytorch model, you can use torch.onnx API, sample codes:<br><br />
https://gist.github.com/rmccorm4/b72abac18aed6be4c1725db18eba4930<br />
2. Convert Tensorflow model, using tf2onnx tool: <br><br />
https://github.com/onnx/tensorflow-onnx<br />
3. Convert Caffe model, using caffe2onnx tool, it supports less operators than others:<br><br />
https://github.com/htshinichi/caffe-onnx<br />
<br />
----<br />
=== How to modify the model to replace batch dimension with dynamic dim? ===<br />
If you’re not willing to regenerate the onnx model, you can just set input dimension after parsing.<br />
auto input = network->getInput(0);<br />
input->setDimensions(Dims4{-1, 3, 224, 224});<br />
Or, using the onnx Python API:<br />
import onnx<br />
<br />
model = onnx.load('alexnet_fixed.onnx')<br />
model.graph.input[0].type.tensor_type.shape.dim[0].dim_param = '?'<br />
onnx.save(model, 'dynamic_alexnet.onnx')<br />
onnx.checker.check_model(model)<br />
<br />
If you still get the error below after modifying the dimensions, you have to regenerate your onnx model with a dynamic input shape. This error happens when you use a dynamic shape with a model that contains static tensors; TRT 7.1 will support this case, so with that version you won’t need to regenerate your model.<br><br />
INTERNAL_ERROR: Assertion failed: mg.nodes[mg.regionIndices[outputRegion]].size == mg.nodes[mg.regionIndices[inputRegion]].size <br />
../builder/cudnnBuilderBlockChooser.cpp:136<br />
Aborting…<br />
<br />
----<br />
<br />
=== How to use trtexec to run inference with dynamic shape? ===<br />
trtexec --explicitBatch --onnx=mobilenet_dynamic.onnx \<br />
--minShapes=data:1x3x224x224 \<br />
--optShapes=data:3x3x224x224 \<br />
--maxShapes=data:5x3x224x224 \<br />
--shapes=data:3x3x224x224 \<br />
--saveEngine=mobilenet_dynamic.engine<br />
Here --minShapes, --optShapes and --maxShapes set the kMIN/kOPT/kMAX shapes of the optimization profile, while --shapes sets the actual inference shape, similar to calling context->setBindingDimensions() with 3x3x224x224 in the API.<br />
<br />
If your onnx model was exported from TF with an input named “x:0”, you can escape the name on the command line:<br><br />
trtexec … --shapes=\'x:0\':5x3x224x224 …<br />
<br />
----<br />
=== How to convert onnx model to a tensorrt engine? ===<br />
Use the OnnxParser to parse the onnx model, then build the engine as usual. If you’re not familiar with the OnnxParser or with building engines, please refer to https://github.com/NVIDIA/TensorRT/blob/master/samples/opensource/sampleOnnxMNIST/sampleOnnxMNIST.cpp; a short parsing sketch with error reporting follows below.<br><br />
<br />
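A sketch of the parsing step with error reporting, assuming the TensorRT Python API and the builder/network/parser/config objects created as in the fixed-shape example earlier (your_model.onnx is a placeholder):<br><br />
with open("your_model.onnx", "rb") as f:<br />
    if not parser.parse(f.read()):<br />
        # Print every parser error before giving up.<br />
        for i in range(parser.num_errors):<br />
            print(parser.get_error(i))<br />
        raise RuntimeError("failed to parse the ONNX model")<br />
engine = builder.build_engine(network, config)<br />
<br />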
You can also use trtexec to do the same thing with the command below:<br />
trtexec --explicitBatch --onnx=your_model.onnx<br />
<br />
----<br />
=== If you meet errors while converting ONNX to an engine ===<br />
If you hit an error during parsing, add “--verbose” to the trtexec command line to see which node fails to parse, then check the following:<br><br />
1. Check the ONNX model with the checker function and see whether it passes:<br />
import onnx<br />
model = onnx.load("model.onnx")<br />
onnx.checker.check_model(model)<br />
2. If (1) passes, try onnx-simplifier on the model (see the sketch after this list): https://github.com/daquexian/onnx-simplifier<br><br />
3. If (2) doesn’t help, there could be an unsupported operator causing the parsing error. Check the OnnxParser supported operators list here: https://github.com/onnx/onnx-tensorrt/blob/84b5be1d6fc03564f2c0dba85a2ee75bad242c2e/operators.md. You can also open the model in Netron and see whether anything looks off in the failing nodes.<br />
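For step 2, a small sketch of onnx-simplifier's Python interface (the file names are placeholders, and the API is assumed from the project's README):<br><br />
import onnx<br />
from onnxsim import simplify  # pip install onnx-simplifier<br />
<br />
model = onnx.load("model.onnx")<br />
model_simp, ok = simplify(model)  # returns (simplified model, validation flag)<br />
assert ok, "simplified ONNX model could not be validated"<br />
onnx.save(model_simp, "model_simplified.onnx")<br />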
<br />
----<br />
<br />
=== Some performance tests about dynamic shape with onnx model ===<br />
Test environment<br><br />
GPU: T4<br><br />
TensorRT: 7.0 <br><br />
CUDA: 10.2<br><br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! OptimizationProfile !! Engine size (bytes)<br />
|-<br />
| Fixed shape [1, 3, 224, 224] || - || 14487087<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] || Not set; defaults to [1, 3, 224, 224] || 14487537<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:5x3x224x224 --maxShapes=data:10x3x224x224 || 14595945<br />
|-<br />
| --minShapes=data:10x3x224x224 --optShapes=data:15x3x224x224 --maxShapes=data:20x3x224x224 || 14601941<br />
|-<br />
| --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 14604501<br />
|}<br />
<br />
{| class="wikitable"<br />
|-<br />
! MobilenetV2 !! Inference Batch !! Execution time (ms)<br />
|-<br />
| Fixed shape [1, 3, 224, 224] || 1 || 1.01<br />
|-<br />
| rowspan="3"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:8x3x224x224 --maxShapes=data:16x3x224x224 || 1 || 1.36<br />
|-<br />
| 8 || 4.47<br />
|-<br />
| 16 || 8.76<br />
|-<br />
| rowspan="4"| Dynamic shape [-1, 3, 224, 224] --minShapes=data:1x3x224x224 --optShapes=data:16x3x224x224 --maxShapes=data:32x3x224x224 || 1 || 1.44<br />
|-<br />
| 8 || 4.56<br />
|-<br />
| 16 || 8.23<br />
|-<br />
| 32 || 16.21<br />
|}<br />
As the test results show:<br><br />
1. The engine size increases when the engine is built with a dynamic shape and an OptimizationProfile; the larger the shapes you set, the larger the engine, though the growth is modest.<br><br />
2. Performance is best when the inference shape matches the optShape you set.<br></div>Lynettez
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br />
<br />
as the first log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and as the second log said, TRT use without batch size instead, it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. Dynamic shape model<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513091TensorRT/ONNX2020-04-15T01:34:04Z<p>Lynettez: /* TRT Inference with explicit batch onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. Fixed shape model<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br><br />
<br />
as the first log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and as the second log said, TRT use without batch size instead, it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. Dynamic shape model<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513086TensorRT/ONNX2020-04-15T01:33:30Z<p>Lynettez: /* TRT Inference with explicit batch onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. Fixed shape model<br><br />
<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br><br />
<br />
as the first log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and as the second log said, TRT use without batch size instead, it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. Dynamic shape model<br><br />
<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513081TensorRT/ONNX2020-04-15T01:33:03Z<p>Lynettez: /* TRT Inference with explicit batch onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
1. Fixed shape model<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br><br />
<br />
as the first log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and as the second log said, TRT use without batch size instead, it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
2. Dynamic shape model<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513076TensorRT/ONNX2020-04-15T01:32:18Z<p>Lynettez: /* TRT Inference with explicit batch onnx model */</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
# Fixed shape model<br />
===== === If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br><br />
<br />
as the first log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
=== =====<br />
and as the second log said, TRT use without batch size instead, it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
# Dynamic shape model<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT/ONNX&diff=513071TensorRT/ONNX2020-04-15T01:30:21Z<p>Lynettez: Created page with "'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model''' === TRT I..."</p>
<hr />
<div>'''This page intends to share some guidance regarding how to do inference with onnx model, how to convert onnx model and some common FAQ about parsing onnx model'''<br />
<br />
=== TRT Inference with explicit batch onnx model ===<br />
Since TensorRT 6.0 released and the ONNX parser only supports networks with an explicit batch dimension, this part will introduce how to do inference with onnx model, which has a fixed shape or dynamic shape.<br><br />
<br />
# Fixed shape model<br />
If your explicit batch network has fixed shape(N, C, H, W >= 1), then you should be able to just specific explicit batch flag and use executeV2() similar to how you used execute() in previous TensorRT versions.<br><br />
<br />
If you got below warning log when you’re trying to do inference with onnx model.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.<br><br />
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.<br><br />
<br />
as the first log said, your onnx model is explicit batch network, you need to specific the EXPLICIT_BATCH flag like this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126 <br><br />
<br />
and as the second log said, TRT use without batch size instead, it means that setting batch size with TRT API will be ignored, TRT will always execute inference with the explicit batch size of the network. <br><br />
<br />
# Dynamic shape model<br />
If your explicit batch network has dynamic shape(one of the dims == -1), then you should create an optimization profile for it. Then you set this optimization profile for your execution context. But also before doing inference, you’ll need to specify the shape at inference time based on the input.<br><br />
<br />
See a sample here:(https://github.com/lynettez/SampleONNX) <br></div>Lynettezhttps://elinux.org/index.php?title=TensorRT&diff=513066TensorRT2020-04-15T01:11:34Z<p>Lynettez: /* TRT ONNXParser FAQ */</p>
<hr />
<div>NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications. TensorRT-based applications perform up to 40x faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and finally deploy to hyperscale data centers, embedded, or automotive product platforms.<br />
<br><br />
<br />
== Introduction ==<br />
<br />
<br />
[https://developer.nvidia.com/tensorrt TensorRT Download]<br><br />
[https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html TensorRT Developer Guide]<br />
<br><br />
<br />
== FAQ ==<br />
<br />
<br />
=== Official FAQ ===<br />
[https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#troubleshooting TensorRT Developer Guide#FAQs]<br><br />
<br />
<br />
----<br />
=== Common FAQ ===<br />
You can find answers here for some common questions about using TRT.<br><br />
Refer to the page [https://elinux.org/TensorRT/CommonFAQ TensorRT/CommonFAQ]<br><br />
<br />
<br />
----<br />
=== TRT Accuracy FAQ ===<br />
If your FP16 result or Int8 result is not as expected, below page may help you fix the accuracy issues.<br><br />
Refer to the page [https://elinux.org/TensorRT/AccuracyIssues TensorRT/AccuracyIssues]<br><br />
<br />
<br />
----<br />
=== TRT Performance FAQ ===<br />
If the performance of doing inference with TRT is not as expected, below page may help you to optimize the performance.<br><br />
Refer to the page [https://elinux.org/TensorRT/PerfIssues TensorRT/PerfIssues]<br><br />
<br />
<br />
----<br />
<br />
=== TRT Int8 Calibration FAQ ===<br />
Below page will present some FAQs about TRT Int8 Calibration.<br><br />
Refer to the page [https://elinux.org/TensorRT/Int8CFAQ TensorRT/Int8CFAQ]<br><br />
<br />
<br />
----<br />
<br />
=== TRT Plugin FAQ ===<br />
Below page will present some FAQs about TRT Plugin.<br><br />
Refer to the page [https://elinux.org/TensorRT/PluginFAQ TensorRT/PluginFAQ]<br><br />
<br />
<br />
----<br />
=== How to fix some Common Errors ===<br />
If you met some Errors during using TRT, please find from below page for the answer.<br><br />
Refer to the page [https://elinux.org/TensorRT/CommonErrorFix TensorRT/CommonErrorFix]<br><br />
<br />
<br />
----<br />
=== How to debug or analyze === <br />
Below page will help you debugging your inferencing in some ways.<br><br />
Refer to the page [https://elinux.org/TensorRT/How2Debug TensorRT/How2Debug]<br><br />
<br />
<br />
----<br />
<br />
=== TRT & YoloV3 FAQ ===<br />
Refer to the page [https://elinux.org/TensorRT/YoloV3 TensorRT/YoloV3]<br><br />
<br />
<br />
----<br />
=== TRT ONNXParser FAQ ===<br />
If you have some question about onnx dynamic shape and onnx Parsing issues, this page might be helpful.<br><br />
Refer to the page [https://elinux.org/TensorRT/ONNX TensorRT/ONNX]<br><br />
<br />
<br />
----</div>Lynettezhttps://elinux.org/index.php?title=TensorRT&diff=513061TensorRT2020-04-15T01:11:09Z<p>Lynettez: /* TRT ONNXParser FAQ */</p>
<hr />
<div>NVIDIA TensorRT™ is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications. TensorRT-based applications perform up to 40x faster than CPU-only platforms during inference. With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and finally deploy to hyperscale data centers, embedded, or automotive product platforms.<br />
<br><br />
<br />
== Introduction ==<br />
<br />
<br />
[https://developer.nvidia.com/tensorrt TensorRT Download]<br><br />
[https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html TensorRT Developer Guide]<br />
<br><br />
<br />
== FAQ ==<br />
<br />
<br />
=== Official FAQ ===<br />
[https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html#troubleshooting TensorRT Developer Guide#FAQs]<br><br />
<br />
<br />
----<br />
=== Common FAQ ===<br />
You can find answers here for some common questions about using TRT.<br><br />
Refer to the page [https://elinux.org/TensorRT/CommonFAQ TensorRT/CommonFAQ]<br><br />
<br />
<br />
----<br />
=== TRT Accuracy FAQ ===<br />
If your FP16 result or Int8 result is not as expected, below page may help you fix the accuracy issues.<br><br />
Refer to the page [https://elinux.org/TensorRT/AccuracyIssues TensorRT/AccuracyIssues]<br><br />
<br />
<br />
----<br />
=== TRT Performance FAQ ===<br />
If the performance of doing inference with TRT is not as expected, below page may help you to optimize the performance.<br><br />
Refer to the page [https://elinux.org/TensorRT/PerfIssues TensorRT/PerfIssues]<br><br />
<br />
<br />
----<br />
<br />
=== TRT Int8 Calibration FAQ ===<br />
Below page will present some FAQs about TRT Int8 Calibration.<br><br />
Refer to the page [https://elinux.org/TensorRT/Int8CFAQ TensorRT/Int8CFAQ]<br><br />
<br />
<br />
----<br />
<br />
=== TRT Plugin FAQ ===<br />
Below page will present some FAQs about TRT Plugin.<br><br />
Refer to the page [https://elinux.org/TensorRT/PluginFAQ TensorRT/PluginFAQ]<br><br />
<br />
<br />
----<br />
=== How to fix some Common Errors ===<br />
If you met some Errors during using TRT, please find from below page for the answer.<br><br />
Refer to the page [https://elinux.org/TensorRT/CommonErrorFix TensorRT/CommonErrorFix]<br><br />
<br />
<br />
----<br />
=== How to debug or analyze === <br />
Below page will help you debugging your inferencing in some ways.<br><br />
Refer to the page [https://elinux.org/TensorRT/How2Debug TensorRT/How2Debug]<br><br />
<br />
<br />
----<br />
<br />
=== TRT & YoloV3 FAQ ===<br />
Refer to the page [https://elinux.org/TensorRT/YoloV3 TensorRT/YoloV3]<br><br />
<br />
<br />
----<br />
=== TRT ONNXParser FAQ ===<br />
If you have some question about onnx dynamic shape and onnx Parsing issues, this page might be helpful<br />
Refer to the page [https://elinux.org/TensorRT/ONNX TensorRT/ONNX]<br><br />
<br />
<br />
----</div>Lynettez