This page shares guidance on how to run inference with an ONNX model, how to convert an ONNX model, and some frequently asked questions about parsing ONNX models.
TRT inference with an explicit batch ONNX model
With TensorRT 6.0 released and the ONNX parser supporting only networks with an explicit batch dimension, this part describes how to run inference with an ONNX model that has either a fixed shape or a dynamic shape.
- Fixed shape model
If your explicit batch network has a fixed shape (N, C, H, and W are all >= 1), you should be able to just specify the explicit batch flag and use executeV2(), similar to how you used execute() in previous TensorRT versions.
You may see the following warnings when you try to run inference with an ONNX model:
[W] [TRT] Explicit batch network detected and batch size specified, use enqueue without batch size instead.
[W] [TRT] Explicit batch network detected and batch size specified, use execute without batch size instead.
Both warnings mean that your ONNX model is an explicit batch network, so you need to specify the EXPLICIT_BATCH flag when creating the network, as in this line: https://github.com/NVIDIA/TensorRT/blob/f5e8b8c55060447c4d3a45e7379bc25374edc93f/samples/opensource/sampleDynamicReshape/sampleDynamicReshape.cpp#L126
They also mean that any batch size set through the TRT API is ignored. Use the enqueue or execute call that takes no batch size (enqueueV2() or executeV2()), because TRT always runs inference with the explicit batch size of the network.
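As a rough sketch of the fixed-shape case in Python, assuming the TensorRT 7.x Python bindings, the explicit batch flag and the execute call without a batch size could look like the following. The function names (`explicit_batch_flags`, `build_fixed_shape_engine`, `run_fixed_shape`) and the workspace size are illustrative choices, not taken from the linked sample. `tensorrt` is imported inside the function only so the small flag helper can be used on a machine without TensorRT installed.

```python
def explicit_batch_flags(explicit_batch_bit: int = 0) -> int:
    """Return a network-creation bitmask with only the explicit batch bit set.

    The default of 0 assumes EXPLICIT_BATCH is enum value 0; the builder
    function below passes the real enum value instead of relying on that.
    """
    return 1 << explicit_batch_bit


def build_fixed_shape_engine(onnx_path: str):
    """Parse a fixed-shape ONNX file into an explicit batch network and build an engine."""
    import tensorrt as trt  # lazy import: only needed when actually building

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    flags = explicit_batch_flags(int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    network = builder.create_network(flags)
    parser = trt.OnnxParser(network, logger)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            errors = [str(parser.get_error(i)) for i in range(parser.num_errors)]
            raise RuntimeError("ONNX parse failed:\n" + "\n".join(errors))

    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 28  # 256 MiB, an arbitrary illustrative value
    return builder.build_engine(network, config)


def run_fixed_shape(engine, device_pointers):
    """Run synchronous inference; note that no batch size argument is passed.

    device_pointers: one integer GPU address per binding, in binding order,
    allocated by the caller (for example with PyCUDA).
    """
    context = engine.create_execution_context()
    return context.execute_v2(bindings=device_pointers)
```

Because the batch dimension is part of the network's input shape here, there is nothing to pass to `execute_v2` besides the buffers.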
- Dynamic shape model
If your explicit batch network has a dynamic shape (at least one of the dims is -1), you should create an optimization profile for it and then set that profile on your execution context. Before running inference, you also need to specify the actual input shape for that inference call.
See a sample here: https://github.com/lynettez/SampleONNX
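The dynamic-shape steps can be sketched in Python roughly as follows, again assuming the TensorRT 7.x Python bindings. The helper names (`shape_in_profile`, `add_profile`, `prepare_context`) are hypothetical, and the sketch assumes a single input at binding index 0 and a single profile at index 0. The builder, config, and engine objects are passed in by the caller, so this block itself does not import `tensorrt`.

```python
def shape_in_profile(shape, min_shape, max_shape):
    """True if shape has the profile's rank and every dim lies within [min, max]."""
    return (len(shape) == len(min_shape) == len(max_shape)
            and all(lo <= d <= hi for d, lo, hi in zip(shape, min_shape, max_shape)))


def add_profile(builder, config, input_name, min_shape, opt_shape, max_shape):
    """Build time: register min/opt/max shapes for one dynamic input."""
    profile = builder.create_optimization_profile()
    profile.set_shape(input_name, min_shape, opt_shape, max_shape)
    config.add_optimization_profile(profile)


def prepare_context(engine, input_shape, min_shape, max_shape):
    """Inference time: select the profile, then set the concrete input shape."""
    if not shape_in_profile(input_shape, min_shape, max_shape):
        raise ValueError(f"{input_shape} is outside the profile range "
                         f"{min_shape}..{max_shape}")
    context = engine.create_execution_context()
    context.active_optimization_profile = 0    # assumes the profile has index 0
    context.set_binding_shape(0, input_shape)  # assumes the input is binding 0
    if not context.all_binding_shapes_specified:
        raise RuntimeError("some dynamic input shapes are still unspecified")
    return context
```

For example, with a profile of min `(1, 3, 224, 224)` and max `(8, 3, 224, 224)`, a request for `(4, 3, 224, 224)` passes the range check, while `(16, 3, 224, 224)` is rejected before TensorRT is asked to set the shape. After `prepare_context` returns, inference is run with `execute_v2` exactly as in the fixed-shape case.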