TensorRT/CommonErrorFix
Revision as of 01:41, 31 July 2019
How to fix the error “Could not find scales for tensor xxxx” for INT8 mode?
Generally, once INT8 calibration is done, the Int8Calibrator saves the scaling factors to a local file (through the writeCalibrationCache API). Subsequent runs then skip calibration and load the cached calibration table directly (through the readCalibrationCache API).
If you modify the network, or run it on a different GPU platform or with a different TensorRT version, you may get the error “Could not find scales for tensor xxxx”. It indicates that the builder could not find the corresponding scaling factor in the locally cached calibration table. This is expected, because the network graph after fusion can change across GPU platforms, across TensorRT versions, or after a modification to the network itself. The fix is simple: remove the local calibration table and run calibration again.
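The fix described above amounts to deleting one file before rebuilding the engine. A minimal sketch, assuming the calibrator stores its table at a path you control (the file name calibration.cache used in the usage note is hypothetical, not something TensorRT mandates):

```python
from pathlib import Path


def remove_stale_calibration_cache(cache_path):
    """Delete the cached calibration table so the next build recalibrates.

    Returns True if a file was removed, False if there was nothing to remove.
    """
    path = Path(cache_path)
    if path.is_file():
        path.unlink()
        return True
    return False
```

Calling remove_stale_calibration_cache("calibration.cache") after changing the network, the GPU platform, or the TensorRT version should force a fresh calibration, provided your readCalibrationCache implementation returns nothing when the file is missing.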
How to fix "LogicError: explicit_context_dependent failed" when running TRT Python in multiple threads?
If you use the common.py from the TRT samples to run inference in multiple threads and get the error below, this FAQ will help you fix it.
"pycuda._driver.LogicError: explicit_context_dependent failed: invalid device context - no currently active context?"
According to "PyCuda FAQ: How does PyCUDA handle threading", this error is caused by the worker thread having no active context.
Create a context as below before triggering the GPU task that reported the error:
    import pycuda.driver as cuda
    dev = cuda.Device(0)  # 0 is your GPU number
    ctx = dev.make_context()
and clean up after the GPU task with:
    ctx.pop()
    del ctx
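The two snippets above can be combined into one helper that each worker thread calls. This is a sketch under assumptions, not the samples' own code: the function name run_in_gpu_context and its cuda parameter are hypothetical, and the parameter exists only so that the helper can be exercised with a stand-in module on a machine without a GPU.

```python
import threading


def run_in_gpu_context(task, device_id=0, cuda=None):
    """Run task() with a CUDA context that is active in the calling thread."""
    if cuda is None:
        # Imported lazily so this module can be loaded without PyCUDA installed.
        import pycuda.driver as cuda
        cuda.init()
    # make_context() creates a context and makes it current for this thread.
    ctx = cuda.Device(device_id).make_context()
    try:
        return task()
    finally:
        # Always deactivate the context, even if the task raised.
        ctx.pop()
        del ctx


def start_gpu_worker(task, device_id=0, cuda=None):
    """Start a thread that wraps task() in its own context; returns the thread."""
    worker = threading.Thread(
        target=run_in_gpu_context, args=(task, device_id, cuda)
    )
    worker.start()
    return worker
```

A worker would then be launched with something like start_gpu_worker(my_inference_fn).join(), where my_inference_fn is whatever callable runs your inference. The try/finally is a design choice: without it, an exception in the task would skip the cleanup and leave the context on the thread's stack.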
Why does (Unnamed Layer* <N>) appear in the calibration table or verbose log?
For example, here is the calibration table for mnist:
    TRT-5105-EntropyCalibration2
    data: 3c000889
    conv1: 3c8954be
    pool1: 3c8954be
    conv2: 3dd33169
    pool2: 3dd33169
    (Unnamed Layer* 4) [Fully Connected]_output: 3dcbd455
    ip1: 3daeff02
    ip2: 3e7d50e9
    prob: 3c010a14
Here, (Unnamed Layer* 4) [Fully Connected]_output actually denotes the ip1 layer, and the ip1 entry that follows it denotes the relu1 layer.
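To inspect a table like the one above, it can help to split it into tensor names and numeric values. The sketch below assumes one entry per line and assumes that each hex string is the big-endian bit pattern of a 32-bit float scale; that reading is inferred from the values, not stated on this page. The function name parse_calibration_table is hypothetical.

```python
import struct


def parse_calibration_table(text):
    """Parse a calibration table into (header, {tensor_name: scale})."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header, entries = lines[0], lines[1:]
    scales = {}
    for entry in entries:
        # Tensor names may contain spaces and brackets, so split on the last colon.
        name, hex_value = entry.rsplit(":", 1)
        # Assumption: the value is the big-endian bit pattern of a float32 scale.
        raw = bytes.fromhex(hex_value.strip())
        scales[name.strip()] = struct.unpack("!f", raw)[0]
    return header, scales
```

Under that assumption the prob entry decodes to roughly 0.0079, which is close to 1/127, and any system-assigned names such as (Unnamed Layer* 4) [Fully Connected]_output show up as ordinary dictionary keys.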
How does it happen?
This happens because we use the top attribute to name the layer or tensor, but ip1 and relu1 in mnist.prototxt share the same top name, ip1. Hence one of the two has to use a system-assigned name.
    layer {
      name: "ip1"
      type: "InnerProduct"
      bottom: "pool2"
      top: "ip1"
      param { lr_mult: 1.0 }
      param { lr_mult: 2.0 }
      inner_product_param {
        num_output: 500
        weight_filler { type: "xavier" }
        bias_filler { type: "constant" }
      }
    }
    layer {
      name: "relu1"
      type: "ReLU"
      bottom: "ip1"
      top: "ip1"
    }
    layer {
      name: "ip2"
      type: "InnerProduct"
      bottom: "ip1"
      top: "ip2"
      param { lr_mult: 1.0 }
This has no impact on the execution behavior of the network.
It can be avoided by removing all in-place nodes. For mnist, set the top attribute of relu1 to its layer name, relu1, rather than ip1, and update the bottom attribute of ip2 to relu1 accordingly.
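To find the in-place nodes in a larger model, a small script can list every layer whose top name also appears among its bottom names. This is a rough sketch, not a full prototxt parser: it assumes braces are balanced for the layers of interest, that no brace characters occur inside quoted strings, and that the file has no comments. The function name find_in_place_layers is hypothetical.

```python
import re


def find_in_place_layers(prototxt_text):
    """Return names of layers whose top blob is also one of their bottoms."""
    in_place = []
    depth = 0
    fields = []  # characters found directly inside the current top-level block
    for ch in prototxt_text:
        if ch == "{":
            depth += 1
            if depth == 1:
                fields = []
        elif ch == "}":
            depth -= 1
            if depth == 0:
                text = "".join(fields)
                name = re.search(r'\bname:\s*"([^"]*)"', text)
                tops = set(re.findall(r'\btop:\s*"([^"]*)"', text))
                bottoms = set(re.findall(r'\bbottom:\s*"([^"]*)"', text))
                # A shared name between top and bottom marks an in-place layer.
                if name and tops & bottoms:
                    in_place.append(name.group(1))
        elif depth == 1:
            # Skip nested blocks such as param { ... } so their fields are ignored.
            fields.append(ch)
    return in_place
```

Run on the mnist fragment above, this reports relu1, the layer whose top should be renamed; after the rename and the matching change to the bottom of ip2, it reports nothing for those layers.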