Difference between revisions of "Jetson nsight system"

From eLinux.org

Revision as of 00:14, 7 January 2021

This page describes how to use Nsight Systems on a Jetson L4T system.

Installation

Install Nsight Systems on x86 Linux Host

1. Install Nsight Systems via SDK Manager

In SDK Manager, click Continue to install Nsight Systems on the x86 Linux host.

2. Verify Installation
After installation completes, you can launch it with the "nsight-sys" command.

Install Nsight Systems on Jetson Device

Note: a freshly installed Jetson system does not include Nsight Systems.
1. Installation Steps

  1. On the x86 host, launch the Nsight Systems instance installed via SDK Manager as described above.
  2. In Nsight Systems, create a new project and connect to the Jetson device. This step installs the Nsight Systems target binaries onto the Jetson device.

2. Verify Installation
After installation, nsys is located under /opt/nvidia/nsight_systems/.
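The verification step can be scripted. The following is a minimal sketch, assuming the install directory given above; `check_nsys` and the `NSYS_DIR` variable are hypothetical names introduced here for illustration, not part of Nsight Systems.

```shell
#!/bin/sh
# Assumed install directory, taken from the text above; override via NSYS_DIR if it differs.
NSYS_DIR="${NSYS_DIR:-/opt/nvidia/nsight_systems}"

# Hypothetical helper: succeed if an executable named nsys exists in the given directory.
check_nsys() {
    if [ -x "$1/nsys" ]; then
        echo "nsys found: $1/nsys"
        return 0
    fi
    echo "nsys not found under $1" >&2
    return 1
}

# Report the result without aborting the shell when nsys is missing.
check_nsys "$NSYS_DIR" || true
```

If the check fails on the Jetson, repeating the connection step from the host should reinstall the target binaries.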

Profile

Remote Profile (UI)

You can run Nsight Systems on the host and remotely profile an application running on the Jetson. Several options can be selected to enable the corresponding profiling.

Local Profile

To list the available profile options, run:

$ /opt/nvidia/nsight_systems/nsys profile --help

Profile Application

Run an application, capture its profile into a QDSTRM file, and view the file in the Nsight Systems GUI.

An example that profiles TensorRT (TRT) inference:
$ sudo /opt/nvidia/nsight_systems/nsys profile -t cuda,nvtx,nvmedia,osrt --accelerator-trace=nvmedia --show-output=true --force-overwrite=true --delay=20 --duration=90 --output=%p /usr/src/tensorrt/bin/trtexec --loadEngine=yolo_dla_0_bs20.engine --useDLACore=0 --batch=20
Options:
  --accelerator-trace=nvmedia : enable DLA profiling
  --delay=20 : start profiling 20 seconds after launch
  --duration=90 : profile for 90 seconds
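To reuse the same trace options with a different delay, duration, or target application, the command can be assembled by a small wrapper. This is only a sketch: `build_profile_cmd` and the `NSYS` variable are hypothetical names, and the function just prints the command line (a dry run) instead of executing it. The flags are the ones from the example above.

```shell
#!/bin/sh
# Path to nsys as given on this page; override via NSYS if it differs.
NSYS="${NSYS:-/opt/nvidia/nsight_systems/nsys}"

# Hypothetical helper: print the nsys command line for a given
# delay (seconds), duration (seconds), and target command.
build_profile_cmd() {
    delay="$1"
    duration="$2"
    shift 2
    cmd="$NSYS profile -t cuda,nvtx,nvmedia,osrt --accelerator-trace=nvmedia"
    cmd="$cmd --show-output=true --force-overwrite=true"
    cmd="$cmd --delay=$delay --duration=$duration --output=%p $*"
    printf '%s\n' "$cmd"
}

# Dry run: reproduce the TensorRT example from above.
build_profile_cmd 20 90 /usr/src/tensorrt/bin/trtexec \
    --loadEngine=yolo_dla_0_bs20.engine --useDLACore=0 --batch=20
```

After checking the printed line, it can be run with sudo as in the example above.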

Import the QDSTRM file into the Nsight Systems GUI:
1. TensorRT and DLA Inference Time

2. CUDA Kernel Run Time