Difference between revisions of "Jetson/Computer Vision Performance"

From eLinux.org
(Added the OpenCV4Tegra presentation and made some minor touchups)
(Created the computer vision power draw table)

Revision as of 21:53, 13 June 2014

Hardware Acceleration of OpenCV

OpenCV is the de facto standard computer vision library, containing more than 2500 computer vision, image processing and machine learning algorithms. See Installing OpenCV on Jetson TK1 if you haven't done so yet. NVIDIA has significantly optimized OpenCV in two ways:

  1. OpenCV4Tegra: A free library provided by NVIDIA containing optimizations for NVIDIA's Tegra CPUs (ARM NEON SIMD optimizations, multi-core CPU optimizations and some GLSL GPU optimizations). OpenCV4Tegra is a binary replacement for the public OpenCV, so the programmer just writes regular OpenCV code and it automatically takes advantage of the OpenCV4Tegra optimizations, without the developer or user necessarily knowing about it. It has been supported on Android since Tegra 2 and is also supported on Linux4Tegra, Vibrante, etc. It typically provides a 2x to 5x speedup on Tegra K1 compared to regular OpenCV.
  2. OpenCV 'gpu' module: The 'gpu' module in the public OpenCV library is designed purely for CUDA GPGPU acceleration with NVIDIA's mobile & desktop GPUs. The developer must make minor changes to their code to explicitly call functions from the OpenCV 'gpu' module before their OpenCV code can take advantage of the GPU. In return, the developer can control memory allocations for the GPU, choose when data is transferred between CPU & GPU, choose which functions run on the GPU vs the CPU, and control the streaming or multi-GPU behaviour, etc. It has been an important part of OpenCV on desktop since 2010/2011, and is supported by most NVIDIA GPUs available today. Some functions (such as Haar Cascade Classifiers) are not as well suited to GPUs, so they only get minor speedups or are not available in the 'gpu' module, while other functions (such as LBP Cascade Classifiers, HOG, stereo vision, warping, etc) are much better suited to GPUs and can get 5x to 20x speedups on Tegra K1 compared to regular OpenCV.

Presentation videos about the OpenCV4Tegra module

A free online webinar (on NVIDIA's GTC Express page) introduces the OpenCV4Tegra module, from the actual OpenCV4Tegra development team:

  1. Introduction to OpenCV for Tegra (March 2013) describes OpenCV4Tegra including the installation steps for Android. Video and Slides.

Presentation videos about the OpenCV 'gpu' module

Two free online webinars (on NVIDIA's GTC Express page) introduce OpenCV's GPU module, from the actual OpenCV development team:

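  1. OpenCV - Accelerated Computer Vision using GPUs (June 2013) gives a non-technical overview of OpenCV and the GPU module, showing what is available and why you would want to use it. Video (http://on-demand.gputechconf.com/gtc/2013/webinar/opencv.mp4) and Slides (http://on-demand.gputechconf.com/gtc/2013/webinar/opencv-gtc-express-shalini-gupta.pdf).
  2. Getting Started with GPU-accelerated Computer Vision using OpenCV and CUDA (July 2013) is more technical: it shows how to install OpenCV's GPU module, explains the memory model of the GPU module, and shows how to combine OpenCV's GPU module with your own custom CUDA kernels. Video (http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-opencv-baksheev.mp4) and Slides (http://on-demand.gputechconf.com/gtc/2013/webinar/gtc-express-itseez-opencv-webinar.pdf).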


Power draw during computer vision tasks

The page Typical power draw of Jetson TK1 shows that the total power draw for Jetson TK1 is around 1.6W when idle (including the 0.4W fan) and is typically under 4W even in moderate use. However, computer vision tasks can often push hardware to its limits, so the following section gives detailed power measurements for various computer vision programs.

This table covers various OpenCV sample CPU programs (in the "opencv-2.4.9/samples/cpp" folder), OpenCV sample GPU programs (in the "opencv-2.4.9/samples/gpu" folder), some VisionWorks sample CPU+GPU programs (freely available from NVIDIA), and some CUDA sample computer vision programs (in the "NVIDIA_CUDA6.0_Samples/3_Imaging" folder).

These are total power measurements for the whole Jetson TK1 board when running from 12V with the default fan attached (using 0.4W), without any customizations or power savings applied. The OpenCV samples were executed remotely through Ethernet (where the Jetson TK1 was using 1.6W in between each of the OpenCV tests), while the VisionWorks and CUDA samples required a GPU-accelerated display and thus were executed through the Ubuntu Unity graphical desktop with an HDMI monitor and a USB hub + keyboard & mouse attached (hence the Jetson TK1 was using 3.4W in between each of the VisionWorks & CUDA tests).
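The two idle baselines quoted above can be subtracted from a whole-board reading to get a rough idea of how much power a sample adds on top of idle. The following is only a sketch of that arithmetic: the names `IDLE_WATTS` and `net_power` are hypothetical, and the example readings are copied from the measurement table on this page.

```python
# Idle power (Watts) of the whole board between tests, as quoted in the text:
# 1.6W when driven remotely over Ethernet (OpenCV samples),
# 3.4W with the Unity desktop, HDMI monitor and USB devices (VisionWorks & CUDA samples).
IDLE_WATTS = {"ethernet": 1.6, "desktop": 3.4}


def net_power(total_watts, setup):
    """Return the power a sample adds on top of the idle baseline for its test setup."""
    return total_watts - IDLE_WATTS[setup]


# (sample name, whole-board Watts, test setup), values taken from the table
examples = [
    ("kalman", 2.0, "ethernet"),
    ("pyrlk_optical_flow", 11.0, "ethernet"),
    ("bilateralFilter", 11.4, "desktop"),
]

for name, watts, setup in examples:
    print(f"{name}: {net_power(watts, setup):.1f}W above idle ({setup} baseline)")
```

With these inputs, pyrlk_optical_flow comes out at roughly 9.4W above idle and bilateralFilter at roughly 8.0W, so the higher desktop baseline should be kept in mind when comparing OpenCV rows against VisionWorks or CUDA rows.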

{| class="wikitable sortable" style="text-align:center"
|-
! Sample code !! Library !! Processor !! Approximate power (Watts) for whole Jetson TK1 board !! Performance
|-
| camshiftdemo || OpenCV || CPU || 3.5 ||
|-
| kalman || OpenCV || CPU || 2.0 ||
|-
| letter_recog || OpenCV || CPU || 4.9 ||
|-
| meanshift_segmentation || OpenCV || CPU || 4.7 ||
|-
| peopledetect || OpenCV || CPU || 4.9 ||
|-
| segment_objects || OpenCV || CPU || 2.5 ||
|-
| videostab || OpenCV || CPU || 4.9 ||
|-
| pyrlk_optical_flow || OpenCV || GPU || 11.0 ||
|-
| brox_optical_flow || OpenCV || GPU || 11.5 ||
|-
| bgfg_segm || OpenCV || CPU || 2.8 || ~7 FPS (MOG2 algorithm)
|-
| bgfg_segm || OpenCV || GPU || 2.4 || ~34 FPS (MOG2 algorithm)
|-
| hog || OpenCV || CPU || 4.6 || ~1.1 FPS
|-
| hog || OpenCV || GPU || 4.7 || ~5.0 FPS
|-
| farneback_optical_flow || OpenCV || CPU || 5.0 || ~0.24 FPS
|-
| farneback_optical_flow || OpenCV || GPU || 10.8 || ~0.46 FPS
|-
| stereo_match || OpenCV || CPU || 2.4 || ~3 FPS (BM algorithm)
|-
| stereo_match || OpenCV || GPU || 3.4 || ~24 FPS (BM algorithm)
|-
| feature_tracker || VisionWorks || GPU || 6 || ~40 FPS @ 720p (performing Color Conversion, Gaussian Pyramid, Pyramidal Optical Flow, plus Feature Tracking)
|-
| hough_lines || VisionWorks || GPU || 7 || ~30 FPS @ 720p (performing Color Conversion, Canny, plus Probabilistic Hough Lines)
|-
| motion_estimation || VisionWorks || GPU || 6 || ~20 FPS @ 720p (performing Color Conversion, plus IME Motion Estimation)
|-
| object_detector || VisionWorks || GPU || 7 || ~5 FPS @ 720p (performing Color Conversion, plus HOG Pedestrian Detection)
|-
| object_tracker || VisionWorks || GPU || 6 || ~60 FPS @ 720p (performing Color Conversion, Gaussian Pyramid, Forward Optical Flow, Backward Optical Flow, plus Median Flow)
|-
| pedestrian detector || VisionWorks || GPU || 10 || ~2 FPS @ 720p (performing Soft Cascade Classifier)
|-
| car detector || VisionWorks || GPU || 10 || ~5 FPS @ 720p (performing Soft Cascade Classifier)
|-
| SLAM || VisionWorks || GPU || 7 || ~25 FPS @ 480p (performing SLAM)
|-
| bilateralFilter || CUDA || GPU || 11.4 || ~34 FPS @ 640x480
|-
| boxFilter || CUDA || GPU || 7.0 || ~23 FPS @ 1024x1024
|-
| imageDenoising || CUDA || GPU || 6.0 || ~150 FPS @ 320x408
|-
| SobelFilter || CUDA || GPU || 4.6 || ~150 FPS @ 512x512
|}
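
For the four samples measured on both CPU and GPU, the power and frame rate columns can be combined into an approximate energy cost per frame (Watts divided by FPS gives Joules per frame). The sketch below is only an illustration of that arithmetic: the names `MEASUREMENTS`, `joules_per_frame` and `compare` are hypothetical, and it uses whole-board power without subtracting any idle baseline.

```python
# (cpu_watts, cpu_fps, gpu_watts, gpu_fps) copied from the table above
MEASUREMENTS = {
    "bgfg_segm": (2.8, 7.0, 2.4, 34.0),
    "hog": (4.6, 1.1, 4.7, 5.0),
    "farneback_optical_flow": (5.0, 0.24, 10.8, 0.46),
    "stereo_match": (2.4, 3.0, 3.4, 24.0),
}


def joules_per_frame(watts, fps):
    """Energy per processed frame: power (J/s) divided by frame rate (frames/s)."""
    return watts / fps


def compare(cpu_watts, cpu_fps, gpu_watts, gpu_fps):
    """Return (GPU speedup, CPU Joules per frame, GPU Joules per frame)."""
    speedup = gpu_fps / cpu_fps
    return (
        speedup,
        joules_per_frame(cpu_watts, cpu_fps),
        joules_per_frame(gpu_watts, gpu_fps),
    )


for name, row in MEASUREMENTS.items():
    speedup, cpu_j, gpu_j = compare(*row)
    print(f"{name}: {speedup:.1f}x faster on GPU, "
          f"{cpu_j:.2f} J/frame (CPU) vs {gpu_j:.2f} J/frame (GPU)")
```

By this estimate, bgfg_segm drops from about 0.4 J per frame on the CPU to about 0.07 J per frame on the GPU, while farneback_optical_flow rises slightly (about 20.8 J vs 23.5 J per frame), because its GPU version roughly doubles both the frame rate and the power draw.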