RichardB's notes from the seminar
These are notes from Silica's OMAP Workshop, 21 Jan. 09 – ARM, Cambridge, UK. TI's OMAP3 is used e.g. on BeagleBoard.
Contents
- 1 Cortex A8 Core – Bryan Lawrence – ARM
- 2 NEON SIMD – Ashley Stevens – ARM
- 3 OMAP35x Processor Overview – Chris Bowers – Snr Field Applications Engineer – TI
- 4 Understanding 2D/3D Graphics Dev using OMAP 35x - Jason Brand – Fields Apps Engineer – TI
- 5 ARM Software Development Tools – Elan Lennard – System Design Division – ARM
- 6 Tool Chain Overview – Chris Bower – TI
- 7 OMAP3 OS Support – Jason Brand – TI
- 8 Power for OMAP35x Processors – Miriam Corder – TI
Cortex A8 Core – Bryan Lawrence – ARM
- Cortex A8 Core is the design. OMAP is the physical implementation of this design by TI
- Cortex A8 is based on V7-A instruction-set architecture and includes:
- NEON advanced SIMD (multimedia accelerator – integer and floating-point SIMD (single-instruction multiple-data))
- Jazelle-RCT (Java accelerator)
- TrustZone security foundation (effectively virtualisation of the core)
- Particularly aimed at applications (rather than real-time or ‘deeply’ embedded)
- A8 Processor Core (design) can run up to 1 GHz+, c. 2000 DMIPS, depending on silicon
- MMU for OS virtual memory management
- Thumb-2 allows 16 & 32-bit instructions. Allows efficient, but small (compressed) code size if required
- Dependent on the compiler to produce better ‘code density’
- Thumb-2, e.g., gives a 29% reduction in Linux kernel size. See: http://www.arm.com/products/os/linux.html
- CoreSight; non-invasive real-time trace for debugging
- JTAG port
- Debug access port (DAP)
- Embedded Trace Macrocell (ETM) – captures instruction and data
NEON SIMD – Ashley Stevens – ARM
- Flexible, generic multimedia acceleration
- Higher power consumption than dedicated hardware, but supports emerging standards
- Hybrid 64/128-bit SIMD architecture
- Supports up to 64-bit integers, single-precision floating-point
- Adds additional registers
- Variety of ways to use: assembler, C intrinsics, through to OpenMAX DL library (recommended), vectorizing compilers (which generate NEON SIMD instructions)
- Provides, e.g., faster FFTs
- armcc vs gcc: armcc produces more compact, faster code.
- Lots of NEON-optimised codecs available
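The "C intrinsics" route listed above can be sketched roughly as follows. This is a minimal illustration, not code from the talk: the function name `add_f32` is hypothetical, and the NEON path assumes a compiler that ships `arm_neon.h` and defines `__ARM_NEON` (or `__ARM_NEON__`). On any other target the same function falls back to plain scalar C, so the sketch builds anywhere.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Element-wise dst[i] = a[i] + b[i].
 * add_f32 is a hypothetical helper name used only for this sketch. */
void add_f32(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    /* A 128-bit Q register holds four single-precision lanes,
     * so each iteration adds four floats with one instruction. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);
        float32x4_t vb = vld1q_f32(b + i);
        vst1q_f32(dst + i, vaddq_f32(va, vb));
    }
#endif
    /* Scalar tail; on non-NEON builds this loop does all the work. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The OpenMAX DL library and vectorizing-compiler routes aim at the same kind of lane-parallel code without hand-written intrinsics.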
OMAP35x Processor Overview – Chris Bowers – Snr Field Applications Engineer – TI
- TI have a range of microcontrollers through to application processors & DSPs
- OMAP tends to be seen in things like digital signage, POS terminals, portable infotainment etc. (lower power, high performance); “Laptop-like performance”
- Up to 1200 Dhrystone MIPS
- ARCHOS 7 Internet Media Tablet built on OMAP3
- TI are “nicely surprised” by things like BeagleBoard
- Has a DSP (in addition to NEON) for video processing, up to HD
- DSP is generic; not limited to video/audio processing
- Peripheral connectivity (USB, MMC, serial etc.)
- OMAP35 models:
- 3503 - ARM Cortex A8, Peripherals
- 3515 - ARM Cortex A8, Peripherals, PowerVR SGX (OpenGL ES) graphics engine
- 3525 - ARM Cortex A8, Peripherals, C64x DSP & video accelerator
- 3530 - ARM Cortex A8, Peripherals, PowerVR SGX (OpenGL ES) graphics engine, DSP & video accelerator
- Camera interface
- Auto-focus engine
- CCD & CMOS imager interface
- Preview engine etc.
- Display subsystem
- (24-bit RGB up to 1024x768 HD, 2 x 10-bit DACs; rotation, image resizing)
- Overlay, scaling, picture-in-picture
- Also discussed TI DaVinci platform: video-centric, based on ARM9, has some overlap with OMAP
- OMAP35x has a power-management module (PRCM); active and static (standby) modes of consumption
- Can reduce core voltage and frequency
- Various major components can be turned on/off as required – “power domains”
- Various complete boards available:
- OMAP35x evaluation module (EVM); OMAP 3530 plus touchscreen, RAM & NAND flash, Ethernet etc.
- BeagleBoard
- Gumstix Overo (tiny)
- LogicPD
- Analog & Micro
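The OMAP35 model list above is effectively a feature matrix (SGX graphics engine and/or DSP & video accelerator on top of the common Cortex A8 and peripherals). A minimal sketch of that matrix as a lookup table follows; the type, flag, and function names are all hypothetical and only the part numbers and features come from the notes.

```c
#include <stddef.h>

/* Optional blocks that distinguish the OMAP35x parts (names are illustrative). */
enum {
    FEAT_SGX = 1 << 0,  /* PowerVR SGX (OpenGL ES) graphics engine */
    FEAT_DSP = 1 << 1   /* C64x DSP & video accelerator */
};

struct omap35x_model {
    int part;           /* e.g. 3530 */
    unsigned features;  /* bitwise OR of FEAT_* flags */
};

/* Every part also has the ARM Cortex A8 and the peripherals. */
static const struct omap35x_model omap35x_models[] = {
    {3503, 0},
    {3515, FEAT_SGX},
    {3525, FEAT_DSP},
    {3530, FEAT_SGX | FEAT_DSP},
};

/* Returns 1 if the part has all requested features, 0 if not,
 * and -1 if the part number is not in the table. */
int omap35x_has(int part, unsigned features)
{
    size_t count = sizeof omap35x_models / sizeof omap35x_models[0];
    for (size_t i = 0; i < count; i++) {
        if (omap35x_models[i].part == part)
            return (omap35x_models[i].features & features) == features;
    }
    return -1;
}
```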
Understanding 2D/3D Graphics Dev using OMAP 35x - Jason Brand – Fields Apps Engineer – TI
- Lots of uses/major apps:
- Scalable UIs, navigation, games, visualisations, automotive
- OMAP 35x has NEON vector floating-point processor (VFP) +
- PowerVR SGX (graphics engine):
- Tile-based architecture
- Universal Scalable Shader Engine (USSE)
- Support for: OpenGL ES (Embedded Systems) 1.1 and 2.0, OpenVG 1.0 (to accelerate Adobe Flash and SVG Tiny (Scalable Vector Graphics), and UIs built on these)
- ~10M polygons/second, ~0.9 GFLOPS
- OpenGL ES is a well-defined subset of desktop OpenGL
- (lots of details on SGX engine)
- OpenGL ES support seems powerful
- Graphics SDK is available from TI; tools, headers, libs, demos etc
- IVA 2.2 – Image, Video, Audio subsystem- C64x DSP core:
- 32-bit fixed-point media processor
- Video & image accelerator
- TI supply compiler tools to optimize for this hardware
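Since the C64x core is described above as a fixed-point processor, it may help to see what fixed-point arithmetic looks like in portable C. The sketch below shows a Q15 multiply (values in [-1, 1) stored as 16-bit integers scaled by 2^15) with rounding and saturation. It is generic C for illustration only, not TI's compiler intrinsics, and the name `q15_mul` is hypothetical.

```c
#include <stdint.h>

/* Multiply two Q15 fixed-point numbers (hypothetical helper name).
 * Q15: the real value is raw / 32768, so 16384 represents 0.5. */
int16_t q15_mul(int16_t a, int16_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;  /* exact product, Q30 format */

    /* Add half an LSB, then drop 15 fractional bits to get back to Q15.
     * Assumes >> on a negative value is an arithmetic shift, which holds
     * on mainstream compilers but is implementation-defined in C. */
    p = (p + (1 << 14)) >> 15;

    /* Only -1.0 * -1.0 can overflow; clamp it to the largest Q15 value. */
    if (p > INT16_MAX)
        p = INT16_MAX;
    return (int16_t)p;
}
```

For example, `q15_mul(16384, 16384)` multiplies 0.5 by 0.5 and yields 8192, the Q15 encoding of 0.25.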
ARM Software Development Tools – Elan Lennard – System Design Division – ARM
- “Enabling all developers to get the best from their ARM-based system”
- Quality, high-performance s/w
- Tools: Compilation, Optimization, Middleware, Device Support, verification & debug, Fast simulation
- RealView Development Suite:
- Co-developed and validated with ARM processor IP; best code
- Extensive support for CoreSight (debug tech)
- Supports all ARM processors
- Std and Pro editions. Pro includes NEON compiler, RealView profiler, fast simulator (RTSM), ICE
- Automatic optimisation; data from the profiler feeds back into the compiler, giving some perf improvement (c. 6%) and a 40% (ish) code size reduction.
- Loop unrolling (where appropriate)
- Code reordering
- Link-time compilation; allows optimizations across source files, 5% size reduction, 5% perf improvement
- ARM compiler vs. GCC: ARM is 30% faster, 43% smaller. (similar when using Thumb code)
- NEON Vectorizing compiler; up to 400% (4x) performance improvement on a particular video decoder, compared to the regular ARM compiler
- ARM Workbench IDE – based on Eclipse 3.3
- ARM Eclipse plugins; ARM profiler, Flash programmer, ARM Linux project wizard etc. etc.
- Only really useful if RealView is used
- ARM Profiler:
- “Get the best out of ARM processors”
- Performance and code coverage analysis; detailed analysis of performance/usage, call-chain analysis
- Traces can be logged and replayed
- Completely non-intrusive; analyse running system/application
- A good example showed one instruction using 27% of application time
- RealView ICE and Trace:
- Hardware trace/debugger
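The "loop unrolling (where appropriate)" optimisation mentioned above is something the compiler normally does on its own, but writing it out by hand shows the idea. This is a minimal sketch with hypothetical function names. It says nothing about what RealView actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one add and one loop test per element. */
uint32_t sum_plain(const uint32_t *v, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Same result, unrolled by four: one loop test per four elements, and
 * four independent accumulators that a pipelined core can overlap. */
uint32_t sum_unrolled4(const uint32_t *v, size_t n)
{
    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; i++)  /* leftover elements when n is not a multiple of 4 */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

The trade-off is larger code for fewer branches, which is presumably why the notes qualify it with "where appropriate".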
Tool Chain Overview – Chris Bower – TI
- TI Code Composer Toolset
- DSPBIOS, low level ARM debug, DSP development and debug
- Montavista (for DaVinci)
- Linux-based, licensed through TI.
- Linux app development, Eclipse-based IDE
- Green Hills
- Integrity Linux based, MULTI debug environment for DSP and ARM. Application too
- Code Sourcery
- Linux (and Windows) – GNU Toolchain. For building Linux apps
- Eclipse-based for Personal and Pro editions
- MPC
- WinCE
- Microsoft Platform Builder
- The choice for Windows CE etc. development
- Lauterbach TRACE32
- Low-level debug of ARM & DSP
OMAP3 OS Support – Jason Brand – TI
- Fundamentally this is Linux or Windows CE.
- TI issue a Linux 2.6.22 kernel, includes lots of device drivers, EVM drivers, on top of which:
- There is also DSPBIOS – scheduler, resource manager for DSP.
- Also layers on top of these: codec interfaces, algorithm abstraction, OpenVG, OpenGL ES, audio/video (GStreamer) etc.
- Windows CE 6.0 can also function as the ‘underlying’ OS; some of the higher layers are different
- OMAP353 - Beta SDK:
- Board boot, test, & flash utils
- Platform support:
- U-boot Linux boot-load and flashing
- Linux kernel with drivers
- Root fs
- Demo apps
- Image viewer
- DaVinci I/F dump
- Code Sourcery tools
- X-Loader
- Small user boot-loader to boot from on-board flash
- Must be signed before use
- U-Boot
- The next-stage boot-loader
- Flexible open-source utility for boot-loading Linux
- Capable of reading kernel image from flash, Ethernet TFTP, and ?
- ITBOK (is the board ok)
- Based on u-boot
- Basic H/W functionality tests
- OMAP35x WinCE Support:
- TI seeing 40% WinCE vs 60% Linux
- MS suggest that total cost of development is lower, and time-to-market shorter, than with Linux
- BSQUARE’s WinCE 6.0 R2 BSP (board support package) Demo and Source is free with OMAP EVM
- Visual Tools plugin
- WinCE R2 Pro compiled with Visual Tools
- Various codecs, DirectShow filters etc. for a/v
- Production Tested (two full QA passes)
- 100% CETK passed
- Adobe Flash Lite 3 port available for WinCE R2 BSP – OMAP35x EVM
Power for OMAP35x Processors – Miriam Corder – TI
- Max power consumption is 360mW
- With dynamic voltage/freq scaling – averaging <100mW
- External power-management chips available (“analog companion”)
- Includes audio codec, RTC, USB OTG transceiver, battery charger etc.
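The saving from dynamic voltage/frequency scaling follows from the usual first-order model of CMOS dynamic power, P ≈ C·V²·f: lowering frequency helps linearly, while lowering voltage helps quadratically. The sketch below computes the power ratio between two operating points. The function name is hypothetical, any numbers fed to it are illustrative rather than TI figures, and static (leakage) power is not modelled.

```c
/* Ratio of dynamic power at a new operating point to a reference point,
 * using the first-order model P = C * V^2 * f (capacitance C cancels out).
 * Hypothetical helper for illustration; leakage power is ignored. */
double dynamic_power_ratio(double v_new, double v_ref,
                           double f_new, double f_ref)
{
    double v_scale = v_new / v_ref;
    double f_scale = f_new / f_ref;
    return v_scale * v_scale * f_scale;
}
```

For instance, halving the clock at constant voltage gives a ratio of 0.5, while halving both voltage and clock gives 0.125.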