The CodeSourcery ARM GNU/Linux toolchain is the version with support for the latest ARM architectures. Mainline gcc also has stable ARM support. Enhancements are made in the CodeSourcery version first and are then pushed back to mainline.
Note: One user reports odd behavior with CodeSourcery toolchain versions 2007q1-21 and 2008q1-126; he has been using 2007q3-51 for a couple of months now.
Note: Some users report problems with the Linux installer version. If the installer doesn't work for you, download the tar version (section Advanced Packages), copy the extracted arm-2007q3 directory to /opt/codesourcery/arm-none-linux-gnueabi/ and add /opt/codesourcery/arm-none-linux-gnueabi/arm-2007q3/bin to your PATH.
- Note: CodeSourcery 2008q1 has the following issues:
  - vectorization with NEON is broken
  - building static binaries with the cortex-a8 flag (or any other ARMv7-A core) is broken
  - some ARMv6 compilations end in an internal compiler error (ICE).
- Note: CodeSourcery 2008q3 (now replaced by 2008q3-72) has the following issues:
- Note: CodeSourcery 2008q3-72 has the following issues:
Note: If you only use OpenEmbedded (OE) to build code for your Beagle, you don't need to download the CodeSourcery compiler: OE builds a cross-compiler from source as part of the bitbake process.
ARM RVDS compiler
ARM RVDS tools can be used to build Linux applications and shared libraries by following Application Note 212, "Building Linux applications using RVCT v4.0 and the GNU Tools and Libraries".
ARM RVDS can also work in Scratchbox by following Application Note 221, "RealView Development Suite 4.0 ARM Compiler for Scratchbox".
ARM Cortex Floating Point
There are two types of instructions in the ARMv7 ISA that handle floating point:
1) VFPv3: the floating-point instruction set used for single- and double-precision scalar operations. gcc uses these instructions for C floating-point operations on 'float' and 'double'.
2) NEON: vectorized single-precision operations (2 values in a D register, or 4 values in a Q register). gcc can use these when -ftree-vectorize is enabled, -mfpu=neon is specified, and the code can be vectorized. In all other cases the VFPv3 scalar instructions are used.
ARM Cortex-A processors have separate floating point pipelines that handle these different instructions.
On Cortex-A8, the designers focused on NEON performance: the NEON unit can sustain a throughput of one instruction per cycle (processing 2 single-precision values at once). The scalar VFPv3 FPU cannot achieve this level of performance (cycle timings are in the Cortex-A8 Technical Reference Manual (TRM) download), but it is still a lot faster than doing floating point with integer instructions.
If you need the highest performance floating point on Cortex-A8, you need to use single precision and ensure the code uses the NEON vectorized instructions:
- use gcc with -ftree-vectorize (you may need to modify the source code to make it vectorization-friendly)
- use NEON intrinsics (#include <arm_neon.h>, the float32x2_t data type, vmul_f32(), etc.)
- use NEON assembly directly
Keep in mind that mixing NEON and ARM load/store instructions can sometimes cause significant stalls. See this link for more info.
Cortex-A9 has a much higher-performance floating-point unit, which can sustain a throughput of one instruction per cycle with low result latencies.