CELF Project Proposal/Combine tcg with tcc
- Combine qemu's tcg with tcc to create a new embedded compiler
- Rob Landley
The QEMU project has a fairly general purpose "Tiny Code Generator" which is capable of producing machine code for every target QEMU supports. This code generator is well maintained (by the qemu development community), operates extremely rapidly (producing code "on the fly"), and supports a large and increasing number of platforms, even distinguishing many specific variants within each platform (the qemu -cpu options).
Before QEMU, Fabrice Bellard's previous open source project was the Tiny C Compiler (tcc), which was notable for its small size (approximately 100k for a combined compiler/assembler/linker), its self-contained nature (not requiring external packages such as binutils), its speed of compilation (millions of lines of source code per second even on low-end hardware), and its "-run" mode (allowing use of C as a scripting language by starting a source file with "#!/usr/bin/tcc -run" and setting the executable bit on the source file).
Tinycc provides almost full C99 support (most notably missing complex number support and variable-length arrays). In 2004, tinycc became the only open source compiler other than gcc to compile a working Linux kernel (albeit in limited circumstances). Fabrice Bellard created "tccboot", a proof-of-concept project in which tcc was used to boot a Linux kernel directly from source code. The tccboot ISO image booted directly into a modified tcc binary bundled with a modified subset of the 2.4 Linux kernel source. It compiled this source to create a vmlinux, which it then executed.
QEMU actually started as an offshoot of tcc. When Fabrice looked into providing multiple output formats for tcc (to support targets other than 32-bit x86), he started playing with multiple input formats as well, such as pages of existing machine code. The result was qemu, which is actually a "dynamic recompiler" rather than an emulator.
The TCC project stalled when QEMU expanded to take up all of Fabrice's time, and the project remained moribund for several years. (Recently the original tcc has been relaunched as a Windows-centric project, but its current maintainer has shown little to no interest in Linux or non-x86 targets.)
The tcc codebase as Fabrice left it provides an almost complete C99 compiler. Combined with qemu's code generator, this could provide a small, fast compiler capable of running on and producing output for a wide range of embedded hardware.
Creating a "qcc" from tcc and tcg would involve:
1) Turn tcc into a "Swiss Army knife" executable (like busybox) so its individual functions could be called as cc, ld, as, strip, cpp, and so on.
1A) optional - use the Firmware Linux ccwrap.c code to increase understanding of gcc command line options.
1B) optional - add missing utilities such as readelf, objdump, objcopy...
2) Refactor the code to untangle preprocessor, compiler, assembler, and linker functions.
3) Replace existing target code generation with qemu's tcg.
4) Add support infrastructure for targets supported by tcg (assembly code parser, ELF header information)
5) Add missing functionality needed to build unmodified Linux 2.6, BusyBox, and uClibc. (The Linux kernel needs variable-length arrays, simple dead code elimination, assembly output (at least via objdump) to generate asm-offsets.h...)
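As an illustration of step 1, a busybox-style multi-call binary usually chooses its behavior from the name it was invoked under. The sketch below is not taken from tcc or busybox: the names find_tool, base_name, and the *_main stubs are hypothetical, and each stub only prints which tool was selected instead of doing real work.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Signature shared by every tool entry point (hypothetical). */
typedef int (*tool_fn)(int argc, char **argv);

/* Placeholder tools: a real qcc would call into the compiler, linker,
 * assembler, and so on. These stubs just announce themselves. */
static int cc_main(int argc, char **argv)    { (void)argc; (void)argv; puts("cc: compiler driver"); return 0; }
static int ld_main(int argc, char **argv)    { (void)argc; (void)argv; puts("ld: linker");          return 0; }
static int as_main(int argc, char **argv)    { (void)argc; (void)argv; puts("as: assembler");       return 0; }
static int strip_main(int argc, char **argv) { (void)argc; (void)argv; puts("strip: symbol stripper"); return 0; }
static int cpp_main(int argc, char **argv)   { (void)argc; (void)argv; puts("cpp: preprocessor");   return 0; }

/* Table mapping an invocation name to its entry point. */
struct tool {
    const char *name;
    tool_fn fn;
};

static const struct tool tools[] = {
    { "cc",    cc_main    },
    { "ld",    ld_main    },
    { "as",    as_main    },
    { "strip", strip_main },
    { "cpp",   cpp_main   },
};

/* Return the component of path after the last '/', or path itself
 * when it contains no '/'. */
const char *base_name(const char *path)
{
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Look up the tool matching the name the binary was invoked as.
 * Returns NULL when the name is not in the table. */
tool_fn find_tool(const char *argv0)
{
    const char *name = base_name(argv0);
    size_t i;

    for (i = 0; i < sizeof tools / sizeof tools[0]; i++) {
        if (strcmp(name, tools[i].name) == 0)
            return tools[i].fn;
    }
    return NULL;
}
```

A main() built on this would call find_tool(argv[0]) and, if the result is non-NULL, invoke it with argc and argv. Installing symlinks named cc, ld, as, strip, and cpp that all point at the one binary then gives each function its own command name.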