Here is a listing of some instrumentation systems for the kernel:
Existing Instrumentation Systems
Andrew Morton's system for measuring intervals between kernel events:
Produces printk messages with extra timing data on them. As of kernel 2.6.11 this is part of the mainline kernel and is enabled by CONFIG_PRINTK_TIME. Earlier versions can add it via a very simple patch. It works for bootup time measurements, or for any other place where you can just drop in a printk or two.
See Printk Times
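To measure intervals between events, the timestamps on successive printk lines have to be subtracted from each other. The following is a minimal Python sketch of that step, assuming the usual "[ seconds.microseconds] message" prefix on each line; the function name printk_deltas is hypothetical and not part of any kernel tool.

```python
import re
import sys

# Assumed prefix format added by CONFIG_PRINTK_TIME: "[    1.234567] message"
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def printk_deltas(lines):
    """Return (timestamp, delta_since_previous_stamped_line, message) tuples."""
    results = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # ignore lines that carry no timestamp
        stamp = float(match.group(1))
        delta = 0.0 if previous is None else stamp - previous
        results.append((stamp, delta, match.group(2)))
        previous = stamp
    return results


if __name__ == "__main__":
    # Example use: dmesg | python3 printk_deltas.py
    for stamp, delta, message in printk_deltas(sys.stdin):
        print(f"{stamp:12.6f} (+{delta:9.6f}) {message}")
```

Large deltas point at the gaps worth investigating; the first stamped line is reported with a delta of zero.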
Starting from 2.6.28 the kernel has a feature to help optimize boot time: it records the timings of the initcalls. The output is meant to be parsed by the scripts/bootgraph.pl tool, which produces a graph of boot inefficiencies, giving a visual representation of the delays during initcalls. Users need to boot with the "initcall_debug" and "printk.time=1" parameters, and then run "dmesg | perl scripts/bootgraph.pl > output.svg" to generate the graph.
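When a quick text summary is enough, the same initcall_debug output can be ranked by duration without generating a graph. This is a hedged Python sketch, not a replacement for bootgraph.pl: it assumes log lines of the form "initcall NAME+0xOFF/0xLEN returned RET after N usecs", and the function name slowest_initcalls is hypothetical.

```python
import re
import sys

# Assumed initcall_debug line, for example:
#   [    0.200000] initcall pci_init+0x0/0x40 returned 0 after 1234 usecs
INITCALL = re.compile(
    r"initcall (\S+?)"                      # function name
    r"(?:\+0x[0-9a-f]+/0x[0-9a-f]+)?"       # optional +offset/length
    r"(?:\s+\[[^\]]+\])?"                   # optional [module] tag
    r" returned (-?\d+) after (\d+) usecs"
)


def slowest_initcalls(lines, top=10):
    """Return up to `top` (usecs, name, return_code) tuples, slowest first."""
    calls = []
    for line in lines:
        match = INITCALL.search(line)
        if match:
            name, ret, usecs = match.group(1), match.group(2), match.group(3)
            calls.append((int(usecs), name, int(ret)))
    calls.sort(key=lambda call: call[0], reverse=True)
    return calls[:top]


if __name__ == "__main__":
    # Example use: dmesg | python3 slowest_initcalls.py
    for usecs, name, ret in slowest_initcalls(sys.stdin):
        print(f"{usecs:10d} usecs  {name} (returned {ret})")
```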
Kernel Function Instrumentation (KFI)
A system which uses a compiler flag to instrument most of the functions in the kernel. Timing data is recorded at each function entry and exit. The data can be extracted and displayed later with a command-line program.
The kernel portion of this is available in the CELF tree now.
Grep for CONFIG_KFI.
See the Kernel Function Instrumentation page for some preliminary notes.
FIXTHIS - need to isolate this as a patch.
Linux Trace Toolkit
Kernel Tracer (in IKD patch)
This is part of a general kernel tools package, maintained by Andrea Arcangeli.
The ktrace implementation is in the file kernel/debug/profiler.c. It was originally written by Ingo Molnar, Richard Henderson and/or Andrea Arcangeli.
It uses the compiler flag -pg to add profiling instrumentation to the kernel.
Function trace in KDB
In January 2002 Jim Houston sent a patch to the kernel mailing list which provides support for compiler-instrumented function calls.
Ftrace is a simple function tracer which initially came from the -rt patches but was mainlined in 2.6.27. It uses the compiler's profiling support (the gcc -pg option) to insert an instrumentation call into each function. With CONFIG_DYNAMIC_FTRACE enabled, these calls can be overwritten with a NOP sequence so that overhead is minimal when tracing is disabled. A number of tracers in the kernel use ftrace to trace high-level events such as irq enabling/disabling, preemption enabling/disabling, scheduler events, and branch profiling.
The interface to ftrace can be found in /debugfs/tracing (that is, the tracing directory under wherever debugfs is mounted), and is documented extensively in Documentation/trace/ftrace.txt.
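As a rough sketch of how that file-based interface is driven, the Python helpers below select a tracer by writing its name to current_tracer and then read back the trace file. The file names available_tracers, current_tracer and trace are taken from the ftrace documentation; the helper names and the tracing_dir parameter are hypothetical, the directory has to be wherever the tracing files actually live on your system, and root access is normally required.

```python
from pathlib import Path


def set_tracer(tracing_dir, tracer):
    """Select `tracer` by writing its name to the current_tracer file."""
    directory = Path(tracing_dir)
    available = (directory / "available_tracers").read_text().split()
    if tracer not in available:
        raise ValueError(f"tracer {tracer!r} not in {available}")
    (directory / "current_tracer").write_text(tracer)


def read_trace(tracing_dir):
    """Return the current contents of the trace file as a string."""
    return (Path(tracing_dir) / "trace").read_text()


def trace_once(tracing_dir, tracer="function"):
    """Switch to `tracer`, grab the trace buffer, then switch back to nop."""
    set_tracer(tracing_dir, tracer)
    try:
        return read_trace(tracing_dir)
    finally:
        # "nop" is assumed to be listed in available_tracers.
        set_tracer(tracing_dir, "nop")
```

A call such as trace_once("/debugfs/tracing") would return whatever the function tracer has collected so far; taking the directory as a parameter keeps the sketch usable when debugfs is mounted somewhere else.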
SystemTap / Kprobes
SystemTap is a sophisticated kernel instrumentation tool that can be scripted with its own language to gather information about a running kernel. It uses the Kprobes infrastructure to implement its tracing.
Some random thoughts on instrumentation:
- Most instrumentation systems need lots of memory to buffer the data produced
- Some instrumentation systems support filters or triggers to allow for better control over the information saved
- Instrumentation systems tend to introduce overhead or otherwise interfere with the thing they are measuring
- Instrumentation systems tend to pollute the processor's cache lines
- There doesn't seem to be a single API for in-kernel timing instrumentation that is supported on many different architectures. This is the main reason for CELF's current project to define an Instrumentation API.