Kernel Function Trace

Introduction
Kernel Function Trace (KFT) is a kernel function tracing system that uses the "-finstrument-functions" capability of the gcc compiler to add instrumentation callouts to every function entry and exit. The KFT system captures these callouts and generates a trace of events with timing details. KFT gives a good timing overview of kernel procedures, letting you see where time is spent in kernel functions and subroutines.

The main mode of operation with KFT is a dynamic trace configuration: you set a trace configuration after kernel startup and can retrieve trace data immediately. A second, special mode of operation, called STATIC_RUN mode, compiles the configuration for a KFT run statically into the kernel. This mode is useful for getting a trace of kernel operation during system bootup (before user space is running).

The KFT configuration lets you specify how to automatically start and stop a trace, whether to include interrupts as part of the trace, and whether to filter the trace data by various criteria (minimum function duration, only certain listed functions, etc.). KFT trace data is read back after the trace is complete.

Tools are supplied to convert numeric trace data to kernel symbols, and to process and analyze the data in a KFT trace.

Basic Use
Documentation for KFT is available (as of 2.6.12) in Documentation/kft.txt, after applying the kft-all-in-one.patch.

An online guide is provided at Using Kernel Function Trace.

Here's a presentation about KFT usage:
 * Presentation: [[Media:KFI-presentation.pdf|Learning the Kernel and Finding Performance Problems with KFI]]
 * Sample trace used with presentation: [[Media:omap-serial_init.trace.txt|omap-serial_init.trace.txt]]

KFT used to be called KFI (for Kernel Function Instrumentation). For prior releases of KFT, see KernelFunctionInstrumentation.

Patches

 * Download directory with recent versions: ftp://dslab.lzu.edu.cn/pub/kft/
 * This is a fairly slow link - you can download the patch for 2.6.21 here: [[Media:kft-all-in-one-2.6.21.patch|kft-all-in-one-2.6.21.patch]]
 * Patches for Linux 2.6.8.1, 2.6.11 and 2.6.12: see the Patch Archive page (available as an all-in-one patch or a tar archive of broken-out patches)
 * Patch for Linux 2.6.11: (can just download [[Media:kfi-2.patch|kfi-2.patch]])
 * Patch for Linux 2.6.7 (for x86 only): [[Media:kfi-26-test1.patch|kfi-26-test1.patch]]
 * Patch for CELF kernel (based on linux-2.4.20): [[Media:kfi-24-test4.patch|kfi-24-test4.patch]]

KFT utilities
KFT includes several helper scripts, which are located in the kernel source tree:
 * addr2sym - convert function addresses to symbols in the trace data
 * kd - KFT dump - does filtering, sorting, analysis and trace formatting of KFT trace logs
 * mkkftrun.pl - used during building the kernel to convert a configuration file into a C file to be compiled into the kernel
 * sym2addr - convert function names to addresses in a KFT configuration file (for a dynamic trace)

See Documentation/kft.txt, in the kernel source tree after applying the patch, for instructions on using these programs.
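As an illustration of the address-to-symbol conversion that addr2sym performs, here is a minimal Python sketch. It is not the addr2sym script itself. It assumes a symbol table in the usual System.map layout ("address type name" per line), and the function names `load_symbols` and `addr_to_symbol` are hypothetical.

```python
import bisect


def load_symbols(system_map_text):
    """Parse System.map-style lines into parallel sorted lists.

    Each usable line is assumed to look like "c0100400 T start_kernel".
    Lines that do not have exactly three fields are skipped.
    """
    symbols = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        address, _kind, name = parts
        symbols.append((int(address, 16), name))
    symbols.sort()
    addresses = [address for address, _ in symbols]
    names = [name for _, name in symbols]
    return addresses, names


def addr_to_symbol(address, addresses, names):
    """Return the nearest symbol at or below address, or None if there is none."""
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None
    return names[index]
```

Looking up the nearest preceding symbol (rather than requiring an exact match) is a design choice of this sketch; it means an address inside a function still resolves to that function's name.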

How To Use

 * download the patch
 * apply the patch in the kernel top-level directory:
 * patch -p1 <kft.patch
 * read the rest of the instructions in the Documentation/kft.txt file. (my apologies for being lazy!)

Adding platform support for the kft clock source
The current patch (from Sep 2005) uses sched_clock as the clock source for kft_readclock. sched_clock is new in the 2.6 kernel, and returns a 64-bit value containing nanoseconds (not necessarily relative to any particular time base, but assumed to be monotonically increasing and relatively frequency-stable).

If your platform has good support for sched_clock, then KFT should work for you unmodified. If not, you may wish to do one of two things:
 * improve support for sched_clock in your board port, or
 * write a custom kft_readclock routine.

A "good" sched_clock routine will provide at least microsecond resolution on return values. Some architectures have sched_clock returning values based on the jiffies variable, which on many embedded platforms only has a resolution of 10 milliseconds.
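To see why a coarse clock matters, the following Python sketch models a clock that only advances in fixed ticks. It is an illustration only, not KFT code: the function names are hypothetical, and the 100 ticks per second used in the note below is an assumption chosen to match the 10 millisecond figure above.

```python
def tick_resolution_us(ticks_per_second):
    """Return the clock resolution in microseconds for a tick-based clock."""
    return 1_000_000 // ticks_per_second


def measured_duration_us(entry_us, exit_us, resolution_us):
    """Return the duration a tick-based clock would report.

    Both timestamps are rounded down to the last tick before subtracting,
    which is what reading a counter that only advances once per tick does.
    """
    def quantize(timestamp_us):
        return (timestamp_us // resolution_us) * resolution_us

    return quantize(exit_us) - quantize(entry_us)
```

With 100 ticks per second, a routine that really runs for 500 microseconds is reported as taking either 0 or 10000 microseconds, depending on whether a tick happened to fall inside it. A microsecond-resolution clock reports 500.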

There are some sample custom kft_readclock routines in the current patch for different architectures.

Issues
Here is a list of things that need more work:
 * may need to add noinstrument attributes for some time-critical code (need to check this)
 * maybe check "Function Trace in KDB" patch for help with this

Overhead
Mitsubishi measured the overhead of KFI (the predecessor to KFT). The period is from start_kernel to smp_init.

Platform was: SH7751R 240MHz (Memory Clock 80MHz)

Similar technologies
There are other technologies for doing call traces or kernel profiling that are similar to KFT. Some of these are mentioned on the Kernel Instrumentation page. One that is very similar is a kernel trace mechanism for use with KDB. A patch was posted to LKML in January of 2002. See the message: http://www.uwsg.iu.edu/hypermail/linux/kernel/0201.3/0888.html

Filter Q&A
Tim asked the question:

Q. Is there a way to adjust the trigger or filters to reduce the memory usage?

A. The memory usage is determined by the size of the log, which is specified by logentries in the KFT configuration. If logentries is not specified, it defaults to a rather large number (20,000 in the current code). To use a smaller trace log, specify a smaller number of logentries in the KFT configuration.

The use of triggers and filters can help you fit more data (or more pertinent data) into the log, so you can more readily see the information you are interested in.

Setting start and stop triggers with a narrower "range" of operation limits the amount of data put into the log. For example, the default configuration for a static trace uses

This will trace EVERYTHING that the kernel does between those two routines. However, you can limit tracing to a much smaller time window of kernel initialization using better triggers. Here is an example showing triggers for just watching mem_init:

Filters are also vital to reduce the number of entries in the trace log. With no time filters in place, KFT will log every single function executed by the kernel. This will quickly overrun the log (no matter what size you have reserved with logentries).

When using KFT to find long-duration functions in the kernel, we usually are not interested in routines that execute quickly, and instead use something like "filter mintime 500" to filter out routines taking less than 500 microseconds.
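The effect of a minimum-duration filter can be sketched in a few lines of Python. This is only a model of what "filter mintime 500" is described as doing above, applied to trace entries after the fact; the `Entry` record and `filter_mintime` function are hypothetical and do not reflect KFT's internal data layout.

```python
from collections import namedtuple

# Hypothetical trace record: entry time and duration in microseconds.
Entry = namedtuple("Entry", ["entry", "duration", "function"])


def filter_mintime(entries, mintime_us):
    """Keep only entries that lasted at least mintime_us microseconds."""
    return [e for e in entries if e.duration >= mintime_us]
```

Calling `filter_mintime(entries, 500)` drops every routine that took less than 500 microseconds, which is why far fewer entries compete for space in the log.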

Sample results
Here is an excerpt from a KFI log trace (processed with addr2sym). It shows all functions which lasted longer than 500 microseconds, from when the kernel entered start_kernel to when it entered to_userspace.

kft log output (excerpt)
The log is attached here: [[Media:Kfiboot-9.lst]]. A Delta value of 0 usually means the exit from the routine was not seen.

kft log analysis with 'kd'
Below is a kd dump of the data from the above log.

To find where the kernel spends large amounts of time, look at the functions with high "Local" time. For example, delay_tsc is called 156 times, resulting in 619 milliseconds of duration. Other time-consuming routines were isapnp_isolate, get_cmos_time, and default_idle.

The top line, showing schedule called 192 times and lasting over 5 seconds, is accounted incorrectly because execution control switches inside the schedule routine. (The count of 192 calls is correct, but the duration is wrong.)

$ ~/work/kft/kft/kd -n 30 kftboot-9.lst
Function                 Count Time     Average  Local
------------------------ ----- -------- -------- --------
schedule                   192  5173790    26946  5173790
do_basic_setup               1  1159270  1159270       14
do_initcalls                 1  1159256  1159256      627
__delay                    156   619322     3970        0
delay_tsc                  156   619322     3970   619322
__const_udelay             146   608427     4167        0
probe_hwif                   8   553972    69246      126
do_probe                    31   553025    17839       68
ide_delay_50ms             103   552588     5364        0
isapnp_init                  1   383138   383138       18
isapnp_isolate               1   383120   383120   311629
ide_init                     1   339778   339778       22
probe_for_hwifs              1   339756   339756      103
ide_scan_pcibus              1   339653   339653       13
init_setup_piix              2   339640   169820        0
ide_scan_pcidev              2   339640   169820        0
piix_init_one                2   339640   169820        0
ide_setup_pci_device         2   339640   169820      242
probe_hwif_init              4   339398    84849       40
time_init                    1   266911   266911        0
get_cmos_time                1   261404   261404   261404
ide_generic_init             1   214614   214614        0
ideprobe_init                1   214614   214614        0
wait_for_completion          6   194573    32428        0
default_idle               183   192589     1052   192589
io_schedule                 18   171313     9517        0
__wait_on_buffer            14   150369    10740      141
i8042_init                   1   137210   137210      295
i8042_port_register          2   135318    67659      301
__serio_register_port        2   135017    67508        0
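The Count, Time, Average and Local columns can be reproduced from raw entries with a short aggregation. The Python sketch below is not the kd script. It assumes a hypothetical `Entry` record with entry time and duration in microseconds, a single thread of execution, and properly nested calls; under those assumptions, Local is a function's total time minus the time spent in the functions it directly called.

```python
from collections import namedtuple

# Hypothetical trace record: entry time and duration in microseconds.
Entry = namedtuple("Entry", ["entry", "duration", "function"])


def summarize(entries):
    """Return (function, count, time, average, local) rows, largest time first."""
    stats = {}
    open_calls = []  # (end_time, function) for calls still in progress

    # Sort by entry time; on a tie the longer call is the outer one.
    for e in sorted(entries, key=lambda e: (e.entry, -e.duration)):
        # Discard calls that finished before this one started.
        while open_calls and open_calls[-1][0] <= e.entry:
            open_calls.pop()

        row = stats.setdefault(e.function, {"count": 0, "time": 0, "local": 0})
        row["count"] += 1
        row["time"] += e.duration
        row["local"] += e.duration

        # Time spent in this call is not local time of its direct caller.
        if open_calls:
            stats[open_calls[-1][1]]["local"] -= e.duration

        open_calls.append((e.entry + e.duration, e.function))

    rows = [
        (name, r["count"], r["time"], r["time"] // r["count"], r["local"])
        for name, r in stats.items()
    ]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows
```

The integer division for the average matches the rounded-down values in the table (for example, 5173790 / 192 is shown as 26946). A call whose exit was never seen has a duration of 0 and contributes to Count only.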

kft nested call trace with 'kd -c'
Below is a kd -c trace of the data from a log taken from a PPC440g platform, from a (dynamic) trace of the function do_fork.

Here is the configuration file that was used:

Here is the first part of the trace in nested call format. Times (Entry, Duration and Local) are in microseconds. Note the timer interrupt during the routine.

To see the full trace, go to the KftDoForkTrace page.
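For readers who want to see how a nested call view can be derived from flat entries, here is a minimal Python sketch. It does not reproduce the exact output format of kd -c. The `Entry` record and `nested_trace` function are hypothetical, and the sketch assumes a single thread of execution with properly nested calls.

```python
from collections import namedtuple

# Hypothetical trace record: entry time and duration in microseconds.
Entry = namedtuple("Entry", ["entry", "duration", "function"])


def nested_trace(entries, indent="  "):
    """Return one formatted line per entry, indented by call depth."""
    lines = []
    open_calls = []  # end times of calls still in progress

    # Sort by entry time; on a tie the longer call is the outer one.
    for e in sorted(entries, key=lambda e: (e.entry, -e.duration)):
        # Calls that ended before this entry are no longer on the stack.
        while open_calls and open_calls[-1] <= e.entry:
            open_calls.pop()

        depth = len(open_calls)
        lines.append(f"{e.entry:>8} {e.duration:>8} {indent * depth}{e.function}")
        open_calls.append(e.entry + e.duration)

    return lines
```

Each line carries the entry time, the duration, and the function name indented once per enclosing call, so the callers of any routine can be read by scanning upward to the nearest less-indented line.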