Tracing BOF at ELC2006

CELF Embedded Linux Conference Tracing BOF
At the recent Embedded Linux Conference (April 2006 in San Jose), representatives of three major kernel tracing systems got together to discuss ways to leverage each other's work and to reduce fragmentation and duplication of effort.

Present at the meeting were:
 * Tohru Nojiri (Hitachi) - LKST
 * Hirohisa Iijima (Lineo Solutions) - LKST
 * Takaaki Kasuga (Lineo Solutions)
 * Yoshihisa Ozawa (Lineo Solutions) - LKST (implementor of Kprobes for SH)
 * Mathieu Desnoyers - LTTng
 * Tim Bird - KFT

We started by discussing the various aspects of our systems: how they work and what capabilities they have.

(Sorry, I didn't write detailed notes here.)

A couple of miscellaneous items I can remember:
 * KFT instruments every kernel function. It is highly intrusive and is not appropriate for production-time tracing. Data is only output to user space when the trace is stopped. It uses a simple /proc interface for configuration and trace collection.
 * LKST has recently been modified to use Kprobes. Hitachi and Lineo are working to support Kprobes for the SH architecture.
 * LTT has a big focus on non-intrusive tracing and on being able to collect traces on a running production system.
 * LKST has good plotting post-processors.
 * LTTV is modular, supporting arbitrary plugins for different data views.
 * LTT has a document describing standards for trace data.

The trace format standard is available at: http://ltt.polymtl.ca/svn/ltt/branches/poly/doc/developer/format.html (Holy XML Batman, that's a complicated format) ''This format document does not (as of July 2010) appear to be available any more.''

We should produce a taxonomy of tracer attributes, in order to find points in common that might be shared in the future. Tim will come up with a list of questions about the attributes of each tracing system, to be used to characterize them.

Some possible areas of collaboration are:
 * macros to insert static trace points in the kernel
 * trace buffering mechanisms (Paul Mundt recently removed relayfs from the kernel - what to do now?)
 * control interface between user space and kernel
 * techniques for lockless or other non-intrusive, preemptible trace collection routines
 * agree on a standard for binary and/or ASCII trace log formats
 * rendering/presentation of trace data
 * tools for aggregation (summing, averaging, etc.) of trace data

Tim said that if we want to mainline a tracing system, we should specifically identify barriers to mainlining, and address the issues in a methodical fashion.

We agreed to use the ltt-dev mailing list for collaboration discussions.

We agreed to use the CELF wiki to share information and documents. (That's this page.)

We agreed to meet at the Ottawa Linux Symposium to discuss these issues further. There's a tracing BOF there, hosted by William E. Cohen of Red Hat. See http://www.linuxsymposium.org/2006/view_abstract.php?content_key=117

Action Items

 * Tim Bird will create a list of questions for the tracer survey and send them to the various tracer leads.