Kernel Timer Systems

From eLinux.org
This page has links to information about the (relatively) new timer systems for the Linux kernel. The Linux kernel received a major enhancement to its timer system (as of about 2.6.21), which solved a number of problems.

A good article on the subject is at lwn.net: Clockevents and dyntick (http://lwn.net/Articles/223185/)


Revision as of 21:34, 12 August 2013

Timer Wheel, Jiffies and HZ (or, the way it was)

The original kernel timer system (called the "timer wheel") was based on incrementing a kernel-internal value (jiffies) every timer interrupt. The timer interrupt becomes the default scheduling quantum, and all other timers are based on jiffies. The timer interrupt rate (and jiffy increment rate) is defined by a compile-time constant called HZ. Different platforms use different values for HZ. Historically, the kernel used 100 as the value for HZ, yielding a jiffy interval of 10 ms. With 2.6, the HZ value for i386 was changed to 1000, yielding a jiffy interval of 1 ms. Later (2.6.13) the kernel changed the default HZ for i386 to 250, because 1000 was deemed too high.
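
The relationship between HZ and the jiffy interval is simple arithmetic. As a minimal sketch (the function name `jiffy_interval_ns` is made up for this page and is not a kernel symbol):

```python
def jiffy_interval_ns(hz: int) -> int:
    """Return the length of one jiffy in nanoseconds for a given HZ."""
    if hz <= 0:
        raise ValueError("HZ must be positive")
    # One second is 1,000,000,000 ns and is divided into HZ jiffies.
    return 1_000_000_000 // hz


# The HZ values mentioned in the text above.
for hz in (100, 250, 1000):
    print(f"HZ={hz}: {jiffy_interval_ns(hz) / 1e6:g} ms per jiffy")
```

This prints 10 ms, 4 ms and 1 ms for HZ values of 100, 250 and 1000 respectively.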

Ingo Molnar's explanation of timer wheel performance

Ingo Molnar did an in-depth explanation about the performance of the current "timer wheel" implementation of timers. This was part of a series of messages trying to justify the addition of ktimers (which have different characteristics).

It is possibly the best explanation of the timer wheel available: See http://lkml.org/lkml/2005/10/19/46 and http://lwn.net/Articles/156329/

ktimers

Update: ktimers have been replaced by the hrtimer framework, also written by Thomas Gleixner, which uses a set of functions and data structures in linux/ktime.h.

Material needs rework

A bunch of material in this section needs to be created or expanded to take into account the hrtimer system by Thomas Gleixner.

clock events

Basic LWN coverage on clockevents concepts: http://lwn.net/Articles/223185/

clocksource

Clocksource documentation patch that didn't get accepted: http://article.gmane.org/gmane.linux.kernel/1062438. It has some coverage of clock sources, but read the responses to the patch before relying on it.

Clocksource is related to, or the same as, the GTOD (Generic Time of Day) work by John Stultz that the hrtimer framework depends on (as mentioned on p. 18 of the OLS 2006 slides).

Also refer to the kernel documentation, High resolution timers and dynamic ticks design notes (https://www.kernel.org/doc/Documentation/timers/highres.txt), for some notes on clock sources.

Timer information

There are two /proc files that are very useful for gathering information about timers on your system.

/proc/timer_list

/proc/timer_list has information about the currently configured clocks and timers on the system. This is useful for debugging the current status of the timer system (especially while you are developing clockevent and clocksource support for your platform.)

You can tell if high resolution is configured for your machine by looking at a few different things:

For standard resolution (at jiffy resolution), a clock will have a value for its '.resolution' field equal to the period of a jiffy. For embedded machines, where HZ is typically 100, this will be 10 milliseconds, or 10000000 (ten million) nanoseconds.

Also for standard resolution, the Clock Event Device will have an event handler of "tick_handle_periodic".

For high resolution, the resolution of the clock will be listed as 1 nanosecond (which is ridiculous, but serves as an indicator of essentially arbitrary precision.) Also, the Clock Event Device will have an event handler of "hrtimer_interrupt".
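
The checks described above can be scripted. The following is only a sketch: the function name `timer_mode` and both sample excerpts are illustrative and heavily abbreviated, and the exact layout of /proc/timer_list varies between kernel versions. On a real system you would pass in the contents of /proc/timer_list (reading it may require root).

```python
import re

# Abbreviated, illustrative excerpts in the style of /proc/timer_list.
SAMPLE_HIGHRES = """\
 clock 0:
  .resolution: 1 nsecs
Clock Event Device: example_timer
 event_handler:  hrtimer_interrupt
"""

SAMPLE_PERIODIC = """\
 clock 0:
  .resolution: 10000000 nsecs
Clock Event Device: example_timer
 event_handler:  tick_handle_periodic
"""


def timer_mode(timer_list_text: str) -> str:
    """Classify timer_list output using the two indicators from the text."""
    handlers = set(re.findall(r"event_handler:\s*(\S+)", timer_list_text))
    resolutions = {
        int(ns)
        for ns in re.findall(r"\.resolution:\s*(\d+)\s*nsecs", timer_list_text)
    }
    # High resolution: hrtimer_interrupt handler, or a 1 ns clock resolution.
    if "hrtimer_interrupt" in handlers or 1 in resolutions:
        return "high resolution"
    # Standard resolution: the periodic tick handler is installed.
    if "tick_handle_periodic" in handlers:
        return "standard resolution"
    return "unknown"


print(timer_mode(SAMPLE_HIGHRES))
print(timer_mode(SAMPLE_PERIODIC))
```

Anything that matches neither indicator is reported as "unknown" rather than guessed at.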


[need more info here - and this should probably be written up and put in Documentation/filesystems/proc.txt]

/proc/timer_stats

/proc/timer_stats is a file in the /proc pseudo file system that shows which routines are requesting timers from the Linux kernel. By cat'ing this file, you can see which routines are using lots of timers, and how frequently they are requesting them. This can be of interest when tracking down what is waking the system up.

To use /proc/timer_stats, configure the kernel with support for the feature. That is, set CONFIG_TIMER_STATS=y in your .config. This is on the Kernel Hacking menu, with the prompt: "Collect kernel timers statistics"

Compile and install your kernel, and reboot your machine.

To activate the collection of stats (and reset the counters), do "echo 1 >/proc/timer_stats"

To stop collecting stats, do "echo 0 >/proc/timer_stats"

You can dump the statistics either while the collection system is running or stopped. To dump the stats, use 'cat /proc/timer_stats'. This shows the average events/sec at the end as well so you get a rough idea of system activity.

/proc/timer_stats fields (for version 0.1 of the format) are:

<count>,  <pid> <command>   <start_func> (<expire_func>)
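
To make the field layout concrete, here is a small parsing sketch. The regular expression, the `TimerStat` type and the sample line are assumptions made for illustration, based only on the field description above. It also assumes the command name contains no spaces.

```python
import re
from typing import NamedTuple, Optional


class TimerStat(NamedTuple):
    count: int
    pid: int
    command: str
    start_func: str
    expire_func: str


# <count>,  <pid> <command>  <start_func> (<expire_func>)
_LINE = re.compile(r"^\s*(\d+),\s+(\d+)\s+(\S+)\s+(\S+)\s+\((\S+)\)\s*$")


def parse_timer_stats_line(line: str) -> Optional[TimerStat]:
    """Parse one data line; return None for headers and summary lines."""
    match = _LINE.match(line)
    if match is None:
        return None
    count, pid, command, start_func, expire_func = match.groups()
    return TimerStat(int(count), int(pid), command, start_func, expire_func)


# Illustrative line, not captured from a real machine.
print(parse_timer_stats_line("  15,     1 swapper   hcd_submit_urb (rh_timer_func)"))
```

Lines that do not match the layout, such as the version header or the events/sec summary, return None.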

Dynamic ticks

Tickless kernel, dynamic ticks or NO_HZ is a config option that enables a kernel to run without a regular timer tick. The timer tick is a timer interrupt that is usually generated HZ times per second, with the value of HZ being set at compile time and varying between about 100 and 1500. Running without a timer tick means the kernel does less work when idle and can potentially save power because it does not have to wake up regularly just to service the timer. The configuration option is CONFIG_NO_HZ; its prompt is "Tickless System (Dynamic Ticks)", on the Kernel Features configuration menu.

Testing

To tell if dynamic ticks is supported in your kernel you can:

Look in dmesg for a line like this one:

 # dmesg | grep -i nohz
 Switched to NOHz mode on CPU #0

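Or look at the timer interrupts and compare to jiffies. With a periodic tick, the timer interrupt count grows by about HZ per second; on an idle system with dynamic ticks active it should grow by far fewer than HZ x 10 over the 10-second sleep: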

 # cat /proc/interrupts | grep -i time
 # sleep 10
 # cat /proc/interrupts | grep -i time
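
The comparison of the two samples comes down to arithmetic. A rough sketch follows; the function names are made up, the counts are for a single CPU column of the timer line, and the threshold of "less than half of HZ" is an arbitrary assumption rather than anything the kernel defines.

```python
def timer_interrupt_rate(count_before: int, count_after: int, seconds: float) -> float:
    """Timer interrupts per second between two /proc/interrupts samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_after - count_before) / seconds


def looks_tickless(count_before: int, count_after: int, seconds: float, hz: int) -> bool:
    """Guess whether the tick was being skipped while the system was idle."""
    # Assumption: "well below HZ" is taken to mean under half of HZ.
    return timer_interrupt_rate(count_before, count_after, seconds) < 0.5 * hz


# Made-up counts for a 10 second sleep on a HZ=100 kernel.
print(looks_tickless(52000, 52180, 10, 100))  # 18 interrupts/sec
print(looks_tickless(52000, 53000, 10, 100))  # 100 interrupts/sec
```

The first call reports True (18 interrupts per second against HZ=100) and the second reports False. A busy system can show a full tick rate even with dynamic ticks enabled, so take the samples while the machine is idle.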

Powertop

Powertop is a tool that parses the /proc/timer_stats output and gives a picture of what is causing wakeups on your system. Minimizing these wakeups should allow you to decrease power consumption in your device. Powertop was originally written for the x86 architecture but also works for embedded processors. However, in order to get a clean display from it, you will need an ncurses lib with wide character support.

Here's a poor-man's version of powertop:

 # watch "cat /proc/timer_stats | sort -nr | head -n 20"
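
A slightly richer variant of the same idea is to total the event counts per command before ranking them. The sketch below is not how Powertop itself works; the function name and the sample text are illustrative, and it relies on the `<count>, <pid> <command> ...` layout described earlier on this page.

```python
from collections import Counter
from typing import List, Tuple

# Illustrative text in the style of /proc/timer_stats.
SAMPLE = """\
Timer Stats Version: v0.1
  12,     0 swapper          hrtimer_stop_sched_tick (hrtimer_sched_tick)
  15,     1 swapper          hcd_submit_urb (rh_timer_func)
   4,   959 kedac            schedule_timeout (process_timeout)
31 total events, 3.100 events/sec
"""


def top_wakeup_sources(timer_stats_text: str, limit: int = 20) -> List[Tuple[str, int]]:
    """Sum the <count> field per <command> and return the largest totals."""
    totals: Counter = Counter()
    for line in timer_stats_text.splitlines():
        head, sep, rest = line.partition(",")
        fields = rest.split()
        # Data lines start with "<count>," followed by "<pid> <command> ...".
        if not sep or not head.strip().isdigit() or len(fields) < 2:
            continue
        totals[fields[1]] += int(head)
    return totals.most_common(limit)


for command, count in top_wakeup_sources(SAMPLE):
    print(f"{count:6d}  {command}")
```

Header and summary lines are skipped because they do not begin with a bare number followed by a comma.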

timer API

  • interval timers
  • posix timer API
  • sleep, usleep and nanosleep

time API

- do_gettimeofday

High Resolution Timers

See High Resolution Timers, which describes sub-jiffy timers.

Old timer wheel/jiffy replacement proposals

Jun Sun's "tock" proposal

See http://linux.junsun.net/HRT/index.html

This system replaces jiffies and xtime with tocks (arch-dependent), mtime (monotonic time) and wtime (wall time), and proposes a strategy for migrating to that.

John Stultz

In 2005, John Stultz proposed changes to the timers to use a 64-bit nanosecond value as the base. He did a presentation and BOF at OLS 2005. (It should be available online)

Timer Tick Thread - LKML July 2005

There was a very long thread about timers, jiffies, and related subjects in July of 2005 on the kernel mailing list.

The title was: "Re: [PATCH] i386: Selectable Frequency of the Timer Interrupt"

Linus said jiffies is not going away

- still need 32-bit counter, shouldn't be real-time value (too much overhead to calculate)
- high-res timers shouldn't be sub-HZ, but instead, HZ should be high and timer tick should not be 1:1 with HZ
  - in other words, have HZ be high (like 2K), have the timer interrupt fire off at some lower frequency,
  and increment jiffies by more than one on each interrupt.
  - rationale for this is to keep a single sub-system
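
The arithmetic behind this idea can be sketched as follows. This is only an illustration of the proposal as summarized above: the function name is made up, and the interrupt rate of 250 per second is an assumed example value (only the "HZ around 2K" figure comes from the thread summary).

```python
def jiffies_per_interrupt(hz: int, interrupt_rate: int) -> int:
    """How much jiffies advances on each timer interrupt."""
    if interrupt_rate <= 0 or hz % interrupt_rate != 0:
        raise ValueError("interrupt_rate must be positive and divide HZ evenly")
    return hz // interrupt_rate


HZ = 2000             # high HZ, as in the proposal
INTERRUPT_RATE = 250  # assumed example: real interrupts per second

step = jiffies_per_interrupt(HZ, INTERRUPT_RATE)
print(step)                   # jiffies added per interrupt
print(step * INTERRUPT_RATE)  # jiffies advanced in one second
```

With these numbers each interrupt adds 8 to jiffies, so jiffies still advances by HZ (2000) every second while the hardware only interrupts 250 times.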

Arjan had good points about coalescing low-res timers

- 3 use cases:
- low res timeouts
- high res timer for periodic absolute wakeup (wake up every 10 ms, whether last one was late or not)
- high res timer for periodic relative wakeup (wake up 10 ms from now)
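
The difference between the two periodic cases shows up when a wakeup is late. The following is a small simulation, not kernel code: both function names are hypothetical, times are in milliseconds, and each wakeup is assumed to be late by less than one period.

```python
from typing import List


def absolute_wakeups(period_ms: int, latencies_ms: List[int]) -> List[int]:
    """Deadlines are fixed at k * period; lateness does not carry over."""
    return [k * period_ms + late for k, late in enumerate(latencies_ms, start=1)]


def relative_wakeups(period_ms: int, latencies_ms: List[int]) -> List[int]:
    """Each deadline is 'period from now', so lateness accumulates."""
    wakeups = []
    now = 0
    for late in latencies_ms:
        now = now + period_ms + late
        wakeups.append(now)
    return wakeups


# Made-up lateness of four consecutive wakeups with a 10 ms period.
latencies = [0, 3, 0, 2]
print(absolute_wakeups(10, latencies))  # [10, 23, 30, 42]
print(relative_wakeups(10, latencies))  # [10, 23, 33, 45]
```

In the absolute case the third wakeup is back on schedule at 30 ms, while in the relative case the 3 ms slip is never recovered and the schedule drifts.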