Difference between revisions of "Realtime Testing Best Practices"

From eLinux.org
m (Terminology: adjust for local link)
(convert from MoinMoin to Mediawiki format)
Line 1: Line 1:
Table Of Contents:
+
== Introduction ==
[[TableOfContents]]
 
 
 
= Introduction =
 
 
This page is intended to serve as a collecting point for presentations, documents, results, links and descriptions
 
This page is intended to serve as a collecting point for presentations, documents, results, links and descriptions
 
about testing Realtime performance of Linux systems.  In the first section, please upload or place links to presentations
 
about testing Realtime performance of Linux systems.  In the first section, please upload or place links to presentations
 
or documentation on the subject of RT testing for Linux.
 
or documentation on the subject of RT testing for Linux.
  
== Terminology ==
+
=== Terminology ===
 
This document uses the definitions for real time terminology found in: [[Real Time Terms]]
 
This document uses the definitions for real time terminology found in: [[Real Time Terms]]
  
= Test programs =
+
== Test programs ==
  
== RT Measurement programs ==
+
=== RT Measurement programs ===
 
Here is a list of programs that have been used for realtime testing:
 
Here is a list of programs that have been used for realtime testing:
=== lpptest ===
+
==== lpptest ====
* lpptest - included in the RT-preempt patch
+
* lpptest - included in the RT-preempt patch
  * It consists of a
+
** It consists of a
    1. driver in the linux kernel, to toggle a bit on the parallel port, and watch for a response toggle back
+
**1. driver in the linux kernel, to toggle a bit on the parallel port, and watch for a response toggle back
    2. a user program to cause the measurement to happen
+
**2. a user program to cause the measurement to happen
    3. a driver to respond to this toggling
+
**3. a driver to respond to this toggling
  * with the RT-preempt patch applied, see:
+
* with the RT-preempt patch applied, see:
    * drivers/char/lpptest.c
+
** drivers/char/lpptest.c
    * scripts/testlpp.c
+
** scripts/testlpp.c
  * For some other modifications, see http://www.ussg.iu.edu/hypermail/linux/kernel/0702.2/0342.html
+
* For some other modifications, see http://www.ussg.iu.edu/hypermail/linux/kernel/0702.2/0342.html
    * remove dependency on TSC
+
** remove dependency on TSC
  
 
This requires a separate machine to send the signal on the parallel port and receive the response.
 
This requires a separate machine to send the signal on the parallel port and receive the response.
Line 31: Line 28:
 
Are there any writeups of use of this test?
 
Are there any writeups of use of this test?
  
=== RealFeel ===
+
==== RealFeel ====
  * !RealFeel -  
+
  * RealFeel -  
 
   * code at: http://brain.mcmaster.ca/~hahn/realfeel.c
 
   * code at: http://brain.mcmaster.ca/~hahn/realfeel.c
  
Line 56: Line 53:
 
   * how is rtc timestamp used??
 
   * how is rtc timestamp used??
  
=== Cyclitest ===
+
==== Cyclictest ====
  * Cyclitest - See http://rt.wiki.kernel.org/index.php/Cyclictest
+
  * Cyclictest - See http://rt.wiki.kernel.org/index.php/Cyclictest
  
=== LRTB ===
+
==== LRTB ====
 
  * Linux Real-Time Benchmarking Framework - See http://www.opersys.com/lrtbf/
 
  * Linux Real-Time Benchmarking Framework - See http://www.opersys.com/lrtbf/
 
   * quickie overview at: http://groups.google.com/group/linux.kernel/msg/11860ef9e4263fa3?hl=en&
 
   * quickie overview at: http://groups.google.com/group/linux.kernel/msg/11860ef9e4263fa3?hl=en&
  
=== Hourglass ===
+
==== Hourglass ====
 
  * Hourglass is a synthetic real-time application that can be used to learn how CPU scheduling in a general-purpose operating system works at microsecond and millisecond granularities
 
  * Hourglass is a synthetic real-time application that can be used to learn how CPU scheduling in a general-purpose operating system works at microsecond and millisecond granularities
 
   * See: http://www.cs.utah.edu/~regehr/hourglass/
 
   * See: http://www.cs.utah.edu/~regehr/hourglass/
  
=== Woerner test ===
+
==== Woerner test ====
 
Trevor Woerner wrote an interesting test which received an interrupt on the serial port, and pushed data through several
 
Trevor Woerner wrote an interesting test which received an interrupt on the serial port, and pushed data through several
 
processes, before sending back out the serial port.  This test requires an external machine for triggering the test and measuring
 
processes, before sending back out the serial port.  This test requires an external machine for triggering the test and measuring
Line 74: Line 71:
 
See [http://geek.vtnet.ca/embedded/LatencyTests/html/index.html Trevor Woerner's latency tests]
 
See [http://geek.vtnet.ca/embedded/LatencyTests/html/index.html Trevor Woerner's latency tests]
  
=== Senoner test ===
+
==== Senoner test ====
 
Benno Senoner has a latency test that simulates an audio workload.
 
Benno Senoner has a latency test that simulates an audio workload.
 
See http://www.gardena.net/benno/linux/audio/
 
See http://www.gardena.net/benno/linux/audio/
Line 80: Line 77:
 
Used (and extended??) by Takashi Iwai - see http://www.alsa-project.org/~iwai/latencytest-0.5.6.tar.gz
 
Used (and extended??) by Takashi Iwai - see http://www.alsa-project.org/~iwai/latencytest-0.5.6.tar.gz
  
== Test Features Table ==
+
=== Test Features Table ===
||<:rowbgcolor='#80d0d0'>'''Feature'''||<:>'''Rf-etri'''                      ||<:>'''Williams'''              ||<:>'''LRTB'''                  ||
+
{|
 +
!- bgcolor='#80d0d0'
 +
!Feature!!Rf-etri
 +
!Williams
 +
!LRTB
 +
|-
 +
|}
 
||Is it platform specific (for target)?||yes - i386                            ||no, but requires serial port on target ||no, but requires parallel port on target||
 
||Is it platform specific (for target)?||yes - i386                            ||no, but requires serial port on target ||no, but requires parallel port on target||
 
||How is interrupt generated?          ||periodic timer programmed via /dev/rtc||data on serial port            ||data on parallel port          ||
 
||How is interrupt generated?          ||periodic timer programmed via /dev/rtc||data on serial port            ||data on parallel port          ||
 
||What does test measure?              ||interrupt and scheduling latency      ||end-to-end response latency    ||end-to-end response latency    ||
 
||What does test measure?              ||interrupt and scheduling latency      ||end-to-end response latency    ||end-to-end response latency    ||
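
The periodic-timer approach in the first column (wake up on a timer and measure how late the wakeup is) can be sketched in user space. This is not RealFeel and does not use /dev/rtc; as a stand-in it sleeps until absolute deadlines with `time.sleep` and records how far past each deadline the process resumes. The function names, the 1 ms interval, and the cycle count are assumptions made for illustration, and an interpreted program like this shows the method only; it is not suitable for real RT measurements.

```python
import time


def measure_wakeup_latency(interval_ns=1_000_000, cycles=1000):
    """Return a list of wakeup latencies in nanoseconds.

    Each cycle sleeps until an absolute deadline and records how far
    past that deadline the process actually resumed.
    """
    latencies = []
    # Absolute deadlines avoid accumulating drift from one cycle to the next.
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(cycles):
        remaining = deadline - time.monotonic_ns()
        if remaining > 0:
            time.sleep(remaining / 1e9)
        latencies.append(time.monotonic_ns() - deadline)
        deadline += interval_ns
    return latencies


def summarize(latencies):
    """Return (min, average, max) of the samples, in the samples' unit."""
    return min(latencies), sum(latencies) / len(latencies), max(latencies)
```

Calling `summarize(measure_wakeup_latency())` gives the minimum, average, and maximum lateness in nanoseconds; the maximum is the figure of interest for RT work.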
  
== Benchmarking programs ==
+
==== Benchmarking programs ====
* see BenchmarkPrograms
+
* see BenchmarkPrograms
* some to look into:
+
* some to look into:
  * hackbench
+
** hackbench
  * lmbench
+
** lmbench
  * unixbench
+
** unixbench
  
== Stress programs ==
+
=== Stress programs ===
* Ingo Molnar has a shell script which he calls [http://groups.google.com/group/linux.kernel/msg/0c88c397347cbd2a?hl=en& dohell]
+
* Ingo Molnar has a shell script which he calls [http://groups.google.com/group/linux.kernel/msg/0c88c397347cbd2a?hl=en& dohell]
  * good candidates seem to be:
+
** good candidates seem to be:
    * find
+
*** find
    * du
+
*** du
    * ping
+
*** ping
* [http://monetdb.cwi.nl/Calibrator/ Cache Calibrator] - see [http://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO RT-Preempt howto]
+
* [http://monetdb.cwi.nl/Calibrator/ Cache Calibrator] - see [http://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO RT-Preempt howto]
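
A background load in the spirit of the find and du candidates above can be sketched as a small runner that keeps restarting a set of commands for a fixed time. This is not Ingo Molnar's dohell script; the function name, the command lists, and the duration are assumptions chosen for illustration.

```python
import subprocess
import time


def run_load(commands, duration_s):
    """Keep every command in `commands` running until `duration_s` elapses.

    Each command is an argument list. A command that exits is restarted,
    so the load persists for the whole measurement window.
    Returns the total number of process launches.
    """
    end = time.monotonic() + duration_s
    procs = [None] * len(commands)
    launches = 0
    try:
        while time.monotonic() < end:
            for i, cmd in enumerate(commands):
                if procs[i] is None or procs[i].poll() is not None:
                    procs[i] = subprocess.Popen(
                        cmd,
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL,
                    )
                    launches += 1
            time.sleep(0.05)
    finally:
        # Stop whatever is still running when the window closes.
        for p in procs:
            if p is not None:
                if p.poll() is None:
                    p.kill()
                p.wait()
    return launches
```

For example, `run_load([["find", "/", "-xdev"], ["du", "-sx", "/"]], duration_s=60)` would keep both commands walking the root file system for one minute while a latency test runs alongside.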
  
=== Stress actions ===
+
==== Stress actions ====
 
Here are some things that will kill your RT performance:
 
Here are some things that will kill your RT performance:
* write the time of day to the CMOS of your RTC (see drivers/char/rtc.c - only by code inspection, no test yet)
+
* write the time of day to the CMOS of your RTC (see drivers/char/rtc.c - only by code inspection, no test yet)
* have a bus-master device do a long DMA on the bus
+
* have a bus-master device do a long DMA on the bus
* get a page fault on your RT process (can be prevented with mlockall)
+
* get a page fault on your RT process (can be prevented with mlockall)
* get multiple TLB flushes on your RT code path (how to cause this??)
+
* get multiple TLB flushes on your RT code path (how to cause this??)
* get lots of instruction and data cache misses on your RT code path
+
* get lots of instruction and data cache misses on your RT code path
  * how to cause this?
+
** how to cause this?
    * go down error paths in the RT case?
+
*** go down error paths in the RT case?
    * be ON a big error case when the RT event happens?
+
*** be ON a big error case when the RT event happens?
    * push your main RT code path and data sets out of cache with other work (in your RT process), prior to the next RT event?
+
*** push your main RT code path and data sets out of cache with other work (in your RT process), prior to the next RT event?
    * access data in a very non-localized way on your RT code path
+
*** access data in a very non-localized way on your RT code path
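
One way to get the non-localized access pattern named in the last item is to walk a buffer in the order of a random single-cycle permutation, so that each access lands at an unpredictable index. The sketch below uses Sattolo's algorithm to build such a permutation. The names and sizes are assumptions, and in an interpreted language it shows only the access pattern, not realistic cache-miss timings.

```python
import random


def single_cycle_permutation(n, seed=None):
    """Return a list `nxt` where following i -> nxt[i] visits all n slots.

    Sattolo's algorithm yields a permutation made of exactly one cycle,
    so a pointer chase cannot get stuck in a short loop.
    """
    rng = random.Random(seed)
    nxt = list(range(n))
    for i in range(n - 1, 0, -1):
        j = rng.randrange(i)  # j < i, never i itself
        nxt[i], nxt[j] = nxt[j], nxt[i]
    return nxt


def chase(nxt, steps, start=0):
    """Follow the chain for `steps` hops and return the final index."""
    idx = start
    for _ in range(steps):
        idx = nxt[idx]
    return idx
```

Because the permutation is a single cycle, `chase(nxt, len(nxt))` touches every slot once before returning to the start.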
  
= Test Hardware =
+
== Test Hardware ==
* LRTB uses a 3-machine system:
+
* LRTB uses a 3-machine system:
  * target, host, and logger
+
** target, host, and logger
  * target is the system under test
+
** target is the system under test
  * host is a control system, and it also collects the data
+
** host is a control system, and it also collects the data
  * logger is a special machine used to cause interrupts on the target, and record the time it takes for the target to respond
+
** logger is a special machine used to cause interrupts on the target, and record the time it takes for the target to respond
  * Paulo Marques [http://marc.info/?l=linux-kernel&m=111953832212835&w=2 offered] to create custom hardware for the logger
+
** Paulo Marques [http://marc.info/?l=linux-kernel&m=111953832212835&w=2 offered] to create custom hardware for the logger
  
= Issues and Techniques =
+
== Issues and Techniques ==
 
This is a list of issues and techniques for dealing with them, having to do with
 
This is a list of issues and techniques for dealing with them, having to do with
 
testing realtime performance in Linux.
 
testing realtime performance in Linux.
  
== ping flood isn't good as stress test ==
+
=== ping flood isn't good as stress test ===
 
 
 
At one of the sessions at ELC 2007, Nicholas McGuire stated that a pingflood test
 
At one of the sessions at ELC 2007, Nicholas McGuire stated that a pingflood test
 
is actually a poor test of RT performance, since it causes locality in the networking
 
is actually a poor test of RT performance, since it causes locality in the networking
Line 133: Line 135:
  
 
Here is a list of issues that have to be dealt with:
 
Here is a list of issues that have to be dealt with:
* what tests are available on all platforms?
+
* what tests are available on all platforms?
  * is special clock hardware or registers required for a test (e.g. realfeel, which only supports i386?)
+
** is special clock hardware or registers required for a test (e.g. realfeel, which only supports i386?)
  * does the program cross-compile?
+
** does the program cross-compile?
* Does generation of the test conditions perturb the test results?
+
** Does generation of the test conditions perturb the test results?
* Is special external hardware required?
+
** Is special external hardware required?
* How is the system stressed?
+
** How is the system stressed?
  * How to stress memory (cause cache-flushes and swapping)
+
*** How to stress memory (cause cache-flushes and swapping)
  * How to stress bad code paths (long error paths, fault injection?)
+
*** How to stress bad code paths (long error paths, fault injection?)
* How is performance measured?
+
* How is performance measured?
  
 
== Using the LATENCY_TRACE option ==
 
== Using the LATENCY_TRACE option ==
 
Quote about latency-test from Ingo:
 
Quote about latency-test from Ingo:
{{{i'm seeing roughly half of that worst-case IRQ latency on similar
+
I'm seeing roughly half of that worst-case IRQ latency on similar  
hardware (2GHz Athlon64), so i believe your system has some hardware
+
hardware (2GHz Athlon64), so i believe your system has some hardware
latency that masks the capabilities of the underlying RTOS. It would be
+
latency that masks the capabilities of the underlying RTOS. It would be
interesting to see IRQSOFF_TIMING + LATENCY_TRACE critical path
+
interesting to see IRQSOFF_TIMING + LATENCY_TRACE critical path
information from the -RT tree. Just enable those two options in the
+
information from the -RT tree. Just enable those two options in the
.config (on the host side), and do:
+
.config (on the host side), and do:
  
 
         echo 0 > /proc/sys/kernel/preempt_max_latency
 
         echo 0 > /proc/sys/kernel/preempt_max_latency
  
and the kernel will begin measuring and tracing worst-case latency
+
and the kernel will begin measuring and tracing worst-case latency
paths. Then put some load on the host when you see a 50+ usec latency
+
paths. Then put some load on the host when you see a 50+ usec latency
reported to the syslog, send me the /proc/latency_trace. It should be a
+
reported to the syslog, send me the /proc/latency_trace. It should be a
matter of a few minutes to capture this information.
+
matter of a few minutes to capture this information.
}}}
 
  
 
== Number of samples recommended ==
 
== Number of samples recommended ==
 
Ingo wrote:
 
Ingo wrote:
{{{
+
 
also, i'm wondering why you tested with only 1,000,000 samples. I
+
also, i'm wondering why you tested with only 1,000,000 samples. I
routinely do 100,000,000 sample tests, and i did one overnight test with
+
routinely do 100,000,000 sample tests, and i did one overnight test with
more than 1 billion samples, and the latency difference is quite
+
more than 1 billion samples, and the latency difference is quite
significant between say 1,000,000 samples and 100,000,000 samples. All
+
significant between say 1,000,000 samples and 100,000,000 samples. All
you need to do is to increase the rate of interrupts generated by the
+
you need to do is to increase the rate of interrupts generated by the
logger - e.g. my testbox can handle 80,000 irqs/sec with only 15% CPU
+
logger - e.g. my testbox can handle 80,000 irqs/sec with only 15% CPU
overhead.
+
overhead.
}}}
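
The figures in the quote make it easy to estimate how long a run of a given size takes. The sketch below assumes one sample per interrupt (the quote does not say so explicitly) and uses the quoted rate of 80,000 interrupts per second; the function name is illustrative.

```python
def run_time_seconds(samples, irqs_per_second):
    """Time needed to collect `samples` at one sample per interrupt."""
    return samples / irqs_per_second


RATE = 80_000  # interrupts per second, the figure from the quote

for samples in (1_000_000, 100_000_000, 1_000_000_000):
    seconds = run_time_seconds(samples, RATE)
    print(f"{samples:>13,} samples: {seconds:>8.1f} s ({seconds / 3600:.2f} h)")
```

At that rate 1,000,000 samples take 12.5 seconds, 100,000,000 take about 21 minutes, and 1,000,000,000 take about 3.5 hours, so the larger runs are practical on a single test box.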
 
  
 
== Things to watch for in testing ==
 
== Things to watch for in testing ==
 
Another note from Ingo - see [http://groups.google.com/group/linux.kernel/msg/8c7e61d0926dba80?hl=en& here]
 
Another note from Ingo - see [http://groups.google.com/group/linux.kernel/msg/8c7e61d0926dba80?hl=en& here]
* Note the bit about IRQ 7 - what's up with that?
+
* Note the bit about IRQ 7 - what's up with that?
{{{
+
 
> First things first, we want to report back that our setup is validated
+
> First things first, we want to report back that our setup is validated
> before we go onto this one. So we've modified LRTBF to do the
+
> before we go onto this one. So we've modified LRTBF to do the
> busy-wait thing.
+
> busy-wait thing.
  
here's another bug in the way you are testing PREEMPT_RT irq latencies.   
+
here's another bug in the way you are testing PREEMPT_RT irq latencies.   
Right now you are doing this in lrtbf-0.1a/drivers/par-test.c:
+
Right now you are doing this in lrtbf-0.1a/drivers/par-test.c:
  
 
     if (request_irq ( PAR_TEST_IRQ,
 
     if (request_irq ( PAR_TEST_IRQ,
Line 191: Line 191:
 
  #endif //PREEMPT_RT
 
  #endif //PREEMPT_RT
  
you should set the SA_INTERRUPT flag in the PREEMPT_RT case too! I.e.
+
you should set the SA_INTERRUPT flag in the PREEMPT_RT case too! I.e.
the relevant line above should be:
+
the relevant line above should be:
  
 
                                           SA_NODELAY | SA_INTERRUPT,
 
                                           SA_NODELAY | SA_INTERRUPT,
  
otherwise par_test_irq_handler will run with interrupts enabled, opening
+
otherwise par_test_irq_handler will run with interrupts enabled, opening
the window for other interrupts to be injected and increasing the
+
the window for other interrupts to be injected and increasing the
worst-case latency! Take a look at drivers/char/lpptest.c how to do this
+
worst-case latency! Take a look at drivers/char/lpptest.c how to do this
properly. Also, double-check that there is no IRQ 7 thread running on
+
properly. Also, double-check that there is no IRQ 7 thread running on
the PREEMPT_RT kernel, to make sure you are measuring irq latencies.  
+
the PREEMPT_RT kernel, to make sure you are measuring irq latencies.  
}}}
 
  
 
= Tests results taxonomy =
 
= Tests results taxonomy =
Line 212: Line 211:
 
||Tsutomu Owa      ||Toshiba||Cell (ppc64)            ||2.6.12          ||??            ||?? ||??||
 
||Tsutomu Owa      ||Toshiba||Cell (ppc64)            ||2.6.12          ||??            ||?? ||??||
  
= Test presentations and documents =
+
== Test presentations and documents ==
== Presentations ==
+
=== Presentations ===
 
[Add links here, most recent at top]
 
[Add links here, most recent at top]
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2007Presentations?action=AttachFile&do=get&target=CELF_ELC_Interrupt_Latency_2.4_vs_2.6.pdf  Analysis of Interrupt Entry Latency in Linux 2.4 vs 2.6] by !SangBae Lee of Samsung for ELC 2007
+
 
 +
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2007Presentations?action=AttachFile&do=get&target=CELF_ELC_Interrupt_Latency_2.4_vs_2.6.pdf  Analysis of Interrupt Entry Latency in Linux 2.4 vs 2.6] by !SangBae Lee of Samsung for ELC 2007
 
   * Analyzed MV 3.1 (2.4.20) and MV 4.0 (2.6.10), using LTT, on OSK board (OMAP 5920 ARM 192 MHz)
 
   * Analyzed MV 3.1 (2.4.20) and MV 4.0 (2.6.10), using LTT, on OSK board (OMAP 5920 ARM 192 MHz)
 
   * Initial results were that Linux 2.4.20 was 3X faster for best-case interrupt latency
 
   * Initial results were that Linux 2.4.20 was 3X faster for best-case interrupt latency
Line 229: Line 229:
 
       * 2.4.20 - min = ??, max = ??
 
       * 2.4.20 - min = ??, max = ??
 
   * Basic result = Don't use LTT for measuring RT performance
 
   * Basic result = Don't use LTT for measuring RT performance
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2007Presentations?action=AttachFile&do=get&target=preempt070418celfelc.pdf  Porting and Evaluating the Linux Realtime Preemption on Embedded Platform] by Katsuya Matsubara of Igel at ELC 2007
+
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2007Presentations?action=AttachFile&do=get&target=preempt070418celfelc.pdf  Porting and Evaluating the Linux Realtime Preemption on Embedded Platform] by Katsuya Matsubara of Igel at ELC 2007
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2007Presentations?action=AttachFile&do=get&target=RT-BoF-2007-04-17.pdf Realtime Preempt Patch Adaptation Experience (and Real Time BOF notes)] - !YungJoon Jung of ETRI at ELC 2007
+
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2007Presentations?action=AttachFile&do=get&target=RT-BoF-2007-04-17.pdf Realtime Preempt Patch Adaptation Experience (and Real Time BOF notes)] - !YungJoon Jung of ETRI at ELC 2007
  * This is the presentation from the Realtime BoF at ELC 2007. It includes test results for a kernel with the realtime preempt patch applied
+
** This is the presentation from the Realtime BoF at ELC 2007. It includes test results for a kernel with the realtime preempt patch applied
  * Test on VIA Nehemiah board, 1 GHz, 256 MB memory
+
** Test on VIA Nehemiah board, 1 GHz, 256 MB memory
  * See http://tree.celinuxforum.org/CelfPriWiki/RealTime_20Performance_20Test (need to make this public)
+
** See http://tree.celinuxforum.org/CelfPriWiki/RealTime_20Performance_20Test (need to make this public)
  * has good charts comparing vanilla, voluntary preempt, preemptible kernel and RT-preempt
+
** has good charts comparing vanilla, voluntary preempt, preemptible kernel and RT-preempt
  * min = 5.6 us, max = 41.1 us
+
** min = 5.6 us, max = 41.1 us
  * showed RT-preempt has throughput problems (reported by hackbench)
+
** showed RT-preempt has throughput problems (reported by hackbench)
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=CELF_PPC64_RT_20070222.pdf Performance Measurement of PPC64 RT patch (update)] ([http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=CELF_PPC64_RT_20070222-en.txt english text]) - by Tsutomu Owa of Toshiba at CELF Jamboree 13
+
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=CELF_PPC64_RT_20070222.pdf Performance Measurement of PPC64 RT patch (update)] ([http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=CELF_PPC64_RT_20070222-en.txt english text]) - by Tsutomu Owa of Toshiba at CELF Jamboree 13
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=preempt070222celfjambo13.pdf Porting pre-empt RT patch on SuperH] ([http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=preempt070222celfjambo13-en.txt english text]) - by Katsuya Matsubara (IGEL)  at CELF Jamboree 13
+
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=preempt070222celfjambo13.pdf Porting pre-empt RT patch on SuperH] ([http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree13?action=AttachFile&do=get&target=preempt070222celfjambo13-en.txt english text]) - by Katsuya Matsubara (IGEL)  at CELF Jamboree 13
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree12?action=AttachFile&do=get&target=CELF_PPC64_RT_20061208.pdf Performance Measurement of PPC64 RT Patch] ([http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree12?action=AttachFile&do=get&target=MS_CELF-TJ12-07-en.txt english text]) - by Tsutomu Owa of Toshiba at CELF Jamboree 12
+
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree12?action=AttachFile&do=get&target=CELF_PPC64_RT_20061208.pdf Performance Measurement of PPC64 RT Patch] ([http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree12?action=AttachFile&do=get&target=MS_CELF-TJ12-07-en.txt english text]) - by Tsutomu Owa of Toshiba at CELF Jamboree 12
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree11?action=AttachFile&do=get&target=preempt061027celfjambo11-en.odp  Linux Realtime Preemption and Its Impact on ULDD] by Katsuya Matsubara & Hitomi Takahashi of IGEL, for CELF Jamboree 11  
+
* [http://tree.celinuxforum.org/CelfPubWiki/JapanTechnicalJamboree11?action=AttachFile&do=get&target=preempt061027celfjambo11-en.odp  Linux Realtime Preemption and Its Impact on ULDD] by Katsuya Matsubara & Hitomi Takahashi of IGEL, for CELF Jamboree 11  
  * very good summary of RT-preempt patch.  Also good description of work done on SH4 and work on User Level device drivers.
+
** very good summary of RT-preempt patch.  Also good description of work done on SH4 and work on User Level device drivers.
  * Describes basic steps to do a new port of RT-preempt
+
** Describes basic steps to do a new port of RT-preempt
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=ExperienceWithRealtimePerformance.pdf  Experience with Realtime Performance] - by Shinichi Ochiai of Mitsubishi Electric Corporation at CELF ELC 2006  
+
* [http://tree.celinuxforum.org/CelfPubWiki/ELC2006Presentations?action=AttachFile&do=get&target=ExperienceWithRealtimePerformance.pdf  Experience with Realtime Performance] - by Shinichi Ochiai of Mitsubishi Electric Corporation at CELF ELC 2006  
  * This describes RT features and how they evolved from 2.4.20 to 2.6.16.  Test results are shown for preemptible kernel (2.4.20),
+
** This describes RT features and how they evolved from 2.4.20 to 2.6.16.  Test results are shown for preemptible kernel (2.4.20), voluntary preemption, RT-preempt, and hybrid kernel approach (RTAI).  The platforms tested were an SH4 board and an EDEN board, with a VIA processor (i386 clone).  RT-preempt is shown to have good RT characteristics, for later kernel versions.
  voluntary preemption, RT-preempt, and hybrid kernel approach (RTAI).  The platforms tested were an SH4 board and an EDEN board, with a
+
* [http://groups.google.com/group/linux.kernel/msg/d420ad15b1215e54 PREEMPT-RT vs I-PIPE: the numbers, take 3] - by Kristian Benoit, LKML message, 2005
  VIA processor (i386 clone).  RT-preempt is shown to have good RT characteristics, for later kernel versions.
+
** about extensive testing by Kristian Benoit and Karim Yaghmour  
* [http://groups.google.com/group/linux.kernel/msg/d420ad15b1215e54 PREEMPT-RT vs I-PIPE: the numbers, take 3] - by Kristian Benoit, LKML message, 2005
+
** See also [http://groups.google.com/group/linux.kernel/msg/b59137a4da507a55?hl=en& PREEMPT RT vs ADEOS: the numbers, part 1]
  * about extensive testing by Kristian Benoit and Karim Yaghmour  
+
** and [http://groups.google.com/group/linux.kernel/msg/43360d2d2ba5121a?hl=en& PREEMPT_RT vs I-PIPE: the numbers, take 2]
  * See also [http://groups.google.com/group/linux.kernel/msg/b59137a4da507a55?hl=en& PREEMPT RT vs ADEOS: the numbers, part 1]
+
* [http://geek.vtnet.ca/embedded/LatencyTests/html/index.html Trevor Woerner's latency tests]
  * and [http://groups.google.com/group/linux.kernel/msg/43360d2d2ba5121a?hl=en& PREEMPT_RT vs I-PIPE: the numbers, take 2]
+
** Interesting host/target test of latency via transmission and reception of strings over serial port
* [http://geek.vtnet.ca/embedded/LatencyTests/html/index.html Trevor Woerner's latency tests]
+
* [http://tree.celinuxforum.org/CelfPubWiki/TechConference2005Docs?action=AttachFile&do=get&target=Real-Time-Preemption-Patchset.pdf Real-Time Preemption Patchset] - by Manas Saksena, CELF tech conference 2005
  * Interesting host/target test of latency via transmission and reception of strings over serial port
+
** Good paper with overview of RT-preempt patch features
* [http://tree.celinuxforum.org/CelfPubWiki/TechConference2005Docs?action=AttachFile&do=get&target=Real-Time-Preemption-Patchset.pdf Real-Time Preemption Patchset] - by Manas Saksena, CELF tech conference 2005
+
* [http://www.alsa-project.org/~iwai/suselabs2003-audio-latency.pdf Audio Latency on Linux Kernels] - Takashi Iwai, SUSE, 2003
  * Good paper with overview of RT-preempt patch features
+
* [http://www.linuxdevices.com/articles/AT8906594941.html Linux Scheduler Latency] - by Clark Williams, Red Hat, March 2002
* [http://www.alsa-project.org/~iwai/suselabs2003-audio-latency.pdf Audio Latency on Linux Kernels] - Takashi Iwai, SUSE, 2003
+
* [http://www.linuxjournal.com/article/6405 Realfeel Test of the Preemptible Kernel Patch] - article in Linux Journal, 2002 by Andrew Webber
* [http://www.linuxdevices.com/articles/AT8906594941.html Linux Scheduler Latency] - by Clark Williams, Red Hat, March 2002
+
** This is a test of the preemptible kernel feature in 2.4.19, on i386 hardware.
* [http://www.linuxjournal.com/article/6405 Realfeel Test of the Preemptible Kernel Patch] - article in Linux Journal, 2002 by Andrew Webber
+
* [http://www.linuxdevices.com/articles/AT6320079446.html Real Time and Linux, Part 3: Sub-Kernels and Benchmarks] - article in Embedded Linux Journal, online, 2002 by Kevin Dankwardt
  * This is a test of the preemptible kernel feature in 2.4.19, on i386 hardware.
 
* [http://www.linuxdevices.com/articles/AT6320079446.html Real Time and Linux, Part 3: Sub-Kernels and Benchmarks] - article in Embedded Linux Journal, online, 2002 by Kevin Dankwardt
 
  
* [attachment:p-a03_wilshire.pdf Real Time Linux: Testing and Evaluation] - By Phil Wilshire of Lineo at the Second Real Time Linux Workshop, 2000
+
* [attachment:p-a03_wilshire.pdf Real Time Linux: Testing and Evaluation] - By Phil Wilshire of Lineo at the Second Real Time Linux Workshop, 2000
  * This paper discusses the different benchmarking tools used to evaluate the performance of Linux and
+
** This paper discusses the different benchmarking tools used to evaluate the performance of Linux and
 
   their suitability for evaluating Real Time system Performance.  It is focused on RTAI.
 
   their suitability for evaluating Real Time system Performance.  It is focused on RTAI.
== OLS papers ==
+
 
 +
=== OLS papers ===
 
[FIXTHIS - need to scan for past papers]
 
[FIXTHIS - need to scan for past papers]
* OLS 2006 BOF - Steven Rostedt, RedHat and Klaas Van Gend, MontaVista - See [http://www.opentux.nl/artikelen/OLS2006_state_of_RT_and_common_mistakes.odp The State of RT and Common Mistakes (OLS 2006 BOF)]
+
* OLS 2006 BOF - Steven Rostedt, RedHat and Klaas Van Gend, MontaVista - See [http://www.opentux.nl/artikelen/OLS2006_state_of_RT_and_common_mistakes.odp The State of RT and Common Mistakes (OLS 2006 BOF)]
* OLS 2007 - Paper by Steven Rostedt - see http://www.linuxsymposium.org/2007/view_abstract.php?content_key=75
+
* OLS 2007 - Paper by Steven Rostedt - see http://www.linuxsymposium.org/2007/view_abstract.php?content_key=75
  
 
Darren Hart wrote:
 
Darren Hart wrote:
{{{
+
I have contributed some testing results to Steven Rostedt's OLS RT Internals  
I have contributed some testing results to Steven Rostedt's OLS RT Internals  
+
paper.  That will be available to link to after the conference sometime.
paper.  That will be available to link to after the conference sometime.}}}
 
  
== Real Time Linux Foundation RTL Workshops ==
+
=== Real Time Linux Foundation RTL Workshops ===
 
Nicholas said:
 
Nicholas said:
  
Line 282: Line 280:
  
 
Here is a link to the RTLF events page:
 
Here is a link to the RTLF events page:
* http://www.realtimelinuxfoundation.org/events/events.html
+
* http://www.realtimelinuxfoundation.org/events/events.html
  
 
So far, I've scanned 1999-2000 for interesting links.
 
So far, I've scanned 1999-2000 for interesting links.
  
 +
== Uncategorized stuff ==
 +
This section has random stuff I haven't organized yet:
 +
* http://eaglet.rain.com/rick/linux/schedstat/ - scheduler statistics
 +
** maybe this can be used to analyze process wakeup latency??  Need to see what stats are kept.
  
 +
* Low-latency HowTo (for audio) - http://lowlatency.linuxaudio.org/
  
 +
== Notes on ineffective tests ==
 +
Nicholas McGuire wrote:
  
 +
The tests noted in the LKML post on this page are very problematic,
 +
ping - -f is not testing RT at all, it keeps the kernel in a very small active
 +
page set thus reducing page related penalties, the while loop using dd
 +
is also not too helpfull as it will de-facto run only in memory and cause
 +
absolutely no disk/mass-storage related interaction (try the same with
 +
mount -o remount,sync /  first and it will be devastating ! (limited to ext2/ext3/ufs))
  
= Uncategorized stuff =
 
This section has random stuff I haven't organized yet:
 
* http://eaglet.rain.com/rick/linux/schedstat/ - scheduler statistics
 
  * maybe this can be used to analyze process wakeup latency??  Need to see what stats are kept.
 
 
* Low-latency HowTo (for audio) - http://lowlatency.linuxaudio.org/
 
 
== Notes on ineffective tests ==
 
Nicholas !McGuire wrote:
 
{{{
 
The tests noted in the LKML post on this page are very problematic,
 
ping - -f is not testing RT at all, it keeps the kernel in a very small active
 
page set thus reducing page related penalties, the while loop using dd
 
is also not too helpfull as it will de-facto run only in memory and cause
 
absolutely no disk/mass-storage related interaction (try the same with
 
mount -o remount,sync /  first and it will be devastating ! (limited to ext2/ext3/ufs))
 
}}}
 
  
== Notes on test requirements - need to test kernel error paths ==
+
=== Notes on test requirements - need to test kernel error paths ===
Nichoal !McGuire wrote:
+
Nicholas McGuire wrote:
  
The big problem with RT tests published is that they are all looking at the good case,
+
The big problem with RT tests published is that they are all looking at the good case,
they are loading the system but assuming successfull operations. The worst cases pop
+
they are loading the system but assuming successfull operations. The worst cases pop
up when you run in the error paths of the kernel - then a trivial application can
+
up when you run in the error paths of the kernel - then a trivial application can
induce very large jitter in the system (run crashme in the background and rerun
+
induce very large jitter in the system (run crashme in the background and rerun
the tests...)
+
the tests...)
  
== Notes on test requirements - need for usage profile ==
+
=== Notes on test requirements - need for usage profile ===
 
Also lmbench can give a statistic view of things (and not even that very precisely
 
Also lmbench can give a statistic view of things (and not even that very precisely
 
in some case i.e. context switch measurements are flawed) so this is not of much
 
in some case i.e. context switch measurements are flawed) so this is not of much

Revision as of 16:06, 18 October 2007

Introduction

This page is intended to serve as a collecting point for presentations, documents, results, links and descriptions about testing Realtime performance of Linux systems. In the first section, please upload or place links to presentations or documentation on the subject of RT testing for Linux.

Terminology

This document uses the definitions for real time terminology found in: Real Time Terms

Test programs

RT Measurement programs

Here is a list of programs that have been used for realtime testing:

lpptest

  • lpptest - included in the RT-preempt patch
    • It consists of:
      1. a driver in the Linux kernel, which toggles a bit on the parallel port and watches for a response toggle back
      2. a user program to cause the measurement to happen
      3. a driver to respond to this toggling
  • With the RT-preempt patch applied, see:
    • drivers/char/lpptest.c
    • scripts/testlpp.c
  • For some other modifications, see http://www.ussg.iu.edu/hypermail/linux/kernel/0702.2/0342.html
    • remove dependency on TSC

This requires a separate machine to send the signal on the parallel port and receive the response. (Can this be run with a loopback cable? It seems like this would disturb the findings).

Are there any writeups of use of this test?

RealFeel

  • RealFeel
    • code at: http://brain.mcmaster.ca/~hahn/realfeel.c

This program is a very simple test of how well a periodic interrupt is processed. It uses /dev/rtc to set up a periodic interrupt that fires at a fixed interval, measures the time from interrupt to interrupt, and compares this to the expected interval. It then just prints a list of deviations from the expected value, forever.

This program uses the TSC in user space for timestamps.
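The core of the measurement described above can be sketched as follows. This is only an illustration and is not taken from realfeel.c: the function name `interval_deviation_ns` and the use of nanosecond timestamps are assumptions made for this example (realfeel itself takes its timestamps from the TSC).

```c
#include <stdint.h>

/* Hypothetical helper, not part of realfeel.c.
 * Given the timestamps of two consecutive periodic interrupts (in ns)
 * and the expected period (in ns), return how far the measured
 * interval deviates from the expected one. A positive result means
 * the interval was longer than expected, a negative one shorter. */
int64_t interval_deviation_ns(uint64_t prev_ns, uint64_t now_ns,
                              uint64_t expected_period_ns)
{
    int64_t measured = (int64_t)(now_ns - prev_ns);

    return measured - (int64_t)expected_period_ns;
}
```

A test loop would call something like this once per interrupt and print or record the result.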

RealFeel (ETRI version rf-etri)

This program (latency.c) extends realfeel in several ways:

  • it adds command line arguments to allow runtime control of most parameters
  • it adds a histogram feature to dump the results to a histogram
    • it can do both linear and logarithmic histograms
  • it locks the process pages in memory (very important)
  • it changes the scheduling priority to SCHED_FIFO, at highest priority (very important)
  • it adds conditional code to trigger output to a parallel port pin (for capture to an external probe or logic analyzer)
  • it abstracts the routine to get the timestamp, with the function: getticks()
  • it handles the interrupt signal and does a clean exit of the main loop (on user break?)
  • it tracks min, max and average latency for whole run, and for every 1000 cycles of the loop
  • it adds a timestamp to the /dev/rtc driver, and reads this as part of the rtc data
    • how is rtc timestamp used??
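Two of the features listed above (locking pages and switching to SCHED_FIFO, and the linear and logarithmic histograms) can be sketched roughly like this. This is not the rf-etri code: the function names `enter_realtime`, `linear_bucket` and `log2_bucket`, and the choice of microseconds and power-of-two buckets, are assumptions made for this example.

```c
#include <sched.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: lock all current and future pages in RAM and
 * switch the calling process to SCHED_FIFO at the highest priority.
 * Returns 0 on success, -1 on failure (this usually needs root or
 * suitable rlimits). */
int enter_realtime(void)
{
    struct sched_param sp = { 0 };

    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    if (sp.sched_priority < 0)
        return -1;
    return sched_setscheduler(0, SCHED_FIFO, &sp) == 0 ? 0 : -1;
}

/* Linear histogram: bucket index for a latency, clamped to the last
 * bucket. bucket_width_us must be nonzero and nbuckets at least 1. */
unsigned linear_bucket(uint64_t latency_us, uint64_t bucket_width_us,
                       unsigned nbuckets)
{
    uint64_t idx = latency_us / bucket_width_us;

    return idx >= nbuckets ? nbuckets - 1 : (unsigned)idx;
}

/* Logarithmic histogram: bucket 0 holds 0 and 1 us, bucket n (n >= 1)
 * holds [2^n, 2^(n+1)) us, clamped to the last bucket.
 * nbuckets must be at least 1. */
unsigned log2_bucket(uint64_t latency_us, unsigned nbuckets)
{
    unsigned idx = 0;

    while (latency_us > 1) {
        latency_us >>= 1;
        idx++;
    }
    return idx >= nbuckets ? nbuckets - 1 : idx;
}
```

The logarithmic form is handy when most samples are a few microseconds but the interesting ones are rare outliers several orders of magnitude larger.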

Cyclictest

  • Cyclictest - See http://rt.wiki.kernel.org/index.php/Cyclictest

LRTB

  • Linux Real-Time Benchmarking Framework - See http://www.opersys.com/lrtbf/
    • quickie overview at: http://groups.google.com/group/linux.kernel/msg/11860ef9e4263fa3?hl=en&

Hourglass

  • Hourglass is a synthetic real-time application that can be used to learn how CPU scheduling in a general-purpose operating system works at microsecond and millisecond granularities
    • See: http://www.cs.utah.edu/~regehr/hourglass/

Woerner test

Trevor Woerner wrote an interesting test which received an interrupt on the serial port, and pushed data through several processes, before sending back out the serial port. This test requires an external machine for triggering the test and measuring the results.

See Trevor Woerner's latency tests

Senoner test

Benno Senoner has a latency test that simulates an audio workload. See http://www.gardena.net/benno/linux/audio/

Used (and extended??) by Takashi Iwai - see http://www.alsa-project.org/~iwai/latencytest-0.5.6.tar.gz

Test Features Table

||Feature ||Rf-etri ||Williams ||LRTB ||
||Is it platform specific (for target)? ||yes - i386 ||no, but requires serial port on target ||no, but requires parallel port on target ||
||How is interrupt generated? ||periodic timer programmed via /dev/rtc ||data on serial port ||data on parallel port ||
||What does test measure? ||interrupt and scheduling latency ||end-to-end response latency ||end-to-end response latency ||

Benchmarking programs

  • see BenchmarkPrograms
  • some to look into:
    • hackbench
    • lmbench
    • unixbench

Stress programs

Stress actions

Here are some things that will kill your RT performance:

  • write the time of day to the CMOS of your RTC (see drivers/char/rtc.c - only by code inspection, no test yet)
  • have a bus-master device do a long DMA on the bus
  • get a page fault on your RT process (can be prevented with mlockall)
  • get multiple TLB flushes on your RT code path (how to cause this??)
  • get lots of instruction and data cache misses on your RT code path
    • how to cause this?
      • go down error paths in the RT case?
      • be ON a big error case when the RT event happens?
      • push your main RT code path and data sets out of cache with other work (in your RT process), prior to the next RT event?
      • access data in a very non-localized way on your RT code path
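As a rough idea of what "access data in a very non-localized way" could look like, here is a sketch. It is not from any of the test programs mentioned on this page: the function name `touch_strided` is made up, and whether it really produces cache misses depends on assumptions not checked here (a buffer much larger than the last-level cache, and a stride of at least one cache line, commonly 64 bytes).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stress helper: read every byte of the buffer exactly
 * once, but in strided order, so that consecutive accesses land on
 * different cache lines (and, with a large stride, different pages).
 * Returns a checksum so the loads have a visible result. */
uint64_t touch_strided(const volatile uint8_t *buf, size_t len, size_t stride)
{
    uint64_t sum = 0;
    size_t start, i;

    if (stride == 0)
        return 0;
    for (start = 0; start < stride && start < len; start++)
        for (i = start; i < len; i += stride)
            sum += buf[i];
    return sum;
}
```

Running something like this in the RT process between RT events is one way to try pushing the main RT code path and data out of cache before the next event arrives.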

Test Hardware

  • LRTB uses a 3-machine system:
    • target, host, and logger
    • target is the system under test
    • host is a control system, and it also collects the data
    • logger is a special machine used to cause interrupts on the target, and record the time it takes for the target to respond
    • Paulo Marques offered to create custom hardware for the logger

Issues and Techniques

This is a list of issues and techniques for dealing with them, having to do with testing realtime performance in Linux.

ping flood isn't good as stress test

At one of the sessions at ELC 2007, Nicholas McGuire stated that a ping flood test is actually a poor test of RT performance, since it causes locality in the networking code rather than stressing the system.

Here is a list of issues that have to be dealt with:

  • what tests are available on all platforms?
    • is special clock hardware or registers required for a test (e.g. realfeel, which only supports i386?)
    • does the program cross-compile?
    • Does generation of the test conditions perturb the test results?
    • Is special external hardware required?
    • How is the system stressed?
      • How to stress memory (cause cache-flushes and swapping)
      • How to stress bad code paths (long error paths, fault injection?)
  • How is performance measured?

Using the LATENCY_TRACE option

Quote about latency-test from Ingo:

I'm seeing roughly half of that worst-case IRQ latency on similar 
hardware (2GHz Athlon64), so i believe your system has some hardware
latency that masks the capabilities of the underlying RTOS. It would be
interesting to see IRQSOFF_TIMING + LATENCY_TRACE critical path
information from the -RT tree. Just enable those two options in the
.config (on the host side), and do:
       echo 0 > /proc/sys/kernel/preempt_max_latency
and the kernel will begin measuring and tracing worst-case latency
paths. Then put some load on the host when you see a 50+ usec latency
reported to the syslog, send me the /proc/latency_trace. It should be a
matter of a few minutes to capture this information.

Number of samples recommended

Ingo wrote:

also, i'm wondering why you tested with only 1,000,000 samples. I
routinely do 100,000,000 sample tests, and i did one overnight test with
more than 1 billion samples, and the latency difference is quite
significant between say 1,000,000 samples and 100,000,000 samples. All
you need to do is to increase the rate of interrupts generated by the
logger - e.g. my testbox can handle 80,000 irqs/sec with only 15% CPU
overhead.
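To get a feel for what these sample counts mean in wall-clock time, here is a small sketch. The function name `seconds_for_samples` is made up for this example, and it assumes the logger sustains a constant interrupt rate for the whole run, which the quote does not actually state.

```c
#include <stdint.h>

/* Hypothetical helper: seconds needed to collect `samples` latency
 * samples when the logger generates `irqs_per_sec` interrupts per
 * second, rounded up. Returns 0 if the rate is 0. */
uint64_t seconds_for_samples(uint64_t samples, uint64_t irqs_per_sec)
{
    if (irqs_per_sec == 0)
        return 0; /* no interrupts, no meaningful answer */
    return (samples + irqs_per_sec - 1) / irqs_per_sec;
}
```

At the 80,000 irqs/sec mentioned in the quote, 100,000,000 samples would take 1250 seconds (about 21 minutes) and 1 billion samples 12,500 seconds (about 3.5 hours).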

Things to watch for in testing

Another note from Ingo - see here

  • Note the bit about IRQ 7 - what's up with that?
> First things first, we want to report back that our setup is validated
> before we go onto this one. So we've modified LRTBF to do the
> busy-wait thing.
here's another bug in the way you are testing PREEMPT_RT irq latencies.  
Right now you are doing this in lrtbf-0.1a/drivers/par-test.c:
   if (request_irq ( PAR_TEST_IRQ,
                                         &par_test_irq_handler,
#if CONFIG_PREEMPT_RT
                                          SA_NODELAY,
#else //!CONFIG_PREEMPT_RT
                                          SA_INTERRUPT,
#endif //PREEMPT_RT
you should set the SA_INTERRUPT flag in the PREEMPT_RT case too! I.e.
the relevant line above should be:
                                          SA_NODELAY | SA_INTERRUPT,
otherwise par_test_irq_handler will run with interrupts enabled, opening
the window for other interrupts to be injected and increasing the
worst-case latency! Take a look at drivers/char/lpptest.c how to do this
properly. Also, double-check that there is no IRQ 7 thread running on
the PREEMPT_RT kernel, to make sure you are measuring irq latencies. 

Tests results taxonomy

Test Table

||Person ||Company ||Hardware ||Kernel ||Test method ||Measurement method ||Results ||
||Sangbae Lee ||Samsung ||OSK - OMAP (ARM) 192 MHz ||2.4.20 and 2.6.10 ||?? ||?? ||?? ||
||Sangbae Lee ||Samsung ||MIPS 264 MHz ||2.6.10 ?? ||?? ||?? ||?? ||
||Katsuya Matsubara ||IGEL ||SH4 ||2.6.?? ||?? ||?? ||?? ||
||YungJoon Jung ||ETRI ||Via Nehemiah (i386) ||2.6.12 ||periodic interrupt ||rf-etri - measure scheduling latency minus interrupt latency ||30 us max scheduling latency with RT-preempt ||
||Tsutomu Owa ||Toshiba ||Cell (ppc64) ||2.6.12 ||?? ||?? ||?? ||

Test presentations and documents

Presentations

[Add links here, most recent at top]

  • Analyzed MV 3.1 (2.4.20) and MV 4.0 (2.6.10), using LTT, on OSK board (OMAP 5920 ARM 192 MHz)
  • Initial results were that Linux 2.4.20 was 3X faster for best-case interrupt latency
  • After reviewing the code and finding that the interrupt code path was almost identical, a different, more lightweight tracer was used (Zoom-in tracer), showing that latencies were almost the same between the 2.4 kernel and the 2.6 kernel
  • Also measured on MIPS 264 MHz (for real TV system)
  • Interrupt response time measured:
    • with LTT instrumentation:
      • 2.6.10 - min = 30 us, max = 400 us
      • 2.4.20 - min = 10 us, max = 30 us
    • with ZI instrumentation:
      • 2.6.10 - min = 3 us, max = ??
      • 2.4.20 - min = ??, max = ??
  • Basic result = Don't use LTT for measuring RT performance
  • Real Time Linux: Testing and Evaluation (attachment: p-a03_wilshire.pdf) - By Phil Wilshire of Lineo at the Second Real Time Linux Workshop, 2000
    • This paper discusses the different benchmarking tools used to evaluate the performance of Linux and their suitability for evaluating Real Time system Performance. It is focused on RTAI.

OLS papers

[FIXTHIS - need to scan for past papers]

Darren Hart wrote:

I have contributed some testing results to Steven Rostedt's OLS RT Internals 
paper.  That will be available to link to after the conference sometime.

Real Time Linux Foundation RTL Workshops

Nicholas said:

There are a number of publications related to both benchmarking and analysis of hardware related artifacts (cache,BTB,TLB,etc.) which were published at the real-time Linux Workshops.

Here is a link to the RTLF events page:

So far, I've scanned 1999-2000 for interesting links.

Uncategorized stuff

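This section has random stuff I haven't organized yet:

  • http://eaglet.rain.com/rick/linux/schedstat/ - scheduler statistics
    • maybe this can be used to analyze process wakeup latency?? Need to see what stats are kept.
  • Low-latency HowTo (for audio) - http://lowlatency.linuxaudio.org/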

Notes on ineffective tests

Nicholas McGuire wrote:

The tests noted in the LKML post on this page are very problematic,
ping -f is not testing RT at all, it keeps the kernel in a very small active
page set thus reducing page related penalties, the while loop using dd
is also not too helpful as it will de-facto run only in memory and cause
absolutely no disk/mass-storage related interaction (try the same with
mount -o remount,sync /  first and it will be devastating ! (limited to ext2/ext3/ufs))


Notes on test requirements - need to test kernel error paths

Nicholas McGuire wrote:

The big problem with RT tests published is that they are all looking at the good case,
they are loading the system but assuming successful operations. The worst cases pop
up when you run in the error paths of the kernel - then a trivial application can
induce very large jitter in the system (run crashme in the background and rerun
the tests...)

Notes on test requirements - need for usage profile

Also lmbench can give a statistical view of things (and not even that very precisely in some cases, i.e. context switch measurements are flawed), so this is not of much help for decision makers choosing which variant to use - it does not help if the average performance is good but the mobile phone or mp3 clicks at 1s intervals "deterministically" - so I guess RT benchmarks need a notion of usage-profile to be of value.