Linux real-time – Is the kernel waking up my program too late?

Tips and tricks for setting up and running the RT kernel

Author: Dr. Carsten Emde, Open Source Automation Development Lab (OSADL) eG

Contribution – Embedded Software Engineering Congress 2017

Giving the Linux kernel real-time (RT) properties is not difficult:

• Download the kernel,
• Download the RT patch,
• Apply the RT patch,
• Compile the kernel,
• Restart the computer and
• Select the RT kernel.

Checking to what extent the response behavior of the newly created kernel has actually improved is also quite simple: start the program cyclictest, wait a few hours, and assess the result.

But what can be done if latencies are detected, i.e., if the kernel occasionally wakes up the userspace program too late? Various measurement methods exist for this; however, using them is no longer as simple as building the RT kernel. Therefore, the individual measurement methods

• Breaktrace with subsequent trace analysis
• Continuous latency recording with peak detection

are explained in detail and demonstrated with examples. Furthermore, common sources of latency such as

• CPU frequency scaling and
• Sleep states,

which must be ruled out, are discussed.

Recap: How do I create a real-time capable Linux kernel – e.g., on an Intel PC with a standard distribution?

The first step is to select the RT kernel whose patch level and sublevel are as close as possible to those of the distribution's kernel. This is usually possible since, with few exceptions, an RT patch exists for every second patch level. If, for example, Debian 9 (stretch) is used, the command

dpkg -l linux-image-`uname -r`

returns the information

||/ Name Version
+++-=======================-========
ii linux-image-4.9.0-4-amd 4.9.51-1

This shows that the current installation uses Linux kernel version 4, patch level 9, and sublevel 51. The download page of the Linux RT project at https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.9/older/ then reveals that the RT patch

patch-4.9.47-rt37.patch.xz

is available, which is the closest match to the current non-RT kernel. The following commands

• download the corresponding source code,
• download the patch,
• unpack the archives and
• apply the patch:

mkdir -p /usr/src/kernels
cd /usr/src/kernels
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.9.47.tar.xz
wget https://cdn.kernel.org/pub/linux/kernel/projects/rt/4.9/older/patch-4.9.47-rt37.patch.xz
tar xf linux-4.9.47.tar.xz
mv linux-4.9.47 linux-4.9.47-rt37
cd linux-4.9.47-rt37
xz -d ../patch-4.9.47-rt37.patch.xz
patch -p1 <../patch-4.9.47-rt37.patch
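The version fields read manually from the dpkg output above can also be extracted in a script, for example to compare them with the RT patches on offer. The following is only a minimal sketch: the function name split_kernel_version is hypothetical and not part of any tool mentioned in this article, and it assumes a Debian-style version string such as 4.9.51-1.

```shell
# Hypothetical helper: split a Debian kernel package version such as
# "4.9.51-1" into version, patch level and sublevel.
split_kernel_version() {
    local upstream="${1%%-*}"   # strip the Debian revision ("-1")
    local IFS=.                 # split the remainder at the dots
    set -- $upstream
    echo "version=$1 patchlevel=$2 sublevel=${3:-0}"
}

split_kernel_version "4.9.51-1"
# prints: version=4 patchlevel=9 sublevel=51
```

The sublevel printed here is the number that is compared with the sublevels of the available RT patches.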

To ensure that the RT kernel supports the current distribution as well as possible, the first step is to adopt the distribution's kernel configuration. In Debian, as in many other distributions, it is located in the directory /boot, and its version number matches that of the kernel. In this case, it is the file /boot/config-4.9.0-4-amd64, which is now copied into the root directory of the kernel source code under the name .config:

cp /boot/config-4.9.0-4-amd64 .config

In the final step before the kernel can be compiled, the new kernel must be configured so that the functionality introduced with the RT patch is actually used. To do this, the command

make menuconfig

is used to open the page

Processor type and features --->
Preemption Model --->

and Fully Preemptible Kernel (RT) is selected, as shown in Figure 1 (see PDF). The kernel is then compiled and installed in the usual manner, and the system is restarted:

make -j4
make modules_install install
reboot

Upon restart, the RT kernel is then selected in the boot menu.

Review of the system's response behavior

The first important step after rebooting is to verify that the new kernel has actually been configured correctly, at least formally. This can be determined by checking whether the flags PREEMPT and RT are included in the output of the program uname:

uname -v | cut -d" " -f1-4
#1 SMP PREEMPT RT

This is obviously the case here. A much better proof, of course, is to generate asynchronously arriving events and to check how long the system takes at most until a userspace process that waits for such an event with real-time priority reacts. The expected time span can be estimated using the rule of thumb

Maximum latency = clock interval × 10⁵

Accordingly, a system with a clock frequency of 1 GHz, and thus a clock interval of 1 ns, can be expected to have a maximum latency of less than 100 μs. The test program cyclictest has proven itself for measuring the maximum response time of a userspace process. It is included in the RT test suite rt-tests, which can be obtained from the following repository:

git clone git://git.kernel.org/pub/scm/utils/rt-tests/rt-tests.git
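Before a measurement that runs for hours is started, the formal check of the PREEMPT and RT flags described above can be wrapped in a small helper so that a wrongly selected boot entry is noticed early. This is a minimal sketch; the function name is_rt_version_string is hypothetical, and the pattern only assumes that both flags appear in this order in the output of uname -v.

```shell
# Hypothetical helper: succeeds if a "uname -v" style string carries
# the PREEMPT and RT flags (in this order).
is_rt_version_string() {
    case "$1" in
        *PREEMPT*RT*) return 0 ;;
        *)            return 1 ;;
    esac
}

if is_rt_version_string "$(uname -v)"; then
    echo "RT kernel detected"
else
    echo "no RT flags found in: $(uname -v)"
fi
```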

The program cyclictest also includes a histogram function, which can be used to create so-called latency plots in a standard format. In this standard format, the x-axis shows latency classes in microseconds with a granularity of one microsecond, and the y-axis shows the number of measurements per class on a logarithmic scale. It is therefore a histogram with the special feature that the logarithmic y-axis can display both very low and very high frequencies. A shell script that runs cyclictest and generates a standard latency plot directly from the measured values can be downloaded from the OSADL website at https://www.osadl.org/uploads/media/mklatencyplot.bash. It requires prior installation of the program package gnuplot, which is included in most distributions. Additionally, a program is needed to display the latency plot plot.png generated by the script.

# 1. Run cyclictest
cyclictest -l100000000 -m -Sp90 -i200 -h400 -q >output

# 2. Get maximum latency
max=`grep "Max Latencies" output | tr " " "\n" | sort -n | tail -1 | sed s/^0*//`

# 3. Grep data lines, remove empty lines and create a common field separator
grep -v -e "^#" -e "^$" output | tr " " "\t" >histogram

# 4. Set the number of cores, for example
cores=4

# 5. Create two-column data sets (latency class and frequency) for each core
for i in `seq 1 $cores`
do
  column=`expr $i + 1`
  cut -f1,$column histogram >histogram$i
done

# 6. Create plot command header
echo -n -e "set title \"Latency plot\"\n\
set terminal png\n\
set xlabel \"Latency (us), max $max us\"\n\
set logscale y\n\
set xrange [0:400]\n\
set yrange [0.8:*]\n\
set ylabel \"Number of latency samples\"\n\
set output \"plot.png\"\n\
plot " >plotcmd

# 7. Append plot command data references
for i in `seq 1 $cores`
do
  if test $i != 1
  then
    echo -n ", " >>plotcmd
  fi
  cpuno=`expr $i - 1`
  if test $cpuno -lt 10
  then
    title=" CPU$cpuno"
  else
    title="CPU$cpuno"
  fi
  echo -n "\"histogram$i\" using 1:2 title \"$title\" with histeps" >>plotcmd
done

# 8. Execute plot command
gnuplot -persist <plotcmd
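Step 2 of the script can be tried in isolation to see how the overall maximum is obtained from the per-core maxima. The sketch below assumes that the histogram output of cyclictest contains a comment line starting with "# Max Latencies:" followed by one zero-padded value per core; the function name max_latency and the sample values are made up for illustration.

```shell
# Hypothetical helper mirroring step 2: read cyclictest histogram output on
# stdin and print the largest per-core maximum without leading zeros.
max_latency() {
    grep "Max Latencies" | tr " " "\n" | sort -n | tail -1 | sed 's/^0*//'
}

# Made-up sample line for a quad-core system (values are illustrative only)
echo "# Max Latencies: 00019 00017 00018 00016" | max_latency
# prints: 19
```

The non-numeric words of the line sort before the numbers, so the last line after the numeric sort is the largest value.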

If the script is used unchanged, it runs for 5 hours and 33 minutes and records data from 100 million cycles. For meaningful results, suitable stress scenarios must be set up during the measurement. A typical result is reproduced in Figure 2 (see PDF); the x-axis range is deliberately set very wide to allow comparison with slower processors or with systems that have unsatisfactory real-time capabilities. A steep right-hand slope of the curve is desirable. Since this is a processor with a clock frequency of 2.5 GHz and thus a clock interval of 0.4 ns, a maximum latency of 40 μs would be permissible according to the rule of thumb mentioned above. The measured value of 19 μs is significantly lower, so investigating the causes of this latency and attempting to reduce it further is probably not worthwhile.
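Both numbers in this paragraph can be reproduced with a few lines of shell arithmetic. The sketch below is only a worked example: the variable names and the function rule_of_thumb_us are hypothetical, and the rule of thumb is rewritten for a clock frequency given in MHz (a clock interval of 1000/f ns times 10⁵ equals 10⁵/f μs).

```shell
# Runtime of the measurement: number of cycles times cycle interval
cycles=100000000      # cyclictest option -l100000000
interval_us=200       # cyclictest option -i200
total_s=$(( cycles * interval_us / 1000000 ))
printf 'runtime: %dh %dm %ds\n' \
    $(( total_s / 3600 )) $(( total_s % 3600 / 60 )) $(( total_s % 60 ))
# prints: runtime: 5h 33m 20s

# Rule of thumb: clock interval x 10^5, i.e. 10^5 / f[MHz] microseconds
rule_of_thumb_us() {
    echo $(( 100000 / $1 ))
}
rule_of_thumb_us 1000   # 1 GHz   -> prints 100
rule_of_thumb_us 2500   # 2.5 GHz -> prints 40
```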

Analysis of a system with unsatisfactory real-time properties

The latency plot shown in Figure 3 (see PDF) is from a uniprocessor system with x86 architecture. This architecture often requires so-called System Management Interrupts (SMIs), which are used, for example, to emulate certain communication protocols, manage thermal control measures, and execute microcode patches. Since the operating system has no way to prevent SMIs, it is quite possible that SMIs with an execution time longer than the system's acceptable latency render such a system fundamentally unsuitable for real-time tasks. Sometimes, a BIOS update by the manufacturer can eliminate SMIs or at least shorten them sufficiently so that they no longer impair the system's real-time capabilities.

The latency plot in Figure 3 (see PDF) suggests that the latencies are caused by SMIs. To investigate this further, the program hwlatdetect was used, which, like cyclictest, is included in the aforementioned RT test suite rt-tests. However, this program can no longer be used from kernel patch level 4.9 onwards, because the functionality has been part of the mainline kernel since then and is controlled through the tracing subsystem. Regardless of the implementation variant, the measurement consists of stopping the processor for half a second once per second, during which no other instructions are normally executed. If activity nevertheless occurs, which is detected by regularly polling the processor's Time Stamp Counter, it is externally triggered activity that causes a latency whose duration can be determined. The result of such a hardware latency measurement can then be displayed in the form of a histogram.

Figure 4 (see PDF) shows a large number of hardware latencies whose duration of just over 300 μs fits well with the latency plot of the same system in Figure 3.
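On kernels that provide the hardware latency detector through the tracing subsystem, a measurement could be started roughly as sketched below. This is not taken from the measurements described here: the helper name enable_hwlat is hypothetical, the tracefs mount point is assumed to be /sys/kernel/tracing (on older systems it is often /sys/kernel/debug/tracing), the kernel is assumed to be built with CONFIG_HWLAT_TRACER, and root privileges are required.

```shell
# Hypothetical helper: select the hwlat tracer.
#   $1  tracefs directory (assumption: /sys/kernel/tracing)
#   $2  threshold in microseconds above which a latency is recorded
enable_hwlat() {
    local tracefs="${1:-/sys/kernel/tracing}"
    local threshold_us="${2:-10}"
    echo "$threshold_us" >"$tracefs/tracing_thresh" || return 1
    echo hwlat >"$tracefs/current_tracer"
}

# Example (as root):
#   enable_hwlat && sleep 60 && cat /sys/kernel/tracing/trace
```

Passing the directory as a parameter keeps the sketch usable with either mount point.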

Further testing methods in case of unsatisfactory real-time properties

If a system exhibits increased latency that cannot be explained by SMIs, the cause of the latency must be determined. Another process with a higher priority might contribute to the latency of the observed process; alternatively, a system component may have suspended interrupt handling or the resumption of processes for an excessively long period. While the former can usually be resolved by analyzing the currently running userspace processes, the latter requires an analysis of what happens inside the kernel. The test program cyclictest mentioned and used above includes tools for such an analysis. A special option allows kernel tracing to be enabled at the start of the measurement and disabled again the moment a latency occurs. The argument of this option, which is called breaktrace, is a threshold value above which kernel tracing is stopped. If so-called Function Tracing is chosen as the tracing method, it must be taken into account that this slows the system down by approximately a factor of four. However, this does not fundamentally hinder the search for the cause of a latency, since the slowdown of the system usually also increases the latency.

Example of using the test program cyclictest with breaktrace

The following example assumes a multi-core system that should ideally have a maximum latency of less than 100 μs but on which latencies of over 1000 μs are actually measured. To prevent the expected latency, increased by the performance loss that Function Tracing causes, from leading to a false-positive termination of the measurement, the threshold should be more than four times higher than the ideal maximum latency, for example 600 μs. Since the latency to be analyzed is significantly higher at 1000 μs, this threshold still results in the desired termination of the measurement. The cyclictest call could therefore read:

cyclictest -m -Sp90 -i600 -d0 -fb600

If the measurement is terminated because the latency threshold is exceeded, cyclictest displays the following output (individual values removed for better clarity):

T: 0 P:90 I:600 C:12049 Min:19 Act: 79 Avg:120 Max: 312
T: 1 P:90 I:600 C:11940 Min:29 Act: 111 Avg:118 Max: 291
T: 2 P:90 I:600 C:11928 Min:31 Act: 189 Avg:141 Max: 295
T: 3 P:90 I:600 C:11911 Min:28 Act:3120 Avg:296 Max:3120
# Thread Ids: 18116 18118 18124 18127
# Break thread: 18127
# Break value: 3120

The thread with PID 18127 was evidently woken up with a delay of 3.12 ms, which is above the 600 μs threshold. This fits well with the observed latencies. The next step is to locate the point in the kernel trace at which the expected start of the delayed cyclictest thread failed to occur, and to investigate the cause. Often the cause is that another process had disabled interrupts at that time, so that the interrupt of the timer armed by cyclictest could not be processed. Such a programming error must be corrected, because the same problem that here affects the placeholder program cyclictest could, under production conditions, delay the processing of an important interrupt and thus lead to a serious machine failure.
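When such measurements are run unattended, the break information can be picked out of the saved cyclictest output automatically. The following sketch relies only on the two comment lines shown in the output above; the function name summarize_break is hypothetical.

```shell
# Hypothetical helper: read cyclictest output on stdin and report which
# thread exceeded the breaktrace threshold and by how much.
summarize_break() {
    awk '/^# Break thread:/ { tid = $4 }
         /^# Break value:/  { val = $4 }
         END { if (tid != "") printf "thread %s exceeded the threshold with %s us\n", tid, val }'
}

# Usage with the comment lines from the output above
printf '%s\n' "# Thread Ids: 18116 18118 18124 18127" \
              "# Break thread: 18127" \
              "# Break value: 3120" | summarize_break
# prints: thread 18127 exceeded the threshold with 3120 us
```

If no break occurred, the function prints nothing, which makes it easy to use in a test loop.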

Example of using the kernel's internal latency histogram

Just as the program cyclictest measures the delayed execution of an interrupt in userspace, this can also be done in the kernel. Each time a timer expires, the difference between the scheduled and the actual wake-up time can be determined and stored in a histogram. Furthermore, the time between the start of scheduling a process that is to be woken up and the actual start of its execution can be measured and also stored in a histogram. The same applies to the sum of these two durations, which is stored in a third histogram. While the program cyclictest should never run concurrently with a real-time program under production conditions, at least not on the same processor core, the kernel's internal latency histograms can be recorded in parallel and even capture idle conditions. To make the kernel's internal latency histograms available, they must be configured and then enabled at runtime. Configuration is done in the menu Kernel Hacking/Tracers with the individual components Scheduling Latency Tracer, Scheduling Latency Histogram, and Missed Timer Offsets Histogram. To enable the histograms, a non-zero value must be written to the respective file in the enable directory.
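Enabling the histograms at runtime can be sketched as follows. The location of the enable directory is an assumption (here /sys/kernel/debug/tracing/latency_hist/enable, which depends on the patch version and on where debugfs is mounted), the function name enable_latency_histograms is hypothetical, and root privileges are required.

```shell
# Hypothetical helper: write a non-zero value to every file in the
# "enable" directory of the latency histograms.
#   $1  base directory (assumption: /sys/kernel/debug/tracing/latency_hist)
enable_latency_histograms() {
    local base="${1:-/sys/kernel/debug/tracing/latency_hist}"
    local f
    for f in "$base"/enable/*; do
        if [ -f "$f" ]; then
            echo 1 >"$f"
        fi
    done
}

# Example (as root): enable_latency_histograms
```

Looping over all files avoids hard-coding the names of the individual histograms.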

Figure 5 (see PDF) shows a 30-hour recording of the sum of timer and wake-up delays of a quad-core system; the maxima of consecutive 5-minute intervals are shown. This is possible because the kernel's internal latency histograms have a reset function, which was triggered every five minutes in this case. The very low latency is clearly visible between 5:10 and 6:40 AM and between 7:10 and 12:43 PM, and again 12 hours later, when the processor clock speeds were set to maximum and all sleep states were disabled. During the remaining time, the latency values are significantly higher, as this is a relatively modern processor (Intel Bay Trail) with a wide range of power-saving methods. Allowing the processor to use these methods results in significantly increased latency or even in the complete loss of any real-time capability.
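How the clock speed is pinned and the sleep states are disabled is not described here. One common way on Linux is the cpufreq and cpuidle sysfs interface, sketched below under stated assumptions: the function name disable_power_saving is hypothetical, the system is assumed to offer a "performance" governor, cpuidle state0 is left enabled, and root privileges are required. Not every platform exposes these files.

```shell
# Hypothetical helper: run all cores at maximum clock speed and disable
# the deeper sleep states via sysfs.
#   $1  CPU sysfs directory (default: /sys/devices/system/cpu)
disable_power_saving() {
    local cpu_root="${1:-/sys/devices/system/cpu}"
    local f
    # Clock frequency: select the "performance" cpufreq governor on every core
    for f in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        if [ -f "$f" ]; then echo performance >"$f"; fi
    done
    # Sleep states: disable every cpuidle state except state0
    for f in "$cpu_root"/cpu[0-9]*/cpuidle/state[1-9]*/disable; do
        if [ -f "$f" ]; then echo 1 >"$f"; fi
    done
}

# Example (as root): disable_power_saving
```

These settings are not persistent and have to be applied again after every reboot.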

In addition to histograms, data from the processes involved in scheduling is also recorded, and the processes that have resulted in the highest latency so far are stored along with their priorities, process IDs, and latencies. This makes it possible to identify the process, thread, or driver responsible for increased latency in a given case. This usually makes it much easier to fix the underlying error.

The example in Table 1 (see PDF) lists the scheduling data of the eight highest latencies during a test with cyclictest. As can be seen, the program meminfo was clearly responsible for the observed increased latency. This is strong evidence that meminfo makes a kernel call that prevents the Linux kernel from waking up cyclictest within the normally short time of at most 20 μs.

Further analysis revealed that in this kernel the configuration option

CONFIG_SLABINFO

was enabled and that, during the measurement, the call

cat /proc/slabinfo

was executed repeatedly. When the debug data of the SLAB memory allocator is read, the kernel may, depending on the allocator configuration, not be interruptible for an extended period. The following patch ensures that this kernel configuration option is not enabled when real-time is selected.

---
 init/Kconfig | 1 +
 1 file changed, 1 insertion(+)

Index: linux-4.9.47-rt37/init/Kconfig
==============================================================
--- linux-4.9.47-rt37.orig/init/Kconfig
+++ linux-4.9.47-rt37/init/Kconfig
@@ -1946,6 +1946,7 @@ config SLABINFO
 	bool
 	depends on PROC_FS
 	depends on SLAB || SLUB_DEBUG
+	depends on !PREEMPT_RT_FULL
 	default y
 
 config RT_MUTEXES

After applying this patch, compiling and installing the kernel, and restarting, the observed latencies no longer occurred, even with rapid and frequent repeated reading of the virtual file /proc/slabinfo.

If the function that is used when reading from /proc/slabinfo with the CONFIG_SLABINFO flag enabled is adapted to the requirements of the real-time kernel in the future, this patch, which is currently only used locally, can of course be removed.
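Whether a given kernel is affected can be checked by looking up the option in its configuration file. The sketch below assumes that the distribution stores the configuration as /boot/config-<release>, as described for Debian above; the function name kconfig_enabled is hypothetical.

```shell
# Hypothetical helper: succeeds if a kernel configuration option is set to "y".
#   $1  option name without the CONFIG_ prefix
#   $2  configuration file (default: /boot/config-<running kernel>)
kconfig_enabled() {
    grep -q "^CONFIG_$1=y" "${2:-/boot/config-$(uname -r)}"
}

# Example:
#   kconfig_enabled SLABINFO && echo "CONFIG_SLABINFO is enabled"
```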

Author

Carsten Emde has over 25 years of experience as a software developer, system integrator, and trainer. His areas of expertise include graphical user interfaces, machine vision, and real-time operating systems. He has been the managing director of the Open Source Automation Development Lab (OSADL) eG since its founding in 2005.


Published by

weissblau media
