Configuring Linux systems to stabilise latency
Over the last few months, Codethink have investigated whether Linux systems can be configured to behave more deterministically, so that kernel tuning makes performance more predictable over time and better overall. The investigation was kicked off by Niall Dalton at Tensyr, who ran tests with certain kernel boot parameters changed and saw positive results. Codethink reproduced Niall's experiments and then attempted further performance improvements.
Possible applications of this would be in critical systems, where processes must be able to run either uninterrupted or with a predictable level of interference factored in.
We conclude that, with appropriate separation and tuning, we can get something approximating soft real-time on Linux without much trouble. The main remaining difficulty is interference from the housekeeping processes.
The results we saw from our tuned systems were far better than our baseline systems, and variance was predictably limited.
We decided to measure performance through the latency, and the variation in latency, seen when running simple processes on a Linux system, both with and without external stresses running. We used three different pieces of hardware: an automotive infotainment rig (Intel Atom E3840 processor, mainline 4.14.55 kernel), a Jetson TK1 board (ARM Cortex-A15 processor, Tegra 4.17 kernel), and a Lenovo ThinkPad X240 running minimal Debian (Intel i7-4600U 2.10GHz processor, mainline 4.14.0 kernel).
Our test program was a simple C program that essentially did the following:
'Get time' -> 'Do some work' -> 'Get time'
The stress program we used was stress-ng. The latency was the difference between the start and end time of each measurement. We ran a number of different work processes:
- Memory operations
- Register operations
- Kernel operation kill(0,0)
- Kernel operation clock_gettime()
We ran a mixture of individual tests with each process running on each isolated CPU core (1-3), and parallel tests with a different process on each core, running simultaneously. With the latter, we could see certain events (for example dips and spikes in the latency) occurring at the same time across all three cores. Example below:
The following boot parameters (supplied by Niall Dalton) were used:
- Ignore corrected errors and associated scans that cause periodic latency spikes with
- Avoid logging of backtraces when a process executes on a CPU for longer than the softlockup threshold with
- vm.stat_interval=120 to limit updating VM statistics
- Disable the kernel trying to coalesce our pages with
- Disable CPU idling so that we're running at max performance with processor.max_cstate=1 idle=poll intel_idle.max_cstate=0 (this negates power saving)
- Isolate CPUs 1, 2 and 3 with isolcpus=1,2,3 (replacing the numbers with whichever CPU you wish to isolate). We found this to cause the most significant improvement in latency and variance. Our setup was as follows:
- CPU0 - housekeeping processes
- CPUs 1-3 - running the latency test
- Stress applied to CPUs 1-3
- Enable fully tickless mode for each isolated CPU with
Various "housekeeping" and "actually do the IO work" threads exist and need to run in the kernel, but we want to keep them away from our designated cores. So, we moved IRQs away from our CPUs and pointed them towards the housekeeping CPU with
echo <CPU bitmask> > /proc/irq/<IRQ number>/smp_affinity
The rest of the configurations are as follows (note, each item of hardware used a slightly different set of configurations, depending on its capabilities):
- PREEMPT_RT patch applied.
- Set the CPU governor to performance and locked the CPU frequency.
- Set the clock source to TSC (as opposed to HPET).
- Set the scheduling policy to FIFO (as opposed to RR; round robin).
- Set the priority of our process to 20 so it would take precedence over other processes occurring in the kernel.
A selection of graphs plotted from the automotive rig data is below, with the tuned kernel results plotted against baseline results. Comparing the two, we can see less variance with the tuned kernel, as well as a marked improvement in overall latency of between 40% and 96%.
Full presentation of our results can be found here; you will need to run a program called Jupyter to view them. Instructions on how to install Jupyter and the necessary dependencies can be found here.
For the truly curious, our GitLab instance is open and can be viewed here. There are a number of wiki pages containing information and results from the tests we ran and developed throughout the project.
We found with the tests run on the laptop that we were seeing jumps in latency every so often. These correspond with CPU frequency changes seen in the datasets, probably due to thermal variance in the Intel processor. Example below:
Since we saw similar (albeit less frequent) frequency changes on the automotive rig, and didn't see these jumps on the Jetson TK1, we conclude that:
a) ARM processors may be less susceptible than Intel processors to CPU frequency changes, and b) Intel Atom processors may not jump in frequency as often as the Intel i7, but the jumps will likely last for longer when they do occur.
It is worth noting that even with the jumps, the variance remains within ~5-6%.
Another interesting event we saw with the automotive rig parallel tests was periodic spikes in the tuned kernel when under stress, only in one CPU. Example below:
Since we were allowing stress to run across CPUs 1-3, we believe the kernel had allocated it to that core and, despite our process having priority, was being fair to both processes by allowing stress to interrupt intermittently. From this we can conclude that even when we tune and set ourselves to be high priority, without a real-time scheduler we can't stop ourselves being pre-empted eventually. However, the periodicity means this can be factored in, and the results are still much better than baseline.
- We are seeing largely positive results.
- Where we don't have positive results, there is justification - but even when Linux forces us to give up, it's by a predictable amount, it's for a predictable amount of time, and both of these elements can be factored in. After all, real-time is not about being super-fast; it's about being predictable.
- We can confidently say that with the tunings used, you can expect latency variance within ~5%, and ~6% at most.
- Next steps would be to investigate how far we can isolate the housekeeping processes.
- Things we don't yet have a grasp of in this context would be other kernel threads, and system management mode.