Properly distribute software in real-time systems

Avoiding typical errors in concurrent software

Authors: Dr. Jochen Härdtlein, Simon Kramer, Peter Häfele, Dr. Dirk Ziegenbein, Robert Bosch GmbH

Contribution – Embedded Software Engineering Congress 2017

Fixing errors that arise from concurrent or parallel software execution with traditional methods such as locks leads to significant system overhead. It also severely limits how freely the software can be distributed and how effectively parallel computing power can be used. The constructive approach of Logical Execution Time decouples communication from computation and structures it temporally across the system. This paper analyzes typical error scenarios in concurrent real-time software and summarizes constructive mechanisms for their prevention [6]. It also provides an outlook on tool-supported, correct software distribution.

The performance increase required for embedded control units in vehicles can no longer be achieved solely through higher clock speeds on single-core processors, as was the case in previous decades. Alongside microcontrollers with multi- or many-core architectures, multi-microcontroller solutions and domain controllers with microprocessors are also being used. These architectures consist of a multitude of processing cores that are sometimes heterogeneous, meaning they differ in computing power and capabilities. Additionally, the associated memory architecture is increasingly characterized by NUMA (Non-Uniform Memory Access), meaning that access speeds and bandwidths vary depending on which core accesses which memory. The requirements for the software architecture of current and future embedded systems are heavily influenced by the increasing parallelism and heterogeneity of these systems (see Figure 1, PDF).

Concurrency and real-time errors

Embedded systems, such as engine control systems for internal combustion engines, combine event-driven and time-driven computations and are subject to strict real-time requirements that ensure the correctness of the implemented control functions. Furthermore, due to their complexity, these solutions are developed in a distributed manner, across different locations and companies (OEM, Tier 1, Tier 2). In this setup, it is essential to understand that validating the systems becomes more complex and resource-intensive as heterogeneous parallelism increases. Therefore, as a first step, we aim to understand the potential error scenarios of the software from both a concurrency and a real-time perspective, in order to then derive solutions for their prevention ([1], [5]).

Embedded software systems consist of several hundred functions that are called at specific rates. The operating system is responsible for activating calculations according to their activation rate and, if necessary, interrupting ongoing calculations with lower priority (rate-monotonic scheduling). The correct processing of a function depends on other functions being calculated before or after it; this dependency is referred to as order (e.g., a sensor value must be acquired before it can be processed further). For calculations on a single processor core, order is easy to ensure for functions with the same activation rate, through a fixed sequence within the task. However, even on a single core, the sequence between different time slices can only be partially controlled via priorities, or it must be implemented at considerable cost using synchronization points. If the calculations are distributed across multiple processor cores, the sequence can only be achieved via synchronization points (or global scheduling), which significantly reduces the benefit of parallel computing power. Furthermore, the implicit assumptions that legacy software makes about single-core systems are often not completely transparent. This poses the risk that, with a naive distribution of the software, the necessary calculation sequences are sporadically violated (see Figure 2, PDF).
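The order problem described above can be sketched in a few lines. The following is a minimal illustration, not the authors' tooling: the function names, periods, and core mapping are hypothetical, and the classification simply restates the three cases from the paragraph (same task, same core with different rates, different cores).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Function:
    name: str
    period_ms: int  # activation period (hypothetical values)
    core: int       # core the function is mapped to (hypothetical mapping)


def rate_monotonic_priorities(functions):
    """Map each period to a priority: shorter period -> larger (higher) priority."""
    periods = sorted({f.period_ms for f in functions}, reverse=True)
    return {period: prio for prio, period in enumerate(periods, start=1)}


def classify_dependency(producer, consumer):
    """Describe how the order 'producer before consumer' can be enforced."""
    if producer.core != consumer.core:
        return "cross-core: needs synchronization points or global scheduling"
    if producer.period_ms == consumer.period_ms:
        return "same task: a fixed call sequence is sufficient"
    return "same core, different rate: only partially controllable via priorities"


read_sensor = Function("read_sensor", 10, core=0)
filter_value = Function("filter_value", 10, core=0)
control_law = Function("control_law", 20, core=0)
diagnosis = Function("diagnosis", 100, core=1)

priorities = rate_monotonic_priorities(
    [read_sensor, filter_value, control_law, diagnosis]
)

for consumer in (filter_value, control_law, diagnosis):
    print(consumer.name, "->", classify_dependency(read_sensor, consumer))
```

Running such a check over all producer-consumer pairs before moving functions between cores is one way to make the implicit ordering assumptions of legacy software visible.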

Perhaps the best-known source of synchronization errors is the deadlock. Deadlocks can occur, for example, when several locks are acquired one after another and the acquisition order is not consistent system-wide. Possible solutions include using a single lock for all sections to be protected, or ensuring a consistent system-wide locking order. The former solution can lead to significant system overhead because it also blocks calculations that are unrelated to each other. Ensuring a complete system-wide locking order is generally extremely difficult (see Figure 3, PDF).
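A consistent locking order can be sketched as follows. This is a minimal illustration using Python threads rather than an embedded operating system; `RankedLock` and `acquire_all` are hypothetical helpers, and the ranks are assumed to be unique and agreed upon system-wide. Both workers request the same two locks in opposite order, which could deadlock if the locks were taken as requested.

```python
import threading
from contextlib import contextmanager


class RankedLock:
    """A lock with a fixed rank; the ranks define the system-wide order."""

    def __init__(self, rank):
        self.rank = rank  # assumed unique across the system
        self.lock = threading.Lock()


@contextmanager
def acquire_all(*locks):
    """Acquire the given locks in ascending rank, release in reverse order."""
    ordered = sorted(locks, key=lambda item: item.rank)
    for item in ordered:
        item.lock.acquire()
    try:
        yield
    finally:
        for item in reversed(ordered):
            item.lock.release()


lock_a = RankedLock(rank=1)
lock_b = RankedLock(rank=2)
counter = {"value": 0}


def worker(first, second, iterations):
    for _ in range(iterations):
        # The caller's argument order does not matter; the rank decides.
        with acquire_all(first, second):
            counter["value"] += 1


t1 = threading.Thread(target=worker, args=(lock_a, lock_b, 1000))
t2 = threading.Thread(target=worker, args=(lock_b, lock_a, 1000))
t1.start()
t2.start()
t1.join(timeout=10)
t2.join(timeout=10)
print("finished:", not t1.is_alive() and not t2.is_alive(), counter["value"])
```

The sketch only works because every acquisition goes through one helper. In a system developed across several companies, that discipline is exactly what is hard to guarantee.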

Data inconsistencies are the most frequently cited source of errors in concurrent systems. They can occur when data is accessed simultaneously or when a function is invoked again while a previous invocation is still running. To avoid these overlap errors, synchronization is often used, which in turn generates overhead. Additionally, state transitions in the software must be considered, because implementations from single-core environments in particular often rely on timing effects that are difficult to control in parallel systems (see Figure 4, PDF).
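An overlap error of this kind can be reproduced deterministically by spelling out the interleaving by hand. The sketch below is a simulation under simple assumptions: two hypothetical tasks, A and B, each perform a read step and a write step on one shared counter, and the schedule lists the order in which these steps execute.

```python
def run_interleaving(schedule):
    """Execute the read/write steps of several tasks in the given order.

    Each task increments a shared counter in two steps:
    'read' copies the shared value into a task-local variable,
    'write' stores the local value plus one back to the shared counter.
    """
    shared = 0
    local = {}
    for task, step in schedule:
        if step == "read":
            local[task] = shared
        elif step == "write":
            shared = local[task] + 1
        else:
            raise ValueError(f"unknown step: {step}")
    return shared


# Task B starts only after task A has finished: both increments survive.
sequential = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]

# Task B reads before task A has written: one increment is lost.
overlapping = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

print(run_interleaving(sequential))   # 2
print(run_interleaving(overlapping))  # 1
```

On a single core, the overlapping schedule requires a preemption between the read and the write. On multiple cores it can occur at any time, which is why such errors often surface only after the software has been distributed.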

The last category covers violations of real-time requirements, such as missed deadlines, where results reach other functions too late. Depending on the severity, the impact ranges from negligible to complete failure of the embedded system (see Figure 5, PDF).
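For a single core with preemptive fixed-priority scheduling, missed deadlines can be checked with the classic response-time iteration R = C + Σ ⌈R / T_j⌉ · C_j over all higher-priority tasks j. This analysis is not part of the paper; it is included as a minimal sketch, assuming independent periodic tasks, deadlines equal to periods, no blocking, and hypothetical execution times in milliseconds.

```python
def response_time(task_index, tasks):
    """Worst-case response time of one task, or None if its deadline is missed.

    tasks: list of (wcet, period) tuples sorted by priority, highest first.
    Assumes the deadline equals the period and tasks do not block each other.
    """
    wcet, period = tasks[task_index]
    higher_priority = tasks[:task_index]
    r = wcet
    while r <= period:
        # -(-r // t) is integer ceiling division: number of preemptions by task j
        interference = sum(-(-r // t) * c for c, t in higher_priority)
        next_r = wcet + interference
        if next_r == r:
            return r  # fixed point reached: worst-case response time
        r = next_r
    return None  # response time exceeds the period: deadline missed


# Hypothetical task set in rate-monotonic order: (wcet_ms, period_ms)
tasks_ok = [(1, 4), (2, 6), (3, 12)]
tasks_overloaded = [(1, 4), (2, 6), (6, 12)]

print([response_time(i, tasks_ok) for i in range(3)])          # [1, 3, 10]
print([response_time(i, tasks_overloaded) for i in range(3)])  # [1, 3, None]
```

In the second task set only the lowest-priority task misses its deadline, which matches the observation that the effect of a real-time violation depends on which function receives its results too late.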

The listed errors are typically avoided using synchronization mechanisms such as spinlocks. However, as system load increases, the overhead from locking and querying these locks also rises significantly.

Timed Communication

A structural solution that reduces the effort required to ensure consistent communication in parallel real-time systems was introduced by Henzinger et al. with the concept of Logical Execution Time and implemented in their Giotto framework ([2], [3], [4]). With Logical Execution Time, communication is decoupled from computation and coordinated system-wide. This reduces individual synchronizations to a minimum. This type of communication is called Timed Communication (see Figure 6, PDF).

In contrast to traditional implementations, in which time slices exchange data through global storage, Timed Communication (TDC) exchanges data directly between two time slices. Furthermore, communication only takes place when new data is available. This temporal decoupling of communication from computation ensures that time-driven tasks always receive input data of the same age, provided that computation and communication are completed within the given period.
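The effect of this temporal decoupling can be illustrated with a small simulation. It is a sketch under stated assumptions, not the Giotto or Timed Communication implementation: a hypothetical producer with a 10 ms period writes the index of its job as its output, a consumer reads every 5 ms, and under Logical Execution Time the output becomes visible exactly at the end of the producer's period, regardless of when the computation finished.

```python
def direct_publish_times(period, exec_times):
    """Direct communication: output is visible as soon as the job finishes."""
    return [k * period + e for k, e in enumerate(exec_times)]


def let_publish_times(period, exec_times):
    """Logical Execution Time: output is visible at the end of the period."""
    if any(e > period for e in exec_times):
        raise ValueError("a job does not finish within its period")
    return [(k + 1) * period for k in range(len(exec_times))]


def consumer_view(publish_times, read_times):
    """Index of the newest producer job visible at each read time."""
    view = []
    for t in read_times:
        visible = None  # None: no output published yet
        for job, published_at in enumerate(publish_times):
            if published_at <= t:
                visible = job
        view.append(visible)
    return view


PERIOD = 10                      # producer period in ms (hypothetical)
READS = list(range(0, 40, 5))    # consumer read times in ms (hypothetical)
fast = [2, 2, 2]                 # execution times of three producer jobs
slow = [8, 8, 8]

direct_fast = consumer_view(direct_publish_times(PERIOD, fast), READS)
direct_slow = consumer_view(direct_publish_times(PERIOD, slow), READS)
let_fast = consumer_view(let_publish_times(PERIOD, fast), READS)
let_slow = consumer_view(let_publish_times(PERIOD, slow), READS)

print(direct_fast == direct_slow)  # False: data seen depends on execution time
print(let_fast == let_slow)        # True: data seen is independent of it
```

The price of this determinism is additional latency, because a result that is ready early is still held back until the period boundary.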

Timed Communication can prevent ordering errors between time slices as well as synchronization and overlap errors in data access. Additionally, Timed Communication provides a structure for deterministic, lock-free communication and allows real-time errors to be detected per task.

Furthermore, deterministic communication and known temporal requirements on the communication between functions make a tool-supported distribution possible that ensures or verifies the underlying time requirements by design. For this, the time-critical interfaces and their time requirements must be recorded for all functions. Time-critical cascades must also be identified and recorded together with their deadlines. Likewise, an abstract description of the underlying hardware is required. Obtaining all this data typically requires expert knowledge. The data can already be managed in the publicly available Amalthea format (see Figure 7, PDF) [7].

The preemptive rate-monotonic scheduling currently used in automotive single-core systems already results in a concurrent real-time system in which the error states presented here can occur. However, the probability of errors increases significantly when the software is distributed across parallel processing units. Avoiding errors through additional synchronization creates overhead, which partially negates the benefit of the additional parallel computing power and is also difficult to verify. Furthermore, the absence of real-time errors remains difficult to validate. The Timed Communication Framework can now be used as a basis to ensure the timing requirements of functions and cascades system-wide during system setup. This substantially reduces the verification effort, which would otherwise grow sharply in size and complexity in parallel systems.

Sources

[1] Clarke SJ and McDermid JA. Software fault trees and weakest preconditions: a comparison and analysis. Software Engineering Journal. 8(4):225-236, 1993.

[2] Henzinger TA, Horowitz B, Kirsch CM (2001a) Embedded control systems development with Giotto. In: Proceedings of the ACM SIGPLAN workshop on languages, compilers, and tools for embedded systems (LCTES). ACM

[3] Henzinger TA, Horowitz B, Kirsch CM (2001b) Giotto: A time-triggered language for embedded programming. In: Proceedings of the international workshop on embedded software (EMSOFT), vol 2211 of LNCS, Springer, pp 166–184

[4] Henzinger TA, Horowitz B, Kirsch CM (2003a) Giotto: A time-triggered language for embedded programming. Proc IEEE 91(1):84–99

[5] Thane, Henrik. Monitoring, testing and debugging of distributed real-time systems. Diss. Ph. D. Thesis, MRTC Report 00/15, 2000.

[6] Härdtlein, Jochen, Distributed software in real-time systems. HANSER automotive 03-04/2017

[7] www.amalthea-project.org, An Open Platform Project for Embedded Multicore Systems, Publicly funded ITEA project
