Use of virtualization for secure software architectures

Central control units, mixed criticality, testing, multicore

Authors: Thomas Bock, Dr. Henning Kleinwechter, Volkswagen; Dr. Ralph Sasse, OpenSynergy; Armin Stingl, iSystem; Dr. Ralf Münzenberger, Philip Rehkop, Olaf Schmidt, INCHRON

Contribution – Embedded Software Engineering Congress 2017

The large number of electronic control units (ECUs) is prompting the automotive industry to consolidate multiple ECUs into central units. As a result, many functions that previously ran on individual ECUs are executed together on a single CPU. This article describes the challenges and solutions that arise when distributing CPU computing time across the individual functions. The focus is on interference-free operation for systems that integrate ISO 26262 ASIL-relevant functions with quality management (QM) functions. The article is based on the results of a joint pre-development project by Volkswagen, OpenSynergy, and INCHRON that uses a hypervisor as its architectural approach. Real-time behavior and cause-effect chains are modeled and simulated, then refined and verified through measurements. The results allow best practices to be formulated for the use of a hypervisor in domain ECUs.

Motivation

Central control units increasingly incorporate software functions developed independently by different software suppliers (Tier 2). This is especially true for body control modules (BCMs). They integrate a large number of functions that differ in:

Safety,
Security,
Relevance for type approval, and
Degree of change.

This means that software systems with different safety requirements (mixed criticality) run on a single piece of hardware. All of them must comply with ISO 26262, for example by ensuring interference-free operation between systems classified at different ASIL levels. Furthermore, new, innovative functions that are likely to change often are integrated alongside relatively stable functions, which may also have to meet legal requirements. Without safeguards, any modification to one software function can affect the behavior of the entire system, so even a single change means that the safety of the entire system has to be validated again.

Volkswagen launched the pre-development project to comply with the required standards and legal requirements and to reduce the rapidly increasing testing and recertification effort.

Goals and challenges

If future passenger cars have only a few high-performance control units, each hosting a wide variety of software functions, this brings the following challenges:

It must be ensured that the allocated resources (memory, computing time) are available for the integrated software functions and are not affected by other software functions.
It must be ensured that the budgets defined during resource budgeting do not overload the processor and that all real-time requirements are met.
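A first plausibility check on such a resource plan might look like the following sketch. The component names, the budget values, and the 10 ms cycle are hypothetical and are not taken from the project. The check is necessary but not sufficient: it ignores hypervisor overhead and says nothing about deadlines inside a single VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    component: str   # hypothetical software component name
    runtime_us: int  # CPU time requested per cycle, in microseconds
    ram_kib: int     # RAM requested, in KiB


def check_budgets(budgets, cycle_us, ram_total_kib):
    """Return a list of violations; an empty list means the plan fits."""
    problems = []
    cpu = sum(b.runtime_us for b in budgets)
    ram = sum(b.ram_kib for b in budgets)
    if cpu > cycle_us:
        problems.append(f"CPU: {cpu} us requested, {cycle_us} us available")
    if ram > ram_total_kib:
        problems.append(f"RAM: {ram} KiB requested, {ram_total_kib} KiB available")
    return problems


# Hypothetical resource plan for three software components.
plan = [
    Budget("comfort_access", 3000, 64),
    Budget("exterior_light", 2500, 48),
    Budget("wiper", 2000, 32),
]
violations = check_budgets(plan, cycle_us=10_000, ram_total_kib=256)
print(violations or "plan fits")
```

Running the same plan against a shorter cycle or a smaller RAM size makes the corresponding violation appear in the returned list.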

The pre-development project also had to take the constraints of body electronics into account, such as relatively resource-limited processors that typically only have a Memory Protection Unit (MPU).

AUTOSAR has increasingly established itself as a standard in software architecture development. Regarding interference-free operation, AUTOSAR defines a partitioning concept (OS applications) for memory protection. However, AUTOSAR does not provide explicit mechanisms for guaranteeing execution time to application software. While it is possible to monitor the execution time of individual runnables or tasks, handling runtime overruns in this way is complex and sometimes resource-intensive.

Therefore, the pre-development project focused on a software architecture in which the software systems are encapsulated in virtual machines (VMs). A hypervisor manages the VMs. This architecture is already used successfully in other domains. It ensures that the system remains free of interference with regard to both memory access and processing time. Figure 1 (see PDF) shows this architecture as an example, along with the requirements for three software modules to be integrated.

The goal of the pre-development project described here was twofold: first, to port a virtualized software architecture to a modern processor for body control modules; and second, to investigate the specific characteristics of such an architecture through simulation, particularly regarding timing and real-time behavior. The insights gained from this process enabled the definition of best practices for designing a hypervisor-based software architecture.

Virtualization

Thanks to the hypervisor, multiple software stacks, for example AUTOSAR stacks, can run concurrently on a single CPU. The hypervisor isolates the individual virtual machines (VMs). Each VM has access to the CPU's resources as if it were running alone on that CPU. Guest operating systems, runtime environments (RTEs), and other components can thus be programmed and operated independently of other VMs. Multiple VMs can share a single core. In multicore systems, VMs can be distributed across multiple cores. The hypervisor's role is to guarantee the independence of each VM with regard to time (timing/scheduler) and memory (flash/RAM), while simultaneously enabling and monitoring the desired cooperation between VMs through communication and sharing mechanisms.

Fig. 2 (see PDF) shows the first (global) stage of scheduling, which is implemented in the hypervisor. A purely static, TDMA-based, table-driven approach was chosen for this. The project results show that cycle times should be neither too short (more overhead due to frequent VM switching) nor too long (higher latency because processing is scheduled later). The second (local) stages are implemented by the guest operating systems in the respective VMs.
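The trade-off between switching overhead and latency can be made concrete with a small model of a table-driven TDMA schedule. This is only a sketch: the slot lengths, the switch cost of 50 µs, and all function names are assumptions for illustration, and each VM is assumed to get exactly one slot per cycle.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical static schedule: (VM name, slot length in microseconds).
SCHEDULE = [("VM1", 4000), ("VM2", 3000), ("VM3", 3000)]


def period_us(schedule):
    """Length of one complete hypervisor cycle."""
    return sum(length for _, length in schedule)


def active_vm(schedule, t_us):
    """Look up which VM owns the CPU at absolute time t_us."""
    slot_ends = list(accumulate(length for _, length in schedule))
    offset = t_us % slot_ends[-1]
    return schedule[bisect_right(slot_ends, offset)][0]


def switch_overhead(schedule, switch_cost_us):
    """Fraction of CPU time spent on VM switches (one switch per slot)."""
    return len(schedule) * switch_cost_us / period_us(schedule)


def worst_case_wait_us(schedule, vm):
    """Longest time the VM is not scheduled, assuming one slot per cycle."""
    own = sum(length for name, length in schedule if name == vm)
    return period_us(schedule) - own


def scaled(schedule, factor):
    """Same slot proportions with a shorter or longer cycle."""
    return [(vm, int(length * factor)) for vm, length in schedule]


for factor in (0.5, 1.0, 2.0):
    s = scaled(SCHEDULE, factor)
    print(period_us(s), switch_overhead(s, 50), worst_case_wait_us(s, "VM3"))
```

In this model, halving the cycle doubles the share of CPU time lost to VM switches and halves the longest time a VM has to wait for its next slot. Doubling the cycle does the opposite.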

In a BCM environment, only the limited capabilities of an MPU are available for protecting memory areas. However, the reprogramming of the MPU required when switching VMs can be minimized by carefully selecting the memory allocation to VMs, so that only the minor differences between the VMs need to be changed.
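The idea of reprogramming only the differences can be sketched as a comparison of two MPU layouts. The region numbers, addresses, sizes, and permission strings below are hypothetical and do not describe a particular processor or the project's actual memory map.

```python
# Hypothetical MPU layouts: region index -> (base address, size, permissions).
VM_A = {
    0: (0x0800_0000, 0x4_0000, "rx"),  # code in flash, shared by both VMs
    1: (0x2000_0000, 0x1000, "r"),     # read-only data, shared by both VMs
    2: (0x2000_4000, 0x4000, "rw"),    # private RAM of VM A
    3: (0x4000_0000, 0x1000, "rw"),    # peripherals assigned to VM A
}
VM_B = {
    0: (0x0800_0000, 0x4_0000, "rx"),  # identical to VM A
    1: (0x2000_0000, 0x1000, "r"),     # identical to VM A
    2: (0x2000_8000, 0x4000, "rw"),    # private RAM of VM B
    3: (0x4000_1000, 0x1000, "rw"),    # peripherals assigned to VM B
}


def regions_to_reprogram(current, target):
    """Indices of MPU regions whose descriptor differs between two layouts."""
    indices = set(current) | set(target)
    return sorted(i for i in indices if current.get(i) != target.get(i))


print(regions_to_reprogram(VM_A, VM_B))
```

With this allocation, a switch from VM A to VM B touches only regions 2 and 3, because the shared regions use the same region index and the same descriptor in both layouts.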

For interrupts that belong to currently inactive VMs, a pre-emptive, budget-based concept (fast interrupts) was introduced. It still guarantees the planned execution times of the VMs, but at the cost of the usable utilization of the affected CPU cores.
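The article does not describe the mechanism in detail. One possible reading of a budget-based scheme is sketched below, under the assumption that each VM gets a fixed interrupt budget per hypervisor cycle and that this budget is reserved outside the regular slots. The class, the function, and all numbers are hypothetical.

```python
class IrqBudget:
    """Per-cycle execution budget for the fast interrupts of one VM."""

    def __init__(self, budget_us):
        self.budget_us = budget_us
        self.remaining_us = budget_us

    def replenish(self):
        """Refill the budget; meant to be called at the start of each cycle."""
        self.remaining_us = self.budget_us

    def try_consume(self, cost_us):
        """Return True if the interrupt may pre-empt the active VM now."""
        if cost_us > self.remaining_us:
            return False  # budget exhausted: defer to the owning VM's slot
        self.remaining_us -= cost_us
        return True


def guaranteed_slot_time_us(cycle_us, irq_budgets_us):
    """CPU time per cycle that can still be handed out as VM slots."""
    return cycle_us - sum(irq_budgets_us.values())


budget = IrqBudget(300)
print(budget.try_consume(200))  # fits into the budget
print(budget.try_consume(200))  # would exceed the remaining 100 us
print(guaranteed_slot_time_us(10_000, {"VM1": 200, "VM2": 300}))
```

Because the budgets are reserved up front, the slot lengths of the VMs stay guaranteed even when interrupts arrive, but the reserved time is lost to regular processing whenever no interrupt occurs.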

Timing analyses

A key objective of the pre-development project was to systematically investigate timing effects resulting from the use of virtualization and subsequently design best practices for such systems. For this purpose, a timing model of the dynamic architecture with priority-based scheduling (AUTOSAR OS) was created for a virtualized ECU. The net software execution times, the mapping of software components to tasks, and the stimulation of ISRs were derived from a series development project. Simultaneously, a timing model of the series development project was created at the same level of abstraction as the virtualized system. This allows for a comparison of the dynamic behavior with and without virtualization.

The real-time simulator chronSIM was used for the timing analyses. A key finding is that using a hypervisor with time slices for individual VMs results in timing effects that are surprising compared to pure priority-based scheduling:

1. Choosing the appropriate hypervisor period and time slice length for individual VMs significantly impacts adherence to timing requirements. Even with low overall system utilization, real-time violations can occur within a single VM.
2. Unlike a system with priority-based scheduling, a system with a hypervisor requires careful planning of time-critical workflows during the design process when data flows across multiple VMs. This concerns the execution order of the VMs, which should ensure that the steps of a workflow are executed in direct succession wherever possible. This applies to both single-core and multi-core CPUs.
3. Mapping software components to tasks and tasks to VMs can very easily be done in an unfavorable way, with regard to the required load per VM (see point 1) or compliance with the maximum latency of time-critical chains of operations (point 2).
4. Since interrupt handling is resource-intensive in virtualization, polling should be used whenever possible.

Overall, a key lesson learned from the project is that temporal behavior must be analyzed and the dynamic architecture optimized as early as the design phase. It is crucial to clearly specify the time requirements early on and to define the time budgets for the software of each supplier. Two of the points mentioned above will be explained in more detail below.

Point 1:
In Fig. 3 (see PDF), the lower part of the graph shows a snapshot of the OS state during task execution in the respective VMs, while the upper part displays the overall system utilization. The execution of task VM3_APP_20ms in VM 3 does not complete within the scheduled VM cycle, even though only a short time remains until full completion. The task can only complete in the invocation of VM 3 after next, which in this case corresponds to twice the hypervisor cycle time of 10 ms. This real-time violation occurs due to the high load on VM 3, even though computing capacity is still available in other VMs.
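The effect can be reproduced with simple arithmetic. The sketch below assumes a single task that is alone in its VM and is released exactly at the start of the VM's slot; the slot length and execution times are hypothetical and are not read from Fig. 3.

```python
def completion_time_us(exec_us, slot_us, cycle_us):
    """Time from slot start until a task with exec_us of work has finished.

    The VM runs for slot_us at the start of every cycle_us and is
    suspended for the rest of the cycle.
    """
    if exec_us <= 0 or slot_us <= 0 or slot_us > cycle_us:
        raise ValueError("need exec_us > 0 and 0 < slot_us <= cycle_us")
    full_slots, rest = divmod(exec_us, slot_us)
    if rest == 0:
        # The work ends exactly at the end of the last slot it uses.
        return (full_slots - 1) * cycle_us + slot_us
    # full_slots complete slots are used up, the rest runs in the next one.
    return full_slots * cycle_us + rest


CYCLE_US = 10_000  # hypervisor cycle
SLOT_US = 3_000    # hypothetical slot length of the VM

print(completion_time_us(2_900, SLOT_US, CYCLE_US))  # fits into one slot
print(completion_time_us(3_200, SLOT_US, CYCLE_US))  # misses the slot by 200 us
```

A task that needs 2,900 µs finishes after 2,900 µs. A task that needs 3,200 µs finishes only after 10,200 µs, because the missing 200 µs have to wait for the VM's next slot, although the VM accounts for well under half of the CPU time in this example.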

Point 2:
Fig. 4 (see PDF) shows the behavior of two cascades. In the blue cascade, the execution order of the VMs and tasks is optimized for processing the cascade steps, so that only a short waiting time occurs between the execution of successive cascade steps in the VMs. This is not the case for the purple cascade, which therefore significantly exceeds the maximum latency.
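How strongly the VM order influences the latency of a cascade can be illustrated with a small model. It is a sketch under simplifying assumptions: the schedule, the VM names, and the execution times are hypothetical, each step runs as soon as its VM is active, and local scheduling inside the VMs as well as switching overhead are ignored.

```python
def slot_windows(schedule):
    """Yield (vm, start, end) for every slot, cycle after cycle, forever."""
    t = 0
    while True:
        for vm, length in schedule:
            yield vm, t, t + length
            t += length


def chain_latency_us(schedule, chain, release_us=0):
    """Time from release until the last step of the chain has finished."""
    known = {vm for vm, _ in schedule}
    if any(vm not in known for vm, _ in chain):
        raise ValueError("chain uses a VM that is not in the schedule")
    t = release_us
    windows = slot_windows(schedule)
    vm, start, end = next(windows)
    for step_vm, exec_us in chain:
        remaining = exec_us
        while remaining > 0:
            if vm == step_vm and end > t:
                begin = max(t, start)
                run = min(remaining, end - begin)
                remaining -= run
                t = begin + run
                if remaining == 0:
                    break  # stay in this slot; the next step may use it too
            vm, start, end = next(windows)
    return t - release_us


# Hypothetical cascade: three steps of 500 us each, one per VM.
CHAIN = [("VM1", 500), ("VM2", 500), ("VM3", 500)]
ALIGNED = [("VM1", 3000), ("VM2", 3000), ("VM3", 4000)]
REVERSED = [("VM3", 4000), ("VM2", 3000), ("VM1", 3000)]

print(chain_latency_us(ALIGNED, CHAIN))
print(chain_latency_us(REVERSED, CHAIN))
```

With the slots in cascade order, the chain finishes within one hypervisor cycle (6,500 µs). With the reversed slot order, every step has to wait for the following cycle, and the same chain takes 20,500 µs, although the total work is only 1,500 µs in both cases.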

Alongside the model-based timing analysis, measurements were conducted in the project using winIDEA from iSYSTEM. VMs, tasks, ISRs, and runnables were measured.

Best practices for the use of virtualization

The design of a virtualized system should be a multi-stage process. A global architecture design defines the hypervisor's characteristics. This requires key information about each subsystem from all project partners (OEMs, Tier 2 suppliers). Initial negotiations with these partners regarding the time budgets for their software are also conducted at this stage. The local architecture design focuses on the scheduling within the ECU and the virtual machines.

The process developed by Volkswagen comprises five tasks. One lesson learned from the project is that several iterations of these tasks are necessary to design a robust and scalable architecture.

Identify requirements
At the beginning of the project, the software architect must create a resource plan that shows the available runtime budget, memory, and other resources for each software component.
Clustering of software applications
The number of VMs can be determined based on the criteria (a) ASIL level, (b) security relevance, (c) software vendor, (d) need for interference-free operation, and other criteria.
Global architectural design
A timing model is created from the budgeting data and feedback from the individual software components. The planned budgets and resources are verified through simulation within the intended architecture using this model. Optimizations can be derived from the results. The model serves to ensure adherence to resource limits throughout the entire development period; therefore, cyclical updates must be planned from the outset.

Coordination and finalization of the time budget for each project partner
Based on the global architectural design, time budgets for each software component are negotiated and agreed upon with the project partners. A new timing analysis must be performed after every change.
Local architectural design
For the individual VMs (scheduling, mapping of software components to tasks, etc.), an approach similar to that described for the global architectural design (point 3) is recommended to avoid exceeding local resource limits. Timing analyses are used to verify the VM's local time requirements.
Continuous verification of the architecture
The functionality of the individual components and the overall system is ensured through the monitoring of resources and the dynamic behavior of the software during development.

Continuous monitoring of time budgets and adherence to deadlines is essential after every integration. The above points must be carried out for every relevant change request.
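The clustering task in the process above can be sketched as a simple grouping of software components by the chosen criteria. The component names, vendors, and classifications are hypothetical, and the choice of grouping key is a design decision rather than a rule from the project.

```python
from collections import defaultdict
from typing import NamedTuple


class Component(NamedTuple):
    name: str
    asil: str       # "QM", "A", "B", "C" or "D"
    security: bool  # security-relevant?
    vendor: str


def cluster(components, key=lambda c: (c.asil, c.security, c.vendor)):
    """Group component names into candidate VMs by the given criteria."""
    groups = defaultdict(list)
    for c in components:
        groups[key(c)].append(c.name)
    return dict(groups)


# Hypothetical body control functions.
components = [
    Component("door_lock", "B", True, "supplier_x"),
    Component("window_lift", "B", False, "supplier_x"),
    Component("ambient_light", "QM", False, "supplier_y"),
    Component("welcome_light", "QM", False, "supplier_y"),
]

by_all_criteria = cluster(components)
by_asil_only = cluster(components, key=lambda c: c.asil)
print(len(by_all_criteria), len(by_asil_only))
```

Grouping by all three criteria yields three candidate VMs here, grouping by ASIL level alone yields two. Since every additional VM adds a slot to the hypervisor cycle, a coarser key can be a reasonable starting point that is then refined in later iterations.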

Summary and Outlook

The increasing integration density in high-performance ECUs also increases the risk of resource conflicts and mutual interference between individual functional modules. These challenges can be managed through the use of virtualization concepts. For the effective use of a hypervisor in domain control units, detailed timing, memory, and I/O resource planning during the design phase and continuous monitoring during development using simulation are highly recommended.

Limited virtualization support by the processor restricts the applicability of the mechanisms in the hypervisor and increases implementation effort. However, it does not constitute a fundamental exclusion criterion for this feature. Since the rapidly increasing functional integration density is accompanied by a continuous increase in processor performance, virtualization concepts will quickly become established.

Authors

Thomas Bock has worked in the development of body electronics at Volkswagen for more than 15 years. As a project manager, he oversees the development of software function modules in this field, with a particular focus on dynamic software analysis.

Dr. Henning Kleinwechter oversees the software architecture and the management of third-party application software for body controllers at Volkswagen. He has more than 15 years of experience as a project manager and software architect for model-based software development at Carmeq and Daimler.

Armin Stingl has been with iSYSTEM AG since 2013, where he is responsible for the definition, validation, and market launch of innovative debug and trace tools. With over 20 years of professional experience, he has been involved in the development of several RISC cores. As a systems engineer, he was primarily responsible for architecture definition at renowned semiconductor manufacturers. In this role, he participated in the development of on-chip debug and trace solutions for several processors.

Dr.-Ing. Ralf Münzenberger is co-founder of INCHRON GmbH and, as Managing Director, responsible for the Professional Services division. He has provided extensive support to clients in over 160 projects, focusing on the design of robust dynamic architectures and ensuring real-time capability. This includes topics such as architecture optimization, migration from single-core to multi-core processors, functional safety, and process consulting.
