Fail-Operational: How highly automated functions work despite errors
Authors: Gereon Weiß and Philipp Schleiß, Fraunhofer ESK
Contribution – Embedded Software Engineering Congress 2016
The increasing automation of systems calls for new approaches to improve their reliability and flexibility. In future highly automated vehicles, the driver can hand over control completely and only needs to be able to resume it within 10 seconds. For this to work, the highly automated driving functions must continue to operate even in the event of a failure, i.e., they must be fail-operational. This paper presents a new concept and solution for future adaptive vehicle software architectures that enables cost-effective fail-operational behavior in embedded, safety-critical systems. The fundamental challenges, new mechanisms, and integration into current development processes (including AUTOSAR) are described. The concept has been implemented and evaluated in an electric vehicle, among other applications.
Automation – from fail-silent to fail-operational
Progress towards autonomous driving takes place in successive stages with increasing levels of automation. The next step, from a partially automated to a highly automated system (VDA Level 3 [1]), requires a higher degree of reliability and flexibility (the latter is also necessary for new mobility concepts [2]). At this level, the driver no longer has to monitor the vehicle and the traffic situation constantly, but must resume control within a defined period. If a situation arises that the automated system cannot handle on its own, the driver should regain control of the vehicle within 10 seconds. It must therefore be guaranteed that the vehicle system remains functional on its own for this period. If, for example, a control unit fails, the system must be able to compensate for the failure by itself. If the steering function is affected, it must be restored as quickly as possible by other means, without the vehicle entering an unsafe state. This requires a type of safety that is new to the automotive sector, because until now faulty systems could often simply be switched off thanks to mechanical backup solutions. A change from this so-called fail-silent behavior to fail-operational behavior is now necessary. The latter means that the safety-critical functionality is maintained even in the event of a failure. This safety concept is already implemented in avionics with multiple redundancies [3]. However, it cannot simply be adopted for the automotive sector, partly due to the high costs. Current approaches typically use dedicated hardware in a multi-redundant configuration for each functionality. This may still be practical for a small number of driving functions, but it appears too complex and expensive given the many fail-operational driving functions in future vehicles.
Software-based redundancy through adaptivity
To achieve a more efficient solution for fail-operational behavior, current methods for integrating multiple independent functions onto a single ECU can be utilized. With such a concept, the highly integrated functions are no longer tied to a dedicated ECU, but instead share computing resources with other functions on a high-performance platform.
In such a system, potential failures of individual control units can be considered and compensated for globally. For this to work, the system must be able to react adaptively to such fault events. For example, the same backup control unit can handle different faults. With previous approaches based on individual hardware redundancy, by contrast, every control unit failure would have to be covered by at least one other dedicated control unit. Since not all functions are safety-critical (that is, necessary for achieving the safety objective), the amount of hardware redundancy can be reduced further. In the fault scenarios considered, only the necessary functions are then executed, by discarding non-critical functions or executing less resource-intensive versions of them. This concept is also known as graceful degradation. It allows safe configurations to be achieved despite reduced system capacity following component failures.
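Graceful degradation as described above can be illustrated with a small sketch. It is not the authors' implementation: the Function type, the load figures, and the greedy strategy (safety-critical functions first, full quality before the lighter variant) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Function:
    # Hypothetical description of a vehicle function.
    name: str
    load: int                             # CPU share in percent (assumed unit)
    safety_critical: bool
    degraded_load: Optional[int] = None   # load of a lighter variant, if one exists


def degrade(functions, capacity):
    """Decide what runs on the capacity that remains after a failure.

    Safety-critical functions are placed first. Each function runs at full
    quality if it fits, otherwise in its lighter variant, otherwise it is
    switched off. A safety-critical function that cannot be placed at all
    means there is no safe configuration, which raises an error.
    """
    plan = {}
    remaining = capacity
    # sorted() is stable: critical functions come first, input order is kept otherwise.
    for f in sorted(functions, key=lambda f: not f.safety_critical):
        if f.load <= remaining:
            plan[f.name] = "full"
            remaining -= f.load
        elif f.degraded_load is not None and f.degraded_load <= remaining:
            plan[f.name] = "degraded"
            remaining -= f.degraded_load
        elif f.safety_critical:
            raise RuntimeError(f"no safe configuration: {f.name} does not fit")
        else:
            plan[f.name] = "off"
    return plan


# Hypothetical function set and remaining capacity after a control unit failure.
functions = [
    Function("navigation", 50, False, degraded_load=20),
    Function("steering", 40, True),
    Function("infotainment", 30, False),
    Function("braking", 30, True),
]
plan = degrade(functions, capacity=100)
```

With these assumed numbers, steering and braking keep running at full quality, navigation falls back to its lighter variant, and infotainment is switched off. A greedy pass like this is only a sketch; it does not search for the best overall configuration.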
Future E/E architectures for automated vehicles will feature numerous highly available driving functions and must account for a multitude of potential failure scenarios. Manually configuring each ECU for every failure scenario is not feasible, because many constraints must be met at once. For example, it must be ensured that each fail-operational function is reactivated within a function-specific timeframe, typically a few milliseconds. Similarly, it must be guaranteed that functions are activated deterministically and that no function is activated twice after reconfiguration. To develop a system of such complexity without errors, methods for the automated generation of fail-operational systems are crucial. These methods can solve the complex problem of distributing and allocating software components to ECUs and CPU cores for all failure situations. Furthermore, valid configurations for all modules involved in the reconfiguration, such as those for scheduling or communication relationships, can be generated automatically.
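The idea of generating one valid configuration per failure scenario can be sketched as follows. This is a deliberately simplified illustration under stated assumptions: only single-ECU failures are considered, capacity is the only constraint, and a first-fit strategy stands in for a real solver. The function name, ECU names, and load values are hypothetical.

```python
def plan_for_failures(ecus, swcs, nominal):
    """Compute one SWC-to-ECU mapping for every single-ECU failure.

    ecus:    {ecu_name: capacity}
    swcs:    {swc_name: (load, is_safety_critical)}
    nominal: {swc_name: ecu_name} for fault-free operation

    Returns {failed_ecu: {swc_name: ecu_name}}. Because each mapping is a
    dict keyed by SWC, no SWC can be active on two ECUs at once.
    """
    plans = {}
    for failed in ecus:
        # SWCs on surviving ECUs stay where they are.
        mapping = {s: e for s, e in nominal.items() if e != failed}
        free = {e: cap for e, cap in ecus.items() if e != failed}
        for s, e in mapping.items():
            free[e] -= swcs[s][0]
        # Safety-critical SWCs of the failed ECU move to the first ECU with room;
        # non-critical SWCs of the failed ECU are dropped.
        lost = sorted(s for s, e in nominal.items() if e == failed and swcs[s][1])
        for s in lost:
            target = next((e for e in sorted(free) if free[e] >= swcs[s][0]), None)
            if target is None:
                raise ValueError(f"no valid configuration if {failed} fails: {s}")
            mapping[s] = target
            free[target] -= swcs[s][0]
        plans[failed] = mapping
    return plans


# Hypothetical system: three ECUs, two safety-critical SWCs, one non-critical SWC.
ecus = {"ecu_a": 100, "ecu_b": 100, "ecu_c": 100}
swcs = {"steering": (60, True), "braking": (50, True), "infotainment": (40, False)}
nominal = {"steering": "ecu_a", "braking": "ecu_b", "infotainment": "ecu_c"}
plans = plan_for_failures(ecus, swcs, nominal)
```

A ValueError at design time signals that the hardware cannot cover a scenario. The sketch does not discard running non-critical functions to free capacity, and it ignores timing, multi-core allocation, and communication, all of which the text names as part of the real problem.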
To enable the outlined system-wide approach to increased fault tolerance through adaptivity [4], general requirements for the hardware architecture (see Fig. 1, PDF) must be met and a specific software component is needed. Parts of the hardware architecture intended for functions with fail-operational behavior must be robust against certain faults. In addition to remote and redundant sensors and actuators, two communication paths to each of them are required. This is the only way to ensure that a broken link does not lead to a complete loss of functionality, for example because an actuator becomes unreachable. Furthermore, the participating control units must be synchronized so that adaptation occurs consistently and at the same time. Each participating control unit must also be able to reliably detect its own faults in order to either deactivate itself or deduce the effects of the fault on the respective functions. Since a single power source represents a single point of failure, the power supply must be protected separately or designed redundantly. The special software component builds on this hardware architecture [5]. It reliably performs and distributes the reconfiguration of the individual control units by adaptively activating the required functions on the available control units.
Distributed adaptation at runtime
To keep all critical functionalities of a vehicle operational after a failure, deterministic adaptation of all involved control units is necessary. For this purpose, a new basic software module called Safe Adaptation Platform Core (SAPC) [6] was developed, which manages the availability of all software components (SWCs) at runtime. To collect information about the system state in a decentralized manner, each SAPC instance sends synchronous status messages (so-called health vectors) containing the state of all SWC instances it manages. Based on these, each SAPC instance can independently determine whether it needs to perform a reconfiguration. Since the adaptations are defined by central planning during the design phase, the decisions of the distributed instances for safety-critical functions are consistent across the network. This procedure for exchanging the system state is illustrated in Figure 1 (see PDF).
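The following sketch illustrates why decentralized decisions stay consistent when they are all derived from the same design-time plan. It does not reproduce the SAPC interface: the table layout, the function name local_decision, and the rule that a node counts as failed when its health vector is missing in the current cycle are assumptions for illustration only.

```python
ALL_NODES = ("ecu_a", "ecu_b", "ecu_c")

# Hypothetical design-time plan: set of failed nodes -> SWCs each node must run.
RECONFIG_TABLE = {
    frozenset(): {"ecu_a": ["steering"], "ecu_b": ["braking"], "ecu_c": ["infotainment"]},
    frozenset({"ecu_a"}): {"ecu_b": ["braking"], "ecu_c": ["steering"]},
    frozenset({"ecu_b"}): {"ecu_a": ["steering"], "ecu_c": ["braking"]},
    frozenset({"ecu_c"}): {"ecu_a": ["steering"], "ecu_b": ["braking"]},
}


def local_decision(me, health_vectors, table=RECONFIG_TABLE, all_nodes=ALL_NODES):
    """Return the SWCs that node `me` must have active in this cycle.

    health_vectors maps each node heard from in this cycle (including `me`)
    to its reported SWC states. Every node evaluates the same table on the
    same set of vectors, so all nodes arrive at the same global configuration
    without any runtime negotiation.
    """
    failed = frozenset(n for n in all_nodes if n not in health_vectors)
    # A KeyError here means the fault combination was not planned at design time.
    return list(table[failed].get(me, []))


# Cycle in which ecu_a has stopped sending its health vector.
received = {"ecu_b": {"braking": "ok"}, "ecu_c": {"infotainment": "ok"}}
active = {node: local_decision(node, received) for node in received}
```

In this assumed scenario, ecu_c takes over steering and drops infotainment, while ecu_b keeps braking. Each SWC ends up active on exactly one node because both survivors read the same table entry.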
Tool for automated systems synthesis
To automatically arrive at a fault-tolerant system configuration, a new modeling concept was designed based on the AUTOSAR exchange format [7] to precisely specify the requirements regarding the availability of individual SWC instances in different fault modes. Based on this information, the developed tooling analyzes the data and control flow between the individual runnables in the system model and determines a schedule for each ECU and each fault mode. Similarly, the transmission times and the composition of PDUs are also determined, taking into account the bus-specific characteristics. The entire automated process is shown schematically in Fig. 3 (see PDF).
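One ingredient of such a synthesis step, deriving an execution order per ECU and fault mode from the data flow between runnables, can be sketched with a topological sort. This is an illustration under assumptions, not the described tooling: it computes only an order, not start times or PDU composition, and the runnable names, fault modes, and allocation are hypothetical. It relies on graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def schedules(data_flow, allocation):
    """Derive an execution order per ECU for every fault mode.

    data_flow:  {runnable: set of runnables whose output it reads}
    allocation: {fault_mode: {runnable: ecu}}, active runnables only

    Returns {fault_mode: {ecu: [runnables in execution order]}}.
    """
    result = {}
    for mode, mapping in allocation.items():
        # Keep only dependencies on runnables that are active in this fault mode.
        active = {
            r: {p for p in data_flow.get(r, set()) if p in mapping}
            for r in mapping
        }
        per_ecu = {}
        # static_order() yields producers before their consumers and
        # raises graphlib.CycleError if the data flow is cyclic.
        for r in TopologicalSorter(active).static_order():
            per_ecu.setdefault(mapping[r], []).append(r)
        result[mode] = per_ecu
    return result


# Hypothetical processing chain and two fault modes.
data_flow = {"sense": set(), "plan": {"sense"}, "actuate": {"plan"}}
allocation = {
    "nominal": {"sense": "ecu_a", "plan": "ecu_a", "actuate": "ecu_b"},
    "ecu_a_failed": {"sense": "ecu_b", "plan": "ecu_b", "actuate": "ecu_b"},
}
result = schedules(data_flow, allocation)
```

For the assumed chain, the nominal mode runs sense and plan on ecu_a and actuate on ecu_b, while the fault mode runs all three in order on ecu_b. A real tool would additionally assign time slots and check deadlines and bus transmission times.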
Summary & Outlook
To meet the fail-operational requirements of future E/E architectures cost-effectively, new architectural and automation approaches are essential. The presented approach makes it possible to achieve the high availability requirements of automated vehicle systems with a lean hardware architecture and software-based adaptation. To ensure efficient implementation in emerging systems, standardization of fail-operational concepts is necessary. This will enable cost-effective and vendor-independent interoperability of non-competitive mechanisms.
Bibliography and list of sources
[1] German Association of the Automotive Industry (VDA). (last accessed: 14 October 2016) Automated driving. [Online].
https://vda.de/de/themen/innovation-und-technik/automatisiertes-fahren/automatisiertes-fahren.html
[2] Project: Adaptive City Mobility 2. (last accessed: 14 October 2016) [Online].
https://www.adaptive-city-mobility.de/
[3] P. Bieber, E. Noulard, C. Pagetti, T. Planche, and F. Vialard, "Design of Future Reconfigurable IMA Platforms," Special Issue on the 2nd International Workshop on Adaptive and Reconfigurable Embedded Systems (APRES'09), 2009.
[4] SafeAdapt. (last accessed: 14 October 2016) Safe Adaptive Software for Fully Electric Vehicles. [Online]. https://www.safeadapt.eu
[5] A. Ruiz, G. Juez, P. Schleiss, and G. Weiss, "A safe generic adaptation mechanism for smart cars," IEEE 26th International Symposium on Software Reliability Engineering (ISSRE 2015), 2015.
[6] SafeAdapt, "D3.1 Concept for Enforcing Safe Adaptation during Runtime," Project Deliverable, 2015.
[7] AUTOSAR. (last accessed: 14 October 2016) AUTomotive Open System Architecture. [Online]. https://www.autosar.org