Model-based risk analysis of safety-critical systems

Experiences with a UML profile in the railway engineering environment

Author: Markus Schacher, KnowGravity Inc.

Contribution – Embedded Software Engineering Congress 2015

The European standard for demonstrating the reliability, availability, maintainability, and safety of railway applications defines safety as "freedom from risks that are unacceptable to humans or the environment" and risk as "the combination of the expected frequency of a loss and the expected severity of that loss." This article shows how such risk considerations can be developed in the form of a UML model.

Risks in the railway environment

Operating a railway inherently carries risks due to the high number of passengers transported. Both technical and human error can lead to serious accidents that cause extensive property damage or even cost lives. For this reason, a safety culture has developed in the railway sector over many decades, which is now enshrined in the European CENELEC standard [EN50126]. This standard not only describes techniques for identifying and mitigating risks, but also regulates the fundamental interaction between railway operators and their suppliers.

The central concept is the hazard: a situation that could lead to an accident, but does not necessarily do so. An example is a signal that shows green even though there is a train on the track behind it. A hazard always relates to a specific system context, such as the signal and its underlying control logic (interlocking system), including other sensors. Starting from the hazard, the scenarios that could lead to various accidents can then be analyzed.

Conversely, hazards always have causes that originate within the system context: a sensor is defective, the control logic exhibits faulty behavior under certain circumstances, or maintenance personnel have made an incorrect configuration. Tracing a hazard back to its causes in this way is known as "fault tree analysis" (FTA).

According to EN 50126, analyzing the consequences of a hazard is primarily the railway operator's responsibility: the operator must demonstrate to the regulatory authorities, and thus to the public, that it can live with, i.e., manage, the risks resulting from the hazard. It can only do this, however, if the frequency of these hazards (the "hazard rate") does not exceed a certain level. In contrast, analyzing the causes of a hazard is the responsibility of the system supplier, and that responsibility is bounded by the system context. The supplier must demonstrate that its system does not exceed the required hazard rate under any circumstances. For the supplier, hazard rates are therefore safety-relevant requirements that must be met to obtain approval for the system (see Figure 1 in the PDF).
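The supplier's side of this split can be illustrated with a minimal Python sketch. It assumes that the causes are independent and rare, so that the hazard rate of a simple OR combination (any single cause triggers the hazard) is approximately the sum of the cause rates. The class name, function names, and all numbers are hypothetical and not taken from EN 50126 or from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cause:
    """A cause inside the supplier's system context (illustrative only)."""
    name: str
    rate_per_hour: float  # assumed occurrence rate, made-up number


def hazard_rate(causes):
    """Rare-event approximation for an OR gate: sum of the cause rates."""
    return sum(c.rate_per_hour for c in causes)


def meets_requirement(causes, tolerable_hazard_rate):
    """True if the estimated hazard rate stays within the required rate."""
    return hazard_rate(causes) <= tolerable_hazard_rate


# Hypothetical causes of the hazard "signal shows green on an occupied track"
causes = [
    Cause("defective sensor", 2e-9),
    Cause("faulty control logic", 5e-10),
    Cause("incorrect configuration", 1e-9),
]

print(f"estimated hazard rate: {hazard_rate(causes):.2e} per hour")
print("requirement met:", meets_requirement(causes, tolerable_hazard_rate=1e-8))
```

A real fault tree also contains AND gates and dependent events, which need more than a simple sum; the sketch only shows how a required hazard rate acts as a testable requirement.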

Model-based risk analysis

When a risk analysis is model-based, the key concepts of the risk analysis are captured in the form of model elements and related to each other. The most important relationships between these concepts are summarized in Figure 2 (see PDF).

Thematically, a model-based risk analysis can be grouped into three areas:

In system modeling (green area in Figure 2, see PDF), the relevant systems and subsystems, as well as their functions, are described. This can be done in detail, for example, using the OMG's Systems Modeling Language [SysML]. Furthermore, various organizations can be designated as system owners in this area.
In risk modeling (yellow area), (system) states are classified as hazards, causes, and consequences, linked via cause-and-effect relationships, and the triggering incidents are identified. These system states can then be assigned to individual system components (technical modeling) or individual functions (functional modeling).
Requirements modeling (blue area) combines system modeling with risk modeling by identifying protective measures that reduce the frequency of undesired system states and/or mitigate their consequences. These protective measures are then assigned to the underlying system components or functions by means of safety requirements.

Risk acceptance and risk evaluation

A key technique for system operators is the "risk matrix." A risk in the risk matrix is a possible consequence (typically an accident) resulting from a hazard. The risk is determined by multiplying the expected frequency of the consequence by the expected severity of the damage. It is then entered into the risk matrix, which classifies risks according to their frequency (vertical axis) and their severity of damage (horizontal axis). The two axes are typically divided into a few descriptive sections.

Quantifying the axes allows the risks associated with the cells to be quantified and their acceptability to be assessed (see Figure 3 in the PDF). Expressed in the unit "money per unit of time", these figures illustrate the respective risks very clearly: the amount spans an enormous range and is what the operator must "set aside" per unit of time to cover a risk assigned to the respective cell.
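As a rough illustration of this quantification, the following Python sketch assigns a representative value to each section of the two axes and multiplies them per cell. The section names and all values are assumptions chosen for the example; they are not taken from Figure 3.

```python
# Assumed representative values per axis section (illustrative only)
frequency_classes = {  # expected events per year
    "frequent": 10.0,
    "occasional": 0.1,
    "improbable": 0.001,
}
severity_classes = {  # expected damage per event, in currency units
    "minor": 1e4,
    "critical": 1e6,
    "catastrophic": 1e8,
}


def cell_risk(frequency_class, severity_class):
    """Risk of one matrix cell in currency units per year."""
    return frequency_classes[frequency_class] * severity_classes[severity_class]


# Print the matrix: rows are frequency sections, columns are severity sections
print(f"{'':<12}", *(f"{s:>16}" for s in severity_classes))
for f in frequency_classes:
    print(f"{f:<12}", *(f"{cell_risk(f, s):>16,.0f}" for s in severity_classes))
```

Even with only three sections per axis, the cell values in this made-up example differ by many orders of magnitude, which matches the "enormous range" mentioned above.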

Typically, a risk matrix is divided into three color-coded areas:

The risks that fall within the green zone are so small that they can be accepted without discussion.
The risks in the red zone are so significant that they are completely unacceptable. Action is absolutely necessary here.
The yellow zone represents risks that are borderline cases. Investment in measures here is only made if they can be implemented with reasonable effort. This area is also called the "ALARP zone" ("As Low As Reasonably Practicable").
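The three zones can be sketched in Python as a simple classification. The sketch assumes that the zones are delimited by two thresholds on the quantified risk; in practice, zones are often assigned cell by cell, and the threshold values used here are purely hypothetical.

```python
from enum import Enum


class Zone(Enum):
    ACCEPTABLE = "green"
    ALARP = "yellow"
    UNACCEPTABLE = "red"


def classify(risk, acceptable_below, unacceptable_from):
    """Map a quantified risk (e.g. currency units per year) to a zone.

    Both thresholds are assumptions that an operator would have to define.
    """
    if acceptable_below > unacceptable_from:
        raise ValueError("acceptable_below must not exceed unacceptable_from")
    if risk < acceptable_below:
        return Zone.ACCEPTABLE
    if risk >= unacceptable_from:
        return Zone.UNACCEPTABLE
    return Zone.ALARP


for risk in (50.0, 5_000.0, 500_000.0):
    zone = classify(risk, acceptable_below=1_000.0, unacceptable_from=100_000.0)
    print(f"{risk:>12,.0f} -> {zone.name}")
```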

The result of this zoning is also referred to as "risk acceptance," as it declares the system operator's risk appetite. Entering the individual risks examined in the risk analysis into the risk matrix is called "risk evaluation." This is shown in Figure 4 (see PDF) based on three risks.

In the diagram, each risk is shown twice: once without considering protective measures (light blue) and once with them (dark blue). A protective measure to reduce the frequency of risk 1 is clearly in place. In contrast, a protective measure to reduce the extent of damage appears to be effective for risk 2. For risk 3, both types of protective measures appear to be effective, so that the risk is reduced from the unacceptable range to at least the ALARP range.
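The two positions of a risk in the matrix can be sketched in Python as follows. The sketch assumes that each protective measure acts as a multiplicative reduction factor on frequency, on severity, or on both; the class names, factors, and values are hypothetical and not taken from Figure 4.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Measure:
    """A protective measure with assumed multiplicative reduction factors."""
    name: str
    frequency_factor: float = 1.0  # 1.0 means no effect on frequency
    severity_factor: float = 1.0   # 1.0 means no effect on severity


@dataclass
class Risk:
    name: str
    frequency: float  # events per year without measures
    severity: float   # damage per event without measures
    measures: list = field(default_factory=list)

    def position(self, with_measures):
        """Return (frequency, severity), optionally after applying measures."""
        f, s = self.frequency, self.severity
        if with_measures:
            for m in self.measures:
                f *= m.frequency_factor
                s *= m.severity_factor
        return f, s


# Analogous to risk 3 in the text: one measure of each type is effective
risk3 = Risk(
    "risk 3",
    frequency=1.0,
    severity=1e6,
    measures=[
        Measure("additional barrier", frequency_factor=0.1),
        Measure("speed restriction", severity_factor=0.5),
    ],
)
print("without measures:", risk3.position(with_measures=False))
print("with measures:   ", risk3.position(with_measures=True))
```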

Domain-specific modeling languages using UML profiling

UML profiling is a mechanism that has been part of UML for over 10 years. It allows the development of domain-specific modeling languages that can then be used with most UML tools. A UML profile is a special package that defines so-called stereotypes. These can be applied to standard UML elements (such as packages, classes, use cases, etc.) to express a domain-specific meaning. A stereotype can also introduce specific properties for these model elements (so-called "tag definitions") and even have its own symbol.

As part of a project for the Swiss Federal Railways, we have now developed a UML profile for modeling the key concepts from Figures 2 and 3 (see both figures in the PDF). Figure 5 shows a section of this profile. What is noticeable in this diagram is that the names of many tag definitions begin with a slash ("/"). This means that the values of these properties do not need to be set by the modeler, but are calculated automatically by the UML tool. Corresponding formulas from the risk analysis are stored for these properties (see Figure 5 in the PDF).
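The idea of derived tag definitions can be mirrored in Python with read-only properties. The following sketch is only an analogy: the classes, attribute names, and formulas are assumptions made for illustration and are not the definitions stored in the actual profile.

```python
from dataclasses import dataclass


@dataclass
class Hazard:
    name: str
    rate: float  # occurrences per year, set by the modeler


@dataclass
class Consequence:
    name: str
    hazard: Hazard
    probability_given_hazard: float  # set by the modeler
    severity: float                  # damage per event, set by the modeler

    @property
    def frequency(self):
        """Derived value, comparable to a '/frequency' tag definition."""
        return self.hazard.rate * self.probability_given_hazard

    @property
    def risk(self):
        """Derived value, comparable to a '/risk' tag definition."""
        return self.frequency * self.severity


hazard = Hazard("barrier open while train approaches", rate=0.02)
collision = Consequence("collision with road vehicle", hazard, 0.05, 2_000_000)
print(collision.frequency, collision.risk)

# Changing a modeled value is reflected in all derived values on next access
hazard.rate = 0.002
print(collision.frequency, collision.risk)
```

Because the derived values are recomputed on access, they cannot drift out of sync with the values the modeler enters, which is the main benefit of the slash-prefixed properties.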

Application

Figure 6 (see PDF) shows an excerpt from a simplified risk model for a fictitious level crossing (real risk models are considerably more complex). The diagram uses the UML profile introduced above and shows both causes (yellow) and hazards (orange) that can lead to various collisions (red). The collisions also show some of the automatically calculated properties, both with and without consideration of protective measures (green) and safety requirements (blue).

Practical experience

Certain difficulties arose during the project-specific development of concrete risk analyses:

Cultural difficulties: After more than 10 years of stability, the EN 50126 standards underwent a major revision, which has not yet been fully established. This has led to differing interpretations and corresponding discussions among regulatory authorities and railway operators. Furthermore, a model-based approach was fundamentally new to many stakeholders, as risk assessments had previously been carried out primarily using individual Excel spreadsheets.
Model structuring: Several iterations were necessary to find an optimal and easily maintainable model structure. Furthermore, it was often difficult to find the cause of an absurd calculated value. An automated plausibility check introduced during the course of the project greatly simplified this process.
Performance: The client/server architecture of the UML tool used, with its client-side calculations, was only partially suitable for the sometimes very complex calculations. This resulted in the diagram editors reacting very sluggishly to changes, as many properties had to be recalculated.
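The article does not describe how the plausibility check mentioned under model structuring works. As a minimal sketch of what such a check might look like, the following Python function flags values that cannot be correct regardless of the model. The rules and names are assumptions for illustration, not the checks used in the project.

```python
import math


def plausibility_issues(name, frequency, severity, probability=None):
    """Return a list of human-readable issues for one model element.

    Assumed rules: frequency and severity must be finite and non-negative,
    and a probability (if given) must lie in the interval [0, 1].
    """
    issues = []
    for label, value in (("frequency", frequency), ("severity", severity)):
        if not math.isfinite(value):
            issues.append(f"{name}: {label} is not a finite number")
        elif value < 0:
            issues.append(f"{name}: {label} is negative")
    if probability is not None and not 0.0 <= probability <= 1.0:
        issues.append(f"{name}: probability {probability} is outside [0, 1]")
    return issues


# Hypothetical model elements, one of them with obviously absurd values
for issue in plausibility_issues("collision A", 0.01, 1e6, 0.5):
    print(issue)
for issue in plausibility_issues("collision B", -1.0, float("nan"), 1.5):
    print(issue)
```

Reporting the offending element by name addresses the difficulty described above, namely locating the source of an absurd calculated value.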

Summary

The risks of complex, heterogeneous systems can and must be quantified to make them comprehensible, acceptable, and therefore justifiable. Model-based risk analysis enables the structured identification of system risks and the systematic derivation of countermeasures and safety requirements from these risks. Using UML profiling, a model-based risk analysis can be performed with a conventional UML tool and, if necessary, closely linked to other models. The author is also involved in the further development of the OMG's UML Testing Profile [UTP] for model-based testing. The integration of risk analyses with risk-based testing techniques is of particular interest here.

Bibliography and list of sources

[EN50126]
CENELEC: Railway applications – The specification and demonstration of Reliability, Availability, Maintainability and Safety (RAMS) – Part 1: Generic RAMS Process, CENELEC – European Committee for Electrotechnical Standardization, WG14, prEN 50126-1, DRAFT, 04-SEP-2012

[SysML]
Object Management Group: OMG Systems Modeling Language (OMG SysML™) – version 1.3, ptc/2012-06-01, June 2012

[UTP]
Object Management Group: UML Testing Profile (UTP) – version 1.2, ptc/2012-09-13, September 2012
