Multicore hardware tracing in practice

An industrial case study

Authors: Felix Martin, Maximilian Hempe and Michael Deubzer, Timing-Architects Embedded Systems GmbH

Contribution – Embedded Software Engineering Congress 2016

The automotive industry is experiencing a steadily growing demand for increasingly complex embedded systems driven by innovative functions. The number of new functions implemented entirely in software will continue to rise, particularly in the areas of ADAS, connected cars, autonomous driving, and mobility services. This demand is met by higher-performance embedded systems with multi- and many-core controllers, which are increasingly being deployed in traditional vehicle domains too. This case study focuses on the traditional domain of steering, but its findings apply to other traditional domains as well.

Safety-critical functions in these areas are subject to stringent real-time requirements to prevent malfunctions that endanger human lives. Therefore, verification of the performance and timing of these functions is necessary and required by ISO 26262. As the complexity of these functions and systems increases, so does the need for automation and abstraction, which is reflected in new and enhanced tools.

Tracing plays a crucial role alongside simulation and static analysis as a method for verifying temporal behavior. It allows the dynamic behavior of time-critical applications to be recorded and validated in detail. This work examines and compares various tracing methods with the aim of providing an optimal solution that offers very high measurement depth, width, and length with minimal measurement deviation. Existing approaches are combined, and recommended tools are selected.

Introduction

In real-time systems, correct system behavior depends not only on the correctness of the calculated data and the decisions derived from it, but also on when this data becomes available and the corresponding decisions are made. The time limits a function must meet in order to operate correctly are usually specified as informal timing requirements. Particularly in safety-critical domains such as the automotive industry, these requirements must be met to minimize potential hazards to people and the environment [1]. Multi-core processors complicate this issue because simultaneous accesses by different software components to shared memory and peripherals degrade timing behavior [2].

Real-time requirements at various levels of safety-critical systems are checked for each release throughout the entire development process [3] [4]. Tracing records the timestamps of function executions and data accesses, which makes it possible to evaluate whether these functions behave correctly in time. This paper shows which steps are recommended, weighing costs and benefits, for setting up such a process in practice. After a brief introduction to the fundamentals of real-time analysis and tracing and a comparison of existing tracing methods, the subsequent sections discuss the specific steps for designing a recommended trace process for checking time-critical functions. This evaluation and recommendation draws on experience gained in a real-world steering system project.

Fundamentals of real-time analysis and tracing

The often informal timing requirements in existing projects must be translated into a formal description for further automated processing. A data model based on current automotive industry standards is desirable for this purpose. One possibility is the AUTOSAR Timing Extensions [3], whose approach is discussed below. To select the tracing method with the highest possible measurement depth, width, and length and the lowest measurement error, various tracing techniques are first considered and compared. The open BTF format specified in the AMALTHEA standard was chosen for recording the traces in this case, as it offers full support for multi-core and multi-processor traces [4]. Several solutions exist for evaluating traces; the last subsection of this chapter explains the basic functionality of such tools. In this context, TA Inspector from Timing-Architects has proven to be the best choice, as it supports profiling as well as the automatic evaluation and reuse of formalized timing requirements [5].

Timing requirements

Timing requirements are defined at various levels of abstraction, such as the system, ECU, and software levels. The evaluation of real-time behavior is of interest at all of these levels. The software level encompasses memory accesses and executed instructions. Memory accesses can be read or write accesses to a variable or a hardware register; instructions refer to the execution of CPU commands. The ECU level, in contrast, contains parts of the application software, such as software components and runnables, as well as parts of the RTE and middleware, including tasks and communication interfaces.

All types of state changes in the system are referred to as events and must be annotated with a timestamp for real-time analysis. Timing requirements can be defined for the intervals between two or more different events. An example from this domain is reading and processing the signal "Sensor torque" from the torque sensor and calculating the target torque for the motor of an electromechanical steering system. Observable events here are the read and write accesses to the signals "Sensor torque" and "Motor torque rating" and the executions or state changes of the functions "Processing" and "Target torque calculation". These four events form a causal chain that must complete within a maximum time; otherwise, the system must react within the safety-critical fault tolerance time to avoid endangering human life.

Figure 1 (see PDF) shows the definition of an event chain across several control units. In this example, the events are defined on the communication interfaces of the participating software components. Timing requirements can be defined for the chain as a whole or for individual segments.
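To make the notion of a chain requirement concrete, the following Python sketch measures end-to-end and per-segment latencies of an event chain in a recorded trace. The event labels, the microsecond timestamps, and the rule that a new chain instance restarts at the most recent input sample are assumptions for illustration, not the semantics of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time_us: int  # timestamp in microseconds
    name: str     # hypothetical event label, e.g. "read:Sensor torque"


def chain_instances(trace, chain):
    """Return (total_latency, segment_latencies) for each completed chain instance.

    Assumption: an instance starts at the most recent occurrence of chain[0]
    and then follows the remaining events of the chain in order.
    """
    instances = []
    times = []
    for ev in sorted(trace, key=lambda e: e.time_us):
        if ev.name == chain[0]:
            times = [ev.time_us]  # (re)start at the newest input sample
        elif times and ev.name == chain[len(times)]:
            times.append(ev.time_us)
            if len(times) == len(chain):
                segments = [b - a for a, b in zip(times, times[1:])]
                instances.append((times[-1] - times[0], segments))
                times = []
    return instances


def violations(instances, max_latency_us):
    """End-to-end latencies that exceed the chain's maximum allowed latency."""
    return [total for total, _ in instances if total > max_latency_us]


# Hypothetical labels for the steering example's four events
CHAIN = [
    "read:Sensor torque",
    "run:Processing",
    "run:Target torque calculation",
    "write:Motor torque rating",
]
```

The segment latencies correspond to the individual chain segments; a min/max interval requirement would add a lower-bound check in the same way as the upper bound in `violations`.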

Tracing

To evaluate the defined time requirements, it is necessary to record a trace. A trace is a list of events with their respective timestamps. To enable the evaluation of all requirements, all events associated with requirements must be included in the trace. According to Ferrari, there are three different methods for recording traces: software-based, hybrid, and hardware-based tracing [6].

Software-based tracing allows traces to be recorded without dedicated hardware. Instead, the software is instrumented at the events relevant for analysis. Several strategies exist for adding this instrumentation. One method is to instrument the source code at the points where the relevant events occur. Another option is to use hooks provided by the operating system. If the source code is unavailable, the instrumentation can also be added directly to the software binary. With software-based tracing, the trace can be made available for evaluation in two ways: the data can be stored in memory on the ECU at runtime and read out later, or it can be transmitted from the ECU in real time via an interface such as CAN [7].
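The first option, buffering events in ECU memory and reading them back later, is typically implemented as a fixed-size ring buffer. The following Python model is only a sketch of that idea (on a real ECU this would be C code writing to a RAM array); the capacity and the (timestamp, event ID) record layout are assumptions. It shows why buffer size bounds trace length: once the buffer is full, older events are overwritten.

```python
class TraceRingBuffer:
    """Fixed-capacity trace buffer in ECU RAM, modeled in Python (sketch)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = [None] * capacity
        self.head = 0   # index of the next write
        self.count = 0  # number of valid entries

    def record(self, timestamp, event_id):
        """Store one event; overwrites the oldest entry when the buffer is full."""
        self.entries[self.head] = (timestamp, event_id)
        self.head = (self.head + 1) % self.capacity
        self.count = min(self.count + 1, self.capacity)

    def dump(self):
        """Return the buffered events oldest-first, as read back after the run."""
        start = (self.head - self.count) % self.capacity
        return [self.entries[(start + i) % self.capacity] for i in range(self.count)]
```

A variant that stops recording when full, instead of overwriting, preserves the beginning of the run rather than its end; which is preferable depends on whether the start-up phase or the most recent behavior is of interest.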

In contrast to software-based tracing, hardware-based tracing does not require any instrumentation a priori. Instead, the relevant events are read directly via a dedicated hardware component (emulation device) and a corresponding interface. This approach makes it possible to record both memory accesses via data tracing and executed instructions via program flow tracing [6]. For this purpose, the dedicated hardware component has access to the processor's cores and memory buses. The detected events are timestamped and sent via the trace interface.

Hybrid tracing refers to approaches that combine software- and hardware-based tracing. For example, operating system hooks can be used to write events relevant for analysis into memory. These messages can then be read via existing data interfaces such as CAN.
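For the hybrid variant, each event record must fit into the data field of a bus message. As an illustration, this Python sketch packs one event into the 8 data bytes of a classic CAN frame using the standard `struct` module; the field layout (32-bit timestamp, 16-bit event ID, 8-bit event type, 8-bit core ID, big-endian) is a hypothetical choice, not one taken from the project.

```python
import struct

# Hypothetical record layout: timestamp (u32), event ID (u16), event type (u8), core (u8)
EVENT_FORMAT = ">IHBB"
CAN_DATA_BYTES = 8  # data field length of a classic CAN frame

if struct.calcsize(EVENT_FORMAT) != CAN_DATA_BYTES:
    raise RuntimeError("event record does not fit a classic CAN data field")


def pack_event(timestamp, event_id, event_type, core):
    """Serialize one trace event; the timestamp wraps at 32 bits."""
    return struct.pack(EVENT_FORMAT, timestamp & 0xFFFFFFFF, event_id, event_type, core)


def unpack_event(payload):
    """Inverse of pack_event, as a receiver on the bus side would apply it."""
    return struct.unpack(EVENT_FORMAT, payload)
```

With one event per frame, every traced event costs a full CAN frame, which is why the bandwidth of such interfaces quickly limits how many objects can be traced and for how long.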

Software-based tracing is generally easier to set up because it doesn't require dedicated hardware. In contrast, hardware-based tracing requires an emulation device and specialized tracing hardware. The advantage of hardware-based tracing, however, is its significantly greater bandwidth, which allows for longer traces with a larger number of objects and greater measurement depth. The number of objects and the trace length are particularly limited in software-based or hybrid approaches without a dedicated trace interface due to available memory and the limited bandwidth of traditional data interfaces.

Tool-supported trace analysis

Regardless of the chosen tracing technique, the result is a trace that enables the evaluation of the timing requirements. The evaluation is performed in two steps: first, all metrics relevant for validating the requirements are calculated; second, the calculated metrics are compared with the requirements. The result is a list of requirements, each with information on how often it was violated. The bottom row of Figure 2 (see PDF) shows the graphical representation of the results of a requirements analysis in TA Inspector. A traffic-light scheme shows the user which requirements have been violated.
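The two evaluation steps can be sketched in Python as follows. Step 1 is assumed to have already produced per-instance metric values (for example, response times per task); step 2 compares them against upper-limit requirements and counts violations. The requirement names, metric keys, and the red/green/yellow mapping (yellow meaning "no data to evaluate") are illustrative assumptions, not TA Inspector's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UpperLimit:
    name: str      # requirement name (hypothetical)
    metric: str    # key of the metric computed in step 1, e.g. "RT:Task_10ms"
    limit: float   # upper limit in the metric's unit (here: microseconds)


def evaluate(metrics, requirements):
    """Step 2: compare step-1 metric values with each requirement.

    Returns (name, evaluated instances, violations, traffic light) per requirement.
    """
    results = []
    for req in requirements:
        values = metrics.get(req.metric, [])
        violations = sum(1 for v in values if v > req.limit)
        if not values:
            light = "yellow"  # nothing recorded: requirement could not be checked
        elif violations:
            light = "red"
        else:
            light = "green"
        results.append((req.name, len(values), violations, light))
    return results
```

Keeping the requirements as data, separate from the metric computation, is what allows the same requirement set to be re-evaluated on the traces of every new release.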

Trace-based real-time analysis in practice

This section describes in detail the steps necessary to monitor the timing requirements and the complete dynamic behavior of an electronic control unit (ECU) in the automotive industry. The reference system is a steering system from a major German OEM. The first subsection addresses the selection of a suitable tracing technique and the challenges encountered while implementing hardware-based tracing. The second describes how the trace can be converted into the BTF format. The final subsection demonstrates how the generated data can be prepared for further analysis.

Tracing

Before a tracing process can be set up, the requirements for the generated traces should be defined. For the steering control unit, runnables and accesses to specific signals should be recorded in addition to tasks and ISRs. Furthermore, traces spanning several seconds are necessary to cover various steering movement patterns, such as turning from one steering lock to the other. Together, these two requirements call for a hardware-based tracing approach: buffering trace data for nearly 300 runnables over several seconds exceeds what a purely software-based approach can handle with the ECU's limited memory and interface bandwidth.
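A rough back-of-the-envelope estimate illustrates why. The Python sketch below computes the RAM needed to buffer such a trace on the ECU; apart from the roughly 300 runnables, all inputs (a mean activation period of 10 ms, two events per runnable instance, 8 bytes per event, a 5 s trace) are hypothetical values chosen for illustration.

```python
def trace_buffer_bytes(n_objects, period_us, events_per_instance, bytes_per_event, duration_s):
    """Estimate the RAM needed to buffer a trace on the ECU.

    All objects are assumed to run with the same mean period; integer
    arithmetic keeps the estimate exact for round inputs.
    """
    events_per_second = n_objects * events_per_instance * 1_000_000 // period_us
    return events_per_second * bytes_per_event * duration_s


# About 300 runnables as in the case study; all other inputs are hypothetical.
needed = trace_buffer_bytes(n_objects=300, period_us=10_000,
                            events_per_instance=2, bytes_per_event=8,
                            duration_s=5)
```

Under these assumptions the trace needs 2.4 MB, far more than the RAM that can be spared on a microcontroller of this class, before even considering the bandwidth needed to stream it out.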

Hardware-based tracing requires a dedicated trace module on the processor. The processor used for the ECU is an NXP Leopard (MPC5643L). A look at the datasheet shows that the processor provides data and program flow tracing according to the NEXUS standard [8]. The former can be found in the datasheet under data trace messaging (DTM), the latter under branch trace messaging (BTM) [9]. Thus, the first requirement is met.

Next, the recorded data must be transferred off the processor. Depending on the package, the Leopard has either four or twelve message data out (MDO) pins through which trace messages leave the chip. The package used here offers only four MDO pins, which limits the maximum trace bandwidth. A bigger problem is that the Nexus pins are not routed to a connector on the ECU, so there is no way to access the trace data.
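The effect of the pin count can be estimated with a simple model. The Python sketch below assumes that each MDO pin carries one bit per MCKO clock cycle and ignores idle cycles and message framing; the 60 MHz MCKO frequency and the 64-bit message size are hypothetical values, not figures from the MPC5643L reference manual.

```python
import math


def max_messages_per_second(mdo_pins, mcko_hz, message_bits):
    """Upper bound on trace messages per second through a parallel Nexus port.

    Assumes each MDO pin carries one bit per MCKO cycle (single data rate)
    and ignores idle cycles, message framing and FIFO effects.
    """
    cycles_per_message = math.ceil(message_bits / mdo_pins)
    return mcko_hz // cycles_per_message


MCKO_HZ = 60_000_000  # hypothetical trace clock
MESSAGE_BITS = 64     # hypothetical average data trace message size
```

Under these assumptions, a message takes 16 clock cycles on the four-pin port but only 6 on the twelve-pin port, so the larger package can carry roughly 2.7 times as many messages.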

There are several ways to solve this problem; two of them are discussed below. The board supplier could be asked to modify the board layout so that the four pins lead to a socket. However, this is not a solution that can be implemented quickly. Furthermore, the interface on the board would be inaccessible due to its placement within the housing.

The second solution is the use of an emulation board [10]. Here, a Leopard processor on the board with a BGA257 package (i.e., the larger package with a 12-pin Nexus interface) emulates the actual processor of the control unit. This solves two problems: First, the limitation of the four-pin interface is bypassed. Second, the Nexus interface is exposed via the emulation board. Therefore, no further modifications to the control unit itself are necessary.

To connect the emulation board to the ECU, the existing processor with the QFP100 package must be desoldered. A solder socket is then attached in its place. This socket can now be used to connect the emulation board to the ECU. To make this possible, a hole was cut into the ECU housing. This setup behaves the same as the original ECU, with the advantage that the 12-pin Nexus interface is available for tracing.

Transformation

The Nexus interface now allows for the recording of traces. For real-time analysis, these must be transformed into the BTF format. The objects to be analyzed in this project are tasks, ISRs, runnables, and selected variable accesses. In OSEK operating systems, tasks and ISRs can be recorded via data tracing. Variable accesses are also captured via data tracing. Runnable events are function calls that can be registered using program flow tracing.

However, implementing the transformation in this way ran into three difficulties. First, the operating system used is µC/OS from Micrium, which is not OSEK-compliant [11]. Consequently, the usual approach of tracing tasks via the OSEK task state array is not applicable. Second, the ISRs correspond to OSEK category 1 ISRs, so the operating system does not know when they execute and cannot store this information in traceable data structures. Program flow tracing could be used for the ISRs instead, but this is ruled out by the third limitation: at the time of this project, iSYSTEM did not yet support full multi-core profiling, so program flow tracing could not be used for multiple cores. Since simultaneous analysis of both cores is mandatory, program flow tracing cannot be used at all, and consequently runnable and ISR events cannot be recorded directly.

Ultimately, despite the emulation adapter, a hybrid tracing approach is necessary, in which runnable and ISR information is recorded via instrumentation. This requires, first, that the source code is available for instrumentation and, second, that the instrumentation does not significantly distort the software's runtime behavior. The first condition is met; compliance with the second cannot be guaranteed and must be accepted for the time being due to a lack of alternatives.

The instrumentation follows this scheme: first, an ID is assigned to each ISR and runnable of interest. Then, the source code is searched for calls to these runnables and for the ISR definitions. For each call to a relevant runnable, the corresponding ID is written to a dedicated variable both before and after the call; for the write after the call, an additional reserved bit is set to indicate a termination rather than a start. ISRs are handled the same way, except that the instrumentation is placed within the ISR itself. Runnable and ISR events can then be recorded by monitoring writes to the dedicated variable via data tracing. The same applies to accesses to other variables of interest, for example the system state variable, which indicates the current state of the system.
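The encoding of the dedicated variable can be sketched as follows. In the ECU code, the instrumentation would be a plain write to a global volatile variable before and after each runnable call; the Python sketch below models only the encoding and its decoding on the trace side. Reserving bit 15 as the termination bit and the example runnable name are assumptions; the case study does not state which bit was used.

```python
END_BIT = 0x8000  # hypothetical: bit 15 of the marker variable flags a termination


def encode_marker(obj_id, is_end):
    """Value written to the dedicated variable before (start) or after (end) a call."""
    if not 0 <= obj_id < END_BIT:
        raise ValueError("ID must fit below the reserved termination bit")
    return obj_id | END_BIT if is_end else obj_id


def decode_marker(value):
    """Split a traced write into (object ID, 'start' | 'terminate')."""
    kind = "terminate" if value & END_BIT else "start"
    return value & (END_BIT - 1), kind


def marker_events(writes, id_to_name):
    """Turn (timestamp, value) writes from the data trace into named events."""
    events = []
    for timestamp, value in writes:
        obj_id, kind = decode_marker(value)
        events.append((timestamp, id_to_name[obj_id], kind))
    return events
```

Because a single variable carries all markers, one data trace watchpoint suffices for every instrumented runnable and ISR; the price is one extra store per event on the target.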

Finally, it remains to be examined how the task states can be reconstructed. This requires a closer look at the operating system, which maintains an array of task context blocks (OSTCBTbl). The operating system uses each of these blocks to store information such as the task's stack address, stack size, priority, and state. There is also a variable that references the context block of the currently running task (OSTCBCur). In OSEK operating systems, the state stored in the task context block would be sufficient to reconstruct a task's events. µC/OS, however, has no separate Ready and Running states; both are combined in a single "Runnable" state, similar to the task state model in Linux. It is therefore additionally necessary to trace the variable OSTCBCur in order to distinguish Ready from Running.
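The reconstruction rule can be sketched as a small state machine: a task whose context-block state is "Runnable" is Running if OSTCBCur currently references it and Ready otherwise. In this Python sketch, the task names, the textual state values "RUNNABLE" and "WAITING" (standing in for the raw values stored in the task context block), and the prior mapping of OSTCBCur addresses to task names are assumptions.

```python
class TaskStateReconstructor:
    """Derive Ready/Running/Waiting transitions from µC/OS-style trace data (sketch)."""

    def __init__(self, tasks):
        self.os_state = {t: "WAITING" for t in tasks}  # state as stored in the task context block
        self.current = None                             # task referenced by OSTCBCur
        self.derived = {t: "WAITING" for t in tasks}    # reconstructed state per task
        self.transitions = []                           # (time, task, new_state)

    def _derive(self, task):
        if self.os_state[task] != "RUNNABLE":
            return "WAITING"
        return "RUNNING" if task == self.current else "READY"

    def _update(self, time):
        for task in self.derived:
            new_state = self._derive(task)
            if new_state != self.derived[task]:
                self.derived[task] = new_state
                self.transitions.append((time, task, new_state))

    def on_state_write(self, time, task, state):
        """Data-trace write to a task's state field in its context block."""
        self.os_state[task] = state
        self._update(time)

    def on_current_write(self, time, task):
        """Data-trace write to OSTCBCur, already mapped from address to task name."""
        self.current = task
        self._update(time)
```

A context switch thus shows up as a single OSTCBCur write that produces two transitions at the same timestamp: the new task becomes Running and the previous one falls back to Ready.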

With this knowledge, the entire trace can now be converted to BTF. This requires a data trace that records write accesses to the dedicated variable for ISRs and runnables, to the complete task context block array, and to the variable OSTCBCur. The exported data trace is a CSV file with the following fields: timestamp, address of the written variable, value written, and the core from which the access originated. In addition to the trace itself, three mappings are needed: IDs to runnables and ISRs, addresses to variable names, and task state variable addresses to task names. The first is created during instrumentation; the other two can be read out using the debugger.
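A minimal sketch of this conversion for runnable markers and signal writes might look like the following Python code. The CSV column names, the hexadecimal value encoding, the marker address, the termination bit, and all object names are hypothetical, and using the core as the event source is a simplification. The output only mimics the comma-separated layout of BTF (time, source, source instance, target type, target, target instance, event, note); the exact field semantics should be taken from the BTF specification [4]. Task events would additionally be derived from the task context block and OSTCBCur writes.

```python
import csv
import io

MARKER_ADDR = 0x40001000  # hypothetical address of the instrumentation variable
END_BIT = 0x8000          # hypothetical termination bit of the marker value


def to_btf_like(csv_text, id_to_runnable, addr_to_signal):
    """Convert a data-trace CSV export into BTF-style event lines (sketch).

    Expected columns: timestamp, address, value, core (address/value in hex).
    """
    lines = []
    instance = {}  # running instance counter per runnable
    for row in csv.DictReader(io.StringIO(csv_text)):
        t = int(row["timestamp"])
        addr = int(row["address"], 16)
        value = int(row["value"], 16)
        source = f"Core_{row['core']}"
        if addr == MARKER_ADDR:
            name = id_to_runnable[value & (END_BIT - 1)]
            if value & END_BIT:
                event = "terminate"
            else:
                instance[name] = instance.get(name, -1) + 1
                event = "start"
            lines.append(f"{t},{source},0,R,{name},{instance.get(name, 0)},{event},")
        elif addr in addr_to_signal:
            lines.append(f"{t},{source},0,SIG,{addr_to_signal[addr]},0,write,{value}")
        # writes to other addresses (e.g. task context blocks) are handled separately
    return lines
```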

Evaluation

The final step is evaluating the trace in BTF format using the TA Inspector. In addition to the previously discussed metrics relating to the time between two or more events, metrics are also calculated that are not time-based but are nevertheless important for analysis. Examples include the number of interruptions of a task instance, the load caused by an ISR on a specific core, and the number of multiple task activations. The TA Inspector is capable of calculating all these metrics.
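Two of these metrics, the load of an ISR on one core and the number of multiple activations of a task, can be sketched in a few lines of Python. The interval and event representations are assumptions about how the BTF events of one object have been preprocessed; the sketch is not how TA Inspector computes them.

```python
def load_percent(intervals, window_start, window_end):
    """Share of the window (in %) covered by execution intervals on one core.

    `intervals` are (start, end) pairs of one object, e.g. an ISR, on a single core.
    """
    if window_end <= window_start:
        raise ValueError("empty observation window")
    busy = 0
    for start, end in intervals:
        lo, hi = max(start, window_start), min(end, window_end)
        if hi > lo:
            busy += hi - lo  # only the part inside the window counts
    return 100.0 * busy / (window_end - window_start)


def multiple_activations(events):
    """Count activations that arrive while a previous instance is still pending.

    `events` is a time-ordered list of (time, kind) for one task,
    with kind in {"activate", "terminate"}.
    """
    pending = 0
    count = 0
    for _, kind in events:
        if kind == "activate":
            if pending:
                count += 1
            pending += 1
        elif kind == "terminate" and pending:
            pending -= 1
    return count
```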

On the other hand, not all metrics calculated during trace analysis are relevant in every case. The TA Tool Suite therefore supports the automated generation of reports whose metrics and requirements can be freely configured, in formats such as HTML, LaTeX, and XML. Both the analysis itself and the report generation can be operated via a console interface, which allows fully automated report generation.

Advantages of trace-based real-time analysis

Once the process for trace-based real-time analysis is implemented, the generated data must be used to continuously monitor and improve the robustness of time-critical functions. For the steering system, the evaluation results could be used for the following use cases:

Performance tests: timing and scheduling verification at the system level
Resource usage tests: robustness tests to verify timing budgets at the system and unit levels

The steering system in a vehicle is a safety-critical component classified as ASIL D according to ISO 26262 [12]. Therefore, the standard requires that correct real-time behavior be demonstrated at the system level using defined methodologies, particularly for the system's safety aspects. At the system level, the goal is to verify correct functional performance and the error-free operation of safety mechanisms. To this end, section 8.4ff of the standard defines performance tests, which are to be applied to verify timing and scheduling at the system level, and resource usage tests to ensure that memory sizes and runtime of runnables remain within defined budgets, thereby ensuring the robustness and availability of the system.

For this reason, an important business objective of steering system development is the complete verification of real-time capability: proof that timing requirements, such as cause-effect chains from sensor to actuator, are met as specified, and that the maximum execution times of functions and the maximum data ages of signals are adhered to.

For example, in an electromechanical steering system it must be ensured that the entire path, from measuring the hand torque at the torque sensor, through preprocessing in the basic software and the calculation of customer functionality in the applications, to setting the torque at the motor, meets the safety-critical requirements. Fault tolerance times ensure a timely response, triggered by monitoring functions, to signal errors, incorrect signal processing, faulty applications, or incorrect torque setting at the motor. Depending on the safety goal, the methodology shown here is also used to verify compliance with these fault tolerance times.

In distributed software development, the overall system is additionally budgeted in order to partition it and allocate the necessary resources to the basic software, the application, and any other parties involved. Furthermore, detailed resource budgets are defined at the unit level in the software specifications. These budgets for integration partitions and units are verified through resource usage tests, which are performed at the system level for each release using worst-case timing scenarios, also known as stress tests.

An additional business objective is performance evaluation and optimization, which involves analyzing the system for resource bottlenecks and top consumers, and identifying potential runtime optimizations. Evaluating hardware traces is a crucial tool for assessing timing issues within the overall system context. For example, worst-case scenarios can be explained by scheduling effects. Analyses such as interference analysis help identify these effects.

For the purposes of timing verification and analysis, system, component, and software requirements regarding timing behavior are translated into formal, machine-readable timing requirements and constraints. The goal of this processing step is to enable automated evaluation of the timing requirements on the recorded traces using tool support. The TA Tool Suite offers a convenient function for this, allowing timing analysts to define formalized timing requirements/constraints not only in tabular form but also graphically, and to easily reuse them in future tests. This enables a regression testing strategy. In this case study, the timing requirements/constraints listed in Table 1 were used to formalize the timing requirements specified in the project.

In addition to these timing requirements/constraints, metrics for analyzing the dynamic behavior of the system are derived from the recorded BTF traces. Table 2 shows the metrics used here.

Level of timing requirement	Timing requirement/constraint
Overall performance	Utilization Constraint -> Max. Load in %
Unit budgets	Upper Limit -> Net Execution Time
Task/ISR deadlines	Upper Limit -> Response Time of Tasks/ISRs
Cause-effect chains	Min/Max Interval -> Delay Requirement -> Event Chain Requirement (events: task/ISR/runnable state changes or signal accesses (read/write))

Table 1: Requirement types used to define timing requirements for verification

Level	Metric	Short name	Definition
System/CPU	CPU load of the cores	CLC	CPU Load Cores; relative load in % of a core, time in which a core performs active calculations.
Tasks/ISRs	CPU load	CLP	CPU Load Processes; relative net response time in % of a task/ISR, time from start to end of a task minus interruptions.
	Normalized response time	NRT	Normalized Response Time; normalized net response time of a task/ISR, time from start to end of a task minus interruptions.
	Execution time	NET	Net Execution Time; net execution time of a task/ISR, time in ms from start to end of a task minus interruptions.
	Response time	RT	Response Time; response time of a task/ISR, time in ms from activation of a task until end including interruptions.
	Activation time	A2A	Activate-to-Activate Time; distance between two activations of a task/ISR
Runnables	Net execution time	NET	Net Execution Time; net execution time of a runnable, time in ms from start to finish excluding interruptions.
	Gross execution time	GET	Gross Execution Time; gross execution time of a runnable, time in ms from start to finish including interruptions.

Table 2: Metrics used for performance tests
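Several metrics in Table 2 follow directly from the state-change events of a single object. The Python sketch below derives RT, NET, GET, and A2A per instance from a time-ordered list of task events. The event names mirror BTF-style task events, overlapping instances (multiple activations) are assumed not to occur, and GET is computed here for any start/terminate pair even though Table 2 lists it for runnables.

```python
def instance_metrics(events):
    """Per-instance RT, NET, GET and A2A for one task from time-ordered events.

    `events` holds (time, kind) tuples with kind in
    {"activate", "start", "preempt", "resume", "terminate"}.
    Assumes well-formed, non-overlapping instances (no multiple activations).
    """
    results = []
    last_activation = None
    activation = start = running_since = None
    a2a = None
    net = 0
    for time, kind in events:
        if kind == "activate":
            a2a = None if last_activation is None else time - last_activation
            last_activation = activation = time
            net = 0
        elif kind == "start":
            start = running_since = time
        elif kind == "resume":
            running_since = time
        elif kind == "preempt":
            net += time - running_since
        elif kind == "terminate":
            net += time - running_since
            results.append({
                "RT": time - activation,  # activation to end, including interruptions
                "NET": net,               # start to end minus interruptions
                "GET": time - start,      # start to end, including interruptions
                "A2A": a2a,               # distance to the previous activation
            })
    return results
```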

Timing verification is performed iteratively for each major and intermediate software release [13]. As shown in Figure 4 (see PDF), it comprises three essential steps:

1. Update the test specifications for the given release: changed and added timing requirements are formalized as timing requirements and constraints. AMALTHEA was chosen as the format for describing the timing requirements/constraints, as it is a de facto industry standard offering the richest means of expressing requirements on the dynamic behavior of software [14].
2. Perform the hardware trace test: the trace is recorded using a hybrid method that combines function and data tracing with partial instrumentation. As described above, the method uses a Nexus interface and the iSYSTEM iC5000 trace debugger. The captured raw data is translated into the BTF format, which is AMALTHEA-compliant.
3. Verify: the BTF traces recorded in step 2 are analyzed and evaluated, and the requirements/constraints updated in step 1 are checked automatically. A configurable report is generated containing the results of the evaluated requirements and the desired profiling data (timing metrics). The report directly indicates requirement violations, from which corrective actions such as design changes can be derived. Verification and report generation are performed using the TA Inspector tool.

Conclusion and Outlook

This work examines how a combined approach of software- and hardware-based tracing can be used to evaluate safety-critical real-time requirements. This approach allows the recording of traces with a breadth, depth, and length that are not possible with a software-based tracing approach.

With the transformation to the BTF format, requirements can also be evaluated for systems with multiple cores or even multiple ECUs. TA Inspector enables the evaluation of formally defined requirements and supports the AUTOSAR and AMALTHEA standards. This supports functional safety according to ISO 26262 across all software layers. In the future, requirements on events distributed across multiple ECUs are to be made verifiable. This requires combining and consolidating bus and ECU traces, i.e., synchronizing them to a common time base. This problem is already being addressed in a research project.

Sources

[1] R. Hilbrich, R. van Kampenhout and H.-J. Goltz, „Model-based generation of static schedules for safety-critical, embedded systems with multicore processors and hard real-time requirements“, in Challenges of Real-Time Operation, Springer, 2012, pp. 29-38.

[2] K. Schmidt, D. Marx, K. Richter, K. Reif, A. Schulze and T. Flämig, „On timing requirements and a critical gap between function development and ECU integration,“ in SAE Technical Paper, 2015.

[3] AUTOSAR, „Specification of Timing Extensions,“ 2014.

[4] Timing Architects Embedded Systems GmbH, „BTF-Specification“, AMALTHEA ITEA2 Project (https://wiki.eclipse.org/images/e/e6/TA_BTF_Specification_2.1.3_Eclipse_Auto_IWG.pdf).

[5] Timing-Architects Embedded Systems GmbH, „TA Tool Suite – TA Inspector“, https://www.timing-architects.com/ta-tool-suite/inspector/ (Accessed: 2016-09-10), Regensburg, 2016.

[6] D. Ferrari, Computer systems performance evaluation, Prentice Hall, 1987.

[7] J. Kraft, A. Wall and H. Kienle, „Trace recording for embedded systems: Lessons learned from five industrial projects,“ in Runtime Verification, Springer, 2015, pp. 315-329.

[8] J. Turley, „Nexus standard brings order to microprocessor debugging,“ www.nexus5001.org, 2004.

[9] NXP, MPC5643L Microcontroller Reference Manual, https://cache.nxp.com/files/32bit/doc/ref_manual/MPC5643LRM.pdf (Accessed: 2016-09-10), 2013.

[10] iSYSTEM, „Nexus Emulation Board,“ https://www.isystem.com/files/products/OnChip/MPC55xx/IA257BGA100TQ-564xL_V13.pdf, 2012.

[11] M. Holenderski, M. van den Heuvel, R. J. Bril and J. J. Lukkien, „Grasp: Tracing, visualizing and measuring the behavior of real-time systems,“ in International Workshop on Analysis Tools and Methodologies for Embedded and Real-time Systems (WATERS), 2010, pp. 37–42.

[12] ISO, „ISO/FDIS 26262-4:2010(E) Road vehicles Functional safety – Part 4: Product development: system level“, ISO, Geneva, 2012.

[13] A. Schulze, S. Richter, T. Flämig, K. Schmidt, D. Marx, H. Christlbauer, K. Richter, S. Schliecker and C. Ficek, „Multi-core-Hardware-Tracing in Practice“, ELIV Congress, Baden-Baden, 2015.

[14] L. Michel, T. Flämig, D. Claraz and R. Mader, „Shared SW development in multi-core automotive context“, ERTS Congress, Toulouse, 2016.


Published by

weissblau media
