Parallel programming of systems-on-a-chip
Author: Tobias Schüle, Siemens AG
Contribution – Embedded Software Engineering Congress 2015
The Embedded Multicore Building Blocks (EMB²) form an open-source library for parallel programming of embedded systems. EMB² is based on MTAPI (Multicore Task Management API), a standard for task management in C/C++ applications. Below, we provide an overview of EMB² and demonstrate how MTAPI can be used to exploit parallelism beyond the limits of conventional multicore processors.
When programming embedded systems, developers are often confronted with parallelism and heterogeneity [1]. Parallelism is required at the application level to fully utilize the performance of multicore processors. However, this is associated with numerous hurdles and pitfalls in software development. To complicate matters, many embedded systems contain not only multiple identical processor cores but also specialized accelerators such as signal processors or even programmable logic devices (FPGAs). In fact, modern systems-on-a-chip (SoCs) consist of a wide variety of processors optimized for different purposes. Such SoCs are characterized by high performance with relatively low power consumption. The downsides include high software development complexity, vendor lock-in, and consequently, a lack of portability. The MTAPI standard [2] promises to solve these problems. Before we delve into the fundamental concepts of MTAPI, however, we will provide an overview of EMB².
Embedded Multicore Building Blocks (EMB²)
Figure 1 (see PDF) shows the essential library components and their place within the overall system. At the lowest level is the base library, which abstracts from the operating system and processor architecture. The MTAPI implementation builds on this and can be used either directly by the application or indirectly via the Algorithms and Dataflow components. These components provide frequently used parallel algorithms and templates for processing data streams. A special feature of the MTAPI implementation is its support for task priorities and affinities. These help in implementing real-time properties and allow fine-grained control over the hardware. For example, affinities can be used to reserve processor cores for specific tasks.
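The interplay of priorities and affinities can be illustrated with a minimal sketch in C. It assumes that each task carries a priority and a bitmask of permitted cores, and that a core always picks the highest-priority pending task it is allowed to run. The names `sketch_task` and `pick_next` are hypothetical; this is not EMB²'s scheduler or its MTAPI data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical task descriptor for illustration only (not EMB2/MTAPI). */
typedef struct {
    int id;
    int priority;       /* higher value = more urgent */
    uint32_t affinity;  /* bit i set => task may run on core i */
    bool done;
} sketch_task;

/* True if the task may run on `core` (core must be < 32). */
static bool may_run_on(const sketch_task *t, unsigned core) {
    return (t->affinity >> core) & 1u;
}

/* Picks the highest-priority pending task whose affinity mask includes
 * `core`. Returns NULL if no eligible task exists. */
static sketch_task *pick_next(sketch_task *tasks, size_t n, unsigned core) {
    sketch_task *best = NULL;
    for (size_t i = 0; i < n; ++i) {
        sketch_task *t = &tasks[i];
        if (t->done || !may_run_on(t, core))
            continue;
        if (best == NULL || t->priority > best->priority)
            best = t;
    }
    return best;
}
```

Reserving a core for specific tasks then simply means that no other task has that core's bit set in its affinity mask.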
Furthermore, developers have access to thread-safe data structures (containers) designed specifically for embedded systems. One key feature is that these data structures do not allocate memory dynamically during operation, a necessity for many embedded systems, especially in safety-critical domains. Additionally, they do not rely on blocking synchronization mechanisms (they are lock-free or wait-free), which allows guarantees to be derived regarding the progress of the accessing threads [3]. This is a significant advantage over classical, blocking methods, especially in embedded systems.
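To show what non-blocking operation without dynamic memory allocation looks like, here is a sketch of a classic single-producer/single-consumer ring buffer using C11 atomics. This is a swapped-in, deliberately simple technique, not one of EMB²'s containers (which also cover multiple producers and consumers), and the type and function names are hypothetical. Storage is a fixed-size array, and each operation completes in a bounded number of steps, so the queue is wait-free for exactly one producer thread and one consumer thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Capacity is fixed at compile time: no dynamic memory allocation. */
#define QUEUE_CAPACITY 8u /* usable slots: QUEUE_CAPACITY - 1 */

typedef struct {
    int buffer[QUEUE_CAPACITY];
    atomic_size_t head; /* next slot to read; written only by the consumer */
    atomic_size_t tail; /* next slot to write; written only by the producer */
} spsc_queue;

static void spsc_init(spsc_queue *q) {
    atomic_init(&q->head, 0);
    atomic_init(&q->tail, 0);
}

/* Producer side. Never blocks: returns false if the queue is full. */
static bool spsc_push(spsc_queue *q, int value) {
    size_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    size_t next = (tail + 1) % QUEUE_CAPACITY;
    if (next == atomic_load_explicit(&q->head, memory_order_acquire))
        return false; /* full */
    q->buffer[tail] = value;
    /* Release: the element is visible before the new tail is. */
    atomic_store_explicit(&q->tail, next, memory_order_release);
    return true;
}

/* Consumer side. Never blocks: returns false if the queue is empty. */
static bool spsc_pop(spsc_queue *q, int *out) {
    size_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    if (head == atomic_load_explicit(&q->tail, memory_order_acquire))
        return false; /* empty */
    *out = q->buffer[head];
    /* Release: the slot is read before the producer may reuse it. */
    atomic_store_explicit(&q->head, (head + 1) % QUEUE_CAPACITY,
                          memory_order_release);
    return true;
}
```

A full or empty queue is reported to the caller instead of making it wait, which is what allows progress guarantees to be stated for both threads.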
A more detailed description of the components with code examples can be found in [4] and [5, 6] (see also https://github.com/siemens/embb/).
Multicore Task Management API (MTAPI)
To fully utilize the performance of multicore processors, the work to be done must be decomposed into small units (tasks). Since threads are generally too heavyweight for this purpose, efficient scheduling methods have been developed over the last one to two decades that map executable tasks to the available cores. For several years now, MTAPI has also provided standardized interfaces for creating, managing, and synchronizing tasks. Key goals in defining MTAPI were usability in small, resource-constrained systems through a lightweight API, as well as support for heterogeneous systems with distributed memory and different instruction set architectures. Furthermore, MTAPI does not necessarily require an operating system; it can also run directly on the hardware (bare metal).
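The idea of decomposing work into small tasks can be sketched without any library: a task is just a function pointer plus its data, which is much lighter than a thread. In the hypothetical sketch below, a sum over an array is split into chunk tasks stored in a statically sized pool. The sequential `pool_run_all` merely stands in for a real scheduler such as the one in EMB², which would distribute the tasks across cores; none of the names are MTAPI functions.

```c
#include <stddef.h>

/* A task is a function plus its data: far lighter than a thread. */
typedef void (*task_fn)(void *arg);

typedef struct {
    task_fn fn;
    void *arg;
} task;

#define MAX_TASKS 16

/* Statically allocated task pool (hypothetical; no dynamic allocation). */
typedef struct {
    task items[MAX_TASKS];
    size_t count;
} task_pool;

static int pool_add(task_pool *p, task_fn fn, void *arg) {
    if (p->count == MAX_TASKS)
        return -1; /* pool full */
    p->items[p->count++] = (task){fn, arg};
    return 0;
}

/* Runs all queued tasks sequentially; a real scheduler would
 * distribute them across the available cores. */
static void pool_run_all(task_pool *p) {
    for (size_t i = 0; i < p->count; ++i)
        p->items[i].fn(p->items[i].arg);
    p->count = 0;
}

/* Work item: sum of data[begin, end). */
typedef struct {
    const int *data;
    size_t begin, end;
    long result;
} chunk;

static void sum_chunk(void *arg) {
    chunk *c = arg;
    c->result = 0;
    for (size_t i = c->begin; i < c->end; ++i)
        c->result += c->data[i];
}

/* Decomposes a sum over n elements into up to `parts` chunk tasks. */
static long parallel_sum(const int *data, size_t n, size_t parts) {
    chunk chunks[MAX_TASKS];
    task_pool pool = { .count = 0 };
    if (parts == 0) parts = 1;
    if (parts > MAX_TASKS) parts = MAX_TASKS;
    for (size_t k = 0; k < parts; ++k) {
        chunks[k] = (chunk){ data, n * k / parts, n * (k + 1) / parts, 0 };
        pool_add(&pool, sum_chunk, &chunks[k]);
    }
    pool_run_all(&pool);
    long total = 0;
    for (size_t k = 0; k < parts; ++k)
        total += chunks[k].result;
    return total;
}
```

Because the chunks are independent, a scheduler is free to run them in any order or in parallel without changing the result.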
Figure 2 (see PDF) illustrates the programming of heterogeneous systems with MTAPI. The example system consists of a multicore CPU with four cores (one of which plays a special role and runs its own operating system), a graphics processing unit (GPU), and a digital signal processor (DSP). Each of these components is represented by an MTAPI node, and together they form a domain. As a user, you can either specify explicitly on which node a task should be executed or leave the decision to the scheduler. By default, EMB² uses priority-based work stealing within a node, while tasks within a domain are distributed depending on node utilization. MTAPI itself does not prescribe any particular scheduling method, so custom scheduling algorithms can also be used in EMB².
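The choice between an explicitly requested node and a scheduler decision can be modeled as follows. The sketch assumes that a node's utilization is measured by the number of tasks queued on it and that the scheduler picks the least-loaded node. The text above only states that distribution depends on node utilization, so this metric, the `node` type, `assign_node`, and `NODE_ANY` are illustrative assumptions, not part of MTAPI or EMB².

```c
#include <stddef.h>

#define NODE_ANY (-1) /* hypothetical marker: let the scheduler decide */

typedef struct {
    const char *name;
    unsigned queued; /* number of tasks waiting on this node */
} node;

/* Returns the index of the node that receives the task: the requested
 * node if one was given, otherwise the node with the fewest queued tasks
 * (ties go to the lower index). Requires n > 0 and a valid index. */
static int assign_node(node *nodes, size_t n, int requested) {
    int chosen = requested;
    if (chosen == NODE_ANY) {
        chosen = 0;
        for (size_t i = 1; i < n; ++i)
            if (nodes[i].queued < nodes[chosen].queued)
                chosen = (int)i;
    }
    nodes[chosen].queued++; /* the task is now queued on that node */
    return chosen;
}
```

An explicit request always wins; only `NODE_ANY` lets utilization decide.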
Besides tasks, MTAPI defines two related concepts: jobs and actions. According to the MTAPI specification, a task is the execution of a job with specific data. A job, in turn, represents the function to be executed at an abstract level. Every job is implemented by at least one action, which makes it possible to provide implementations for different processors in heterogeneous systems. An action can even be implemented entirely in hardware. Figure 3 illustrates the relationship between tasks, jobs, and actions. Since actions can be very diverse, EMB² provides a plugin interface through which user-defined actions can be integrated into MTAPI. For example, an FPGA-based action can be assigned to a job via a plugin and executed transparently alongside other actions. In addition to supporting heterogeneous systems, the separation of jobs and actions has the advantage that applications for a product family with different hardware configurations can be developed uniformly.
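The relationship between tasks, jobs, and actions can be expressed in a few C types. In this hypothetical model, a job is an ID plus a list of actions, each action implements the job for one kind of node, and a task binds a job to concrete input and output data. The `sketch_*` types, `run_task`, and the CPU/DSP variants are invented for illustration and do not correspond to the MTAPI C API; in particular, a real DSP action would run on the DSP rather than on the host.

```c
#include <stddef.h>

/* Hypothetical model of MTAPI's concepts; not the MTAPI C API. */
typedef enum { NODE_CPU, NODE_DSP } node_kind;

/* An action is one concrete implementation of a job for one node kind. */
typedef struct {
    node_kind kind;
    void (*run)(const float *in, float *out, size_t n);
} sketch_action;

/* A job is the abstract function, implemented by at least one action. */
#define MAX_ACTIONS 4
typedef struct {
    int job_id;
    sketch_action actions[MAX_ACTIONS];
    size_t action_count;
} sketch_job;

/* A task is the execution of a job with specific data. */
typedef struct {
    const sketch_job *job;
    const float *in;
    float *out;
    size_t n;
} sketch_task;

static void scale_cpu(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = 2.0f * in[i];
}

/* Stand-in for a DSP-specific variant; here it also runs on the host. */
static void scale_dsp(const float *in, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) out[i] = in[i] + in[i];
}

/* Executes the task with the job's action for the given node kind.
 * Returns 0 on success, -1 if the job has no action for that node. */
static int run_task(const sketch_task *t, node_kind where) {
    for (size_t i = 0; i < t->job->action_count; ++i) {
        const sketch_action *a = &t->job->actions[i];
        if (a->kind == where) {
            a->run(t->in, t->out, t->n);
            return 0;
        }
    }
    return -1;
}
```

Code written against the job stays the same whether a given hardware configuration provides one action or several, which is the product-family benefit described above.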
EMB² provides predefined plugins for commonly used technologies. The following example demonstrates the creation of an action for OpenCL code. First, the corresponding header file is included, the OpenCL kernel is defined, and then the plugin is initialized:
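#include <embb/mtapi/c/mtapi_opencl.h>
/* OpenCL C source of the kernel (parameters and body elided) */
const char * kernel = "__kernel void MyKernel(…) {…} ";
mtapi_status_t status;
/* Initialize the OpenCL plugin; the outcome is reported in status */
mtapi_opencl_plugin_initialize(&status);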
Now the action can be registered and assigned to OPENCL_JOB:
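float node_local = 1.0f; /* data shared within the node */
mtapi_action_hndl_t action = mtapi_opencl_action_create(
OPENCL_JOB, /* job implemented by this action */
kernel, "MyKernel", local_work_size, element_size,
&node_local, sizeof(float), /* node-local data and its size */
&status);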
Here, local_work_size and element_size are OpenCL-specific arguments, and node_local refers to data shared within the node [5, 6]. Before the job is executed by a task, additional actions, for example for other processors, can optionally be assigned to it [2].
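The two OpenCL-specific arguments can be illustrated with a little arithmetic. Assuming that the plugin derives the number of work items from the size of the result buffer divided by `element_size` and rounds it up to a multiple of `local_work_size` (OpenCL 1.x requires the global work size to be divisible by the local work size), the computation looks like the following sketch. The helper names are hypothetical, and the exact behavior of the EMB² OpenCL plugin should be checked in [5, 6].

```c
#include <stddef.h>

/* Hypothetical helper: number of elements in a buffer of `buffer_bytes`
 * bytes whose elements are `element_size` bytes each. */
static size_t element_count(size_t buffer_bytes, size_t element_size) {
    return buffer_bytes / element_size;
}

/* Rounds the element count up to a multiple of local_work_size
 * (local_work_size > 0), since OpenCL 1.x requires the global work
 * size to be divisible by the local work size. */
static size_t global_work_size(size_t elements, size_t local_work_size) {
    return ((elements + local_work_size - 1) / local_work_size)
           * local_work_size;
}
```

With 1000 float results and a local work size of 32, 1024 work items would be launched, so the kernel itself has to ignore work items whose global ID is 1000 or higher.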
Summary
Parallel programming is becoming increasingly important in the development of embedded systems, as no significant speed improvements can be expected from single-core processors. However, conventional thread-based approaches are often error-prone, inefficient, and cumbersome to use. Heterogeneous systems-on-a-chip, which integrate diverse processors onto a single chip, present a further challenge. Standardized interfaces such as MTAPI help to abstract away this diversity and efficiently utilize parallelism at the system level.
Bibliography
[1] H. Alkhatib, P. Faraboschi, E. Frachtenberg, H. Kasahara, D. Lange, P. Laplante, A. Merchant, D. Milojicic, K. Schwan. "IEEE CS 2022 Report". IEEE Computer Society, 2014.
[2] "Multicore Task Management API (MTAPI) Specification V1.0". The Multicore Association, 2013.
[3] M. Herlihy, N. Shavit. "On the Nature of Progress". International Conference on Principles of Distributed Systems (OPODIS), Springer, 2011.
[4] T. Schüle. "Embedded Multicore Building Blocks – Parallel Programming Made Easy". Embedded World, 2015.
[5] Embedded Multicore Building Blocks – Tutorial, 2015.
[6] Embedded Multicore Building Blocks – Reference Manual, 2015.
