Software Design Automation for efficient multicore utilization

Software distribution under performance and energy limitations

Author: Maximilian Odendahl, CEO, Silexica GmbH

Contribution – Embedded Software Engineering Congress 2016

The development of software for novel multicore systems presents numerous industries in the embedded high-performance sector with new challenges. Compared to existing single-core systems, the use of modern multicore systems requires not only functional verification and single-core optimization, but also the optimal distribution of the software across the various processor cores of the target platform. This involves not only meeting timing and functional requirements, but ideally also minimizing power and energy consumption. Even for small applications and target systems, this results in a multitude of possible solutions, and developing and evaluating them pushes existing, primarily manual, development approaches to their limits. Novel algorithms and tools from the field of software design automation can provide a solution. This article presents a novel tool-based approach in more detail and demonstrates its advantages using an application example from wireless telecommunications.

Multicore programming and its challenges

The development of software for novel multi- and many-core systems presents numerous industries in the embedded high-performance sector with additional challenges. This is due to the increasing complexity of available hardware systems, as well as rising demands. Alongside high performance for a wide range of application scenarios, minimal power consumption is often required simultaneously. This problem is further exacerbated by ever-shorter development cycles and increased security requirements. The resulting "efficiency gap" between the outstanding capabilities of modern hardware and the ability to efficiently utilize these capabilities through software cannot be economically closed using the currently common manual system design and programming methods.

The challenges here, besides parallelizing existing (often decades-old) C/C++ code, lie primarily in optimally distributing already parallelized software while respecting timing and functional constraints. For example, a Long Term Evolution (LTE) Layer 3 implementation consists of up to 50,000 separate tasks that must be efficiently distributed across hundreds of heterogeneous processor cores (RISCs, DSPs, hardware accelerators, etc.). This requires considering not only the required and available computing power, but also, increasingly, the communication between the processor cores, which is becoming a bottleneck for system performance. To meet all timing and functional constraints, computation and communication therefore have to be considered together.
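
Why computation and communication have to be weighed together can be illustrated with a minimal sketch. All task names, runtimes, data volumes, and the transfer cost below are hypothetical toy values, and the cost function (load of the busiest core plus transfer time) is a deliberately crude estimate, not the model used by any particular tool.

```python
from itertools import product

# Hypothetical per-task runtimes in ms on each core type.
RUNTIME_MS = {
    "decode": {"risc": 8.0, "dsp": 3.0},
    "filter": {"risc": 3.0, "dsp": 4.0},
    "control": {"risc": 1.0, "dsp": 4.0},
}
# Hypothetical data volume in KiB sent over each channel (source, destination).
CHANNEL_KIB = {("decode", "filter"): 128, ("filter", "control"): 4}
# Assumed cost of moving 1 KiB between two different cores, in ms.
TRANSFER_MS_PER_KIB = 0.05
CORES = ("risc", "dsp")


def busiest_core_ms(mapping):
    """Total runtime of the most heavily loaded core."""
    load = dict.fromkeys(CORES, 0.0)
    for task, core in mapping.items():
        load[core] += RUNTIME_MS[task][core]
    return max(load.values())


def transfer_ms(mapping):
    """Time spent on channels whose endpoints sit on different cores."""
    return sum(
        kib * TRANSFER_MS_PER_KIB
        for (src, dst), kib in CHANNEL_KIB.items()
        if mapping[src] != mapping[dst]
    )


def best_mapping(include_communication):
    """Exhaustively search all task-to-core mappings (feasible only for toy sizes)."""
    tasks = list(RUNTIME_MS)
    candidates = (
        dict(zip(tasks, cores)) for cores in product(CORES, repeat=len(tasks))
    )

    def cost(mapping):
        extra = transfer_ms(mapping) if include_communication else 0.0
        return busiest_core_ms(mapping) + extra

    return min(candidates, key=cost)


compute_only = best_mapping(include_communication=False)
combined = best_mapping(include_communication=True)
print(compute_only)
print(combined)
```

With these toy numbers, the compute-only search places "filter" on the RISC core, whereas the combined search keeps it on the DSP next to "decode" to avoid the large transfer. Exhaustive search grows exponentially with the number of tasks, which is why it cannot be applied to applications of realistic size.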

Solutions through Software Design Automation

For various problems, such as the example described above from the telecommunications sector, manual design and programming are no longer economically viable. A new generation of automated development tools solves these problems through dynamic data flow analysis, automatic runtime estimation, and automatic software task allocation. Throughout the entire development process, the target platform is directly integrated, enabling combined software and hardware optimization.

Based on process networks that explicitly describe relationships between parallel processes (tasks) through data channels, the tools first perform a basic application analysis (deadlocks, task runtimes, channel throughputs, etc.) and then automatically distribute the tasks to a target platform (see Fig. 1 in the PDF). Several optimization criteria (performance, storage, power consumption) are available, which can be automatically iterated and manually fine-tuned, taking into account various requirements or limitations (latencies, throughput, storage, etc.).
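
As an illustration of what such a process network and one basic analysis might look like, the following minimal sketch models tasks and channels and flags one simple deadlock pattern. The class and method names are hypothetical, and the check rests on a simplifying assumption: every task needs data on all of its input channels before it can run, so a cycle of channels that carry no initial data can never start.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Channel:
    src: str
    dst: str
    initial_tokens: int = 0  # data items present before execution starts


@dataclass
class ProcessNetwork:
    tasks: set = field(default_factory=set)
    channels: list = field(default_factory=list)

    def connect(self, src, dst, initial_tokens=0):
        """Add a data channel and register both endpoint tasks."""
        self.tasks.update((src, dst))
        self.channels.append(Channel(src, dst, initial_tokens))

    def has_token_free_cycle(self):
        """True if some cycle consists only of channels without initial tokens."""
        successors = {task: [] for task in self.tasks}
        for channel in self.channels:
            if channel.initial_tokens == 0:
                successors[channel.src].append(channel.dst)
        state = {}  # task -> "visiting" or "done"

        def visit(task):
            # Depth-first search; reaching a task that is still being
            # visited means we have closed a cycle.
            state[task] = "visiting"
            for nxt in successors[task]:
                if state.get(nxt) == "visiting":
                    return True
                if nxt not in state and visit(nxt):
                    return True
            state[task] = "done"
            return False

        return any(task not in state and visit(task) for task in self.tasks)


# A pipeline with a feedback channel that carries no initial data.
blocked_net = ProcessNetwork()
blocked_net.connect("source", "fft")
blocked_net.connect("fft", "sink")
blocked_net.connect("sink", "source")

# The same pipeline, but the feedback channel starts with one data item.
live_net = ProcessNetwork()
live_net.connect("source", "fft")
live_net.connect("fft", "sink")
live_net.connect("sink", "source", initial_tokens=1)

print(blocked_net.has_token_free_cycle(), live_net.has_token_free_cycle())
```

Making the channels explicit is what enables this kind of analysis: the same structure can also carry measured task runtimes and channel throughputs as additional attributes.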

Power consumption and energy consumption

For many industries, reduced power consumption and/or low energy consumption are already of great importance. Examples include new systems in the telecommunications sector for the next generation of mobile communications (5G) as well as current systems in the field of automotive driver assistance systems and autonomous driving.

The optimization criteria can vary considerably: While the design of base stations in mobile communications aims to reduce maximum power consumption in order to continue using the existing network infrastructure, in the automotive sector the focus is on reducing average power consumption to increase vehicle range and minimize heat generation.
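
That maximum and average power consumption are genuinely different objectives can be seen from a small worked example. The sketch below describes a power profile as a list of (duration, power) segments; the two profiles are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def average_power_w(profile):
    """Average power of a profile given as (duration_s, power_w) segments."""
    energy_j = sum(duration * power for duration, power in profile)
    total_s = sum(duration for duration, _ in profile)
    return energy_j / total_s


def peak_power_w(profile):
    """Maximum power drawn in any segment of the profile."""
    return max(power for _, power in profile)


# Hypothetical profiles: a short burst followed by a quiet phase,
# and the same work spread evenly over the same time.
bursty = [(1.0, 10.0), (3.0, 2.0)]
flattened = [(4.0, 4.0)]

for name, profile in (("bursty", bursty), ("flattened", flattened)):
    print(name, average_power_w(profile), peak_power_w(profile))
```

Both profiles consume 16 J in 4 s and therefore average 4 W, but the bursty one peaks at 10 W while the flattened one never exceeds 4 W. A power supply has to be sized for the first figure, whereas vehicle range and heat generation depend on the second.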

Two algorithms were developed for these two scenarios:

  1. Algorithm for minimizing average power consumption:
    The goal of this algorithm is to determine the optimal processor for each task of a (parallel) application, and the optimal voltage and frequency combination for each processor. It iteratively considers both static and dynamic power consumption, based on the task-specific instructions executed on each processor. Furthermore, different frequency and voltage domains are automatically taken into account to prevent invalid configurations. In each iteration, the tasks are distributed as evenly as possible across the platform, considering their individual runtimes (which can also vary depending on the processor), without increasing the overall application runtime. Runtime requirements (latencies, throughput, etc.) are verified separately in each iteration using a high-level scheduler simulation. The result of the scheduler simulation is then used in the next iteration to select the next platform configuration.
  2. Algorithm for minimizing maximum power consumption:
    The goal is to avoid exceeding a given power budget (maximum power consumption) during application execution on a target platform. To achieve this, tasks are proportionally distributed across different processors based on their priority and power consumption profile (power consumption over time). Task priorities are dynamically determined based on their membership in the application's critical path. The processor power consumption profile is calculated from the static and dynamic power consumption for each task distribution and frequency/voltage configuration (see above). If the power budget for a configuration is exceeded, lower (less performant) frequency/voltage configurations are investigated in the next iteration.
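
The voltage/frequency selection at the core of both algorithms can be sketched for a single processor as follows. This is a simplified illustration, not the implementation in the tool suite: it leaves out task distribution, voltage domains, and the scheduler simulation. It uses the common approximation that dynamic power scales with C·V²·f, and it assumes that static power continues to be drawn while the core is idle. The operating points, the capacitance value, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    freq_hz: float
    volt_v: float
    static_w: float  # leakage power at this voltage


C_EFF = 1e-9  # assumed effective switched capacitance in farads


def active_power_w(op):
    """Static plus dynamic power while the core is executing."""
    return op.static_w + C_EFF * op.volt_v**2 * op.freq_hz


def runtime_s(cycles, op):
    return cycles / op.freq_hz


def min_average_power(points, cycles, period_s):
    """Point with the lowest average power that still finishes within the period."""
    feasible = [op for op in points if runtime_s(cycles, op) <= period_s]

    def average_w(op):
        busy_fraction = runtime_s(cycles, op) / period_s
        # Assumption: leakage persists while idle, dynamic power does not.
        return op.static_w + busy_fraction * C_EFF * op.volt_v**2 * op.freq_hz

    return min(feasible, key=average_w) if feasible else None


def fastest_within_budget(points, budget_w):
    """Step down from the fastest point until the power budget is respected."""
    for op in sorted(points, key=lambda op: op.freq_hz, reverse=True):
        if active_power_w(op) <= budget_w:
            return op
    return None


POINTS = [
    OperatingPoint(1.0e9, 1.1, 0.10),
    OperatingPoint(0.8e9, 1.0, 0.08),
    OperatingPoint(0.5e9, 0.8, 0.05),
]
CYCLES = 6e6      # hypothetical work per period
PERIOD_S = 0.010  # hypothetical runtime requirement

average_choice = min_average_power(POINTS, CYCLES, PERIOD_S)
budget_choice = fastest_within_budget(POINTS, budget_w=1.0)
tight_choice = fastest_within_budget(POINTS, budget_w=0.5)
print(average_choice, budget_choice, tight_choice)
```

With these values, both the average-power search and a 1.0 W budget select the 0.8 GHz point. A 0.5 W budget forces the 0.5 GHz point, which needs 12 ms and therefore misses the 10 ms requirement. This is the kind of conflict that makes a separate runtime check necessary in every iteration.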

The algorithms presented here are implemented in a similar form in the Silexica Tool Suite and are used in the following case study.

Case study: telecommunications

Due to increasing connectivity and the growing importance of information technology in everyday life, companies in the telecommunications industry, in particular, are confronted with rapid technological advancements. Existing hardware becomes virtually obsolete with each new standard and requires costly redesign. Many existing applications are not optimized for use on multicore processors. The high complexity of existing system architectures therefore makes it practically impossible to assess the potential of existing or newly developed systems without investing significant time and resources in hardware and software design, manufacturing, and testing.

By using special software tools, hardware manufacturers can investigate the potential for power savings and performance improvements in existing systems before developing new systems with unknown outcomes, high risk, and high costs.

Using the SLX Tool Suite (and the algorithms described above), a major hardware manufacturer in the telecommunications industry was able to model the performance and power consumption of an existing multiprocessor architecture in detail and analyze it with reference to various power-saving functions.

The heterogeneous target architecture has more than 100 processor cores and allows voltage and frequency to be varied largely independently for each processor.

Depending on the maximum application runtime, the maximum power consumption could be reduced by 67 % when using dynamic voltage/frequency scaling (DVFS). Energy efficiency could be improved by 24 % with only a slight change in runtime, while the average power consumption could be reduced by up to 32 %.

The results of the investigations show high potential for performance improvements and energy savings compared to systems equipped exclusively with fixed voltages and clock gating.

Although the results shown here cannot be directly transferred to other systems, they nevertheless demonstrate the potential of software design automation tools: A significantly reduced maximum power consumption enables cost savings in the power supply, while the reduction in average power consumption reduces cooling requirements and can thus save costs.

Summary

The increasing complexity of modern multiprocessor systems, combined with shortened development cycles and rising application demands, has led to an efficiency gap: with manual system design and programming, modern multiprocessor systems can be used efficiently and economically only to a limited extent. With increasing platform and software complexity, this problem will only worsen in the future. Software design automation tools can be used to close this gap and solve even complex optimization problems. In the case study presented here, the average and maximum power consumption of a multiprocessor platform with more than 100 processors was significantly reduced fully automatically.

Download the article as a PDF


Published by

weissblau media
