Why C++ makes sense down to the driver level
Author: Matthias Bauer, redlogix Software & System Engineering GmbH
Contribution – Embedded Software Engineering Congress 2015
Some prejudices are incredibly persistent. For example, this one: C++ is unsuitable for extremely resource-poor systems. This is simply not true! On the contrary, using the right C++ language features offers invaluable advantages, especially for systems with extremely limited resources.
By using the compiler as a code generator, programs can be created from generic, flexibly configurable software components, without any runtime overhead compared to a custom solution. This is particularly interesting for applications that are integrated on platforms with very limited resources (e.g., code storage, RAM, processing speed, energy).
Does "++" indicate a greater need for resources?
At a very superficial glance, one might actually conclude that software developed in C++ generates larger binary code than its counterpart in C.
For example, if you build a simple Hello World program in C and C++ using the Keil uVision development environment for a Cortex-M3 target, C++ skeptics will initially feel vindicated: while the program written in C occupies code memory in the low single-digit kilobyte range, the C++ program occupies more than 30 kBytes (see Figures 1 and 2, PDF).
However, the increased code memory consumption is not due to the C++ programming language itself. Using certain parts of the Standard Template Library introduces library code that is not optimized for embedded applications and, in some cases, utilizes C++ features that are better left out in applications with extremely limited resources (e.g., exception handling).
Therefore, caution is advised when using the STL on embedded targets for several reasons. For example, the collection classes use dynamic memory allocation by default, which sooner or later leads to memory fragmentation problems on systems without virtual memory management.
The C++ programming language itself can be used without hesitation and without tying up more valuable system resources than C. With C++, you pay only for the features you actually use, and the currency is resources. Of course, you should know exactly which language features incur additional resource costs.
Overview of resource costs of some C++ language features
| Language feature | Resource costs | Recommended use (*) |
| Classes without virtual methods | No additional resource requirements compared to normal C functions that work on data structures. | without restriction |
| Inheritance | No additional resources are required. A class that inherits members from base classes does not require more code or data storage than if it had defined all members itself. | without restriction |
| Use of access protection for class members (private, protected, public) | No additional resources are required, as the check takes place exclusively at compile time. The concept of inline functions also eliminates the costs associated with public getter and setter methods for accessing protected member variables. | without restriction |
| Virtual methods | Typically (depending on the compiler), additional code memory is required for method jump tables and data memory for managing a pointer to them. Recreating virtual methods "by hand" in C would incur at least equally high resource costs. | with care |
| Exceptions | Code storage and data storage (the efficiency of the implementation varies significantly from compiler to compiler) | best avoided |
| Using templates | They are fully evaluated at compile time and therefore incur no runtime overhead in themselves. However, their careless use can lead to a significant increase in the need for code storage. | with care |
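Some of the rows above can be checked directly with the compiler. The following is a minimal sketch under stated assumptions: all class names (`PointC`, `Point`, `Point3d`, `VirtualPoint`) are hypothetical, and the size relations hold on mainstream compilers and ABIs but are not guaranteed by the C++ standard in this exact form.

```cpp
#include <cstdint>

// Plain C-style data structure, used as the baseline.
struct PointC { std::int32_t x; std::int32_t y; };

// Class with access protection and inline getters/setters, no virtual methods.
class Point {
public:
    Point(std::int32_t x, std::int32_t y) : x_(x), y_(y) {}
    std::int32_t x() const { return x_; }
    std::int32_t y() const { return y_; }
    void setX(std::int32_t x) { x_ = x; }
private:
    std::int32_t x_;
    std::int32_t y_;
};

// Inheritance without virtual methods: only the new member is added.
class Point3d : public Point {
public:
    Point3d(std::int32_t x, std::int32_t y, std::int32_t z)
        : Point(x, y), z_(z) {}
    std::int32_t z() const { return z_; }
private:
    std::int32_t z_;
};

// A virtual method typically adds a hidden pointer to a jump table.
class VirtualPoint {
public:
    virtual ~VirtualPoint() {}
    virtual std::int32_t x() const { return x_; }
    virtual std::int32_t y() const { return y_; }
private:
    std::int32_t x_ = 0;
    std::int32_t y_ = 0;
};

// Assumption: typical ABI without unusual padding rules.
static_assert(sizeof(Point) == sizeof(PointC),
              "access protection and inline methods add no data");
static_assert(sizeof(Point3d) == 3 * sizeof(std::int32_t),
              "non-virtual inheritance adds no data");
static_assert(sizeof(VirtualPoint) > sizeof(Point),
              "virtual methods cost data memory per object");
```

If one of the assertions fails on a particular toolchain, that is itself useful information about the resource costs on that target.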
Templates, in particular, make the C++ language extremely interesting for embedded applications. They allow the C++ compiler to be used as a code generator, producing highly efficient code at compile time from generic and therefore flexibly deployable software components. The concept of code generation can even be used effectively down to the driver level. To illustrate this, let's consider a driver for digital inputs that, for example, should call an application function from the interrupt context on every rising pulse edge.
The ISR of such a driver typically consists of two parts:
- A hardware-dependent part that checks whether the interrupt was triggered by the relevant hardware component (in our case, the input pin), which is required if multiple sources can trigger the interrupt, and resets the interrupt if necessary.
- Calling a callback routine that is not part of the driver but contains application-specific functionality (see Figure 3, PDF).
Case study: A driver with ISR in C
The ISRs of drivers implemented in C typically call the application-specific callbacks via function pointers. The relevant code snippets of such a driver then look like this (see Figure 4, PDF).
With this approach, the callback routine is only communicated to the driver at runtime, which provides flexibility that is often unnecessary. In most cases, the callback routine that the driver should call is already determined at compile time.
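Since Figure 4 is not reproduced here, the following is a minimal sketch of the function pointer approach, not the original code. It is written in C style but compiled as C++. Only the name `setCallbackFCT()` appears in this article; `CallbackFct`, `digitalInputIsr()`, `onRisingEdge()` and `g_risingEdges` are assumptions, and the hardware-dependent part of the ISR is left out so that the sketch also runs on a host PC.

```cpp
// Type of the application callback (hypothetical name).
typedef void (*CallbackFct)();

// The driver has to store the callback address in RAM.
static CallbackFct s_callback = nullptr;

// Called by the application during driver initialization.
void setCallbackFCT(CallbackFct fct) {
    s_callback = fct;
}

// Stand-in for the real ISR; on a target this would be entered by hardware.
void digitalInputIsr() {
    // Hardware-dependent part (check and reset interrupt source) omitted.
    if (s_callback != nullptr) {
        s_callback();  // indirect call through the stored pointer
    }
}

// Application side: counts rising edges.
static int g_risingEdges = 0;

void onRisingEdge() {
    ++g_risingEdges;
}
```

The application would call `setCallbackFCT(&onRisingEdge)` once at startup. Both the stored pointer and the registration call exist only because the decision is deferred to runtime.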
The same driver with C++
If the driver is implemented using the C++ programming language, arbitrary callback functions can be included in the driver without any function pointers. This is achieved using C++ templates and the concept of template specialization. A driver implemented in C++ can therefore look like this (see Figure 5, PDF).
Where the driver wants to execute a callback, it simply calls the static method `invokeCallback()` of the class template `TDriverIsrCallback`. By specializing the class template `TDriverIsrCallback`, it is possible to define for each driver what should happen in this static method, i.e., to call the desired application-specific callback routine from there (see Figure 6, PDF).
When the compiler parses the driver class code, the specialization does not yet need to be available to it. Therefore, the specialization can be implemented outside the driver code module, ensuring the driver remains generic and independent of the application code. The code module with the template specialization shown serves as the link between the application modules and the drivers.
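The figures are not reproduced here, so the following is a minimal sketch of the technique just described, not the code from Figures 5 and 6. `TDriverIsrCallback` and `invokeCallback()` are the names used in this article; `TDigitalInputDriver`, `PinA0Tag`, `isr()`, `onRisingEdge()` and `g_risingEdges` are assumptions. The driver is itself written as a template here, so that the call is only resolved once the specialization is known.

```cpp
// --- Driver module (generic, knows nothing about the application) ---

// Primary template: only declared. A driver without a matching
// specialization fails to compile instead of failing at runtime.
template <typename DriverTag>
struct TDriverIsrCallback;

template <typename DriverTag>
class TDigitalInputDriver {
public:
    // Stand-in for the real ISR; on a target this would be entered by hardware.
    static void isr() {
        // Hardware-dependent part (check and reset interrupt source) omitted.
        TDriverIsrCallback<DriverTag>::invokeCallback();
    }
};

// --- Application module ---

int g_risingEdges = 0;

inline void onRisingEdge() {
    ++g_risingEdges;
}

// --- Link module: connects the driver to the application ---

struct PinA0Tag {};  // identifies one concrete driver instance

template <>
struct TDriverIsrCallback<PinA0Tag> {
    static void invokeCallback() {
        onRisingEdge();  // direct call, can be inlined by the compiler
    }
};

using DigitalInputA0 = TDigitalInputDriver<PinA0Tag>;
```

Calling `DigitalInputA0::isr()` then executes `onRisingEdge()` without any function pointer being stored or dereferenced.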
Because `invokeCallback()` is an inline method, the compiler optimizes the call away and generates binary code as if the callback routine had been called directly in the driver, as shown here (see Figure 7, PDF).
This technique offers the following advantages over the callback solution using function pointers:
- Less data storage, since no function pointer needs to be stored.
- Less code memory is required because the address of the callback function does not need to be populated using setCallbackFCT() during driver initialization.
- The compiler can even "inline" the application callback functions.
The last point is particularly interesting: For short inline callback routines, the compiler places their contents directly into the driver ISR, meaning that no function is actually called from the ISR; saving the return address, the jump, and the return are therefore completely eliminated. This can be quite significant with very frequent interrupts.
The somewhat unusual syntax for inserting callback routines using template specialization can be significantly simplified with a macro. The redBlocks embedded software toolkit, which uses the technique presented here, employs the following macro in place of the template specialization shown in Figure 6 (see Figure 8, PDF). The second parameter, `DigitalInputA0::CBK_ON_INPUT_CHANGED`, selects the callback if a driver has more than one callback routine.
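How such a macro could wrap the template specialization is sketched below. This is explicitly not the redBlocks macro from Figure 8, which is not reproduced here: `APP_DRIVER_CALLBACK`, `TDigitalInput`, `PinA0Tag` and `g_inputChanges` are hypothetical names, and only `TDriverIsrCallback`, `invokeCallback()`, `DigitalInputA0` and `CBK_ON_INPUT_CHANGED` are taken from this article.

```cpp
// Primary template: declared only. The second parameter selects
// one of possibly several callbacks of a driver.
template <typename Driver, int CallbackId>
struct TDriverIsrCallback;

// Hypothetical convenience macro: declares the specialization and
// opens the definition of invokeCallback(); the user supplies the body.
#define APP_DRIVER_CALLBACK(Driver, CallbackId)                       \
    template <>                                                       \
    struct TDriverIsrCallback<Driver, CallbackId> {                   \
        static void invokeCallback();                                 \
    };                                                                \
    inline void TDriverIsrCallback<Driver, CallbackId>::invokeCallback()

// Simplified driver (assumption), templated on a pin tag.
template <typename PinTag>
class TDigitalInput {
public:
    enum CallbackId { CBK_ON_INPUT_CHANGED = 0 };

    // Stand-in for the real ISR.
    static void isr() {
        TDriverIsrCallback<TDigitalInput, CBK_ON_INPUT_CHANGED>::invokeCallback();
    }
};

struct PinA0Tag {};
using DigitalInputA0 = TDigitalInput<PinA0Tag>;

int g_inputChanges = 0;

// Application code: reads almost like an ordinary function definition.
APP_DRIVER_CALLBACK(DigitalInputA0, DigitalInputA0::CBK_ON_INPUT_CHANGED)
{
    ++g_inputChanges;
}
```

The macro hides the `template <>` boilerplate, while the generated code remains an ordinary specialization that the compiler can inline.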
Squaring the circle – reusable and no overhead?
The implementation approach shown allows for a clean separation of drivers into a hardware-independent high-level part and a hardware-dependent low-level part. The high-level part (e.g., responsible for data buffering in a UART driver) can work together with different low-level drivers on various hardware targets without any modifications, making it portable and reusable (see Figure 9, PDF).
Despite the clean modularization, the technique presented in this article does not incur any resource overhead. By contrast, if the split into high-level and low-level parts were implemented with function pointers, every callback between the two parts would incur an unnecessary indirection.
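The split into a high-level and a low-level part can be sketched as follows. This is a simplified illustration under stated assumptions, not the redBlocks implementation: `TBufferedUart`, `HostFakeUart`, `writeByte()`, `send()` and `onTxReady()` are hypothetical names, and the interrupt locking that a real driver would need around the buffer is omitted.

```cpp
#include <cstddef>
#include <cstdint>

// Hardware-independent high-level part: buffers outgoing bytes.
// LowLevel is any type providing a static writeByte(std::uint8_t).
template <typename LowLevel, std::size_t Capacity>
class TBufferedUart {
public:
    // Queue one byte; returns false if the buffer is full.
    static bool send(std::uint8_t byte) {
        if (count_ == Capacity) {
            return false;
        }
        buffer_[(head_ + count_) % Capacity] = byte;
        ++count_;
        return true;
    }

    // To be called from the low-level "transmitter ready" ISR.
    static void onTxReady() {
        if (count_ == 0) {
            return;
        }
        LowLevel::writeByte(buffer_[head_]);  // direct call, no function pointer
        head_ = (head_ + 1) % Capacity;
        --count_;
    }

private:
    static std::uint8_t buffer_[Capacity];
    static std::size_t head_;
    static std::size_t count_;
};

template <typename L, std::size_t C>
std::uint8_t TBufferedUart<L, C>::buffer_[C];
template <typename L, std::size_t C>
std::size_t TBufferedUart<L, C>::head_ = 0;
template <typename L, std::size_t C>
std::size_t TBufferedUart<L, C>::count_ = 0;

// Low-level part for a host PC: records what would be written to hardware.
// A target build would provide a type that writes to the UART data register.
struct HostFakeUart {
    static std::uint8_t lastByte;
    static int bytesWritten;
    static void writeByte(std::uint8_t byte) {
        lastByte = byte;
        ++bytesWritten;
    }
};
std::uint8_t HostFakeUart::lastByte = 0;
int HostFakeUart::bytesWritten = 0;

using Uart0 = TBufferedUart<HostFakeUart, 4>;
```

Because the low-level part is a template parameter, the same high-level code can be built against a host-side stand-in such as `HostFakeUart`, which is one way to run driver logic in a simulation or automated test.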
Summary
With templates, C++ provides a powerful compile-time code generator that can be used in a targeted manner to create highly efficient code for resource-constrained embedded systems from generic, reusable software components. This allows a maintainable, modular software architecture to be implemented down to the driver level without additional resource costs.
The technique presented here is used in the redBlocks component library, among other things to integrate embedded software into the SiL environment of the redBlocks WYSIWYG simulator and to enable automated testing there.
Using the redBlocks evaluation package (available via www.redblocks.de), the technique described here can be easily replicated.
