How to Measure RTOS Performance

Author: Colin Walls, Mentor Graphics, Newbury UK

Contribution to the Embedded Software Engineering Kongress 2015

 

Why Make Measurements?

Desktop and laptop computers are extremely powerful and remarkably inexpensive. As a result, developers of desktop software tend to assume that CPU power is effectively infinite, so they worry very little about the speed of their code. They also assume that unlimited memory is available, so they do not worry about code size either.

Embedded systems are different. Typically, there is enough CPU power to do the job, but only just enough, with no excess. Memory size is limited. It is not normally unreasonably small, but there is unlikely to be any possibility of adding more. Power consumption is usually an issue, and the size and efficiency of the software can have a significant bearing on the number of watts consumed by the embedded device.

It is clear that, with an embedded system, it is vital that the RTOS has the smallest possible impact on memory footprint and makes very efficient use of the CPU.

RTOS Metrics

There are three areas of interest if you are looking at the performance and usage characteristics of an RTOS:

  1. Memory – how much ROM and RAM does the kernel need and how is this affected by options and configuration.

  2. Latency, which is broadly the delay between something happening and the response to that occurrence. This is a particular minefield of terminology and misinformation, but there are two essential latencies to consider: interrupt response and task scheduling.

  3. Performance of kernel services. How long does it take to perform specific actions?

 

Each of these measurements will be addressed in turn.

A key problem is that there is no real standardization of RTOS measurements. One candidate would be the benchmarks of the Embedded Microprocessor Benchmark Consortium (EEMBC), but these are not widely adopted for this purpose and, in any case, are oriented more towards CPU benchmarking.

RTOS Metrics - Memory Footprint

As all embedded systems have some limitations on available memory, the requirements of an RTOS, on a given CPU, need to be understood. An OS will use both ROM and RAM.

ROM, which is normally flash memory, is used to store the kernel code, along with the code for the runtime library and any middleware components. This code – or parts of it – may be copied to RAM on boot up, as this can offer improved performance. There is also likely to be some read only data. If the kernel is statically configured, this data will include extensive information about kernel objects. However, nowadays, most kernels are dynamically configured.

RAM space will be used for kernel data structures, including some or all of the kernel object information, again depending upon whether the kernel is statically or dynamically configured. There will also be some global variables.

If code is copied from flash to RAM, that space must also be accounted for.

Dependencies:

There are a number of factors that affect the memory footprint of an RTOS.

The CPU architecture is key. The number of instructions needed to implement the same functionality can vary drastically from one processor to another, so looking at size figures for, say, PowerPC gives no indication of what the ARM version might be like.

Embedded compilers generally have a large number of optimization settings. These can be used to reduce code size, but that will most likely affect performance. Optimizations affect ROM footprint, but also RAM if the code is copied. Data size can also be affected by optimization, as data structures can be packed or unpacked. Again both ROM and RAM can be affected. Packing data has an adverse effect on performance.
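The size effect of packing can be seen on a host machine with a small sketch. It uses Python's standard `struct` module purely as an illustration. The two-field record (a one-byte flag followed by a four-byte counter) is a hypothetical example, not a data structure from any particular RTOS.

```python
import struct

# Hypothetical record: a 1-byte flag followed by a 4-byte counter.
# "@" requests the host's native alignment, so padding may be inserted.
# "=" requests standard sizes with no padding, i.e. a packed layout.
NATIVE_LAYOUT = "@BI"
PACKED_LAYOUT = "=BI"


def record_sizes():
    """Return (native_size, packed_size) in bytes for the record."""
    return struct.calcsize(NATIVE_LAYOUT), struct.calcsize(PACKED_LAYOUT)


if __name__ == "__main__":
    native, packed = record_sizes()
    print(f"native: {native} bytes, packed: {packed} bytes")
```

The packed layout always occupies 5 bytes, while the native layout is typically larger because the counter is aligned. The bytes saved by packing are paid for with unaligned accesses, which is where the performance penalty comes from.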

Most RTOS products have a number of optional components. Obviously, the choice of those components will have a very significant effect upon memory footprint.

Most RTOS kernels are scalable, which means that, all being well, only the code to support required functionality is included in the memory image. For some RTOSes, scalability only applies to the kernel. For others, scalability is extended to the rest of the middleware.

Different people have different ideas about what scalability means. Fine grain scalability means that only the core of the RTOS [the scheduler etc.] and the code for the service calls that are actually used are included in the final memory image. There should be no redundant code.

Measurement:

Although an RTOS vendor may provide or publish memory usage information, you may wish to make measurements yourself in order to ensure that they are representative of the type of application that you are designing.

These measurements are not difficult. Normally the map file, generated by the linker, gives the necessary memory utilization data. Different linkers produce different kinds of map files with varying amounts of information included in a variety of formats. Possibilities extend from a mass of hex numbers through to an interactive HTML document and everything in between.

There are some specialized tools that extract memory usage information from executable files. An example is objdump.
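Once section sizes have been read from the map file, the ROM and RAM totals follow from simple arithmetic. The sketch below assumes conventional GNU-style section names (`.text`, `.rodata`, `.data`, `.bss`); other toolchains use different names, and the `footprint` function and the numbers in the example are hypothetical. Stack and heap are not included.

```python
def footprint(sections, code_copied_to_ram=False):
    """Estimate (rom, ram) usage in bytes from a dict of section sizes.

    Sections that are missing from the dict count as zero.
    """
    text = sections.get(".text", 0)      # kernel, library and middleware code
    rodata = sections.get(".rodata", 0)  # read-only data
    data = sections.get(".data", 0)      # initialized variables
    bss = sections.get(".bss", 0)        # zero-initialized variables

    # Initial values for .data live in ROM and are copied to RAM at boot.
    rom = text + rodata + data
    ram = data + bss
    if code_copied_to_ram:
        # Code executed from RAM needs RAM space in addition to its ROM copy.
        ram += text
    return rom, ram


if __name__ == "__main__":
    # Hypothetical sizes in bytes, not measurements of any real kernel.
    sizes = {".text": 12000, ".rodata": 800, ".data": 200, ".bss": 3000}
    print(footprint(sizes))
    print(footprint(sizes, code_copied_to_ram=True))
```

With these example sizes the ROM figure is 13000 bytes in both cases, while the RAM figure rises from 3200 to 15200 bytes when the code is copied, which shows how strongly that design choice affects the RAM budget.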

Importance:

The importance of RTOS memory footprint must be understood, as its implications may be non-obvious.

As mentioned earlier, memory is always an issue with embedded systems, but the detailed priorities vary from one system to another.

A small system may only have limited on-chip memory and, of course, the application code must be accommodated. Hence, the RTOS must be as small as possible.

A bigger system may not have such a pressure on total memory space. System performance is more likely to be the priority. This means that the peak performance is required from the RTOS, so placing it into on-chip memory or locking it into cache may be attractive. Both of these options are most feasible if the kernel size is minimized.

If the system copies code from flash to RAM, it is particularly important to understand the memory space requirements.

RTOS Metrics - Interrupt Latency

The time related performance measurements are probably of most concern to developers using an RTOS.

A key characteristic of a real time system is its timely response to external events. An embedded system is typically notified of an event by means of an interrupt, so the delay between the interrupt occurring and the response to that interrupt – the interrupt latency – is critical.

Unfortunately, there are two definitions, at least, of the term "interrupt latency":

System: the total delay between the interrupt signal being asserted and the start of the interrupt service routine execution.

OS: the time between the CPU interrupt sequence starting and the initiation of the ISR. This is really the operating system overhead, but many people refer to it as the latency. This means that some vendors claim zero interrupt latency.

The two definitions are illustrated in the diagram (see Image 1 in the PDF file).

Measurement:

Interrupt response is the sum of two distinct times:

$\tau_{IL} = \tau_{H} + \tau_{OS}$

where:

$\tau_{H}$ is the hardware-dependent time, which depends on the interrupt controller on the board as well as the type of interrupt

$\tau_{OS}$ is the OS-induced overhead

Ideally, quoted figures should include the best and worst case scenarios. The worst case occurs when an interrupt arrives while the kernel has interrupts disabled.

Measuring a time interval such as interrupt latency with any accuracy requires a suitable instrument. The best tool to use is an oscilloscope. One approach is to use one pin on a GPIO interface to generate the interrupt. This pin can be monitored on the oscilloscope. At the start of the interrupt service routine, another pin, which is also being monitored, is toggled. The interval between the two signals may be easily read from the instrument.
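If the instrument can export the edge times of both pins, the best and worst case figures can be computed over many interrupts rather than read off one at a time. The following is a minimal sketch of that post-processing; the `latency_stats` function and the sample timestamps are hypothetical, and it assumes the capture has already been reduced to one trigger edge and one response edge per interrupt.

```python
def latency_stats(trigger_times, response_times):
    """Return (best, worst, mean) latency from paired edge timestamps.

    trigger_times[i] is when the interrupt pin was asserted and
    response_times[i] is when the ISR toggled the second pin.
    Both lists must use the same time unit.
    """
    if not trigger_times or len(trigger_times) != len(response_times):
        raise ValueError("need at least one edge pair, and equally many edges")
    latencies = [r - t for t, r in zip(trigger_times, response_times)]
    if min(latencies) < 0:
        raise ValueError("a response edge precedes its trigger edge")
    return min(latencies), max(latencies), sum(latencies) / len(latencies)


if __name__ == "__main__":
    # Hypothetical capture, in nanoseconds.
    triggers = [0, 1_000_000, 2_000_000]
    responses = [1_800, 1_002_100, 2_004_500]
    best, worst, mean = latency_stats(triggers, responses)
    print(f"best {best} ns, worst {worst} ns, mean {mean} ns")
```

Reporting the worst value alongside the best matters because a single interrupt that arrives while interrupts are disabled can be far slower than the typical case.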

Importance:

Many embedded systems are real time and it is those applications, along with fault tolerant systems, where knowledge of interrupt latency is important.

If the requirement is to maximize bandwidth on a particular interface, the latency on that specific interrupt needs to be measured.

To give an idea of numbers, the majority of systems exhibit no problems, even if they are subjected to interrupt latencies of tens of microseconds.

RTOS Metrics - Scheduling Latency

A key part of the functionality of an RTOS is its ability to support a multi-threaded execution environment. Being real time, the efficiency with which threads or tasks are scheduled is of some importance.

The scheduler is at the core of an RTOS, so it is reasonable that a user might be interested in its performance.

It is hard to get a clear picture, as there is a wide variation in the techniques employed to make measurements and in the interpretation of the results.

There are really two separate measurements to consider:

  • the context switch time
  • the time overhead that the RTOS introduces when scheduling a task

The context switch latency is the time it takes for the context switch to complete.

In the diagram (see Image 2 in the PDF file), we are looking at the elapsed time between the last instruction from Task A being executed and the first instruction from Task B.

It is unlikely to make any difference whether Task B has been run before and was paused or it is being run for the first time.

The other scenario is when the RTOS is idling and an external event causes the RTOS to schedule a task. In this case, the overhead is the elapsed time before the required task is actually running, as shown in the diagram (see Image 3 in the PDF file).

Measurement:

The scheduling latency is the maximum of two times:

$\tau_{SL} = \max(\tau_{SO}, \tau_{CS})$

where:

$\tau_{SO}$ is the scheduling overhead: the time from the end of the ISR to the start of the task schedule

$\tau_{CS}$ is the time taken to save and restore thread context

Measurements may be made in a similar way to the interrupt latency timings.
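When both times have been measured repeatedly, a single worst-case figure can be derived from the samples. The sketch below is one possible reading of "the maximum of two times": it takes the largest observed scheduling overhead and the largest observed context switch time, then the larger of the two. The `scheduling_latency` function and the sample values are hypothetical.

```python
def scheduling_latency(overhead_samples, context_switch_samples):
    """Return the worst-case scheduling latency from two sets of samples.

    Both lists must be non-empty and use the same time unit.
    """
    if not overhead_samples or not context_switch_samples:
        raise ValueError("both sample lists must be non-empty")
    worst_overhead = max(overhead_samples)
    worst_context_switch = max(context_switch_samples)
    return max(worst_overhead, worst_context_switch)


if __name__ == "__main__":
    # Hypothetical measurements, in microseconds.
    print(scheduling_latency([3.1, 2.9, 3.4], [2.2, 2.6, 2.3]))
```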

Importance:

Developers who are working on time-critical or fault-tolerant systems are likely to be interested in scheduling latency. The considerations are much the same as for interrupt latency, but the two are quite different measurements and each must be considered individually.

RTOS Metrics - Timing Kernel Services

An RTOS is likely to have a great many API calls, probably numbering into the hundreds. To assess timing, it is not useful to try to analyze every single call. It makes more sense to focus solely on the frequently used services.

For most RTOSes, there are four key categories of service call:

  • Threading services
  • Synchronization services
  • Inter-process communication services
  • Memory services
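The method for timing a service is the same whichever category it belongs to: read a clock, make the call, read the clock again, and repeat to capture the spread. The sketch below shows that method on a host machine only. `time_service` is a hypothetical helper, and Python's `threading.Semaphore` stands in for a synchronization service. On a real target the clock would be a hardware timer and the call would be the RTOS API under test, so the numbers printed here say nothing about any RTOS.

```python
import threading
import time


def time_service(call, iterations=1000):
    """Time `call` repeatedly; return (best, worst, mean) in nanoseconds."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        call()
        samples.append(time.perf_counter_ns() - start)
    return min(samples), max(samples), sum(samples) / len(samples)


if __name__ == "__main__":
    semaphore = threading.Semaphore(1)

    def acquire_release():
        # Stand-in for an obtain/release pair of a synchronization service.
        semaphore.acquire()
        semaphore.release()

    best, worst, mean = time_service(acquire_release)
    print(f"best {best} ns, worst {worst} ns, mean {mean:.0f} ns")
```

Each sample includes the cost of reading the clock itself, so for very short services that overhead should be measured separately with an empty call and subtracted.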

 

Conclusions

All RTOS vendors provide performance data for their products, some of which is more comprehensive than others. This information may be very useful, but can also be misleading if interpreted incorrectly.

It is important to understand the techniques used to make measurements and the terminology used to describe the results. There are also trade-offs, generally size against speed, and these, too, need to be thoroughly understood. Without this understanding, a fair comparison is not possible.

If timing is critical to your application, it is strongly recommended that you perform your own measurements. This enables you to be sure that the hardware and software environment is correct and that the figures are directly relevant to your application.

 
