Select Page

Hardware-level software development

How do I develop software with hardware in mind?

Author: Christian Siemers, Clausthal University of Technology, Institute for Process and Production Engineering

Contribution – Embedded Software Engineering Congress 2015

Introduction

A topic like Low-level programming (In a high-level language) it shouldn't really exist at all, because a high-level language implies hardware.independence – and not a specific addressing of their particularities. Nevertheless, this topic is indispensable in practice for the following reasons:

  • Resource limitations (especially in embedded systems)
  • Cumbersome configuration of peripheral elements (this is more at the bit and byte level)
  • Extreme adaptation of the software to hardware conditions, e.g., in the case of specialized hardware or missing hardware components.
  • Real-time programming with very tight computing times

Low-level programming often involves system development with elements of hardware/software co-design.

Resource constraints using the discrete Fourier transform as an example

Fourier transform (FT):
The Fourier transform is the analysis of a (usually time-)continuous signal to determine its constituent frequencies. This process begins with periodic signals, meaning they repeat after a certain time. This repetition frequency then yields the so-called fundamental frequency; the frequency spectrum is discrete.

The transition from periodic to aperiodic signal shapes then results in the transition from the discrete to the continuous spectrum. Conversely, the original signal shape g(x) can be reconstructed by superimposing the frequencies present in the spectrum with correct phase and amplitude (Fourier synthesis). Here are the formulas for the calculation in real notation, assuming that g(x) is a real-valued function (see Figure 1a)., PDF).

Discrete Fourier Transform (DFT):
In computers, the "time axis" is never continuous, but always discrete. This is partly because even analog-to-digital converters (ADCs) can never transfer data continuously from the analog to the digital world, but only discretely. ADCs discretize both data and time values.

This transforms the computational integrals of the general Fourier transform into computational sums that can be implemented relatively easily in the form of algorithms [1]. T denotes in the equations (Figure 1b, see PDF) the period of the signal, Dt the time interval between two successive measurement points.

The DFT is used when the number of data points per period is not predetermined or is not a power of two. However, for powers of two such as 512 or 1024, the Fast Fourier Transform (FFT) is a more suitable implementation. The complexity of the DFT is O(N²), while that of the FFT is O(N*log(N)).

DFT algorithm with floating-point numbers

The formulas in Figure 1 b) (see. PDF) can be incorporated relatively easily into an algorithm, e.g. in C:

void vComputeDFT( unsigned int uiNumOfPoints, int *iValue )

{

   unsigned int k, m;

   double dCoeffAtemp, dCoeffBtemp;

   for( k = 0; k < NUM_OF_COEFFICIENTS; k++ )

   {

      dCoeffAtemp = 0.L;

      dCoeffBtemp = 0.L;

      for( m = 0; m < uiNumOfPoints; m++ )

      {

         dCoeffAtemp += (double)*(iValue+m) *
cos(2.L * PI * m * k / uiNumOfPoints);

         dCoeffBtemp += (double)*(iValue+m) *
sin(2.L * PI * m * k / uiNumOfPoints);

      }

      dCoeffA[k] = dCoeffAtemp / (double)uiNumOfPoints;

      dCoeffB[k] = dCoeffBtemp / (double)uiNumOfPoints;

   }

}

This program includes a constant called NUM_OF_COEFFICIENTS, which indicates how many coefficients a and b from Figure 1 (see PDF) are to be calculated; this constant can, of course, also easily be formulated as a variable. However, what is very easy to assess is that this involves a large number of additions and multiplications, not to mention the calculation of the sine and cosine values. Implementing this algorithm on a microprocessor or DSP that does not have a floating-point unit can result in surprisingly long runtimes, since the floating-point format has to be emulated in a rather cumbersome way.

Conversion to a fixed-point format

It therefore makes sense to convert the data to a fixed-point format and expect improved performance. Another improvement is, for example, reduced storage space, although this may be of secondary importance. The basic formula for the conversion is (see Equation 1, PDF)

where Δ denotes the so-called offset (displacement of the zero point) and Q initially represents a linear scaling factor. This mapping, however, initially only concerns the mapping of a physical (process) quantity to an analogous (!) quantity. If, however, formula (1) is interpreted such that Q represents a quantization quantity, and that the mapping occurs in quanta, then we obtain the desired mapping of physical quantities into representations in computer-usable data values.

Equation (2, see PDFThis is often further simplified by setting the offset Δ to zero. This significantly simplifies arithmetic operations such as addition, subtraction, etc.

As simple as formula (2) may seem, it has a catch, because it only applies to a limited range of values of x.phys. This quantity represents the physical world or a measured quantity, and since the numerical range of all representations in the computer is limited, the measured quantity must also be kept within a permissible interval.

Even more so if you xphys by xfloat If the floating-point representation is replaced – which is precisely what we intend to do here, namely replace it with a fixed-point representation – then different interval limits apply to both number formats for their validity. This must be taken into account during implementation.

The Qm.n format

The usable fixed-point format can be thought of as a modification of the familiar integer format, where the distance between two adjacent representable values is not 1, but, for example, 0.5, 0.25, or 0.1. In the digital world, powers of two are usually used for this as well, and the resulting format is called Qn.m.

The index n Here, represents the number of bits before the decimal point., m for the number after that. Furthermore, as in the integer case, a distinction is made between unsigned and signed representations, although this is not apparent in the format name.

Arithmetic with Qn.m is not significantly different from arithmetic with integers, as long as the decimal point is always in the same position for the operands. However, if the representations of the operands are different, i.e., if n and m no longer correspond for the respective operands, then some mental work and bit shifting are required.

Sine and cosine

And there is one more task waiting in the source code (Listing 2) that needs to be solved for a compilation: the transcendental functions sine and cosine. These functions are usually available in libraries for floating-point values, but this is precisely the data type that should not be used here.

#define PI 3.14159265358979L

#define LIMIT 2048

#define FRACTION_BITS 30

/* This results in a Q1.30 format with sign */

#define LIMIT_TEST 960

void main()

{

   int k;

   long int lResult;

   double dfResult;

   for( k = 0; k < LIMIT; k++ )

   {

      dfResult = sin( 2.L * PI * (double)k / (double)LIMIT );

      lResult = (long int) (dfResult * (1 << FRACTION_BITS));

      printf( „%lf  %08x “, dfResult, lResult );

   }

}

The solution consists, for example, of calculating the sine and cosine values. before The values are calculated at runtime and stored in a data area in Qn.m format (here, for example, signed Q1.30) (see Listing 2). To then have precise values for all input parameters available at runtime, a second table (or the values directly) can be created from the basic table using linear interpolation. This method is very fast, even for simple processors.

…and now the DFT

Using the approaches to low-level software development just discussed, the algorithm from Listing 1 will now be transformed into a version that uses only integer values. The following details are important:

  • All calculations are performed as integer calculations, no casting!
  • A (sample) sine curve is stored as 513 (= 512 + 1) values in the signed-Q1.30 format, specifically only the first quarter (because all others are derived from it). The complete curve then has 2048 values.
  • Access to this value table is possible through access functions such as i32GetCosineValue( uint16 ui16Index ) Encapsulated. This is a concept borrowed from object-oriented programming.
  • For the DFT, a sine and a cosine table are required, whose period is exactly equal to the number of data points. This is calculated at the beginning (by linear interpolation), and then the respective values are derived from these tables using the functions. (i32GetSineTableValue() and i32GetCosineTableValue()read.

 

The central routine for calculating the DFT is now:

void vComputeDFT( uint16 ui16NumOfPoints, int16 *i16Value )

{

   uint16 k, m, ui16Index;

   int32 i32CoeffATemp, i32CoeffBTemp;

   for( k = 0; k < NUM_OF_COEFFICIENTS; k++ )

   {

      i32CoeffATemp = 0;

      i32CoeffBTemp = 0;

      for( m = 0, ui16Index = 0; m < ui16NumOfPoints; )

      {

         i32CoeffATemp += *(i16Value+m) *
i32CosineTable[ui16Index];

         i32CoeffBTemp += *(i16Value+m) * i32SineTable[ui16Index];

         m++;

         ui16Index = (ui16Index + k) % ui16NumOfPoints;

      }

      i32CoeffA[k] = i32CoeffATemp;

      i32CoeffB[k] = i32CoeffBTemp;

   }

}

A rectangle with 5 periods and 1000 points is chosen as the test function, since the result is known here: If the fundamental wave has a height of 1, then all even harmonics are 0, and all odd harmonics (index k) exhibit an amplitude of 1/k Figure 2 shows the result for the variant implemented in floating point, Figure 3 (see PDF) for those in fixed-point format.

The images show unmistakable differences, even though they implement the same algorithm. Furthermore, it is at least qualitatively clear that image 2 (see PDF) displays a correct result, as the result agrees with the theory. A note on this: To verify the actual accuracy of the results, one must switch to a logarithmic representation and consider the difference between the calculated and theoretical values logarithmically. Depending on the number of significant bits, this will yield a value that must be below the achievable signal-to-noise ratio.

Returning to the evaluation: The reason for the failure of the DFT calculation in fixed-point format is data overflow. The calculation uses sine and cosine values with a width of 16 bits. In this example, these are multiplied by 13-bit values (the chosen format for the rectangle, 12-bit value + sign), resulting in 29 significant bits, and then summed 1000 times, leading to 10 additional bits (10 = log).2(1024)). Therefore, the results can have a total width of 39 bits, so it is not surprising that it can overflow for 32-bit variables.

The solution to the problem

There are two different ways to address the data overflow problem: reducing data widths, which results in a loss of accuracy, or using larger data types, such as 64 bits. The first approach is suitable if the achievable accuracy is not required; the second if the loss of accuracy is unacceptable. Splitting the data across two variables, i.e., emulating higher data widths through software coupling, also remains a viable, high-performance solution. Figure 4 (see PDF) shows the result of the Fourier transform for a correct algorithm in fixed-point format that takes into account the resulting data widths.

The performance is truly impressive. While a DSP with an intrinsic data width of 16 bits without a floating-point unit required a full 7 seconds for a 1000-point DFT, the fixed-point version took only 700 ms, and after optimization (mapping the division instructions to successive subtraction), only 120 ms.

In conclusion, simply encoding the algorithm is often ineffective when transitioning to fixed-point representation, as data overflows are likely, especially during arithmetic operations. It is strongly recommended to consider the possible data values, including intermediate results.

Access to hardware with special features

A completely different issue involves read and write access to areas that behave like RAM within the program (read and write accesses), but only emulate it through a special functionality. If the compiler or driver then hides this access, undesirable interactions can occur.

Variables in EEPROM

The topic of "variables in ROM" seems like a contradiction in terms, since a ROM is a read-only memory. However, since the introduction of EEPROM, an electrically erasable, rewritable ROM, this is no longer a contradiction, and some compilers even support it using the (non-standard) keyword. EEPROM The values are displayed to the compiler/linker and should also be stored in the EEPROM.

Suppose a set of 16 data points needs to be written to the EEPROM of a microcontroller because the values should be stored when the power is switched off. These could be, for example, configuration values that are loaded on every restart.

As mentioned, writing to the EEPROM is supported by appropriate C compilers using runtime libraries; from the perspective of application programming, this is a simple matter:

eeprom int eepIBasicConfig[16];

int k, iConfig[16];

for( k = 0; k < 16; k++ )

eepIBasicConfig[k] = iConfig[k];

The dangerous aspect of this code sequence is that behind the seemingly simple value assignment to eepIBasicConfig[] lies an entire runtime routine with blocking communication, which is not immediately apparent. Usually, the first value is written immediately (because there is a buffer in the hardware), but from the second value onward, the system waits for the previous value to be written – which can easily take several milliseconds.

In plain terms: Such a write operation to memory, which can be performed quickly in RAM, can lead to surprisingly long processing times. Similar effects occur when accessing peripherals via a network or bus system (e.g., I²C or TWI in small microcontroller systems), which may be blocked by other devices; here, too, long, even indefinite waiting times can result.

The art of non-blocking communication

The art of non-blocking communication dictates that writing should only occur when it is immediately possible (because the buffer or network is free, meaning the necessary resources are available). Otherwise, the program should remember that there is still something to write/read and return to further program execution. It must then be ensured that the writing process is eventually completed, perhaps through cyclical design, and that there are no side effects.

Ideally, within a small operating system, a thread can be started that executes the desired action and calls itself with the same parameter if the action fails. If the re-call is later enabled, the process checks again whether the action can be performed until everything is completed or otherwise aborted.

This, however, entails a slightly higher initial effort. How to do it without an operating system is described, for example, in [1]; alternatively, such non-blocking communication can also be implemented in interrupt service routines, called by a timer.

Conclusion

This example illustrates what hardware-centric software development means and its consequences for the development process. Naturally, the software loses its claim to be hardware-independent and therefore portable. This is a frequently implicit, extra-functional requirement.

On the other hand, other, also extra-functional requirements become (more) achievable, so it is worth considering some adjustments.

In any case, low-level software development means that one has to deal with the hardware and its properties and/or the properties of data types, both to exploit them and to avoid failure.

literature

[1] Christian Siemers, Embedded Systems Engineering Handbook.
https://www.elektronikpraxis.vogel.de/themen/embeddedsoftwareengineering/management/articles/340538/

Download the article as a PDF


Implementation – our training & coaching

Do you want to bring yourself up to date with the latest technology?

Then find out more here MircoConsult offers training courses/seminars/workshops and individual coaching on the topic of implementation/embedded and real-time software development.

Training & coaching on the other topics in our portfolio can be found here. here.


Implementation – Expertise

Valuable expertise in the field of implementation/embedded and real-time software development is available. here Available for you to download free of charge.

To the specialist information

You can find expertise on other topics in our portfolio here. here.

MicroConsult Newsletter

With the MicroConsult newsletter, you'll stay on the pulse of the embedded world. Look forward to proven practical knowledge, real professional tips, and current events – directly from our experts for your project success.

Subscribe now!

Published by

weissblau media

weissblau media