Consistent metrics for determining testing activities

Always know how far the development has really progressed.

Author: Ingo Nickles, Vector Software

Contribution – Embedded Software Engineering Congress 2016

In modern software development, new functionality must be implemented at ever shorter intervals. Source code is becoming more extensive and complex, yet software quality is expected to stay the same or even improve. The biggest challenge is undoubtedly to predict release cycles reliably and then adhere to them. Effort is often misjudged and resources poorly allocated because project managers lack a clear understanding of which code changes will require which resources.

Of course, many things contribute to software quality in the software development process. Besides effective requirements management, these include, for example, a suitable software architecture and good design. Nevertheless, software quality is ultimately measured and demonstrated in the testing process, and therefore it makes perfect sense to examine both the testing process itself and the resulting metrics more closely.

Continuous testing, together with the ongoing integration of code changes and adapted test cases into configuration management, can prevent many delays. Change-based test execution allows every developer to automatically run all types of test cases affected by their code changes.

Consistent and traceable metrics can provide important insights into the current development status, progress, and testing status. This data gives a precise overview of the current project status and software quality, which in turn allows a potential release date to be predicted more accurately. It also makes it possible to identify potential bottlenecks at any time and to plan resources better.

Introduction

Looking at the evolution of software (SW) versus hardware (HW) components in products, a clear trend towards more SW is evident. SW is increasingly becoming a "differentiator"—the element that distinguishes the products of two manufacturers and thus provides a competitive advantage. As more functionality is integrated into the SW, it naturally becomes more complex—but also more important for the product manufacturer. High product quality increasingly depends on error-free software. While implementing appropriate development guidelines can certainly reduce the introduction of errors into the product, a comprehensive testing process remains essential. Let's therefore examine the testing process within a development process in more detail.

Software testing

The most widespread development process is probably the V-model (see Fig. 1, PDF), which is divided into three phases:

Design phase
Coding phase
Test phase

Although the V-model places the testing phase at the end of the development cycle, it was soon recognized that it makes perfect sense to start testing much earlier, because the sooner a fault is found, the cheaper it is to eliminate.

In the classic V-model, this fact is addressed by thoroughly testing the smallest, most isolable unit of the software as soon as it is available: a unit or a single source file. After the individual file, or the functions implemented within it, has passed all tests, several files are combined and their interaction is checked in the integration test. Finally, the entire software is tested in the system test. Let's now take a closer look at the "right side" of the V-model (see PDF).

Automation plays a central role in testing. Manually repeating the same test procedures over and over is an unreasonable burden for people, so test automation not only improves employee satisfaction but also ensures that tests are actually executed. When designing the test environment, everything that can be automated should therefore be automated. Only then can meaningful regression testing be implemented.

Regression test

At the end of a V-model development process, you have well-tested, high-quality software. If no further software changes are possible once the product has shipped (for example, because you have sent it to Mars), regression testing is not particularly relevant. However, even when the finished product cannot be changed, the need for regression testing can arise from the development model itself: agile development methods, for example, require existing test cases to be re-executed continuously. Other reasons for software changes, and thus for repeating all tests, include:

Bug fixes
New features (CI)
Changed hardware
Changed requirements
Redesign

In fact, investigations, e.g. by the US Food and Drug Administration (FDA), have shown that a large proportion (79%) of the software errors that led to product recalls were introduced into the finished product by subsequent code changes^[1]. The impact of these code changes on software quality could have been detected by an appropriate regression test.

Test environments

If you count the test environments that a project creates when following a recommended development process, the total adds up quickly:

One unit test environment per source file
Multiple integration test environments, each with multiple source files
One or more test environments with all source files for system testing

Furthermore, there is usually more than one configuration in which all test environments should be run. For example, it is obviously sensible, and required by various standards, to run the tests on the eventual target system or a corresponding board. However, unit tests typically take place very early in the development process. A board for executing unit tests may itself be part of the product under development and therefore not yet available for running software tests.

Even when hardware is available, it can be advantageous to avoid using it. For example, the test application is typically flashed onto the hardware, and the number of flash cycles the hardware supports is limited. To prevent excessive wear, you can switch to simulators or emulators. This also speeds up test execution, which shortens the waiting time while testers are creating test cases.

If no simulators are available, setting up a test environment on the host can also be beneficial. Naturally, this requires more effort up front: cross-compiler-specific keywords must be "defined away" so that the code can be compiled with a host compiler, and hardware accesses must be "stubbed away" or redirected by including different header files. Nevertheless, this initial effort pays off later in several ways: faster test case creation, conservation of hardware resources, and, not least, the ability to verify the software in a different test environment (different compiler, different operating system). For example, Windows will throw a "Segmentation Violation Error" when an invalid pointer is accessed, while in the worst case the same test case runs without any visible error on your RTOS.

Finally, software is often delivered to customers in more than one configuration. For example, different versions of your product may run on different boards, functionality may be enabled or disabled via compile-time defines, or core parts of the software may run in completely different products.

What does this mean for the number of test environments? (see Fig. 2, PDF)

Number of test environments =
1 host configuration
+ 1 simulator configuration
+ (number of boards × number of compile-define combinations)

The "compile-define combinations" in which your product is ultimately delivered are ideally defined, so the number is known. If this is not the case, for example, because customers can assemble their product modularly, it can be useful to run all test cases again in the required configuration after the customer requirements are known.

The formula above makes no claim to completeness; in your specific environment, a factor or term may need to be added or omitted. It merely illustrates that executing test cases in only one test environment configuration is generally insufficient. In principle, all test types (unit, integration, and system tests) should be performed in all possible configurations. Would you want to board an airplane whose software has only been tested on the host? Or entrust your vehicle's brake assist to software that has never been tested in this compile-define combination on this board?

It is often claimed that testing all combinations is simply impossible. However, test automation and suitable optimization algorithms keep the resulting number of test cases manageable, and if necessary, the number of variants offered to customers can be reduced. Flexibility must not come at the expense of software quality.

Change-based testing

The number of test cases described above naturally leads to a dilemma. How can ever shorter product release cycles be achieved, and how can a software developer deliver "clean" code if executing all test cases takes two months? The current state of the art is a pragmatic approach. When code changes are made, all unit tests of the modified file are run in one configuration before the change is checked into the codebase. Integration tests are run sporadically, e.g., once a week, and system tests perhaps only every two months. This means that errors discovered during system testing may originate from a code change checked in up to two months earlier. Besides the extra effort of first locating the source of the error, the developer then needs more time to correct it than if they had been notified of the problem on the day of the code change.

The goal must therefore be to verify every code change as quickly as possible, across all configurations and at all test levels (unit, integration, and system tests). To achieve this, the test environment can first be optimized by parallelizing test execution. Automatable tests can then be run in much shorter timeframes by providing a sufficient number of test servers. Manual testing should be reduced as much as possible. It is simply unacceptable to manually perform the same tests in 300 different configurations when automation is possible.
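How much parallelization helps depends on how the test environments are distributed across servers. The sketch below uses the longest-processing-time-first heuristic, a standard scheduling heuristic chosen here for illustration (the article does not prescribe one); the environment names and run times are invented.

```python
import heapq

def schedule(durations, servers):
    """Longest-processing-time-first: hand the longest remaining test
    environment to the currently least-loaded server."""
    load = [(0, server) for server in range(servers)]  # (busy minutes, server)
    heapq.heapify(load)
    plan = {server: [] for server in range(servers)}
    for name, minutes in sorted(durations.items(), key=lambda kv: kv[1], reverse=True):
        busy, server = heapq.heappop(load)
        plan[server].append(name)
        heapq.heappush(load, (busy + minutes, server))
    makespan = max(busy for busy, _ in load)  # wall-clock time of the whole run
    return makespan, plan

# Hypothetical run times in minutes.
durations = {"system": 240, "integration_1": 90, "integration_2": 60,
             "unit_a": 30, "unit_b": 30, "unit_c": 30}
print(sum(durations.values()))  # 480 minutes on a single server
makespan, plan = schedule(durations, servers=2)
print(makespan)                 # 240 minutes on two servers
```

In this example the single 240-minute system test environment bounds the wall-clock time, so adding a third server would not help; splitting long environments matters as much as adding servers.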

In addition to parallelizing test activities, test execution time can be drastically reduced through intelligent test case selection. First of all, a source code change obviously affects only the test environments that contain the modified file. Through intelligent selection (see Fig. 3, PDF), the number of tests can be reduced further by re-executing, within the affected test environments, only those test cases that are impacted by the source code change. Particularly in system testing, which typically tests all files together and is therefore affected by every code change, intelligent test case selection can save significant effort.
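The two selection stages can be sketched as follows, assuming that per-test-case coverage data (which functions each test case executed) has been recorded. The environment, file, function, and test names are hypothetical, and function names are assumed to be unique across the project.

```python
def select_tests(changed_file, changed_functions, environments, coverage):
    """environments: environment name -> set of source files it contains.
    coverage: (environment, test case) -> set of functions the test executed."""
    # Stage 1: only environments containing the modified file are affected.
    affected_envs = {env for env, files in environments.items() if changed_file in files}
    # Stage 2: within them, only test cases that executed a changed function.
    return sorted(
        (env, test)
        for (env, test), functions in coverage.items()
        if env in affected_envs and functions & changed_functions
    )

environments = {
    "unit_can": {"can.c"},
    "unit_lin": {"lin.c"},
    "integration_bus": {"can.c", "lin.c"},
    "system": {"can.c", "lin.c", "main.c"},
}
coverage = {
    ("unit_can", "test_send"): {"can_send"},
    ("unit_can", "test_init"): {"can_init"},
    ("unit_lin", "test_frame"): {"lin_frame"},
    ("integration_bus", "test_gateway"): {"can_send", "lin_frame"},
    ("system", "test_startup"): {"main", "can_init", "lin_init"},
    ("system", "test_message_flow"): {"main", "can_send", "lin_frame"},
}
selected = select_tests("can.c", {"can_send"}, environments, coverage)
print(selected)
# [('integration_bus', 'test_gateway'), ('system', 'test_message_flow'), ('unit_can', 'test_send')]
```

A change to `can_send` selects three of the six test cases; the system test `test_startup` is skipped even though its environment contains `can.c`.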

Configuration Management and Continuous Integration

Professional software development today uses centralized configuration management. A base version of the software is maintained centrally, for example, on a build server. Developers create local copies of this base version, make changes to the code locally, and then check these changes back into the base version. Centralized software management offers several advantages (see Fig. 4, PDF):

Developers make changes in the local copy of the code, not directly in the base code.
Change tracking: Who changed what and when?
Version management of the base software: Give me the software version as of April 1, 2016

Problems can arise when local copies are checked in at excessively long intervals. Testing of the code changes is then often delayed, because software developers typically lack the capacity to execute all relevant test cases. Furthermore, late merging with the base version increases the likelihood of merge conflicts or even incompatible code. Optimized software development should therefore…

… enable everyone involved in the development process to perform all (relevant) tests
… keep merge cycles of local copies with the base as short as possible

The goal must be to keep the core software continuously in a working state. Every code change typically requires an adjustment of the test case data, so this should be modified and checked in along with the code change.

Measure, evaluate, improve

A crucial aspect of process improvement is measuring the current state. Unfortunately, this measurement is all too often equated with monitoring the people involved and is therefore frequently rejected. Metrics are also often dismissed as "a manager's toy with no added value." But in reality, metrics can help improve processes or optimize workflows at all levels. The goal of providing metrics should therefore always be to make the figures transparently accessible to every employee and to clearly demonstrate their value.

The automotive industry is also attaching increasing importance to metrics. For example, the Software Testing working group of the Manufacturers' Initiative Software (HIS, consisting of the automotive manufacturers Audi, BMW Group, DaimlerChrysler, Porsche and Volkswagen) has published a set of recommended metrics (HIS metrics), including acceptable upper limits for each value^[4].

Let's first consider relevant measurement data in the software development and testing process:

Regarding the source code:

Number of files
Number of functions
Number of lines of code
Comment density
Compiler warnings
Compiler errors
Cyclomatic complexity
Number of function calls:
How often is a function called?
How many function calls are made within a function?
Number of function parameters
Language scope (number of operators and operands)
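Cyclomatic complexity, which appears again below, follows McCabe's definition M = E − N + 2P on a function's control-flow graph (E edges, N nodes, P connected components). The sketch below computes it for a hand-built graph; the node names are hypothetical, and real tools derive the graph from the parsed source code.

```python
def cyclomatic_complexity(nodes, edges, components=1):
    """McCabe: M = E - N + 2P for a control-flow graph with E edges,
    N nodes and P connected components (P = 1 for a single function)."""
    return len(edges) - len(nodes) + 2 * components

# Control-flow graph of a hypothetical function with one if/else and one while loop.
nodes = ["entry", "if", "then", "else", "join", "while", "body", "exit"]
edges = [
    ("entry", "if"), ("if", "then"), ("if", "else"),
    ("then", "join"), ("else", "join"), ("join", "while"),
    ("while", "body"), ("body", "while"), ("while", "exit"),
]
print(cyclomatic_complexity(nodes, edges))  # 9 - 8 + 2 = 3, i.e. 2 decisions + 1
```

For structured code this equals the number of binary decisions plus one, which is why a function without branches has a complexity of 1.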

Static code analysis:

Warnings
Errors

Regarding dynamic test cases:

At all levels (unit, integration, system testing)
Total number
Number performed
PASS/FAIL
Durations
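These test case metrics also answer how many tests remain and roughly how long they will take. The sketch below aggregates them per test level; the `CaseResult` type is hypothetical, and estimating the remaining time from the average measured duration is a simplifying assumption, not a method from the article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CaseResult:
    level: str                     # "unit", "integration" or "system"
    passed: Optional[bool] = None  # None means "not executed yet"
    minutes: float = 0.0           # measured duration of the last run

def summarize(results):
    """Per test level: total, executed, passed and a naive estimate of the
    time still needed (remaining tests x average measured duration)."""
    summary = {}
    for r in results:
        s = summary.setdefault(r.level, {"total": 0, "executed": 0, "passed": 0, "minutes": 0.0})
        s["total"] += 1
        if r.passed is not None:
            s["executed"] += 1
            s["passed"] += int(r.passed)
            s["minutes"] += r.minutes
    for s in summary.values():
        average = s["minutes"] / s["executed"] if s["executed"] else 0.0
        s["remaining_minutes"] = (s["total"] - s["executed"]) * average
    return summary

results = [
    CaseResult("unit", True, 0.5),
    CaseResult("unit", False, 1.5),
    CaseResult("unit"),
    CaseResult("system", True, 30.0),
    CaseResult("system"),
]
summary = summarize(results)
# unit: 2 of 3 executed, 1 passed, about 1.0 minute left
```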

Code Coverage:

Statement
Branch
Condition
MC/DC
Function
Function call
Basis path
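MC/DC is the most demanding of these coverage criteria: each condition must be shown to independently affect the decision outcome. The sketch below finds, by exhaustive enumeration, the pairs of test vectors that demonstrate this for each condition (the "unique-cause" form of MC/DC). The decision `a and (b or c)` is an invented example, and enumerating all 2^n vectors is only practical for a few conditions.

```python
from itertools import combinations, product

def mcdc_pairs(decision, num_conditions):
    """For each condition index, list pairs of test vectors that differ only
    in that condition and produce different decision outcomes."""
    vectors = list(product([False, True], repeat=num_conditions))
    pairs = {i: [] for i in range(num_conditions)}
    for a, b in combinations(vectors, 2):
        differing = [i for i in range(num_conditions) if a[i] != b[i]]
        if len(differing) == 1 and decision(*a) != decision(*b):
            pairs[differing[0]].append((a, b))
    return pairs

pairs = mcdc_pairs(lambda a, b, c: a and (b or c), 3)
print(pairs[1])  # [((True, False, False), (True, True, False))]
```

Condition `b` can only be shown to matter when `a` is true and `c` is false; a test set achieves MC/DC once it contains at least one such pair for every condition.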

Regarding requirements:

Requirements/test case coverage
Number of test cases per requirement
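Both requirements metrics can be derived from the traceability links between test cases and requirements. In this sketch, the requirement IDs and test names are hypothetical, and the links are assumed to be available as a simple mapping.

```python
def requirements_metrics(requirements, traces):
    """traces: test case name -> set of requirement IDs the test verifies."""
    tests_per_requirement = {req: 0 for req in requirements}
    for verified in traces.values():
        for req in verified:
            if req in tests_per_requirement:  # ignore links to unknown IDs
                tests_per_requirement[req] += 1
    covered = sum(1 for count in tests_per_requirement.values() if count > 0)
    coverage_percent = 100.0 * covered / len(requirements) if requirements else 0.0
    return coverage_percent, tests_per_requirement

requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
traces = {
    "test_init": {"REQ-1"},
    "test_send": {"REQ-1", "REQ-2"},
    "test_timeout": {"REQ-2"},
}
coverage_percent, per_requirement = requirements_metrics(requirements, traces)
untested = [req for req, count in per_requirement.items() if count == 0]
print(coverage_percent, untested)  # 50.0 ['REQ-3', 'REQ-4']
```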

Regarding code changes:

Number of changed/added/deleted lines of code
Trends
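Changed, added, and deleted lines can be counted with Python's standard `difflib` module. How a "replace" block is split into changed versus added or deleted lines is an assumption made for this sketch; version control and metrics tools may count differently. The C snippets are invented.

```python
import difflib

def line_change_counts(old_lines, new_lines):
    counts = {"changed": 0, "added": 0, "deleted": 0}
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Overlapping part counts as changed; any surplus as added or deleted.
            common = min(i2 - i1, j2 - j1)
            counts["changed"] += common
            counts["added"] += (j2 - j1) - common
            counts["deleted"] += (i2 - i1) - common
        elif tag == "insert":
            counts["added"] += j2 - j1
        elif tag == "delete":
            counts["deleted"] += i2 - i1
    return counts

old = ["int f(int x) {", "  return x + 1;", "}"]
new = ["int f(int x) {", "  if (x < 0) return 0;", "  return x + 2;", "}"]
print(line_change_counts(old, new))  # {'changed': 1, 'added': 1, 'deleted': 0}
```

Recording these counts per commit or per day yields the trend data mentioned above.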

Questions that these metrics are intended to answer include:

How good is the software quality?
Is my software "ready-to-release"?
How high is the risk of an error in the code?
How many tests have been carried out and how many more tests need to be carried out?
How long will this take?
Where is more testing needed?
Change Impact Analysis: Which test cases are affected by a code change?
How long will it take to execute these affected test cases?

Examples of representations of metrics

The art of presenting metrics lies in making the numbers easy to grasp without letting individual problems get lost in statistical noise. For example, an average cyclomatic complexity of 3.5 per function is a good value, yet it can hide a single outlier with a value of 158.
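The example numbers from this paragraph can be reproduced directly: 103 functions with complexity 2 and one function with complexity 158 average exactly 3.5. The sketch reports the average together with every function above a limit (10, the HIS value cited below); the function names are hypothetical.

```python
from statistics import mean

def complexity_report(complexities, limit=10):
    """Return the average plus all functions above `limit`, worst first."""
    average = mean(complexities.values())
    outliers = sorted(
        ((name, value) for name, value in complexities.items() if value > limit),
        key=lambda item: item[1],
        reverse=True,
    )
    return average, outliers

complexities = {f"func_{i}": 2 for i in range(103)}
complexities["parse_message"] = 158  # the hidden outlier
average, outliers = complexity_report(complexities)
print(average, outliers)  # 3.5 [('parse_message', 158)]
```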

To provide truly relevant information to everyone involved in the development process, metrics must be presented interactively. Users must be able to view the data that is relevant to them, drill down into specific areas of the software, or widen the view to the overall picture.

Figure 5 (see PDF) shows an example of how to display metrics with both averages and outliers.

Color adds a further dimension to the presentation of metrics. The added value becomes clear in Figure 6 (see PDF): the code coverage, i.e., the portion of the source code executed in dynamic software tests, is color-coded, and the size of each block indicates the code size (on the left) or the code complexity (on the right). In this example, a large red block sits in the upper left. It represents the file `lvm.c`, which is not only large (in the sense of "many statements") but also complex and poorly tested. This signals to testers, test managers, and project managers that a considerable amount of work still lies ahead of them.
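The data behind such a view can be prepared as a list of cells, each with an area (statements or complexity) and a coverage color. This is a sketch only: the file names and values are invented, and the 50% and 80% color thresholds are assumptions, not values from the article.

```python
def treemap_cells(files, area="statements"):
    """files: file name -> {"statements", "complexity", "covered"} (covered statements).
    Returns (file, area, coverage %, color), largest area first."""
    cells = []
    for name, m in files.items():
        coverage = 100.0 * m["covered"] / m["statements"] if m["statements"] else 100.0
        # Hypothetical thresholds; real tools may use a continuous color scale.
        color = "green" if coverage >= 80 else "yellow" if coverage >= 50 else "red"
        cells.append((name, m[area], round(coverage, 1), color))
    return sorted(cells, key=lambda cell: cell[1], reverse=True)

files = {  # invented example values
    "module_a.c": {"statements": 900, "complexity": 300, "covered": 180},
    "module_b.c": {"statements": 150, "complexity": 30, "covered": 140},
    "module_c.c": {"statements": 60, "complexity": 12, "covered": 40},
}
by_size = treemap_cells(files)                           # left-hand view
by_complexity = treemap_cells(files, area="complexity")  # right-hand view
print(by_size[0])  # ('module_a.c', 900, 20.0, 'red')
```

The first red cell in either list corresponds to the kind of large red block described above.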

The cyclomatic complexity of software is a significant factor that warrants closer examination. In the aerospace industry, for example, certification authorities typically set an upper limit of 10, and the HIS AK Software Testing committee likewise recommends an upper limit of 10^[4]. In fact, user reports^[3] show that the number of "deviations" in the code (a deviation being anything that led to a code change after a review, such as typos, bugs, missing comments, or violations of coding conventions) increases with cyclomatic complexity. Functions with a cyclomatic complexity above 20 should generally be avoided, as they are very error-prone and difficult to maintain.

Figure 7 (see PDF) shows an example of how the risk of an error in the code arises from the cyclomatic complexity of the implemented functions. It answers two questions: what proportion of the functions are highly complex and therefore error-prone, and how well are these functions tested?
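Both questions can be answered by grouping functions into complexity bands and averaging their coverage per band. The sketch uses the limits quoted above (10 and 20); treating 11 to 20 as a separate band, the choice of coverage type, and all function names and values are assumptions for illustration.

```python
def risk_profile(functions):
    """functions: name -> (cyclomatic complexity, coverage in percent).
    Bands use the limits quoted above: <= 10, 11 to 20, and > 20."""
    bands = {"<=10": [], "11-20": [], ">20": []}
    for complexity, coverage in functions.values():
        band = "<=10" if complexity <= 10 else "11-20" if complexity <= 20 else ">20"
        bands[band].append(coverage)
    total = len(functions)
    return {
        band: {
            "share_percent": 100.0 * len(values) / total if total else 0.0,
            "avg_coverage": sum(values) / len(values) if values else None,
        }
        for band, values in bands.items()
    }

functions = {  # invented example values
    "init": (2, 100.0), "send": (6, 90.0), "parse": (14, 60.0), "dispatch": (25, 20.0),
}
profile = risk_profile(functions)
print(profile[">20"])  # {'share_percent': 25.0, 'avg_coverage': 20.0}
```

A high share of functions in the upper bands combined with low coverage in those bands points to where more testing (or refactoring) is needed.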

As a final example of information that can be derived by combining existing metrics, consider the Change Impact Report shown in Figure 8 (see PDF). It results from evaluating code changes, code coverage, and test case data. Knowing which test cases are affected by code changes allows you to estimate the effort required to test a bug fix or a new release.
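A minimal version of such a report combines the three data sources named above. In this sketch, all names and durations are hypothetical; each affected test case is counted once even if it executes several changed functions, and the last measured duration is assumed to predict the next run.

```python
def change_impact_report(changed_functions, coverage, durations):
    """coverage: test name -> set of functions executed by that test.
    durations: test name -> last measured run time in minutes."""
    per_function = {
        func: sorted(t for t, funcs in coverage.items() if func in funcs)
        for func in sorted(changed_functions)
    }
    # Union of affected tests, so each test's run time is counted only once.
    affected = sorted({t for tests in per_function.values() for t in tests})
    minutes = sum(durations[t] for t in affected)
    return per_function, affected, minutes

coverage = {
    "test_init": {"can_init"},
    "test_send": {"can_send", "can_init"},
    "test_gateway": {"can_send", "lin_frame"},
    "test_lin": {"lin_frame"},
}
durations = {"test_init": 2, "test_send": 3, "test_gateway": 15, "test_lin": 4}
per_function, affected, minutes = change_impact_report({"can_init", "can_send"}, coverage, durations)
print(affected, minutes)  # ['test_gateway', 'test_init', 'test_send'] 20
```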

Conclusion

The increasing importance of software across all industries leads to growing software complexity. Metrics can help manage this complexity and reduce testing efforts through change-based testing. Many of the metrics I've discussed in this article may already be available to you, or can be obtained with minimal effort. In return, these metrics provide answers to several questions that help everyone involved in the process work more efficiently, minimize the number of software errors in the product, and keep the software high-quality and maintainable.

However, in all types of measurement and evaluation, it is crucial to ensure that creating visually appealing images does not become an end in itself. Furthermore, all involved parties should be included in the evaluation process to avoid any perception of control or employee evaluation.

Finally, common sense should not be disregarded amidst all the measuring and evaluating. While adhering to fixed upper limits for individual measurements, such as cyclomatic complexity, seems sensible, the process must always allow for exceptions to the rule.

When used correctly, metrics can ultimately help shorten product release cycles while improving product quality, which is in everyone's interest: managers, employees, and customers.

Abbreviations

Fig.: Figure
AK: Working group (Arbeitskreis)
CI: Continuous Integration
FDA: Food and Drug Administration
HIS: Manufacturers' Initiative Software (Herstellerinitiative Software)
HW: Hardware
MC/DC: Modified Condition/Decision Coverage
RTOS: Real Time Operating System
SW: Software
cf.: compare
e.g.: for example

References

[1] General Principles of Software Validation; Final Guidance for Industry and FDA Staff, FDA, 2002
[2] James Martin, An Information Systems Manifesto, Prentice-Hall, Inc., Englewood Cliffs, New Jersey
[3] softwaretesting.vectorcast.com/acton/formfd/10305/0018:d-009d
[4] HIS Source Code Metrics of the HIS AK Software Test

Download the article as a PDF


Published by

weissblau media
