Black-box tests using white-box metrics do not result in gray-box tests. However, each of the three fulfills an important function in defining the end of the test and minimizing the number of necessary software tests.
The test end point is always a compromise between quality, cost, and time. High quality increases both cost and time. Lowering costs reduces quality and time. We want to define, through a plan, where within this triangle the test end point should be, prioritizing quality over time. To avoid leaving the test end point decision to chance, metrics are used as a basis for decision-making: by setting target values for selected metrics, we fix our position within this "magic triangle."
White-box metrics can be quickly determined using appropriate tools, and the desired target values for achieving test completion are easy to define. Therefore, a quality-, cost-, and time-optimized approach involves designing and executing tests using black-box methods, using these black-box tests to determine white-box metrics, and, depending on the situation, adding a few more white-box tests to achieve the desired test completion criterion.
At any time, additional experience-based tests can be added to further improve test coverage and to gain the subjective confidence needed for product acceptance.

Figure 1: Manual static tests (reviews) of almost all artifacts
Static tests vs. dynamic tests
Even though the method for determining test completion criteria presented here refers to dynamic tests, the importance of static tests for improving quality cannot be overstated. The significant impact of static tests stems from two main reasons:
- While static tests can be applied to almost all artifacts of a development project, dynamic tests can only be applied to software components (Figure 1).
- Experience shows that faulty, incomplete, ambiguous, and difficult-to-test requirements account for more than 50 % of the causes of failure, and the design phase for up to 30 % (Figure 2).
Practical tip:
If you want or need to invest in quality-enhancing measures, start with reviews.
Practical tip:
To reduce the high costs of reviews, tool-supported static testing procedures should be used before the review. This increases the initial quality of the review item and reduces the review duration.

Figure 2: Distribution of error causes across the phases of the software lifecycle
Black-box testing vs. white-box testing
It's not always immediately obvious whether a test was developed using a black-box or white-box test design methodology. While the intended design goal may differ significantly, a designed test always consists of the same components and, after execution, compares the expected target result with the actual result achieved. Any deviation is then considered an error.
The design goal of a black-box test relates to the specification. The test therefore aims to compare the agreement between a functional or non-functional expected result, derived from the specification, and the actual result.
The design goal of a white-box test, on the other hand, is to fulfill a specific code metric: for example, executing a particular statement or line of code, passing through the if or else branch of a decision, or setting a specific part of a condition to a particular Boolean value. In the latter case, the other conditions are set to values that allow a change in the true/false value of the condition under investigation to be observed through a different output.
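The last design goal, observing the effect of a single condition while the others are held fixed, can be illustrated with a minimal sketch. The decision `a and (b or c)` and the helper `independence_pairs` are hypothetical and chosen only for illustration; they are not taken from a real project.

```python
from itertools import product


def decision(a: bool, b: bool, c: bool) -> bool:
    # Hypothetical decision under test, chosen only for illustration.
    return a and (b or c)


def independence_pairs(func, n_conditions, index):
    """Return pairs of input vectors that differ only in condition `index`
    and lead to different decision outcomes."""
    pairs = []
    for values in product([False, True], repeat=n_conditions):
        if values[index]:
            continue  # visit each pair once, starting from the False side
        flipped = list(values)
        flipped[index] = True
        flipped = tuple(flipped)
        if func(*values) != func(*flipped):
            pairs.append((values, flipped))
    return pairs


for i, name in enumerate("abc"):
    print(name, independence_pairs(decision, 3, i))
```

For condition `b`, for example, the only pair found holds `a` at true and `c` at false, because only then does toggling `b` change the output. A white-box test designer would pick one such pair per condition.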
Combination of black-box tests and white-box tests
Figure 3 illustrates another difference between black-box and white-box design methods: While test coverage initially increases rapidly with black-box methods, the curve shown flattens out considerably over time. Therefore, a large number of black-box tests must be designed to achieve the desired goal of high test coverage.
The white-box test design method behaves in exactly the opposite way: you have to design many white-box tests before coverage starts to increase, but then it rises very quickly.
It therefore makes sense to combine the two design methods. Ideally, start with black-box design methods and then switch to white-box design methods. This results in significantly faster test coverage and requires fewer tests to achieve the goal.
In this context, two questions immediately arise:
- How do I find the switching point?
- How is the target test coverage determined?
Figure 3: Combination of black-box and white-box tests
Switching between black-box and white-box tests
The good news first: The switching point does not need to be determined exactly.
If you switch over too early, you will still be on the steeply rising part of the black-box test curve and will replace some black-box test cases with more laborious white-box tests.
If you switch over a little too late, you will already be on the flattened part of the black-box test curve and will probably have created some black-box test cases that contribute little to achieving the desired test completion criterion.
The bad news is that the supposedly ideal switchover point can only be approximated through experience. If you only had to create a handful or two more tests using the white-box test design methodology to meet the test completion criteria, then the switchover point was well chosen.
Defining the test completion criterion
How exactly is the target test coverage determined?
Must some white-box metrics be met at 100 %? Not really: there isn't a single white-box metric that can reach 100 % under all circumstances.
Even the fundamental requirement of 100 % statement coverage is not always achievable. One example is generated code combined with careful checks of input parameter values, for instance to prevent division by zero. If the generated code contains the same check, only the first check can be triggered by a test; the error branch of the second check can never be reached.
Are you thinking of removing individual parameter checks to increase statement coverage? Please don't do that; it won't improve the quality of your code in any way. Instead, accept the resulting lower statement coverage.
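A minimal sketch of such a duplicate parameter check follows. The names `library_divide` and `safe_ratio` are hypothetical; `library_divide` stands in for generated or library code, and `safe_ratio` for hand-written code that calls it.

```python
def library_divide(numerator: float, divisor: float) -> float:
    # Stand-in for generated or library code with its own defensive check.
    if divisor == 0:
        # Never executed when called through safe_ratio(): the caller
        # has already rejected a zero divisor.
        raise ValueError("library_divide: divisor must not be zero")
    return numerator / divisor


def safe_ratio(numerator: float, divisor: float) -> float:
    # Hand-written caller that checks the same parameter (defensive programming).
    if divisor == 0:
        # Only this first check can be triggered by a test of safe_ratio().
        raise ValueError("safe_ratio: divisor must not be zero")
    return library_divide(numerator, divisor)


print(safe_ratio(6.0, 3.0))
```

No test that goes through `safe_ratio` can execute the `raise` statement in `library_divide`, so statement coverage for that part stays below 100 % even though both checks are reasonable on their own.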
Practical tip:
If your statement coverage for certain parts of the code cannot reach 100 % because you have combined code generation or libraries (e.g., C++ class libraries) and defensive programming, then review the affected parts of the code and thereby support your claim that the failure to reach 100 % statement coverage is due to duplicate parameter checks.
The same applies analogously to decision coverage, condition coverage, multiple condition coverage, and modified condition/decision coverage (MC/DC). Especially with multiple condition coverage, the data and control flow may not allow every combination of condition values to be set, so 100 % is not achievable. It is therefore better to start with a target of 90 % or 95 % and fine-tune it as the project progresses.
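How such target values might be checked can be sketched as follows. The metric names, the target and measured values, and the helper `unmet_targets` are all assumptions made for illustration; in practice the measured values would come from a coverage tool. The `reviewed` parameter models metrics whose shortfall has been justified by a code review.

```python
def unmet_targets(measured, targets, reviewed=frozenset()):
    """Return the metrics that are below target and not justified by a review,
    mapped to (measured value, target value)."""
    return {
        name: (measured.get(name, 0.0), target)
        for name, target in targets.items()
        if measured.get(name, 0.0) < target and name not in reviewed
    }


# Hypothetical target and measured values (fractions of 1.0).
targets = {"statement": 0.95, "decision": 0.90, "mcdc": 0.90}
measured = {"statement": 0.97, "decision": 0.88, "mcdc": 0.91}

print(unmet_targets(measured, targets))
print(unmet_targets(measured, targets, reviewed={"decision"}))
```

With these example values only decision coverage misses its target; once it is listed as reviewed, the test completion criterion counts as met.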
Practical tip:
If you cannot reach the current target value for a test completion metric, review the affected code sections to ensure quality even without reaching the target value.
Practical tip:
Where you would need a large number of black-box test cases, you can shorten the process using gray-box tests. While this does introduce a dependency of your tests on the chosen coding, it is still significantly better than not creating any tests at all due to lack of time or excessive effort.
Gray-box tests to reduce the number of tests required
Please consider the simple example of an `isLeapYear()` function. An integer is passed as a parameter, and the function returns a Boolean value depending on whether the year passed in the parameter is a leap year or not.
When determining equivalence classes and boundary values for black-box tests, it quickly becomes clear that a separate test is needed for 75 % of all possible years. Assuming a value range from 0 to 9999, this amounts to a staggering 7500 tests. And this doesn't even take into account the transition from the Julian to the Gregorian calendar in 1582. To illustrate: let's briefly disregard the 100-year and 400-year leap year rules for simplicity; then every fourth year is a leap year. Each leap year forms an equivalence class of its own, and the three intervening years form another equivalence class whose lower and upper values have to be tested according to boundary value analysis. With 2500 leap years, this results in a total of 5000 tests for non-leap years and 2500 for leap years. The black-box method is simply not suitable here.
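The count can be reproduced with a short sketch under the same simplification (every fourth year is a leap year, value range 0 to 9999). The helper name `count_boundary_tests` is an assumption for illustration.

```python
from itertools import groupby


def count_boundary_tests(first_year, last_year):
    """Count tests under the simplified rule 'every fourth year is a leap year'.

    Each leap year is tested once; each run of non-leap years is tested at
    its lower and upper boundary.
    """
    leap_tests = 0
    non_leap_tests = 0
    years = range(first_year, last_year + 1)
    for is_leap, run in groupby(years, key=lambda year: year % 4 == 0):
        length = len(list(run))
        if is_leap:
            leap_tests += length
        else:
            non_leap_tests += min(length, 2)
    return leap_tests, non_leap_tests


leap_tests, non_leap_tests = count_boundary_tests(0, 9999)
print(leap_tests, non_leap_tests, leap_tests + non_leap_tests)
```

For the range 0 to 9999 this yields 2500 leap-year tests and 5000 non-leap-year tests, 7500 in total.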
However, if you look at the code and find that the developer has relied on the modulo operation, which is built into many programming languages, you can be confident that it works correctly: it has already been tested by the compiler vendor. This reduces the necessary test cases to just a handful or two.
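A minimal sketch of what such a reduced test set could look like, assuming a modulo-based implementation. The Python function `is_leap_year` is a hypothetical rendering of `isLeapYear()`; it applies the Gregorian rules to the whole assumed range 0 to 9999 and ignores the calendar transition of 1582.

```python
def is_leap_year(year: int) -> bool:
    # Hypothetical modulo-based implementation of isLeapYear().
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# One representative per branch of the rule, plus the limits of the
# assumed value range.
GRAY_BOX_CASES = [
    (2023, False),  # not divisible by 4
    (2024, True),   # divisible by 4, not by 100
    (1900, False),  # divisible by 100, not by 400
    (2000, True),   # divisible by 400
    (0, True),      # lower limit of the assumed range
    (9999, False),  # upper limit of the assumed range
]

for year, expected in GRAY_BOX_CASES:
    assert is_leap_year(year) == expected, year
print(len(GRAY_BOX_CASES), "gray-box tests passed")
```

The selection relies on knowing that the implementation uses the modulo operation, which is exactly what makes these gray-box tests rather than black-box tests.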
If, in a later refactoring, the modulo function were replaced by a table, for example, errors in the table would likely go undetected. That's the price you pay for reducing testing effort.
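The following sketch illustrates this risk with a hypothetical table-based refactoring that contains one deliberately injected error. The table is built with a comprehension only to keep the example short; in a real refactoring it would more likely be a literal or generated table. All names are assumptions for illustration.

```python
FIRST_YEAR, LAST_YEAR = 0, 9999

# Hypothetical lookup table replacing the modulo-based rule.
LEAP_YEAR_TABLE = {
    year
    for year in range(FIRST_YEAR, LAST_YEAR + 1)
    if year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
}
LEAP_YEAR_TABLE.discard(2028)  # deliberately injected table error


def is_leap_year(year: int) -> bool:
    return year in LEAP_YEAR_TABLE


# The same kind of reduced gray-box test set as before the refactoring.
GRAY_BOX_CASES = [(2023, False), (2024, True), (1900, False), (2000, True)]

all_pass = all(is_leap_year(year) == expected for year, expected in GRAY_BOX_CASES)
print("gray-box tests pass:", all_pass)
print("is_leap_year(2028):", is_leap_year(2028))
```

All gray-box tests still pass, although 2028 is now wrongly reported as a non-leap year.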
Practical tip:
Where you use gray-box tests instead of black-box tests to significantly reduce the number of tests required, add further highly coding-dependent tests or additional assertions within the tests. If the coding changes in a later refactoring, these tests fail and thereby make the coding dependency visible.
Experience-based tests
Finally, experience-based tests should definitely be added to increase the test depth beyond the black-box and white-box test design procedures in error-prone areas.

