How do you safely gather the herd back together?

Architectural analysis for software variants

Authors: Professor Dr. Rainer Koschke, University of Bremen & Thomas Eisenbarth, Axivion GmbH

Contribution – Embedded Software Engineering Congress 2016

When managing highly variant-intensive software products, development in a so-called software product line (SPL) is an obvious choice. A product line is a family of programs that fulfill a large number of identical requirements, but in which each individual program does not do exactly the same thing as the others, either because it has slightly different functional requirements or because it has slightly different non-functional properties (e.g., it implements different algorithms with varying resource requirements). Some authors therefore also refer to SPLs as software product families.

In practice, when a new product is developed, an approach is often favored in which an existing product from the software product line is copied in its entirety and then modified and adapted to solve the specific problem. This approach is called "clone-and-own".

The advantage of the clone-and-own approach in practice is greater development efficiency: it allows a faster start, offers more independence, and is quite easy to put into practice.

However, the disadvantages of the clone-and-own approach are considerable. It creates additional overhead for modifications. Because the code becomes highly redundant through copying, consistent changes require similar modifications to be made in many places throughout the code. Inconsistent changes can lead to errors and integration problems. Logically identical code may need to be tested multiple times.

The clone-and-own approach represents short-sighted thinking. It is a flawed form of reuse, often resulting from a lack of knowledge and resources for better SPL practices. The clone-and-own approach is also frequently indicative of poor management of software development projects. There is no infrastructure for reuse, nor are there defined processes and roles for it. Furthermore, the degree of reuse and its impact on costs, development speed, and correctness are not measured and are therefore neglected.

The clone-and-own approach leads to a large number of very similar implementations, all of which must be maintained in parallel. The resulting disadvantages must be compensated for in order to stay in control. In this situation, clone analysis is a useful tool for identifying identical parts, points of variation, and optional components. Clone analysis automatically searches for identical code segments. It can be used to relate copied files to each other and to quantify their differences and similarities.

The analysis relies on the following auxiliary functions:

tokens(f) returns the tokens of a file f. A token is the smallest semantic unit of a programming language, such as a keyword, an operator, or an identifier.
hash calculates a hash value for a file from its tokens, e.g., using MD5.
clone(f₁, f₂) yields the set of cloned code fragments between two files.
parameters(f) yields the set of parameters of a file or code fragment. Parameters are tokens that can be substituted one-to-one; identifiers or literals, for example, are often exchanged in cloned code.

With these, the following functions can be defined for each pair of files f₁ and f₂:

path-identical(f₁, f₂): two files have the same relative path and filename in the source code directory

t-identical(f₁, f₂) ⇔ hash(tokens(f₁)) = hash(tokens(f₂))

tsim(f₁, f₂) = |{t | t ∈ tokens(f₁) ∧ t ∈ clone(f₁, f₂)}| / |tokens(f₁)|

psim(f₁, f₂) = |parameters(f₁) ∩ parameters(f₂)| / |parameters(f₁) ∪ parameters(f₂)|

similar(f₁, f₂) ⇔ (tsim(f₁, f₂) ≥ 0.7 ∨ tsim(f₂, f₁) ≥ 0.7) ∧ psim(f₁, f₂) ≥ 0.75

The threshold values chosen in the function similar were determined by us in an empirical study.
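The definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the study: a file is represented as a list of token strings, the clone detector is assumed to deliver the set of token positions of f₁ that are covered by clones shared with f₂ (the name cloned_positions is hypothetical), and psim of two empty parameter sets is assumed to be 1.

```python
import hashlib


def t_identical(tokens1, tokens2):
    """True if both token sequences hash to the same MD5 value."""
    def digest(tokens):
        # Join with a separator that cannot occur inside a token.
        return hashlib.md5("\x00".join(tokens).encode("utf-8")).hexdigest()
    return digest(tokens1) == digest(tokens2)


def tsim(tokens1, cloned_positions):
    """Share of the tokens of f1 that lie inside clones shared with f2.

    cloned_positions is an assumed clone detector output: the set of
    indices into tokens1 that are covered by a clone fragment.
    """
    if not tokens1:
        return 0.0
    return len(cloned_positions) / len(tokens1)


def psim(params1, params2):
    """Jaccard similarity of the two parameter sets."""
    union = params1 | params2
    if not union:
        return 1.0  # assumption: two files without parameters agree fully
    return len(params1 & params2) / len(union)


def similar(tsim12, tsim21, psim12):
    """Threshold test with the values 0.7 and 0.75 given in the text."""
    return (tsim12 >= 0.7 or tsim21 >= 0.7) and psim12 >= 0.75
```

Note that tsim is not symmetric, because it is normalized by the size of the first file. This is why similar checks it in both directions.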

These functions allow the code in the product line's family tree to be divided file by file into the following categories (see also Fig. 1, PDF):

different
moved and varied
rewritten
varied
moved
identical
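How the three predicates combine into these six categories is shown in Fig. 1 of the PDF and is not spelled out in the text. The following Python sketch therefore rests on an assumed mapping, which is one plausible reading of the category names and should be checked against the figure before use.

```python
def classify(path_identical, t_identical, similar):
    """Assign a file pair to one of the six categories.

    The mapping is an assumption derived from the category names:
    token-identical files are 'identical' or 'moved', similar files are
    'varied' or 'moved and varied', and everything else is 'rewritten'
    (same path) or 'different' (other path).
    """
    if t_identical:
        return "identical" if path_identical else "moved"
    if similar:
        return "varied" if path_identical else "moved and varied"
    return "rewritten" if path_identical else "different"
```

For example, classify(True, False, True) yields "varied": the file kept its path and name, its tokens changed, but it still passes the similarity thresholds.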

Depending on how the cloned products have developed, the variants may have ended up very similar or very dissimilar. This makes a gradual transformation into a product line more or less difficult. Clone analysis can also support the analysis of this aspect.

The second step after identifying the similarities and differences is consolidating the software variants. The problem to be solved here is eliminating redundancy in the software variants. This requires a catalog of approaches for consolidating the variants, for example, through one or more of the following forms of refactoring:

Parameterization
Use of templates/genericity
Use of design patterns
Use of code generation
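As a small illustration of the first item, parameterization, the following Python sketch consolidates two cloned functions that differ only in a literal. All names and values (the temperature conversion, SensorConfig, VARIANT_A, VARIANT_B) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


# Before: two clone-and-own variants that differ only in one literal.
def read_temperature_variant_a(raw):
    return (raw - 500) * 0.1


def read_temperature_variant_b(raw):
    return (raw - 400) * 0.1


# After: the point of variation becomes an explicit parameter.
@dataclass(frozen=True)
class SensorConfig:
    offset: int
    scale: float


VARIANT_A = SensorConfig(offset=500, scale=0.1)
VARIANT_B = SensorConfig(offset=400, scale=0.1)


def read_temperature(raw, config):
    """Single shared implementation; each product supplies its config."""
    return (raw - config.offset) * config.scale
```

The shared logic now exists once and has to be tested once, while each product only contributes its configuration.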

The basic prerequisite for all these consolidation measures, as well as for ensuring consistent change if the clones are not to be eliminated, is that commonalities and variabilities between variants are known at all abstraction levels of the software:

Source code level: this can be achieved with textual comparison tools such as diff, or with dedicated clone detection tools.
Architecture: Using architectural reconstruction techniques [1, 2], the architecture can be recovered through reverse engineering of the code, and the different architectures of the individual product variants can be compared. This provides a higher level of abstraction than the code level.
Functionality: The implemented features will also vary between variants. There may be identical and completely different features, as well as similar ones. This applies to both functional and non-functional requirements (such as resource consumption, timing, robustness, etc.).

At the source code level, similar functions of two variants can be identified using the following procedure:

Identification of function pairs using clone detection
Measurement of similarity (Levenshtein distance) either based on a textual, lexical or syntactic representation of the program
Sorting by similarity and validation of the similarity; the latter is necessary because an automated algorithm determines similarity only on the basis of syntactic features, but has no access to the semantics of the programs.
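The measurement and sorting steps can be sketched as follows in Python. The sketch assumes that candidate function pairs have already been found by a clone detector and that each function is available as a sequence (characters for a textual comparison, tokens for a lexical one). Normalizing the Levenshtein distance by the length of the longer sequence is an assumption made here so that pairs of different sizes become comparable; the text does not prescribe a normalization.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]; 1.0 means the sequences are equal."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def rank_pairs(pairs):
    """Sort candidate pairs by descending similarity.

    pairs: iterable of (name1, sequence1, name2, sequence2).
    Returns a list of (similarity, name1, name2).
    """
    scored = [(similarity(s1, s2), n1, n2) for n1, s1, n2, s2 in pairs]
    return sorted(scored, reverse=True)
```

The ranked list is only a starting point for the manual validation described in the last step, since the score reflects syntactic closeness and not semantic equivalence.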

At the architectural level, the architecture of a variant can be determined using the hypothesis-driven approach to architectural recovery, see [2]:

Formulate a hypothesis for an architectural model
Extract the implementation model from the code
Map the two models onto each other.
Calculate the reflection model that automatically reveals similarities and differences between the architectures.
Refine/correct the hypothesis, identify candidates for refactoring
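The calculation of the reflection model can be sketched in Python as follows. This is a deliberately flat sketch, not the hierarchical method of [1]: the hypothesis is a set of allowed dependencies between components, the implementation model is a set of dependencies between code entities, and the mapping assigns each entity to a component. The result names (convergences, divergences, absences) follow the usual terminology of reflection models; the data structures are assumptions.

```python
def reflection_model(hypothesized, extracted, mapping):
    """Compare a hypothesized architecture with the implementation.

    hypothesized: set of (component, component) dependencies
    extracted:    set of (entity, entity) dependencies from the code
    mapping:      dict from entity to component
    """
    # Lift code-level dependencies to the component level.
    lifted = set()
    for src, dst in extracted:
        if src in mapping and dst in mapping:
            a, b = mapping[src], mapping[dst]
            if a != b:  # dependencies inside one component are ignored
                lifted.add((a, b))
    return {
        "convergences": hypothesized & lifted,  # expected and present
        "divergences": lifted - hypothesized,   # present but not expected
        "absences": hypothesized - lifted,      # expected but not present
    }
```

Divergences and absences are the inputs for the last step: each one either corrects the hypothesis or marks a candidate for refactoring.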

For a product line architecture, this approach must be extended so that a higher-level product line architecture can be determined from the individual product architectures. To this end, a hypothesis for the product line architecture is formulated and then compared with the various products; see Fig. 2 (PDF).

Essentially, in this process the components and, where applicable, their dependencies in the product line architecture are annotated with attributes via UML stereotypes. One attribute describes components and dependencies that are identical in all products, a second describes those that do not appear in all of them, and a third describes those that occur in all product architectures, but in different forms.
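The distinction between the three kinds of attributes can be sketched in Python. The sketch assumes that each product architecture is given as a dictionary from component name to some fingerprint of its form (for example a hash of its interface); both this representation and the descriptive labels returned are hypothetical and do not reproduce the stereotype names used in the original figure.

```python
def classify_components(products):
    """Classify components across all product architectures.

    products: dict from product name to a dict that maps each component
              of that product to a fingerprint of its form.
    """
    all_components = set().union(*(p.keys() for p in products.values()))
    result = {}
    for component in sorted(all_components):
        forms = [p[component] for p in products.values() if component in p]
        if len(forms) < len(products):
            result[component] = "not in all products"
        elif len(set(forms)) == 1:
            result[component] = "identical in all products"
        else:
            result[component] = "in all products, in different forms"
    return result
```

The same classification can be applied to dependencies by using pairs of component names as keys.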

In practice, we have used this approach to analyze a number of systems within the framework of industrial case studies. The following findings emerged:

Our method was able to reliably identify identical and similar function pairs between variants.
We found high similarity, but also variability, especially among the functions of the units identified as kernel modules.
There were also similarities between product-specific code functions. In this case, code was simply copied and adapted between individual products.
There was a high degree of similarity at the architectural level, which made it possible to obtain an overall view of the product family.

Conclusion

Especially in the field of embedded systems, which are often characterized by a variety of underlying hardware, extensive reuse via copy and paste is common. Repeatedly practicing this leads to enormous code redundancy, making changes more difficult. In such cases, the redundancy can be eliminated through appropriate refactoring or other measures such as code generation. If this is not possible or desired, the negative consequences of the redundancy must at least be mitigated through redundancy analysis. Our approach allows us to identify redundancies, and thus similarities and differences, at both the code and architecture levels. The knowledge gained can then be used to implement appropriate measures, such as consolidation or the planned, consistent maintenance of multiple similar code segments. This reduces development costs and increases predictability and error prevention.

Sources

[1] Koschke, Rainer; Simon, Daniel: Hierarchical Reflection Models. In: Working Conference on Reverse Engineering, IEEE Computer Society Press, November 2003, pp. 36–45

[2] Koschke, R.; Frenzel, P.; Breu, A.; Angstmann, K.: Extending the Reflection Method for Consolidating Software Variants into Product Lines. In: Software Quality Journal 17 (4), December 2009, pp. 331–366

