Machine Learning – before the first step: Why? For whom?

An introduction for those interested

Authors: Andy Yap, Gregor Schock, Fabio Ferreira, AKKA Automotive; Jens Bruno Wittek, AKKA Digital

Contribution – Embedded Software Engineering Congress 2018

Machine learning can generate added value in numerous application areas. Newcomers often struggle with the terminology and are sometimes confused by marketing jargon, as terms from this field are currently ubiquitous and often used loosely. This article examines the fundamental principles of machine learning from various perspectives, explains them clearly, and conveys realistic expectations for data-driven analytics projects.

The term ARTIFICIAL INTELLIGENCE (AI) is used frequently and encompasses all methods that aim to represent intelligence in a technical system. A very general definition of intelligence is the ability to solve complex problems [1]. MACHINE LEARNING (ML) [2] is the umbrella term for those AI methods that automatically improve algorithms using existing data.

A key goal of machine learning (ML) is the construction of a generalizable system. This system is built, shaped, and improved based on relevant data. An important ML concept, based on certain properties of natural nervous systems, is the simulation of neural networks (NN). These mathematical simulations are called artificial neural networks (ANN).

Note: The German abbreviation for ANN, KNN (künstliche neuronale Netze), is also used in AI for a classification algorithm, k-nearest neighbors [3]. That algorithm is not what is meant in this article, but the overlap illustrates the diversity of terms in this field. Biological neural networks have only limited similarity to the ANNs described here; they merely serve as a template for structure and function. Some networks therefore behave similarly to their biological counterparts, but ANNs can also represent entirely different functionality.

An ANN (Figure 1, see PDF) consists of several neurons arranged in layers (input, hidden, and output layers). The term DEEP LEARNING (DL) [4] is used when the ANN contains several hidden intermediate layers. Neurons are classified into three types according to the layer they belong to: input, hidden, and output neurons. An artificial neuron in a layer of the ANN (Figure 2, see PDF) calculates a single output value o, which it passes on to all neurons connected to it. Each connection between two neurons is assigned a weight w, which attenuates or amplifies the signal.
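
The computation performed by a single neuron can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: it assumes the common textbook neuron model, in which the weighted sum of the inputs plus a bias b is passed through an activation function (here the logistic sigmoid). The function names, the bias, and all numbers are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic activation function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Output o of one artificial neuron: activation of the weighted input sum.

    Each weight scales its input signal: a small |w| attenuates it,
    a large |w| amplifies it.
    """
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(net)

def layer_output(inputs, weight_rows, biases):
    """Fully connected layer: every neuron receives every input of the layer."""
    return [neuron_output(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Hypothetical tiny network: 2 input neurons -> 3 hidden neurons -> 1 output neuron.
x = [0.5, -1.0]
hidden = layer_output(x, [[0.8, -0.2], [0.1, 0.4], [-0.5, 0.9]], [0.0, 0.1, -0.1])
output = layer_output(hidden, [[1.2, -0.7, 0.3]], [0.05])
print(hidden, output)
```

Stacking several hidden layers of this kind between input and output gives the deep networks described above.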

Learning consists of changing the weights in a targeted way until a termination criterion is reached (learning by error). During this process, an output is repeatedly calculated and improved by adjusting the weights. Termination criteria for this learning phase can be a maximum number of iterations, a sufficiently good solution (optimal output), or only a minimal change in the network output. After the learning phase, the weights of the ANN as a whole represent the learned knowledge. The depth of an ANN mentioned above refers to the number of hidden layers; with today's computing power, networks with several hundred layers can be computed.
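
The learning loop and the three termination criteria can be illustrated with the smallest possible case: a single sigmoid neuron trained by gradient descent to reproduce the logical OR function. This is an illustrative sketch under stated assumptions: the cross-entropy loss, the learning rate, the thresholds, and the function names are choices made here, not taken from the article, and real multi-layer ANNs compute their gradients with backpropagation.

```python
import math

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))

def train_neuron(samples, learning_rate=1.0, max_iterations=50_000,
                 target_loss=0.05, min_output_change=1e-9):
    """Train one sigmoid neuron by gradient descent ("learning by error").

    samples: list of (inputs, target) pairs with targets 0 or 1.
    Returns (weights, bias, loss, stop_reason).
    """
    n = len(samples)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    previous_outputs = None
    loss = float("inf")
    for _ in range(max_iterations):
        # Forward pass: net input and output of the neuron for every sample.
        nets = [sum(x * w for x, w in zip(xs, weights)) + bias for xs, _ in samples]
        outputs = [sigmoid(z) for z in nets]
        # Mean cross-entropy loss, written in a numerically stable form.
        loss = sum(math.log1p(math.exp(-z)) if t == 1 else math.log1p(math.exp(z))
                   for z, (_, t) in zip(nets, samples)) / n
        # Termination criteria: good solution, or minimal change in the output.
        if loss <= target_loss:
            return weights, bias, loss, "good solution"
        if previous_outputs is not None and max(
                abs(o - p) for o, p in zip(outputs, previous_outputs)) < min_output_change:
            return weights, bias, loss, "minimal change in output"
        previous_outputs = outputs
        # Error signal for sigmoid + cross-entropy: output minus target.
        errors = [o - t for o, (_, t) in zip(outputs, samples)]
        # Adjust weights and bias against the averaged gradient.
        for i in range(len(weights)):
            gradient = sum(e * xs[i] for e, (xs, _) in zip(errors, samples)) / n
            weights[i] -= learning_rate * gradient
        bias -= learning_rate * sum(errors) / n
    # Termination criterion: maximum number of iterations.
    return weights, bias, loss, "maximum iterations"

# Learn the logical OR function.
OR_SAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, loss, reason = train_neuron(OR_SAMPLES)
print(reason, round(loss, 4), [round(w, 2) for w in weights], round(bias, 2))
```

After training, the learned knowledge is contained entirely in `weights` and `bias`.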

Benefits and comparison with classical solution methods

ANNs are primarily applied where physical or mathematical models would be very complex, or where details matter that cannot be captured in a model. Key disadvantages of ANNs are the high computational effort, the large amount of data required, and the lack of a readable or understandable underlying structure.

A key advantage of ANNs is their universal applicability, as no knowledge of the system is required to create them. Furthermore, all relevant data can be processed directly, regardless of its quality (e.g., noise or resolution).

In principle, large datasets are a prerequisite for the meaningful application of ANNs. To use such datasets correctly, it helps to introduce a neighboring field: DATA MINING (DM), which shares many of its methods and goals with ML.

Process of a DM project

One way to carry out a DM project is the "Cross Industry Standard Process for Data Mining" (CRISP-DM) [5] (Figure 3, see PDF). It outlines the essential steps of a data-driven project; new insights are fed back iteratively to improve and repeat earlier steps.

Understanding the problem

A prerequisite for a project's success is a thorough understanding of the research question and the objectives. The analysis often comes with requirements such as the best possible results, good interpretability, or high speed. The content-related objectives must be translated into an optimizable metric that is as clearly defined and easily measurable as possible and that best represents the research question.
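
As a hypothetical illustration of such a translation: if missing a defective component is much more expensive than rejecting a good one, plain accuracy does not reflect the research question, while a cost-weighted error does. The cost values 10 and 1, the labels, and the function names below are assumptions for this sketch.

```python
def accuracy(y_true, y_pred):
    """Share of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def misclassification_cost(y_true, y_pred, cost_missed=10.0, cost_false_alarm=1.0):
    """Average cost per part; label 1 = defective, 0 = OK (costs are hypothetical)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += cost_missed        # defective part not detected
        elif t == 0 and p == 1:
            total += cost_false_alarm   # good part rejected
    return total / len(y_true)

y_true  = [1, 0, 0, 1, 0, 0, 0, 0]
model_a = [0, 0, 0, 1, 0, 0, 0, 0]   # misses one defect
model_b = [1, 1, 0, 1, 1, 0, 0, 0]   # raises two false alarms
for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, pred), misclassification_cost(y_true, pred))
```

Model A has the higher accuracy (0.875 vs. 0.75), but model B has the lower average cost (0.25 vs. 1.25), so the choice of metric decides which model counts as better.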

Understanding the data

The software involved in creating, processing, storing, and using the data must be analyzed. This can reveal signs that the data is not representative of the intended use case or that it contains errors. Descriptive statistics, outlier analysis, and exploratory plots are useful for gaining an initial understanding.
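
A first look at a numeric variable might combine descriptive statistics with a simple outlier rule. The sketch below uses Python's standard `statistics` module and Tukey's interquartile-range rule as one possible outlier criterion; the function names and sensor readings are hypothetical.

```python
import statistics

def describe(values):
    """Basic descriptive statistics of a numeric variable."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def iqr_outliers(values, factor=1.5):
    """Values outside [Q1 - factor*IQR, Q3 + factor*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical temperature readings (°C) containing one implausible value.
readings = [21.3, 21.8, 22.1, 21.9, 22.4, 21.7, 85.0, 22.0, 21.6, 22.2]
print(describe(readings))
print(iqr_outliers(readings))
```

The implausible reading is flagged, and the gap between mean and median shows how strongly a single such value distorts a descriptive measure.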

Data processing

It makes sense to split the data (Figure 4, see PDF) so that part of it can be used to test and validate the developed models (such as ANNs).
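
A common way to split the data is into training, validation, and test sets. The following sketch shuffles the records with a fixed seed and splits them; the 70/15/15 proportions, the seed, and the function name are assumptions.

```python
import random

def split_data(records, train=0.7, validation=0.15, seed=42):
    """Shuffle records reproducibly and split them into train/validation/test."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed -> reproducible split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],                   # used to fit the model
            shuffled[n_train:n_train + n_val],    # used to tune/compare models
            shuffled[n_train + n_val:])           # held back for the final test

train_set, val_set, test_set = split_data(range(100))
print(len(train_set), len(val_set), len(test_set))
```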

In many projects, data preprocessing takes up around 80% of the time, and significant improvements to AI systems can be achieved at this stage. Important steps include replacing missing or incorrect values (imputation) and combining variable categories. It is often necessary to transform measurements to a common scale to make different variables comparable. Depending on the method, it may also be beneficial to remove highly correlated variables or variables with insufficient explanatory power, or to reduce the dimensionality of the data in general. Units, encodings, and the format of time and date variables must always be taken into account.
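
Three of the preprocessing steps named above (imputation, transformation to a common scale, and combining rare categories) can be sketched as small functions. Median imputation and z-score standardization are only one choice among several; the function names, the threshold `min_count`, and the example variables are hypothetical.

```python
import statistics
from collections import Counter

def impute_median(values):
    """Replace missing values (None) by the median of the observed values."""
    fill = statistics.median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]

def standardize(values):
    """z-score scaling: result has mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def merge_rare_categories(categories, min_count=2, other="other"):
    """Combine categories that occur fewer than min_count times into one class."""
    counts = Counter(categories)
    return [c if counts[c] >= min_count else other for c in categories]

engine_speed = [1500, None, 2200, 1800, None, 3000]   # rpm, with gaps
fuel = ["diesel", "petrol", "diesel", "electric", "petrol", "hydrogen"]
filled = impute_median(engine_speed)
print(filled, standardize(filled), merge_rare_categories(fuel))
```

In a real project, parameters such as the median, mean, and standard deviation would usually be computed on the training split only and then reused for the validation and test data.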

Modeling

Depending on the characteristics of the data and the goal of the analysis, the most suitable methods are selected (Figure 5, see PDF). If there is a dependent variable that is to be modeled as accurately as possible from other variables, this is called supervised learning. Examples include classification problems (identifying customers, defective components, objects in images, etc.) with a discrete, usually binary, dependent variable, and regression problems (predicting visitor numbers, revenue, remaining product lifespan, etc.) with a continuous dependent variable.
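
For the regression case, the simplest supervised model is a straight line fitted by ordinary least squares. The data below (advertising budget versus visitor numbers) is invented for illustration, and the function name is hypothetical.

```python
import statistics

def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: weekly budget (k EUR) -> visitors (thousands).
budget = [1, 2, 3, 4, 5]
visitors = [3.1, 4.9, 7.2, 8.8, 11.0]
intercept, slope = fit_simple_linear_regression(budget, visitors)
print(f"visitors = {intercept:.2f} + {slope:.2f} * budget")
```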

Without a dependent variable (unsupervised learning), the objective determines the approach. Clustering groups observations, which can also be used to reduce their number; observations within the same cluster should be similar to each other (examples: customer segmentation, identifying similar texts and images). Association analysis identifies values of variables that are frequently observed together (examples: purchase patterns, recommendations for products, tourist attractions, or music).
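
Clustering can be illustrated with k-means, one widely used clustering algorithm (the article does not prescribe a specific one). This sketch implements Lloyd's iteration in plain Python; for simplicity it uses the first k points as initial centroids, and the two-dimensional customer data (e.g., scaled purchase frequency and basket value) is hypothetical.

```python
import math

def kmeans(points, k, iterations=100):
    """Lloyd's k-means; returns (centroids, labels).

    Simplification: the first k points serve as initial centroids
    (practical implementations use random or k-means++ initialization).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep centroid of an empty cluster
        if new_centroids == centroids:  # converged: assignments no longer change
            break
        centroids = new_centroids
    return centroids, labels

customers = [(1, 1), (8, 8), (1.5, 2), (9, 8.5), (2, 1.5), (8.5, 9)]
centroids, labels = kmeans(customers, k=2)
print(centroids, labels)
```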

Most ML applications are classification or regression problems, in which a dependent variable is present and the algorithm can learn from examples. Model initialization should follow well-founded criteria; in some cases, parameters can be estimated from the data. If the model merely memorizes the training data (rote learning), it may perform poorly on unknown data (overfitting).
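
The difference between rote learning and generalization can be shown by comparing two models on the same hypothetical data: a straight line, and a "memorizer" that stores every training example and returns the label of the nearest stored input. Both models, the data, and the noise values are assumptions for this sketch.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares straight line, returned as a prediction function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def fit_memorizer(xs, ys):
    """'Rote learning': store all examples, answer with the nearest stored label."""
    table = list(zip(xs, ys))
    return lambda x: min(table, key=lambda pair: abs(pair[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return statistics.fmean((model(x) - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: true relation y = 2x + 1 plus measurement noise.
noise = [0.5, -0.4, 0.3, -0.6, 0.4, -0.3, 0.6, -0.5]
train_x = list(range(8))
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]
test_x = [x + 0.5 for x in range(7)]      # inputs not seen during training
test_y = [2 * x + 1 for x in test_x]

line = fit_line(train_x, train_y)
memo = fit_memorizer(train_x, train_y)
for name, model in [("line", line), ("memorizer", memo)]:
    print(name, round(mse(model, train_x, train_y), 3), round(mse(model, test_x, test_y), 3))
```

The memorizer reproduces the training data perfectly (training error 0) but has a much larger error on the unseen test inputs than the simpler straight line.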

Evaluation of the models

The models are evaluated using statistical hypothesis tests. For supervised learning, there are established classification and regression metrics based on the differences between the observed and predicted values of the test data.
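
Some of the most common metrics can be computed directly from observed and predicted values: accuracy, precision, and recall for a binary classification, and mean absolute error (MAE) and root mean squared error (RMSE) for a regression. The selection of metrics, the function names, and the example values are assumptions for this sketch.

```python
import math

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision and recall for a binary classification."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of alarms that are real
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of real cases found
    }

def regression_metrics(y_true, y_pred):
    """Mean absolute error and root mean squared error."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    return {
        "mae": sum(abs(e) for e in errors) / len(errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
    }

# Hypothetical observed vs. predicted test data.
cls = classification_metrics([1, 0, 1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 1, 0, 0])
reg = regression_metrics([10, 12, 15], [11, 12, 13])
print(cls, reg)
```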

Deployment

Once the model has been trained, it can be fed new data and usually delivers a result in fractions of a second. A pilot phase in a limited area, with quantitative (monitoring) and qualitative (discussions with users) evaluation, is generally recommended. In the long term, it is sometimes advisable to port prototypes to faster, low-level programming languages. Model degradation (worsening results on new data) is possible if the underlying conditions change, so new data should be incorporated regularly and the model retrained.
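
Model degradation can be detected by monitoring the error of the deployed model once the true outcomes become known. The class below is a hypothetical helper, not part of any library: it compares a rolling mean absolute error with the error measured before deployment and signals when retraining seems advisable. The window size and tolerance factor are assumed values.

```python
from collections import deque

class DegradationMonitor:
    """Hypothetical helper: flags retraining when the rolling error of a
    deployed model exceeds the baseline error by a tolerance factor."""

    def __init__(self, baseline_error, window=50, tolerance=1.5):
        self.baseline_error = baseline_error
        self.tolerance = tolerance
        self.errors = deque(maxlen=window)   # keeps only the most recent errors

    def record(self, prediction, observed):
        """Store the absolute error of one prediction once its outcome is known."""
        self.errors.append(abs(prediction - observed))

    def needs_retraining(self):
        if len(self.errors) < self.errors.maxlen:
            return False                     # not enough evidence yet
        rolling = sum(self.errors) / len(self.errors)
        return rolling > self.tolerance * self.baseline_error

monitor = DegradationMonitor(baseline_error=1.0, window=5)
for _ in range(5):
    monitor.record(10.8, 10.0)               # behaves as during validation
print(monitor.needs_retraining())
for _ in range(5):
    monitor.record(12.0, 10.0)               # conditions have changed
print(monitor.needs_retraining())
```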

Example: Object detection on a smartphone

Google has demonstrated a latency-optimized, image-based object detection application on a smartphone [6]. Object detection (Figure 6, see PDF) usually comprises classifying the objects (e.g., person or car), localizing them, and highlighting each with a frame (bounding box) in the image stream.

The project used MobileNets [7], a family of ANNs designed specifically for detection, recognition, and classification tasks. They are particularly well suited to mobile devices because they aim to maximize model accuracy while keeping power consumption, latency, and memory usage low. A parameter allows these criteria to be balanced and the model's properties and size to be adapted to the available system.
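
The parameter mentioned above presumably refers to what [7] calls the width multiplier α, which thins the number of channels in every layer; the paper also introduces a resolution multiplier ρ for the input size. The sketch below evaluates the multiply-add cost formulas given in [7] for a standard convolution and for the depthwise separable convolution used by MobileNets. The function names and the example layer size (3 × 3 kernel, 512 input and output channels, 14 × 14 feature map) are chosen for illustration.

```python
def standard_conv_cost(d_k, m, n, d_f):
    """Multiply-adds of a standard D_K x D_K convolution with M input channels,
    N output channels and a D_F x D_F feature map."""
    return d_k * d_k * m * n * d_f * d_f

def separable_conv_cost(d_k, m, n, d_f, alpha=1.0, rho=1.0):
    """Multiply-adds of a depthwise separable convolution per the formula in [7]:
    depthwise D_K*D_K*M*D_F*D_F plus pointwise M*N*D_F*D_F, with width
    multiplier alpha (fewer channels) and resolution multiplier rho."""
    m_a, n_a = round(alpha * m), round(alpha * n)
    d_f_r = round(rho * d_f)
    depthwise = d_k * d_k * m_a * d_f_r * d_f_r
    pointwise = m_a * n_a * d_f_r * d_f_r
    return depthwise + pointwise

standard = standard_conv_cost(3, 512, 512, 14)
separable = separable_conv_cost(3, 512, 512, 14)
thinner = separable_conv_cost(3, 512, 512, 14, alpha=0.75)
print(standard, separable, thinner, separable / standard)
```

In this example, the depthwise separable layer needs only about 11% of the multiply-adds of the standard convolution, and α = 0.75 reduces the cost further, roughly by a factor of α².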

For the object detection task, a MobileNet pre-trained on a very large image dataset was used. This method, called transfer learning, significantly reduces the training time of ANNs. To reduce the latency of the MobileNet further, its weights and activations were quantized by switching from floating-point to 8-bit integer inference [8], which improves latency on ARM CPUs, for example.
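
The principle of 8-bit quantization can be sketched with the affine mapping r = S · (q − Z) described in [8], where a real value r is represented by an 8-bit integer q, a scale S, and a zero point Z. This sketch covers only the mapping between floating-point and integer values, not integer-only inference itself; the function names and the weight values are hypothetical.

```python
def choose_quantization_params(values, num_bits=8):
    """Scale S and zero point Z so that [min, max] maps onto 0..2^bits - 1."""
    qmin, qmax = 0, 2 ** num_bits - 1
    rmin = min(min(values), 0.0)   # range must contain 0 so that 0.0 is exact
    rmax = max(max(values), 0.0)
    scale = (rmax - rmin) / (qmax - qmin) or 1.0   # avoid division by zero
    zero_point = max(qmin, min(qmax, round(qmin - rmin / scale)))
    return scale, zero_point

def quantize(values, scale, zero_point, num_bits=8):
    """Real values -> unsigned integers, clamped to the representable range."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, round(v / scale) + zero_point)) for v in values]

def dequantize(qvalues, scale, zero_point):
    """Integers -> approximate real values: r = S * (q - Z)."""
    return [scale * (q - zero_point) for q in qvalues]

weights = [-0.52, 0.0, 0.31, 0.75, -0.1]
scale, zero_point = choose_quantization_params(weights)
q = quantize(weights, scale, zero_point)
print(scale, zero_point, q, dequantize(q, scale, zero_point))
```

Choosing the range so that it always contains 0 ensures that the real value 0.0 is represented exactly, as required in [8]; the rounding error for every other value in the range stays within S/2.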

ANNs can also be computed on hardware other than CPUs. For embedded applications, for example, Nvidia offers a GPU solution with its Drive PX series, and Xilinx an FPGA solution with its Zynq UltraScale+ MPSoC chips. The advantage of such hardware is that the numerous calculations can be processed in parallel, whereas a CPU processes them largely serially.

Outlook

As Google's example shows, deep learning with neural networks, a subfield of AI, is becoming increasingly affordable as computing power rises, and improved learning performance is opening up new application areas. Representative and, above all, large datasets are a prerequisite for these complex networks to generalize well. Using specialized, pre-trained networks drastically reduces the training time otherwise required.

Many topics are currently being addressed in product development, with the goal of bringing concepts from research and development to series-production readiness. This spans the entire spectrum of embedded systems development, from hardware selection to high-quality software implementation.

In the future, many more applications will undoubtedly be developed, and further improvements in the performance and content capabilities of AI systems will be achieved. The publications listed in the bibliography are a good starting point for further exploration of this topic.

Bibliography

[1]	M. Tegmark, "Life 3.0: Being Human in the Age of Artificial Intelligence," Ullstein Verlag, 2017.
[2]	C. Bishop, "Pattern Recognition and Machine Learning," Springer, 2006.
[3]	E. Fix and J. L. Hodges Jr., "Discriminatory analysis – nonparametric discrimination: consistency properties," University of California, Berkeley, 1951.
[4]	I. Goodfellow et al., "Deep Learning," MIT Press, 2016.
[5]	C. Shearer, "The CRISP-DM model: the new blueprint for data mining," Journal of Data Warehousing, vol. 5, no. 4, pp. 13–22, 2000.
[6]	J. Huang, "Accelerated Training and Inference with the Tensorflow Object Detection API," Google AI Blog, July 13, 2018 [accessed October 1, 2018]. URL: https://ai.googleblog.com/2018/07/accelerated-training-and-inference-with.html
[7]	A. G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H. Adam (Google Inc.), "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint arXiv:1704.04861v1, 2017. URL: https://arxiv.org/pdf/1704.04861.pdf
[8]	B. Jacob, S. Kligys, B. Chen, M. Zhu, M. Tang, A. Howard, H. Adam, D. Kalenichenko (Google Inc.), "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 2704–2713. URL: https://arxiv.org/pdf/1712.05877.pdf

Image credits

[a]	Wikimedia: User Chrislb, Perhelion; “Schematic representation of an artificial neuron with the index j.”, October 8, 2010, 7:52 PM [Accessed September 24, 2018]. URL: https://commons.wikimedia.org/wiki/File:NeuronModel_deutsch.svg
[b]	Photo by Juanedc (CC BY 2.0); Figure 6 from [7]

Published by

weissblau media
