An open-source framework for asymmetric multiprocessing
Author: Frank Storm, Avnet Silica
Contribution – Embedded Software Engineering Congress 2017
Multiple, often different, processor cores are standard in today's SoCs. However, how to operate these cores asymmetrically, i.e., with different operating systems, and how to let them communicate with each other is unfortunately not standardized. This leads many users to implement their own solutions, which are time-consuming to build and costly to maintain. This article describes OpenAMP, an open standard that aims to solve this problem.
Almost all large SoCs today come with more than one processor core. In the high-end segment, quad-core Arm processors are now standard, whether as the A53, the A57, or a big.LITTLE combination of the A15 and A7 (i.MX8 [1], Marvell Armada [2], Xilinx Zynq UltraScale+ [3]). In the mid-range segment, dual-core systems are more common (i.MX8, Xilinx Zynq 7000 [4]). Since typical application processors (Arm Cortex-A) are not ideal for every task, manufacturers also include other cores alongside them. These are either dedicated real-time processors, such as the Cortex-R5 in the Zynq UltraScale+, or typical microcontroller cores, such as the Cortex-M4 in the i.MX8. Even in the low-end segment, a Cortex-M4 is sometimes paired with a Cortex-M0+, as in the NXP LPC54000 controller family [5].
This combination of a general-purpose processor and a specialized processor has existed before, as seen in the OMAP family, which was widespread in mobile communications for a long time. In that case, the specialized processor was a DSP responsible for processing the GSM stack. Looking at the use of these additional processors, they are very often specifically deployed for real-time tasks, whether for processing a radio protocol stack or for drive control in industrial applications. The application processor then takes on the tasks of managing external interfaces, implementing the human-machine interface (display or web server), and also performing general system maintenance.
We've seen this split in many customer applications, even in systems with two application processors, such as the Zynq-7000, which has two Cortex-A9 cores. Here, it's almost standard practice to run Linux on the first core, handling interfaces like Ethernet and USB and providing a comfortable environment for full-fledged web servers. The second core then runs a lightweight, real-time operating system, such as FreeRTOS or µC-OS, or even no operating system at all (bare metal).
In practice, these heterogeneous systems present a number of challenges, and the tight integration of the processors can become a liability. Typical issues include partitioning memory, defining the shared memory that both cores use for data exchange, distributing system resources (e.g., who manages the CAN bus?), and interrupt handling (who interrupts whom, and is this done via a shared interrupt controller or dedicated mailbox hardware?). This also raises the question of how the MMU or MPU should be configured and which memory areas must not be cached. The more flexibility the SoC manufacturer has built into the chip, the more the user has to consider during implementation.
Experience has shown that application notes for AMP operation of the cores [6] are gratefully received, but are usually only the first step. The more complex an operating system is, the more difficult it becomes to intervene in, for example, memory and cache management. With a bare-metal program, you simply configure which memory areas should not be cached. With Linux, you would rather not have to deal with that at all. Unfortunately, experience has also shown that you can no longer avoid it once interrupts are being missed and performance suddenly becomes insufficient under certain conditions.
Questions regarding so-called lifecycle management, i.e., loading and starting programs, as well as targeted sleep, shutdown and reloading, are also not necessarily trivial.
The desire quickly arises to be able to build upon something existing, stable, and standardized. Because manufacturers of SoCs, operating systems, and tools have also recognized this, several companies have joined together to form the Multicore Association (MCA) [7]. The Multicore Association develops standards and implementations in working groups to regulate and simplify the interaction of multi-core systems.
One of these standards is OpenAMP – the Open Asymmetric Multiprocessing Framework [8]. OpenAMP provides an open-source framework that enables standardized communication between a large number of heterogeneous systems. The corresponding working group at the MCA is responsible for standardizing the interfaces.
A pragmatic approach was chosen. Instead of opting for a general approach intended to cover all possible scenarios, the focus was placed on typical cases encountered in practice and on existing software interfaces.
Specifically, this means the following: OpenAMP distinguishes between a master (or host) processor, which has complete control, and a remote processor. The master processor typically runs Linux, while the remote processor runs an RTOS or a bare-metal program. The interface for lifecycle management (LCM) is remoteproc [9]; RPMsg [10] is used for interprocessor communication (IPC). Both components have been part of the Linux kernel since version 3.4. remoteproc and RPMsg both build on virtIO, a layer that allows virtual device drivers to communicate directly with the host OS or a hypervisor.
This applies to Linux on the host side, but not to an RTOS used on the remote processor side. Therefore, it is OpenAMP's task to provide an implementation that can be integrated into an RTOS.
In addition to LCM and IPC, OpenAMP adds a third component. A program running on the remote processor should also be able to produce output that can be viewed alongside the output of the host processor in a terminal window. Furthermore, it would be advantageous for the remote processor to have (controlled) access to the host's file system, whether to read configuration files or to access data autonomously. For this purpose, a proxy infrastructure is provided that handles I/O and file access from the remote processor via the host.
While OpenAMP handles a significant amount of the work, some preliminary steps are still required when installing the Linux kernel. Fortunately, these mainly involve configuring remoteproc and RPMsg (if they aren't already active) and adding a few entries to the device tree that define shared memory areas and interrupts, and establish the connection to remoteproc.
An example
A small example will demonstrate how OpenAMP appears from the user's perspective. The example is a program that calculates an FFT on the remote processor and then sends the calculated data back.
The program to be loaded onto the remote processor is specified via the file firmware in the sysfs file system:
# echo fft_server > /sys/class/remoteproc/remoteproc0/firmware
It is then started in a similar way by writing the value "start" to the sysfs file state:
# echo start > /sys/class/remoteproc/remoteproc0/state
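The two sysfs writes above can also be issued from a C program, for example when the host application itself manages the remote processor. The following is a minimal sketch under that assumption; the helper names write_attr and start_remote are hypothetical and not part of OpenAMP or the kernel, and only the file names firmware and state are taken from the commands above.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a string to an existing attribute file.
 * Returns 0 on success, -1 on failure. */
static int write_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int rc = (n == (ssize_t)len) ? 0 : -1;

    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Hypothetical helper: select a firmware image and start the remote
 * processor. rproc_dir is the remoteproc directory, e.g.
 * "/sys/class/remoteproc/remoteproc0". Returns 0 on success, -1 on failure. */
static int start_remote(const char *rproc_dir, const char *firmware)
{
    char path[256];
    int n;

    n = snprintf(path, sizeof path, "%s/firmware", rproc_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                      /* path did not fit */
    if (write_attr(path, firmware) != 0)
        return -1;

    n = snprintf(path, sizeof path, "%s/state", rproc_dir);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;
    return write_attr(path, "start");
}
```

A call such as start_remote("/sys/class/remoteproc/remoteproc0", "fft_server") would then correspond to the two echo commands and, like them, normally requires root privileges.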
Communication within the user program is quite simple. The corresponding RPMsg device is opened, and the FFT input data is written to it. The calculated output data is then read out.
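#include <fcntl.h>   /* open, O_RDWR */
#include <unistd.h>  /* read, write */

int fd;
int input_data[512];
int output_data[512];
/* Open the RPMsg device (error handling omitted for brevity). */
fd = open("/dev/rpmsg0", O_RDWR);
/* Send the FFT input data to the remote processor. */
write(fd, input_data, sizeof(input_data));
…
/* Read back the calculated FFT output data. */
read(fd, output_data, sizeof(output_data));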
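The listing above sends the whole input array with a single write. RPMsg transports usually limit the payload of one message; with the 512-byte buffers that Linux uses by default, a limit of 496 bytes is common, but this value is an assumption here and should be checked for the actual platform. A minimal sketch of a sender that splits a buffer into several messages could look like this; send_in_chunks is a hypothetical helper, not part of OpenAMP.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: send len bytes as a sequence of messages of at
 * most max_payload bytes each. Returns the number of write() calls that
 * transferred data, or -1 on error. */
static int send_in_chunks(int fd, const void *data, size_t len,
                          size_t max_payload)
{
    const unsigned char *p = data;
    int messages = 0;

    assert(max_payload > 0);

    while (len > 0) {
        size_t chunk = len < max_payload ? len : max_payload;
        ssize_t n = write(fd, p, chunk);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before sending anything: retry */
            return -1;
        }
        if (n == 0)
            return -1;      /* no progress; avoid looping forever */

        p += n;
        len -= (size_t)n;
        messages++;
    }
    return messages;
}
```

The remote side then has to reassemble the chunks before calculating the FFT, so in a real system both sides need to agree on the total length, for example through a small header in the first message.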
Running the program on the remote processor is somewhat more complex.
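int output_data[512];

/* Called whenever a message arrives on the RPMsg channel. */
void rpmsg_read_cb(
    struct rpmsg_channel *rp_chnl,
    void *data, int len,
    void *priv, unsigned long src)
{
    /* Calculate the FFT of the received data and send the result back. */
    calculate_fft(data, output_data);
    rpmsg_send(rp_chnl, output_data, sizeof(output_data));
}
…
struct hil_proc *hproc;
struct rsc_table_info rsc_info;
…
/* Create the processor description and look up the resource table. */
hproc = platform_create_proc(proc_id);
rsc_info.rsc_tab = get_resource_table((int)rsc_id,
                                      &rsc_info.size);
/* Initialize remoteproc and register the channel and receive callbacks. */
remoteproc_resource_init(&rsc_info, hproc,
                         rpmsg_channel_created,
                         rpmsg_channel_deleted,
                         rpmsg_read_cb, &proc, 0);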
The call to remoteproc_resource_init registers the callback function rpmsg_read_cb, which is triggered when data is received and then performs the actual service (here, calculating the FFT).
Depending on the function the program has to perform on the remote processor, there may be additional steps required. In the user program, it can also be advisable to decouple the sending of input data and the receipt of the calculated data using threads.
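The decoupling of sending and receiving mentioned above can be sketched with POSIX threads: one thread writes the input data while the calling thread reads the results. This is a minimal sketch, not taken from the OpenAMP examples; the names send_job, sender_main, and exchange_threaded are hypothetical. The function takes separate descriptors for sending and receiving so that it can be tried out with pipes; with a single RPMsg device, the same descriptor would be passed for both.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Work description for the sender thread (hypothetical). */
struct send_job {
    int fd;
    const unsigned char *data;
    size_t len;
    int status;                 /* 0 on success, -1 on error */
};

/* Thread entry point: write the complete input buffer. */
static void *sender_main(void *arg)
{
    struct send_job *job = arg;
    size_t sent = 0;

    job->status = 0;
    while (sent < job->len) {
        ssize_t n = write(job->fd, job->data + sent, job->len - sent);
        if (n <= 0) {
            job->status = -1;
            break;
        }
        sent += (size_t)n;
    }
    return NULL;
}

/* Send in_len bytes on tx_fd in a separate thread while reading out_len
 * bytes from rx_fd in the calling thread. Returns 0 on success, -1 on
 * error. */
static int exchange_threaded(int tx_fd, int rx_fd,
                             const void *in, size_t in_len,
                             void *out, size_t out_len)
{
    struct send_job job = { tx_fd, in, in_len, 0 };
    unsigned char *dst = out;
    pthread_t sender;
    size_t got = 0;
    int rc = 0;

    assert(in != NULL && out != NULL);

    if (pthread_create(&sender, NULL, sender_main, &job) != 0)
        return -1;

    while (got < out_len) {
        ssize_t n = read(rx_fd, dst + got, out_len - got);
        if (n <= 0) {
            rc = -1;
            break;
        }
        got += (size_t)n;
    }

    /* Wait for the sender before looking at its status. */
    pthread_join(sender, NULL);
    return (rc == 0 && job.status == 0) ? 0 : -1;
}
```

For the FFT example, a call could look like exchange_threaded(fd, fd, input_data, sizeof(input_data), output_data, sizeof(output_data)). The sketch does not cancel a sender that is still blocked when reading fails; a production version would need a timeout or a shutdown path for that case.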
So what does the user get?
The OpenAMP source code and example programs are available on GitHub [11]. Xilinx has published an application note describing how to use OpenAMP [12]. Xilinx also provides templates for OpenAMP projects in its Software Development Kit. NXP offers a resource-efficient RPMsg implementation (RPMsg-Lite), which is also available on GitHub [13]. With the Mentor Embedded Multicore Framework, Mentor Graphics has also released the first commercial implementation of the OpenAMP standard [14].
Why not OpenCL?
When dealing with a very large number of cores (> 1000), i.e., in the realm of GPUs, frameworks for managing multicore architectures already exist. These are either proprietary (like NVIDIA's CUDA) or open, like OpenCL from the Khronos Group [15]. One might naturally ask whether a framework like OpenCL could also be used for the AMP scenarios described above, and why a new framework like OpenAMP was created.
OpenCL was designed to handle a generic number of cores. Programs developed with OpenCL are intended to run on different systems with varying numbers of processors. Querying the system's number of processors in order to distribute an algorithm generically is one of the first tasks of OpenCL initialization. This results in a powerful but also complex API. OpenAMP, on the other hand, addresses systems that typically provide one or perhaps two remote processors. Here, the remote processor's task is clearly defined, and the hardware setup is known from the outset. This allows for significantly lower overhead compared to an environment where an arbitrary number of cores need to be managed. OpenAMP is therefore much leaner, which also drastically reduces the learning curve compared to OpenCL.
What happens next?
OpenAMP is currently gaining traction among developers. The first requests for extensions are already coming in ("We'd like a C++ layer"). Some who already have a solution will likely take their time with the adoption. However, with OpenAMP, a standard is now available for the first time on which to build AMP systems without having to reinvent the wheel.
List of sources
[2] https://www.marvell.com/embedded-processors/
[3] https://www.xilinx.com/products/silicon-devices/soc/zynq-ultrascale-mpsoc.html
[4] https://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html
[5] https://www.nxp.com/docs/en/fact-sheet/LPC541XXFAMFS.pdf
[6] https://www.xilinx.com/support/documentation/application_notes/xapp1078-amp-linux-bare-metal.pdf
[7] https://www.multicore-association.org/
[8] https://www.multicore-association.org/workgroup/oamp.php
[9] https://www.kernel.org/doc/Documentation/remoteproc.txt
[10] https://www.kernel.org/doc/Documentation/rpmsg.txt
[11] https://github.com/OpenAMP
[12] https://www.xilinx.com/support/documentation/sw_manuals/xilinx2017_2/ug1186-zynq-openamp-gsg.pdf
[13] https://github.com/NXPmicro/rpmsg-lite
[14] https://www.mentor.com/embedded-software/multicore
[15] https://www.khronos.org/opencl/
