While the Cortex-M23 and M33 have gradually gained market acceptance as successors to the Cortex-M0+ and M4, the Cortex-M55, the first member of the Armv8.1-M architecture extension released in 2019, went almost unnoticed. With the Cortex®-M85, Arm is now closing this gap at the top end.
Unlike the M55, which is specifically tailored to the machine learning (ML) market segment, the M85 is now a true successor to the previously most powerful M7. In addition to the M7's capabilities, the M85 also covers the ML capabilities of the M55, making it not only the fastest but also the most versatile new member of the family.
Most of the Armv8-M innovations, such as TrustZone, improved MPU, and Stack Limits, are already widely known from the M23 and M33. However, with the introduction of the M85, many developers will likely encounter the over 150 additional instructions and new features of the Armv8.1-M architecture for the first time.
Is it worth upgrading from the M7 to the new M85 for me?
The most important criterion for answering this question is backward compatibility. The good news first: Just as with the upgrade from the M0+ to the M23 and the M4 to the M33, this also applies to an upgrade from the M7 to the M85. Those who don't want to use the new features only need to reprogram the MPU on the M23, M33, and now also the M85, as it is not backward compatible.
Those wishing to use the new features are initially faced with a difficult choice. The Armv8-M and Armv8.1-M architectures offer chip designers many new configuration options. Users must carefully select the new chip based on the available options in order to utilize their desired features, which I will briefly describe below:
Helium M-Profile Vector Extension (MVE)
The application of machine learning at the edge involves a multitude of real-time matrix calculations. The corresponding models are first trained on servers and then simplified and normalized using toolchains such as TensorFlow. Finally, they are executed on the Cortex-M at the edge using machine learning library functions like CMSIS-NN.

To increase computation speed, the data in the model is normalized to the smallest possible fixed-point or integer formats of 8 bits, 16 bits, or 32 bits without substantial loss of quality. The processing speed of the Cortex-M for these matrix calculations therefore directly impacts the machine learning performance.
The Helium extension utilizes the Floating Point Unit (FPU) registers as 128-bit vector registers, enabling it to execute 16 8-bit, 8 16-bit, or 4 32-bit operations in parallel. The implementation of the CMSIS-NN library ensures the use of the necessary new MVE instructions of the Cortex-M85. With the help of MVE, the M85 achieves up to four times the performance of an M7 in machine learning.

Low Overhead Branch Extension
Not only in machine learning contexts does the processing speed of loops have a decisive influence on performance. Therefore, in the M85, loop constructs can be processed with almost no overhead thanks to new machine instructions and enhanced pipeline functionality. Using the new instructions WLS, DLS, and LE, the pipeline remembers the start and end of a loop. The loop counter and the end value are stored in core registers. In the example shown, these are registers LR and R0. The loop instructions highlighted in red in the example are executed only in the first loop iteration. In all subsequent loop iterations, only the inner part of the loop is executed repeatedly, highlighted in green in the example.

Compared to a flattened loop, only two additional instructions are required. If an interrupt occurs during loop execution, the LoopEnd (LE) instruction must be re-executed after interrupt handling to further synchronize the pipeline; the overhead is significantly reduced to just one additional instruction per interrupt. A really cool feature! The best part is: the compiler does the actual work of using the new instructions for us. The new WLS, DLS, and LE instructions, and their derivatives WLSTP, DLSTP, and LETP with Loop Tail Predication, are also available in M85 even if the MVE Helium extension is not implemented.
Half Precision Floating Point
As already described in the MVE section, the ML calculations are normalized to the smallest possible fixed-point or integer formats with 8-bit, 16-bit, or 32-bit resolution without substantial loss of quality. Therefore, the M85's FPU supports not only single-precision 32-bit and double-precision 64-bit operations, but also half-precision 16-bit floating-point operations.
PXN attributes
When using RTOS operating systems, tasks typically run in non-privileged user mode. Only the operating system itself and the interrupt service routines use privileged mode, thereby allowing access to components such as the Memory Protection Unit (MPU) and the Nested Vectored Interrupt Controller (NVIC). To prevent the unauthorized access to privileged mode through stack manipulation, the Armv8.1-M architecture includes a new PXN bit in each MPU region. This prevents the user code of tasks from executing in privileged mode, thus providing an additional safeguard against unauthorized access to security-relevant areas.

Pointer Authentication and Branch Target Identification (PACBTI)
The PACBTI feature, newly introduced with the M85, offers additional protection against potential access violations. The underlying concept is the signature of pointers and return addresses to detect any manipulation and, if necessary, initiate countermeasures. Using PAC commands, an individual signature is generated for each pointer and return address and stored internally. This signature cannot be read by software.

When using the pointer or return address, the signature is recalculated and compared internally in the core with the stored signature. A deviation in the signature is interpreted as a manipulation attempt, and an exception is raised to handle the situation.

outlook
Renesas' announcement that they will be presenting a first live demo of the new Cortex-M85 RA family member as a prototype at embedded world 2022 raises hopes that we will soon be able to gain our first hands-on experience with the Cortex-M85 performance flagship and perhaps acquire knowledge from the MicroConsult Cortex-M trainings to implement in everyday project work.
Training to the arm cortex architectures They take place regularly at our training center in Munich as well as live online. Register now!
Further information
MicroConsult expertise on the topic of microcontrollers
MicroConsult Training & Coaching on the topic of microcontrollers
(Featured image: Renesas Electronics)

