{"id":8059,"date":"2017-01-08T07:51:03","date_gmt":"2017-01-08T06:51:03","guid":{"rendered":"https:\/\/web-dev-weissblau.de\/microconsult\/?p=8059"},"modified":"2017-01-08T07:51:03","modified_gmt":"2017-01-08T06:51:03","slug":"dynamic-memory-allocation-justifiably-taboo","status":"publish","type":"post","link":"https:\/\/www.microconsult.de\/en\/dynamic-memory-allocation-justifiably-taboo\/","title":{"rendered":"Dynamic Memory Allocation: Justifiably Taboo?"},"content":{"rendered":"<h2>Avoiding Risks Using New Memory Management Strategies<\/h2>\n<p>Author: Steven Graves, McObject LLC<\/p>\n<h3>Paper &#8211; Embedded Software Engineering Kongress 2015<\/h3>\n<h2>Abstract<\/h2>\n<p>Developers of fault-tolerant embedded systems must identify and eliminate possible failure points. Dynamic memory allocation is one key concern. A sound approach contributes to predictable and robust systems, while inattention can lead to instability, slow and\/or unpredictable performance, or failure. This paper argues that dynamic allocation is acceptable only in non-critical portions of fault-tolerant embedded systems, and then only when the technique\u2019s risks can be successfully mitigated. Fault-tolerant systems should instead employ custom memory allocators that are more precisely suited to the application\u2019s specific allocation patterns. Custom memory managers presented in the paper include block, stack, bitmap and thread-local allocators. 
The solutions presented retain the power and flexibility of dynamic memory management while mitigating common risks such as fragmentation and memory leaks, and improving efficiency and performance.<\/p>\n<h2>Introduction<\/h2>\n<p>Users have different quality expectations for embedded systems than for business and desktop applications.\u00a0 We wouldn\u2019t tolerate our cell phone or set-top box failing regularly, or even once in a while.\u00a0 And some embedded systems, such as the avionics that enable a jet to fly safely, are expected to be even more dependable: they must be fail-safe. In short, the effect of errors, unpredictable behavior and degraded performance in embedded systems ranges from a bad user experience and eroded customer loyalty, to potential loss of life \u2013 none of which is acceptable to organizations releasing this technology out into the world.<\/p>\n<p>So what makes embedded applications, ranging from consumer electronics to mission critical aerospace and industrial control, more dependable?\u00a0 Apart from reliable hardware platforms, it is the software components of those systems: the operating system, middleware and applications.\u00a0 In embedded environments, these components should be immune from crashes. And if an application does fail, the failure shouldn\u2019t affect other applications or the operating system.\u00a0 In an embedded setting, this requirement is not always so easy to satisfy, because the applications are often implemented as separate tasks that run within the same address space. Finally, the operating system and application software should not introduce unpredictable latencies. 
That is, they should exhibit predictable performance.<\/p>\n<h2>Memory Management and Performance, Predictability and Resilience<\/h2>\n<p>Embedded systems\u2019 reliability imperative requires a close examination of all aspects of software development, and how approaches to concepts such as scheduling, synchronization across multiple tasks and processes, locking strategies, deadline management, and memory management affect quality. For example, unpredictable latencies can be introduced by techniques such as message passing or garbage collection. Such approaches should be avoided or their unpredictable latencies mitigated.<\/p>\n<p>This paper focuses on the fundamental programming concept of memory management. Safe, predictable and efficient program execution is often the result of sound memory management, while the wrong practices, or inattention to memory management, can result in slow and\/or unpredictable performance (which may be a result of memory fragmentation) as well as instability or failure due to memory leaks.<\/p>\n<p>The impact of badly written or improperly used memory management can vary in severity. Less severe might mean slow or unpredictable performance in specific circumstances, such as when an application is moved from a single-core to a multi-core environment. At the more severe end of the spectrum, a high level of memory fragmentation may increasingly degrade performance, resulting finally in application failure. Even more severe could be\u00a0<em>system-wide<\/em>\u00a0instability or failure due to memory leaks.<\/p>\n<p>Software designers who develop safety-critical applications for airborne systems are well aware of the risks of unsound memory management. Industry norms dictate that safety-critical applications should avoid using techniques that could introduce instability. Those readers involved in creating airborne systems should be familiar with the DO-178B standard, used by the U.S. 
Federal Aviation Administration to certify avionics software. Among other strictures, the DO-178B document states:<\/p>\n<p>&#8220;Software Design Standards should include\u2026constraints on design, for example, exclusion of recursion, dynamic objects, data aliases, and compacted expressions.&#8221;<\/p>\n<p>[DO-178B,\u00a0<em>Software Considerations in Airborne Systems and Equipment Certification<\/em>, RTCA, Inc. (jointly developed with European Organisation for Civil Aviation Equipment, EUROCAE)]<\/p>\n<p>&#8220;Dynamic objects&#8221; in this quote refers to objects created in the application through dynamic memory allocation, a technique available in many programming languages, including C, in which system memory is allocated to processes on an as-needed basis at run-time.<\/p>\n<h2>An Overview of Dynamic Memory Management<\/h2>\n<p>Dynamic memory allocation was popularized with the C programming language and Unix systems in the late 1960s and early 1970s, and is present in virtually all modern operating systems and programming languages.<\/p>\n<p>To implement dynamic memory allocation, the C runtime library provides malloc() and free() functions that allow applications to allocate and release memory as needed during a program\u2019s execution. Free memory is colloquially called the &#8220;heap&#8221;. The function of the memory allocation mechanisms used &#8220;under the covers&#8221; in dynamic allocation (we will refer to these mechanisms as &#8220;allocators&#8221; or &#8220;memory managers&#8221;) is to organize the heap in some coherent way. They must satisfy requests for dynamic memory allocation by assigning a portion of the heap to a requesting task, and must return memory to the heap for subsequent use when the task relinquishes that memory.<\/p>\n<p>An allocator keeps track of which parts of memory are in use and which are free. 
A design goal of any allocator is to minimize wasted memory space, balancing the amount of wasted space against the time and processing resources needed to recover it. A major objective of allocators is to limit the\u00a0<em>fragmentation<\/em>\u00a0that occurs when an application frees memory blocks in a random order.<\/p>\n<p>In dynamic allocation, the heap is commonly managed with what is known as a &#8220;list allocator&#8221;. It organizes the heap into a singly-linked chain of pointers, where each link in the chain points to a free block of memory. Each free block (in a 32-bit system) requires 8 bytes of overhead that we call &#8220;meta-data&#8221;: 4 bytes for the chain pointer, and 4 bytes to record the size of the free block (see image,\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>).<\/p>\n<p>To satisfy a request for memory allocation, the allocator walks the chain of pointers until it finds a free block that is large enough to satisfy the allocation request. If it cannot find a block of memory large enough, it returns a NULL pointer. If it finds a block large enough, and if the block is exactly the right size, it unlinks the block from the chain and returns a pointer to it to the application. Otherwise, the block is divided into one &#8220;chunk&#8221; that is the requested allocation size, and a remaining piece. The remainder is linked back into the chain and the pointer to the allocated memory is returned to the application. When memory is subsequently freed, it is linked back into the chain. 
If the newly freed block happens to be adjacent to an already free block, then the free blocks are joined together to form a single larger block, minimizing fragmentation (see below).<\/p>\n<h2>Risks of Dynamic Memory Management<\/h2>\n<p>De-fragmentation strategies used by the general purpose allocation mechanisms that underlie malloc() and free() can cause non-deterministic behavior and\/or impose CPU overhead that is too high for embedded systems. Application developers tend to use dynamic memory allocation liberally, without thinking about such effects. A significant drawback is that standard allocators are not limited in their consumption of physical or virtual memory. Their de facto limit is the total amount of system memory \u2013 introducing a risk of\u00a0<em>running out<\/em>\u00a0of memory.<\/p>\n<p>One of the greatest risks in dynamic memory allocation is failure to diligently relinquish (free) allocated memory when it is no longer needed. This results in\u00a0<em>memory leaks<\/em>\u00a0that, no matter how much system memory is available, will cause the system to become progressively slower and eventually stop (or crash) altogether. Dynamically allocated memory needs to be carefully released when it is no longer needed or when it goes out of scope.<\/p>\n<p>Memory\u00a0<em>fragmentation<\/em>\u00a0is the phenomenon that occurs when a task allocates and de-allocates memory in random order. For the sake of simplicity, assume that the &#8220;heap&#8221; is 100 bytes and memory is allocated in the following order:<\/p>\n<p>A.\u00a0\u00a0\u00a0\u00a0 10 bytes<br \/>\nB.\u00a0\u00a0\u00a0\u00a0 46 bytes<br \/>\nC.\u00a0\u00a0\u00a0\u00a0 13 bytes<br \/>\nD.\u00a0\u00a0\u00a0\u00a0 15 bytes<\/p>\n<p>These allocations consume 84 bytes, leaving 16 bytes of our original 100 bytes free (see table,\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? 
(PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>).<\/p>\n<p>Subsequently, the allocation for 13 bytes (C) is released (see table,\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>).<\/p>\n<p>And an allocation request is made for 21 bytes. There are 29 bytes of memory free, in the aggregate, but the allocation request cannot be satisfied: due to fragmentation, there is no free hole of 21 contiguous bytes. This is the essence of fragmentation. The longer a system runs, the more fragmented the heap becomes due to the randomness of allocating and freeing memory.<\/p>\n<h2>List Allocators: A Closer Look<\/h2>\n<p>List allocators used by malloc() and free() are, by necessity, general purpose allocators that aren\u2019t optimized for any particular memory allocation pattern.\u00a0 Compared to allocators that take specific allocation needs into account, these general purpose mechanisms are more likely to introduce unpredictability, high performance cost, and excessive resource (particularly memory) consumption. 
To understand why, let\u2019s examine commonly used list allocation algorithms: first-fit, next-fit, best-fit and quick-fit.<\/p>\n<p>The first-fit algorithm always walks the chain of pointers from the beginning of the chain, and therefore attempts to allocate memory from the &#8220;front&#8221; of the heap first, leaving large free holes in the back. Therefore, this algorithm minimizes fragmentation, but at the expense of unpredictable performance: the longer the system runs, the longer the chain becomes, and the allocator typically walks a greater distance before finding a free hole large enough.<\/p>\n<p>The next-fit algorithm attempts to smooth the performance at the expense of greater fragmentation. This algorithm walks the chain from wherever it last left off, so it allocates memory from more-or-less random locations in the heap.<\/p>\n<p>The best-fit algorithm attempts to minimize fragmentation, but again at the risk of unpredictable, and potentially very bad, performance. This algorithm walks the chain until it finds a free hole that is exactly the right size, or is as close as possible to the allocation request. It could find the perfect free hole immediately, or it might have to walk the entire chain.<\/p>\n<p>The quick-fit algorithm maintains a separate list of the most commonly requested allocation sizes, as well as pointers to free holes in the heap that match those commonly requested sizes. It increases the overhead (i.e. 
there\u2019s more meta-data to be managed), but it can provide better, and more consistent, performance for common allocation request sizes.<\/p>\n<h2>Pulling it together<\/h2>\n<p>In summary, general purpose allocators:<\/p>\n<ul>\n<li>lack limits<\/li>\n<li>have unpredictable and potentially unacceptable performance<\/li>\n<li>suffer from fragmentation<\/li>\n<li>can impose excessive overhead<\/li>\n<\/ul>\n<p>Avoiding general purpose allocators is generally a good strategy for embedded and real-time systems. The drawbacks of these allocators are not surprising. After all, they must, by definition, satisfy a variety of application scenarios and allocation patterns. It stands to reason that a tool designed to address every scenario will fail to fill some needs precisely. And as discussed above, embedded systems have higher standards than non-embedded applications in areas including performance, predictability and reliability. They need software tools and components that do their jobs precisely.<\/p>\n<h2>Mitigating the Risks with Custom Memory Managers<\/h2>\n<p>When a task makes an allocation request (calls malloc), control is passed to the C run-time library. To the embedded system developer, malloc is a software &#8220;black box&#8221;: the developer cannot predict what will happen. Will the allocation request succeed or fail? How many clock cycles will transpire before control is returned to the calling function?<\/p>\n<p>In contrast, the allocators that best serve embedded systems concentrate on just a few allocation patterns, and only those patterns required by the system. Efficiency is gained by using allocators that are optimized for these patterns. To mitigate the fact that general purpose allocators have no limits (i.e. 
will allow an application to exhaust all available memory), embedded systems developers should put every task &#8220;in a box&#8221;, essentially fencing off tasks from one another. In other words, force the task to operate within a predefined memory arena that\u2019s large enough for the task to accomplish its purpose but small enough that when memory utilization approaches 100%, it still can\u2019t adversely affect the operating system or other tasks.<\/p>\n<p>To accomplish this, the developer should start with the assumption that responsibility for memory management must be taken away from the standard memory management routines (e.g. the C run-time library&#8217;s malloc and free functions) and assigned to the application.<\/p>\n<p>Within an application, it is sometimes possible to separate non-critical activities from critical ones. For non-critical tasks, the general purpose, standard dynamic allocator exposed by the C runtime can still be used: malloc and free remain acceptable for these tasks, provided that any memory leaks they introduce cannot affect safety-critical parts of the system.<\/p>\n<p>For the safety-critical tasks, the developer can replace the standard allocator with a custom allocator that sets aside a buffer for the exclusive use of that task, and satisfies all memory allocation requests out of that buffer. If the memory reserved for this buffer is exhausted, the task is informed. When the task receives such a notification, it must free up memory within the buffer, or find more memory.<\/p>\n<p>The nature of the custom allocators that replace malloc and free will differ according to memory usage patterns that occur in the application, and will be optimized to deliver predictability, performance, and efficiency when addressing these patterns. The remainder of this paper presents four examples of custom allocator algorithms: the block allocator, stack allocator, bitmap allocator and thread-local allocator. 
Custom memory managers can use any one of these algorithms or a combination of different algorithms.<\/p>\n<h2>Block Allocator<\/h2>\n<p>Applications use block allocators when they need to allocate a large number of small objects of the same size. Typical examples of this pattern include small values such as the timestamp\/value pairs that represent sensor data, the abstract syntax tree nodes used by various parsers, and the pages of an in-memory database. Block allocators handle such objects very efficiently with minimal overhead, and eliminate fragmentation by design.<\/p>\n<p>The idea behind a block allocator is very simple. An allocator is given a quantity of memory, which we\u2019ll call the &#8220;large block&#8221;, divides it into equal-size pieces, and organizes them in a linked list of free elements. To serve a request for memory, the allocator returns a pointer to one element and removes it from the list. When there are no more elements in the &#8220;free list&#8221;, a new large block is selected from the memory pool using some other allocator (for example, a list allocator). The new large block again gets divided by the block allocator into equal-size elements. The elements are put into a new linked list, and allocation requests are handled from this newly created list.<\/p>\n<p>When an object is freed, it is placed back into its original &#8220;free list&#8221;. Since all allocated objects in a given list are of the same size, there is no need for the block allocator to &#8220;remember&#8221; each element\u2019s size, or to aggregate smaller chunks into larger ones. Skipping the aggregation of adjacent free blocks reduces overhead and saves CPU cycles. 
A block allocator eliminates fragmentation because all blocks are of equal size: when freed, they are not, and do not need to be, joined with adjacent free blocks.<\/p>\n<p>There is also less overhead (this type of allocator is more efficient) because each block is a fixed, known size, so there is no need to carry additional meta-data (specifically, the size of the free block) in the linked list. This frees up 4 bytes of memory per block in a 32-bit system.<\/p>\n<p>The most basic block allocators satisfy allocation requests only for objects that fit into their pre-determined element size, making such algorithms useful only when the allocation pattern is known in advance (for example, when the application always allocates 16-byte objects). In practice, many memory managers, including database memory managers, need to satisfy several allocation patterns. So, to take advantage of the simplicity of the block allocator algorithm while at the same time dealing with this variability, a block allocator can be combined with some other technique into a hybrid memory manager.<\/p>\n<p>For example, a block allocator can maintain multiple lists of different-sized elements, choosing the list that is suited for a particular allocation request. Meanwhile, the blocks themselves, and objects that exceed the chunk size of any of the blocks, are allocated using another general purpose allocator (for example a page allocator, or a list allocator). In such an implementation, the number of allocations (or objects) processed by the block algorithm is typically many times higher than the number made by the general purpose allocator, and this can greatly enhance performance.<\/p>\n<h2>Stack Allocator<\/h2>\n<p>Memory allocators would be simpler to design, and perform better, if they only had to allocate objects, and not free them. 
The stack allocator is designed to follow this simple strategy.<\/p>\n<p>Stack-based allocators fit when memory requirements can be divided into a first phase in which objects are allocated, and a second phase in which they are de-allocated. \u00a0In order to use the stack allocator, the number of allocations and the total amount of required memory should be (A) known in advance, and (B) limited.<\/p>\n<p>Stack-based allocators use a concept similar to the application\u2019s call stack.\u00a0 When a function is called, memory space for its arguments and local variables is pushed onto the stack (by advancing the stack pointer).\u00a0 When the function returns, the arguments and local variables are no longer needed and so the stack pointer is rewound to its starting point before the function call (i.e. the stack is &#8222;popped&#8220;). With the stack-based allocator, for each allocation a pointer is advanced.\u00a0 When the memory is no longer required, the pointer is simply rewound to the starting point.<\/p>\n<p>One useful example of the stack allocator model is XML parsers, in which short-lived objects can be released all at once. Another example is SQL statement execution, in which all memory is released upon the statement\u2019s commit or rollback. The process of evaluating and executing a SQL statement involves prodigious amounts of memory allocation:\u00a0 The parser needs to build a parse tree of tokens; the optimizer is invoked next, which involves more allocations to hold possible execution plan steps (the optimizer\u2019s job is to assemble a sequence of steps that is less costly than any other sequence of steps); finally the execution plan must be carried out, which can involve more allocations to hold the result set.\u00a0 In all, executing a SQL statement can involve thousands of individual allocations. 
When the application program releases the SQL statement handle, all of that memory can be released. This fits the two-phase pattern in which objects are first allocated, and then de-allocated.<\/p>\n<p>An important by-product of the stack approach is improved safety: because memory is allocated and de-allocated in two phases (not in random order), the application doesn&#8217;t have to track individual allocations and it is impossible to accidentally introduce a memory leak through improper de-allocation.<\/p>\n<p>Another benefit is much improved performance. During allocation, there is no chain of pointers to walk. And de-allocation performance benefits even more: though there might have been thousands of allocations during the first phase, de-allocating the memory does not require calling the equivalent of free thousands of times. Rather, the stack pointer is rewound in one move.<\/p>\n<p>The stack allocator imposes virtually no overhead. There is a single stack pointer (4 bytes in a 32-bit system) in place of the chain of pointers in a list allocator or block allocator.<\/p>\n<p>A final benefit is improved resiliency. Whereas list and block allocators intersperse chains of pointers in the heap, the stack allocator has just the stack pointer. So the chance of overwriting that memory and compromising the integrity of the allocator&#8217;s meta-data is greatly reduced.<\/p>\n<h2>Bitmap Allocator<\/h2>\n<p>The bitmap allocation algorithm is a little more complex. It acts on the memory pool by allocating objects in a pre-defined unit called the allocation quantum. The size of the quantum can be a word or a double word. A bitmap is a vector of bit-flags, with each bit corresponding to one quantum of memory. 
A bit value of 0 indicates a free quantum, while 1 indicates an allocated quantum.<\/p>\n<p>The bitmap allocator is similar to the block allocator in that it always allocates memory in a pre-specified size, but the bitmap allocator preserves the flexibility of a list allocator\u2019s ability to satisfy random allocation request sizes by finding adjacent free bits and allocating them together. Advantages of bitmap allocators include the following:<\/p>\n<ol>\n<li>They can allocate groups of objects sequentially, thus maintaining locality of reference. This means objects allocated sequentially are placed close to one another in storage, on the same physical &#8220;page&#8221;.<\/li>\n<li>Fragmentation is reduced. When objects are allocated in quanta of comparable size, small unused holes in the memory pool are avoided.<\/li>\n<li>A performance advantage of the algorithm lies in the fact that the bitmaps themselves are not interleaved with main storage, which improves the locality of searching.<\/li>\n<li>Keeping the bitmap separate from the heap also improves safety, in that memory overwrites (e.g. allocating 20 bytes and writing 24 bytes to the pointer that was returned) will not compromise the integrity of the allocator\u2019s meta-data.<\/li>\n<\/ol>\n<p>Bitmap allocators are commonly used in garbage collectors, database systems and file system disk block managers, where enforcing locality of reference and reducing fragmentation are imperative.<\/p>\n<h2>Thread-local Allocator<\/h2>\n<p>In today\u2019s world of multi-core processors and multi-threaded applications, developers need to constantly think about how to harness the power of multiple CPU cores. Increasing application performance depends on the proper use of multiple application threads, which in turn hinges on the right approach to memory management.<\/p>\n<p>In particular, multi-threaded applications running on multi-core systems can slow down markedly when performing many memory allocations. 
Often an application will run fine with a single CPU, but placing it on a system with two or more processors or processor cores yields a slowdown in performance, not the expected doubling of performance. This performance impact is very easy to miss at the application\u2019s design stage as it is hidden deep inside the C runtime library.<\/p>\n<p>Why does the standard list allocator perform so poorly in a multi-core environment? It stems from the fact that this allocator\u2019s chain of pointers is a shared resource that must be protected. Chaos would ensue if a thread in the middle of breaking the chain to insert a new link were interrupted by a context switch, and another thread tried to walk the (now broken) chain. So, the chain is protected by a mutex to prevent concurrent access and preserve the allocator\u2019s consistency (see\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\">PDF<\/a>\u00a0for image).<\/p>\n<p>Locking a mutex presents minor overhead when there are no, or few, conflicts. However, as the number of malloc and free calls increases, contention for this shared resource increases, creating lock conflicts. To resolve a conflict, the operating system imposes a context switch, suspending the thread that attempted to access the allocator and inserting it into the kernel\u2019s waiting queue. When the allocator is released, the thread is allowed to run and access the allocator. Even if each thread accesses only objects that it created, and so otherwise requires no synchronization, there is still only one memory allocator; the allocator doesn\u2019t &#8220;know&#8221; that no synchronization is required, and acts to protect its meta-data, which results in a lot of conflicts between threads. 
As a result, the same application would perform better on a single CPU because the CPU can be kept busy with other tasks (it does not try to schedule tasks that are in the waiting queue). Conversely, in a multi-core setting, all but one core can be idled, with respect to dynamic memory management, because of the serialization of access to the heap (see\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\">PDF\u00a0<\/a>for image).<\/p>\n<p>The ideal solution, from a performance standpoint, would be for each thread to allocate objects on the stack rather than in dynamic memory (in other words, with local variables declared in the function body). However, this simplified approach is rarely viable: thread stack size is limited and therefore allocating large objects or a large number of smaller objects on the stack is impossible. A more practical approach is to provide a separate memory allocator for each thread, so that each allocator manages memory independently of the others. This approach is called a thread-local allocator. The thread-local allocator is a custom allocator that avoids creating locking conflicts when objects are allocated and released within a single task. When allocating in one task and de-allocating in another, a lock is, of course, required. However, this allocator takes measures to minimize those locking conflicts.<\/p>\n<p>The implementation of the thread-local memory manager is based on two concepts: the block allocator algorithm that we have already discussed, and the concept of thread-local storage, or TLS. Thread-local storage provides a means to map global memory to a local thread\u2019s memory. In other words, data in a global variable is usually located at the same memory location when it is referred to by threads from the same process. 
Sometimes, however, it is advantageous for different threads to refer to the same global variable by name while each accesses a different memory location. Thread-local storage accomplishes this. Likewise, the thread-local allocator maps portions of the global heap to individual threads. Again, the thread-local memory manager is based on the block allocator algorithm discussed earlier. The allocator creates and maintains a number of chains of same-size small blocks that are made out of large pages. To allocate memory, the allocator simply unlinks a block from the appropriate chain and returns the pointer to the block to the application. When and if a new large page is necessary, the allocator can use a general-purpose memory manager (standard malloc) to allocate the page.<\/p>\n<p>As long as all objects are allocated and de-allocated locally (by the same thread), this algorithm does not require any synchronization at all because each thread has its own allocator (and therefore doesn&#8217;t need to synchronize with any other thread when allocating\/de-allocating its own memory). What happens when objects are not local? The memory manager maintains a Pending free Requests List (PRL) for each thread. When an object allocated in one thread is being de-allocated by another thread, the de-allocating thread simply links the object into the PRL of the thread that allocated it. Of course, PRL access is protected by a mutex. Each thread periodically de-allocates all the objects in its PRL at once. When does this occur? It could be based on a timer, or when a certain number of requests are pending, or when a certain amount of memory has accumulated in the PRL, or according to any other application-specific criteria.<\/p>\n<p>It\u2019s important to note that regardless of the criteria, the number of synchronization requests is reduced significantly using this approach. First, objects are often freed by the same thread that allocated them. 
Second, even when the object is de-allocated by a different thread, the de-allocation does not interfere with all other threads, but only with those that need to use the same PRL. For example, assume you have eight threads, and you know based on your application\u2019s logic flow that memory allocated by thread #1 will only ever be de-allocated by threads #4 or #7. Therefore, locking the PRL in any one of those three threads will interfere only with the other two, not with all seven of the other threads, as would be the case with the default allocator. In this way, locking conflicts are reduced even when allocation\/de-allocation is not \u201clocal\u201d.<\/p>\n<p>Following is a diagram of the internal structures of thread-local allocators (see\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>).<\/p>\n<p>The thread-local allocator can create an arbitrary number of block lists, of varying sizes. The diagram shows just one possible example (see\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>).<\/p>\n<p>To applications, the allocator exports three functions with syntax similar to the standard C runtime allocation API: thread_malloc(), thread_realloc() and thread_free(). For applications written in C++, the memory manager\u2019s interface also includes a simple way to redefine the default new and delete operators.<\/p>\n<p>We developed two tests to examine the impact of the thread-local memory manager. 
The first test compares performance of the thread-local allocator and the standard C runtime allocator when the allocation pattern is thread-local: all de-allocations are performed by the same thread as the original allocations. This is a \u201cbest case\u201d scenario.<\/p>\n<p>The second test compares performance when objects are allocated by a \u201cproducer\u201d thread and freed by a \u201cconsumer\u201d thread. This is a \u201cworst case\u201d scenario (see\u00a0<a title=\"Dynamic Memory Allocation: Justifiably Taboo? (PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\">PDF<\/a>).<\/p>\n<p>We ran these tests on a Sun Fire X4450 system with four 6-core Xeon processors and 24 GB of memory. The first test performed 10 million allocation\/free pairs in each of 24 threads (for a total of 240 million allocation\/free pairs). Because all the allocations were local and required no synchronization, all 24 cores were utilized with the thread-local allocator. Standard malloc, because it serializes access to the heap, could effectively utilize only a single core at a time, and this accounts for the dramatic performance difference.<\/p>\n<p>The second test also performed 10 million allocation\/free pairs but with only two threads. In this case, performance improved, but only by about 20%, due to three factors: (1) allocation doesn\u2019t require any synchronization, (2) there was some benefit from the reduced synchronization on the PRL (but minimal, because there were only two threads), and (3) the block allocator that the thread-local allocator uses is simply a superior allocator compared to standard malloc.<\/p>\n<p>The results show that significant performance improvements are obtained by replacing the standard allocation mechanism with a thread-local allocator, especially as the number of cores increases. 
This is a classic case of \u201cyour mileage may vary\u201d (YMMV): the benefit that any application will experience is a function of (1) the number of cores, (2) the ratio of local to global allocations, and (3) the logic flow that determines the number of synchronizations on the PRLs.<\/p>\n<h2>Conclusion<\/h2>\n<p>Approaches to memory management significantly affect embedded code safety, performance and predictability, as well as prospects for DO-178B airborne software certification. This is because dynamic memory allocation is risky. It can and should be eliminated from safety-critical processes. General-purpose C language memory allocators are not optimized for embedded systems. When possible, they should be replaced with custom allocators that deliver safety, predictability and reduced overhead when allocating memory. A number of algorithms can be considered, including bitmap allocators, block allocators and stack-based allocators. Finally, the performance of multi-threaded applications on multi-core systems can be improved with a custom thread-local allocator.<\/p>\n<p><a title=\"Dynamic Memory Allocation: Justifiably Taboo? 
(PDF)\" href=\"https:\/\/www.microconsult.de\/wp-content\/uploads\/2025\/11\/fachinfo_ese_echt_dynamic_memory_allocation_justifiably_taboo_mcobject_llc_graves.pdf\" target=\"_blank\" rel=\"noopener\"><strong>Beitrag als PDF-Datei herunterladen<\/strong><\/a><\/p>\n<div>\n<hr \/>\n<h2>Echtzeit &#8211; MicroConsult Trainings &amp; Coachings<\/h2>\n<p><strong>Wollen Sie sich auf den aktuellen Stand der Technik bringen?<\/strong><\/p>\n<p>Dann informieren Sie sich\u00a0<a title=\"Alle Trainings und Termine\" href=\"https:\/\/www.microconsult.de\/alle-trainings-termine-komplettuebersicht\/\" target=\"_blank\" rel=\"noopener\"><strong>hier<\/strong>\u00a0<\/a>zu Schulungen\/ Seminaren\/ Trainings\/ Workshops und individuellen Coachings von MircoConsult zum Thema Embedded- und Echtzeit-Softwareentwicklung.<\/p>\n<p><strong>Training &amp; Coaching zu den weiteren Themen unseren Portfolios finden Sie <a title=\"Training &amp; Beratung - alle Themen\" href=\"https:\/\/www.microconsult.de\/training-beratung\/\" target=\"_blank\" rel=\"noopener\">hier<\/a>.<\/strong><\/p>\n<hr \/>\n<h2>Echtzeit &#8211; Fachwissen<\/h2>\n<p>Wertvolles Fachwissen zum Thema\u00a0Architektur \/Embedded- und Echtzeit-Softwareentwicklung steht\u00a0<a title=\"Embedded Software Architektur Fachwissen\" href=\"https:\/\/www.microconsult.de\/die-7-wichtigsten-tipps-fuer-ihre-embedded-software-architektur\/\" target=\"_blank\" rel=\"noopener\"><strong>hier<\/strong><\/a>\u00a0f\u00fcr Sie zum kostenfreien Download bereit.<\/p>\n<p><a title=\"Embedded Software Architektur Fachwissen\" href=\"https:\/\/www.microconsult.de\/die-7-wichtigsten-tipps-fuer-ihre-embedded-software-architektur\/\" target=\"_blank\" rel=\"noopener\"><strong>Zu den Fachinformationen<\/strong><\/a><\/p>\n<p><strong>Fachwissen zu weiteren Themen unseren Portfolios finden Sie <a title=\"Fachinformationen\" href=\"https:\/\/www.microconsult.de\/fachwissen\/\" target=\"_blank\" 
rel=\"noopener\">hier<\/a>.<\/strong><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Avoiding Risks Using New Memory Management Strategies Author: Steven Graves, McObject LLC Beitrag &#8211; Embedded Software Engineering Kongress 2015 Abstract Developers of fault-tolerant embedded systems must identify and eliminate possible failure points. Dynamic memory allocation is one key concern. A sound approach contributes to predictable and robust systems, while inattention can lead to instability, slow [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","inline_featured_image":false,"footnotes":""},"categories":[],"tags":[],"class_list":["post-8059","post","type-post","status-publish","format-standard","hentry"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v28.0 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Dynamic Memory Allocation: Justifiably Taboo? - MicroConsult Academy GmbH<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.microconsult.de\/en\/dynamic-memory-allocation-justifiably-taboo\/\" \/>\n<meta property=\"og:locale\" content=\"en_GB\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Dynamic Memory Allocation: Justifiably Taboo? - MicroConsult Academy GmbH\" \/>\n<meta property=\"og:description\" content=\"Avoiding Risks Using New Memory Management Strategies Author: Steven Graves, McObject LLC Beitrag &#8211; Embedded Software Engineering Kongress 2015 Abstract Developers of fault-tolerant embedded systems must identify and eliminate possible failure points. Dynamic memory allocation is one key concern. 
A sound approach contributes to predictable and robust systems, while inattention can lead to instability, slow [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.microconsult.de\/en\/dynamic-memory-allocation-justifiably-taboo\/\" \/>\n<meta property=\"og:site_name\" content=\"MicroConsult Academy GmbH\" \/>\n<meta property=\"article:published_time\" content=\"2017-01-08T06:51:03+00:00\" \/>\n<meta name=\"author\" content=\"weissblau media\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"weissblau media\" \/>\n\t<meta name=\"twitter:label2\" content=\"Estimated reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"29 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/\"},\"author\":{\"name\":\"weissblau media\",\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/#\\\/schema\\\/person\\\/b6d4c4ae959b068fbe8d9416ed019a0a\"},\"headline\":\"Dynamic Memory Allocation: Justifiably 
Taboo?\",\"datePublished\":\"2017-01-08T06:51:03+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/\"},\"wordCount\":5191,\"commentCount\":0,\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/\",\"url\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/\",\"name\":\"Dynamic Memory Allocation: Justifiably Taboo? - MicroConsult Academy GmbH\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/#website\"},\"datePublished\":\"2017-01-08T06:51:03+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/#\\\/schema\\\/person\\\/b6d4c4ae959b068fbe8d9416ed019a0a\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/#breadcrumb\"},\"inLanguage\":\"en-GB\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/dynamic-memory-allocation-justifiably-taboo\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.microconsult.de\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Dynamic Memory Allocation: Justifiably Taboo?\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/#website\",\"url\":\"https:\\\/\\\/www.microconsult.de\\\/\",\"name\":\"MicroConsult Academy GmbH\",\"description\":\"Professionelle Schulungen, Beratung und 
Projektunterst\u00fctzung\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.microconsult.de\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-GB\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.microconsult.de\\\/#\\\/schema\\\/person\\\/b6d4c4ae959b068fbe8d9416ed019a0a\",\"name\":\"weissblau media\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-GB\",\"@id\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/bbb409da4970da9446f6c49465d453cb8a0dae301e4d4f465b5c4e62408daa2e?s=96&d=mm&r=g\",\"url\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/bbb409da4970da9446f6c49465d453cb8a0dae301e4d4f465b5c4e62408daa2e?s=96&d=mm&r=g\",\"contentUrl\":\"https:\\\/\\\/secure.gravatar.com\\\/avatar\\\/bbb409da4970da9446f6c49465d453cb8a0dae301e4d4f465b5c4e62408daa2e?s=96&d=mm&r=g\",\"caption\":\"weissblau media\"},\"sameAs\":[\"https:\\\/\\\/www.microconsult.de\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Dynamic Memory Allocation: Justifiably Taboo? - MicroConsult Academy GmbH","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.microconsult.de\/en\/dynamic-memory-allocation-justifiably-taboo\/","og_locale":"en_GB","og_type":"article","og_title":"Dynamic Memory Allocation: Justifiably Taboo? - MicroConsult Academy GmbH","og_description":"Avoiding Risks Using New Memory Management Strategies Author: Steven Graves, McObject LLC Beitrag &#8211; Embedded Software Engineering Kongress 2015 Abstract Developers of fault-tolerant embedded systems must identify and eliminate possible failure points. Dynamic memory allocation is one key concern. 
A sound approach contributes to predictable and robust systems, while inattention can lead to instability, slow [&hellip;]","og_url":"https:\/\/www.microconsult.de\/en\/dynamic-memory-allocation-justifiably-taboo\/","og_site_name":"MicroConsult Academy GmbH","article_published_time":"2017-01-08T06:51:03+00:00","author":"weissblau media","twitter_card":"summary_large_image","twitter_misc":{"Written by":"weissblau media","Estimated reading time":"29 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/#article","isPartOf":{"@id":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/"},"author":{"name":"weissblau media","@id":"https:\/\/www.microconsult.de\/#\/schema\/person\/b6d4c4ae959b068fbe8d9416ed019a0a"},"headline":"Dynamic Memory Allocation: Justifiably Taboo?","datePublished":"2017-01-08T06:51:03+00:00","mainEntityOfPage":{"@id":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/"},"wordCount":5191,"commentCount":0,"inLanguage":"en-GB","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/","url":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/","name":"Dynamic Memory Allocation: Justifiably Taboo? 
- MicroConsult Academy GmbH","isPartOf":{"@id":"https:\/\/www.microconsult.de\/#website"},"datePublished":"2017-01-08T06:51:03+00:00","author":{"@id":"https:\/\/www.microconsult.de\/#\/schema\/person\/b6d4c4ae959b068fbe8d9416ed019a0a"},"breadcrumb":{"@id":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/#breadcrumb"},"inLanguage":"en-GB","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.microconsult.de\/dynamic-memory-allocation-justifiably-taboo\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.microconsult.de\/"},{"@type":"ListItem","position":2,"name":"Dynamic Memory Allocation: Justifiably Taboo?"}]},{"@type":"WebSite","@id":"https:\/\/www.microconsult.de\/#website","url":"https:\/\/www.microconsult.de\/","name":"MicroConsult Academy GmbH","description":"Professional training, consulting and project support","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.microconsult.de\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-GB"},{"@type":"Person","@id":"https:\/\/www.microconsult.de\/#\/schema\/person\/b6d4c4ae959b068fbe8d9416ed019a0a","name":"weissblau media","image":{"@type":"ImageObject","inLanguage":"en-GB","@id":"https:\/\/secure.gravatar.com\/avatar\/bbb409da4970da9446f6c49465d453cb8a0dae301e4d4f465b5c4e62408daa2e?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/bbb409da4970da9446f6c49465d453cb8a0dae301e4d4f465b5c4e62408daa2e?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/bbb409da4970da9446f6c49465d453cb8a0dae301e4d4f465b5c4e62408daa2e?s=96&d=mm&r=g","caption":"weissblau 
media"},"sameAs":["https:\/\/www.microconsult.de"]}]}},"post_mailing_queue_ids":[],"_links":{"self":[{"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/posts\/8059","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/comments?post=8059"}],"version-history":[{"count":6,"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/posts\/8059\/revisions"}],"predecessor-version":[{"id":11622,"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/posts\/8059\/revisions\/11622"}],"wp:attachment":[{"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/media?parent=8059"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/categories?post=8059"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.microconsult.de\/en\/wp-json\/wp\/v2\/tags?post=8059"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}