Memory management

While some realtime kernels or executives provide support for memory protection in the development environment, few provide protected memory support for the runtime configuration, citing penalties in memory and performance as reasons. But with memory protection becoming common on many embedded processors, the benefits of memory protection far outweigh the very small penalties in performance for enabling it. The key advantage gained by adding memory protection to embedded applications, especially for mission-critical systems, is improved robustness.

With memory protection, if one of the processes executing in a multitasking environment attempts to access memory that hasn't been explicitly declared or allocated for the type of access attempted, the MMU hardware can notify the OS, which can then abort the thread (at the failing/offending instruction).

This protects process address spaces from each other, preventing coding errors in a thread in one process from damaging memory used by threads in other processes or even in the OS. This protection is useful both for development and for the installed runtime system, because it makes postmortem analysis possible.

During development, common coding errors (for example, stray pointers and indexing beyond array bounds) can result in one process/thread accidentally overwriting the data space of another process. If the overwriting touches memory that isn't referenced again until much later, you can spend hours of debugging—often using in-circuit emulators and logic analyzers—in an attempt to find the guilty party.

With an MMU enabled, the OS can abort the process the instant the memory-access violation occurs, providing immediate feedback to the programmer instead of mysteriously crashing the system some time later. The OS can then provide the location of the errant instruction in the failed process, or position a symbolic debugger directly on this instruction.
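To make this concrete, here's a minimal POSIX sketch (not specific to any one OS; the function names `run_and_report` and `stray_pointer` are illustrative) showing how a stray-pointer write is caught by the MMU the instant it happens, rather than corrupting memory silently:

```c
#include <signal.h>
#include <stddef.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn in a child process; return the signal that killed it, or 0
 * if it ran to completion. */
static int run_and_report(void (*fn)(void))
{
    pid_t pid = fork();
    if (pid == 0) {        /* child: execute the buggy code */
        fn();
        _exit(0);          /* reached only if no fault occurred */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* A classic stray-pointer bug: write through an unmapped address. */
static void stray_pointer(void)
{
    volatile int *p = NULL;
    *p = 42;               /* MMU fault: page 0 isn't mapped writable */
}
```

With memory protection enabled, `run_and_report(stray_pointer)` reports that the child died on SIGSEGV at the offending instruction; without an MMU, the same write could silently corrupt whatever happened to live at that address.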

Memory management units (MMUs)

A typical MMU operates by dividing physical memory into a number of 4-KB pages. The hardware within the processor then uses a set of page tables stored in system memory that define the mapping of the virtual addresses emitted by the CPU (that is, the memory addresses used within the application program) to the physical addresses of the memory attached to the processor. While the thread executes, the page tables managed by the OS control how these virtual addresses are mapped onto physical memory.

Diagram showing the Intel Memory Management Unit.

For a large address space with many processes and threads, the number of page-table entries needed to describe these mappings can be significant—more than can be stored within the processor. To maintain performance, the processor caches frequently used portions of the external page tables within a TLB (translation look-aside buffer).

The servicing of misses on the TLB cache is part of the overhead imposed by enabling the MMU. Our OS uses various clever page-table arrangements to minimize this overhead.

Associated with these page tables are bits that define the attributes of each page of memory. Pages can be marked as read-only, read-write, and so on. Typically, the memory of an executing process would be described with read-only pages for code, and read-write pages for the data and stack.
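The translation and attribute check can be sketched with a toy, single-level page table (the names `translate`, `pte_t`, and the flag values are illustrative, not any real MMU's layout; real MMUs use multi-level tables in hardware):

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                     /* 4-KB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define PTE_VALID  0x1u                   /* page is mapped */
#define PTE_WRITE  0x2u                   /* page is writable */

typedef struct { uint32_t frame; uint32_t flags; } pte_t;

/* Translate vaddr using a flat table indexed by virtual page number.
 * Returns false to signal an MMU fault: either an unmapped page or an
 * attribute violation (e.g. a write to a read-only code page). */
static bool translate(const pte_t *table, uint32_t vaddr, bool is_write,
                      uint32_t *paddr)
{
    const pte_t *pte = &table[vaddr >> PAGE_SHIFT];
    if (!(pte->flags & PTE_VALID))
        return false;                     /* unmapped: access fault */
    if (is_write && !(pte->flags & PTE_WRITE))
        return false;                     /* write to read-only page */
    *paddr = (pte->frame << PAGE_SHIFT) | (vaddr & (PAGE_SIZE - 1));
    return true;
}
```

A code page would be installed with `PTE_VALID` only, a data or stack page with `PTE_VALID | PTE_WRITE`; any access that fails the check becomes the fault described below.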

When the OS performs a context switch (that is, suspends the execution of one thread and resumes another), it manipulates the MMU to use a potentially different set of page tables for the newly resumed thread. If the OS is switching between threads within a single process, no MMU manipulations are necessary.

When the new thread resumes execution, any addresses generated as the thread runs are mapped to physical memory through the assigned page tables. If the thread tries to use an address not mapped to it, or it tries to use an address in a way that violates the defined attributes (for example, writing to a read-only page), the CPU receives a fault (similar to a divide-by-zero error), typically implemented as a special type of interrupt.

By examining the instruction pointer pushed on the stack by the interrupt, the OS can determine the address of the instruction that caused the memory-access fault within the thread/process and can act accordingly.

Memory protection at run time

While memory protection is useful during development, it can also provide greater reliability for embedded systems installed in the field. Many embedded systems already employ a hardware watchdog timer to detect if the software or hardware has lost its mind, but this approach lacks the finesse of an MMU-assisted watchdog. Hardware watchdog timers are usually implemented as a retriggerable monostable timer attached to the processor reset line. If the system software doesn't strobe the hardware timer regularly, the timer expires and forces a processor reset. Typically, some component of the system software checks for system integrity and strobes the timer hardware to indicate the system is sane.

Although this approach enables recovery from a lockup related to a software or hardware glitch, it results in a complete system restart and perhaps significant downtime while this restart occurs.

Software watchdog

When an intermittent software error occurs in a memory-protected system, the OS can catch the event and pass control to a user-written thread instead of the memory dump facilities. This thread can make an intelligent decision about how best to recover from the failure, instead of forcing a full reset as the hardware watchdog timer would do. The software watchdog could:

  • Abort the process that failed due to a memory access violation and simply restart that process without shutting down the rest of the system.
  • Abort the failed process and any related processes, initialize the hardware to a safe state, and then restart the related processes in a coordinated manner.
  • If the failure is very critical, perform a coordinated shutdown of the entire system and sound an audible alarm.

The important distinction here is that we retain intelligent, programmed control of the embedded system, even though various processes and threads within the control software may have failed for various reasons. A hardware watchdog timer is still of use to recover from hardware latch-ups, but for software failures we now have much better control.
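The first recovery strategy above, restarting just the failed process, can be sketched with a minimal POSIX supervisor (the names `supervise` and `crashy` are illustrative; a real software watchdog would also reinitialize hardware and coordinate related processes):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Run task in a child process; if it dies on a signal (e.g. a memory-
 * access fault), restart it, up to max_restarts times. Returns the
 * number of restarts performed. */
static int supervise(void (*task)(void), int max_restarts)
{
    int restarts = 0;
    for (;;) {
        pid_t pid = fork();
        if (pid == 0) {
            task();
            _exit(0);                 /* clean completion */
        }
        int status;
        waitpid(pid, &status, 0);
        if (WIFSIGNALED(status) && restarts < max_restarts) {
            restarts++;               /* abnormal death: restart it */
            continue;
        }
        return restarts;              /* clean exit, or gave up */
    }
}

/* A task that always dies on a memory-access fault. */
static void crashy(void)
{
    volatile int *p = (int *)1;       /* unmapped address */
    *p = 0;
}
```

The rest of the system keeps running while the failed process is restarted; only if the restart budget is exhausted would the supervisor escalate to a coordinated shutdown.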

While performing some variation of these recovery strategies, the system can also collect information about the nature of the software failure. For example, if the embedded system contains or has access to some mass storage (flash memory, hard drive, a network link to another computer with disk storage), the software watchdog can generate a chronologically archived sequence of dump files. These dump files could then be used for postmortem diagnostics.
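A chronologically archived sequence of dump files needs only a sortable naming scheme; a small sketch (the name `dump_file_name` and the format are assumptions, not any OS's actual convention):

```c
#include <stdio.h>
#include <time.h>

/* Build a chronologically sortable dump-file name, e.g.
 * "dump-19700101-000000.core" for the Unix epoch (UTC). */
static void dump_file_name(char *buf, size_t len, time_t when)
{
    struct tm tm;
    gmtime_r(&when, &tm);
    strftime(buf, len, "dump-%Y%m%d-%H%M%S.core", &tm);
}
```

Because the names sort lexically in time order, the watchdog can also prune the oldest files to bound the space used on flash or disk.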

Embedded control systems often employ these partial restart approaches to surviving intermittent software failures without the operators experiencing any system downtime or even being aware of these quick-recovery software failures. Since the dump files are available, the developers of the software can detect and correct software problems without having to deal with the emergencies that result when critical systems fail at inconvenient times. If we compare this to the hardware watchdog timer approach and the prolonged interruptions in service that result, it's obvious what our preference is!

Postmortem dump-file analysis is especially important for mission-critical embedded systems. Whenever a critical system fails in the field, significant effort should be made to identify the cause of the failure so that a fix can be engineered and applied to other systems before they experience similar failures.

Dump files give programmers the information they need to fix the problem—without them, programmers may have little more to go on than a customer's cryptic complaint that the system crashed.

Quality control

By dividing embedded software into a team of cooperating, memory-protected processes (containing threads), we can readily treat these processes as components to be used again in new projects. Because of the explicitly defined (and hardware-enforced) interfaces, these processes can be integrated into applications with confidence that they won't disrupt the system's overall reliability. In addition, because the exact binary image (not just the source code) of the process is being reused, we can better control changes and instabilities that might have resulted from recompilation of source code, relinking, new versions of development tools, header files, library routines, and so on. Since the binary image of the process is reused (with its behavior perhaps modified by command-line options), the confidence gained from field experience with that binary module carries over to new applications more easily than it would if the binary image were changed.

As much as we strive to produce error-free code for the systems we deploy, the reality of software-intensive embedded systems is that programming errors end up in released products. Rather than pretend these bugs don't exist (until the customer calls to report them), we should adopt a mission-critical mindset. Systems should be designed to be tolerant of, and able to recover from, software faults. Making use of the memory protection delivered by integrated MMUs in the embedded systems we build is a good step in that direction.

Full-protection model

Our full-protection model relocates all code in the image into a new virtual space, enabling the MMU hardware and setting up the initial page-table mappings. This allows procnto to start in a correct, MMU-enabled environment. The process manager then takes over this environment, changing the mapping tables as needed by the processes it starts.

Private virtual memory

In the full-protection model, each process is given its own private virtual memory, which spans 2 or 3.5 gigabytes (depending on the CPU). This is accomplished by using the CPU's MMU. The performance cost for a process switch and a message pass increases due to the increased complexity of obtaining addressability between two completely private address spaces.

Diagram showing full protection virtual memory on an X86.

The memory cost per process may increase by 4 KB to 8 KB for each process's page tables. Note that this memory model supports the POSIX fork() call.

Variable page size

The virtual memory manager may use variable page sizes if the processor supports them and there's a benefit to doing so. Using a variable page size can improve performance because:

  • You can increase the page size beyond 4 KB. As a result, the system uses fewer TLB entries.
  • There are fewer TLB misses.
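The saving is easy to quantify (a small sketch; `tlb_entries` is an illustrative name): mapping a 2-MB region takes 512 entries with 4-KB pages but only 1 entry with a single 2-MB page.

```c
#include <stdint.h>

/* Number of TLB entries needed to map a region with a given page size. */
static uint64_t tlb_entries(uint64_t region_bytes, uint64_t page_bytes)
{
    return (region_bytes + page_bytes - 1) / page_bytes;  /* round up */
}
```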

If you want to disable the variable page size feature, specify the -m~v option to procnto in your buildfile. The -mv option enables it.

Locking memory

The BlackBerry 10 OS supports POSIX memory locking, so that a process can avoid the latency of fetching a page of memory by locking the page so that it's memory-resident (that is, it remains in physical memory). The levels of locking are as follows:

Unlocked: the memory can be paged in and out. Memory is allocated when it's mapped, but page table entries aren't created. The first attempt to access the memory fails, and the thread stays in the WAITPAGE state while the memory manager initializes the memory and creates the page table entries.

Failure to initialize the page results in the receipt of a SIGBUS signal.

Locked: the memory may not be paged in or out. Page faults can still occur on access or reference, to maintain usage and modification statistics. Pages that you expect to be PROT_WRITE are still actually PROT_READ, so that on the first write the kernel can be alerted that a MAP_PRIVATE page now differs from the shared backing store and must be privatized.

To lock and unlock a portion of a thread's memory, call mlock() and munlock(); to lock and unlock all of a thread's memory, call mlockall() and munlockall(). The memory remains locked until the process unlocks it, exits, or calls an exec*() function. If the process calls fork(), a posix_spawn*() function, or a spawn*() function, the memory locks are released in the child process.
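A minimal POSIX sketch of locking a single page (the name `lock_one_page` is illustrative; on most systems the call can fail with EPERM or ENOMEM if RLIMIT_MEMLOCK is too low):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Lock one page of a buffer so it stays resident; returns 0 on success
 * or an errno value on failure. */
static int lock_one_page(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf;
    if (posix_memalign(&buf, (size_t)page, (size_t)page) != 0)
        return ENOMEM;

    int rc = 0;
    if (mlock(buf, (size_t)page) == 0) {
        memset(buf, 0, (size_t)page);   /* touch: no page fault now */
        munlock(buf, (size_t)page);
    } else {
        rc = errno;
    }
    free(buf);
    return rc;
}
```

`mlockall(MCL_CURRENT | MCL_FUTURE)` extends the same guarantee to the whole address space, including future mappings.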

More than one process can lock the same (or overlapping) region; the memory remains locked until all the processes have unlocked it. Memory locks don't stack; if a process locks the same region more than once, unlocking it once undoes all of the process's locks on that region.

To lock all memory for all applications, specify the -ml option for procnto. This ensures that all pages are at least initialized (though they may still be set only to PROT_READ).

Superlocked (a BlackBerry 10 OS extension): no faulting is allowed at all; all memory must be initialized and privatized, and its permissions set, as soon as the memory is mapped. Superlocking covers the thread's whole address space.

To superlock memory, obtain I/O privileges by:

  1. Enabling the PROCMGR_AID_IO ability. For more information, see procmgr_ability().
  2. Calling ThreadCtl(), specifying the _NTO_TCTL_IO flag:

    ThreadCtl( _NTO_TCTL_IO, 0 );

To superlock all memory for all applications, specify the -mL option for procnto.

For MAP_LAZY mappings, memory isn't allocated or mapped until the memory is first referenced for any of the above types. When it's been referenced, it obeys the above rules—it's a programmer error to touch a MAP_LAZY area in a critical region (where interrupts are disabled or in an ISR) that hasn't already been referenced.

Last modified: 2015-05-07
