Shared memory offers the highest bandwidth IPC available. When a shared-memory object is created, processes with access to the object can use pointers to directly read and write into it. This means that access to shared memory is in itself unsynchronized. If a process is updating an area of shared memory, care must be taken to prevent another process from reading or updating the same area. Even in the simple case of a read, the other process may get information that is in flux and inconsistent.
To solve these problems, shared memory is often used in conjunction with one of the synchronization primitives to make updates atomic between processes. If the granularity of updates is small, then the synchronization primitives themselves limit the inherently high bandwidth of using shared memory. Shared memory is therefore most efficient when used for updating large amounts of data as a block.
Both semaphores and mutexes are suitable synchronization primitives for use with shared memory. Semaphores were introduced with the POSIX realtime standard for interprocess synchronization. Mutexes were introduced with the POSIX threads standard for thread synchronization. Mutexes may also be used between threads in different processes. POSIX considers this an optional capability; we support it. In general, mutexes are more efficient than semaphores.
Shared memory with message passing
Shared memory and message passing can be combined to provide IPC that offers:
- Very high performance (shared memory)
- Synchronization (message passing)
- Network transparency (message passing)
Using message passing, a client sends a request to a server and blocks. The server receives the messages in priority order from clients, processes them, and replies when it can satisfy a request. At this point, the client is unblocked and continues. The very act of sending messages provides natural synchronization between the client and the server. Rather than copy all the data through the message pass, the message can contain a reference to a shared-memory region, so the server could read or write the data directly. This is best explained with a simple example.
Let's assume a graphics server accepts draw image requests from clients and renders them into a frame buffer on a graphics card. Using message passing alone, the client would send a message containing the image data to the server. This would result in a copy of the image data from the client's address space to the server's address space. The server would then render the image and issue a short reply.
If the client didn't send the image data inline with the message, but instead sent a reference to a shared-memory region that contained the image data, then the server could access the client's data directly.
Since the client is blocked on the server as a result of sending it a message, the server knows that the data in shared memory is stable and does not change until the server replies. This combination of message passing and shared memory achieves natural synchronization and very high performance.
This model of operation can also be reversed—the server can generate data and give it to a client. For example, suppose a client sends a message to a server that reads video data directly from a CD-ROM into a shared memory buffer provided by the client. The client is blocked on the server while the shared memory is being changed. When the server replies and the client continues, the shared memory is stable for the client to access. This type of design can be pipelined using more than one shared-memory region.
Simple shared memory can't be used between processes on different computers connected via a network. Message passing, on the other hand, is network transparent. A server could use shared memory for local clients and full message passing of the data for remote clients. This allows you to provide a high-performance server that is also network transparent.
In practice, the message-passing primitives are more than fast enough for the majority of IPC needs. The added complexity of a combined approach need only be considered for special applications with very high bandwidth.
Creating a shared-memory object
Multiple threads within a process share the memory of that process. To share memory between processes, you must first create a shared-memory region and then map that region into your process's address space. Shared-memory regions are created and manipulated using the following calls:
| Function | Description | Classification |
| --- | --- | --- |
| shm_open() | Open (or create) a shared-memory region. | POSIX |
| close() | Close a shared-memory region. | POSIX |
| mmap() | Map a shared-memory region into a process's address space. | POSIX |
| munmap() | Unmap a shared-memory region from a process's address space. | POSIX |
| munmap_flags() | Unmap previously mapped addresses, exercising more control than is possible with munmap(). | BlackBerry 10 OS |
| mprotect() | Change protections on a shared-memory region. | POSIX |
| msync() | Synchronize memory with physical storage. | POSIX |
| shm_ctl(), shm_ctl_special() | Give special attributes to a shared-memory object. | BlackBerry 10 OS |
| shm_unlink() | Remove a shared-memory region. | POSIX |
POSIX shared memory is implemented in the BlackBerry 10 OS via the process manager (procnto). The above calls are implemented as messages to procnto (see Process manager).
The shm_open() function takes the same arguments as open() and returns a file descriptor to the object. As with a regular file, this function lets you create a new shared-memory object or open an existing shared-memory object.
You must open the file descriptor for reading; if you want to write in the memory object, you also need write access, unless you specify a private (MAP_PRIVATE) mapping.
When a new shared-memory object is created, the size of the object is set to zero. To set the size, you use ftruncate()—the very same function used to set the size of a file—or shm_ctl().
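The create, size, and map sequence can be sketched as follows. The object name "/demo_shm" and the helper name create_and_map() are made up for this example:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create (or open) a shared-memory object, set its size with ftruncate(),
 * and map it into the caller's address space. */
char *create_and_map(const char *name, size_t len)
{
    /* shm_open() takes the same arguments as open(). A newly created
     * object has size zero. */
    int fd = shm_open(name, O_RDWR | O_CREAT, 0600);
    if (fd == -1)
        return NULL;

    /* ftruncate() sets the size, exactly as it would for a regular file. */
    if (ftruncate(fd, (off_t)len) == -1) {
        close(fd);
        return NULL;
    }

    char *addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping remains valid after the fd is closed */
    return addr == MAP_FAILED ? NULL : addr;
}
```

Another process that opens "/demo_shm" and maps it sees the same underlying memory; shm_unlink() removes the name when it's no longer needed.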
When you have a file descriptor to a shared-memory object, you use the mmap() function to map the object, or part of it, into your process's address space. The mmap() function is the cornerstone of memory management within BlackBerry 10 OS and deserves a detailed discussion of its capabilities.
You can also use mmap() to map files and typed memory objects into your process's address space.
The mmap() function is defined as follows:
void * mmap( void *where_i_want_it,
             size_t length,
             int memory_protections,
             int mapping_flags,
             int fd,
             off_t offset_within_shared_memory );
In simple terms this says: Map in length bytes of shared memory at offset_within_shared_memory in the shared-memory object associated with fd.
The mmap() function tries to place the memory at the address where_i_want_it in your address space. The memory is given the protections specified by memory_protections and the mapping is done according to the mapping_flags.
The three arguments fd, offset_within_shared_memory, and length define a portion of a particular shared object to be mapped in. It's common to map in an entire shared object, in which case the offset is zero and the length is the size of the shared object in bytes. On an Intel processor, the length is a multiple of the page size, which is 4096 bytes.
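Since mapped lengths are always whole pages, a portable program can query the page size with sysconf() and round a requested length up to it. This small helper (round_to_page() is an illustrative name) shows the arithmetic:

```c
#include <stddef.h>
#include <unistd.h>

/* Round len up to a whole number of pages, as mmap() effectively does.
 * On an Intel processor the page size is 4096 bytes, but querying
 * sysconf() keeps the code portable. */
size_t round_to_page(size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    return (len + page - 1) / page * page;
}
```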
The return value of mmap() is the address in your process's address space where the object was mapped. The argument where_i_want_it is used as a hint by the system to where you want the object placed. If possible, the object is placed at the address requested. Most applications specify an address of zero, which gives the system free rein to place the object where it wants.
The following protection types may be specified for memory_protections:
| Protection | Description |
| --- | --- |
| PROT_EXEC | Memory may be executed. |
| PROT_NOCACHE | Memory should not be cached. |
| PROT_NONE | No access allowed. |
| PROT_READ | Memory may be read. |
| PROT_WRITE | Memory may be written. |
You should use the PROT_NOCACHE manifest when you're using a shared-memory region to gain access to dual-ported memory that may be modified by hardware (for example, a video frame buffer or a memory-mapped network or communications board). Without this manifest, the processor may return stale data from a previously cached read.
The mapping_flags determine how the memory is mapped. These flags are broken down into two parts—the first part is a type and must be specified as one of the following:
| Type | Description |
| --- | --- |
| MAP_SHARED | The mapping may be shared by many processes; changes are propagated back to the underlying object. |
| MAP_PRIVATE | The mapping is private to the calling process; changes aren't propagated back to the underlying object. The mmap() function allocates system RAM and makes a copy of the object. |
The MAP_SHARED type is the one to use for setting up shared memory between processes; MAP_PRIVATE has more specialized uses.
You can OR a number of flags into the above type to further define the mapping. These are described in detail in mmap(). A few of the more interesting flags are:
- MAP_ANON: Map anonymous memory that isn't associated with any file descriptor; you must set the fd parameter to NOFD. The mmap() function allocates the memory and, by default, fills the allocated memory with zeros; see Initializing allocated memory, below. You commonly use MAP_ANON with MAP_PRIVATE, but you can use it with MAP_SHARED to create a shared memory area for forked applications. You can use MAP_ANON as the basis for a page-level memory allocator.
- MAP_FIXED: Map the object to the address specified by where_i_want_it. If a shared-memory region contains pointers within it, then you may need to force the region to the same address in all processes that map it. You can avoid this by using offsets within the region in place of direct pointers.
- MAP_PHYS: This flag indicates that you want to deal with physical memory. The fd parameter should be set to NOFD. When used without MAP_ANON, offset_within_shared_memory specifies the exact physical address to map (for example, for video frame buffers). If used with MAP_ANON, then physically contiguous memory is allocated (for example, for a DMA buffer). You can use MAP_NOX64K and MAP_BELOW16M to further define the MAP_ANON allocated memory and address limitations present in some forms of DMA. You should use mmap_device_memory() instead of MAP_PHYS, unless you're allocating physically contiguous memory.
- MAP_NOX64K: Used with MAP_PHYS | MAP_ANON. The allocated memory area doesn't cross a 64 KB boundary, as required by the old 16-bit PC DMA.
- MAP_BELOW16M: Used with MAP_PHYS | MAP_ANON. The allocated memory area resides in physical memory below 16 MB, which is necessary when using DMA with ISA bus devices.
- MAP_NOINIT: Relax the POSIX requirement to zero the allocated memory; see Initializing allocated memory, below.
Using the mapping flags described above, a process can easily share memory between processes:
/* Map in a shared memory region */
fd = shm_open("datapoints", O_RDWR);
addr = mmap(0, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
or allocate a DMA buffer for a bus-mastering PCI network card:
/* Allocate a physically contiguous buffer */
addr = mmap(0, 262144, PROT_READ|PROT_WRITE|PROT_NOCACHE,
            MAP_PHYS|MAP_ANON, NOFD, 0);
You can unmap all or part of a shared-memory object from your address space using munmap(). This primitive isn't restricted to unmapping shared memory—it can be used to unmap any region of memory within your process. When used in conjunction with the MAP_ANON flag to mmap(), you can easily implement a private page-level allocator/deallocator.
You can change the protections on a mapped region of memory using mprotect(). Like munmap(), mprotect() isn't restricted to shared-memory regions—it can change the protection on any region of memory within your process.
Initializing allocated memory
POSIX requires that mmap() zero any memory that it allocates. Initializing the memory can take a while, so BlackBerry 10 OS provides a way to relax this POSIX requirement. Relaxing it allows for faster startup, but can be a security problem. Avoiding the initialization requires the cooperation of the process doing the unmapping and the process doing the next mapping:
The munmap_flags() function is a non-POSIX function that's similar to munmap(), but lets you control what happens when the memory is next mapped:

int munmap_flags( void *addr, size_t len, unsigned flags );
If you specify a flags argument of 0, munmap_flags() behaves the same as munmap() does.
The following bits in the flags argument control the clearing of memory on allocation:

- UNMAP_INIT_REQUIRED: POSIX initialization of the page to all zeroes is required the next time the underlying physical memory is allocated.
- UNMAP_INIT_OPTIONAL: Initialization of the underlying physical memory to zeroes on its next allocation is optional.

If you specify the MAP_NOINIT flag to mmap(), and the physical memory being mapped was previously unmapped with UNMAP_INIT_OPTIONAL, the POSIX requirement that the memory be zeroed is relaxed.
By default, the kernel initializes the memory, but you can control this by using the -m option to procnto. The argument to this option is a string that lets you enable or disable aspects of the memory manager:

- i: munmap() acts as if UNMAP_INIT_REQUIRED were specified.
- ~i: munmap() acts as if UNMAP_INIT_OPTIONAL were specified.
By default, when memory is freed for later reuse, the contents of that memory remain untouched; whatever the application that owned the memory left behind stays intact until the next time that memory is allocated by another process. The munmap_flags() function lets you control this clearing when you unmap the memory.
You can also use the -m option to procnto to control the default behavior:
- c: Clear memory when it's freed.
- ~c: Don't clear memory when it's freed (the default). The contents remain untouched until the next time that memory is allocated by another process; at that point, before the memory is handed to the new process, it's zeroed.
Last modified: 2015-05-07