Synchronization services

The BlackBerry 10 OS provides the POSIX-standard thread-level synchronization primitives, some of which are useful even between threads in different processes. The synchronization services include at least the following:

Synchronization service   Supported between processes   Supported across a BlackBerry 10 OS LAN
Mutexes                   Yes                           No
Condvars                  Yes                           No
Barriers                  No                            No
Sleepon locks             No                            No
Reader/writer locks       Yes                           No
Semaphores                Yes                           Yes (named only)
FIFO scheduling           Yes                           No
Send/Receive/Reply        Yes                           Yes
Atomic operations         Yes                           No

The above synchronization primitives are implemented directly by the kernel, except for:

  • Barriers, sleepon locks, and reader/writer locks (which are built from mutexes and condvars)
  • Atomic operations (which are either implemented directly by the processor or emulated in the kernel)

You should allocate mutexes, condvars, barriers, reader/writer locks, and semaphores, as well as objects you plan to use atomic operations on, only in normal memory mappings. On certain processors, atomic operations and calls such as pthread_mutex_lock() cause a fault if the object is allocated in uncached memory.

Mutexes: mutual exclusion locks

Mutual exclusion locks, or mutexes, are the simplest of the synchronization services. A mutex is used to ensure exclusive access to data shared between threads. A mutex is typically acquired (pthread_mutex_lock() or pthread_mutex_timedlock()) and released (pthread_mutex_unlock()) around the code that accesses the shared data (usually a critical section).

Only one thread may have the mutex locked at any given time. Threads attempting to lock an already locked mutex block until the thread that owns the mutex unlocks it. When the owner unlocks the mutex, the highest-priority thread waiting to lock it unblocks and becomes the new owner. In this way, threads sequence through a critical region in priority order.

On most processors, acquiring a free mutex doesn't require entry to the kernel. What allows this is the use of the compare-and-swap opcode on x86 processors and the load/store conditional opcodes on most RISC processors.

Entry to the kernel is done at acquisition time only if the mutex is already held so that the thread can go on a blocked list; kernel entry is done on exit if other threads are waiting to be unblocked on that mutex. This allows acquisition and release of an uncontested critical section or resource to be very quick, incurring work by the OS only to resolve contention.

A nonblocking lock function (pthread_mutex_trylock()) can be used to test whether the mutex is currently locked. For best performance, the execution time of the critical section should be small and of bounded duration. A condvar should be used if the thread may block within the critical section.
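
Here's a minimal sketch of a mutex protecting shared data (shared_count and the helper functions are illustrative names, not part of the OS API):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_count;                    /* data shared between threads */

void increment (void)
{
    pthread_mutex_lock (&lock);             /* enter the critical section */
    shared_count++;                         /* exclusive access is guaranteed */
    pthread_mutex_unlock (&lock);           /* highest-priority waiter runs next */
}

int try_increment (void)
{
    if (pthread_mutex_trylock (&lock) == 0) {   /* nonblocking attempt */
        shared_count++;
        pthread_mutex_unlock (&lock);
        return 1;                           /* got the lock */
    }
    return 0;                               /* mutex was busy; we didn't block */
}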

Priority inheritance and mutexes

By default, if a thread with a higher priority than the mutex owner attempts to lock a mutex, then the effective priority of the current owner is increased to that of the higher-priority blocked thread waiting for the mutex. The current owner's effective priority is again adjusted when it unlocks the mutex; its new priority is the maximum of its own priority and the priorities of those threads it still blocks, either directly or indirectly. This scheme not only ensures that the higher-priority thread is blocked waiting for the mutex for the shortest possible time, but also solves the classic priority-inversion problem.

The pthread_mutexattr_init() function sets the protocol to PTHREAD_PRIO_INHERIT to allow this behavior; you can call pthread_mutexattr_setprotocol() to override this setting. The pthread_mutex_trylock() function doesn't change the thread priorities because it doesn't block.

You can also modify the attributes of the mutex (using pthread_mutexattr_settype()) to allow a mutex to be recursively locked by the same thread. This can be useful to allow a thread to call a routine that might attempt to lock a mutex that the thread already happens to have locked.
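
For example, here's a sketch of creating such a mutex (recursive_mutex and init_recursive_mutex() are illustrative names):

#include <pthread.h>

pthread_mutex_t recursive_mutex;

void init_recursive_mutex (void)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init (&attr);     /* protocol defaults to PTHREAD_PRIO_INHERIT */
    pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutex_init (&recursive_mutex, &attr);
}

/* The owning thread may now lock recursive_mutex again without deadlocking;
   each pthread_mutex_lock() must still be balanced by a pthread_mutex_unlock(). */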

Condition variables

A condition variable, or condvar, is used to block a thread within a critical section until some condition is satisfied. The condition can be arbitrarily complex and is independent of the condvar. However, the condvar must always be used with a mutex lock to implement a monitor. A condvar supports three operations:

  • wait (pthread_cond_wait())
  • signal (pthread_cond_signal())
  • broadcast (pthread_cond_broadcast())

Note that there's no connection between a condvar signal and a POSIX signal.

Here's a typical example of how a condvar can be used:

pthread_mutex_lock( &m );

/* ... */

while (!arbitrary_condition) {
    pthread_cond_wait( &cv, &m );
}

/* ... */

pthread_mutex_unlock( &m );

In this code sample, the mutex is acquired before the condition is tested. This ensures that only this thread has access to the arbitrary condition being examined. While the condition being waited for remains false, the code sample blocks on the wait call until some other thread performs a signal or broadcast on the condvar.

The while loop is required for two reasons. First, POSIX doesn't guarantee that spurious wakeups won't occur (for example, on multiprocessor systems). Second, when another thread has modified the condition, we need to retest to ensure that the modification matches our criteria. The associated mutex is unlocked atomically by pthread_cond_wait() when the waiting thread is blocked, allowing another thread to enter the critical section.

A thread that performs a signal unblocks the highest-priority thread queued on the condvar, while a broadcast unblocks all threads queued on the condvar. The associated mutex is locked atomically by the highest-priority unblocked thread; the thread must then unlock the mutex after proceeding through the critical section.
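
Here's the corresponding signaling side, using the same illustrative names (m, cv, and arbitrary_condition) as the sample above:

pthread_mutex_lock( &m );

arbitrary_condition = 1;        /* make the condition true... */
pthread_cond_signal( &cv );     /* ...and wake the highest-priority waiter */

pthread_mutex_unlock( &m );     /* the unblocked thread now reacquires m */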

A version of the condvar wait operation allows a timeout to be specified (pthread_cond_timedwait()). The waiting thread can then be unblocked when the timeout expires.

An application shouldn't use a PTHREAD_MUTEX_RECURSIVE mutex with condition variables because the implicit unlock performed for a pthread_cond_wait() or pthread_cond_timedwait() may not actually release the mutex (if it's been locked multiple times). If this happens, no other thread can satisfy the condition of the predicate.

Barriers

A barrier is a synchronization mechanism that lets you corral several cooperating threads (for example, in a matrix computation), forcing them to wait at a specific point until all have finished before any one thread can continue. Unlike the pthread_join() function, where you would wait for the threads to terminate, in the case of a barrier you are waiting for the threads to rendezvous at a certain point. When the specified number of threads arrive at the barrier, we unblock all of them so they can continue to run.

You first create a barrier with pthread_barrier_init():

#include <pthread.h>

int
pthread_barrier_init (pthread_barrier_t *barrier,
                      const pthread_barrierattr_t *attr,
                      unsigned int count);

This creates a barrier object at the passed address (barrier points to the barrier object), with the attributes specified by attr. The count argument is the number of threads that must call pthread_barrier_wait().

Once the barrier is created, each thread calls pthread_barrier_wait() to indicate that it has completed:

#include <pthread.h>

int pthread_barrier_wait (pthread_barrier_t *barrier);

When a thread calls pthread_barrier_wait(), it blocks until the number of threads specified initially in the pthread_barrier_init() function have called pthread_barrier_wait() (and blocked also). When the correct number of threads have called pthread_barrier_wait(), all those threads unblock at the same time.

Here's an example:

/*
 *  barrier1.c
 */

#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <time.h>
#include <pthread.h>
#include <sys/neutrino.h>

/* Barrier synchronization object */
pthread_barrier_t   barrier;

void *
thread1 (void *not_used)
{
    time_t  now;

    time (&now);
    printf ("thread1 starting at %s", ctime (&now));

    /* Do the computation
       Let's just do a sleep here... */
    sleep (20);
    pthread_barrier_wait (&barrier);

    /* After this point, all three threads have completed */
    time (&now);
    printf ("barrier in thread1() done at %s", ctime (&now));
    return NULL;
}

void *
thread2 (void *not_used)
{
    time_t  now;

    time (&now);
    printf ("thread2 starting at %s", ctime (&now));

    /* Do the computation
       Let's just do a sleep here... */
    sleep (40);
    pthread_barrier_wait (&barrier);

    /* After this point, all three threads have completed */
    time (&now);
    printf ("barrier in thread2() done at %s", ctime (&now));
    return NULL;
}

/* Ignore any arguments to main() */
int main ()
{
    time_t  now;

    /* Create a barrier object with a count of 3 */
    pthread_barrier_init (&barrier, NULL, 3);

    /* Start up two threads, thread1 and thread2 */
    pthread_create (NULL, NULL, thread1, NULL);
    pthread_create (NULL, NULL, thread2, NULL);

    /* At this point, thread1 and thread2 are running */

    /* Now wait for completion */
    time (&now);
    printf ("main() waiting for barrier at %s", ctime (&now));
    pthread_barrier_wait (&barrier);

    /* After this point, all three threads have completed */
    time (&now);
    printf ("barrier in main() done at %s", ctime (&now));
    pthread_exit( NULL );
    return (EXIT_SUCCESS);
}

The main thread creates the barrier object and initializes it with a count of the total number of threads that must synchronize at the barrier before any of them may carry on. In the example above, we used a count of 3: one for the main() thread, one for thread1(), and one for thread2().

Then we start thread1() and thread2(). To simplify this example, we have the threads sleep to cause a delay, as if computations were occurring. To synchronize, the main thread simply blocks itself on the barrier, knowing that the barrier unblocks only after the two worker threads have joined it as well.

In this release, the following barrier functions are included:

Function                           Description
pthread_barrierattr_getpshared()   Get the value of a barrier's process-shared attribute
pthread_barrierattr_destroy()      Destroy a barrier's attributes object
pthread_barrierattr_init()         Initialize a barrier's attributes object
pthread_barrierattr_setpshared()   Set the value of a barrier's process-shared attribute
pthread_barrier_destroy()          Destroy a barrier
pthread_barrier_init()             Initialize a barrier
pthread_barrier_wait()             Synchronize participating threads at the barrier

Sleepon locks

Sleepon locks are very similar to condvars, with a few subtle differences. Like condvars, sleepon locks (pthread_sleepon_lock()) can be used to block until a condition becomes true (like a memory location changing value). But unlike condvars, which must be allocated for each condition to be checked, sleepon locks multiplex their functionality over a single mutex and dynamically allocated condvar, regardless of the number of conditions being checked. The maximum number of condvars ends up being equal to the maximum number of blocked threads. These locks are patterned after the sleepon locks commonly used within the UNIX kernel.
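
Here's a sketch of the condvar example recast with sleepon locks (data_ready, consumer, and producer are illustrative names):

#include <pthread.h>

volatile int data_ready;                    /* the condition being waited on */

void *consumer (void *not_used)
{
    pthread_sleepon_lock ();                /* acquire the single shared mutex */
    while (!data_ready) {
        pthread_sleepon_wait (&data_ready); /* block on the condition's address */
    }
    /* ... consume the data ... */
    pthread_sleepon_unlock ();
    return NULL;
}

void *producer (void *not_used)
{
    pthread_sleepon_lock ();
    data_ready = 1;
    pthread_sleepon_signal (&data_ready);   /* wake a thread waiting on this address */
    pthread_sleepon_unlock ();
    return NULL;
}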

Reader/writer locks

More formally known as multiple readers, single writer locks, these locks are used when the access pattern for a data structure consists of many threads reading the data, and (at most) one thread writing the data. These locks are more expensive than mutexes, but can be useful for this data access pattern. This lock works by allowing all the threads that request a read-access lock (pthread_rwlock_rdlock()) to succeed in their request. But when a thread wishing to write asks for the lock (pthread_rwlock_wrlock()), the request is denied until all the current reading threads release their reading locks (pthread_rwlock_unlock()).

Multiple writing threads can queue (in priority order) waiting for their chance to write the protected data structure, and all the blocked writer-threads get to run before reading threads are allowed access again. The priorities of the reading threads are not considered.

There are also calls (pthread_rwlock_tryrdlock() and pthread_rwlock_trywrlock()) that allow a thread to attempt to acquire the requested lock without blocking. These calls return with a successful lock or a status indicating that the lock couldn't be granted immediately.

Reader/writer locks aren't implemented directly within the kernel, but are instead built from the mutex and condvar services provided by the kernel.
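
Here's a minimal sketch of this access pattern (table_lock, lookup_table, and the two functions are illustrative names):

#include <pthread.h>

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int lookup_table[64];

int read_entry (int i)
{
    int value;

    pthread_rwlock_rdlock (&table_lock);    /* many readers may hold the lock at once */
    value = lookup_table[i];
    pthread_rwlock_unlock (&table_lock);
    return value;
}

void write_entry (int i, int value)
{
    pthread_rwlock_wrlock (&table_lock);    /* waits until all readers release */
    lookup_table[i] = value;
    pthread_rwlock_unlock (&table_lock);
}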

Semaphores

Semaphores are another common form of synchronization that allows threads to post and wait on a semaphore to control when threads wake or sleep. The post (sem_post()) operation increments the semaphore; the wait (sem_wait()) operation decrements it.

If you wait on a semaphore that's positive, you don't block. Waiting on a nonpositive semaphore blocks until some other thread executes a post. It's valid to post one or more times before a wait; this allows one or more threads to execute the wait without blocking.

A significant difference between semaphores and other synchronization primitives is that semaphores are async safe and can be manipulated by signal handlers. If the desired effect is to have a signal handler wake a thread, semaphores are the right choice.
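
For example, here's a sketch of a signal handler waking a thread (the choice of SIGUSR1 is arbitrary; the retry loop is the one shown at the end of this section):

#include <errno.h>
#include <semaphore.h>
#include <signal.h>
#include <stdio.h>

static sem_t event_sem;

/* sem_post() is async safe, so it may be called from a handler */
static void handler (int signo)
{
    sem_post (&event_sem);          /* wake the waiting thread */
}

int main (void)
{
    sem_init (&event_sem, 0, 0);    /* unnamed, thread-shared, initially 0 */
    signal (SIGUSR1, handler);

    /* retry if the signal itself interrupted the wait */
    while (sem_wait (&event_sem) && (errno == EINTR)) { }
    printf ("signal received\n");
    return 0;
}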

Note that in general, mutexes are much faster than semaphores, which always require a kernel entry. Semaphores don't affect a thread's effective priority; if you need priority inheritance, use a mutex. For more information, see Mutexes: mutual exclusion locks.

Another useful property of semaphores is that they were defined to operate between processes. Although our mutexes work between processes, the POSIX thread standard considers this an optional capability and as such may not be portable across systems. For synchronization between threads in a single process, mutexes are more efficient than semaphores.

As a useful variation, a named semaphore service is also available. It lets you use semaphores between processes on different machines connected by a network.

Note that named semaphores are slower than the unnamed variety.
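
Here's a sketch of two processes sharing a named semaphore (the name "/my_sem" and the two functions are illustrative):

#include <fcntl.h>
#include <semaphore.h>

/* Process A: create the semaphore and wait on it */
void waiter (void)
{
    sem_t *s = sem_open ("/my_sem", O_CREAT, 0666, 0);  /* initial count of 0 */
    sem_wait (s);           /* blocks until some process posts */
    sem_close (s);
    sem_unlink ("/my_sem"); /* remove the name when done */
}

/* Process B: open the same semaphore by name and post to it */
void poster (void)
{
    sem_t *s = sem_open ("/my_sem", 0);
    sem_post (s);           /* unblocks the waiter */
    sem_close (s);
}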

Since semaphores, like condition variables, can legally return a nonzero value because of a false wakeup (for example, when a signal interrupts the wait), correct usage requires a loop:

while (sem_wait(&s) && (errno == EINTR)) { do_nothing(); }
do_critical_region();   /* Semaphore was decremented */

Synchronization via scheduling policy

By selecting the POSIX FIFO scheduling policy, we can guarantee that no two threads of the same priority execute the critical section concurrently on a non-SMP system. The FIFO scheduling policy dictates that all FIFO-scheduled threads in the system at the same priority run, when scheduled, until they voluntarily release the processor to another thread.

This release can also occur when the thread blocks as part of requesting the service of another process, or when a signal occurs. The critical region must therefore be carefully coded and documented so that later maintenance of the code doesn't violate this condition.

In addition, higher-priority threads in that (or any other) process could still preempt these FIFO-scheduled threads. So, all the threads that could collide within the critical section must be FIFO-scheduled at the same priority. Having enforced this condition, the threads can then casually access this shared memory without having to first make explicit synchronization calls.

This exclusive-access relationship doesn't apply in multiprocessor systems, since each CPU could run a thread simultaneously through the region that would otherwise be serially scheduled on a single-processor machine.
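
Here's a sketch of creating such a thread (worker, start_fifo_thread(), and the priority argument are illustrative names):

#include <pthread.h>
#include <sched.h>

extern void *worker (void *arg);    /* an illustrative thread function */

void start_fifo_thread (int priority)
{
    pthread_attr_t      attr;
    struct sched_param  param;

    pthread_attr_init (&attr);
    /* use the policy set below rather than inheriting the creator's */
    pthread_attr_setinheritsched (&attr, PTHREAD_EXPLICIT_SCHED);
    pthread_attr_setschedpolicy (&attr, SCHED_FIFO);
    param.sched_priority = priority;    /* same priority for all colliding threads */
    pthread_attr_setschedparam (&attr, &param);

    pthread_create (NULL, &attr, worker, NULL);
}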

Synchronization via message passing

Our Send/Receive/Reply message-passing IPC services (described later) implement an implicit synchronization by their blocking nature. These IPC services can, in many instances, render other synchronization services unnecessary. They are also the only synchronization and IPC primitives (other than named semaphores, which are built on top of messaging) that can be used across the network.
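
Here's a bare-bones local sketch of the blocking send/receive/reply cycle (discovery of server_pid and chid is omitted; the calls themselves are described in detail later):

#include <errno.h>
#include <sys/types.h>
#include <sys/neutrino.h>

/* Server: create a channel and service one message */
void server (void)
{
    char msg[64];
    int  chid  = ChannelCreate (0);
    int  rcvid = MsgReceive (chid, msg, sizeof (msg), NULL); /* blocks for a sender */
    /* ... process msg ... */
    MsgReply (rcvid, EOK, "done", 5);   /* unblocks the sender */
}

/* Client: attach to the channel; MsgSend() blocks until the reply arrives */
void client (pid_t server_pid, int chid)
{
    char reply[64];
    int  coid = ConnectAttach (0, server_pid, chid, _NTO_SIDE_CHANNEL, 0);
    MsgSend (coid, "hello", 6, reply, sizeof (reply));
}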

Synchronization via atomic operations

In some cases, you may want to perform a short operation (such as incrementing a variable) with the guarantee that the operation executes atomically, that is, without being preempted by another thread or ISR (Interrupt Service Routine). The BlackBerry 10 OS provides atomic operations for:

  • Adding a value
  • Subtracting a value
  • Clearing bits
  • Setting bits
  • Toggling (complementing) bits

These atomic operations are available by including the C header file <atomic.h>.
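
Here's a sketch exercising each of the operations listed above (flags, counter, and update() are illustrative names):

#include <atomic.h>

volatile unsigned flags;
volatile unsigned counter;

void update (void)
{
    /* each call completes without preemption by another thread or ISR */
    atomic_add (&counter, 1);       /* add a value */
    atomic_sub (&counter, 1);       /* subtract a value */
    atomic_set (&flags, 0x01);      /* set bits */
    atomic_clr (&flags, 0x02);      /* clear bits */
    atomic_toggle (&flags, 0x04);   /* toggle (complement) bits */
}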

Although you can use these atomic operations just about anywhere, these two cases are particularly useful:

  • Between an ISR and a thread
  • Between two threads (SMP or single-processor)

Since an ISR can preempt a thread at any given point, the only way that the thread would be able to protect itself would be to disable interrupts. Since you should avoid disabling interrupts in a realtime system, you should use the atomic operations provided with BlackBerry 10 OS.

On an SMP system, multiple threads can and do run concurrently. Again, we run into the same situation as with interrupts above—you should use the atomic operations where applicable to eliminate the need to disable and reenable interrupts.

Synchronization services implementation

The following table lists the various microkernel calls and the higher-level POSIX calls constructed from them:

Microkernel call      POSIX call                                                       Description
SyncTypeCreate()      pthread_mutex_init(), pthread_cond_init(), sem_init()            Create object
SyncDestroy()         pthread_mutex_destroy(), pthread_cond_destroy(), sem_destroy()   Destroy object
SyncCondvarWait()     pthread_cond_wait(), pthread_cond_timedwait()                    Block on a condvar
SyncCondvarSignal()   pthread_cond_broadcast(), pthread_cond_signal()                  Wake up condvar-blocked threads
SyncMutexLock()       pthread_mutex_lock(), pthread_mutex_trylock()                    Lock a mutex
SyncMutexUnlock()     pthread_mutex_unlock()                                           Unlock a mutex
SyncSemPost()         sem_post()                                                       Post a semaphore
SyncSemWait()         sem_wait(), sem_trywait()                                        Wait on a semaphore
