Heap analysis: making memory errors a thing of the past

If you develop a program that dynamically allocates memory, you're also responsible for tracking any memory that you allocate whenever a task is performed, and for releasing that memory when it's no longer required. If you fail to track the memory correctly, you may introduce memory leaks or unintentionally write to an area outside of the memory space. Conventional debugging techniques usually prove to be ineffective for locating the source of corruption or leak because memory-related errors typically manifest themselves in an unrelated part of the program. Tracking down an error in a multithreaded environment becomes even more complicated because the threads all share the same memory address space.

We'll describe how BlackBerry 10 OS manages the heap and introduce you to a special version of our memory management functions that will help you to diagnose your memory management problems.

Dynamic memory management

In a program, you'll dynamically request memory buffers or blocks of a particular size from the runtime environment using malloc(), realloc(), or calloc(), and then you'll release them back to the runtime environment when they're no longer required using free().

The memory allocator ensures that your requests are satisfied by managing a region of the program's memory area known as the heap. In this heap, it tracks all of the information — such as the size of the original block — about the blocks and heap buffers that it has allocated to your program, in order that it can make the memory available to you during subsequent allocation requests. When a block is released, it places it on a list of available blocks called a free list. It usually keeps the information about a block in the header that precedes the block itself in memory.

The runtime environment grows the size of the heap when it no longer has enough memory available to satisfy allocation requests, and it returns memory from the heap to the system when the program releases memory.

The basic heap allocation mechanism is broken up into two separate pieces, a chunk-based small block allocator and a list-based large block allocator. By configuring specific parameters, you can select the various sizes for the chunks in the chunk allocator and also the boundary between the small and the large allocator.

Arena allocations

Both the small and the large portions of the allocator allocate and deallocate memory from the system in the form of arena chunks by calling mmap() and munmap(). By default, the arena allocations are performed in 32 KB chunks. This value is specified by one of the following:

  • A global variable that's defined in the allocator, but can be redefined or modified in the application
  • The _amblksiz global variable

This value must be a multiple of 4 KB, and currently is limited to being less than 256 KB. You can also configure this parameter by setting the MALLOC_ARENA_SIZE environment variable, or by calling mallopt() with MALLOC_ARENA_SIZE as the command.

For example, if you want to change the arena size to 16 KB, do one of the following:

  • _amblksiz = 16384;
  • export MALLOC_ARENA_SIZE=16384
  • mallopt(MALLOC_ARENA_SIZE, 16384);

Environment variables are checked only at program startup, but changing them is the easiest way of configuring parameters for the allocator so that these parameters are used for allocations that occur before main().

The allocator also attempts to cache recently used arena blocks. This cache is shared between the small- and the large-block allocator. You can configure the arena cache by setting the following environment variables:

MALLOC_ARENA_CACHE_MAXBLK
The number of cached arena blocks.
MALLOC_ARENA_CACHE_MAXSZ
The total size of the cached arena blocks.

Alternatively, you can call:

mallopt(MALLOC_ARENA_CACHE_MAXSZ, size);
mallopt(MALLOC_ARENA_CACHE_MAXBLK, number);

You can tell the allocator to never release memory back to the system from its arena cache by setting the environment variable:

export MALLOC_MEMORY_HOLD=1

or by calling:

mallopt(MALLOC_MEMORY_HOLD, 1);

Once you've changed the values by using mallopt() for either MALLOC_ARENA_CACHE_MAXSZ or MALLOC_ARENA_CACHE_MAXBLK, you must call mallopt() to cause the arena cache to be adjusted immediately:

// Adjust the cache to the current parameters or
// release all cache blocks, but don't change parameters
mallopt(MALLOC_ARENA_CACHE_FREE_NOW, 0);
mallopt(MALLOC_ARENA_CACHE_FREE_NOW, 1);

Without a call with a command of MALLOC_ARENA_CACHE_FREE_NOW, the changes made to the cache parameters will take effect whenever memory is subsequently released to the cache.

So for example, if the arena cache currently has 10 blocks, for a total size of say 320 KB, and if you change the arena parameters to MALLOC_ARENA_CACHE_MAXBLK = 5 and MALLOC_ARENA_CACHE_MAXSZ = 200 KB, an immediate call to mallopt(MALLOC_ARENA_CACHE_FREE_NOW, 0) will reduce the cache so that the number of blocks is no more than 5, and the total cache size is no more than 320 KB. If you don't make the call to mallopt(), then no immediate changes are made. If the application frees some memory, causing a new arena of size 32 KB to get released to the system, this will not be cached, but will be released to the system immediately.

You can use MALLOC_ARENA_CACHE_MAXSZ and MALLOC_ARENA_CACHE_MAXBLK either together or independently. A value of zero is ignored.

You can preallocate and populate the arena cache by setting the MALLOC_MEMORY_PREALLOCATE environment variable to a value that specifies the size of the total arena cache. The cache is populated by multiple arena allocation calls in chunks whose size is specified by the value of MALLOC_ARENA_SIZE.

The preallocation option doesn't alter the MALLOC_ARENA_CACHE_MAXBLK and MALLOC_ARENA_CACHE_MAXSZ options. So if you preallocate 10 MB of memory in cache blocks, to ensure that this memory stays in the application throughout the lifetime of the application, you should also set the values of MALLOC_ARENA_CACHE_MAXBLK and MALLOC_ARENA_CACHE_MAXSZ to something appropriate.

Small block configuration

You configure the small blocks by setting various bands of different sizes. Each band defines a fixed size block, and a number that describes the size of the pool for that size. The allocator initially adjusts all band sizes to be multiples of _MALLOC_ALIGN (which is 8), and then takes the size of the pools and normalizes them so that each band pool is constructed from a pool size of 4 KB.

By default, bands in the allocator are defined as:

  • _MALLOC_ALIGN × 2 = 16
  • _MALLOC_ALIGN × 3 = 24
  • _MALLOC_ALIGN × 4 = 32
  • _MALLOC_ALIGN × 6 = 48
  • _MALLOC_ALIGN × 8 = 64
  • _MALLOC_ALIGN × 10 = 80
  • _MALLOC_ALIGN × 12 = 96
  • _MALLOC_ALIGN × 16 = 128

so the smallest small block is 16 bytes, and the largest small block is 128 bytes. Allocations larger than the largest band size are serviced by the large allocator.

After initial normalization by the allocator, the band sizes and the pool sizes are adjusted to the following:

Band size Number of items
16 167
24 125
32 100
48 71
64 55
80 45
96 38
128 28

This normalization takes into account alignment restrictions and overhead needed by the allocator to manage these blocks. The number of items is the number of blocks of the given size that are created each time a new bucket is allocated.

You can specify you own band configurations by defining the following in your application's code:

typedef struct Block Block;
typedef struct Band Band;

struct Band {
  short nbpe; /* element size */
  short nalloc; /* elements per block */
  size_t slurp;
  size_t esize;
  size_t mem;
  size_t rem;
  unsigned nalloc_stats;
  Block * alist;  /* Blocks that have data to allocate */
  Block * dlist;  /* completely allocated (depleted) Blocks */
  unsigned    blk_alloced;    /* #blocks allocated */
  unsigned    blk_freed;      /* #blocks freed */
  unsigned    alloc_counter;  /* allocs */
  unsigned    free_counter;   /* frees */
  unsigned    blk_size;     /* size of allocated blocks */
};

static Band a1 = { _MALLOC_ALIGN*2, 32, 60};
static Band a2 = { _MALLOC_ALIGN*3, 32, 60};
static Band a3 = { _MALLOC_ALIGN*4, 32, 60};
static Band a4 = { _MALLOC_ALIGN*5, 24, 60};
static Band a5 = { _MALLOC_ALIGN*6, 24, 60};
static Band a6 = { _MALLOC_ALIGN*7, 24, 60};
static Band a7 = { _MALLOC_ALIGN*8, 16, 60};
static Band a8 = { _MALLOC_ALIGN*9, 8, 60};
static Band a9 = { _MALLOC_ALIGN*10, 8, 60};
static Band a10 = { _MALLOC_ALIGN*11, 8, 60};
static Band a11 = { _MALLOC_ALIGN*12, 8, 60};
static Band a12 = { _MALLOC_ALIGN*13, 8, 60};
static Band a13 = { _MALLOC_ALIGN*32, 10, 60};
Band *__dynamic_Bands[] = { &a1, &a2, &a3, &a4, &a5, &a6,
                            &a7, &a8, &a9, &a10, &a11, &a12,
                            &a13,
                          };
unsigned __dynamic_nband=13;

The main variables are *__dynamic_Bands[] and __dynamic_Bands, which specify the band configurations and the number of bands. For example, the following line:

static Band a9 = { _MALLOC_ALIGN*10, 8, 60};

specifies a band size of 80 bytes, with each chunk having at least 8 blocks, and a preallocation value of 60. The allocator first normalizes the band size to 80, and the number of items to 45. Then during initialization of the allocator, it preallocates at least 60 blocks of this size band. (Each bucket will have 45 blocks, so 60 blocks will be constructed from two buckets).

If you specify your own bands:

  • The sizes must all be distinct.
  • The band configuration must be provided in ascending order of sizes (that is, band 0 size < band 1 size < band 2 size, and so on).

A user-specified band configuration of:

static Band a1 = { 2, 32, 60};
static Band a2 = { 15, 32, 60};
static Band a3 = { 29, 32, 60};
static Band a4 = { 55, 24, 60};
static Band a5 = { 100, 24, 60};
static Band a6 = { 130, 24, 60};
static Band a7 = { 260, 8, 60};
static Band a8 = { 600, 4, 60};
Band *__dynamic_Bands[] = {&a1, &a2, &a3, &a4,
                           &a5, &a6, &a7, &a8, 
                          };
unsigned __dynamic_nband=8;

will be normalized to:

Band size Number of items
8 251
16 167
32 100
56 62
104 35
136 27
264 13
600 5

For the above configuration, allocations larger than 600 bytes will be serviced by the large block allocator.

When used in conjunction with the MALLOC_MEMORY_PREALLOCATE option for the arena cache, the preallocations of blocks in bands are performed by initially populating the arena cache, and then allocating bands from this arena cache.

You can also configure the bands by using the MALLOC_BAND_CONFIG_STR environment variable. The string format is:

N:s1,n1,p1:s2,n2,p2:s3,n3,p3: ... :sN,nN,pN

where the components are:

s
The band size.
n
The number of items.
p
The preallocation value, which can be zero.

You must specify s, n, and p for each band. The string can't include any spaces; the only valid characters are digits, colons (:), and commas (,). Position is important. The parsing is simple and strict: sizes are assumed to be provided in ascending order, further validation is done by the allocator. If the allocator doesn't like the string, it ignores it completely.

Last modified: 2014-12-11



Got questions about leaving a comment? Get answers from our Disqus FAQ.

comments powered by Disqus