Optimizing the runtime linker
The runtime linker supports the following features that you can use to optimize the way it resolves and relocates symbols:
- Lazy binding
- RTLD_LAZY
- Lazy loading
The term lazy in all of them can cause confusion, so let's compare them briefly before looking at them in detail:
- Lazy binding is the process by which symbol resolution is deferred until a symbol is actually used.
- RTLD_LAZY indicates to the runtime linker that a loaded object might have unresolved symbols that it shouldn't worry about resolving. It's up to the developer to load the objects that define the symbols before calling any functions that use the symbols.
- Lazy loading modifies the lookup scope and avoids loading objects (or even looking them up) before the linker needs to search them for a symbol.
RTLD_LAZY doesn't imply anything about whether dependencies will be loaded; it says where a symbol will be looked up. When looking up a symbol in an RTLD_LAZY-opened object and its resolution scope fails, the linker can also find the symbol in objects that are subsequently opened with the RTLD_GLOBAL flag. The term resolution scope is intentional, since we can't tell what it is just by looking at RTLD_LAZY; it differs depending on whether you specify RTLD_WORLD, RTLD_LAZYLOAD, or both.
Lazy binding (also known as lazy linking or on-demand symbol resolution) is the process by which symbol resolution isn't done until a symbol is actually used. Functions can be bound on-demand, but data references can't.
All dynamically resolved functions are called via a Procedure Linkage Table (PLT) stub. A PLT stub uses relative addressing, using the Global Offset Table (GOT) to retrieve the offset. The PLT knows where the GOT is, and uses the offset to this table (determined at program linking time) to read the destination function's address and make a jump to it.
To be able to do that, the GOT must be populated with the appropriate addresses. Lazy binding is implemented by providing some stub code that gets called the first time a function call to a lazy-resolved symbol is made. This stub is responsible for setting up the necessary information for a binding function that the runtime linker provides. The stub code then jumps to it.
The binding function sets up the arguments for the resolving function, calls it, and then jumps to the address returned from the resolving function. The next time that user code calls this function, the PLT stub jumps directly to the resolved address, since the resolved value is now in the GOT. (The GOT is initially populated with the address of this special stub; the runtime linker does only a simple relocation for the load base.)
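The mechanism above can be imitated in plain C with a function pointer standing in for a GOT slot. This is only a toy model, not the runtime linker's code: the names got_square, lazy_stub, call_square, and resolve_count are hypothetical, and a real PLT stub is generated by the linker rather than written by hand.

```c
#include <assert.h>

typedef int (*func_t)(int);

/* Counts how many times the simulated resolver runs. */
static int resolve_count = 0;

/* The "library" function that the symbol finally resolves to. */
static int real_square(int x)
{
    return x * x;
}

static int lazy_stub(int x);

/* Simulated GOT slot: it initially holds the address of the stub. */
static func_t got_square = lazy_stub;

/* Stand-in for the stub plus binding function: resolve the symbol,
 * patch the GOT slot, and then jump to the resolved address. */
static int lazy_stub(int x)
{
    assert(got_square == lazy_stub);  /* runs only while the slot is unpatched */
    resolve_count++;
    got_square = real_square;         /* later calls bypass the stub */
    return got_square(x);
}

/* Stand-in for the PLT stub: an indirect jump through the GOT slot. */
static int call_square(int x)
{
    return got_square(x);
}
```

The first call to call_square() goes through lazy_stub() and pays the resolution cost; every later call jumps straight to real_square(), which mirrors the one-time delay described above.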
The semantics of lazy-bound (on-demand) and now-bound (at load time) programs are the same; an unresolvable symbol is fatal in both cases, and only the point of failure differs:
- In the bind-now case, the application fails to load if a symbol can't be resolved.
- In the lazy-bound case, it doesn't fail right away (since it didn't check to see if it could resolve all the symbols) but will still fail on the first call to an unresolved symbol. This doesn't change even if the application later calls dlopen() to load an object that defines that symbol, because the application can't change the resolution scope. The only exceptions to this rule are objects loaded using dlopen() with the RTLD_LAZY flag (see below).
Lazy binding is controlled by the -z option to the linker, ld. This option takes keywords as an argument; the keywords include (among others):
- lazy: When generating an executable or shared library, mark it to tell the dynamic linker to defer function-call resolution to the point when the function is called (lazy binding), rather than at load time.
- now: When generating an executable or shared library, mark it to tell the dynamic linker to resolve all symbols when the program is started, or when the shared library is linked to using dlopen(), instead of deferring function-call resolution to the point when the function is first called.
Lazy binding is the default. If you're using qcc (as we recommend), use the -W option to pass the -z option to ld. For example, specify -Wl,-zlazy or -Wl,-znow.
There are cases where the default lazy binding isn't desired. For example:
- While the system is under development, you might want to fully resolve all symbols right away, to catch library mismatches; your application would fail to load if a referenced function couldn't be resolved.
- You might want to fully resolve the symbols for a particular object at load time.
- You might want only a given program to be always bound right away.
There's a way to do each of these:
- To change the default lazy binding to the bind-now behavior for all processes started from a given shell, set the LD_BIND_NOW environment variable to a non-null value in that shell, for example with export LD_BIND_NOW=1.
By default, pdebug sets LD_BIND_NOW to 1.
Without LD_BIND_NOW, you see a different backtrace for the first function call into the shared object as the runtime linker resolves the symbol. On subsequent calls to the same function, the backtrace would be as expected. You can prevent pdebug from setting LD_BIND_NOW by specifying the -l (el) option.
- To override the binding strategy for a given shared object, link it with the -znow linker option:
qcc -Wl,-znow -o libfoo.so foo.o bar.o
- To override the binding for all objects of a given program, link the program's executable with the -znow option:
qcc -Wl,-znow -o foobar foobar.o -lfoo -lbar
To see if a binary was built with -znow, type:
readelf -d my_binary
The output will include the BIND_NOW dynamic tag if -znow was used when linking.
Applications with many symbols — typically C++ applications — benefit the most from lazy binding. For many C applications, the difference is negligible.
Lazy binding does introduce some overhead; it takes longer to resolve N symbols using lazy binding than with immediate resolution. There are two aspects that potentially save time or at least improve the user's perception of system performance:
- When you start an application, the runtime linker doesn't resolve all symbols, so you may expect to see the initial screen sooner, providing your initialization prior to displaying the screen doesn't end up calling most of the symbols anyway.
- When the application is running, many symbols won't be used and thus they aren't looked up.
Both of the above are typically true for C++ applications.
Lazy binding could affect realtime performance because there's a delay the first time you access each unresolved symbol, but this delay isn't likely to be significant, especially on fast machines. If this delay is a problem, use -znow.
It isn't sufficient to use -znow on the shared object that has a function definition for handling something critical; the whole process must be resolved now. For example, you should probably link driver executables with -znow or run drivers with LD_BIND_NOW.
RTLD_LAZY is a flag that you can pass to dlopen() when you load a shared object. Even though the word lazy in the name suggests that it's about lazy binding as described above in Lazy binding, it has different semantics. It makes (semantically) no difference whether a program is lazy-bound or now-bound, but for objects that you load with dlopen(), RTLD_LAZY means that there may be symbols that can't be resolved, and that the linker shouldn't try to resolve them until they're used. This flag currently applies only to function symbols, not data symbols.
What does it practically mean? To explain that, consider a system that comprises an executable X, and shared objects P (primary) and S (secondary). X uses dlopen() to load P, and P loads S. Let's assume that P has a reference to some_function(), and S has the definition of some_function().
If X opens P without RTLD_LAZY binding, the symbol some_function() doesn't get resolved, neither at load time nor later by opening S. However, if P is loaded with RTLD_LAZY | RTLD_WORLD, the runtime linker doesn't try to resolve the symbol some_function(), and there's an opportunity for us to call dlopen("S", RTLD_GLOBAL) before calling some_function(). This way, the some_function() reference in P will be satisfied by the definition of some_function() in S.
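The sequence above might look like the following from X's side. This is a sketch under assumptions: the helper name run_with_late_secondary(), the library paths, and the entry-point name passed to dlsym() are hypothetical, and RTLD_WORLD is defined as 0 when the platform's <dlfcn.h> doesn't provide it so that the code also compiles outside QNX. S is opened with RTLD_NOW | RTLD_GLOBAL because POSIX requires one of RTLD_LAZY or RTLD_NOW in the mode.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* RTLD_WORLD is QNX-specific; fall back to 0 elsewhere (assumption). */
#ifndef RTLD_WORLD
#define RTLD_WORLD 0
#endif

/* Open P lazily, make S's definitions globally visible, and then call
 * an entry point in P that in turn uses a symbol defined in S.
 * Returns 0 on success, or -1 if any step fails. */
int run_with_late_secondary(const char *p_path, const char *s_path,
                            const char *entry_name)
{
    assert(p_path != NULL && s_path != NULL && entry_name != NULL);

    /* P may contain function references that can't be resolved yet. */
    void *p = dlopen(p_path, RTLD_LAZY | RTLD_WORLD);
    if (p == NULL) {
        fprintf(stderr, "dlopen(P) failed: %s\n", dlerror());
        return -1;
    }

    /* Load S before anything in P calls the function that S defines. */
    void *s = dlopen(s_path, RTLD_NOW | RTLD_GLOBAL);
    if (s == NULL) {
        fprintf(stderr, "dlopen(S) failed: %s\n", dlerror());
        dlclose(p);
        return -1;
    }

    void (*entry)(void) = (void (*)(void))dlsym(p, entry_name);
    if (entry == NULL) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(s);
        dlclose(p);
        return -1;
    }

    entry();  /* the reference in P is bound on first use */

    dlclose(s);
    dlclose(p);
    return 0;
}
```

The ordering is the point of the sketch: S has to be opened before the first call that reaches some_function(), because that call is when the deferred resolution happens.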
There are several programming models made possible by RTLD_LAZY:
- X uses dlopen() to load P and calls a function in P; P determines its own requirements and loads the object with the appropriate implementation. For that, P needs to be opened with RTLD_LAZY. For example, the X server opens a video driver (P), and the video driver opens its own dependencies.
- X uses dlopen() to load P, and then determines the implementation that P needs to use (for example, P is a user interface, and S is the skin implementation).
Lazy dependency loading (or on-demand dependency loading) is a method of loading the required objects only when they're actually required. The most important effect of lazy loading is that the resolution scope is different for a lazy-load dependency. For a normal dependency, the resolution scope contains immediate dependencies followed by their dependencies, sorted in breadth-first order; for a lazy-loaded object, the resolution scope ends with its first-level dependencies. Therefore, all of the symbols that a lazy-loaded object references must be satisfied by definitions in its first-level dependencies.
Due to this difference, you must carefully consider whether lazy-load dependencies are suitable for your application.
Each dynamic object can have multiple dependencies. Dependencies can be immediate or implicit:
- Immediate dependencies are those that directly satisfy all external references of the object.
- Implicit dependencies are those that satisfy dependencies of the object's dependencies.
The ultimate dependent object is the executable binary itself, but we will consider any object that needs to resolve its external symbols to be dependent. When referring to immediate or implicit dependencies, we always view them from the point of view of the dependent object.
Here are some other terms:
- Lazy-load dependency: A dependency that isn't loaded immediately.
- Lookup scope/resolution scope: A list of objects where a symbol is looked for. The lookup scope is determined at the object's load time.
- Immediate and lazy symbol resolution: All symbolic references must be resolved. Some need to be resolved immediately, such as symbolic references to global data; these are referred to as immediate. Others, namely external function calls, can be resolved on first use; these are referred to as lazy.
To use lazy loading, specify the RTLD_LAZYLOAD flag when you call dlopen().
The runtime linker creates the link map for the executable in the usual way, by creating links for each DT_NEEDED object. Lazy dependencies are represented by a special link, a placeholder that doesn't refer to an actual object yet. It does, however, contain enough information for the runtime linker to look up the object and load it on demand.
The lookup scope for the dependent object and its regular dependencies is the link map, while for each lazy dependency symbol, the lookup scope gets determined on-demand, when the object is actually loaded. Its lookup scope is defined in the same way that we define the lookup scope for an object loaded with dlopen(RTLD_GROUP) (it's important that RTLD_WORLD not be specified, or else we'd be including all RTLD_GLOBAL objects in the lookup scope).
When a call to an external function is made from a dependent object, the lazy binding mechanism traverses the object's scope of resolution in the usual way. If we find the definition, we're done. If, however, we reach a link that refers to a not-yet-loaded dependency, we load the dependency and then search it for the definition. We repeat this process until either a definition is found, or we've traversed the entire dependency list. We don't traverse any of the implicit dependencies.
The same mechanism applies to resolving immediate relocations. If a dependent object has a reference to global data, and we don't find the definition of it in the currently loaded objects, we proceed to load the lazy dependencies, the same way as described above for resolving a function symbol. The difference is that this happens at the load time of the dependent object, not on first reference.
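The traversal described above can be modeled with a small array that stands in for the link map. This is a toy illustration only: struct link_entry, lookup(), and the library and symbol names used with it are hypothetical, and "loading" is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 4

/* One link in the simulated link map. */
struct link_entry {
    const char *name;            /* object name */
    int loaded;                  /* 0 = placeholder for a lazy dependency */
    const char *syms[MAX_SYMS];  /* NULL-terminated list of defined symbols */
};

/* Return 1 if the object defines the symbol, 0 otherwise. */
static int defines(const struct link_entry *e, const char *sym)
{
    for (size_t i = 0; i < MAX_SYMS && e->syms[i] != NULL; i++) {
        if (strcmp(e->syms[i], sym) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Walk the scope in order. A placeholder is "loaded" only when the
 * search reaches it, that is, when no earlier object had the definition.
 * Returns the name of the defining object, or NULL if none is found. */
const char *lookup(struct link_entry *map, size_t n, const char *sym)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!map[i].loaded) {
            map[i].loaded = 1;   /* load the lazy dependency on demand */
        }
        if (defines(&map[i], sym)) {
            return map[i].name;
        }
    }
    return NULL;
}
```

In this model, a lookup that is satisfied by an already loaded object leaves later placeholders untouched, while a lookup for a symbol that nobody defines ends up loading every lazy dependency before it fails.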
This approach preserves the symbol-overriding mechanisms provided by LD_PRELOAD.
Another important thing to note is that lazy-loaded dependencies change their own lookup scope; therefore, when resolving a function call from a lazy-loaded dependency, the lookup scope will be different than if the dependency was a normal dependency. As a consequence, lazy loading can't be transparent as, for example, lazy binding is (lazy binding doesn't change the lookup scope, only the time of the symbol lookup).
Diagnostics and debugging
When you're developing a complex application, it may become difficult to understand how the dynamic linker lays out the internal link maps and scopes of resolution. To help determine what exactly the dynamic linker is doing, you can use the DL_DEBUG environment variable to make the linker display diagnostic messages.
Diagnostic messages are categorized, and the value of DL_DEBUG determines which categories are displayed. The special category help doesn't produce diagnostic messages, but rather displays a help message and then terminates the application.
To redirect diagnostic messages to a file, set the LD_DEBUG_OUTPUT environment variable to the full path of the output file.
For security reasons, the use of LD_DEBUG_OUTPUT with setuid binaries is disabled.
The following environment variables affect the operation of the dynamic linker:
- DL_DEBUG: Display diagnostic messages.
The value can be a comma-separated list of the following:
- all — display all debug messages.
- help — display a help message, and then exit.
- reloc — display relocation processing messages.
- libs — display information about shared objects being opened.
- statistics — display runtime linker statistics.
- lazyload — print lazy-load debug messages.
- debug — print various runtime linker debug messages.
A value of 1 (one) is the same as all.
- LD_DEBUG: A synonym for DL_DEBUG; if you set both variables, DL_DEBUG takes precedence.
- LD_DEBUG_OUTPUT: The name of a file in which the dynamic linker writes its output.
By default, output is written to stderr.
For security reasons, the use of LD_DEBUG_OUTPUT with setuid binaries is disabled.
- LD_BIND_NOW: Affects lazy-load dependencies due to full symbol resolution. Typically, it forces the loading of all lazy-load dependencies (until all symbols have been resolved).
Last modified: 2014-12-11