Best practices: OpenGL ES

This section will outline best practices for your OpenGL ES application.

Best practices: Performance

The performance of your application matters because you don't want your users to wait too long. Performance varies depending on the hardware available to you. Keep in mind that you're targeting an embedded device, where memory and battery power are limited compared with those of a desktop computer.

Profile your application

Profiling your application lets you determine where it is bottlenecked, and eliminating those bottlenecks can improve its performance. The Momentics IDE for BlackBerry provides tools to help you profile your applications. For more information on profiling your application, check out Analyze allocation patterns.

Use the smallest amount of memory possible

  • Always free memory once you're done with it. For example, after linking your shaders to a program object, delete the shader objects, and free vertex data that you no longer need.
  • If you don't need all your resources at one time, separate them into subsets. For example, if your application has levels, separate the resources you need for each level and load the resources when you need them.
  • Make sure your data types use exactly what's needed. So, if you expect values to range from 0 to 255, use an unsigned char instead of an int.
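
To make the last point concrete, here is a minimal C sketch that compares the storage needed for an RGBA image when each channel is an unsigned char and when each channel is an int. The function names and the image size used in the note below are illustrative assumptions; they are not part of OpenGL ES or any BlackBerry API.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for a width x height RGBA image with 1-byte channels. */
size_t rgba_bytes_uchar(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return width * height * 4 * sizeof(unsigned char);
}

/* The same image, but with each channel stored in an int. */
size_t rgba_bytes_int(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return width * height * 4 * sizeof(int);
}
```

For a 256 x 256 image, the unsigned char version needs 262144 bytes. The int version needs sizeof(int) times as much, which is four times as much on platforms where an int is 4 bytes.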

Use simple lighting models

Lighting requires a fair number of calculations, so using it only when necessary can benefit performance. You can also calculate your lighting colors early in your application, and then store them in a texture to sample.

Minimize the number of state settings and draw calls

Setting OpenGL ES state values many times between drawing calls reduces the performance of your application. Try to avoid these redundant calls by holding a copy of the current state settings. Generally, you want to:
  • Set each OpenGL ES state at most once between drawing calls
  • Set only the state that affects the next drawing call
  • Set a state only when the new value is different from the current value
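
One way to hold a copy of the current state settings is a small cache that forwards a call to the driver only when the value changes. The following C sketch assumes integer state identifiers and values. StateCache, cache_set_state(), and driver_set_state() are hypothetical names, and driver_set_state() is only a stand-in for a real call such as glEnable() or glBlendFunc().

```c
#include <assert.h>
#include <stdbool.h>

enum { STATE_COUNT = 4 };          /* hypothetical number of tracked states */

typedef struct {
    bool known[STATE_COUNT];       /* false until the state has been set once */
    int  value[STATE_COUNT];       /* last value sent to the driver */
    int  driver_calls;             /* number of calls that reached the driver */
} StateCache;

/* Stand-in for a real state-setting call; here it only counts calls. */
static void driver_set_state(StateCache *cache, int state, int value)
{
    (void)state;
    (void)value;
    cache->driver_calls++;
}

/* Sets a state only if the requested value differs from the cached one. */
void cache_set_state(StateCache *cache, int state, int value)
{
    assert(state >= 0 && state < STATE_COUNT);
    if (cache->known[state] && cache->value[state] == value) {
        return;                    /* redundant: skip the driver call */
    }
    driver_set_state(cache, state, value);
    cache->known[state] = true;
    cache->value[state] = value;
}
```

Start from a zero-initialized StateCache so that the first request for each state always reaches the driver. The cache stays valid only if every state change goes through cache_set_state().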

Every time you call OpenGL ES drawing commands, the CPU prepares them for processing on the GPU. You can reduce this CPU work by batching your draw calls. For example, to draw a simple 2-D square, use a single triangle strip instead of two separate triangles, because the strip needs fewer vertices.
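
The saving can be counted directly. In this C sketch, a list of n separate triangles submits 3n vertices, while a strip of n connected triangles submits n + 2. The function names and the square_strip array are illustrative assumptions.

```c
#include <assert.h>

/* Vertices submitted for n separate triangles (a triangle list). */
unsigned list_vertex_count(unsigned triangles)
{
    return 3u * triangles;
}

/* Vertices submitted for n connected triangles in a single strip. */
unsigned strip_vertex_count(unsigned triangles)
{
    assert(triangles > 0);
    return triangles + 2u;
}

/* A unit square as a 4-vertex strip: (x, y) pairs in strip order. */
const float square_strip[4][2] = {
    {0.0f, 0.0f}, {1.0f, 0.0f}, {0.0f, 1.0f}, {1.0f, 1.0f}
};
```

For the square, the two triangles need 6 vertices as a list but only 4 as a strip.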

Best practices: Application design

When designing your application, there are some guidelines to follow:
  • Use parallelism when appropriate
  • Manage the flow of control from the application to the GPU
  • No two devices are identical, so you should choose a target device or target devices, and set a benchmark for its performance
  • You should access the default frame buffer only using OpenGL ES. Some GPUs use deferred rendering, so not all your drawing commands are executed immediately. The commands are put into a queue and executed as needed. Do not access the default frame buffer from the CPU, since this flushes the drawing commands that are in the queue, and then your application will have to wait for all the commands to finish
  • Specify the fixed frame rate that you want to target, because the smoothest animations are run at a constant rate
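
For the fixed frame rate in the last guideline, a rough C sketch of the arithmetic looks like this. It assumes that your application measures how long each frame's work took; frame_budget_ms() and frame_wait_ms() are hypothetical helper names. In practice, the buffer swap may already pace your application to the display refresh rate.

```c
#include <assert.h>

/* Milliseconds available per frame at the target frame rate. */
double frame_budget_ms(int target_fps)
{
    assert(target_fps > 0);
    return 1000.0 / target_fps;
}

/* Time left to wait after the frame's work took elapsed_ms; 0 if over budget. */
double frame_wait_ms(int target_fps, double elapsed_ms)
{
    double remaining = frame_budget_ms(target_fps) - elapsed_ms;
    return remaining > 0.0 ? remaining : 0.0;
}
```

If frame_wait_ms() often returns 0, the target frame rate is probably too high for the target device.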

Flush the OpenGL ES command queue sparingly

Generally, you should avoid flushing operations. Because rendering is deferred, not all drawing commands are executed immediately. The glFlush() function forces everything in the queue to be submitted for execution, and glFinish() also waits for everything in the queue to finish, which is a time-consuming operation. Similarly, if you query OpenGL ES states using glGet*() or glGetError(), the driver may have to execute all queued drawing commands so that the state variables are in the correct state. This means you should avoid making these calls mid-frame; instead, make them at the start or end of frame rendering.

Always use double buffering

You should attach at least two buffers to every application window to avoid flickering, tearing, and other artifacts. The windowing system supports single-buffered windows, but visual glitches are likely to occur because the application renders to the same buffer that the compositor reads from, and these glitches can degrade the user experience.

Double buffering also allows you to prepare your next frame while the previous frame is displayed. Your application renders to the back buffer and the compositor renders the front buffer. Double buffering can also help avoid resource conflicts when your application and OpenGL ES access the same object.

Use OpenGL ES objects when possible

OpenGL ES lets you store many types of data persistently in objects. If you use OpenGL ES objects to store your data, OpenGL ES can reduce the overhead of transforming the data and sending it to the GPU. If that data is used multiple times, this can significantly improve your application's performance.

Best practices: Vertices

Generally, your application configures the graphics pipeline and submits the primitive elements that you want to draw. Regardless of which primitive elements you use or how you configure your pipeline, your application provides vertex data to OpenGL ES. A vertex consists of one or more attributes, such as the position, color, or texture coordinates. OpenGL ES 2.0 and 3.0 allow you to define custom vertex attributes, whereas the OpenGL ES 1.1 API uses attributes that are defined in the fixed-function pipeline.

Use the most efficient triangle primitive element

OpenGL ES supports three types of triangle-based primitive elements: triangle lists (separate triangles in a list), triangle strips, and triangle fans. All three types can be indexed or non-indexed. Triangle strips are as flexible as triangle lists, but on the PowerVR SGX540 platform, triangle lists are the most efficient.

Use interleaved vertex data

There are several ways to store vertex data. You can interleave it, so that all the data for one vertex directly follows all the data for the previous vertex. Alternatively, you can keep each attribute in a separate array, or store the attribute arrays one after another in a single array. In general, interleaved vertex data gives better performance because all the data required to process each vertex can be fetched in one sequential read, which improves cache efficiency. However, if you have a vertex attribute array that you want to share across several meshes, then putting this attribute in its own sequential array results in better performance.
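
As a sketch of an interleaved layout in C, the following struct keeps the position, color, and texture coordinates of one vertex together. The Vertex type, its field names, and the helper functions are assumptions made for illustration; your attributes will differ.

```c
#include <assert.h>
#include <stddef.h>

/* One interleaved vertex: every attribute of a vertex is stored together. */
typedef struct {
    float         position[3];
    unsigned char color[4];
    float         texcoord[2];
} Vertex;

/* Byte distance between consecutive vertices in the array. */
size_t vertex_stride(void) { return sizeof(Vertex); }

/* Byte offset of each attribute within a vertex. */
size_t position_offset(void) { return offsetof(Vertex, position); }
size_t color_offset(void)    { return offsetof(Vertex, color); }
size_t texcoord_offset(void) { return offsetof(Vertex, texcoord); }

/* Pointer to the first byte of vertex i in an interleaved array. */
const unsigned char *vertex_bytes(const Vertex *vertices, size_t count, size_t i)
{
    assert(i < count);
    return (const unsigned char *)vertices + i * sizeof(Vertex);
}
```

In OpenGL ES 2.0, you would typically pass the stride and each attribute's offset to glVertexAttribPointer(), one call per attribute.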

Use VBOs to store data

Use vertex buffer objects (VBOs) to store vertex and index data so that OpenGL ES can perform optimizations on the data. Also, don't create a VBO for every mesh; consider grouping meshes that are rendered together to minimize buffer rebinding. For dynamic vertex data, define one buffer object for each update.
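
To group several meshes in one VBO, you need to know where each mesh starts inside the shared buffer. The following C sketch computes those starting positions from the per-mesh vertex counts. pack_meshes() is a hypothetical helper, and the sketch assumes that all meshes share the same vertex format.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given the vertex count of each mesh, writes the index of each mesh's first
 * vertex inside one shared buffer and returns the total vertex count.
 */
size_t pack_meshes(const size_t *vertex_counts, size_t mesh_count,
                   size_t *first_vertex)
{
    assert(vertex_counts != NULL && first_vertex != NULL);
    size_t total = 0;
    for (size_t i = 0; i < mesh_count; i++) {
        first_vertex[i] = total;   /* mesh i starts where the previous one ended */
        total += vertex_counts[i];
    }
    return total;
}
```

The total tells you how large to make the buffer, and each first_vertex entry can serve as the first argument of glDrawArrays() when you draw that mesh without rebinding.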

Simplify your vertex models

Because mobile devices have smaller screens than desktop computers, images on screen are often small. So, you don't need complex vertex models to render the compelling graphics you want. Here are the general guidelines you should follow:
  • Reduce the number of vertices you use for your model
  • Use multiple versions of your model at different levels of detail. If the model is at a far distance from the view point (it is smaller on the screen), the less detailed model is ideal because the additional detail won't be significantly noticeable.
  • Use textures as often as possible.
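
Choosing between versions of a model can be as simple as comparing the distance from the view point against a few thresholds. This C sketch shows one possible approach; select_lod() and the threshold values in the note below are assumptions that you would tune for your own models.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the index of the model version to draw, where 0 is the most detailed.
 * max_distance holds level_count - 1 thresholds: max_distance[i] is the
 * farthest distance at which level i is still used. Anything beyond the last
 * threshold uses the least detailed level.
 */
size_t select_lod(float distance, const float *max_distance, size_t level_count)
{
    assert(level_count > 0);
    for (size_t i = 0; i + 1 < level_count; i++) {
        if (distance <= max_distance[i]) {
            return i;
        }
    }
    return level_count - 1;
}
```

With three versions of a model and thresholds of 10 and 50 units, a distance of 25 selects level 1 and a distance of 500 selects level 2.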

Best practices: Textures

As you probably know, a texture is an OpenGL ES object that contains one or more images that have the same format. This section outlines some best practices you should follow when you create textures.

Use an appropriate texture size

A common misconception about textures is that bigger textures look better on the screen. Using the maximum texture size for an object that is displayed on only a small part of the screen uses memory unnecessarily. Pick your texture sizes by examining where the textures are used. Generally, you want to map one texel to every pixel that the object covers when it is at its closest distance to the view point.

When you reduce your textures, try to do so uniformly. The PowerVR SGX540 platform supports non-power-of-two textures to the extent required by the specification. Non-power-of-two textures don't support mipmapping.
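
The one-texel-per-pixel guideline can be turned into a small calculation. This C sketch picks the smallest power-of-two dimension that covers the number of screen pixels the object spans at its closest distance. texture_dimension() is a hypothetical helper, and the maximum dimension is a value you would query from the device.

```c
#include <assert.h>

/*
 * Smallest power-of-two texture dimension that gives about one texel per
 * pixel when the object spans screen_pixels at its closest distance.
 * max_dimension must itself be a power of two.
 */
unsigned texture_dimension(unsigned screen_pixels, unsigned max_dimension)
{
    assert(screen_pixels > 0 && max_dimension > 0);
    assert((max_dimension & (max_dimension - 1u)) == 0);
    unsigned size = 1;
    while (size < screen_pixels && size < max_dimension) {
        size *= 2;
    }
    return size;
}
```

For example, an object that never spans more than 200 pixels needs a dimension of only 256 texels, even if the device supports 2048.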

Load a texture during initialization

Loading a texture can be a time-consuming operation, so load your textures when the application or a level starts. The PowerVR SGX540 platform stores textures in a layout that follows a plane-filling curve to improve memory locality when texturing, so each load involves a time-consuming reformatting operation. You should avoid loading texture data mid-frame. Also, you should set the parameters of a texture before you load it with image data, because OpenGL ES can optimize your texture data based on the parameters you set.

Compress your textures

Texture compression conserves memory, increases performance, and allows for mipmapping. The PowerVR SGX540 platform supports PVRTC and ETC texture compression formats. For more information, see Optimizing with texture compression.

Use mipmaps

Mipmaps are smaller, predefined variants of a texture image. Each mipmap represents a different level of detail for a texture. The GPU can use mipmaps with a minification filter to automatically select the level of detail that comes closest to mapping one texel of a mipmap to one pixel in the render target.
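
The cost of a mipmap chain is easy to estimate. This C sketch assumes the usual scheme in which each level halves the width and height of the previous level (never going below 1) until it reaches 1 x 1. The function names are illustrative.

```c
#include <assert.h>

/* Number of mipmap levels for a width x height texture, down to 1 x 1. */
unsigned mip_level_count(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    unsigned levels = 1;
    while (width > 1 || height > 1) {
        if (width > 1)  width /= 2;
        if (height > 1) height /= 2;
        levels++;
    }
    return levels;
}

/* Total number of texels stored across the whole mipmap chain. */
unsigned long mip_chain_texels(unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    unsigned long total = (unsigned long)width * height;
    while (width > 1 || height > 1) {
        if (width > 1)  width /= 2;
        if (height > 1) height /= 2;
        total += (unsigned long)width * height;
    }
    return total;
}
```

A 256 x 256 texture has 9 levels and 87381 texels in total, which is about one third more than the 65536 texels of the base level alone.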

Best practices: Shaders

You use shaders to determine the appropriate levels of light and dark in your image. You use the OpenGL ES Shading Language to define your shaders and use them to specify rendering effects in your app. The OpenGL ES 1.1 API uses a fixed-function pipeline, which means you can use only the pixel-shading and geometric transformations that are available. The OpenGL ES 2.0 and 3.0 APIs use a programmable pipeline, which means that you have more control over what is rendered. This section outlines some best practices for optimizing your shaders.

Pick the appropriate precision

The PowerVR SGX540 platform supports multiple types of precision, and picking the right balance is important. Choosing lower precision increases performance, but it can also introduce artifacts. Generally, you should start with high precision and gradually reduce the precision level until artifacts appear, and then use the lowest precision that doesn't produce them.

High precision is represented by 32-bit floating-point values. Use this precision for all vertex position calculations, including world, view, and projection matrices. You can also use it for most texture coordinate, lighting, and scalar calculations.

Medium precision is represented by 16-bit floating-point values. This precision typically offers only a minor performance improvement over high precision, but it can reduce the storage space required for varying variables, such as those that you use for texture coordinates.

Low precision is represented by 10-bit fixed-point values, ranging from -2 to 2 with a precision of 1/256. This precision is useful for representing colors and reading data from low-precision textures.

Reduce the number of varying variables you use

Varying variables represent the outputs from the vertex shader. They are interpolated across a triangle and then fed into the fragment shader. Try to use as few varying variables as possible, because each one uses buffer space for parameters and processing cycles for interpolation. You can reduce the space and memory required to store a whole scene in a parameter buffer by using a lower precision. The PowerVR SGX540 platform supports up to eight varying variables between the vertex and fragment shaders.

Make uniform updates that are unique

Uniform variables represent values that are constant for all vertices or fragments in a drawing call, even though the shader code that reads them runs once for each vertex or fragment. Try to avoid redundant uniform updates between drawing calls, because a uniform update can be a time-consuming operation. You should also be careful about how many uniform variables you use, because each uniform variable needs additional memory and time to update.

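When you perform calculations that involve only uniform variables, group them so that they are processed first:
uniform highp mat4 modelview, projection;
attribute vec4 modelPosition;
void main() {
    // The parentheses keep the uniform-only product separate from the per-vertex multiply.
    gl_Position = (projection * modelview) * modelPosition;
}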
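
A related option, which is not described in the text above, is to compute the product of the two matrices once on the CPU and upload the result as a single uniform. The following C sketch shows the multiplication for that approach. mat4_multiply() is a hypothetical helper, and it assumes column-major storage.

```c
#include <assert.h>
#include <stddef.h>

/*
 * out = a * b for 4 x 4 matrices stored in column-major order.
 * Element (row, col) lives at index col * 4 + row.
 */
void mat4_multiply(float out[16], const float a[16], const float b[16])
{
    assert(out != a && out != b);   /* the result must not alias an input */
    for (size_t col = 0; col < 4; col++) {
        for (size_t row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (size_t k = 0; k < 4; k++) {
                sum += a[k * 4 + row] * b[col * 4 + k];
            }
            out[col * 4 + row] = sum;
        }
    }
}
```

You would call mat4_multiply(mvp, projection, modelview) once per drawing call and upload mvp with glUniformMatrix4fv(). Whether this is faster than letting the shader compiler handle the grouped expression depends on the device, so measure before you commit to it.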

Isolate vector and scalar calculations

Not all GPUs include a vector processor; some GPUs perform vector calculations on a scalar processor. Depending on the order of the calculations, a scalar processor can evaluate an equation using more multiplications than necessary. On a vector processor, the multiplications in a vector calculation are processed in parallel. Generally, you want to group calculations of the same type together. The following code sample keeps scalar calculations isolated as long as possible:
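highp vec4 v1, v2;
highp float x, y;
// One scalar multiply followed by one vector-by-scalar multiply,
// instead of the two vector-by-scalar multiplies in (v1 * x) * y.
v2 = v1 * (x * y);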
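
To see the difference on a scalar processor, this C sketch counts the scalar multiplications that each ordering performs. The Vec4 type, the counter, and the function names are illustrative assumptions; the sketch models the arithmetic only and does not model any particular GPU.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z, w; } Vec4;

static int multiply_count;   /* counts the scalar multiplications performed */

static float mul(float a, float b)
{
    multiply_count++;
    return a * b;
}

/* Vector times scalar: four scalar multiplications on a scalar processor. */
static Vec4 vec4_scale(Vec4 v, float s)
{
    Vec4 r = { mul(v.x, s), mul(v.y, s), mul(v.z, s), mul(v.w, s) };
    return r;
}

/* (v * x) * y: two vector-by-scalar products. Returns multiplications used. */
int cost_vector_first(Vec4 v, float x, float y, Vec4 *out)
{
    assert(out != NULL);
    multiply_count = 0;
    *out = vec4_scale(vec4_scale(v, x), y);
    return multiply_count;
}

/* v * (x * y): one scalar product, then one vector-by-scalar product. */
int cost_scalar_first(Vec4 v, float x, float y, Vec4 *out)
{
    assert(out != NULL);
    multiply_count = 0;
    *out = vec4_scale(v, mul(x, y));
    return multiply_count;
}
```

The vector-first ordering uses 8 multiplications and the scalar-first ordering uses 5. With floating-point values, the two orderings can differ slightly in rounding.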

Compile and link shaders

Generally, you want to compile and link your shaders to a program object at the start of your application, because compiling and linking can be time-consuming operations.

After you compile and link your shaders to a program object, you should delete the shader objects to free memory. When you compile your shaders, always check for errors; otherwise, a failed compilation can go unnoticed. You should also check the link status to confirm that linking succeeded.

For information on best practices for Cascades and OpenGL ES, see Best practices.

Last modified: 2013-12-24
