Controlling voice conversion

The libasound library supports devices with up to 8 voices.

Configuration of the libasound library is based on the maximum number of voices supported in hardware. If the numbers of source and destination voices are different, then snd_pcm_plugin_params() instantiates a voice converter.

The default voice conversion behavior is as follows:

From     To          Conversion
Mono     Stereo      Replicate channel 1 (left) to channel 2 (right)
Stereo   Mono        Remove channel 2 (right)
Mono     4-channel   Replicate channel 1 to all other channels
Stereo   4-channel   Replicate channel 1 (front left) to channel 3 (rear left), and channel 2 (front right) to channel 4 (rear right)

Previous versions of libasound converted stereo to mono by averaging the left and right channels to generate the mono stream. Now by default, the right channel is simply dropped.

You can use the voice conversion API to configure the conversion behavior and place any source channel in any destination channel slot:

snd_pcm_plugin_get_voice_conversion()
    Get the current voice conversion structure for a channel.
snd_pcm_plugin_set_voice_conversion()
    Set the current voice conversion structure for a channel.

The actual conversion is controlled by the snd_pcm_voice_conversion_t structure, which is defined as follows:

typedef struct snd_pcm_voice_conversion
{
   uint32_t     app_voices;
   uint32_t     hw_voices;
   uint32_t     matrix[32];
} snd_pcm_voice_conversion_t;

The matrix member forms a 32-by-32-bit array that specifies how to convert the voices. Each row (array element) represents an application voice, with voice 0 first; each column (bit) represents a hardware voice, with the lowest voice in the least significant bit and voice numbers increasing from right to left. In other words, setting bit n of matrix[m] routes application voice m to hardware voice n.
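
The row-and-bit layout described above can be sketched with a few small helpers. This is only an illustration: the voice_matrix_t type and the init_voice_matrix(), route_voice(), and is_routed() names are hypothetical and aren't part of libasound. The sketch also assumes that app_voices and hw_voices hold the number of application and hardware voices, which this section doesn't state explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_VOICES 32   /* matrix[] has 32 rows of 32 bits each */

/* Hypothetical local mirror of snd_pcm_voice_conversion_t, so that the
 * sketch compiles without <sys/asoundlib.h>. */
typedef struct {
    uint32_t app_voices;            /* assumed: number of application voices */
    uint32_t hw_voices;             /* assumed: number of hardware voices */
    uint32_t matrix[MAX_VOICES];
} voice_matrix_t;

/* Clear all routes and record the voice counts. */
void init_voice_matrix(voice_matrix_t *vm, uint32_t app_voices, uint32_t hw_voices)
{
    assert(vm != NULL);
    memset(vm, 0, sizeof(*vm));
    vm->app_voices = app_voices;
    vm->hw_voices = hw_voices;
}

/* Route application voice `app` (row) to hardware voice `hw` (bit).
 * Returns 0 on success, or -1 if either index is out of range. */
int route_voice(voice_matrix_t *vm, uint32_t app, uint32_t hw)
{
    assert(vm != NULL);
    if (app >= vm->app_voices || hw >= vm->hw_voices ||
        app >= MAX_VOICES || hw >= MAX_VOICES) {
        return -1;
    }
    vm->matrix[app] |= (UINT32_C(1) << hw);
    return 0;
}

/* Return 1 if application voice `app` is routed to hardware voice `hw`. */
int is_routed(const voice_matrix_t *vm, uint32_t app, uint32_t hw)
{
    assert(vm != NULL);
    if (app >= MAX_VOICES || hw >= MAX_VOICES) {
        return 0;
    }
    return (int)((vm->matrix[app] >> hw) & 1u);
}
```

With a 1-voice application stream and 4-voice hardware, calling route_voice() for hardware voices 0 and 3 leaves matrix[0] equal to 0x9.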

For example, consider a mono application stream directed to a 4-voice hardware device. A bit array of:

matrix[0] = 0x1;  //  00000001

causes the sound to be output on only the first hardware channel. A bit array of:

matrix[0] = 0x9;   // 00001001

causes the sound to appear on the first and fourth (last) hardware channels.

As another example, consider a stereo application stream directed to a 6-channel (5.1) output device. A bit array of:

matrix[0] = 0x1;  //  00000001
matrix[1] = 0x2;  //  00000010

causes the sound to appear on only the front two channels, while:

matrix[0] = 0x5;  //  00000101
matrix[1] = 0xA;  //  00001010

causes the stream signal to appear on the first four channels (likely the front and rear pairs, but not on the center or LFE channels).

The meaning of each bit position (that is, each column) depends on the hardware's channel order, so you need to know the hardware you'll be running on in order to map the channels properly. For example:

  • If the hardware orders the channels such that the center channel is the third channel, then bit 2 represents the center channel.
  • If the hardware orders the channels such that the rear left channel is the third channel, then bit 2 represents the rear left channel.
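
The dependence on channel order can be made explicit by looking up the bit for a named speaker in a table that describes the hardware. The following is a minimal sketch under stated assumptions: the speaker_t enumeration, the two layout tables, and speaker_bit() are hypothetical names invented for illustration, and the two channel orders are examples only. Consult your hardware's documentation for the real order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical speaker positions, for illustration only. */
typedef enum {
    SPK_FRONT_LEFT,
    SPK_FRONT_RIGHT,
    SPK_CENTER,
    SPK_LFE,
    SPK_REAR_LEFT,
    SPK_REAR_RIGHT
} speaker_t;

/* Two assumed 5.1 channel orders; index i corresponds to bit i of a row. */
const speaker_t layout_center_third[6] = {
    SPK_FRONT_LEFT, SPK_FRONT_RIGHT, SPK_CENTER,
    SPK_LFE, SPK_REAR_LEFT, SPK_REAR_RIGHT
};
const speaker_t layout_rear_third[6] = {
    SPK_FRONT_LEFT, SPK_FRONT_RIGHT, SPK_REAR_LEFT,
    SPK_REAR_RIGHT, SPK_CENTER, SPK_LFE
};

/* Return the matrix bit for `wanted` in the given hardware channel order,
 * or 0 if that speaker isn't among the first `hw_voices` channels. */
uint32_t speaker_bit(const speaker_t *layout, size_t hw_voices, speaker_t wanted)
{
    assert(layout != NULL);
    for (size_t i = 0; i < hw_voices && i < 32; i++) {
        if (layout[i] == wanted) {
            return UINT32_C(1) << i;
        }
    }
    return 0;
}
```

With these tables, routing the left application voice to the front left and rear left speakers gives a row value of 0x5 for layout_rear_third, but 0x11 for layout_center_third, so the same intent produces different matrix contents on different hardware.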

If the number of source voices matches the number of destination voices, the voice converter isn't instantiated, so you can't reroute the channels. For example, if you're playing a stereo file on stereo hardware, you can't use the voice matrix to swap the left and right channels.

If you call snd_pcm_plugin_get_voice_conversion() or snd_pcm_plugin_set_voice_conversion() before the voice conversion plugin has been instantiated, the functions fail and return -ENOENT.
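
A typical get, modify, set sequence with a check for -ENOENT might look like the sketch below. Several things here are assumptions rather than facts taken from this section: the three-argument form of the two functions (PCM handle, channel direction, pointer to the structure), the SND_PCM_CHANNEL_PLAYBACK constant, and a return value of 0 on success. Check the library reference before relying on them. The route_mono_to_first_voice() helper is a hypothetical name. When the code isn't built for QNX, stand-in definitions are used purely so that the sketch compiles; they aren't the real library.

```c
#include <errno.h>
#include <stdint.h>

#ifdef __QNXNTO__
#include <sys/asoundlib.h>
#else
/* Stand-ins so the sketch compiles off-target; NOT the real library. */
typedef struct snd_pcm snd_pcm_t;
typedef struct snd_pcm_voice_conversion {
    uint32_t app_voices;
    uint32_t hw_voices;
    uint32_t matrix[32];
} snd_pcm_voice_conversion_t;
#define SND_PCM_CHANNEL_PLAYBACK 0

int fake_converter_present = 0;   /* simulates whether a converter exists */
snd_pcm_voice_conversion_t fake_state = { 1, 4, { 0xF } };

int snd_pcm_plugin_get_voice_conversion(snd_pcm_t *pcm, int channel,
                                        snd_pcm_voice_conversion_t *vc)
{
    (void)pcm; (void)channel;
    if (!fake_converter_present) return -ENOENT;
    *vc = fake_state;
    return 0;
}

int snd_pcm_plugin_set_voice_conversion(snd_pcm_t *pcm, int channel,
                                        snd_pcm_voice_conversion_t *vc)
{
    (void)pcm; (void)channel;
    if (!fake_converter_present) return -ENOENT;
    fake_state = *vc;
    return 0;
}
#endif

/* Route a mono playback stream to the first hardware voice only.
 * Call this after snd_pcm_plugin_params(); returns 0 on success or a
 * negative errno value (-ENOENT if no voice converter was instantiated). */
int route_mono_to_first_voice(snd_pcm_t *pcm)
{
    snd_pcm_voice_conversion_t vc;
    int rc = snd_pcm_plugin_get_voice_conversion(pcm, SND_PCM_CHANNEL_PLAYBACK, &vc);
    if (rc != 0) {
        return rc;   /* for example, -ENOENT: voice counts match, no converter */
    }
    vc.matrix[0] = 0x1;   /* application voice 0 -> hardware voice 0 only */
    return snd_pcm_plugin_set_voice_conversion(pcm, SND_PCM_CHANNEL_PLAYBACK, &vc);
}
```

Treating -ENOENT as a non-fatal result is often reasonable, because it simply means that the source and destination voice counts match and the default one-to-one mapping is in effect.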