Choosing the Right Audio Codec

Choosing a proper audio codec is a critical decision when building in-game voice chat.

In most multiplayer online games, in-game voice chat is vital for communication and teamwork among players. Players can chat casually or party up to win the game.

To add voice chat that works during gameplay, the solution is usually tailor-made on top of a third-party real-time voice SDK. Building in-game voice chat into a game involves many considerations, including audio quality, latency, and system resource consumption. Choosing a proper audio codec is key to getting the best results.

Audio Codec Purpose

To support more innovative “Voice Chat + Scenes” practices, vendors like ZEGOCLOUD have launched standardized SDK packages for voice chat room scenes. With such a platform, the core functions of a voice chat room can be implemented with only a small amount of code.

For in-game voice chat, the audio traffic is primarily human voice. In some cases, music may also need to be included. Let’s look at the human perception of sound when talking about audio quality. The human ear can nominally hear sounds from 20 Hz to 20,000 Hz. Within this range are four sound frequency bands: narrowband, wideband, super-wideband, and fullband.
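As a rough illustration of how the four bands relate to the audible range, the sketch below maps each band to a sampling rate and uses the Nyquist limit (half the sampling rate) to find the highest frequency it can carry. The sampling rates are a common convention assumed here, not figures taken from this article, and the function names are hypothetical.

```python
# Sampling rates commonly associated with each band. These values are an
# assumption of this sketch; the article itself does not state them.
BAND_SAMPLE_RATES_HZ = {
    "narrowband": 8_000,
    "wideband": 16_000,
    "super-wideband": 32_000,
    "fullband": 48_000,
}


def max_audio_frequency_hz(sample_rate_hz: int) -> float:
    """Nyquist limit: the highest frequency representable at a sampling rate."""
    return sample_rate_hz / 2


def smallest_band_covering(frequency_hz: float) -> str:
    """Return the first band (lowest rate first) whose Nyquist limit reaches frequency_hz."""
    for band, rate in BAND_SAMPLE_RATES_HZ.items():
        if max_audio_frequency_hz(rate) >= frequency_hz:
            return band
    raise ValueError(f"{frequency_hz} Hz exceeds the fullband Nyquist limit")


if __name__ == "__main__":
    for band, rate in BAND_SAMPLE_RATES_HZ.items():
        print(f"{band}: up to {max_audio_frequency_hz(rate):.0f} Hz")
```

Under these assumptions, only the fullband rate reaches the 20,000 Hz upper limit of human hearing.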

Because the traffic is primarily human voice, narrowband sound quality can meet the real-time voice communication requirements of games. When real-time voice is combined with live broadcasting, the quality requirements are higher, and wideband quality can meet the needs of both game and live broadcast scenes. In practice, the audio bandwidth of game voice is decided by the game operator’s budget: the bit rate is directly related to the audio bandwidth, and the bit rate ultimately determines the cost.
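To make the link between bit rate and cost concrete, here is a minimal sketch that converts a codec bit rate into the amount of data one voice stream produces. It counts codec payload only and ignores packet headers and retransmissions, which is a simplifying assumption; the function name is hypothetical.

```python
def voice_payload_bytes(bit_rate_kbps: float, seconds: float) -> float:
    """Codec payload in bytes for one stream; packet headers are not included."""
    bits = bit_rate_kbps * 1000 * seconds
    return bits / 8


if __name__ == "__main__":
    one_hour = 3600
    for rate in (8, 16):
        megabytes = voice_payload_bytes(rate, one_hour) / 1_000_000
        print(f"{rate} kbps for one hour: {megabytes:.1f} MB")
```

Doubling the bit rate doubles the traffic per player, which is why the operator's budget constrains the choice of audio bandwidth.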

Choosing the Right Audio Codec

The audio codec has a big influence on the real-time voice solution of the game. The audio encoder’s type, attributes, and quality determine the bit rate, algorithm delay, bandwidth, and sound quality of the encoded audio stream. The encoder’s algorithm complexity determines CPU, memory, and power consumption.

Therefore, an audio codec suitable for the real-time voice solution of the game has the following characteristics:

1) The bit rate should be relatively low to keep the cost controllable, generally not exceeding 16kbps. For example, if each sample is encoded with 1 bit, an 8kHz sampling rate (narrowband) corresponds to a bit rate of 8kbps, and a 16kHz sampling rate (wideband) corresponds to a bit rate of 16kbps.

2) The delay should be low enough to meet interactive needs, generally not more than 300 milliseconds.

3) The algorithm complexity should be relatively low, so that CPU usage, memory usage, and the overall impact on the game system are as low as possible.
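The arithmetic in the first characteristic can be written out as a small sketch: bit rate equals sampling rate times the average number of bits spent per sample. The 16kbps cap comes from the list above; the function names and the 16-bit PCM comparison are illustrative assumptions.

```python
def bit_rate_kbps(sample_rate_hz: int, bits_per_sample: float) -> float:
    """Bit rate of a mono stream: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample / 1000


def bits_per_sample_budget(sample_rate_hz: int, max_bit_rate_kbps: float = 16.0) -> float:
    """Average bits available per sample if the stream must stay under the cap."""
    return max_bit_rate_kbps * 1000 / sample_rate_hz


if __name__ == "__main__":
    print(bit_rate_kbps(8_000, 1))          # narrowband at 1 bit per sample
    print(bit_rate_kbps(16_000, 1))         # wideband at 1 bit per sample
    print(bit_rate_kbps(16_000, 16))        # uncompressed 16-bit PCM, for comparison
    print(bits_per_sample_budget(16_000))   # budget per sample under the 16kbps cap
```

The comparison with uncompressed PCM shows how much compression the codec has to deliver to stay within the cap.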

The following figure shows how the algorithm delay changes as the bit rate changes. According to the analysis and the figure, the algorithm delay of suitable codecs is less than 60 milliseconds. The speech codecs with a bit rate of less than 16kbps include Opus (SILK), Speex (NB, WB), G.729, and G.729.1.
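The selection described here amounts to filtering candidate codecs by two thresholds. The sketch below shows that filter; the 16kbps and 60 millisecond limits come from the text, while the `CodecProfile` type and the three example entries are placeholders with made-up numbers, not measurements of any real codec.

```python
from typing import List, NamedTuple


class CodecProfile(NamedTuple):
    """Hypothetical record describing one codec operating point."""
    name: str
    bit_rate_kbps: float
    algorithm_delay_ms: float


def shortlist(
    profiles: List[CodecProfile],
    max_bit_rate_kbps: float = 16.0,
    max_delay_ms: float = 60.0,
) -> List[str]:
    """Keep codecs at or under the bit rate cap and strictly under the delay limit."""
    return [
        p.name
        for p in profiles
        if p.bit_rate_kbps <= max_bit_rate_kbps and p.algorithm_delay_ms < max_delay_ms
    ]


# Placeholder data for illustration only; replace with measured values.
candidates = [
    CodecProfile("codec-a", 8.0, 15.0),
    CodecProfile("codec-b", 24.0, 20.0),   # rejected: bit rate too high
    CodecProfile("codec-c", 12.0, 80.0),   # rejected: delay too high
]

if __name__ == "__main__":
    print(shortlist(candidates))
```

In a real evaluation, the profiles would be filled in from each codec's specification or from measurements on the target devices.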

In short, a real-time voice solution for games has to match the technical approach to the game’s application scenarios. Only by thoroughly understanding the requirements of those scenarios can we figure out how:

to choose voice codecs
to deploy media server resources
to configure CDN networks
to polish a set of real-time voice solutions that meet the requirements of game application scenarios.

Many games, such as casual chess and card games, can use the voice chat room as their game voice scene. The following section analyzes how to select the appropriate voice codec for this game scene.
