Previously, we talked about Acoustic Echo Cancellation (AEC), an important audio pre-processing module that effectively removes acoustic echoes and noise in real-time communications (RTC) and significantly improves user experience. In this lesson, we introduce Automatic Gain Control (AGC).
Volume Problems in Real-life Scenarios and Necessity of AGC
Volume problems may not seem as “serious” as echoes and noise. Apart from misoperation of the player, what other volume problems come up in real-life scenarios? You have probably wanted to adjust the volume in situations like these:
Scenario 1: The volume is too low to recognize. Sometimes we need to put our ears close to the loudspeaker, frowning, in order to capture the sound. This may be because the speaker is far away from the microphone while speaking or the input volume of the microphone is low.
Scenario 2: The volume is too high, sounding like a bomb going off. We have to stay away from the loudspeaker. This may be because the speaker is too close to the microphone while speaking, or the speaker has a powerful voice.
Scenario 3: The volume goes up and down randomly. Unstable volume can be torture for the listener.
In addition to speakers adjusting the microphone distance and their volume of voice, a common solution to these problems is to adjust the microphone gain, the volume slider of the player software, or the loudspeaker gain.
In fact, these manual actions can produce the desired result immediately. So we may come to the conclusion that volume problems are no big deal because we can solve them with our hands. Why do we need Automatic Gain Control anyway?
First, we need to understand that these reactive actions do help, but they are inconvenient and do not fit every situation. In real life, conditions are complex and change all the time: the original volume varies from one speaker to another, the acoustic attenuation varies with the microphone distance, and the microphone gain varies from one model to another.
These differences make it hard for manual adjustment to adapt to the changing environment for audio capture, especially when multiple users share the same microphone. For users not familiar with the device or system, figuring out how to properly adjust the input gain of an audio input device seems like an “impossible task”. Do you know how to adjust the microphone gain on your computer? Therefore, manual adjustment, if frequently performed as the only solution, can be a negative factor for user experience and downgrade the product design.
For that reason, an intelligent volume adjustment mechanism is necessary.
AGC, as a solution, automatically adjusts the “gain compensation” of the volume for the microphone. Simply put, AGC automatically reduces the gain if the input volume is too high and boosts the gain if it is too low to maintain the volume at a relatively constant level. Users can avoid volume fluctuations without frequently operating the device and fully engage in audio or video interactions in RTC scenarios.
So far, you have learned the volume problems you may encounter from time to time and the advantages of AGC. I believe you have figured out why AGC is reasonable and necessary. It’s time to dive deeper into the technology behind it. To learn more about AGC, we can break this concept down into the following three topics:
- What is volume?
- The essence of volume gain
- AGC strategies
Let me walk you through these topics one at a time.
What is Volume?
In my previous articles about ANS and AEC modules, I first made clear what these modules are designed for, namely noise and echo, because it’s best to know what we are up against. This article about AGC is no exception. So, first things first, what is volume?
In fact, we came across the concept of volume in “Lesson 1: Elements of Audio”, except that its alias, loudness, was used.
You probably remember how we defined loudness: when we speak of a sound as “loud” or “weak”, we are describing our perception of its loudness. Loudness is determined by the amplitude of the vibration. At the same distance from the source, the higher the amplitude, the louder the sound. Conversely, for a fixed amplitude, loudness decreases as the sound travels farther. This is why we can’t hear a sound that is too far away. “Volume” and “loudness” refer to the same property of sound: our perception of how “loud” or “weak” it is. Perception is subjective and hard to quantify directly. Amplitude, on the other hand, is a physical quantity (for 16-bit audio, sample values range from -32,768 to 32,767), but raw sample values are not convenient for describing or comparing volume. For details about bit depth and amplitude, see Capturing and Quantization of Sound in Lesson 1: Elements of Audio.
To simplify the representation of volume, other units of measurement, such as Sound Pressure Level (SPL) and Decibels Full Scale (DFS), are introduced, both expressed in decibels (dB).
dB is a logarithmic unit that represents the ratio of two values of the same physical quantity. It is calculated based on a reference quantity and may vary as the reference quantity changes.
- Sound Pressure Level (SPL) is expressed in dB SPL and uses a reference sound pressure of 20 μPa. Sound pressure refers to the change in atmospheric pressure when sound waves travel through the air. 20 μPa is the minimum pressure perceivable by a person with normal hearing at a frequency of 1 kHz (roughly the sound of a mosquito flying 3 meters away). We define the volume at 20 μPa as 0 dB SPL. The louder the sound, the higher its level in dB SPL. The volume of a normal conversation is about 40–60 dB SPL. Sounds above 90 dB SPL can lead to hearing damage, and sounds above 190 dB SPL can be life-threatening. The common noise level classification is based on dB SPL.
- Decibels Full Scale (DFS) is expressed in dB FS and uses the amplitude of the audio sampling point as the reference quantity. Unlike SPL, which uses the minimum perceivable value as its reference, DFS uses the maximum value. For instance, at a bit depth of 16 bits, the loudest possible sample has an amplitude of 32,768. We use 32,768 as the reference quantity, which corresponds to 0 dB FS, the maximum volume measured in dB FS. All other values are negative, and the lower limit is commonly quoted as about -96 dB FS at a bit depth of 16 bits. Digital devices and digital audio processing use DFS as the unit of volume, and so does AGC.
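To make the dB FS definition more concrete, here is a minimal Python sketch that measures the level of 16-bit samples against the 32,768 reference described above. The function names `peak_dbfs` and `rms_dbfs` are chosen for illustration only, and measuring both the peak and the RMS (average energy) level is an assumption about what a level meter might report, not something prescribed by this lesson.

```python
import math

FULL_SCALE_16BIT = 32768  # reference amplitude for 16-bit audio (0 dB FS)


def peak_dbfs(samples):
    """Return the peak level of 16-bit PCM samples in dB FS."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(peak / FULL_SCALE_16BIT)


def rms_dbfs(samples):
    """Return the RMS (average energy) level of 16-bit PCM samples in dB FS."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    # 10 * log10 of a squared ratio equals 20 * log10 of the plain ratio.
    return 10 * math.log10(mean_square / FULL_SCALE_16BIT ** 2)
```

A sample at full scale measures 0 dB FS, and a sample at half of full scale (16,384) measures about -6 dB FS, which matches the idea that every value below the reference is negative.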
From the description above, I think you’ve got a basic knowledge of volume. If interested, you can learn more about the logarithm calculation with different volume standards to get a deeper understanding of the concept of “decibel”.
Now, let’s move on to the second topic: the essence of volume gain.
The Essence of Volume Gain
Gain refers to an increase to a certain level, so volume gain simply means increasing the volume to a certain level.
As mentioned above, the amplitude is the physical quantity that decides the level of the volume. When we adjust the amplitude, we adjust the volume. Therefore, to change the volume to a certain level, we can change the amplitude of the audio sampling point. In other words, we can adjust the volume by multiplying the amplitude of the audio sampling point by a certain factor. If the factor is greater than 1, the amplitude increases; if it is less than 1, the amplitude decreases; if it equals 1, the amplitude does not change.
It’s important to note that an increase in amplitude is not proportional to our perception of the sound, and they are not linearly related. In other words, when the amplitude increases by 100%, our perception of the sound may not be enhanced by 100% accordingly. Such a nonlinear ratio between two physical quantities can be expressed in decibels.
At the bit depth of 16 bits, we can use the following formula to calculate the volume gain when the amplitude changes from A1 to A2: 20 × log10(A2/A1) (dB).
- If the amplitude at the sampling point is doubled (A2/A1 = 2), the gain of the sound is 20 × log10(2) ≈ 6 dB.
- If the amplitude does not change (A2/A1 = 1), the gain of the sound is 20 × log10(1) = 0 dB.
- If the amplitude decreases by one-half (A2/A1 = 1/2), the gain of the sound is 20 × log10(1/2) ≈ -6 dB.
A positive gain increases the volume, while a negative gain decreases it. A gain of 0 dB means that the volume does not change. Note that “0 dB” can describe both a volume level (0 dB FS, the maximum level) and a gain (0 dB, no change), and the two should not be confused. Because the gain is calculated in dB, it can be applied directly to a volume level expressed in dB FS.
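The relationship between a gain in dB and the multiplication factor applied to each sample can be sketched in a few lines of Python. The helper names below (`ratio_to_db`, `db_to_ratio`, `apply_gain`) are illustrative, and clipping to the 16-bit range is an assumed way of handling overflow, not a method stated in this lesson.

```python
import math


def ratio_to_db(ratio):
    """Convert an amplitude ratio (new / old) to a gain in dB."""
    return 20 * math.log10(ratio)


def db_to_ratio(gain_db):
    """Convert a gain in dB to the factor each sample is multiplied by."""
    return 10 ** (gain_db / 20)


def apply_gain(samples, gain_db):
    """Scale 16-bit samples by gain_db, clipping to the valid 16-bit range."""
    factor = db_to_ratio(gain_db)
    return [max(-32768, min(32767, round(s * factor))) for s in samples]
```

For example, `ratio_to_db(2)` is about 6 dB and `ratio_to_db(0.5)` is about -6 dB, while `apply_gain(samples, 0)` leaves the samples unchanged. Samples that would exceed the 16-bit range after amplification are clipped, which is one reason gain is usually capped in practice.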
ZEGOCLOUD’s SDKs support API-based active adjustment of input/output volume. To make the setting more user-friendly, the volume is specified as a value from 0 to 200. A value in the range [0,100] corresponds to a gain from -40 dB to 0 dB, so the volume decreases or does not change; a value in the range (100,200] corresponds to a gain from 0 dB to 12 dB, so the volume increases. Given the above formula, a gain of 12 dB means the amplitude can be increased by about four times at most.
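As a rough illustration of how a 0 to 200 volume value could map to a gain in dB, the sketch below interpolates linearly (in dB) within each of the two ranges given above. The linear interpolation and the function name `volume_to_gain_db` are assumptions made for this example; the lesson only states the end points of each range, and the SDK's actual curve may differ.

```python
def volume_to_gain_db(volume):
    """Map a volume value in [0, 200] to a gain in dB.

    Assumption: the gain changes linearly in dB within each range.
    Only the range end points come from the text.
    """
    if not 0 <= volume <= 200:
        raise ValueError("volume must be between 0 and 200")
    if volume <= 100:
        # [0, 100] covers -40 dB to 0 dB (attenuate or leave unchanged)
        return -40.0 + 40.0 * volume / 100
    # (100, 200] covers 0 dB to 12 dB (amplify)
    return 12.0 * (volume - 100) / 100
```

Under this assumption, a value of 100 gives 0 dB (no change), 150 gives 6 dB (roughly double the amplitude), and 200 gives 12 dB, where the amplitude factor is 10 ** (12 / 20), or about 3.98.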
As we have learned the essence of volume gain, let’s take a closer look at the following two common methods for volume gain adjustment: analog gain and digital gain adjustment. Details about how the two methods work are not a key point of this lesson. Let’s take a quick look at the following information:
- Analog gain adjustment: This method adjusts the waveform amplitude of continuous analog signals through a system API to control the input gain of the device and the input volume of the system.
- Digital gain adjustment: This method adjusts the amplitude of sampling points of discrete digital signals. It does not rely on API and does not adjust the input volume of the system.
Most Windows and Mac devices support analog gain adjustment and control the volume based on analog gain and digital gain. Windows devices come with various sound cards, some of which have very low input volume. Therefore, adjustment relying only on the digital gain may result in low precision and poor sound quality. Mobile devices (iOS and Android) and Linux devices do not have APIs to adjust the input volume and can only use digital gain adjustment.
Of course, these two methods alone are not enough for complex and changing scenarios. The AGC algorithm flexibly selects either or both of the two methods according to the characteristics of each platform. It also supports more processing strategies built on these adjustment methods. This leads us to the third topic of the lesson today: AGC strategies.
Before we start, we need to make clear the sound signal to be processed by an AGC module.
Obviously, volume gain control is not applicable to all sound signals. Like ANS and AEC modules, which suppress noise and far-end echoes and retain near-end audio, an AGC module also needs to recognize near-end audio in the input signals to prevent enhancing the noise or echoes. So, it’s a good idea to put AGC after AEC and ANS, where noise and echoes in the signal can be largely removed.
However, a favorable position does not mean that AGC can achieve its goal without errors. Voice Activity Detection (VAD) is often introduced to distinguish between speech and non-speech segments to prevent errors. Generally, the accuracy of the VAD algorithm decreases as the signal-to-noise ratio (SNR) goes down. To solve this problem, ZEGOCLOUD’s SDKs provide the harmonic detection feature to help recognize human voices based on harmonic characteristics. It improves the performance of AGC processing.
[Figure: Voice Activity Detection]
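To show where VAD fits, here is a deliberately simple energy-based detector in Python. It is only a sketch: it marks a frame as speech when its RMS level exceeds a threshold, which is exactly the kind of approach that degrades at low SNR. It is not the harmonic detection feature mentioned above, and the frame size, the threshold, and the function names are all assumed values chosen for illustration.

```python
import math


def frame_level_dbfs(frame):
    """RMS level of one frame of 16-bit samples in dB FS."""
    mean_square = sum(s * s for s in frame) / len(frame)
    if mean_square == 0:
        return float("-inf")
    return 10 * math.log10(mean_square / 32768 ** 2)


def energy_vad(samples, frame_size=160, threshold_dbfs=-40.0):
    """Label each full frame True (speech-like) or False by its energy.

    Assumptions: 160-sample frames (10 ms at 16 kHz) and a fixed
    -40 dB FS threshold. Trailing samples that do not fill a frame
    are ignored.
    """
    flags = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        flags.append(frame_level_dbfs(frame) > threshold_dbfs)
    return flags
```

An AGC module would update its gain only on frames flagged as speech, so that pauses filled with residual noise do not pull the gain upward.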
Now, we are clear about the signal to be processed by AGC. Let’s move on to the details of AGC strategies.
Strategies are developed based on clear targets and limits to ensure correct and well-organized implementation. By referring to the classical WebRTC-AGC algorithm, AGC strategies are developed based on the following targets and limits.
- Target volume: indicates the target of volume adjustment, measured in dB FS. For instance, the target output volume can be set to -3 dB FS.
- Gain capacity: indicates the maximum gain of volume adjustment, measured in dB. For instance, if the maximum gain is set to 12 dB, the amplitude is increased by about four times at most.
- Compressor/Limiter switch: indicates whether compressor/limiter logic is enabled to suppress the volume that exceeds the target.
The following AGC strategies are common (see the classical WebRTC-AGC algorithm):
1 Fixed digital gain
Fixed digital gain is the most fundamental strategy. At its core, this strategy allows the audio volume to be increased with a fixed digital gain without exceeding the specified gain capacity, and if the compressor/limiter logic is enabled, the adjusted volume stays below the target.
Though relatively simple, fixed digital gain has obvious disadvantages. With a fixed gain and no feedback adjustment mechanism, this strategy only slightly improves signals whose volume is constantly low. For signals whose volume is constantly high, a poorly chosen gain may boost the noise relative to the voice. This is because the voice volume cannot exceed the target volume, while noise generally stays below the threshold of the compressor/limiter logic and is amplified in full. As a result, the noise becomes relatively louder. This problem can be worse if the ANS module fails.
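The fixed digital gain strategy and its side effect on noise can be sketched as follows. This is a simplified illustration, not the WebRTC-AGC implementation: the limiter here is a hard clamp at the target level, whereas real compressors and limiters reduce gain smoothly, and the function name and default values are assumptions.

```python
def fixed_digital_gain(samples, gain_db, max_gain_db=12.0,
                       target_dbfs=-3.0, limiter=True):
    """Apply a fixed gain to 16-bit samples.

    The gain is capped at max_gain_db (the gain capacity). If limiter
    is True, output amplitudes are clamped at the target level
    (a hard limiter, used here only for simplicity).
    """
    gain_db = min(gain_db, max_gain_db)
    factor = 10 ** (gain_db / 20)
    ceiling = 32768 * 10 ** (target_dbfs / 20)  # target level as amplitude
    out = []
    for s in samples:
        v = s * factor
        if limiter:
            v = max(-ceiling, min(ceiling, v))
        out.append(max(-32768, min(32767, round(v))))  # stay within 16 bits
    return out
```

Consider a noise sample of amplitude 500 next to a loud voice sample of amplitude 20,000, both processed with a 12 dB gain. The noise is amplified in full to roughly 1,990, while the voice is held at the -3 dB FS ceiling of roughly 23,200. The voice-to-noise amplitude ratio drops from 40 to under 12, which is the "noise becomes relatively louder" effect described above.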
To solve this problem, we can improve the performance of the ANS module and, given that the disadvantages are because of the fixed gain, introduce some dynamic adjustment mechanisms.
2 Adaptive analog gain
Based on fixed digital gain, this strategy uses a feedback mechanism to enhance the control over the analog gain of the device.
Based on the evaluation of the adjustment effect of previous modules, this strategy analyzes whether the current analog gain is reasonable and calculates the required gain. By calling the system API, it sets the “required analog gain” for the device and adjusts the original input volume for subsequent processing.
This strategy offers an obvious advantage: it combines fixed digital gain with dynamic analog gain and uses a feedback mechanism to improve flexibility and equalization performance. However, because system APIs are required to adjust the analog gain, frequent adjustment for complex signals can hurt performance. In addition, this strategy is not applicable to mobile devices, which lack such a system API, and it has many other limitations.
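The feedback loop behind adaptive analog gain might look like the sketch below. Everything platform-specific is hypothetical: `set_system_level` stands in for whatever system API sets the microphone input level, the 0 to 100 level range, the step size, and the tolerance band are assumed values, and the measured level is assumed to come from speech frames after the earlier processing stages.

```python
def next_mic_level(level, measured_dbfs, target_dbfs=-3.0,
                   tolerance_db=2.0, step=5, min_level=0, max_level=100):
    """Return the analog level to use next, one step toward the target."""
    if measured_dbfs < target_dbfs - tolerance_db:
        level += step   # speech too quiet: raise the analog gain
    elif measured_dbfs > target_dbfs + tolerance_db:
        level -= step   # speech too loud: lower the analog gain
    return max(min_level, min(max_level, level))


class AnalogGainController:
    """Calls the (hypothetical) system API only when the level changes."""

    def __init__(self, set_system_level, level=50):
        self.set_system_level = set_system_level  # hypothetical API wrapper
        self.level = level

    def update(self, measured_dbfs):
        new_level = next_mic_level(self.level, measured_dbfs)
        if new_level != self.level:   # skip redundant system calls
            self.level = new_level
            self.set_system_level(new_level)
        return self.level
```

The tolerance band and the "only call on change" check are there because each system call has a cost, which is the performance concern raised above. On platforms without such an API, this controller simply has nothing to call.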
3 Adaptive digital gain
The strategy of adaptive digital gain is developed to overcome some drawbacks of the adaptive analog gain and draw on the strengths of the feedback mechanism.
This strategy is modeled after analog gain. It makes use of the feedback mechanism and continuously improves the gain by dynamically adjusting digital gain parameters based on the processing effects of the previous modules. It adjusts the digital gain only and does not rely on a system API, offering wider applications, including mobile devices.
However, this strategy has its disadvantages: it lacks sensitivity, and its gain adjustment is not fast enough. When the volume keeps going up and down frequently, it may misuse large and small gains, making a high volume higher and a low volume lower.
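A minimal sketch of adaptive digital gain, under stated assumptions, is shown below. It applies the current gain to a frame, measures the peak level of the output, and nudges the gain toward the target for the next frame. The class name, the smoothing rate, the lower gain bound, and the `is_speech` flag (assumed to come from a VAD) are all illustrative choices, not the WebRTC-AGC implementation.

```python
import math


class AdaptiveDigitalGain:
    """Feedback-driven digital gain for frames of 16-bit samples."""

    def __init__(self, target_dbfs=-3.0, max_gain_db=12.0,
                 min_gain_db=-12.0, rate=0.1):
        self.target_dbfs = target_dbfs
        self.max_gain_db = max_gain_db   # gain capacity
        self.min_gain_db = min_gain_db   # assumed lower bound
        self.rate = rate                 # fraction of the error fixed per frame
        self.gain_db = 0.0

    def process(self, frame, is_speech=True):
        factor = 10 ** (self.gain_db / 20)
        out = [max(-32768, min(32767, round(s * factor))) for s in frame]
        peak = max(abs(s) for s in out)
        if is_speech and peak > 0:
            # Feedback: compare the output level with the target and
            # move the gain a small step in that direction.
            error_db = self.target_dbfs - 20 * math.log10(peak / 32768)
            gain = self.gain_db + self.rate * error_db
            self.gain_db = max(self.min_gain_db, min(self.max_gain_db, gain))
        return out
```

Because the gain moves only a fraction of the error per frame, a constantly loud input settles near the target after a number of frames, and a very quiet input is boosted only up to the gain capacity. The same smoothing is what makes the strategy slow to react: if the volume jumps right after the gain has adapted to the opposite extreme, the stale gain is applied to the new frames for a while.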
From the commonly used AGC strategies and their advantages and disadvantages, we can see that no strategy is perfect. We need to make the best of the three strategies, combine them properly, and continue to improve them. Obviously, there is still a long way to go before we can come up with a good design for the AGC algorithm.
Now we have grasped the concepts and key points of AGC. When we face volume problems again in real life, we should not just “drag the volume slider” but also consider making use of the AGC module. If AGC fails, we can track down the reasons and work out feasible solutions, which in turn deepens our understanding.
So far, that’s all for ANS, AEC, and AGC (3A) audio processing technologies. We are committed to getting “pleasanter-to-hear” sounds through these technologies. In current RTC scenarios, 3A processing has become indispensable. It is hard to imagine going back to the days full of echoes, noise, and volume ups and downs. Without a good user experience, content is meaningless, no matter how great it is.
In the future, we will face more technical challenges in more RTC scenarios. It’s reasonable to believe that with the evolution of technology applications, better and smarter 3A algorithms will certainly emerge in the industry and continue to ensure a smooth user experience. Let’s just wait and see.