Noise is everywhere in our lives. However, once the noise level exceeds a certain threshold, it starts to degrade our listening experience and makes speech harder to understand. Therefore, automatic noise suppression is necessary for any real-time voice system to deliver a great user experience.
In real-time communications, we normally need to handle two types of noise: steady-state noises and transient noises.
· Steady-state noises are ambient sounds that are continuous (last for more than one second) and have negligible fluctuations within the period of observation. Typical examples include sounds generated from fans, air conditioners, vacuum cleaners, hairdryers, lawnmowers, etc.
· Transient noises are impulsive, short-duration sounds, such as the sounds of keyboard strokes, mouse clicks, and barking dogs.
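To make the distinction concrete, the two noise types can be told apart (very roughly) by how much the signal's energy fluctuates from frame to frame. The sketch below is a hypothetical heuristic with an arbitrary threshold, not a production classifier; real systems use richer spectral features.

```python
import numpy as np

def classify_noise(signal, frame_len=512, flux_threshold=4.0):
    """Rough heuristic: steady-state noise has small frame-to-frame
    energy fluctuation; transient noise shows sharp energy spikes."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energies = (frames ** 2).mean(axis=1) + 1e-12
    # Ratio of the peak frame energy to the median frame energy.
    flux = energies.max() / np.median(energies)
    return "transient" if flux > flux_threshold else "steady-state"

# Steady hum: a constant-amplitude 120 Hz tone (like a fan motor).
t = np.arange(48000) / 48000.0
hum = 0.1 * np.sin(2 * np.pi * 120 * t)

# Transient click: near-silence with one short burst (like a key stroke).
click = 0.001 * np.random.default_rng(0).standard_normal(48000)
click[24000:24256] += 0.8

print(classify_noise(hum))    # expected: steady-state
print(classify_noise(click))  # expected: transient
```

With the constant hum, the peak-to-median energy ratio stays near 1; the single click drives it far above the threshold.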
As early as 2015, ZEGOCLOUD developed its proprietary audio preprocessing modules, including automatic noise suppression (ANS), acoustic echo cancellation (AEC), and automatic gain control (AGC), to ensure voice quality and fidelity. The ANS algorithms were implemented based on traditional noise reduction methods (i.e., the Wiener filter and spectral subtraction). Traditional ANS algorithms are effective at filtering out steady-state noises but fall short in complex ambient environments with many transient noises. With the development of AI technologies, effectively suppressing transient noises has become possible, and AI-powered noise suppression has shown great advantages in this area.
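To illustrate the traditional approach, here is a minimal spectral-subtraction sketch: estimate the average noise magnitude spectrum from a noise-only reference, then subtract it from each frame of the noisy signal. This is a simplified illustration (no windowing or overlap-add, which real implementations need), not ZEGOCLOUD's actual code.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, frame_len=512, floor=0.02):
    """Subtract an estimated noise magnitude spectrum from each frame.
    Simplified sketch: rectangular frames, no overlap-add."""
    # Average magnitude spectrum of the noise-only reference.
    m = (len(noise_only) // frame_len) * frame_len
    noise_mag = np.abs(
        np.fft.rfft(noise_only[:m].reshape(-1, frame_len), axis=1)
    ).mean(axis=0)

    n_frames = len(noisy) // frame_len
    out = np.zeros(n_frames * frame_len)
    for i in range(n_frames):
        frame = noisy[i * frame_len : (i + 1) * frame_len]
        spec = np.fft.rfft(frame)
        mag = np.abs(spec)
        # Keep a small spectral floor to limit "musical noise" artifacts.
        clean_mag = np.maximum(mag - noise_mag, floor * mag)
        out[i * frame_len : (i + 1) * frame_len] = np.fft.irfft(
            clean_mag * np.exp(1j * np.angle(spec)), n=frame_len
        )
    return out

# Demo: a 220 Hz "speech" tone buried in steady white noise.
t = np.arange(48000) / 48000.0
speech = 0.3 * np.sin(2 * np.pi * 220 * t)
noise = 0.05 * np.random.default_rng(1).standard_normal(48000)
cleaned = spectral_subtraction(speech + noise, noise)
n = len(cleaned)
err_before = np.mean(noise[:n] ** 2)
err_after = np.mean((cleaned - speech[:n]) ** 2)
```

Because steady-state noise has a spectrum that barely changes between frames, subtracting its average spectrum works well; a transient click would not match the average and would pass through, which is exactly the weakness described above.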
To achieve an ideal noise suppression result in real-time communications products used by everyday users, there is often a trade-off between noise suppression performance, processing speed, and system resource usage when designing the algorithms. Algorithms that can handle more noises with better results tend to be more computationally intensive, which means higher system resource consumption. If the end-user device is not powerful enough, processing will take longer, causing delays or degrading the final noise suppression quality. So, balancing computational workload against noise reduction capability to achieve the best possible results is the biggest challenge.
In addition, training the AI model with limited data sets to cover more scenarios, and making it extensible to new scenarios, are also difficult problems to solve.
With all these considerations in mind, ZEGOCLOUD has developed its own AI-powered noise suppression solution based on deep learning algorithms. Let’s dive into more details about this solution.
ZEGOCLOUD’s AI-powered ANS solution is designed to address the challenge of reducing transient noises in real-time voice communications; music-focused scenarios are not its area of strength. It is included as a built-in component of ZEGOCLOUD’s voice and video solutions, such as the voice chat room, live streaming, online classroom, and video call solutions.
With its focus on transient noise suppression, ZEGOCLOUD’s AI-powered ANS solution can reduce up to 80% of transient noises in voice conversations, resulting in crystal-clear speech. ZEGOCLOUD has trained the algorithm with different data sets to make it capable of suppressing a wide range of noises, such as mouse clicks, keyboard strokes, desk tapping, humming air conditioners, clinking dishes, noisy restaurants, ambient wind, coughs, and other non-human noises.
Traditional ANS algorithms vs AI-based ANS algorithms
With the release of the AI-powered ANS solution, ZEGOCLOUD now delivers an enhanced performance in suppressing different types of noises, including steady-state and transient noises, in various scenarios.
Compared to the AI-based ANS algorithm, the traditional ANS algorithm works very well in both human voice and music scenarios, but it is only suited to suppressing steady-state noises. Since it is not based on AI technologies, the upside is that it doesn’t need to be trained with labeled voice data and delivers very predictable performance, while the downside is that it only fits a limited number of scenarios. Unlike an AI-based algorithm, a traditional algorithm cannot be trained to handle new scenarios outside the scope of its original design.
The AI-based ANS algorithm can suppress 80% of the noises in voice scenarios, delivering very competitive performance in suppressing transient noises, while the system resource footprint is slightly higher than that of the traditional ANS algorithm.
Although the two algorithms look very different, their fundamental mechanism at the bottom layer is based on a similar set of acoustic filters. What differentiates an AI-based algorithm from a traditional one is that the former is equipped with a machine learning model to drive the underlying acoustic filters.
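In spectral terms, that shared acoustic filter can be pictured as a per-frequency gain mask applied to each frame’s spectrum: a traditional algorithm derives the mask from noise statistics, while an AI model predicts it from learned features. Below is a minimal sketch of the filter stage; the mask is hand-set here, standing in for whatever the statistics or the model would produce.

```python
import numpy as np

def apply_gain_mask(frame, mask):
    """Apply a per-frequency-bin gain mask to one audio frame."""
    spec = np.fft.rfft(frame)
    return np.fft.irfft(spec * mask, n=len(frame))

frame_len = 512
n = np.arange(frame_len)
speech_tone = np.sin(2 * np.pi * 50 * n / frame_len)        # falls in bin 50
noise_tone = 0.5 * np.sin(2 * np.pi * 200 * n / frame_len)  # falls in bin 200

# A mask that passes the low bins and zeroes the high ones -- in practice
# this would come from noise statistics or from a model's prediction.
mask = np.zeros(frame_len // 2 + 1)
mask[:100] = 1.0

filtered = apply_gain_mask(speech_tone + noise_tone, mask)  # ~= speech_tone
```

The filter stage itself is identical for both algorithms; only the source of the mask differs.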
Benefits of AI-based ANS algorithms
An AI-based ANS algorithm has the following benefits:
1) It has demonstrated outstanding performance in suppressing transient noises. This makes it a very desirable feature for real-time voice solutions. Previously, many common transient noises, such as keyboard strokes, mouse clicks, and desk tapping sounds, couldn’t be recognized as noise by traditional ANS algorithms and hence couldn’t be suppressed. When there are many participants, such noises can become annoying and even ruin the user experience in scenarios like voice/video conferencing or online learning. Now, with AI-based ANS algorithms, such noises can be removed effectively, making online voice communications more effective and enjoyable.
2) It can be trained to suppress noises in new scenarios. To train the AI-based ANS algorithm for a new noise scenario, we will label the noise and voice data first, and then train the model/algorithm with the labeled data. The underlying acoustic filters have a set of parameters that can be adjusted to tune the performance of the filters. We set a target performance baseline, and then train the model to get close to that baseline. Essentially, during the training process, filter parameters are adjusted gradually to make the filters produce results that are the same or close to the target performance baseline. The more scenario-specific data the AI-based algorithm is trained on, the more effective it will be in various noise scenarios.
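The training idea described above can be sketched with a toy example: treat the filter’s per-bin gains as trainable parameters and nudge them by gradient descent until the filtered output approaches the clean target. Everything below (data, loss, dimensions) is a synthetic stand-in, not the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic labeled data: magnitude spectra of noisy frames (the labeled
# "noise and voice data") and the clean-speech targets they should map to.
n_frames, n_bins = 200, 257
clean = np.abs(rng.standard_normal((n_frames, n_bins)))
noise = 0.5 * np.abs(rng.standard_normal((n_frames, n_bins)))
noisy = clean + noise

# Trainable filter parameters: one gain per frequency bin.
gains = np.ones(n_bins)
initial_mse = np.mean((noisy * gains - clean) ** 2)

# Gradient descent on the mean-squared error against the target --
# the "adjust parameters toward the performance baseline" step.
lr = 0.01
for _ in range(500):
    pred = noisy * gains
    grad = 2 * np.mean((pred - clean) * noisy, axis=0)
    gains -= lr * grad

final_mse = np.mean((noisy * gains - clean) ** 2)
```

A real model predicts time-varying gains from input features rather than learning one fixed gain per bin, but the loop structure (forward pass, loss against the labeled target, parameter update) is the same.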
The AI-based ANS algorithm and the traditional ANS algorithm both have their own advantages. Can we combine them to create synergy? The answer is yes. ZEGOCLOUD is developing an AI-based feature that can determine whether the audio consists primarily of voice or music. If it is the former, the AI-based ANS algorithm will be used to suppress noises; if the latter, the traditional ANS algorithm will be used. The development of this AI-based feature is underway, and it will be released soon.
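The voice/music routing could look like the following dispatch logic. All names here are illustrative placeholders, not ZEGOCLOUD’s actual API:

```python
def suppress(frame, classify, ai_ans, traditional_ans):
    """Route each audio frame to the suppressor suited to its content."""
    if classify(frame) == "voice":
        return ai_ans(frame)           # strong on transient noises
    return traditional_ans(frame)      # preserves music fidelity

# Minimal stubs to demonstrate the dispatch.
def classify(frame):
    return frame["kind"]

def ai_ans(frame):
    return frame["data"] + " [AI ANS]"

def traditional_ans(frame):
    return frame["data"] + " [traditional ANS]"

voice_out = suppress({"kind": "voice", "data": "frame1"},
                     classify, ai_ans, traditional_ans)
music_out = suppress({"kind": "music", "data": "frame2"},
                     classify, ai_ans, traditional_ans)
```

The design choice is that classification happens before suppression, so each algorithm only ever runs on the content it handles best.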
ZEGOCLOUD is dedicated to enabling real-time engagement for every online communications scenario. With ZEGOCLOUD’s traditional ANS algorithm and AI-based ANS algorithm working together, both steady-state noises and transient noises can be handled gracefully in almost all real-time communications scenarios, including voice and video calls, live streaming, online classrooms, and more. Whether for voice- or music-focused scenarios, ZEGOCLOUD helps you deliver a superior real-time audio experience to your users.