Noise is everywhere in our lives, and any real-time voice system needs Noise Suppression once the noise level exceeds a certain threshold.
In real-time communications, noise falls into two types: steady-state and transient.
- Steady-state noises are ambient sounds that are continuous and have negligible fluctuations within the period of observation. Typical examples include sounds generated by fans, air conditioners, vacuum cleaners, hairdryers, and lawnmowers.
- Transient noises are impulsive, short-duration sounds, such as the sounds of keyboard strokes, mouse clicks, and barking dogs.
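To make the distinction concrete, here is a minimal sketch that separates the two types by how much the frame energy fluctuates. The function names, the 10 ms frame length, and the ratio threshold are illustrative assumptions, not part of any ZEGOCLOUD API.

```python
import math

def frame_energies(samples, frame_len):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def looks_transient(samples, frame_len=160, ratio_threshold=10.0):
    """Heuristic: flag the signal as transient when the loudest frame is
    far more energetic than the median frame (threshold is an assumption)."""
    energies = sorted(frame_energies(samples, frame_len))
    median = energies[len(energies) // 2]
    peak = energies[-1]
    return peak > ratio_threshold * max(median, 1e-12)

# Steady hum: a constant-amplitude 100 Hz tone sampled at 16 kHz.
hum = [0.1 * math.sin(2 * math.pi * 100 * n / 16000) for n in range(16000)]

# The same hum with one 5 ms click (80 samples) added in the middle.
click = list(hum)
for n in range(8000, 8080):
    click[n] += 0.9

hum_is_transient = looks_transient(hum)      # energy is flat across frames
click_is_transient = looks_transient(click)  # one frame dominates the rest
```

A steady hum produces nearly identical energy in every frame, while a click concentrates its energy in one or two frames, which is what the peak-to-median ratio picks up.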
Noise Suppression technology
Automatic Noise Suppression
Early in 2015, ZEGOCLOUD developed its proprietary audio preprocessing modules, including automatic noise suppression (ANS), acoustic echo cancellation (AEC), and automatic gain control (AGC), to ensure voice quality and fidelity. The ANS algorithms were based on traditional noise reduction methods, namely Wiener filtering and spectral subtraction. Traditional ANS algorithms filter steady-state noises effectively but are inadequate in complex ambient environments with many transient noises. With the development of AI technologies, things have changed for the better.
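As a rough illustration of the two traditional methods, the sketch below applies spectral subtraction and a Wiener-style gain to per-bin magnitude spectra. It assumes the noise magnitude has already been estimated, for example from frames with no speech. The function names and the spectral floor value are assumptions for illustration, not ZEGOCLOUD's implementation.

```python
def spectral_subtraction(noisy_mag, noise_mag, floor=0.05):
    """Subtract the estimated noise magnitude from each frequency bin.
    The spectral floor keeps a small residual so no bin goes negative."""
    return [max(y - n, floor * y) for y, n in zip(noisy_mag, noise_mag)]

def wiener_gains(noisy_mag, noise_mag):
    """Per-bin Wiener-style gain: estimated clean power / noisy power."""
    gains = []
    for y, n in zip(noisy_mag, noise_mag):
        noisy_pow, noise_pow = y * y, n * n
        clean_pow = max(noisy_pow - noise_pow, 0.0)
        gains.append(clean_pow / noisy_pow if noisy_pow > 0 else 0.0)
    return gains

# Toy magnitude spectrum (4 bins) and a flat noise estimate.
noisy = [1.0, 0.5, 0.2, 2.0]
noise = [0.2, 0.2, 0.2, 0.2]

cleaned = spectral_subtraction(noisy, noise)
gains = wiener_gains(noisy, noise)
wiener_out = [y * g for y, g in zip(noisy, gains)]
```

Both methods rely on a noise estimate that stays valid from frame to frame. That holds for steady-state noise, but a keyboard click is over before the estimate can adapt, which is why these methods struggle with transient noises.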
AI-powered Noise Suppression
AI-powered Noise Suppression technologies have shown significant advantages in suppressing transient noises.
When designing the algorithms, there is often a trade-off between Noise Suppression performance, processing speed, and system resource usage. Algorithms that handle more kinds of noise with better results tend to be more computationally intensive, which means higher system resource consumption. If the end-user device is not powerful enough, processing takes longer, which introduces delay or degrades the final Noise Suppression result.
Challenges to AI-powered technology
The biggest challenge is balancing computational workload against noise reduction capability to achieve the best results. Furthermore, training the AI model to cover more scenarios with limited data sets, and keeping it extensible to new scenarios, are also tricky problems to solve.
With all these considerations in mind, ZEGOCLOUD has developed its own AI-powered Noise Suppression solution based on deep learning algorithms. Let’s dive into more details about this solution.
ZEGOCLOUD’s AI-powered ANS solution
This solution addresses transient noises in real-time voice communications and music-focused scenarios. It is a built-in component of ZEGOCLOUD’s voice and video solutions, such as voice chat room solutions, live streaming solutions, online classroom solutions, and video call solutions.
ZEGOCLOUD’s AI-powered ANS solution can reduce up to 80% of transient noises in voice conversations, resulting in crystal-clear speech. ZEGOCLOUD has trained the algorithm with different data sets to make it capable of suppressing a wide range of noises, such as mouse clicks, keyboard strokes, desk tapping, humming air conditioners, clinking dishes, noisy restaurants, ambient wind sounds, cough sounds, and other non-speech noises.
Traditional ANS algorithms vs. AI-based ANS algorithms
The traditional ANS algorithm works very well in both human voice and music scenarios, but it is suited only to suppressing steady-state noises. The upside is that it needs no training with labeled voice data and delivers very predictable performance. The downside is that it covers only a limited number of noise scenarios: unlike an AI-based algorithm, a traditional algorithm cannot be trained to handle new scenarios outside the scope of its original design.
The AI-based ANS algorithm can suppress up to 80% of transient noises in voice scenarios, which is very competitive performance. The cost is a system resource footprint slightly higher than that of the traditional ANS algorithm.
Although the two algorithms look very different, both are built on similar acoustic filters at the bottom layer. The difference is that an AI-based algorithm includes a machine-learning model that drives the underlying acoustic filters.
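One way to picture this shared structure is a common filter stage fed by different gain estimators. In the sketch below, the traditional path computes gains from a fixed rule, and the AI path gets them from a model. The toy linear rule is only a placeholder for a trained network, and every name and constant here is a hypothetical chosen for illustration.

```python
def apply_gains(band_magnitudes, gains):
    """Shared acoustic filter: scale each frequency band by its gain."""
    return [m * g for m, g in zip(band_magnitudes, gains)]

def rule_based_gains(band_magnitudes, noise_floor=0.2):
    """Traditional path: Wiener-style gains from a fixed noise-floor estimate."""
    gains = []
    for m in band_magnitudes:
        power = m * m
        clean = max(power - noise_floor ** 2, 0.0)
        gains.append(clean / power if power > 0 else 0.0)
    return gains

def toy_model_gains(band_magnitudes, weight=1.0, bias=-0.1):
    """AI path (placeholder): a real system would run a trained network here.
    This toy linear rule only stands in for the model's output, clamped to [0, 1]."""
    return [min(max(weight * m + bias, 0.0), 1.0) for m in band_magnitudes]

frame = [1.0, 0.1]  # two-band magnitude spectrum of one audio frame
traditional_out = apply_gains(frame, rule_based_gains(frame))
ai_out = apply_gains(frame, toy_model_gains(frame))
```

The filter stage is identical in both paths. Only the source of the gains changes, so swapping in a better-trained model changes what gets suppressed without touching the filter itself.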
Benefits of AI-based ANS algorithms
An AI-based ANS algorithm has the following benefits:
1) It has outstanding performance in suppressing transient noises. These noises are effectively removed, making online voice communications clearer and more enjoyable.
2) It can be trained continually to suppress noises in new scenarios. To train the AI-based ANS algorithm for a new noise scenario, one must first label the noise and voice data and then train the model with the labeled data. The underlying acoustic filters have a set of adjustable parameters that tune their performance. During training, these parameters are gradually adjusted so that the filters produce results that match or come close to the target performance baseline. The more scenario-specific data the AI-based algorithm is trained on, the more effective it will be in various noise scenarios.
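The training idea in item 2 can be sketched with the smallest possible case: a single filter gain fitted by gradient descent so that the filtered noisy signal approaches the labeled clean target. A production model has many parameters and a richer loss. The names, learning rate, and data below are assumptions for illustration only.

```python
def train_gain(pairs, lr=0.1, epochs=200):
    """Fit one gain g so that g * noisy approximates clean (least squares).
    pairs is a list of (noisy_magnitude, clean_magnitude) labeled examples."""
    g = 1.0  # start with the filter passing everything through
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to g.
        grad = sum(2 * (g * noisy - clean) * noisy for noisy, clean in pairs)
        grad /= len(pairs)
        g -= lr * grad  # move the parameter toward the target behavior
    return g

# Labeled data where the clean signal is half the noisy magnitude.
pairs = [(1.0, 0.5), (2.0, 1.0), (0.5, 0.25)]
g = train_gain(pairs)
```

Each pass nudges the parameter a little closer to the value that reproduces the labeled targets, which is the "gradually adjusted" behavior described above.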
The AI-based ANS algorithm and the traditional ANS algorithm both have advantages. Can the two be combined to create synergy? The answer is yes. ZEGOCLOUD is developing an AI-based feature that determines whether the audio consists primarily of voice or music. If it is voice, the AI-based ANS algorithm suppresses the noise; if it is music, the traditional ANS algorithm does. The development of this AI-based feature is underway, and it will be released soon.
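The routing logic described here might look like the sketch below. Since the feature is still in development, everything in it is hypothetical: the classifier and both suppressors are stand-in functions, not ZEGOCLOUD APIs.

```python
def route_frame(frame, is_music, ai_ans, traditional_ans):
    """Pick the suppressor per frame based on the content classifier."""
    if is_music(frame):
        return traditional_ans(frame)
    return ai_ans(frame)

# Hypothetical stand-ins so the sketch runs end to end.
def fake_is_music(frame):
    return frame["label"] == "music"

def fake_ai_ans(frame):
    return {**frame, "processed_by": "ai"}

def fake_traditional_ans(frame):
    return {**frame, "processed_by": "traditional"}

voice = route_frame({"label": "voice"}, fake_is_music,
                    fake_ai_ans, fake_traditional_ans)
music = route_frame({"label": "music"}, fake_is_music,
                    fake_ai_ans, fake_traditional_ans)
```

Passing the classifier and suppressors in as arguments keeps the routing independent of how each one is implemented.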
ZEGOCLOUD is dedicated to enabling real-time engagement for every online communications scenario. With ZEGOCLOUD’s traditional ANS algorithm and AI-based ANS algorithm working together, steady-state and transient noises can be handled gracefully in almost all real-time communications scenarios, including voice and video calls, live-streaming, online classrooms, and more. Whether for voice- or music-focused scenarios, ZEGOCLOUD helps you deliver a superior real-time audio experience to your users.