We are pleased to announce the launch of the Express 3.0 SDK, which features scenario-based AI noise reduction. By intelligently recognizing the usage scenario and adjusting the noise reduction policy accordingly, the SDK minimizes noise in calls while preserving high-fidelity audio quality. For example:
- Voice call scenario: Retain human voices and remove mouse/keyboard noises.
- Live streaming scenario: Retain human voices and musical instrument sounds, and remove environmental noise from air conditioners, wind, and vehicles.
Remove all noises except human voices in the communication scenario
For the communication scenario, ZEGOCLOUD has developed ZegoAIDenoise, a lightweight noise reduction solution based on a neural network. ZegoAIDenoise combines traditional signal-processing algorithms with deep learning. To reduce performance overhead, the frequency domain is divided into sub-bands, which keeps the deep learning model small while still delivering effective noise reduction.
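The sketch below illustrates why splitting the spectrum into sub-bands keeps the network small: instead of feeding every FFT bin to the model, adjacent bins are pooled into a handful of band energies. The class name, the equal-width band layout, and the choice of 16 bands are assumptions for illustration only; ZegoAIDenoise's actual band layout is not published.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of sub-band feature extraction. Adjacent FFT bins are
 * pooled into a few bands so the neural network sees far fewer inputs.
 * Equal-width bands are an assumption, not ZegoAIDenoise's real layout.
 */
public class SubBandFeatures {

    /** Sums the power of adjacent frequency bins into numBands bands. */
    static double[] bandEnergies(double[] binPower, int numBands) {
        if (numBands <= 0 || numBands > binPower.length) {
            throw new IllegalArgumentException("numBands must be in [1, binPower.length]");
        }
        double[] bands = new double[numBands];
        for (int b = 0; b < numBands; b++) {
            // Bin range [start, end) for band b; integer division spreads
            // any remainder bins across the bands so every bin is used once.
            int start = b * binPower.length / numBands;
            int end = (b + 1) * binPower.length / numBands;
            for (int k = start; k < end; k++) {
                bands[b] += binPower[k];
            }
        }
        return bands;
    }

    public static void main(String[] args) {
        // A 320-sample frame (32 kHz x 10 ms) gives 320 / 2 + 1 = 161 bins from a real FFT.
        double[] binPower = new double[161];
        Arrays.fill(binPower, 1.0);
        double[] bands = bandEnergies(binPower, 16);
        System.out.println("Network inputs: " + binPower.length + " bins -> " + bands.length + " bands");
    }
}
```

In this sketch the model input shrinks from 161 values to 16 per frame, which is the kind of reduction that makes a very small network feasible.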
In real-time processing performance tests, with the default sampling rate of 32 kHz and a frame length of 10 ms, the CPU overhead on an iPhone 6 (1.4 GHz) is about 1%, which is the highest optimization efficiency in the industry. ZegoAIDenoise delivers substantial improvements in noise reduction quality, generalization, and performance overhead.
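The benchmark settings translate into a simple per-frame budget: at 32 kHz, a 10 ms frame holds 320 samples, and a CPU share of about 1% leaves roughly 0.1 ms of processing per frame. The sketch below works through that arithmetic; it assumes the 1% figure refers to a single core, which the benchmark description does not state.

```java
/**
 * Back-of-the-envelope arithmetic for the quoted benchmark settings:
 * 32 kHz sampling, 10 ms frames, ~1% CPU (assumed: of a single core).
 */
public class FrameBudget {

    /** Samples in one frame: sampleRate (Hz) x frameLength (ms) / 1000. */
    static int samplesPerFrame(int sampleRateHz, int frameMs) {
        return sampleRateHz * frameMs / 1000;
    }

    /** Processing time per frame (ms) implied by a CPU share in [0, 1]. */
    static double processingMsPerFrame(int frameMs, double cpuShare) {
        return frameMs * cpuShare;
    }

    public static void main(String[] args) {
        int samples = samplesPerFrame(32_000, 10);
        double busyMs = processingMsPerFrame(10, 0.01);
        System.out.printf("%d samples per frame, ~%.1f ms of processing per 10 ms frame%n",
                samples, busyMs);
    }
}
```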
Applicable noise types: mouse and keyboard clicks, knocking, air conditioners, tableware, noisy restaurants, ambient wind, coughing, and blowing, as well as noise mixed with human voices in small rooms.
Maintain high-fidelity audio quality in the music scenario
In the music scenario, conventional noise reduction algorithms tend to misidentify music as noise and suppress it, which severely degrades the music and harms the user experience. ZEGOCLOUD's scenario-based AI noise reduction solution is specially optimized for music scenarios.
To improve music recognition accuracy, ZEGOCLOUD collected over 10,000 audio recordings covering dozens of music styles (such as light music, classical music, and pop music) and musical instruments (such as guitar, piano, and violin), applied data augmentation to improve model generalization, and then extracted features to train the model.
To minimize misrecognition of music, ZEGOCLOUD also collected a large set of noise and speech samples for comparative training, and trained the model to recognize music at different signal-to-noise ratios (SNRs) to ensure optimal sound quality. As a result, the solution achieves a 99% recognition rate. In addition, it introduces no extra audio processing delay or performance overhead.
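Training at different SNRs usually relies on mixing clean clips with noise at a controlled level. The sketch below shows this generic augmentation step: the noise is scaled by a gain g chosen so that 10·log10(Ps / (g²·Pn)) equals the target SNR, where Ps and Pn are the mean powers of the clean clip and the noise. It is a common technique, not ZEGOCLOUD's actual training pipeline, and the 440 Hz tone is only a stand-in for music.

```java
import java.util.Random;

/**
 * Generic SNR-controlled mixing (a common data augmentation step).
 * Not ZEGOCLOUD's pipeline; the tone and Gaussian noise are stand-ins.
 */
public class SnrMixer {

    /** Mean power of a signal. */
    static double power(double[] x) {
        double sum = 0;
        for (double v : x) sum += v * v;
        return sum / x.length;
    }

    /** Returns clean + g * noise, with g chosen so the mix has the target SNR in dB. */
    static double[] mixAtSnr(double[] clean, double[] noise, double snrDb) {
        if (clean.length != noise.length) throw new IllegalArgumentException("length mismatch");
        double ps = power(clean);
        double pn = power(noise);
        if (pn == 0) throw new IllegalArgumentException("noise must not be silent");
        double g = Math.sqrt(ps / (pn * Math.pow(10.0, snrDb / 10.0)));
        double[] out = new double[clean.length];
        for (int i = 0; i < clean.length; i++) out[i] = clean[i] + g * noise[i];
        return out;
    }

    /** Measured SNR in dB of a mix, using the clean signal as reference. */
    static double snrDb(double[] clean, double[] mix) {
        double[] residual = new double[clean.length];
        for (int i = 0; i < clean.length; i++) residual[i] = mix[i] - clean[i];
        return 10.0 * Math.log10(power(clean) / power(residual));
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        int n = 32_000; // one second at 32 kHz
        double[] music = new double[n];
        double[] noise = new double[n];
        for (int i = 0; i < n; i++) {
            music[i] = 0.5 * Math.sin(2 * Math.PI * 440.0 * i / 32_000.0); // stand-in "music"
            noise[i] = rng.nextGaussian();
        }
        for (double target : new double[] {0, 10, 20}) {
            double[] mix = mixAtSnr(music, noise, target);
            System.out.printf("target %.0f dB -> measured %.2f dB%n", target, snrDb(music, mix));
        }
    }
}
```

Sweeping the target over a range of values produces training examples of the same music at many noise levels.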
ZEGOCLOUD's scenario-based AI noise reduction solution uses a self-developed music detection algorithm to recognize music in the microphone input and automatically adjusts the noise reduction level in scenarios such as sound card use, near-field instrument playing and singing, and music playback, to ensure high-fidelity audio quality.
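One way to turn a music detector's output into an automatic noise reduction level is a smoothed probability with hysteresis, so the level does not flicker between frames. The sketch below is hypothetical: the class name, the two levels, the thresholds, and the smoothing factor are illustrative assumptions and do not describe ZEGOCLOUD's internal algorithm.

```java
/**
 * Hypothetical music-aware policy: a per-frame music probability is smoothed
 * and compared against two thresholds (hysteresis) to switch between an
 * aggressive level for speech and a gentle level that preserves music.
 * All names and constants are illustrative assumptions.
 */
public class MusicAwarePolicy {

    enum Level { AGGRESSIVE, GENTLE }

    private final double alpha;      // weight of the newest probability in the smoothing
    private final double enterMusic; // switch to GENTLE when the smoothed value rises above this
    private final double exitMusic;  // switch back to AGGRESSIVE when it falls below this
    private double smoothed = 0.0;
    private Level level = Level.AGGRESSIVE;

    MusicAwarePolicy(double alpha, double enterMusic, double exitMusic) {
        if (exitMusic >= enterMusic) throw new IllegalArgumentException("exit must be below enter");
        this.alpha = alpha;
        this.enterMusic = enterMusic;
        this.exitMusic = exitMusic;
    }

    /** Feeds one frame's music probability and returns the level to apply. */
    Level update(double musicProbability) {
        // Exponential moving average damps single-frame detector spikes.
        smoothed = alpha * musicProbability + (1 - alpha) * smoothed;
        if (level == Level.AGGRESSIVE && smoothed > enterMusic) {
            level = Level.GENTLE;
        } else if (level == Level.GENTLE && smoothed < exitMusic) {
            level = Level.AGGRESSIVE;
        }
        return level;
    }

    public static void main(String[] args) {
        MusicAwarePolicy policy = new MusicAwarePolicy(0.5, 0.7, 0.3);
        double[] probs = {0.1, 0.9, 0.9, 0.9, 0.5, 0.1, 0.1};
        for (double p : probs) {
            System.out.println(p + " -> " + policy.update(p));
        }
    }
}
```

The gap between the two thresholds is the design choice that matters here: music must be detected consistently before suppression is relaxed, and must fade clearly before it is tightened again.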
Applicable sound types: accompaniment, musical instruments, and other musical elements.
The video shows how conventional noise reduction misidentifies music and accompaniment as noise. ZEGOCLOUD's scenario-based AI noise reduction solution intelligently recognizes music and adjusts its processing policy to retain the musical elements while reducing noise.
A quick guide to ZEGOCLOUD’s scenario-based AI noise reduction solution
ZEGOCLOUD's scenario-based AI noise reduction solution can optimize audio quality for a better user experience in the following application scenarios:
- Communication scenario: live streaming, voice chatrooms, online meetings, and audio and video communication
- Music scenario: online karaoke and social live streaming
Currently, the solution supports iOS, Android, macOS, and Windows devices.
The capability can be enabled with just two lines of code:
```java
// Enable ANS
engine.enableANS(true);
// Set the ANS mode to ZegoANSMode.AI
engine.setANSMode(ZegoANSMode.AI);
```
For details on integration and usage, see our developer documentation.
We have also added the following features in this update:
- Segment and transmit subjects in video frames
- Add room-level scenarios
- Add an API for getting the status of the GPS information switch
- Add a callback for the first video frame after the camera is turned on
For more information, see the Change Log.