Scenario-based AI Noise Reduction

2024-07-27

Scenario-based AI noise reduction identifies the current audio scenario in real time and adjusts the AI noise reduction strategy accordingly, so that both noise suppression and sound quality stay as good as possible. Two common scenarios are currently supported:

Call scenarios: all sounds other than the human voice are identified as noise and eliminated. In addition to steady-state noise (for details, please refer to Audio 3A Processing), non-steady-state noise is also eliminated while the human voice stays high-fidelity. Typical noises include mouse clicks, keyboard typing, tapping, air conditioning, kitchen dishes, noisy restaurants, environmental wind, coughing, and blowing, as well as human voice reverb in small rooms.
Music scenarios: music on the microphone input is detected in real time, and the noise reduction level is adjusted automatically to preserve high-fidelity music sound quality. This applies to sound card, singing with accompaniment, and near-field music scenarios.

Warning

Before using the AI noise reduction feature, please contact ZEGOCLOUD Technical Support for special packaging.
Starting from version 3.0.0, ZEGO Express SDK supports intelligent recognition of music scenarios. In music scenarios, AI noise reduction automatically lowers the noise reduction level to preserve sound quality. To use this feature, please contact ZEGOCLOUD Technical Support for special packaging and configuration.

Feature Advantages

Eliminates 80% of noise.
Low latency.
Low memory usage, roughly the same as traditional noise reduction.
Low CPU usage.
Music scenario recognition accuracy reaches 99%.

Usage Scenarios

This feature is suitable for one-on-one or multi-person audio/video calls such as voice rooms, meetings, and in-game voice chat, as well as live streaming or online KTV scenarios that involve sound cards, singing with accompaniment, or near-field music.

Warning

Music scenario recognition requires turning on the music detection switch. Please contact ZEGOCLOUD Technical Support to configure and enable the music detection feature.

Removable Noise

Developers can use this feature to eliminate the following noises:

Scenario	Typical Noises
Meeting Room	Keyboard sound, table tapping sound
Office	Keyboard sound, surrounding colleagues' voices
Vehicle	Whistle sound, car whooshing sound, car music sound, rain and wiper sound
Internet Cafe	Keyboard sound, surrounding people's voices
Coffee Shop	Chair dragging sound, surrounding people's voices, sharp collision sound

Effect Demo

Office

Original audio contains: mouse click sound, keyboard sound, applause sound, friction sound, office noise, air conditioning sound, etc.

After AI noise reduction:

Public Place

Original audio contains: rain sound, tram sound, cooking sound, car whooshing sound, etc.

After AI noise reduction:

Music Scenario

Original audio:

Conventional AI noise reduction: eliminates noise, but causes significant music damage.

After scenario-based AI noise reduction: eliminates noise while preserving music fidelity.

Prerequisites

Before implementing the scenario-based AI noise reduction feature, ensure that:

A project has been created in the ZEGOCLOUD Console, and a valid AppID and AppSign have been obtained. For details, please refer to Console - Project Information.
ZEGO Express SDK has been integrated into the project, and basic audio/video stream publishing and playing functionality has been implemented. For details, please refer to Quick Start - Integration and Quick Start - Implementation.

Implementation Steps

Developers can follow the steps below to complete the AI noise reduction related settings:

Please contact ZEGOCLOUD Technical Support to configure and enable the music detection feature. If it has been enabled, please ignore this step.
For the initialization and room login process, please refer to "Create Engine" and "Login Room" in the Implementing Video Call documentation.
Call the enableANS interface to enable noise suppression. After this feature is enabled, the human voice will be clearer.

After enabling noise suppression, developers can call the setANSMode interface to set the ANS mode and enable the AI noise reduction feature. The table below lists some of the AI noise reduction modes. For more modes, please refer to ZegoANSMode.

AI Noise Reduction Mode	Applicable Scenarios
ZEGO_ANS_MODE_AI	Lightweight mode. Delivers good noise reduction with extremely low power consumption and a very small package size increase. Suitable for indoor noise environments and relatively comfortable domestic regions.
ZEGO_ANS_MODE_AI_BALANCED	Balanced mode. Comprehensively eliminates noise while keeping the human voice lossless, at the cost of slightly higher power consumption. Suitable for complex call environments such as busy outdoor markets and transportation, as well as regions with serious noise interference.
ZEGO_ANS_MODE_AI_LOW_LATENCY	Low latency mode. Maintains clean noise reduction and high-fidelity human voice quality under 10 ms latency. Suitable for latency-sensitive scenarios such as game voice, game chat, and real-time singing.
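
The choice among the three modes in the table above can be sketched as a small decision helper. This is only an illustration: `AnsModeSketch`, `CallProfile`, `chooseAnsMode`, and `sdkConstantName` are hypothetical names introduced here, not part of ZEGO Express SDK, and the rule that latency sensitivity takes priority over noise complexity is an assumption of this sketch rather than documented guidance.

```cpp
#include <cassert>
#include <string>

// Stand-in for the SDK's ZegoANSMode. Only the three AI modes listed in the
// table are mirrored, so the sketch compiles without the SDK headers.
enum class AnsModeSketch { AI, AIBalanced, AILowLatency };

// Hypothetical description of what the application needs.
struct CallProfile {
    bool latencySensitive;  // e.g. game voice, real-time singing
    bool complexNoise;      // e.g. busy outdoor markets, transportation
};

// Picks a mode from the profile. Assumption of this sketch: latency
// sensitivity is checked first, then noise complexity.
AnsModeSketch chooseAnsMode(const CallProfile& profile) {
    if (profile.latencySensitive) return AnsModeSketch::AILowLatency;
    if (profile.complexNoise) return AnsModeSketch::AIBalanced;
    return AnsModeSketch::AI;
}

// Returns the name of the SDK constant that the table lists for each mode.
std::string sdkConstantName(AnsModeSketch mode) {
    switch (mode) {
        case AnsModeSketch::AI: return "ZEGO_ANS_MODE_AI";
        case AnsModeSketch::AIBalanced: return "ZEGO_ANS_MODE_AI_BALANCED";
        case AnsModeSketch::AILowLatency: return "ZEGO_ANS_MODE_AI_LOW_LATENCY";
    }
    assert(false && "unhandled mode");
    return std::string();
}
```

In a real project the result would be mapped to the corresponding ZegoANSMode value and passed to setANSMode.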

// Enable ANS
engine->enableANS(true);
// Set AI noise reduction mode according to requirements.
// Note: After setting ANS mode to ZEGO_ANS_MODE, ZEGO Express SDK will
// forcibly disable transient noise suppression [enableTransientANS]
engine->setANSMode(ZEGO_ANS_MODE_AI);
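
The two calls above can be wrapped in one helper so that they always run in the same order. The sketch below is a minimal illustration, not SDK code: `enableAiNoiseReduction` and `FakeEngine` are hypothetical names, and the engine and mode types are template parameters so the example compiles without the SDK headers. It relies only on the `enableANS` and `setANSMode` calls shown in this document.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Enables noise suppression first, then selects the ANS mode, in the order
// described in the steps above. With the SDK, Engine would be the engine
// type and Mode would be ZegoANSMode.
template <typename Engine, typename Mode>
bool enableAiNoiseReduction(Engine* engine, Mode mode) {
    if (engine == nullptr) return false;  // nothing to configure
    engine->enableANS(true);
    engine->setANSMode(mode);
    return true;
}

// Hypothetical stand-in that records calls; used only to exercise the helper.
struct FakeEngine {
    std::vector<std::string> calls;
    int mode = -1;

    void enableANS(bool enable) {
        calls.push_back(enable ? "enableANS(true)" : "enableANS(false)");
    }
    void setANSMode(int newMode) {
        // The fake expects enableANS to have been called first.
        assert(!calls.empty() && "enableANS should be called before setANSMode");
        mode = newMode;
        calls.push_back("setANSMode");
    }
};
```

With the real SDK, the same helper would be called with the engine pointer and a mode such as ZEGO_ANS_MODE_AI.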
