Scenario-Based AI Noise Reduction

2024-07-27

Scenario-based AI noise reduction automatically identifies the current scenario in real time and adjusts the AI noise reduction strategy to match, delivering the best balance of noise reduction and audio quality. Two common scenarios are currently supported:

In call scenarios, all sounds other than the human voice are identified as noise and eliminated. In addition to steady-state noise, non-steady-state noise is effectively removed while the human voice is preserved with high fidelity. Typical noises include mouse clicks, keyboard typing, tapping, air conditioning, kitchen dishes, noisy restaurants, ambient wind, coughing, blowing, and other non-voice sounds, as well as voice reverberation in small rooms.
In music scenarios, the noise reduction effect is adjusted automatically to restore music audio quality. Music is detected in real time on the microphone input, and in sound card, singing-with-accompaniment, or near-field music scenarios the noise reduction level is adjusted automatically to keep the music high fidelity.

Warning

Before using the AI noise reduction feature, contact ZEGO technical support to obtain a specially packaged SDK.
Starting from version 3.0.0, ZEGO Express SDK supports intelligent recognition of music scenarios. In music scenarios, AI noise reduction automatically lowers the noise reduction level to improve audio quality. To use this feature, contact ZEGO technical support for special packaging and configuration.

Functional Advantages

Eliminates 80% of noise.
Low latency.
Low memory usage, roughly the same as traditional noise reduction.
Low CPU usage.
Music scenario recognition accuracy of 99%.

Usage Scenarios

This feature is suitable for 1v1 and multi-person audio/video call scenarios such as voice rooms, conferences, and in-game voice chat, as well as live streaming and online KTV scenarios involving sound cards, singing with accompaniment, and near-field music.

Warning

Music scenario recognition requires turning on the music detection switch.

Eliminable Noise

Developers can use this feature to eliminate the following noise:

Scenario	Typical Noises
Meeting Room	Keyboard typing, table tapping
Office	Keyboard typing, nearby colleagues talking
Vehicle	Horn honking, whooshing of passing cars, in-car music, rain and windshield wipers
Internet Cafe	Keyboard typing, nearby people's voices
Coffee Shop	Chairs being dragged, nearby people talking, sharp collisions

Effect Demonstration

Office

The original audio includes: mouse click sounds, keyboard sounds, clapping sounds, friction sounds, office noise, air conditioning sounds, etc.

After AI noise reduction:

Public Place

The original audio includes: rain sounds, tram sounds, cooking sounds, car whooshing sounds, etc.

After AI noise reduction:

Music Scenario

Original audio:

Conventional AI noise reduction: eliminates noise but significantly degrades the music.

After scenario-based AI noise reduction: eliminates noise while preserving music fidelity.

Prerequisites

Before implementing scenario-based AI noise reduction functionality, please ensure:

You have created a project in the ZEGOCLOUD Console and applied for a valid AppID and AppSign. For details, please refer to Console - Project Information.
You have integrated ZEGO Express SDK in the project and implemented basic audio/video publishing and playing functions. For details, please refer to Quick Start - Integration and Quick Start - Implementation Flow.

Usage Steps

Developers can complete AI noise reduction related settings according to the following steps:

Contact ZEGO technical support to configure and enable the music detection feature. If it is already enabled, skip this step.
For the initialization and room login process, refer to "Create Engine" and "Login Room" in the video call implementation documentation.
Call the enableANS interface to enable noise suppression, which makes the human voice clearer.

After enabling noise suppression, call the setANSMode interface to set the ANS mode and enable AI noise reduction. The following table lists some AI noise reduction modes; for the full list, refer to ZegoANSMode.

AI Noise Reduction Mode	Applicable Scenarios
ZEGO_ANS_MODE_AI	Lightweight mode. Provides good noise reduction with very low power consumption and a minimal increase in package size. Suitable for typical indoor noise environments and regions with relatively mild noise.
ZEGO_ANS_MODE_AI_BALANCED	Balanced mode. Thoroughly eliminates noise without degrading the human voice, at the cost of slightly higher power consumption. Suitable for complex call environments such as busy outdoor markets and public transportation, and for regions with severe noise interference.
ZEGO_ANS_MODE_AI_LOW_LATENCY	Low latency mode. Keeps clean noise reduction and high-fidelity voice quality with latency under 10 ms. Suitable for latency-sensitive scenarios such as game voice, game chat, and real-time singing.
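The guidance in the table above can be turned into a small selection helper. This is a minimal sketch, assuming a hypothetical `CallProfile` struct and a local `AiAnsMode` enum that only mirrors the three AI modes listed; neither is part of ZEGO Express SDK. It also assumes that latency sensitivity takes priority over noise severity when both apply.

```cpp
#include <string>

// Hypothetical local mirror of the three AI values of ZegoANSMode listed above.
// In a real app, map these to the SDK's own ZegoANSMode enumerators.
enum class AiAnsMode { Ai, AiBalanced, AiLowLatency };

// Hypothetical description of the call environment (not an SDK type).
struct CallProfile {
    bool latencySensitive;  // e.g. game voice, game chat, real-time singing
    bool heavyNoise;        // e.g. busy outdoor markets, public transportation
};

// Applies the table's guidance. Assumption: latency wins when both flags are set.
AiAnsMode chooseAiAnsMode(const CallProfile& profile) {
    if (profile.latencySensitive) return AiAnsMode::AiLowLatency;
    if (profile.heavyNoise) return AiAnsMode::AiBalanced;
    return AiAnsMode::Ai;  // lightweight default for ordinary indoor noise
}

// Name of the corresponding SDK enumerator, useful for logging.
std::string sdkModeName(AiAnsMode mode) {
    switch (mode) {
        case AiAnsMode::Ai:           return "ZEGO_ANS_MODE_AI";
        case AiAnsMode::AiBalanced:   return "ZEGO_ANS_MODE_AI_BALANCED";
        case AiAnsMode::AiLowLatency: return "ZEGO_ANS_MODE_AI_LOW_LATENCY";
    }
    return "";
}
```

For example, a game chat profile `CallProfile{true, false}` selects the low latency mode, while a street call `CallProfile{false, true}` selects the balanced mode. Keeping the choice in one function makes it easy to revisit if the power consumption trade-off matters more for your app.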

// Enable ANS
engine->enableANS(true);
// Set the AI noise reduction mode as needed. Note: after the ANS mode is set to an AI mode (e.g. ZEGO_ANS_MODE_AI), ZEGO Express SDK forcibly disables transient noise suppression [enableTransientANS]
engine->setANSMode(ZEGO_ANS_MODE_AI);
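The two rules stated in the code above (enable ANS, then choose an AI mode; choosing an AI mode turns transient noise suppression off) can be modeled with a small test double. This is a hypothetical `MockAudioEngine` that only reuses the method names from this section; it is not the ZEGO engine, and its `AnsMode` enum and `aiNoiseReductionActive()` check are illustrative assumptions.

```cpp
// Hypothetical mode enum; Traditional stands in for any non-AI ZegoANSMode value.
enum class AnsMode { Traditional, Ai, AiBalanced, AiLowLatency };

// Hypothetical test double modeling the documented behavior. NOT the ZEGO engine.
class MockAudioEngine {
public:
    void enableANS(bool enable) { ansEnabled_ = enable; }
    void enableTransientANS(bool enable) { transientAnsEnabled_ = enable; }

    // Models the documented note: selecting an AI mode forcibly disables
    // transient noise suppression.
    void setANSMode(AnsMode mode) {
        mode_ = mode;
        if (isAiMode(mode)) transientAnsEnabled_ = false;
    }

    bool transientAnsEnabled() const { return transientAnsEnabled_; }

    // Assumption: AI noise reduction is in effect only when ANS is enabled
    // and an AI mode has been selected.
    bool aiNoiseReductionActive() const { return ansEnabled_ && isAiMode(mode_); }

private:
    static bool isAiMode(AnsMode m) { return m != AnsMode::Traditional; }

    bool ansEnabled_ = false;
    bool transientAnsEnabled_ = false;
    AnsMode mode_ = AnsMode::Traditional;
};
```

A mock like this lets unit tests check that app code enables ANS before selecting an AI mode and does not rely on transient noise suppression staying on afterwards.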
