Scenario-based AI Noise Reduction

2024-07-27

Scenario-based AI noise reduction identifies the current audio scenario automatically in real time and adjusts the AI noise reduction strategy to match, giving the best balance of noise reduction and audio quality. Two common noise reduction scenarios are currently supported:

Call scenarios: all sound other than the human voice is identified as noise and eliminated. In addition to eliminating steady-state noise (for details, please refer to Audio 3A Processing), the feature effectively eliminates non-steady-state noise while keeping the human voice high-fidelity. Typical noises include mouse clicks, keyboard typing, tapping, air conditioning, kitchen dishes, noisy restaurants, environmental wind, coughing, breathing, and other non-voice sounds, as well as voice reverberation in small rooms.
Music scenarios: the noise reduction effect is adjusted automatically to restore music audio quality. The SDK detects music in the microphone input in real time. With sound cards, singing accompaniment, or near-field music, it automatically adjusts the noise reduction level to ensure high-fidelity music audio quality.

Warning

Before using the AI noise reduction feature, please contact ZEGOCLOUD technical support to obtain a special SDK package.
Starting from version 3.0.0, ZEGO Express SDK supports intelligent music scenario recognition. In music scenarios, AI noise reduction automatically lowers the noise reduction level to improve audio quality. To use this feature, please contact ZEGOCLOUD technical support for a special SDK package and configuration.

Feature Advantages

Can eliminate 80% of noise.
Low latency.
Low memory usage, roughly on par with traditional noise reduction.
Low CPU usage.
99% accuracy in music scenario recognition.

Usage Scenarios

This feature is suitable for one-on-one or multi-person audio/video calls, such as voice rooms, conferences, and voice gaming sessions, as well as live streaming or online KTV with sound cards, singing accompaniment, or near-field music.

Warning

Music scenario recognition requires enabling the music detection switch. Please contact ZEGOCLOUD technical support to configure and enable the music detection feature.

Removable Noise

Developers can use this feature to eliminate the following noises:

Scenario	Typical Noises
Conference Room	Keyboard sounds, table tapping sounds
Office	Keyboard sounds, surrounding colleagues' voices
Vehicle	Whistle sounds, car whooshing sounds, car music sounds, rain and windshield wiper sounds
Internet Cafe	Keyboard sounds, surrounding people's voices
Coffee Shop	Chair dragging sounds, surrounding people's voices, sharp collision sounds

Effect Demonstration

Office

Original audio contains: mouse click sounds, keyboard sounds, clapping sounds, friction sounds, office noise, air conditioning sounds, etc.

After AI noise reduction:

Public Place

Original audio contains: rain sounds, tram sounds, cooking sounds, car whooshing sounds, etc.

After AI noise reduction:

Music Scenario

Original audio:

Conventional AI noise reduction: eliminates noise but significantly degrades the music.

After scenario-based AI noise reduction: eliminates noise while preserving music fidelity.

Prerequisites

Before implementing AI noise reduction functionality, ensure:

You have created a project in the ZEGOCLOUD Console and applied for a valid AppID and AppSign. For details, please refer to ZEGOCLOUD Console - Project Information.
You have integrated ZEGO Express SDK into your project and implemented basic audio/video publishing and playing functionality. For details, please refer to Quick Start - Integration and Quick Start - Implementation.

Usage Steps

Complete the AI noise reduction settings with the following steps:

Contact ZEGOCLOUD technical support to configure and enable the music detection feature. If it is already enabled, skip this step.
Initialize the SDK and log in to a room. For the specific process, please refer to "Create Engine" and "Login Room" in the video call implementation documentation.
Call the enableANS interface to enable noise suppression, which makes the human voice clearer.

After enabling noise suppression, call the setANSMode interface to set the ANS mode and turn on AI noise reduction. The table below lists some of the AI noise reduction modes. For more modes, please refer to ZegoANSMode.

AI Noise Reduction Mode	Applicable Scenarios
ZegoANSMode.AI	Lightweight mode. Maintains excellent noise reduction with extremely low power consumption and only a minimal increase in package size. Suitable for indoor noise environments and relatively comfortable domestic regions.
ZegoANSMode.AIBalanced	Balanced mode. Comprehensively eliminates noise while preserving the human voice without loss, at the cost of slightly higher power consumption. Suitable for complex call environments such as busy outdoor markets, transportation, and regions with severe noise interference.
ZegoANSMode.AILowLatency	Low latency mode. Maintains clean noise reduction and high-fidelity human voice quality at 10 ms latency. Suitable for latency-sensitive scenarios such as gaming voice chat, gaming sessions, and real-time chorus.

// Enable ANS (noise suppression)
ZegoExpressEngine.instance.enableANS(true);
// Set the AI noise reduction mode according to your requirements.
// Note: after the ANS mode is set to ZegoANSMode.AI, ZEGO Express SDK forcibly
// disables transient noise suppression [enableTransientANS].
ZegoExpressEngine.instance.setANSMode(ZegoANSMode.AI);
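
The mode table above can be turned into a small selection helper. The following is only a minimal sketch, not part of the SDK: the AiAnsChoice enum, the pickAiAnsMode function, and its two boolean inputs are hypothetical names introduced for illustration, and the priority order (latency first, then noise level) is an assumption rather than ZEGOCLOUD guidance. The local enum stands in for the three ZegoANSMode values so the snippet runs without the SDK.

```dart
// Hypothetical stand-in for the three AI modes listed in the table above.
// Real code would use ZegoANSMode.AI, ZegoANSMode.AIBalanced, and
// ZegoANSMode.AILowLatency from the SDK instead.
enum AiAnsChoice { ai, aiBalanced, aiLowLatency }

/// Picks a mode from two hypothetical inputs, following the table's guidance.
/// The ordering of the checks is an assumption made for this sketch.
AiAnsChoice pickAiAnsMode({
  required bool latencySensitive,
  required bool heavyNoise,
}) {
  // Latency-sensitive use (gaming voice chat, real-time chorus) is checked first.
  if (latencySensitive) return AiAnsChoice.aiLowLatency;
  // Complex, noisy call environments map to the balanced mode.
  if (heavyNoise) return AiAnsChoice.aiBalanced;
  // Otherwise fall back to the lightweight mode.
  return AiAnsChoice.ai;
}

void main() {
  final choice = pickAiAnsMode(latencySensitive: false, heavyNoise: true);
  print(choice.name); // prints "aiBalanced"
}
```

In an app, the returned value would be mapped to the corresponding ZegoANSMode and passed to setANSMode after enableANS(true) has been called.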
