Scenario-Based AI Noise Suppression

2024-07-27

Scenario-based AI noise suppression automatically identifies the current scenario in real time and adjusts the AI noise suppression strategy accordingly, to provide the best noise suppression and sound quality. Two common noise suppression scenarios are currently supported:

In call scenarios, all sounds other than the human voice are identified as noise and eliminated. In addition to eliminating steady-state noise (for details, please refer to Audio 3A Processing), this mode effectively eliminates non-steady-state noise while keeping the human voice at high fidelity. Typical noises include mouse clicks, keyboard typing, tapping, air conditioners, kitchen dishes, noisy restaurants, wind, coughing, blowing, and other non-voice noises, as well as voice reverberation in small rooms.
In music scenarios, the noise suppression effect is adjusted automatically to preserve music sound quality. Music is detected on the microphone input in real time. In sound card, singing accompaniment, or near-field music scenarios, the noise suppression level is adjusted automatically to ensure high-fidelity music.

Warning

Before using the AI noise suppression feature, please contact ZEGO Technical Support for special packaging.
Starting from version 3.0.0, ZEGO Express SDK supports intelligent recognition of music scenarios. In music scenarios, AI noise suppression can automatically reduce the noise suppression level to improve sound quality. If you need to use this feature, please contact ZEGO Technical Support for special packaging and configuration.

Feature Advantages

Can eliminate 80% of noise.
Low latency.
Low memory usage, basically the same as traditional noise suppression.
Low CPU usage.
Music scenario recognition accuracy reaches 99%.

Usage Scenarios

This feature is suitable for 1v1 or multi-person audio and video call scenarios such as voice rooms, meetings, voice gaming, as well as live streaming or online KTV scenarios such as sound cards, singing accompaniment, and near-field music.

Warning

Music scenario recognition requires turning on the music detection switch. Please contact ZEGO Technical Support to configure and enable the music detection feature.

Removable Noises

Developers can use this feature to eliminate the following noises:

Scenario	Typical Noises
Meeting Room	Keyboard typing; table tapping
Office	Keyboard typing; nearby colleagues talking
Vehicle	Car horns; passing cars; in-car music; rain and windshield wipers
Internet Cafe	Keyboard typing; nearby people talking
Coffee Shop	Chairs being dragged; nearby people talking; sharp collision sounds

Effect Showcase

Office

Original audio includes: mouse clicks, keyboard typing, clapping, friction sounds, office noise, air conditioner noise, etc.

After AI noise suppression:

Public Place

Original audio includes: rain, trams, cooking sounds, car horns, etc.

After AI noise suppression:

Music Scenario

Original audio:

Conventional AI noise suppression: Eliminates noise, but severely degrades the music.

After scenario-based AI noise suppression: Eliminates noise while preserving music fidelity.

Prerequisites

Before implementing the AI noise suppression feature, please ensure:

You have created a project in the ZEGOCLOUD Console and applied for a valid AppID and AppSign. For details, please refer to Console - Project Information.
You have integrated ZEGO Express SDK in your project and implemented basic audio and video streaming functionality. For details, please refer to Quick Start - Integration and Quick Start - Implementation.

Usage Steps

Developers can complete AI noise suppression related settings according to the following steps:

Please contact ZEGO Technical Support to configure and enable the music detection feature. If it is already enabled, please ignore this step.
Initialize the engine and log in to a room. For the specific process, please refer to "Create Engine" and "Login to Room" in the "Implementing Video Call" document.
Call the enableANS interface to enable noise suppression. Enabling this feature makes the human voice clearer.

After enabling noise suppression, developers can call the setANSMode interface to set the ANS mode and enable the AI noise suppression feature. The following shows some AI noise suppression modes. For more modes, please refer to ZegoANSMode.

AI Noise Suppression Mode	Applicable Scenarios
ZegoANSMode.AI	Lightweight mode. Maintains good noise suppression with extremely low power consumption and package size increase. Suitable for environments with indoor noise and regions with relatively quiet environments.
ZegoANSMode.AI_BALANCED	Balanced mode. Comprehensively eliminates noise while preserving the human voice without loss, at a slight increase in power consumption. Suitable for complex call environments, such as busy outdoor markets and transportation, as well as regions with serious noise interference.
ZegoANSMode.AI_LOW_LATENCY	Low latency mode. Maintains clean noise suppression and high-fidelity human voice at 10 ms latency. Suitable for latency-sensitive scenarios such as game voice chat and real-time singing.
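
The guidance in the table above can be expressed as a small selection helper. The following is a minimal sketch, not part of ZEGO Express SDK: the enum `AnsModeChoice` is a hypothetical local stand-in that mirrors the three ZegoANSMode names in the table, and `AnsModeChooser` and its two boolean parameters are illustrative assumptions. Giving latency sensitivity priority over noise level is also an assumption, not something the table states.

```java
// Hypothetical stand-in that mirrors the three ZegoANSMode names from the table.
// It is NOT the SDK enum; map the result to ZegoANSMode in a real project.
enum AnsModeChoice { AI, AI_BALANCED, AI_LOW_LATENCY }

// Hypothetical helper that restates the table as a decision rule.
final class AnsModeChooser {
    private AnsModeChooser() {}

    /**
     * @param latencySensitive true for scenarios such as game voice or real-time singing
     * @param heavyNoise       true for complex environments such as busy markets or transportation
     */
    static AnsModeChoice choose(boolean latencySensitive, boolean heavyNoise) {
        if (latencySensitive) {
            // Assumption: latency requirements take priority over noise level.
            return AnsModeChoice.AI_LOW_LATENCY;
        }
        if (heavyNoise) {
            return AnsModeChoice.AI_BALANCED;
        }
        // Default: lightweight mode for indoor, relatively quiet environments.
        return AnsModeChoice.AI;
    }
}
```

For example, `AnsModeChooser.choose(false, true)` returns `AnsModeChoice.AI_BALANCED`, which would correspond to passing ZegoANSMode.AI_BALANCED to setANSMode.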

// Enable ANS
engine.enableANS(true);
// Set the AI noise suppression mode as needed.
// Note: after the ANS mode is set to an AI mode of ZegoANSMode, ZEGO Express SDK forcibly disables transient noise suppression [enableTransientANS].
engine.setANSMode(ZegoANSMode.AI);
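
The call order above (enable ANS first, then set the mode) can be wrapped so that it is checked without the SDK on the classpath. This is a sketch under stated assumptions: `AnsControls`, `AnsSetup`, and `RecordingAnsControls` are hypothetical names introduced for illustration only. In a real project, an `AnsControls<ZegoANSMode>` implementation would forward to `engine.enableANS` and `engine.setANSMode`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical abstraction over the two engine calls shown above; not an SDK type.
interface AnsControls<M> {
    void enableANS(boolean enable);
    void setANSMode(M mode);
}

final class AnsSetup {
    private AnsSetup() {}

    // Enables noise suppression first, then selects the mode, matching the usage steps.
    static <M> void enableAiNoiseSuppression(AnsControls<M> controls, M mode) {
        controls.enableANS(true);
        controls.setANSMode(mode);
    }
}

// Test double that records each call as text so the order can be inspected.
final class RecordingAnsControls implements AnsControls<String> {
    final List<String> calls = new ArrayList<>();

    @Override
    public void enableANS(boolean enable) {
        calls.add("enableANS(" + enable + ")");
    }

    @Override
    public void setANSMode(String mode) {
        calls.add("setANSMode(" + mode + ")");
    }
}
```

Calling `AnsSetup.enableAiNoiseSuppression(new RecordingAnsControls(), "AI")` records the enableANS call before the setANSMode call.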
