Scenario-based AI Noise Reduction

2024-07-27

Feature Introduction

Scenario-based AI noise reduction identifies the current scenario in real time and adjusts the AI noise reduction strategy accordingly, to provide the best noise reduction and sound quality. Two common noise reduction scenarios are currently supported:

In call scenarios, all sounds other than human voice are identified as noise and eliminated. In addition to eliminating steady-state noise (for details, please refer to Audio 3A Processing), the feature effectively eliminates non-steady-state noise while keeping the human voice in high fidelity. Typical noises include mouse clicks, keyboard typing, tapping, air conditioning, kitchen dishes, noisy restaurants, wind, coughing, blowing, and other non-voice sounds, as well as voice reverberation in small rooms.
In music scenarios, the noise reduction effect is adjusted automatically to restore music sound quality. Music is detected on the microphone input in real time. In sound card, singing accompaniment, or near-field music scenarios, the noise reduction level is adjusted automatically to ensure high-fidelity music sound quality.

Warning

Before using the AI noise reduction feature, please contact ZEGOCLOUD Technical Support for special packaging.
Starting from version 3.0.0, ZEGO Express SDK supports intelligent recognition of music scenarios. In music scenarios, AI noise reduction can automatically lower the noise reduction level to improve sound quality. If you need to use this feature, please contact ZEGOCLOUD Technical Support for special packaging and configuration.

Feature Advantages

Eliminates 80% of noise.
Low latency.
Low memory usage, roughly the same as traditional noise reduction.
Low CPU usage.
Music scenario recognition accuracy reaches 99%.

Applicable Scenarios

This feature is suitable for 1v1 or multi-person audio and video call scenarios such as voice chat rooms, meetings, and voice gaming, as well as live streaming or online KTV scenarios such as sound cards, singing accompaniment, and near-field music.

Warning

Music scenario recognition requires turning on the music detection switch.

Removable Noise

Developers can use this feature to eliminate the following noises:

Scenario	Typical Noises
Meeting Room	Keyboard sound, table tapping sound
Office	Keyboard sound, surrounding colleagues' speaking sound
Transportation	Car whistle sound, sound of cars passing by, car music sound, rain sound and windshield wiper sound
Internet Cafe	Keyboard sound, surrounding people's voice sound
Coffee Shop	Chair dragging sound, surrounding people's speaking sound, sharp collision sound

Effect Demonstration

Office

Original audio contains: mouse click sound, keyboard sound, applause sound, friction sound, office noise, air conditioning sound, etc.

After AI noise reduction:

Public Place

Original audio contains: rain sound, tram sound, cooking sound, car whistling sound, etc.

After AI noise reduction:

Music Scenario

Original audio:

Conventional AI noise reduction: Eliminates noise, but greatly damages music.

After scenario-based AI noise reduction: Eliminates noise while preserving music fidelity.

Prerequisites

Before implementing the AI noise reduction feature, please ensure:

You have created a project in the ZEGOCLOUD Console and applied for a valid AppID and AppSign. For details, please refer to Console - Project Information.
You have integrated ZEGO Express SDK in the project and implemented basic audio and video publishing and playing functions. For details, please refer to Quick Start - Integration and Quick Start - Implementation Flow.

Usage Steps

Developers can complete the related settings of AI noise reduction according to the following steps:

Please contact ZEGOCLOUD Technical Support to configure and enable the music detection feature. If already enabled, please ignore this step.
Initialize the SDK and log in to the room. For the specific process, please refer to "Create Engine" and "Login Room" in the Implementing Video Call documentation.
Call the enableANS interface to enable noise suppression. With noise suppression enabled, the human voice sounds clearer.

After enabling noise suppression, developers can set the ANS mode by calling the setANSMode interface to enable the AI noise reduction feature. The following shows some AI noise reduction modes. For more modes, please refer to ZegoANSMode.

AI Noise Reduction Mode	Applicable Scenarios
ZegoANSModeAI	Lightweight mode. Delivers good noise reduction with extremely low power consumption and a very small increase in package size. Suitable for indoor noise and other relatively mild noise environments.
ZegoANSModeAIBalanced	Balanced mode. Comprehensively eliminates noise while preserving the human voice without loss, at the cost of slightly higher power consumption. Suitable for complex call environments such as noisy outdoor markets and transportation, and for regions with serious noise interference.
ZegoANSModeAILowLatency	Low latency mode. Still maintains pure noise reduction effects and high-fidelity human voice sound quality under 10ms latency. Suitable for scenarios sensitive to latency such as game voice, game co-op, real-time chorus, etc.

// Enable ANS
[[ZegoExpressEngine sharedEngine] enableANS:YES];
// Set the AI noise reduction mode as needed. Note: after the ANS mode is set to an AI mode of ZegoANSMode (such as ZegoANSModeAI), ZEGO Express SDK forcibly disables transient noise suppression [enableTransientANS]
[[ZegoExpressEngine sharedEngine] setANSMode:ZegoANSModeAI];
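
The mode table and the two calls above can be combined into a small helper that picks an AI mode from the kind of environment the app expects. This is only a sketch: the MYCallEnvironment enum and both MY-prefixed functions are hypothetical names introduced for illustration and are not part of ZEGO Express SDK. The mapping follows the "Applicable Scenarios" column of the table, and the import line assumes the SDK is integrated as the ZegoExpressEngine framework.

```objectivec
#import <ZegoExpressEngine/ZegoExpressEngine.h>

// Hypothetical app-level categories (assumption, not an SDK type).
typedef NS_ENUM(NSInteger, MYCallEnvironment) {
    MYCallEnvironmentIndoor,           // indoor noise, mild interference
    MYCallEnvironmentNoisyOutdoor,     // markets, transportation
    MYCallEnvironmentLatencySensitive, // game voice, real-time chorus
};

// Maps a category to one of the AI modes listed in the table.
static ZegoANSMode MYANSModeForEnvironment(MYCallEnvironment environment) {
    switch (environment) {
        case MYCallEnvironmentNoisyOutdoor:
            return ZegoANSModeAIBalanced;
        case MYCallEnvironmentLatencySensitive:
            return ZegoANSModeAILowLatency;
        case MYCallEnvironmentIndoor:
            return ZegoANSModeAI;
    }
    // Fall back to the lightweight mode for unexpected values.
    return ZegoANSModeAI;
}

// Enables noise suppression first, then selects the AI mode,
// in the same order as the usage steps.
static void MYApplyAINoiseReduction(MYCallEnvironment environment) {
    ZegoExpressEngine *engine = [ZegoExpressEngine sharedEngine];
    [engine enableANS:YES];
    [engine setANSMode:MYANSModeForEnvironment(environment)];
}
```

For example, a game voice feature might call MYApplyAINoiseReduction(MYCallEnvironmentLatencySensitive) after the engine has been created. Keeping the mapping in one function makes it easy to change the chosen mode later without touching the call sites.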