How do I choose a voice detection interface for different scenarios?

2021-12-17

Products / Plugins: Video Call / Audio Call / Live Streaming

Platform / Framework: iOS / Android / macOS / Windows

Each voice detection interface suits a different usage scenario. The three common scenarios are described below:

Scenario 1: Save Traffic

To save traffic by not sending data when there is no voice activity, set the configuration item enable_vad under the ZegoEngineConfig class to enable voice activity detection (VAD), and set enable_dtx to enable discontinuous transmission (DTX) of audio packets. When the two items are enabled together, packets detected as silent in the published stream are not sent, for example while the microphone is turned off or muted.
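The effect of combining the two switches can be illustrated with a small toy model. This is only a sketch of the idea, not the SDK's implementation: the Frame type, its is_voice field, and the frames_to_send function are hypothetical names introduced here, and a real DTX implementation may still send occasional low-rate packets during silence.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    payload: bytes   # encoded audio for one frame
    is_voice: bool   # hypothetical per-frame VAD decision


def frames_to_send(frames: List[Frame], enable_vad: bool, enable_dtx: bool) -> List[Frame]:
    """Toy model: silent frames are dropped only when both switches are on."""
    if enable_vad and enable_dtx:
        return [f for f in frames if f.is_voice]
    return list(frames)


frames = [
    Frame(b"\x01" * 40, True),
    Frame(b"\x00" * 40, False),
    Frame(b"\x00" * 40, False),
    Frame(b"\x02" * 40, True),
]

sent = frames_to_send(frames, enable_vad=True, enable_dtx=True)
saved_bytes = sum(len(f.payload) for f in frames) - sum(len(f.payload) for f in sent)
```

In this model, half of the frames are silent, so half of the payload bytes are never transmitted. With either switch off, every frame is sent.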

Scenario 2: Real-time Judgment

To determine whether the audio contains voice while also tracking changes in audio volume, use enableVAD under ZegoSoundLevelConfig to set whether the sound level callback includes the VAD detection result. In the onCapturedSoundLevelInfoUpdate callback, the vad parameter indicates whether the corresponding stream contains normal voice, and you can decide whether to display the volume change accordingly.
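The display decision inside such a callback might look like the following sketch. It does not call the SDK: SoundLevelInfo and displayed_level are hypothetical names, and the convention that vad equals 1 for normal voice and 0 for noise is an assumption that should be checked against the API reference for your platform.

```python
from dataclasses import dataclass


@dataclass
class SoundLevelInfo:
    sound_level: float  # volume reported by the callback (hypothetical field name)
    vad: int            # assumed convention: 1 = normal voice, 0 = noise


def displayed_level(info: SoundLevelInfo) -> float:
    """Show the volume only when the VAD result says the frame contains voice."""
    return info.sound_level if info.vad == 1 else 0.0


speaking = displayed_level(SoundLevelInfo(sound_level=63.5, vad=1))
background = displayed_level(SoundLevelInfo(sound_level=40.0, vad=0))
```

Gating the display on the VAD result keeps a volume indicator from flickering when the microphone only picks up background noise.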

Scenario 3: Data Statistics

To determine whether the microphone has continuous voice input and to collect detection results over a period of time, use the steady-state voice detection function. It determines whether someone is speaking into the microphone within a given time period, that is, whether the audio data after capture or after audio preprocessing is noise or normal sound. The function counts all voice detection results within a fixed time window, and the window is judged to be human voice only when the proportion of voice results reaches a certain ratio.

First register the onAudioVADStateUpdate callback to receive the steady-state voice status of the audio data, then call the startAudioVADStableStateMonitor interface (which supports both microphone capture and external audio capture) to start steady-state voice detection. The state parameter of the onAudioVADStateUpdate callback, which is triggered every 3 seconds, indicates whether the detected audio data is noise or normal voice.
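The windowed counting described above can be sketched as follows. This is a toy model rather than the SDK's algorithm: the function name, the window size in frames, and the 0.6 ratio threshold are all assumptions chosen for illustration, since the source does not state the actual values.

```python
from typing import List


def stable_vad_states(frame_flags: List[bool], frames_per_window: int,
                      min_voice_ratio: float) -> List[str]:
    """Classify each full window as 'speech' or 'noise' from per-frame VAD flags.

    A window counts as speech only if the share of voice frames reaches
    min_voice_ratio. A trailing partial window is ignored.
    """
    states = []
    last_start = len(frame_flags) - frames_per_window
    for start in range(0, last_start + 1, frames_per_window):
        window = frame_flags[start:start + frames_per_window]
        voice_ratio = sum(window) / frames_per_window
        states.append("speech" if voice_ratio >= min_voice_ratio else "noise")
    return states


# Two windows of five frames: mostly voice, then mostly silence.
flags = [True, True, True, True, False,
         False, False, False, False, True]
states = stable_vad_states(flags, frames_per_window=5, min_voice_ratio=0.6)
```

Each entry in states corresponds to one reporting period, in the same way that the callback delivers one state per period. Judging by ratio rather than by a single frame means a short cough or click does not flip the result to voice.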