Voice Call

Product features

Communication capabilities

Basic features

Basic features	Feature description	Business scenarios
Voice call	Users join the same room and conduct audio calls.	1v1 voice call Multi-person voice conference
Live audio streaming	A room contains hosts and audience members: hosts stream live audio, and audience members in the room listen to the stream.	Emotional FM Voice chat room
User permission control	Use a Token to control user permissions, for example which users may enter or exit a room and which users may speak or are muted.	Video conference
Pre-call detection	Before conducting audio and video calls or live streaming, perform device detection on cameras, microphones, monitors, etc., to ensure the normal operation of calls or live streaming.	Normal call function detection
Call quality monitoring	Detect the quality of audio and video, such as resolution, frame rate, bitrate, sampling rate, etc., to ensure stable quality.	Bank account opening, remote authentication, etc., which have high requirements and limitations on audio and video quality
Network speed testing	Before users publish/play streams, detect uplink and downlink network speeds to determine what bitrate of audio and video streams is suitable for publishing/playing under the current network environment.	Call scenarios, education scenarios, live streaming scenarios
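
As a rough illustration of how a speed test result might feed into a bitrate choice, the sketch below picks the highest bitrate that fits within a fraction of the measured uplink speed. The bitrate ladder, the `pick_bitrate` name, and the 70% headroom are assumptions made for this example; they are not ZEGO defaults or SDK APIs.

```python
# Hypothetical audio bitrate ladder in kbps (illustrative values only).
AUDIO_BITRATES_KBPS = [16, 32, 48, 64, 128]


def pick_bitrate(measured_kbps: float, headroom: float = 0.7) -> int:
    """Return the highest ladder bitrate that fits in a fraction of the measured speed.

    `headroom` keeps part of the measured bandwidth in reserve for fluctuations.
    If nothing fits, fall back to the lowest bitrate on the ladder.
    """
    budget = measured_kbps * headroom
    candidates = [b for b in AUDIO_BITRATES_KBPS if b <= budget]
    return candidates[-1] if candidates else AUDIO_BITRATES_KBPS[0]


if __name__ == "__main__":
    for speed in (10, 100, 1000):
        print(speed, "kbps measured ->", pick_bitrate(speed), "kbps chosen")
```

Leaving headroom is a design choice: publishing at the full measured speed leaves no margin when the network degrades after the test.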

Advanced features

Advanced features	Feature description	Business scenarios
Live co-hosting	In a room, multiple hosts can appear on the same screen and co-host a live stream.	Multi-host co-hosting from different regions Multi-person KTV singing together Showroom live streaming
Multi-source capture	Provides flexible, easy-to-use audio and video capture sources and channel management, reducing development and maintenance costs for developers.	Video conferences, online education
Publish multiple streams simultaneously	A user can publish multiple audio and video streams, such as sending the camera's video stream while sharing the screen.	Showing the speaker's camera image alongside a slide presentation in a video conference
Supplemental Enhancement Information (SEI)	Text information is packaged with audio and video content and transmitted through the streaming media channel to achieve precise synchronization between text data and audio and video content.	Precise layout of video screens Remote lyric synchronization Live quiz
Traffic control	ZEGO's industry-leading technology. The SDK dynamically adjusts the bitrate, frame rate, and resolution of video publishing streams, as well as audio bitrate, based on its own and the peer's current network environment status, automatically adapting to the current network environment and network fluctuations, thereby ensuring smooth video publishing.	All scenarios that require high-quality real-time audio and video services
Cloud proxy	By setting the SDK's cloud proxy interface, all SDK traffic is forwarded through the cloud proxy server to communicate with RTC and L3 (Ultra-low latency live streaming).	Restricted network environments such as intranets in hospitals, government agencies, and companies
Geofencing	Restricts the transmission of audio and video and signaling data to a certain region to meet regional data privacy and security-related regulations, that is, restricts access to audio and video services in a specific region.	Call scenarios
Audio and video stream encryption	Encrypts the stream when publishing. To play the stream, the playing end must hold a decryption key that matches the encryption key.	Scenarios that need to encrypt stream information to protect communication security
Game voice	Simulates the real world, where what people hear depends on factors such as the direction and distance of the sound source. For example, the farther away the source, the quieter the sound. The users who can hear a sound source can also be grouped and restricted. For example, users in one room can hold discussions in groups, and different groups cannot hear each other.	Metaverse Same room, group communication or battle
Mass-scale audio and video	ZEGO's industry-leading technology. Automatically plays remote audio and video within the listening range based on the user's location in the cloud and provides spatial audio effects (by default, plays the 12 closest streams). A single scenario supports 10,000 users with microphones and cameras enabled at the same time.	Virtual offices, virtual exhibitions, open virtual worlds and other virtual scenarios
Real-time synchronization of multiple users' status	ZEGO's industry-leading technology. Provides an orderly, high-frequency, low-latency, large-scale status synchronization service, helping developers quickly implement real-time synchronization of information such as player positions, actions, and images in virtual gameplay. It also supports 10,000 simultaneous online users in a single scenario.	Metaverse scenarios such as virtual offices, virtual exhibitions, virtual social networking, virtual KTV, and general scenarios that require ultra-high frequency, low latency, and large-scale synchronization of information or control commands
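
The range-based behavior described for game voice and mass-scale audio and video (sound gets quieter with distance, and only the closest streams are played) can be sketched as follows. This is a minimal illustration, not the SDK's algorithm: the `RemoteUser` type, the `audible_streams` function, the 2D coordinates, the linear falloff, and the `hearing_range` value are all assumptions. Only the default of 12 closest streams comes from the table above.

```python
import math
from dataclasses import dataclass


@dataclass
class RemoteUser:
    """Hypothetical remote user with a 2D position in the virtual scene."""
    user_id: str
    x: float
    y: float


def audible_streams(listener, users, max_streams=12, hearing_range=10.0):
    """Return (user_id, gain) pairs for the closest users in range, nearest first.

    Gain falls off linearly from 1.0 at the listener's position to 0.0 at the
    edge of the hearing range (an assumed attenuation model).
    """
    in_range = []
    for user in users:
        distance = math.hypot(user.x - listener[0], user.y - listener[1])
        if distance <= hearing_range:
            gain = 1.0 - distance / hearing_range
            in_range.append((distance, user.user_id, gain))
    in_range.sort()  # nearest first
    return [(user_id, gain) for _, user_id, gain in in_range[:max_streams]]


if __name__ == "__main__":
    users = [RemoteUser("a", 3, 4), RemoteUser("b", 0, 20), RemoteUser("c", 0, 1)]
    print(audible_streams((0.0, 0.0), users))
```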

Room capabilities

Basic features

Basic features	Feature description	Business scenarios
Room connection status description	Describes the user's connection status in the room and the transitions between connection statuses.	-
Real-time messaging and signaling	Real-time messaging mainly provides sending and receiving of plain-text messages. You can send broadcast messages and barrage messages to other users in the same room, or send custom messages to specified users, and implement interactive features such as likes, gifts, and quizzes as needed.	Showroom live streaming Voice chat room

Advanced features

Advanced features	Feature description	Business scenarios
Login to multiple rooms	A user can enter multiple rooms at the same time to conduct audio and video calls or watch live streams.	Teacher multi-class online teaching

Audio capabilities

Basic features

Basic features	Feature description	Business scenarios
Audio spectrum and sound level changes	Audio spectrum: the energy value of a digital audio signal at each frequency point. Sound level: the volume of a given stream.	Determining which user on the microphone is speaking and whether the microphone and speaker are available Audio spectrum animation display, etc.
Headphone monitor and channel settings	Headphone monitoring: After connecting headphones (wired or Bluetooth) to the device, you can hear the sound captured by the device's microphone through the local headphones. Dual channels: two sound channels. When listening, you can locate the sound source based on the phase difference of the sound between the left and right ears.	Showroom live streaming Emotional FM Music teaching and other relatively professional scenarios
Audio 3A processing	During real-time audio and video calls or live streaming, 3A processing can be performed on audio to improve the quality of calls or live streaming and user experience. AEC (Acoustic Echo Cancellation): Filter collected audio data to reduce echoes in the audio. AGC (Automatic Gain Control): After enabling this function, the SDK can automatically adjust the microphone volume, adapt to near and far sound pickup, and keep the volume stable. ANS (Automatic Noise Suppression): Identify background noise in the sound and eliminate it. After enabling this function, the human voice can be clearer.	All scenarios that require high-quality real-time audio and video services
Voice changer/reverb/stereo	To add fun and interactivity, users can use a voice changer for comic effect, reverb to enhance the atmosphere, and stereo to give the sound more depth. ZEGO Express SDK provides a variety of preset voice changer, reverb, reverb echo, and stereo effects, which developers can configure flexibly to get the sound they want.	Live streaming Voice chat room Karaoke room Anonymous social networking Game entertainment Role playing
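
To make the definitions of sound level and audio spectrum in the table above concrete, here is a small sketch that computes both from a buffer of 16-bit PCM samples. It assumes a 0 to 100 level scale based on RMS and uses a naive discrete Fourier transform; the function names are made up for this example, and the SDK's actual calculation may differ.

```python
import cmath
import math


def sound_level(samples):
    """RMS level of 16-bit PCM samples, scaled to 0..100 (assumed scale)."""
    if not samples:
        return 0.0
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return min(100.0, 100.0 * rms / 32768.0)


def spectrum(samples):
    """Magnitude at each frequency bin below Nyquist, via a naive O(n^2) DFT."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, s in enumerate(samples))) / n
        for k in range(n // 2)
    ]


if __name__ == "__main__":
    # A sine wave with 4 cycles over 32 samples should peak in bin 4.
    tone = [10000 * math.sin(2 * math.pi * 4 * i / 32) for i in range(32)]
    mags = spectrum(tone)
    print("level:", sound_level(tone), "peak bin:", mags.index(max(mags)))
```

A production implementation would use an FFT rather than the quadratic loop shown here; the naive form is kept only because it maps directly onto the definition.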

Advanced features

Advanced features	Feature description	Business scenarios
Audio mixing	The SDK takes audio data supplied by the App and combines it with the audio data captured by the SDK into a single audio stream. This makes it possible to play custom sounds and music files during calls or live streaming so that others in the room can hear them.	Social voice chat Live streaming
Scenario-based AI noise reduction	Automatically recognizes different scenarios in real time and intelligently adjusts the AI noise reduction strategy to provide the best noise reduction and sound quality. In call scenarios, all sounds except the human voice are identified as noise and eliminated. In music scenarios, the noise reduction effect is adjusted automatically to preserve music sound quality.	Voice rooms, conferences, voice gaming and other 1v1 or multi-person audio and video call scenarios, as well as live streaming or online KTV scenarios such as sound cards, singing along, near-field music
Custom audio capture	Developers capture audio data themselves and hand it to the SDK for transmission.	Online or local audio file transmission Transmission of audio files from customized capture systems
Custom audio rendering	Developers render and play audio themselves.	Developers have their own special rendering requirements
Custom audio processing	Developers apply their own special audio processing.	When there are special sound processing requirements that the SDK cannot meet, such as special voice changers
Get original audio data	Provides access to the original (raw) audio data. The data is in PCM format.	Audio data retention or special processing
AI voice changer	Like the voice-changer bow tie from "Detective Conan", but for real-time calls: it reproduces the target character's timbre and rhythm while retaining the user's speaking speed, emotion, and tone. Timbres can be switched freely with ultra-low latency.	Social voice chat Live streaming Game voice
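
The audio mixing row above describes combining App-supplied audio with SDK-captured audio into a single stream. At the sample level, this can be sketched as adding two 16-bit PCM buffers and clipping the result. The `mix_pcm` function and its `music_gain` parameter are illustrative assumptions, not part of the SDK.

```python
def mix_pcm(mic, music, music_gain=0.5):
    """Mix two lists of 16-bit PCM samples into one, clipping to the valid range.

    The shorter buffer is treated as silence once it runs out.
    `music_gain` scales the App-supplied audio before it is added.
    """
    length = max(len(mic), len(music))
    mixed = []
    for i in range(length):
        a = mic[i] if i < len(mic) else 0
        b = music[i] if i < len(music) else 0
        sample = int(a + b * music_gain)
        mixed.append(max(-32768, min(32767, sample)))  # avoid integer wrap-around
    return mixed


if __name__ == "__main__":
    print(mix_pcm([1000, 2000], [2000, 2000, 2000]))
```

Clipping keeps loud passages from wrapping around to the opposite sign, which would otherwise be heard as harsh distortion.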

Live streaming capabilities

Basic features

Basic features	Feature description	Business scenarios
Stream mixing	Mixes the streams of multiple people into one stream, so that playing a single stream is enough to see the video and hear the voices of all members in the room.	Multi-person call Host co-hosting
Use CDN for live streaming	Unifies access to multiple CDNs. This function supports publishing to CDN and connects RTC products with CDN live streaming products, making it convenient for users to watch live content directly from web pages or third-party players.	Basic live streaming with high concurrency, scenarios without strong requirements for live streaming latency
CDN publishing authentication	Attackers may steal the developer's publishing URL for use elsewhere, or impersonate the developer's server to generate publishing URLs, causing traffic loss. To prevent this, you can configure CDN publishing authentication yourself in the ZEGOCLOUD Console. After enabling authentication, you must append the relevant authentication parameters to the publishing URL, otherwise publishing fails.	-
Playing stream by URL	When the publishing end uses a third-party publishing tool (such as OBS or an IP camera) to push the stream to the CDN, or uses the ZEGO SDK to relay the audio and video stream to a third-party CDN, you can play the stream by passing in its URL directly.	Obtaining third-party live content
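
As a generic illustration of appending authentication parameters to a publishing URL, the sketch below signs a stream ID with a shared secret and an expiry time. The parameter names (`sign`, `expire`), the MD5-based signature layout, and the `sign_publish_url` function are hypothetical; the actual scheme depends on the CDN and on the settings configured in the ZEGOCLOUD Console.

```python
import hashlib
from urllib.parse import urlencode


def sign_publish_url(base_url, stream_id, secret, expires_at):
    """Build a publishing URL with hypothetical authentication parameters.

    The signature is md5(secret + stream_id + hex(expiry)); this layout is an
    assumption for illustration, not a documented ZEGO or CDN format.
    """
    expiry_hex = format(expires_at, "x")
    signature = hashlib.md5(
        f"{secret}{stream_id}{expiry_hex}".encode("utf-8")
    ).hexdigest()
    query = urlencode({"sign": signature, "expire": expiry_hex})
    return f"{base_url}/{stream_id}?{query}"


if __name__ == "__main__":
    print(sign_publish_url("rtmp://example.com/live", "stream1", "my-secret", 1700000000))
```

Because the secret never appears in the URL, someone who copies the URL cannot generate a valid signature for a different stream or a later expiry time.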

Advanced features

Advanced features	Feature description	Business scenarios
Ultra-low latency live streaming	Focuses on providing stable and reliable live streaming services. Compared with standard video live streaming products, it has lower audio and video latency, stronger synchronization, better weak network resistance, and can bring users a millisecond-level live streaming experience.	Online education Showroom live streaming E-commerce live streaming Watch together Online auction
Direct-to-CDN	Pushes audio and video streams directly from the local client to the CDN. Users can watch directly through the playing stream URL from a web page or a third-party player.	For developers who already work with a third-party CDN for audio and video distribution

Other capabilities

Basic features

Basic features	Feature description	Business scenarios
Media player	Provides the ability to play audio and video media files and supports publishing the audio and video data of the played media files.	Play test audio Play background music Play video files
Audio effect player	Provides an audio effect player, manages audio effects uniformly, and achieves effects such as enhancing realism or setting the atmosphere by playing short sound effects.	Showroom live streaming Game entertainment
Audio and video recording	During video calls, live streaming, and online teaching, users often need to record and save videos for subsequent on-demand viewing by other users. ZEGO provides multiple recording solutions to meet recording needs in different scenarios.	Conference recording Live streaming recording Call recording Online classroom recording
