Quick Start Voice Call
This document explains how to quickly integrate the client SDK (ZEGO Express SDK) and implement voice interaction with an AI agent.
Prerequisites
- Android and iOS: Contact ZEGOCLOUD technical support to obtain a version of ZEGO Express SDK that supports AI echo cancellation, and integrate it.
- Web: Contact ZEGOCLOUD technical support to obtain a version of ZEGO Express SDK that supports AI noise reduction, and integrate it.
Sample Code
Listed below is sample code for the core features. You can refer to it when implementing your own business logic.
Android Client Sample Code
Android client sample code. It includes basic capabilities such as logging into and out of RTC rooms, and publishing and playing streams.
iOS Client Sample Code
iOS client sample code. It includes basic capabilities such as logging into and out of RTC rooms, and publishing and playing streams.
Web Client Sample Code
Web client sample code. It includes basic capabilities such as logging into and out of RTC rooms, and publishing and playing streams.
The following video demonstrates how to run the server and client (Web) sample code and interact with an AI agent by voice.
Overall Business Process
- Server side: Follow the Server Quick Start guide to run the server sample code and deploy your server.
  - Integrate the ZEGOCLOUD AI Agent APIs to manage AI agents.
- Client side: Run the sample code.
  - Create and manage AI agents through your server.
  - Integrate ZEGO Express SDK for real-time communication.
After completing these two steps, you can add an AI agent to a room for real-time interaction with real users.
sequenceDiagram
participant Client
participant Your Server
participant ZEGOCLOUD AI Agent Server
Your Server->>Your Server: Register an AI agent
Your Server->>ZEGOCLOUD AI Agent Server: Register an AI agent
ZEGOCLOUD AI Agent Server-->>Your Server:
Client->>Your Server: Notify server to start call
Your Server->>ZEGOCLOUD AI Agent Server: Create an AI agent instance
ZEGOCLOUD AI Agent Server->>ZEGOCLOUD AI Agent Server: The AI agent logs into the RTC room, publishes a stream, and plays the user stream
ZEGOCLOUD AI Agent Server-->>Your Server:
Your Server-->>Client:
Client->>Your Server: Request token
Your Server-->>Client: Token
Client->>Client: Initialize ZEGO Express SDK, log in to the room, and start publishing the stream
Client->>Client: User plays the AI agent stream
Client->>Your Server: Notify server to stop call
Your Server->>ZEGOCLOUD AI Agent Server: Delete the AI agent instance
ZEGOCLOUD AI Agent Server-->>Your Server:
Your Server-->>Client:
Client->>Client: User stops publishing the stream and exits the room
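The client side of this sequence can be read as a small state machine: the call moves from idle to starting, in call, stopping, and back to idle. The sketch below is a hypothetical JavaScript helper (not part of ZEGO Express SDK; `CallLifecycle` and the action names are assumptions) that rejects out-of-order actions, such as stopping a call that never started.

```javascript
// Hypothetical client-side call states, mirroring the sequence diagram above.
const CallState = Object.freeze({
  IDLE: "idle", // not in a call
  STARTING: "starting", // server notified, logging in and publishing
  IN_CALL: "in_call", // publishing the user stream and playing the AI agent stream
  STOPPING: "stopping", // server notified to stop, leaving the room
});

// Allowed transitions: action -> [required current state, next state]
const TRANSITIONS = Object.freeze({
  start: [CallState.IDLE, CallState.STARTING],
  joined: [CallState.STARTING, CallState.IN_CALL],
  failed: [CallState.STARTING, CallState.IDLE],
  stop: [CallState.IN_CALL, CallState.STOPPING],
  left: [CallState.STOPPING, CallState.IDLE],
});

class CallLifecycle {
  constructor() {
    this.state = CallState.IDLE;
  }

  // Applies an action and returns the new state. Throws if the action is
  // unknown or not allowed in the current state (the state is left unchanged).
  apply(action) {
    if (!Object.prototype.hasOwnProperty.call(TRANSITIONS, action)) {
      throw new Error(`Unknown action: ${action}`);
    }
    const [from, to] = TRANSITIONS[action];
    if (this.state !== from) {
      throw new Error(`Cannot "${action}" while in state "${this.state}"`);
    }
    this.state = to;
    return this.state;
  }
}
```

For example, calling `apply("start")` when the user taps the call button and `apply("joined")` after the room login succeeds prevents a double tap from sending two start-call requests.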
Core Capability Implementation
Integrate ZEGO Express SDK
Android: Refer to Integrate the SDK > 2.2 > Method 2 to integrate the SDK manually. After integrating the SDK, follow these steps to initialize ZegoExpressEngine.
Add Permission Declaration
Navigate to the "app/src/main" directory, open the "AndroidManifest.xml" file, and add permissions.
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
Request Recording Permission at Runtime
private final ActivityResultLauncher<String> requestPermissionLauncher = registerForActivityResult(
new ActivityResultContracts.RequestPermission(), new ActivityResultCallback<Boolean>() {
@Override
public void onActivityResult(Boolean isGranted) {
if (isGranted) {
// Permission granted
}
}
});
// Initiate request
requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO);
Create and Initialize ZegoExpressEngine
ZegoEngineProfile zegoEngineProfile = new ZegoEngineProfile();
zegoEngineProfile.appID = YOUR_APP_ID; // Placeholder: replace with your AppID (a long) from the ZEGOCLOUD Console
zegoEngineProfile.scenario = ZegoScenario.HIGH_QUALITY_CHATROOM; // This scenario avoids requesting camera permission; set the value that fits your own business scenario
zegoEngineProfile.application = getApplication();
ZegoExpressEngine.createEngine(zegoEngineProfile, null);
iOS: Refer to Import the SDK > 2.2 > Method 3 to integrate the SDK manually. After integrating the SDK, follow these steps to initialize ZegoExpressEngine.
Declare Required Permissions in Info.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
...
<key>UIBackgroundModes</key>
<array>
<string>audio</string>
</array>
<key>NSMicrophoneUsageDescription</key>
<string>Need microphone access for voice chat</string>
</dict>
</plist>
Request Recording Permission at Runtime
- (void)requestAudioPermission:(void(^)(BOOL granted))completion {
/// Need to add a description of microphone usage in the project's Info.plist file
AVAudioSession *audioSession = [AVAudioSession sharedInstance];
[audioSession requestRecordPermission:^(BOOL granted) {
dispatch_async(dispatch_get_main_queue(), ^{
completion(granted);
});
}];
}
Create and Initialize ZegoExpressEngine
-(void)initZegoExpressEngine{
ZegoEngineProfile* profile = [[ZegoEngineProfile alloc]init];
profile.appID = kZegoPassAppId;
profile.scenario = ZegoScenarioHighQualityChatroom; // This scenario avoids requesting camera permission; set the value that fits your own business scenario
[ZegoExpressEngine createEngineWithProfile:profile eventHandler:self];
}
Web: Refer to Import the SDK > Method 2 to integrate the SDK manually. After integrating the SDK, follow these steps to initialize ZegoExpressEngine:
- Load the AI noise reduction module
- Instantiate ZegoExpressEngine
- Check system requirements (WebRTC support and microphone permissions)
import { ZegoExpressEngine } from "zego-express-engine-webrtc";
import { VoiceChanger } from "zego-express-engine-webrtc/voice-changer";
const appID = 1234567 // Obtain from ZEGOCLOUD Console
const server = 'xxx' // Obtain from ZEGOCLOUD Console
// Load AI noise reduction module
ZegoExpressEngine.use(VoiceChanger);
// Instantiate ZegoExpressEngine with appId and server configurations
const zg = new ZegoExpressEngine(appID, server);
// Check system requirements
const checkSystemRequirements = async () => {
// Detect WebRTC support
const rtc_sup = await zg.checkSystemRequirements("webRTC");
if (!rtc_sup.result) {
// Browser does not support WebRTC
}
// Detect microphone permission status
const mic_sup = await zg.checkSystemRequirements("microphone");
if (!mic_sup.result) {
// Microphone permission is not enabled
}
}
checkSystemRequirements()
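If you want the two checks to produce a single result you can act on (for example, to disable the call button), you can wrap them as below. This is a sketch that relies only on `checkSystemRequirements("webRTC")` and `checkSystemRequirements("microphone")` resolving to an object with a `result` field, as used above; `getCallReadiness` and the reason strings are hypothetical.

```javascript
// Hypothetical helper: runs the two checks used above and summarizes them.
// `engine` is any object exposing checkSystemRequirements(), e.g. the `zg` instance.
async function getCallReadiness(engine) {
  const reasons = [];

  const webrtc = await engine.checkSystemRequirements("webRTC");
  if (!webrtc.result) reasons.push("Browser does not support WebRTC");

  const mic = await engine.checkSystemRequirements("microphone");
  if (!mic.result) reasons.push("Microphone permission is not enabled");

  return { ready: reasons.length === 0, reasons };
}
```

Usage: `const { ready, reasons } = await getCallReadiness(zg);` then show `reasons` to the user when `ready` is `false`.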
Notify Your Server to Start Call
You can notify your server to start the call as soon as the real user enters the room on the client. Sending this notification asynchronously, rather than blocking on it, helps reduce the time needed to connect the call. After receiving the start call notification, your server creates an AI agent instance using the same roomID as the client, together with the associated userID and streamID, so that the AI agent and the real user can interact in the same room by publishing and playing each other's streams.
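The sketch below shows the asynchronous pattern in JavaScript: the start-call request to your server and the room login run concurrently instead of one after the other. The `/api/start` path, the request body fields, and the injected `fetchFn` and `loginAndPublish` parameters are assumptions for illustration; use whatever contract your server defines.

```javascript
// Hypothetical: notify your server to start the call. The endpoint path and body
// fields are assumptions; your server uses them to create the AI agent instance.
async function notifyStartCall(fetchFn, baseUrl, { roomId, userId, userStreamId }) {
  const response = await fetchFn(`${baseUrl}/api/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ roomId, userId, userStreamId }),
  });
  if (!response.ok) {
    throw new Error(`Start call failed with HTTP ${response.status}`);
  }
  return response.json();
}

// Runs the server notification and the room login/publish at the same time,
// so the AI agent can be created while the user is still joining the room.
async function startCall(fetchFn, baseUrl, ids, loginAndPublish) {
  const [serverResult] = await Promise.all([
    notifyStartCall(fetchFn, baseUrl, ids),
    loginAndPublish(),
  ]);
  return serverResult;
}
```

In a browser you might call `startCall(fetch, YOUR_SERVER_URL, { roomId, userId, userStreamId }, enterRoom)`, where `enterRoom` is the login function from the Web sample in the next step. If either promise rejects, `Promise.all` rejects as well, so handle that by ending the call.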
User logs in to an RTC room and starts publishing a stream
After logging in to the room, the real user starts publishing a stream.
- Android and iOS: Enable AI echo cancellation in this scenario for better results.
- Web: Enable AI noise reduction in this scenario for better results.
The token used for login needs to be obtained from your server; please refer to the complete sample code.
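A minimal JavaScript sketch of obtaining the token from your server, in the spirit of the `Api.getToken()` call used in the Web sample. The `/api/token` path, the `userId` query parameter, the `{ token }` response shape, and the `fetchToken` name are hypothetical; the point is that tokens are generated server-side so your credentials never reach the client.

```javascript
// Hypothetical token request. Path, query parameter, and response shape are
// assumptions; match them to your own server.
async function fetchToken(fetchFn, baseUrl, userId) {
  const url = `${baseUrl}/api/token?userId=${encodeURIComponent(userId)}`;
  const response = await fetchFn(url, { method: "GET" });
  if (!response.ok) {
    throw new Error(`Token request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  // Fail early instead of logging in with an empty token
  if (typeof data.token !== "string" || data.token.length === 0) {
    throw new Error("Token missing from server response");
  }
  return data.token;
}
```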
Make sure that the roomID, userID, and streamID are unique under a single ZEGOCLOUD AppID.
- roomID: Defined by you according to your own rules and used to log in to the Express SDK room. Only digits, English letters, and the following characters are supported: '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''' (apostrophe), ',', '.', '<', '>', '\' (backslash). If interoperability with the Web SDK is required, do not use '%'.
- userID: Must not exceed 32 bytes. The same characters as roomID are supported. If interoperability with the Web SDK is required, do not use '%'.
- streamID: Must not exceed 256 bytes. Only digits, English letters, '-', and '_' are supported.
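The format rules above can be checked on the client before logging in. A minimal JavaScript sketch: the function names and the `webInterop` option are hypothetical, the character and length rules are taken from the list above (no length limit is enforced for roomID because none is stated here), and uniqueness under the AppID still has to be guaranteed by your own ID scheme.

```javascript
// Digits, English letters, and the listed special characters (including backslash)
const ID_CHARS = /^[0-9A-Za-z~!@#$%^&*()_+=\-`;',.<>\\]+$/;
// Digits, English letters, '-', and '_'
const STREAM_ID_CHARS = /^[0-9A-Za-z_-]+$/;

// Length in bytes (UTF-8), matching the "bytes" wording of the rules
function byteLength(s) {
  return new TextEncoder().encode(s).length;
}

function isValidRoomID(roomID, { webInterop = true } = {}) {
  if (typeof roomID !== "string" || !ID_CHARS.test(roomID)) return false;
  // '%' is allowed, but not when interoperating with the Web SDK
  return !(webInterop && roomID.includes("%"));
}

function isValidUserID(userID, { webInterop = true } = {}) {
  if (typeof userID !== "string" || !ID_CHARS.test(userID)) return false;
  if (byteLength(userID) > 32) return false;
  return !(webInterop && userID.includes("%"));
}

function isValidStreamID(streamID) {
  return (
    typeof streamID === "string" &&
    STREAM_ID_CHARS.test(streamID) &&
    byteLength(streamID) <= 256
  );
}
```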
Client logs in to the room and publishes a stream
private void loginRoom(String roomId, String userId, String userName, String userStreamId, String token,
        IZegoRoomLoginCallback callback) {
    // roomId and userStreamId are custom IDs; follow the format rules described above
    ZegoEngineConfig config = new ZegoEngineConfig();
    HashMap<String, String> advanceConfig = new HashMap<String, String>();
    advanceConfig.put("set_audio_volume_ducking_mode", "1"); // Volume ducking
    advanceConfig.put("enable_rnd_volume_adaptive", "true"); // Adaptive playback volume
    config.advancedConfig = advanceConfig;
    ZegoExpressEngine.setEngineConfig(config);
    ZegoExpressEngine.getEngine().setRoomScenario(ZegoScenario.HIGH_QUALITY_CHATROOM);
    ZegoExpressEngine.getEngine().setAudioDeviceMode(ZegoAudioDeviceMode.GENERAL);
    ZegoExpressEngine.getEngine().enableAEC(true);
    // Note: AI echo cancellation requires a ZEGO Express SDK version obtained from ZEGOCLOUD technical support
    ZegoExpressEngine.getEngine().setAECMode(ZegoAECMode.AI_AGGRESSIVE2);
    ZegoExpressEngine.getEngine().enableAGC(true);
    ZegoExpressEngine.getEngine().enableANS(true);
    ZegoExpressEngine.getEngine().setANSMode(ZegoANSMode.MEDIUM);
    ZegoRoomConfig roomConfig = new ZegoRoomConfig();
    roomConfig.isUserStatusNotify = true;
    roomConfig.token = token; // Token authentication is required; obtain the token from your server (see the ZEGOCLOUD documentation for how to generate it)
    ZegoExpressEngine.getEngine()
        .loginRoom(roomId, new ZegoUser(userId, userName), roomConfig, (errorCode, extendedData) -> {
            Timber.d("loginRoom() called with: errorCode = [" + errorCode + "], extendedData = [" + extendedData + "]");
            if (errorCode == 0) {
                // Start publishing the stream after a successful login
                ZegoExpressEngine.getEngine().startPublishingStream(userStreamId);
                // Set the microphone mute state: false means unmuted, true means muted
                ZegoExpressEngine.getEngine().muteMicrophone(false);
            }
            if (callback != null) {
                callback.onRoomLoginResult(errorCode, extendedData);
            }
        });
}
Client logs in to the room and publishes a stream
// Record the AI agent's stream ID so it can be played later
self.streamToPlay = [self getAgentStreamID];
ZegoEngineConfig* engineConfig = [[ZegoEngineConfig alloc] init];
engineConfig.advancedConfig = @{
@"set_audio_volume_ducking_mode":@"1",/** This configuration is used for volume ducking **/
@"enable_rnd_volume_adaptive":@"true",/** This configuration is used for adaptive playback volume **/
};
[ZegoExpressEngine setEngineConfig:engineConfig];
// This setting only affects AEC (echo cancellation). ZegoAudioDeviceModeGeneral uses ZEGO's proprietary echo cancellation algorithm, which gives more control.
// Other modes may use the system's echo cancellation, which can work better on iPhones but may be less effective on some Android devices.
[[ZegoExpressEngine sharedEngine] setAudioDeviceMode:ZegoAudioDeviceModeGeneral];
[[ZegoExpressEngine sharedEngine] enableAGC:YES];
[[ZegoExpressEngine sharedEngine] enableAEC:YES];
// Note: AI echo cancellation requires a ZegoExpressEngine.xcframework obtained from ZEGOCLOUD technical support, because versions with this capability have not yet been publicly released.
[[ZegoExpressEngine sharedEngine] setAECMode:ZegoAECModeAIAggressive2];
[[ZegoExpressEngine sharedEngine] enableANS:YES];
[[ZegoExpressEngine sharedEngine] setANSMode:ZegoANSModeMedium];
// Login to room
[self loginRoom:^(int errorCode, NSDictionary *extendedData) {
if (errorCode!=0) {
NSString* errorMsg =[NSString stringWithFormat:@"Failed to enter voice room:%d", errorCode];
completion(NO, errorMsg);
return;
}
// Start publishing stream after entering room
[self startPushlishStream];
}];
Client logs in to the room and publishes a stream
const userId = ""; // User ID for logging in to the Express SDK room
const roomId = ""; // RTC room ID
const userStreamId = ""; // Stream ID for publishing the user's stream
let localStream = null; // Local audio stream, kept so it can be destroyed when leaving the room

async function enterRoom() {
  try {
    // Obtain the token from your server (Api.getToken() is defined in the complete sample code)
    // Token generation: https://www.zegocloud.com/docs/video-call/token?platform=web&language=javascript
    const token = await Api.getToken();
    // Log in to the room
    await zg.loginRoom(roomId, token, {
      userID: userId,
      userName: "",
    });
    // Create the local audio-only stream
    localStream = await zg.createZegoStream({
      camera: {
        video: false,
        audio: true,
      },
    });
    if (localStream) {
      // Publish the local stream
      await zg.startPublishingStream(userStreamId, localStream);
      // Enable AI noise reduction (requires the specially packaged ZEGO Express SDK)
      const enableResult = await zg.enableAiDenoise(localStream, true);
      if (enableResult.errorCode === 0) {
        return zg.setAiDenoiseMode(localStream, 1);
      }
    }
  } catch (error) {
    console.error("Failed to enter room:", error);
    throw error;
  }
}
enterRoom();
Play the AI Agent Stream
By default, there is only one real user and one AI agent in the same room, so any new stream added is assumed to be the AI agent stream.
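If the room may contain other streams, or if your client already knows the AI agent's stream ID (the iOS sample records it with `getAgentStreamID`), you can select the stream explicitly instead of relying on that assumption. The helper below is a hypothetical JavaScript sketch; `pickAgentStream` and the `expectedAgentStreamId` option are assumptions.

```javascript
// Hypothetical helper: choose which newly added stream to play.
// streamList items only need a `streamID` field, like the SDK's stream objects.
function pickAgentStream(streamList, { expectedAgentStreamId } = {}) {
  if (expectedAgentStreamId) {
    // Prefer an exact match when the agent's stream ID is known
    return streamList.find((s) => s.streamID === expectedAgentStreamId) ?? null;
  }
  // Fall back to the default assumption: the first new stream is the agent's
  return streamList[0] ?? null;
}
```

Returning `null` when the expected ID is absent lets the caller ignore unrelated streams rather than playing the wrong one.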
Client plays the AI agent stream
// Set up the event handler
void setEventHandler() {
    ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
        // Called when other users in the room start or stop publishing streams
        @Override
        public void onRoomStreamUpdate(String roomID, ZegoUpdateType updateType, ArrayList<ZegoStream> streamList, JSONObject extendedData) {
            super.onRoomStreamUpdate(roomID, updateType, streamList, extendedData);
            // ZegoUpdateType.ADD means a new stream was added; call startPlayingStream to play it
            if (updateType == ZegoUpdateType.ADD) {
                ZegoStream stream = streamList.get(0);
                // New streams are assumed to come from the AI agent, so play directly
                ZegoExpressEngine.getEngine().startPlayingStream(stream.streamID);
            }
        }
    });
}
Client plays the AI agent stream
// Listen for room stream information update status, and play the AI agent stream
- (void)onRoomStreamUpdate:(ZegoUpdateType)updateType
streamList:(NSArray<ZegoStream *> *)streamList
extendedData:(nullable NSDictionary *)extendedData
roomID:(NSString *)roomID{
if (updateType == ZegoUpdateTypeAdd) {
for (int i=0; i<streamList.count; i++) {
ZegoStream* item = [streamList objectAtIndex:i];
[self startPlayStream:item.streamID];
break;
}
} else if(updateType == ZegoUpdateTypeDelete) {
for (int i=0; i<streamList.count; i++) {
ZegoStream* item = [streamList objectAtIndex:i];
[[ZegoExpressEngine sharedEngine] stopPlayingStream:item.streamID];
}
}
}
Client plays the AI agent stream
// Listen for remote stream update events. Call setupEvent() before enterRoom() so no update is missed.
function setupEvent() {
  zg.on("roomStreamUpdate", async (roomID, updateType, streamList) => {
    if (updateType === "ADD" && streamList.length > 0) {
      try {
        for (const stream of streamList) {
          // Play the AI agent stream
          const mediaStream = await zg.startPlayingStream(stream.streamID);
          if (!mediaStream) continue; // Skip this stream and handle the rest
          const remoteView = await zg.createRemoteStreamView(mediaStream);
          if (remoteView) {
            // The page needs a container with the id "remoteStreamView" to render the AI agent stream
            // [Reference Documentation](https://www.zegocloud.com/article/api?doc=Express_Video_SDK_API~javascript_web~class~ZegoStreamView)
            remoteView.play("remoteStreamView", {
              enableAutoplayDialog: false,
            });
          }
        }
      } catch (error) {
        console.error("Failed to play stream:", error);
      }
    }
  });
}
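The Web handler above only reacts to `ADD` updates, while the iOS handler also stops playback when a stream is deleted. Below is a sketch of the same behavior for Web. It calls only `startPlayingStream` and `stopPlayingStream` on the engine; the `createStreamUpdateHandler` factory and its `onPlay` callback are hypothetical names, and `onPlay` is where you would create and play the remote view as in the sample above.

```javascript
// Hypothetical factory: returns a roomStreamUpdate listener that tracks which
// streams are being played, so DELETE updates can stop them again.
function createStreamUpdateHandler(engine, onPlay) {
  const playing = new Set();

  return async function onRoomStreamUpdate(roomID, updateType, streamList) {
    for (const stream of streamList) {
      if (updateType === "ADD" && !playing.has(stream.streamID)) {
        const mediaStream = await engine.startPlayingStream(stream.streamID);
        if (!mediaStream) continue; // skip this stream, keep handling the rest
        playing.add(stream.streamID);
        await onPlay(stream.streamID, mediaStream); // e.g. create and play a remote view
      } else if (updateType === "DELETE" && playing.has(stream.streamID)) {
        engine.stopPlayingStream(stream.streamID);
        playing.delete(stream.streamID);
      }
    }
    // Return the IDs still being played (handy for logging and tests)
    return [...playing];
  };
}
```

Register it with `zg.on("roomStreamUpdate", createStreamUpdateHandler(zg, onPlay))` in place of the inline listener. Tracking played IDs in a `Set` also keeps a repeated `ADD` for the same stream from starting playback twice.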
Congratulations🎉! After completing this step, you can ask the AI agent any question by voice, and the AI agent will answer your questions by voice!
Delete the AI agent instance and exit the room
The client calls the logout interface to exit the room and stops publishing and playing streams. At the same time, it notifies your server to end the call. After receiving the end call notification, your server will delete the AI agent instance, and the AI agent instance will automatically exit the room and stop publishing and playing streams. This completes a full interaction.
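// Notify your server to end the call, then exit the room
private void stop() {
    RequestBody body = RequestBody.create("", MediaType.parse("application/json; charset=utf-8"));
    Request request = new Request.Builder().url(YOUR_SERVER_URL + "/api/stop").post(body).build();
    new OkHttpClient.Builder().build().newCall(request).enqueue(new Callback() {
        @Override
        public void onFailure(@NonNull Call call, @NonNull IOException e) {
            Timber.w(e, "Stop call request failed");
            // Exit the room even if the request fails, so the user does not stay in the room
            ZegoExpressEngine.getEngine().logoutRoom();
        }

        @Override
        public void onResponse(@NonNull Call call, @NonNull Response response) {
            if (!response.isSuccessful()) {
                Timber.w("Stop call request failed: HTTP %d", response.code());
            }
            // Close the response to release the connection; the body is not needed here
            response.close();
            // Exit the room
            ZegoExpressEngine.getEngine().logoutRoom();
        }
    });
}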
/**
* Notify your server to end the call
*
* @param completion Completion callback, returns operation result
* @discussion This method sends a request to the server to end the call, used to release the AI agent instance
*/
- (void)doStopCallWithCompletion:(void (^)(NSInteger code, NSString *message, NSDictionary *data))completion {
// Build request URL
NSString *url = [NSString stringWithFormat:@"%@/api/stop", self.currentBaseURL];
NSURL *requestURL = [NSURL URLWithString:url];
// Create request
NSMutableURLRequest *request = [[NSMutableURLRequest alloc] initWithURL:requestURL];
request.HTTPMethod = @"POST";
// Set request headers
[request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];
// Create request parameters
NSMutableDictionary *params = [NSMutableDictionary dictionary];
NSData *jsonData = [NSJSONSerialization dataWithJSONObject:params options:0 error:nil];
request.HTTPBody = jsonData;
// Create session
NSURLSession *session = [NSURLSession sharedSession];
// Send request
NSURLSessionDataTask *task = [session dataTaskWithRequest:request
completionHandler:^(NSData * _Nullable data,
NSURLResponse * _Nullable response,
NSError * _Nullable error) {
dispatch_async(dispatch_get_main_queue(), ^{
// Exit the room regardless of the server result, so the user does not stay in the room
[[ZegoExpressEngine sharedEngine] logoutRoom];
if (error) {
if (completion) {
completion(-1, @"Network request failed", nil);
}
return;
}
NSHTTPURLResponse *httpUrlResponse = (NSHTTPURLResponse *)response;
if (httpUrlResponse.statusCode != 200) {
if (completion) {
completion(httpUrlResponse.statusCode,
[NSString stringWithFormat:@"Server error: %ld", (long)httpUrlResponse.statusCode],
nil);
}
return;
}
NSError *jsonError;
NSDictionary *dict = [NSJSONSerialization JSONObjectWithData:data options:0 error:&jsonError];
if (jsonError) {
if (completion) {
completion(-2, @"Failed to parse response data", nil);
}
return;
}
// Parse response data
NSInteger code = [dict[@"code"] integerValue];
NSString *message = dict[@"message"];
NSDictionary *responseData = dict[@"data"];
if (completion) {
completion(code, message, responseData);
}
});
}];
[task resume];
}
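// Notify your server to end the call
async function stopCall() {
  try {
    const response = await fetch(`${YOUR_SERVER_URL}/api/stop`, { // YOUR_SERVER_URL is the address of your server
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
    });
    if (!response.ok) {
      throw new Error(`Server responded with HTTP ${response.status}`);
    }
    const data = await response.json();
    console.log('End call result:', data);
    return data;
  } catch (error) {
    console.error('Failed to end call:', error);
    throw error;
  }
}

// End the call, then release the local stream and exit the room
async function leaveRoom() {
  try {
    await stopCall();
  } catch (error) {
    // Already logged in stopCall; still leave the room below
  } finally {
    if (localStream) {
      zg.destroyLocalStream(localStream);
    }
    zg.logoutRoom();
  }
}
leaveRoom();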
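Because ending the call sends a request that makes your server delete the AI agent instance, a hang-up button clicked twice would send `/api/stop` twice. A minimal sketch, assuming you wrap the `stopCall` function from the Web sample above: `runOnce` is a hypothetical helper that shares the first call's promise with every later call.

```javascript
// Hypothetical helper: wraps an async teardown routine so it runs at most once.
// Later calls receive the same promise as the first call instead of sending
// another /api/stop request for an agent instance that is already being deleted.
function runOnce(asyncFn) {
  let pending = null;
  return (...args) => {
    if (pending === null) {
      // Promise.resolve().then(...) also turns a synchronous throw into a rejection
      pending = Promise.resolve().then(() => asyncFn(...args));
    }
    return pending;
  };
}
```

For example, `const hangUp = runOnce(stopCall);` and bind `hangUp` to your hang-up button. After a failure the wrapper keeps returning the same rejected promise, so create a new wrapper for each call if you want to allow retries.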
This is the complete core process for you to achieve real-time voice interaction with an AI agent.