
Quick Start Voice Call

This document explains how to quickly integrate the client SDK (ZEGO Express SDK) and achieve voice interaction with an AI Agent.

Prerequisites

  • Contact ZEGOCLOUD technical support to get the version of ZEGO Express SDK that supports AI echo cancellation and integrate it.

Sample Code

The following is sample code for a business backend that integrates the real-time interactive AI Agent API. You can refer to it when implementing your own business logic.

Below is the client sample code; you can likewise refer to it when implementing your own business logic.

The following video demonstrates how to run the server and client (iOS) sample code and interact with an AI agent by voice.

Overall Business Process

  1. Server side: Follow the Server Quick Start guide to run the server sample code and deploy your server.
    • Integrate the ZEGOCLOUD AI Agent APIs to create and manage AI agents.
  2. Client side: Run the client sample code.
    • Create and manage AI agents through your server.
    • Integrate the ZEGO Express SDK for real-time audio communication.

After completing these two steps, you can add an AI agent to a room for real-time interaction with real users.

Core Capability Implementation

Integrate ZEGO Express SDK

Please refer to Import the SDK > 2.2 > Method 3 to manually integrate the SDK. After integrating the SDK, follow these steps to initialize ZegoExpressEngine.

Step 1: Declare Required Permissions in Info.plist

Info.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    ...
    <key>UIBackgroundModes</key>
    <array>
        <string>audio</string>
    </array>
    <key>NSMicrophoneUsageDescription</key>
    <string>Need microphone access for voice chat</string>
</dict>
</plist>
Step 2: Request Recording Permission at Runtime

#import <AVFoundation/AVFoundation.h>

- (void)requestAudioPermission:(void(^)(BOOL granted))completion {
    /// A microphone usage description (NSMicrophoneUsageDescription) must be present in the project's Info.plist
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession requestRecordPermission:^(BOOL granted) {
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(granted);
        });
    }];
}
Step 3: Create and Initialize ZegoExpressEngine

- (void)initZegoExpressEngine {
    ZegoEngineProfile *profile = [[ZegoEngineProfile alloc] init];
    profile.appID = kZegoPassAppId;
    // Setting this scenario avoids requesting camera permissions; set the value that matches your own business scenario
    profile.scenario = ZegoScenarioHighQualityChatroom;

    [ZegoExpressEngine createEngineWithProfile:profile eventHandler:self];
}

Notify Your Server to Start Call

You can notify your server to start the call as soon as the real user enters the room on the client side. Making this request asynchronously, rather than waiting for room login to complete, helps shorten call setup time. After receiving the start-call notification, your server creates an AI agent instance using the same roomID and the associated userID and streamID as the client, so that the AI agent and the real user can interact in the same room by publishing and playing each other's streams.

Note
In the following examples, roomID, userID, streamID and other parameters are not passed when notifying your server to start the call because fixed values have been agreed between the client and your server in this example. In actual use, please pass the real parameters according to your business requirements.
// Notify your server to start call
/**
 * Start a call with the AI agent
 *
 * @param completion Completion callback, returns operation result
 * @discussion This method sends a request to the server to start the call, used to initialize the AI agent instance
 */
- (void)doStartCallWithCompletion:(void (^)(NSInteger code, NSString *message, NSDictionary *data))completion {
    // Build request URL
    NSString *url = [NSString stringWithFormat:@"%@/api/start", self.currentBaseURL];
    NSURL *requestURL = [NSURL URLWithString:url];

    // Create request
    NSMutableURLRequest *request = [[NSMutableURLRequest alloc] initWithURL:requestURL];
    request.HTTPMethod = @"POST";

    // Set request headers
    [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];

    // Create request parameters
    NSMutableDictionary *params = [NSMutableDictionary dictionary];
    NSData *jsonData = [NSJSONSerialization dataWithJSONObject:params options:0 error:nil];
    request.HTTPBody = jsonData;

    // Create session
    NSURLSession *session = [NSURLSession sharedSession];

    // Send request
    NSURLSessionDataTask *task = [session dataTaskWithRequest:request
                                           completionHandler:^(NSData * _Nullable data,
                                                            NSURLResponse * _Nullable response,
                                                            NSError * _Nullable error) {
        dispatch_async(dispatch_get_main_queue(), ^{
            if (error) {
                if (completion) {
                    completion(-1, @"Network request failed", nil);
                }
                return;
            }

            NSHTTPURLResponse *httpUrlResponse = (NSHTTPURLResponse *)response;
            if (httpUrlResponse.statusCode != 200) {
                if (completion) {
                    completion(httpUrlResponse.statusCode,
                             [NSString stringWithFormat:@"Server error: %ld", (long)httpUrlResponse.statusCode],
                             nil);
                }
                return;
            }

            NSError *jsonError;
            NSDictionary *dict = [NSJSONSerialization JSONObjectWithData:data options:0 error:&jsonError];
            if (jsonError) {
                if (completion) {
                    completion(-2, @"Failed to parse response data", nil);
                }
                return;
            }

            // Parse response data
            NSInteger code = [dict[@"code"] integerValue];
            NSString *message = dict[@"message"];
            NSDictionary *responseData = dict[@"data"];

            if (completion) {
                completion(code, message, responseData);
            }
        });
    }];

    [task resume];
}

Log In to an RTC Room and Start Publishing a Stream

After a real user logs into the room, they start publishing streams.

Note

In this scenario, AI echo cancellation should be enabled for better effects.

The token used for login needs to be obtained from your server; please refer to the complete sample code.

Note

Please ensure that the roomID, userID, and streamID are unique under one ZEGOCLOUD APPID.

  • roomID: Generated by you according to your own rules; it will be used to log in to the Express SDK room. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • userID: Length should not exceed 32 bytes. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • streamID: Length should not exceed 256 bytes. Only numbers, English characters, and '-', '_' are supported.
Client request to log in to the room and publish a stream
// Record the AI agent stream ID
self.streamToPlay = [self getAgentStreamID];

ZegoEngineConfig *engineConfig = [[ZegoEngineConfig alloc] init];
engineConfig.advancedConfig = @{
    @"set_audio_volume_ducking_mode": @1,    /** This configuration is used for volume ducking **/
    @"enable_rnd_volume_adaptive": @"true",  /** This configuration is used for adaptive playback volume **/
};
[ZegoExpressEngine setEngineConfig:engineConfig];

// This setting only affects AEC (echo cancellation). Here we set it to ModeGeneral, which uses our proprietary echo cancellation algorithm, giving us more control.
// If other options are selected, it might use the system's echo cancellation, which may work better on iPhones but could be less effective on some Android devices.
[[ZegoExpressEngine sharedEngine] setAudioDeviceMode:ZegoAudioDeviceModeGeneral];

// Note: Enabling AI echo cancellation requires contacting ZEGOCLOUD technical support to obtain the corresponding ZegoExpressEngine.xcframework, as versions with these capabilities have not yet been released.
[[ZegoExpressEngine sharedEngine] enableAGC:TRUE];
[[ZegoExpressEngine sharedEngine] enableAEC:TRUE];
[[ZegoExpressEngine sharedEngine] setAECMode:ZegoAECModeAIAggressive2];
[[ZegoExpressEngine sharedEngine] enableANS:TRUE];
[[ZegoExpressEngine sharedEngine] setANSMode:ZegoANSModeMedium];

// Log in to the room
[self loginRoom:^(int errorCode, NSDictionary *extendedData) {
    if (errorCode != 0) {
        NSString *errorMsg = [NSString stringWithFormat:@"Failed to enter voice room: %d", errorCode];
        completion(NO, errorMsg);
        return;
    }

    // Start publishing the stream after entering the room
    [self startPushlishStream];
}];
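The loginRoom: and startPushlishStream helpers called above are not shown in this snippet. A minimal sketch under stated assumptions: self.roomID, self.userID, self.streamID, and self.token are illustrative properties set elsewhere, and the token must be obtained from your server.

```objc
// Sketch only: the helpers referenced in the snippet above.
- (void)loginRoom:(void (^)(int errorCode, NSDictionary *extendedData))completion {
    ZegoUser *user = [ZegoUser userWithUserID:self.userID userName:self.userID];
    ZegoRoomConfig *config = [[ZegoRoomConfig alloc] init];
    config.isUserStatusNotify = YES;
    config.token = self.token; // Obtained from your server
    [[ZegoExpressEngine sharedEngine] loginRoom:self.roomID
                                           user:user
                                         config:config
                                       callback:^(int errorCode, NSDictionary *extendedData) {
        completion(errorCode, extendedData);
    }];
}

- (void)startPushlishStream {
    // Publish the local audio stream; the AI agent plays this stream on its side
    [[ZegoExpressEngine sharedEngine] startPublishingStream:self.streamID];
}
```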

Play the AI Agent Stream

By default, there is only one real user and one AI agent in the same room, so any new stream added is assumed to be the AI agent stream.

Client request to play the AI agent stream
// Listen for room stream information update status, and play the AI agent stream
- (void)onRoomStreamUpdate:(ZegoUpdateType)updateType
                streamList:(NSArray<ZegoStream *> *)streamList
              extendedData:(nullable NSDictionary *)extendedData
                    roomID:(NSString *)roomID{
    if (updateType == ZegoUpdateTypeAdd) {
        for (int i=0; i<streamList.count; i++) {
            ZegoStream* item = [streamList objectAtIndex:i];

            [self startPlayStream:item.streamID];
        }
    } else if(updateType == ZegoUpdateTypeDelete) {
        for (int i=0; i<streamList.count; i++) {
            ZegoStream* item = [streamList objectAtIndex:i];
            [[ZegoExpressEngine sharedEngine] stopPlayingStream:item.streamID];
        }
    }
}
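The startPlayStream: helper is likewise a thin wrapper in this example; a minimal sketch:

```objc
// Sketch only: play the AI agent's stream by its stream ID
- (void)startPlayStream:(NSString *)streamID {
    [[ZegoExpressEngine sharedEngine] startPlayingStream:streamID];
}
```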

Congratulations🎉! After completing this step, you can ask the AI agent any question by voice, and the AI agent will answer your questions by voice!

Delete the Agent Instance and Exit the Room

The client calls the logout interface to exit the room and stops publishing and playing streams. At the same time, it notifies your server to end the call. After receiving the end call notification, your server will delete the AI agent instance, and the AI agent instance will automatically exit the room and stop publishing and playing streams. This completes a full interaction.

/**
 * Notify your server to end the call
 *
 * @param completion Completion callback, returns operation result
 * @discussion This method sends a request to the server to end the call, used to release the AI agent instance
 */
- (void)doStopCallWithCompletion:(void (^)(NSInteger code, NSString *message, NSDictionary *data))completion {
    // Build request URL
    NSString *url = [NSString stringWithFormat:@"%@/api/stop", self.currentBaseURL];
    NSURL *requestURL = [NSURL URLWithString:url];

    // Create request
    NSMutableURLRequest *request = [[NSMutableURLRequest alloc] initWithURL:requestURL];
    request.HTTPMethod = @"POST";

    // Set request headers
    [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];

    // Create request parameters
    NSMutableDictionary *params = [NSMutableDictionary dictionary];
    NSData *jsonData = [NSJSONSerialization dataWithJSONObject:params options:0 error:nil];
    request.HTTPBody = jsonData;

    // Create session
    NSURLSession *session = [NSURLSession sharedSession];

    // Send request
    NSURLSessionDataTask *task = [session dataTaskWithRequest:request
                                           completionHandler:^(NSData * _Nullable data,
                                                            NSURLResponse * _Nullable response,
                                                            NSError * _Nullable error) {
        dispatch_async(dispatch_get_main_queue(), ^{
            if (error) {
                if (completion) {
                    completion(-1, @"Network request failed", nil);
                }
                return;
            }

            NSHTTPURLResponse *httpUrlResponse = (NSHTTPURLResponse *)response;
            if (httpUrlResponse.statusCode != 200) {
                if (completion) {
                    completion(httpUrlResponse.statusCode,
                             [NSString stringWithFormat:@"Server error: %ld", (long)httpUrlResponse.statusCode],
                             nil);
                }
                return;
            }

            NSError *jsonError;
            NSDictionary *dict = [NSJSONSerialization JSONObjectWithData:data options:0 error:&jsonError];
            if (jsonError) {
                if (completion) {
                    completion(-2, @"Failed to parse response data", nil);
                }
                return;
            }

            // Parse response data
            NSInteger code = [dict[@"code"] integerValue];
            NSString *message = dict[@"message"];
            NSDictionary *responseData = dict[@"data"];

            if (completion) {
                completion(code, message, responseData);
            }

            // Exit room
            [[ZegoExpressEngine sharedEngine] logoutRoom];
        });
    }];

    [task resume];
}

This is the complete core process for you to achieve real-time voice interaction with an AI agent.

Best Practices for ZEGO Express SDK Configuration

To achieve the best audio call experience, it is recommended to configure the ZEGO Express SDK according to the following best practices. These configurations can significantly improve the quality of AI agent voice interactions.

Settings before joining the room:

  • Enable traditional audio 3A processing (Acoustic Echo Cancellation AEC, Automatic Gain Control AGC, and Noise Suppression ANS)
  • Set the room usage scenario to High Quality Chatroom, as the SDK will adopt different optimization strategies for different scenarios
  • Set the audio device mode to default mode
  • Enable AI echo cancellation to improve echo cancellation effect (this feature requires contacting ZEGO technical support to obtain the corresponding version of ZEGOExpress SDK)
  • Configure volume ducking to avoid sound conflicts
  • Enable adaptive playback volume to enhance user experience
  • Enable AI noise reduction and set appropriate noise suppression level
ZegoEngineProfile* profile = [[ZegoEngineProfile alloc]init];
profile.appID = kZegoAppId;
profile.scenario = ZegoScenarioHighQualityChatroom; // High Quality Chatroom scenario, setting this scenario can avoid requesting camera permissions, integrators should set specific values according to their business scenarios
ZegoEngineConfig* engineConfig = [[ZegoEngineConfig alloc] init];
engineConfig.advancedConfig = @{
    @"set_audio_volume_ducking_mode":@1,/** Configure volume ducking to avoid sound conflicts **/
    @"enable_rnd_volume_adaptive":@"true",/** Enable adaptive playback volume **/
};
[ZegoExpressEngine setEngineConfig:engineConfig];
[ZegoExpressEngine createEngineWithProfile:profile eventHandler:self];
// Enable traditional audio 3A processing
[[ZegoExpressEngine sharedEngine] enableAGC:TRUE];
[[ZegoExpressEngine sharedEngine] enableAEC:TRUE];
[[ZegoExpressEngine sharedEngine] enableANS:TRUE];
// Enable AI echo cancellation, please note: enabling AI echo cancellation requires contacting ZEGO technical support to obtain the corresponding version of ZEGOExpress SDK
[[ZegoExpressEngine sharedEngine] setAECMode:ZegoAECModeAIAggressive2];
// Enable AI noise reduction with moderate noise suppression
[[ZegoExpressEngine sharedEngine] setANSMode:ZegoANSModeMedium];

Listen for Exception Callback

Note
Because LLM and TTS involve many parameters, configuration errors can easily cause problems during testing, such as the AI agent not answering or not speaking. We strongly recommend listening for exception callbacks while testing and troubleshooting quickly based on the callback information.
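For example, the Express SDK reports internal API errors through the onDebugError callback on ZegoEventHandler. A minimal sketch for logging these during testing follows; which callbacks carry AI-agent-specific exceptions depends on your integration, so treat this as illustrative rather than exhaustive.

```objc
// Sketch only: log SDK errors during testing (ZegoEventHandler callback)
- (void)onDebugError:(int)errorCode funcName:(NSString *)funcName info:(NSString *)info {
    if (errorCode != 0) {
        NSLog(@"[ZEGO] error %d in %@: %@", errorCode, funcName, info);
    }
}
```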
