
Quick Start Digital Human Video Call

This document explains how to quickly integrate the client SDKs (ZEGO Express SDK and Digital Human SDK) and implement video interaction with an AI agent.

Digital Human Introduction

With just a waist-up photo or image of a real person or anime character, you can obtain a 1080P digital human with accurate lip-sync and a realistic appearance. Used with the AI Agent product, it enables video chat with an AI digital human at under 2 seconds of overall latency, suitable for scenarios such as 1V1 digital human interactive video, digital human customer service, and digital human live streaming.

  • More natural driving effects: Supports subtle body movements and natural, undistorted facial expressions, providing more realistic and immersive interaction than voice-only calls.
  • Accurate multi-language lip-sync: Natural and accurate lip movements, specially optimized for Chinese and English.
  • Ultra-low interaction latency: Digital human driving latency is under 500 ms; combined with the AI Agent, overall interaction latency is under 2 s.
  • Higher clarity: True 1080P output, with a 20%+ improvement in clarity over traditional image-based digital humans.

Prerequisites

  • Create a project in the ZEGOCLOUD Console and obtain its valid AppID and AppSign. For details, see the Admin Console document How to view project info.
  • You have contacted ZEGOCLOUD Technical Support to enable Digital Human PaaS service and related interface permissions.
  • You have contacted ZEGOCLOUD Technical Support to create a digital human.
  • You have contacted ZEGOCLOUD Technical Support to obtain the ZEGO Express SDK that supports AI echo cancellation and integrated it into your project.

Sample Code

Below is the client sample code. You can refer to it when implementing your own business logic.

The accompanying video demonstrates how to run the server and client (Web) sample code and interact with the digital human agent via video.

Overall Business Process

  1. Server: Deploy the business backend sample code by following the Business Backend Quick Start Guide.
    • Integrate the AI Agent APIs to manage the AI agent.
  2. Client: Run the sample code.
    • Create and manage agents through the business backend.
    • Integrate the ZEGO Express SDK and Digital Human SDK to complete real-time communication.

After completing these two steps, a real user can interact with the AI agent in real time in the same room.

Core Capabilities Implementation

Integrate ZEGO Express SDK

Please refer to Import the SDK > 2.2 > Method 3 to manually integrate the SDK. After integrating the SDK, follow the steps below to initialize ZegoExpressEngine.

1. Declare Necessary Permissions in the Info.plist File

Info.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    ...
    <key>UIBackgroundModes</key>
    <array>
        <string>audio</string>
    </array>
    <key>NSMicrophoneUsageDescription</key>
    <string>Microphone access is required for voice chat</string>
</dict>
</plist>
2. Request Microphone Permission at Runtime
#import <AVFoundation/AVFoundation.h>

- (void)requestAudioPermission:(void(^)(BOOL granted))completion {
    /// Requires the microphone usage description (NSMicrophoneUsageDescription) declared in Info.plist above
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession requestRecordPermission:^(BOOL granted) {
        dispatch_async(dispatch_get_main_queue(), ^{
            completion(granted);
        });
    }];
}
3. Create and Initialize ZegoExpressEngine
-(void)initZegoExpressEngine{
    ZegoEngineProfile* profile = [[ZegoEngineProfile alloc] init];
    profile.appID = kZegoPassAppId;
    profile.scenario = ZegoScenarioHighQualityChatroom; // This scenario avoids requesting camera permission; set the value that fits your own business scenario
    
    [ZegoExpressEngine createEngineWithProfile:profile eventHandler:self];
}

Integrate Digital Human SDK

1. Download the Latest Version of the SDK

Please download the latest version of the SDK.

2. Unzip the SDK

Unzip the SDK package into your project directory, for example a "libs" folder.

3. Add the Framework to Your Project

Select "TARGETS > General > Frameworks, Libraries, and Embedded Content", add "ZegoDigitalMobile.xcframework", and set "Embed" to "Embed & Sign".


Notify Business Backend to Start Call

You can notify the business backend to start the call as soon as the real user enters the room; making this call asynchronously reduces the call connection time. When the business backend receives the start-call notification, it creates a digital human agent instance using the same roomID, and the associated userID and streamID, as the client, so that the digital human agent and the real user can interact in the same room by publishing and playing each other's streams.

When requesting the business backend, you need to include the digital human parameters digital_human_id and config_id (a request sketch follows the list below).

  • digital_human_id is the digital human ID; please contact ZEGO technical support to obtain it.
  • config_id is the configuration ID of the digital human. Different platforms use different configurations, and the digital human service optimizes performance and rendering per platform based on the config_id. Fill in mobile for Android/iOS and web for Web.
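
For reference, here is a minimal sketch of such a request. It assumes a hypothetical /api/start endpoint on your business backend and illustrative property and field names (self.roomID, self.userID, self.streamID, self.digitalHumanID, room_id, and so on); match them to the contract your backend actually exposes.

Client
- (void)doStartCallWithCompletion:(void (^)(NSInteger code, NSString *message))completion {
    // Hypothetical endpoint; align the path and field names with your business backend
    NSString *url = [NSString stringWithFormat:@"%@/api/start", self.currentBaseURL];
    NSMutableURLRequest *request = [[NSMutableURLRequest alloc] initWithURL:[NSURL URLWithString:url]];
    request.HTTPMethod = @"POST";
    [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];

    // Use the same roomID/userID/streamID that the client uses for Express,
    // plus the digital human parameters described above
    NSDictionary *params = @{
        @"room_id": self.roomID,
        @"user_id": self.userID,
        @"stream_id": self.streamID,
        @"digital_human_id": self.digitalHumanID, // obtained from ZEGO technical support
        @"config_id": @"mobile"                   // "mobile" for Android/iOS, "web" for Web
    };
    request.HTTPBody = [NSJSONSerialization dataWithJSONObject:params options:0 error:nil];

    NSURLSessionDataTask *task = [[NSURLSession sharedSession] dataTaskWithRequest:request
        completionHandler:^(NSData *data, NSURLResponse *response, NSError *error) {
        dispatch_async(dispatch_get_main_queue(), ^{
            if (error || !data) {
                if (completion) completion(-1, @"Network request failed");
                return;
            }
            NSDictionary *dict = [NSJSONSerialization JSONObjectWithData:data options:0 error:nil];
            if (completion) completion([dict[@"code"] integerValue], dict[@"message"]);
        });
    }];
    [task resume];
}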

Initialize the Digital Human SDK Instance

First, add a digital human preview view to your view hierarchy; the digital human will be rendered into this view.

1. Declare the Digital Human Instance and View
#import <ZegoDigitalMobile/ZegoDigitalMobile.h>

// Digital Human SDK Instance, you can create multiple instances to display different digital humans
@property (nonatomic, strong) id<IZegoDigitalMobile> digitalMobile;
// Digital Human Preview View, the digital human will be rendered to this view
@property (nonatomic, strong) ZegoPreviewView *previewView;
2. Create and Add the previewView
- (void)setupPreviewView {
    self.previewView = [[ZegoPreviewView alloc] init];
    self.previewView.backgroundColor = [UIColor whiteColor];

    [self.view addSubview:self.previewView];
    // Layout with the Masonry library (mas_makeConstraints); use your own layout approach if you don't use Masonry
    [self.previewView mas_makeConstraints:^(MASConstraintMaker *make) {
        make.edges.equalTo(self.view);
    }];
}
3. Initialize the Digital Human SDK Instance and Bind the Rendering View
// Create the digital human SDK instance
self.digitalMobile = [ZegoDigitalMobileFactory create];
// Digital human configuration: obtained from the DigitalHumanConfig returned by the business backend's interface for creating the digital human agent instance
NSString *digitalHumanEncodeConfig = @"";
// Initialize the digital human SDK instance, pass in the digital human configuration
[self.digitalMobile start:digitalHumanEncodeConfig delegate:self];
// Bind the preview view created above, the digital human will be rendered to this view
[self.digitalMobile attach:self.previewView];

Synchronize Express Data to the Digital Human SDK

When rendering the digital human image, the digital human SDK relies on the video frames and SEI data from the ZEGO Express SDK, so you need to enable the custom video rendering capability of the ZEGO Express SDK and forward its video frames and SEI data to the digital human SDK.

Note
  • The custom video rendering capability of the ZEGO Express SDK must be enabled before calling the startPublishingStream / startPlayingStream interfaces; otherwise it will not take effect.
Express
- (BOOL)enableCustomVideoRender {
    // Custom rendering
    ZegoCustomVideoRenderConfig *renderConfig =
    [[ZegoCustomVideoRenderConfig alloc] init];
    // Select RawData type video frame data
    renderConfig.bufferType = ZegoVideoBufferTypeRawData;
    // Select RGB color system data format
    renderConfig.frameFormatSeries = ZegoVideoFrameFormatSeriesRGB;
    // Disable the engine's own rendering; the digital human SDK takes over rendering
    renderConfig.enableEngineRender = NO;
    
    ZegoExpressEngine *engine = [ZegoExpressEngine sharedEngine];
    if (!engine) {
        return NO;
    }
    
    [engine enableCustomVideoRender:YES config:renderConfig];
    [engine setCustomVideoRenderHandler:self];
    
    return YES;
}

#pragma mark - ZegoCustomVideoRenderHandler & ZegoEventHandler

- (void)onRemoteVideoFrameRawData:(unsigned char **)data
                       dataLength:(unsigned int *)dataLength
                            param:(ZegoVideoFrameParam *)param
                         streamID:(NSString *)streamID {
    // Convert parameter format
    ZDMVideoFrameParam *digitalParam = [[ZDMVideoFrameParam alloc] init];
    digitalParam.format = (ZDMVideoFrameFormat)param.format;
    digitalParam.width = param.size.width;
    digitalParam.height = param.size.height;
    digitalParam.rotation = param.rotation;
    
    for (int i = 0; i < 4; i++) {
        [digitalParam setStride: param.strides[i] atIndex:i];
    }
    
    // Forward the video frame data to every digital human instance
    for (id<IZegoDigitalMobile> digitalMobile in self.digitalMobileArray) {
        [digitalMobile onRemoteVideoFrameRawData:data dataLength:dataLength param:digitalParam streamID:streamID];
    }
}

- (void)onPlayerSyncRecvSEI:(NSData *)data streamID:(NSString *)streamID{
    // Forward the SEI data to every digital human instance
    for (id<IZegoDigitalMobile> digitalMobile in self.digitalMobileArray) {
        [digitalMobile onPlayerSyncRecvSEI:streamID data:data];
    }
}

Log in to an RTC Room and Start Publishing a Stream

After a real user logs into the room, they start publishing streams.

Note

In this scenario, AI echo cancellation should be enabled for better effects.

The token used for login needs to be obtained from your server; please refer to the complete sample code.

Note

Please ensure that the roomID, userID, and streamID are unique under one ZEGOCLOUD APPID.

  • roomID: Generated by you according to your own rules; it is used to log in to the Express SDK room. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • userID: Length must not exceed 32 bytes. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • streamID: Length must not exceed 256 bytes. Only numbers, English characters, and '-', '_' are supported.
Client: log in to the room and publish a stream
// Record the AI agent stream ID
self.streamToPlay = [self getAgentStreamID];

ZegoEngineConfig* engineConfig = [[ZegoEngineConfig alloc] init];
engineConfig.advancedConfig = @{
    @"set_audio_volume_ducking_mode":@1,/** This configuration is used for volume ducking **/
    @"enable_rnd_volume_adaptive":@"true",/** This configuration is used for adaptive playback volume **/
};
[ZegoExpressEngine setEngineConfig:engineConfig];

// This setting only affects AEC (echo cancellation). Here we set it to ModeGeneral, which uses our proprietary echo cancellation algorithm, giving us more control.
// If other options are selected, it might use the system's echo cancellation, which may work better on iPhones but could be less effective on some Android devices.
[[ZegoExpressEngine sharedEngine] setAudioDeviceMode:ZegoAudioDeviceModeGeneral];

// Note: Enabling AI echo cancellation requires contacting ZEGOCLOUD technical support to obtain the corresponding ZegoExpressEngine.xcframework, as versions with this capability have not yet been publicly released.
[[ZegoExpressEngine sharedEngine] enableAGC:TRUE];
[[ZegoExpressEngine sharedEngine] enableAEC:TRUE];
[[ZegoExpressEngine sharedEngine] setAECMode:ZegoAECModeAIAggressive2];
[[ZegoExpressEngine sharedEngine] enableANS:TRUE];
[[ZegoExpressEngine sharedEngine] setANSMode:ZegoANSModeMedium];

// Login to room
[self loginRoom:^(int errorCode, NSDictionary *extendedData) {
    if (errorCode!=0) {
        NSString* errorMsg =[NSString stringWithFormat:@"Failed to enter voice room:%d", errorCode];
        completion(NO, errorMsg);
        return;
    }
    
    // Start publishing stream after entering room
    [self startPublishStream];
}];
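
The loginRoom and startPublishStream helpers referenced above are not shown in the snippet. Below is a minimal sketch of what they might look like, assuming illustrative self.roomID, self.userID, self.userName, self.streamID, and self.token properties, with the token fetched from your server.

Client
- (void)loginRoom:(void (^)(int errorCode, NSDictionary *extendedData))callback {
    ZegoUser *user = [ZegoUser userWithUserID:self.userID userName:self.userName];
    ZegoRoomConfig *config = [[ZegoRoomConfig alloc] init];
    config.isUserStatusNotify = YES;
    config.token = self.token; // Obtain the token from your server
    [[ZegoExpressEngine sharedEngine] loginRoom:self.roomID
                                           user:user
                                         config:config
                                       callback:callback];
}

- (void)startPublishStream {
    // Publish the local audio stream; the AI agent in the same room plays this stream
    [[ZegoExpressEngine sharedEngine] startPublishingStream:self.streamID];
}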

Play the AI Agent Stream

By default, there is only one real user and one AI agent in the same room, so any new stream added is assumed to be the AI agent stream.

Client
// Listen for room stream information update status, pull the AI agent stream playback
- (void)onRoomStreamUpdate:(ZegoUpdateType)updateType
                streamList:(NSArray<ZegoStream *> *)streamList
              extendedData:(nullable NSDictionary *)extendedData
                    roomID:(NSString *)roomID{    
    if (updateType == ZegoUpdateTypeAdd) {
        for (int i=0; i<streamList.count; i++) {
            ZegoStream* item = [streamList objectAtIndex:i];
            
            [self startPlayStream:item.streamID];
        }
    } else if(updateType == ZegoUpdateTypeDelete) {
        for (int i=0; i<streamList.count; i++) {
            ZegoStream* item = [streamList objectAtIndex:i];
            [[ZegoExpressEngine sharedEngine] stopPlayingStream:item.streamID];
        }
    }
}

Congratulations🎉! After completing this step, you can now ask the AI agent any questions, and the AI agent will answer your questions!

Exit the Room and End the Call

The client calls the logout interface to exit the room and stops publishing and playing streams, and at the same time notifies the business backend that the call has ended. When the business backend receives the end-call notification, it deletes the AI agent instance; the AI agent instance then automatically exits the room and stops publishing and playing streams. Finally, call the digital human SDK's exit interface to complete the interactive session.

/**
 * Notify the business backend to end the call
 * 
 * @param completion Completion callback, return the operation result
 * @discussion This method will send a request to end the call, which is used to release the AI agent instance
 */
- (void)doStopCallWithCompletion:(void (^)(NSInteger code, NSString *message, NSDictionary *data))completion {
    // Build the request URL
    NSString *url = [NSString stringWithFormat:@"%@/api/stop", self.currentBaseURL];
    NSURL *requestURL = [NSURL URLWithString:url];
    
    // Create the request
    NSMutableURLRequest *request = [[NSMutableURLRequest alloc] initWithURL:requestURL];
    request.HTTPMethod = @"POST";
    
    // Set the request header
    [request setValue:@"application/json" forHTTPHeaderField:@"Content-Type"];
    
    // Create the request parameters (fill in your business parameters as needed)
    NSMutableDictionary *params = [NSMutableDictionary dictionary];
    NSData *jsonData = [NSJSONSerialization dataWithJSONObject:params options:0 error:nil];
    request.HTTPBody = jsonData;
    
    // Create the session
    NSURLSession *session = [NSURLSession sharedSession];
    
    // Send the request
    NSURLSessionDataTask *task = [session dataTaskWithRequest:request
                                           completionHandler:^(NSData * _Nullable data,
                                                            NSURLResponse * _Nullable response,
                                                            NSError * _Nullable error) {
        dispatch_async(dispatch_get_main_queue(), ^{
            if (error) {
                if (completion) {
                    completion(-1, @"Network request failed", nil);
                }
                return;
            }
            
            NSHTTPURLResponse *httpUrlResponse = (NSHTTPURLResponse *)response;
            if (httpUrlResponse.statusCode != 200) {
                if (completion) {
                    completion(httpUrlResponse.statusCode, 
                             [NSString stringWithFormat:@"Server error: %ld", (long)httpUrlResponse.statusCode],
                             nil);
                }
                return;
            }
            
            NSError *jsonError;
            NSDictionary *dict = [NSJSONSerialization JSONObjectWithData:data options:0 error:&jsonError];
            if (jsonError) {
                if (completion) {
                    completion(-2, @"Failed to parse the response data", nil);
                }
                return;
            }
            
            // Parse the response data
            NSInteger code = [dict[@"code"] integerValue];
            NSString *message = dict[@"message"];
            NSDictionary *responseData = dict[@"data"];
            
            if (completion) {
                completion(code, message, responseData);
            }

            // Exit the room; also call the digital human SDK's exit interface here to fully end the session
            [[ZegoExpressEngine sharedEngine] logoutRoom];
        });
    }];
    
    [task resume];
}

This is the complete core process for you to implement real-time interaction with the digital human agent.

Best Practices for ZEGO Express SDK Configuration

To achieve the best audio call experience, it is recommended to configure the ZEGO Express SDK according to the following best practices. These configurations can significantly improve the quality of AI agent voice interactions.

Settings before joining a room:

  • Enable traditional audio 3A processing (Acoustic Echo Cancellation AEC, Automatic Gain Control AGC, and Noise Suppression ANS)
  • Set the room usage scenario to High Quality Chatroom, as the SDK will adopt different optimization strategies for different scenarios
  • Set the audio device mode to default mode
  • Enable AI echo cancellation to improve echo cancellation effect (this feature requires contacting ZEGO technical support to obtain the corresponding version of ZEGOExpress SDK)
  • Configure volume ducking to avoid sound conflicts
  • Enable adaptive playback volume to enhance user experience
  • Enable AI noise reduction and set appropriate noise suppression level
ZegoEngineProfile* profile = [[ZegoEngineProfile alloc]init];
profile.appID = kZegoAppId;
profile.scenario = ZegoScenarioHighQualityChatroom; // High Quality Chatroom scenario; it avoids requesting camera permission. Set the value that fits your own business scenario
ZegoEngineConfig* engineConfig = [[ZegoEngineConfig alloc] init];
engineConfig.advancedConfig = @{
    @"set_audio_volume_ducking_mode":@1,/** Configure volume ducking to avoid sound conflicts **/
    @"enable_rnd_volume_adaptive":@"true",/** Enable adaptive playback volume **/
};
[ZegoExpressEngine setEngineConfig:engineConfig];
[ZegoExpressEngine createEngineWithProfile:profile eventHandler:self];
// Enable traditional audio 3A processing
[[ZegoExpressEngine sharedEngine] enableAGC:TRUE];
[[ZegoExpressEngine sharedEngine] enableAEC:TRUE];
[[ZegoExpressEngine sharedEngine] enableANS:TRUE];
// Enable AI echo cancellation, please note: enabling AI echo cancellation requires contacting ZEGO technical support to obtain the corresponding version of ZEGOExpress SDK
[[ZegoExpressEngine sharedEngine] setAECMode:ZegoAECModeAIAggressive2];
// Enable AI noise reduction with moderate noise suppression
[[ZegoExpressEngine sharedEngine] setANSMode:ZegoANSModeMedium];

Listen for Exception Callback

Note
Because LLM and TTS involve many parameters, configuration errors can easily cause problems such as the AI agent not answering or not speaking during testing. We strongly recommend listening for exception callbacks while testing and troubleshooting quickly based on the callback information. A client-side sketch follows.
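
On the client side, one way to surface problems early during testing is to log the ZEGO Express SDK's error and room-state callbacks, in addition to the AI Agent service's server-side exception callbacks. A minimal sketch:

Client
#pragma mark - ZegoEventHandler

// Fires when an internal Express SDK API call fails; log it for troubleshooting
- (void)onDebugError:(int)errorCode funcName:(NSString *)funcName info:(NSString *)info {
    NSLog(@"[Express error] code=%d func=%@ info=%@", errorCode, funcName, info);
}

// Fires when the room connection state changes, e.g. on disconnects
- (void)onRoomStateChanged:(ZegoRoomStateChangedReason)reason
                 errorCode:(int)errorCode
              extendedData:(NSDictionary *)extendedData
                    roomID:(NSString *)roomID {
    if (errorCode != 0) {
        NSLog(@"[Room state] reason=%ld code=%d roomID=%@", (long)reason, errorCode, roomID);
    }
}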
