Quick Start Digital Human Video Call

This document explains how to quickly integrate the client SDK (ZEGO Express SDK and Digital Human SDK) and achieve video interaction with an AI Agent.

Digital Human Introduction

With just a waist-up photo or image of a real person or anime character, you can obtain a 1080P digital human with accurate lip-sync and a realistic appearance. Combined with the AI Agent product, you can set up video interaction with an AI digital human in about 2 seconds end to end, which suits scenarios such as 1V1 interactive digital human video, digital human customer service, and digital human live streaming.

  • More natural driving effects: Supports subtle body movements, natural facial expressions without distortion, providing more realistic and immersive interaction compared to voice calls;
  • Multi-language accurate lip-sync: Natural and accurate lip movements, especially optimized for Chinese and English;
  • Ultra-low interaction latency: Digital human driving latency < 500ms, combined with AI Agent interaction latency < 2s;
  • Higher clarity: true 1080P output, with a 20%+ improvement in clarity over traditional image-based digital humans.

Prerequisites

  • Create a project in the ZEGOCLOUD Console, and get its valid AppID and AppSign. For more details, please refer to Admin Console doc How to view project info.
  • You have contacted ZEGOCLOUD Technical Support to enable Digital Human PaaS service and related interface permissions.
  • You have contacted ZEGOCLOUD Technical Support to create a digital human.
  • You have contacted ZEGOCLOUD Technical Support to obtain the ZEGO Express SDK that supports AI echo cancellation and integrated it into your project.

Sample Code

Below is the client sample code. You can refer to it when implementing your own business logic.

The following video demonstrates how to run the server and client (Web) sample code and interact with the digital human agent via video.

Overall Business Process

  1. Server: deploy the business backend sample code according to the Business Backend Quick Start Guide.
    • Integrate the AI Agent APIs to create and manage the AI agent.
  2. Client: run the sample code.
    • Create and manage agents through the business backend.
    • Integrate the ZEGO Express SDK and Digital Human SDK to complete real-time communication.

After completing the above two steps, the real user and the AI agent can interact with each other in real time in the same room.

Core Capabilities Implementation

Integrate ZEGO Express SDK

Please refer to Integrate SDK > 2.2 > Method 2 to manually integrate the SDK. After integrating the SDK, follow the steps below to initialize ZegoExpressEngine.

1. Add Permission Declaration

Enter the "app/src/main" directory, open the "AndroidManifest.xml" file, and add the following permissions.

AndroidManifest.xml
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" /> 
2. Request Microphone Permission at Runtime
private final ActivityResultLauncher<String> requestPermissionLauncher = registerForActivityResult(
    new ActivityResultContracts.RequestPermission(), new ActivityResultCallback<Boolean>() {
        @Override
        public void onActivityResult(Boolean isGranted) {
            if (isGranted) {
                // Permission granted
            }
        }
    });
// Initiate request
requestPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO);
3. Create and Initialize ZegoExpressEngine
ZegoEngineProfile zegoEngineProfile = new ZegoEngineProfile();
zegoEngineProfile.appID = YOUR_APP_ID; // Placeholder: use your project's AppID from the ZEGO Console
zegoEngineProfile.scenario = ZegoScenario.HIGH_QUALITY_CHATROOM;
zegoEngineProfile.application = getApplication();
ZegoExpressEngine.createEngine(zegoEngineProfile, null);

Integrate Digital Human SDK

The Digital Human SDK is published to the Maven repository. Refer to the following configuration to integrate it into your project.

1. Add `maven` Configuration

Select the corresponding implementation steps based on your Android Gradle plugin version.
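For Android Gradle Plugin 7.1.0 and later, repositories are declared in `settings.gradle`; below is a minimal sketch, assuming the public ZEGO Maven repository URL used in other ZEGO SDK integration guides. For older plugin versions, add the same `maven` block to the `repositories` section of your project-level `build.gradle` instead.

settings.gradle
dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()
        mavenCentral()
        // Assumption: the public ZEGO Maven repository
        maven { url 'https://storage.zego.im/maven' }
    }
}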

2. Modify Your App-Level `build.gradle` File
build.gradle
dependencies {
    ...
    // Digital Human SDK dependency
    implementation 'im.zego:digitalmobile:+'   

    // Third-party library dependencies used by Digital Human SDK
    implementation 'com.squareup.okhttp3:okhttp:4.9.3'
    implementation "com.google.code.gson:gson:2.9.1"
    implementation 'com.liulishuo.okdownload:okdownload:1.0.7'
    implementation 'com.liulishuo.okdownload:sqlite:1.0.7'
    implementation 'com.liulishuo.okdownload:okhttp:1.0.7'
}

Notify Business Backend to Start Call

You can notify the business backend to start the call as soon as the real user enters the room; making this request asynchronously reduces the call connection time. After the business backend receives the start-call notification, it creates a digital human agent instance using the same roomID, and the associated userID and streamID, as the client, so that the digital human agent and the real user can interact in the same room by publishing and playing each other's streams.

When requesting the business backend, you need to include the digital human parameters digital_human_id and config_id.

  • digital_human_id is the digital human ID, please contact ZEGO technical support to obtain it.
  • config_id is the configuration ID of the digital human. Different platforms use different configurations, and the digital human service optimizes performance and rendering per platform based on this value: fill in mobile for Android/iOS and web for Web (see the request sketch below).
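
As a reference, here is a minimal client-side sketch of that request, modeled on the stop() example later in this document. The "/api/start" path and the JSON field names other than digital_human_id and config_id are assumptions; align them with your business backend sample code.

Client
// Notify the business backend to start the call.
// Assumptions: the "/api/start" path and the room_id/user_id/stream_id field
// names are modeled on the stop() example below; match your backend's API.
private void start() {
    String json = "{"
        + "\"room_id\": \"" + roomId + "\","
        + "\"user_id\": \"" + userId + "\","
        + "\"stream_id\": \"" + userStreamID + "\","
        + "\"digital_human_id\": \"" + digitalHumanId + "\","  // Obtain from ZEGO technical support
        + "\"config_id\": \"mobile\""                          // "mobile" for Android/iOS, "web" for Web
        + "}";
    RequestBody body = RequestBody.create(json, MediaType.parse("application/json; charset=utf-8"));
    Request request = new Request.Builder().url(YOUR_SERVER_URL + "/api/start").post(body).build();

    new OkHttpClient.Builder().build().newCall(request).enqueue(new Callback() {
        @Override
        public void onFailure(@NonNull Call call, @NonNull IOException e) {
            // Handle the network error, e.g. retry or prompt the user
        }

        @Override
        public void onResponse(@NonNull Call call, @NonNull Response response) throws IOException {
            // The response is expected to include the DigitalHumanConfig used to
            // initialize the Digital Human SDK (see the next section)
        }
    });
}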

Initialize the Digital Human SDK Instance

First, add a digital human preview view to the Android layout file; the digital human image will be rendered to this view.

<im.zego.digitalmobile.ZegoPreviewView
    android:id="@+id/preview_view"
    android:layout_width="match_parent"
    android:layout_height="match_parent" />
Then create the digital human SDK instance, start it with the configuration returned by the business backend, and bind the preview view:
// Digital human configuration: obtain it from the DigitalHumanConfig returned by the
// business backend's interface for creating the digital human agent instance
String digitalHumanConfig = "...";
// Create the digital human SDK instance; you can create multiple instances to display different digital humans
IZegoDigitalMobile digitalMobile = ZegoDigitalMobileFactory.create(this);
// Initialize the digital human SDK instance with the digital human configuration
digitalMobile.start(digitalHumanConfig, null);
// Bind the preview view created above; the digital human will be rendered to this view
digitalMobile.attach(previewView);

Synchronize Express Data to the Digital Human SDK

When rendering the digital human, the digital human SDK relies on the video frames and SEI data from the ZEGO Express SDK. You therefore need to enable the custom video rendering capability of the ZEGO Express SDK and forward its video frames and SEI data to the digital human SDK.

Note
  • The custom video rendering capability of the ZEGO Express SDK must be enabled before calling the startPublishingStream / startPlayingStream interfaces; otherwise it will not take effect.
Express
// Enable Express custom rendering
ZegoCustomVideoRenderConfig renderConfig = new ZegoCustomVideoRenderConfig();
renderConfig.bufferType = ZegoVideoBufferType.RAW_DATA;
renderConfig.frameFormatSeries = ZegoVideoFrameFormatSeries.RGB;
renderConfig.enableEngineRender = false;
ZegoExpressEngine.getEngine().enableCustomVideoRender(true, renderConfig);
// Listen for video frame callbacks
ZegoExpressEngine.getEngine().setCustomVideoRenderHandler(new IZegoCustomVideoRenderHandler() {
    @Override
    public void onRemoteVideoFrameRawData(ByteBuffer[] data, int[] dataLength, ZegoVideoFrameParam param,
                                            String streamID) {
        IZegoDigitalMobile.ZegoVideoFrameParam digitalParam = new IZegoDigitalMobile.ZegoVideoFrameParam();
        digitalParam.format = IZegoDigitalMobile.ZegoVideoFrameFormat.getZegoVideoFrameFormat(param.format.value());
        digitalParam.height = param.height;
        digitalParam.width = param.width;
        digitalParam.rotation = param.rotation;
        for (int i = 0; i < 4; i++) {
            digitalParam.strides[i] = param.strides[i];
        }
        // Pass the Express video frame data to the digital human SDK
        digitalMobile.onRemoteVideoFrameRawData(data, dataLength, digitalParam, streamID);
    }
});

// Listen for Express SEI data
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    public void onPlayerSyncRecvSEI(String streamID, byte[] data) {
        // Pass the Express SEI data to the digital human SDK
        digitalMobile.onPlayerSyncRecvSEI(streamID, data);
    }
});

User Logs in to an RTC Room and Starts Publishing a Stream

After a real user logs into the room, they start publishing streams.

Note

In this scenario, AI echo cancellation should be enabled for better results.

The token used for login needs to be obtained from your server; please refer to the complete sample code.

Note

Please ensure that the roomID, userID, and streamID are unique under one ZEGOCLOUD APPID.

  • roomID: Generated by the user according to their own rules, it will be used to log into the Express SDK room. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • userID: Length should not exceed 32 bytes. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • streamID: Length should not exceed 256 bytes. Only numbers, English characters, and '-', '_' are supported.
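
If you generate these IDs dynamically, a quick sanity check can save debugging time. Below is a minimal sketch for the streamID rule above; the helper name is hypothetical.

Stream ID check
// Hypothetical helper: validates a streamID against the documented constraints
// (at most 256 bytes; only numbers, English letters, '-' and '_')
private static boolean isValidStreamId(String streamId) {
    return streamId != null
        && !streamId.isEmpty()
        && streamId.getBytes(java.nio.charset.StandardCharsets.UTF_8).length <= 256
        && streamId.matches("[0-9A-Za-z_-]+");
}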
Client login to room and publish a stream
private void loginRoom(String userId, String userName, String token,
    IZegoRoomLoginCallback callback) {
    ZegoEngineConfig config = new ZegoEngineConfig();
    HashMap<String, String> advanceConfig = new HashMap<String, String>();
    // Configure volume ducking to avoid sound conflicts
    advanceConfig.put("set_audio_volume_ducking_mode", "1");
    // Enable adaptive playback volume
    advanceConfig.put("enable_rnd_volume_adaptive", "true");
    config.advancedConfig = advanceConfig;
    ZegoExpressEngine.setEngineConfig(config);
    ZegoExpressEngine.getEngine().setRoomScenario(ZegoScenario.HIGH_QUALITY_CHATROOM);
    ZegoExpressEngine.getEngine().setAudioDeviceMode(ZegoAudioDeviceMode.GENERAL);

    ZegoExpressEngine.getEngine().enableAEC(true);
    // Please note: To enable AI echo cancellation, please contact ZEGOCLOUD technical support to obtain the corresponding version of ZEGOExpress SDK
    ZegoExpressEngine.getEngine().setAECMode(ZegoAECMode.AI_AGGRESSIVE2);
    ZegoExpressEngine.getEngine().enableAGC(true);
    ZegoExpressEngine.getEngine().enableANS(true);
    ZegoExpressEngine.getEngine().setANSMode(ZegoANSMode.MEDIUM);

    ZegoRoomConfig roomConfig = new ZegoRoomConfig();
    roomConfig.isUserStatusNotify = true;
    roomConfig.token = token;  // Token authentication is required, obtain it from your server, and refer to ZEGOCLOUD documentation for generation method

    String roomId = "YOUR_ROOM_ID";         // Placeholder: custom room ID for login, see the format description above
    String userStreamID = "YOUR_STREAM_ID"; // Placeholder: custom stream ID for publishing, see the format description above
    ZegoExpressEngine.getEngine()
        .loginRoom(roomId, new ZegoUser(userId, userName), roomConfig, (errorCode, extendedData) -> {
            Timber.d(
                "loginRoom() called with: errorCode = [" + errorCode + "], extendedData = [" + extendedData + "]");
            if (errorCode == 0) {
                // Start publishing stream after successful login
                ZegoExpressEngine.getEngine().startPublishingStream(userStreamID);
                // Set microphone mute status, false means unmuted, true means muted
                ZegoExpressEngine.getEngine().muteMicrophone(false);
            }
            if (callback != null) {
                callback.onRoomLoginResult(errorCode, extendedData);
            }

        });
}

Play the AI Agent Stream

By default, there is only one real user and one AI agent in the same room, so any new stream added is assumed to be the AI agent stream.

Client
// Set up the event handler
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    // When other users in the room start/stop publishing streams, you can receive notifications about the corresponding user's audio/video stream changes here
    public void onRoomStreamUpdate(String roomID, ZegoUpdateType updateType, ArrayList<ZegoStream> streamList, JSONObject extendedData) {
        super.onRoomStreamUpdate(roomID, updateType, streamList, extendedData);
        // When updateType is ZegoUpdateType.ADD, it means there is a new audio/video stream, at this time we can call the startPlayingStream interface to pull this audio/video stream
        if (updateType == ZegoUpdateType.ADD) {
            ZegoStream stream = streamList.get(0);
            // By default, new streams are from the AI agent, so play directly
            ZegoExpressEngine.getEngine().setPlayStreamBufferIntervalRange(stream.streamID, 100, 2000);  // Set the playback buffer interval to optimize the experience
            ZegoExpressEngine.getEngine().startPlayingStream(stream.streamID);
        }
    }
});

Congratulations🎉! After completing this step, you can now ask the AI agent any questions, and the AI agent will answer your questions!

Exit the Room and End the Call

The client calls the logout interface to exit the room and stop publishing and playing streams, and at the same time notifies the business backend that the call has ended. After receiving the end-call notification, the business backend deletes the AI agent instance, which then automatically exits the room and stops publishing and playing its streams. Finally, call the digital human SDK's stop interface to complete the interactive session.

// Notify the business backend to end the call
private void stop() {
    RequestBody body = RequestBody.create("", MediaType.parse("application/json; charset=utf-8"));
    Request request = new Request.Builder().url(YOUR_SERVER_URL + "/api/stop").post(body).build();

    new OkHttpClient.Builder().build().newCall(request).enqueue(new Callback() {
        @Override
        public void onFailure(@NonNull Call call, @NonNull IOException e) {
            // Handle the network error, e.g. retry or prompt the user
        }

        @Override
        public void onResponse(@NonNull Call call, @NonNull Response response) throws IOException {
            if (response.isSuccessful()) {
                // Exit the room
                ZegoExpressEngine.getEngine().logoutRoom();
                // Exit the digital human SDK
                digitalMobile.stop();
            }
        }
    });
}

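When tearing down the call UI completely, you may also want to release the Express engine. Below is a minimal sketch using the standard Express API; whether and when to destroy the engine depends on your app's lifecycle.

Client
// Release the Express engine once the call UI is fully torn down.
// Passing null means no completion callback is needed.
ZegoExpressEngine.destroyEngine(null);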

This is the complete core process for you to implement real-time interaction with the digital human agent.

Best Practices for ZEGO Express SDK Configuration

To achieve the best audio call experience, it is recommended to configure the ZEGO Express SDK according to the following best practices. These configurations can significantly improve the quality of AI agent voice interactions.

Settings before joining the room:

  • Enable traditional audio 3A processing (Acoustic Echo Cancellation AEC, Automatic Gain Control AGC, and Noise Suppression ANS)
  • Set the room usage scenario to High Quality Chatroom, as the SDK will adopt different optimization strategies for different scenarios
  • Set the audio device mode to default mode
  • Enable AI echo cancellation to improve echo cancellation effect (this feature requires contacting ZEGO technical support to obtain the corresponding version of ZEGOExpress SDK)
  • Configure volume ducking to avoid sound conflicts
  • Enable adaptive playback volume to enhance user experience
  • Enable AI noise reduction and set appropriate noise suppression level
ZegoEngineConfig config = new ZegoEngineConfig();
HashMap<String, String> advanceConfig = new HashMap<String, String>();
// Configure volume ducking to avoid sound conflicts
advanceConfig.put("set_audio_volume_ducking_mode", "1");
// Enable adaptive playback volume
advanceConfig.put("enable_rnd_volume_adaptive", "true");
config.advancedConfig = advanceConfig;
ZegoExpressEngine.setEngineConfig(config);
// Set room usage scenario to High Quality Chatroom
ZegoExpressEngine.getEngine().setRoomScenario(ZegoScenario.HIGH_QUALITY_CHATROOM);
// Set audio device mode to default mode
ZegoExpressEngine.getEngine().setAudioDeviceMode(ZegoAudioDeviceMode.GENERAL);
// Enable traditional audio 3A processing
ZegoExpressEngine.getEngine().enableAEC(true);
ZegoExpressEngine.getEngine().enableAGC(true);
ZegoExpressEngine.getEngine().enableANS(true);
// Enable AI echo cancellation, please note: enabling AI echo cancellation requires contacting ZEGO technical support to obtain the corresponding version of ZEGOExpress SDK
ZegoExpressEngine.getEngine().setAECMode(ZegoAECMode.AI_AGGRESSIVE2);
// Enable AI noise reduction with moderate noise suppression
ZegoExpressEngine.getEngine().setANSMode(ZegoANSMode.MEDIUM);

Listen for Exception Callback

Note
Because LLM and TTS involve many parameters, configuration errors can easily cause problems such as the AI agent not answering or not speaking during testing. We strongly recommend listening for exception callbacks while testing and troubleshooting quickly based on the callback information, as in the sketch below.
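
On the client, you can surface Express SDK errors through the onDebugError callback. This is a minimal sketch; agent-level exceptions (for example LLM or TTS misconfiguration) are typically reported to your business backend through server-side callbacks, so log them there as well. Note that setEventHandler replaces any previously set handler, so add this override to the same handler that listens for stream updates.

Client
// Surface Express SDK errors while testing. Add this override to the event
// handler you already registered, because setEventHandler replaces the old one.
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    public void onDebugError(int errorCode, String funcName, String info) {
        Timber.e("Express error: code=%d, func=%s, info=%s", errorCode, funcName, info);
    }
});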
