
Quick Start Digital Human Video Call

This document explains how to quickly integrate the client SDK (ZEGO Express SDK and Digital Human SDK) and achieve video interaction with an AI Agent.

Digital Human Introduction

With just a waist-up photo or image of a real person or anime character, you can obtain a 1080P digital human with accurate lip-sync and a realistic appearance. Used together with the AI Agent product, you can build video interaction with AI digital humans with an overall latency within 2 seconds, suitable for scenarios such as 1v1 digital human interactive video, digital human customer service, and digital human live streaming.

  • More natural driving effects: Supports subtle body movements, natural facial expressions without distortion, providing more realistic and immersive interaction compared to voice calls;
  • Multi-language accurate lip-sync: Natural and accurate lip movements, especially optimized for Chinese and English;
  • Ultra-low interaction latency: Digital human driving latency < 500ms, combined with AI Agent interaction latency < 2s;
  • Higher clarity: true 1080P output, with a 20%+ improvement in clarity over traditional image-based digital humans.

Prerequisites

  • Create a project in the ZEGOCLOUD Console, and get its valid AppID and AppSign. For more details, please refer to Admin Console doc How to view project info.
  • You have contacted ZEGOCLOUD Technical Support to enable Digital Human PaaS service and related interface permissions.
  • You have contacted ZEGOCLOUD Technical Support to create a digital human.

Sample Code

Below is the client sample code. You can refer to it to implement your own business logic.

The following video demonstrates how to run the server and client (Web) sample code and interact with the digital human agent via video.

Overall Business Process

  1. Server: deploy the business backend sample code according to the Business Backend Quick Start Guide.
    • Integrate the AI Agent APIs to create and manage the AI agent.
  2. Client: run the client sample code.
    • Create and manage agents through the business backend.
    • Integrate the ZEGO Express SDK to complete real-time communication.

After completing the above two steps, the AI agent and the real user can join the same room and interact in real time.

Core Capabilities Implementation

Integrate ZEGO Express SDK

Please refer to Integrate SDK > Method 2 and use npm to integrate the SDK (v3.9.123 or above). After integrating the SDK, initialize ZegoExpressEngine as follows:

  1. Instantiate ZegoExpressEngine
  2. Check system requirements (WebRTC support and microphone permissions)
Initialize ZegoExpressEngine and check system requirements
import { ZegoExpressEngine } from "zego-express-engine-webrtc";

const appID = 1234567; // Get appID from ZEGO Console
const server = "xxx"; // Get the server URL from ZEGO Console

// Instantiate ZegoExpressEngine with the appID and server configurations
const zg = new ZegoExpressEngine(appID, server);

// Check system requirements
const checkSystemRequirements = async () => {
  // Detect whether WebRTC is supported
  const rtc_sup = await zg.checkSystemRequirements("webRTC");
  if (!rtc_sup.result) {
    // Browser does not support WebRTC
  }
  // Detect whether microphone permission is enabled
  const mic_sup = await zg.checkSystemRequirements("microphone");
  if (!mic_sup.result) {
    // Microphone permission is not enabled
  }
};
checkSystemRequirements();

Notify Business Backend to Start Call

You can notify the business backend to start the call as soon as the real user enters the room; making this call asynchronously reduces the call connection time. After the business backend receives the start-call notification, it creates a digital human agent instance using the same roomID, and the associated userID and streamID, as the client, so that the digital human agent and the real user can interact in the same room by publishing and playing each other's streams.

When requesting the business backend, you need to include the digital human parameters digital_human_id and config_id (see the sketch after this list).

  • digital_human_id is the digital human ID, please contact ZEGO technical support to obtain it.
  • config_id is the digital human configuration ID. Different platforms use different configurations, and the digital human service optimizes performance and rendering per platform according to the config_id: fill in mobile for Android/iOS and web for Web.
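
The sketch below shows what this start-call request might look like. It is a minimal example: the /api/start endpoint and the request body fields are assumptions modeled on the /api/stop call shown later in this document, so adapt them to your actual business backend.

Notify business backend to start the call (sketch)
// Notify the business backend to start the call (the /api/start endpoint and
// body fields are assumptions; match them to your backend implementation)
async function startCall() {
  try {
    const response = await fetch(`${YOUR_SERVER_URL}/api/start`, { // YOUR_SERVER_URL is your business backend address
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        room_id: roomId, // same roomID the client logs in with
        user_id: userId, // same userID the client logs in with
        user_stream_id: userStreamId, // the stream ID the client publishes
        digital_human_id: 'xxx', // obtained from ZEGO Technical Support
        config_id: 'web', // 'web' for Web, 'mobile' for Android/iOS
      }),
    });
    const data = await response.json();
    console.log('Start call result:', data);
    return data;
  } catch (error) {
    console.error('Start call failed:', error);
    throw error;
  }
}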

Log In to the RTC Room and Publish a Stream

After a real user logs into the room, they start publishing streams.

The token used for login needs to be obtained from your server; please refer to the complete sample code.
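
The sample below calls Api.getToken(), which is not defined in the snippet. A minimal sketch of such a helper, assuming a hypothetical /api/token endpoint on your business backend that issues RTC tokens, might look like this:

Token helper (sketch)
// Minimal token helper sketch. The /api/token endpoint is hypothetical;
// generate tokens on your own server as described in the Token documentation.
const Api = {
  async getToken() {
    const response = await fetch(`${YOUR_SERVER_URL}/api/token?userId=${userId}`);
    const { token } = await response.json();
    return token;
  },
};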

Note

Please ensure that the roomID, userID, and streamID are unique under one ZEGOCLOUD APPID.

  • roomID: Generated by you according to your own rules; it is used to log in to the Express SDK room. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • userID: Length should not exceed 32 bytes. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • streamID: Length should not exceed 256 bytes. Only numbers, English characters, and '-', '_' are supported.
Client: log in to the room and publish a stream
const userId = "" // User ID for logging into the Express SDK room
const roomId = "" // RTC Room ID
const userStreamId = "" // User stream push ID
async function enterRoom() {
  try {
    // Generate RTC Token [Reference Documentation](https://www.zegocloud.com/docs/video-call/token?platform=web&language=javascript)
    const token = await Api.getToken();
    // Login to room
    await zg.loginRoom(roomId, token, {
      userID: userId,
      userName: "",
    });

    // Create local audio stream
    const localStream = await zg.createZegoStream({
      camera: {
        video: false,
        audio: true,
      },
    });
    if (localStream) {
      // Push local stream
      await zg.startPublishingStream(userStreamId, localStream);
    }
  } catch (error) {
    console.error("Failed to enter room:", error);
    throw error;
  }
}
enterRoom()

Play the AI Agent Stream

By default, there is only one real user and one AI agent in the same room, so any new stream added is assumed to be the AI agent stream.

Client: play the AI agent stream
// Listen for remote stream update events
function setupEvent() {
  zg.on("roomStreamUpdate",
    async (roomID, updateType, streamList) => {
      if (updateType === "ADD" && streamList.length > 0) {
        try {
          for (const stream of streamList) {
            // Pull the AI agent stream
            const mediaStream = await zg.startPlayingStream(stream.streamID);
            if (!mediaStream) return;
            const remoteView = await zg.createRemoteStreamView(mediaStream);
            if (remoteView) {
              // You need a container element with the id "remoteStreamView" to receive the AI agent stream. [Reference Documentation](https://docs.zegocloud.com/article/api?doc=Express_Video_SDK_API~javascript_web~class~ZegoStreamView)
              remoteView.play("remoteStreamView", {
                enableAutoplayDialog: false,
              });
            }
          }
        } catch (error) {
          console.error("Pull stream failed:", error);
        }
      }
    }
  );
}

Congratulations 🎉! After completing this step, you can ask the AI agent any question, and it will answer you!

Exit the Room and End the Call

The client calls the logout interface to exit the room and stops publishing and playing streams, and at the same time notifies the business backend that the call has ended. After the business backend receives the end-call notification, it deletes the AI agent instance; the agent instance then automatically exits the room and stops publishing and playing streams. Finally, call the digital human SDK's exit interface to complete the interactive session.

Client: exit the room and end the call
// Exit the room
async function stopCall() {
  try {
    const response = await fetch(`${YOUR_SERVER_URL}/api/stop`, { // YOUR_SERVER_URL is your business backend address
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      }
    });

    const data = await response.json();
    console.log('End call result:', data);
    return data;
  } catch (error) {
    console.error('End call failed:', error);
    throw error;
  }
}
stopCall();
zg.destroyLocalStream(localStream); // localStream is the stream created in enterRoom
zg.logoutRoom();

This is the complete core process for implementing real-time interaction with the digital human agent.

Best Practices for ZEGO Express SDK Configuration

To achieve the best call experience, we recommend configuring the ZEGO Express SDK according to the following best practices. These configurations can significantly improve the quality of AI agent voice interaction.

  • Enable traditional audio 3A processing (Acoustic Echo Cancellation AEC, Automatic Gain Control AGC, and Noise Suppression ANS)
  • Set the room usage scenario to High Quality Chatroom, as the SDK will adopt different optimization strategies for different scenarios
  • When publishing streams, configure the publish parameters to switch automatically to an available videoCodec
Recommended ZEGO Express SDK configuration
// Import necessary modules
import { ZegoExpressEngine } from "zego-express-engine-webrtc";
import { VoiceChanger } from "zego-express-engine-webrtc/voice-changer";

// Load audio processing module, must be called before new ZegoExpressEngine
ZegoExpressEngine.use(VoiceChanger);

// Instantiate ZegoExpressEngine, setting the room scenario to High Quality Chatroom
const zg = new ZegoExpressEngine(appID, server, { scenario: 7 });

// Traditional audio 3A processing is enabled by default in SDK

// Create local media stream
const localStream = await zg.createZegoStream();

// Push local media stream, need to set automatic switching to available videoCodec
await zg.startPublishingStream(userStreamId, localStream, {
  enableAutoSwitchVideoCodec: true,
});

// Check system requirements
async function checkSystemRequirements() {
  // Check WebRTC support
  const rtcSupport = await zg.checkSystemRequirements("webRTC");
  if (!rtcSupport.result) {
    console.error("Browser does not support WebRTC");
    return false;
  }
  
  // Check microphone permission
  const micSupport = await zg.checkSystemRequirements("microphone");
  if (!micSupport.result) {
    console.error("Microphone permission not granted");
    return false;
  }
  
  return true;
}

Additional Optimization Recommendations

  • Browser Compatibility: Use the latest versions of modern browsers such as Chrome, Firefox, and Safari
  • Network Environment: Ensure stable network connection, recommend using wired network or Wi-Fi with good signal
  • Audio Equipment: Use high-quality microphones and speakers
  • Page Optimization: Avoid running too many JavaScript tasks on the same page, which may affect audio processing performance
  • HTTPS Environment: Use HTTPS protocol in production environment to ensure microphone permission access

Listen for Exception Callback

Note
Because LLM and TTS involve a large number of parameters, configuration errors can easily cause problems such as the AI agent not answering or not speaking during testing. We strongly recommend listening for exception callbacks during testing and quickly troubleshooting problems based on the callback information.
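
The AI Agent exception callbacks themselves are typically delivered to your business backend as server callbacks. On the client, you can additionally watch standard Express SDK events to catch connection problems early during testing; the sketch below logs room state changes and unexpected stream removal (how you surface these errors is up to your application):

Client-side monitoring (sketch)
// Watch room connection state to help diagnose problems during testing
zg.on("roomStateUpdate", (roomID, state, errorCode, extendedData) => {
  // state is "DISCONNECTED", "CONNECTING", or "CONNECTED"
  if (state === "DISCONNECTED") {
    console.error("Room disconnected:", roomID, errorCode, extendedData);
  }
});

// Watch for streams being removed, e.g. the AI agent instance was deleted
zg.on("roomStreamUpdate", (roomID, updateType, streamList) => {
  if (updateType === "DELETE") {
    console.warn("Stream(s) removed:", streamList.map((s) => s.streamID));
  }
});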
