
Overview

2026-05-12

Product Introduction

ZEGOCLOUD Digital Human API is built on ZEGOCLOUD's self-developed AI engine and supports quick server-side integration.
Developers can create natural, lifelike Digital Human avatars and drive them with real-time inference to output live audio/video streams for real-time interaction, broadcasting, live streaming, and more. The service fits a wide range of business scenarios, such as AI companions, intelligent customer service, live commerce, and AI teaching.

Product Advantages

Lifelike Digital Human Avatars

Based on ZEGOCLOUD's self-developed Digital Human avatar generation engine, developers can choose between two avatar customization services: Image Digital Human and Real-person Digital Human.

  • Image Digital Human (Recommended): With just a single image, AI training can bring the image to "life". It supports various avatar types, including real people, cartoons, and virtual characters. The generated avatar speaks clearly, with natural expressions and some natural movements, making it lively and vivid.
  • Real-person Digital Human: Record a video of a real person, and AI training generates a Digital Human whose facial expressions and movements rival those of the real person. It supports custom backgrounds, up to 2K ultra-HD quality, and custom action driving. The resulting avatar looks realistic and natural.

Multiple Drive Modes, High-Quality Real-Time Stream Output

  • Drive the Digital Human with text, audio, or real-time audio/video streams, and output real-time audio/video streams with a minimum latency below 200 ms. This meets the needs of ultra-low-latency interactive scenarios such as live streaming and interactive conversations.
  • Trigger directional actions with keywords to implement custom behaviors, making the Digital Human more lively and natural.
  • Image Digital Human driving supports body movements, so photos can not only "speak" but also "move", creating a lifelike avatar.
  • High cost-effectiveness: Low pricing and support for high concurrency suit high-volume scenarios such as interactive conversations and interactive teaching, helping businesses substantially reduce costs.

Note

To use the real-time audio/video stream output feature, you must also integrate the Video Call capability.

Flexible Integration, Efficient Setup

  • Efficient integration: Standardized APIs; as few as two APIs are needed to build Digital Human capabilities
  • Diverse combinations: Atomic capabilities can be freely combined to support flexible business customization across different scenarios
  • Multi-platform support: Compatible with Web, App, Mini Programs, and other platforms
  • Multiple deployment options: Supports public cloud, private deployment, and other deployment forms

Product Features

Customize Digital Human Avatar

  • Image Digital Human: Quickly generate a Digital Human avatar from a single image. For image requirements, refer to the Image Digital Human Material Specification.
  • Real-person Digital Human Avatar: Record a video of a real person to customize a Real-person Digital Human avatar. For recording guidelines, refer to the Video Digital Human Shooting Guide.
  • Custom Action Library: When customizing an avatar, you can generate actions with specific meanings, such as "finger heart", "wave hello", and "show numbers". The Digital Human can then be driven to perform these custom actions in specific scenarios.

Digital Human Management

  • Query Digital Human Avatar Information: Query the available public and custom Digital Human avatars, as well as the list of timbres.

Real-Time Audio/Video Stream Output

  • Create/Stop Digital Human Audio/Video Stream: Create or stop Digital Human audio/video stream tasks as your business scenario requires.
  • Custom Maximum Stream Duration: Set the maximum duration of a Digital Human video stream task. The task ends automatically when it reaches this duration. The value can be at most 24 hours.
  • Multi-Modal Digital Human Drive: Drive the Digital Human with text, audio files, RTC streams, or WebSocket audio streams.
  • Custom Digital Human Actions: Drive the Digital Human to perform directional actions. The directional actions must first be generated during avatar customization.
  • Interrupt Digital Human Drive: Interrupt the Digital Human's current drive task to start a new one.
  • Get Audio/Video Stream Task Drive Status: Get the drive task status of a Digital Human video stream, including historical drive records. Statuses:
    • 1: Queued.
    • 2: Driving.
    • 3: Drive failed.
    • 4: Drive ended.
    • 5: Drive interrupted.
  • Get Audio/Video Stream Task Status: Get the status of an audio/video stream task; completed tasks can also be queried. Statuses:
    • 1: Video stream task initializing.
    • 2: Video stream task initialization failed.
    • 3: Streaming.
    • 4: Stopping stream.
    • 5: Stream stopped.
  • Query All Running Digital Human Audio/Video Tasks: Get the list of all running Digital Human audio/video tasks and their streaming status. Statuses:
    • 1: Video stream task initializing.
    • 3: Streaming.
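
The numeric status codes listed for the drive and stream task queries are easier to handle as named constants. The following Python sketch assumes the APIs return these statuses as plain integers; the enum and member names, the helper functions, and the `fetch_status` callable (standing in for whatever call you use to query the drive status) are hypothetical and not part of any ZEGOCLOUD SDK. It also assumes the 24-hour maximum stream duration is expressed in seconds; check the API reference for the actual parameter unit.

```python
import time
from enum import IntEnum
from typing import Callable


class DriveStatus(IntEnum):
    """Drive task statuses (codes from this page; member names are hypothetical)."""
    QUEUED = 1
    DRIVING = 2
    FAILED = 3
    ENDED = 4
    INTERRUPTED = 5


class StreamTaskStatus(IntEnum):
    """Audio/video stream task statuses (codes from this page; names are hypothetical)."""
    INITIALIZING = 1
    INIT_FAILED = 2
    STREAMING = 3
    STOPPING = 4
    STOPPED = 5


# Statuses reported for tasks returned by "Query All Running Digital Human Audio/Video Tasks".
RUNNING_TASK_STATUSES = frozenset({StreamTaskStatus.INITIALIZING, StreamTaskStatus.STREAMING})

# Assumption: once a drive task fails, ends, or is interrupted, its status no longer changes.
TERMINAL_DRIVE_STATUSES = frozenset(
    {DriveStatus.FAILED, DriveStatus.ENDED, DriveStatus.INTERRUPTED}
)

# Documented upper limit of 24 hours; expressing it in seconds is an assumption.
MAX_STREAM_DURATION_SECONDS = 24 * 60 * 60


def is_task_running(code: int) -> bool:
    """Return True if a raw stream task status code denotes a running task."""
    return StreamTaskStatus(code) in RUNNING_TASK_STATUSES


def is_drive_finished(code: int) -> bool:
    """Return True if a raw drive status code is terminal (ValueError for unknown codes)."""
    return DriveStatus(code) in TERMINAL_DRIVE_STATUSES


def clamp_max_duration(seconds: int) -> int:
    """Cap a requested maximum stream duration at the documented 24-hour limit."""
    if seconds <= 0:
        raise ValueError("maximum duration must be positive")
    return min(seconds, MAX_STREAM_DURATION_SECONDS)


def wait_for_drive(
    fetch_status: Callable[[], int],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> DriveStatus:
    """Poll a caller-supplied (hypothetical) status fetcher until the drive task is terminal."""
    deadline = time.monotonic() + timeout
    while True:
        status = DriveStatus(fetch_status())
        if status in TERMINAL_DRIVE_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"drive task still {status.name} after {timeout} s")
        sleep(poll_interval)
```

For example, with `codes = iter([1, 2, 4])`, calling `wait_for_drive(lambda: next(codes))` polls twice and returns `DriveStatus.ENDED`. Passing the sleep function as a parameter keeps the polling loop testable without real delays.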
