Overview

2026-03-20

Product Introduction

The ZEGOCLOUD Digital Human API is built on ZEGOCLOUD's self-developed AI engine and is integrated through server APIs. Developers can customize and generate lifelike Digital Human avatars, then output video files or real-time audio/video streams from those avatars. Typical scenarios include AI companions, Digital Human customer service, live commerce, and AI teaching.

Product Advantages

Lifelike Digital Human Avatars

Using ZEGOCLOUD's self-developed Digital Human avatar generation engine, developers can create two types of custom avatar: real-person Digital Human and image Digital Human.

  • Real-person Digital Human: Capture a video of a real person and, through AI training, generate a Digital Human whose facial expressions and movements closely resemble a real person's. Supports custom backgrounds, up to 2K ultra-HD quality, and custom actions. The resulting avatar looks realistic and moves naturally.
  • Image Digital Human: From a single image, AI training brings the image to "life". Supports various avatar types, including real people, cartoons, and virtual characters. The generated avatar speaks clearly, has natural expressions, and shows some natural movement.

Multi-Modal Drive, High-Quality Content Generation

ZEGOCLOUD's self-developed Digital Human content generation engine supports asynchronous short video file generation and real-time audio/video stream output.

  • Asynchronous short video file generation: Call the API to customize the background, avatar, text, and other configurations for Digital Human short video production. Supports outputting different video formats and up to 2K resolution ultra-HD short video content.
  • Real-time audio/video stream output:
    • Drive the Digital Human through text, audio, or real-time audio/video streams and output a real-time audio/video stream with latency as low as under 200 ms. This suits ultra-low-latency interactive scenarios such as live streaming and interactive conversations.
    • Trigger directional actions through keywords to support custom behavior and make the Digital Human more lively and natural.
    • Image Digital Human drive supports body movements, so a photo can not only "speak" but also "move".
    • Cost-effective: low pricing and support for high concurrency suit high-volume scenarios such as interactive conversations and interactive teaching, helping businesses reduce costs.
Note

Using the real-time audio/video stream output feature requires integration with the Video Call capability.
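
The keyword-driven actions mentioned above can be pictured with a small client-side sketch. This is only an illustration: this page does not describe how the service matches keywords, and the keyword table, the action names, and the `actions_for_text` helper below are all hypothetical.

```python
import re

# Hypothetical keyword -> action-name table. The action names are made up
# for illustration; real action names come from avatar customization.
ACTION_KEYWORDS = {
    "hello": "wave_hello",
    "love you": "finger_heart",
}


def actions_for_text(text, keyword_actions=ACTION_KEYWORDS):
    """Return (character_offset, action_name) pairs, ordered by position.

    Matching is a case-insensitive substring search, which is a deliberate
    simplification (it would also match "hello" inside "Othello").
    """
    hits = []
    for keyword, action in keyword_actions.items():
        for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
            hits.append((match.start(), action))
    return sorted(hits)


if __name__ == "__main__":
    print(actions_for_text("Hello everyone, love you all!"))
```

A real integration would rely on whatever keyword configuration the API actually exposes; the sketch only shows the idea of tying spoken text to pre-generated actions.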

Flexible Integration, Efficient Setup

  • Efficient integration: Standardized APIs; as few as two APIs are enough to build a Digital Human capability
  • Diverse combinations: Atomic capabilities can be combined freely to support business customization and different scenarios
  • Multi-platform support: Compatible with Web, App, Mini Programs, and other platforms
  • Multiple deployment options: Supports public cloud, private deployment, and other deployment forms

Product Features

| Feature Module | Feature Name | Feature Description |
| --- | --- | --- |
| Customize Digital Human Avatar | Real-person Digital Human Avatar | Record a real-person video to customize a real-person Digital Human avatar. For recording guidelines, refer to the Video Digital Human Shooting Guide. |
| Customize Digital Human Avatar | Image Digital Human | Upload a single image through the image-to-Digital Human API to quickly generate a Digital Human avatar. Note: This feature is in beta. Please contact ZEGOCLOUD sales. |
| Customize Digital Human Avatar | Custom Action Library | When customizing an avatar, you can generate actions with special meanings, such as "finger heart", "wave hello", and "show numbers", which can drive the Digital Human to perform custom actions in specific scenarios. |
| Customize Digital Human Avatar | Digital Human Asset Query | Query available public/customized Digital Human avatars and timbre lists. |
| Asynchronous Short Video File Generation | Short Video File Generation | Supported output formats: MP4. Supported video quality: 1080P, 2K. Note: Feature upgrade in progress. Please contact ZEGOCLOUD sales. |
| Asynchronous Short Video File Generation | Short Video Configuration | Supports replacing the video background (requires a green screen background for the avatar) and adding decorative images. Supports custom layout, custom short video text, short video speech rate, and other settings. |
| Real-Time Audio/Video Stream Output | Create/Stop Digital Human Audio/Video Stream | Supports creating/stopping Digital Human audio/video stream tasks based on business scenario needs. |
| Real-Time Audio/Video Stream Output | Custom Maximum Stream Duration | Set the maximum duration for a Digital Human video stream task. The task automatically ends when the duration reaches this value, with a maximum of 24H. |
| Real-Time Audio/Video Stream Output | Multi-Modal Digital Human Drive | Supports driving the Digital Human through text, audio files, RTC streaming, and WebSocket audio streaming. |
| Real-Time Audio/Video Stream Output | Custom Digital Human Actions | Supports driving the Digital Human to perform directional actions (requires generating directional actions during avatar customization first). |
| Real-Time Audio/Video Stream Output | Interrupt Digital Human Drive | Supports interrupting a currently driving Digital Human to start a new drive task. |
| Real-Time Audio/Video Stream Output | Get Audio/Video Stream Task Drive Status | Get the drive task status of the Digital Human video stream, including historical drive records. Status values: 1 = Queued; 2 = Driving; 3 = Drive failed; 4 = Drive ended; 5 = Drive interrupted. |
| Real-Time Audio/Video Stream Output | Get Audio/Video Stream Task Status | Get the task status of the audio/video stream. Supports querying completed video stream task statuses. Status values: 1 = Video stream task initializing; 2 = Video stream task initialization failed; 3 = Streaming; 4 = Stopping stream; 5 = Stream stopped. |
| Real-Time Audio/Video Stream Output | Query All Running Digital Human Audio/Video Tasks | Get the list of all running Digital Human audio/video tasks and their streaming status. Status values: 1 = Video stream task initializing; 3 = Streaming. |
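
The numeric status codes and the 24H duration limit listed above can be mirrored on the client side. The following is a minimal sketch, not part of the ZEGOCLOUD API: the enum and function names are hypothetical, the duration unit (seconds) is an assumption, and treating drive statuses 3, 4, and 5 as "finished" is an interpretation of the table rather than documented behavior.

```python
from enum import IntEnum


class DriveStatus(IntEnum):
    """Drive task status codes as listed in the feature table."""
    QUEUED = 1
    DRIVING = 2
    FAILED = 3
    ENDED = 4
    INTERRUPTED = 5


class StreamTaskStatus(IntEnum):
    """Audio/video stream task status codes as listed in the feature table."""
    INITIALIZING = 1
    INIT_FAILED = 2
    STREAMING = 3
    STOPPING = 4
    STOPPED = 5


# Assumption: the duration is expressed in seconds. The table only states
# that the maximum is 24H.
MAX_STREAM_DURATION_SECONDS = 24 * 60 * 60


def is_drive_finished(code: int) -> bool:
    """True if the drive task will not progress further (interpretation)."""
    return DriveStatus(code) in {
        DriveStatus.FAILED,
        DriveStatus.ENDED,
        DriveStatus.INTERRUPTED,
    }


def is_task_running(code: int) -> bool:
    """True for the statuses reported by the 'all running tasks' query."""
    return StreamTaskStatus(code) in {
        StreamTaskStatus.INITIALIZING,
        StreamTaskStatus.STREAMING,
    }


def validate_max_duration(seconds: int) -> int:
    """Reject durations outside the range (0, 24H]."""
    if not 0 < seconds <= MAX_STREAM_DURATION_SECONDS:
        raise ValueError(
            f"duration must be between 1 and {MAX_STREAM_DURATION_SECONDS} seconds"
        )
    return seconds


if __name__ == "__main__":
    print(DriveStatus(5).name, is_drive_finished(5))
    print(StreamTaskStatus(3).name, is_task_running(3))
```

Wrapping the raw integers in `IntEnum` keeps comparisons with API responses simple while making an unknown code fail loudly with a `ValueError`.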
