Overview
Product Introduction
ZEGOCLOUD Digital Human API is built on ZEGOCLOUD's self-developed AI engine, supporting quick server-side integration.
Developers can customize and generate natural, vivid Digital Human avatars and drive them through real-time inference to output a live stream for interaction, broadcasting, live streaming, and more. The API applies to diverse business scenarios such as AI companions, intelligent customer service, live commerce, and AI teaching.
Product Advantages
Lifelike Digital Human Avatars
Based on ZEGOCLOUD's self-developed Digital Human avatar generation engine, developers can implement two types of avatar customization services: Image Digital Human and Real-person Digital Human.
- Image Digital Human (Recommended): AI training brings a single image to "life". Supports various avatar types including real people, cartoons, and virtual characters. The generated avatar speaks clearly, shows natural expressions, and makes some natural movements.
- Real-person Digital Human: Capture a video of a real person and, through AI training, generate a Digital Human whose expressions and movements rival those of a real person. Supports custom backgrounds, up to 2K ultra-HD quality, and custom action drive.
Multiple Drive Modes, High-Quality Real-Time Stream Output
- Drive the Digital Human through text, audio, or real-time audio/video streams and output a real-time audio/video stream with latency as low as under 200 ms, meeting the needs of ultra-low-latency interactive scenarios such as live streaming and interactive conversations.
- Drive directional actions through keywords to meet custom behavior needs, making the Digital Human more lively and natural.
- Image Digital Human drive supports body movements, so a photo can not only "speak" but also "move", creating a lifelike avatar.
- Cost-effective: low pricing and support for high concurrency suit high-volume scenarios such as interactive conversations and interactive teaching, helping businesses reduce costs.
Using the real-time audio/video stream output feature requires integration with the Video Call capability.
Flexible Integration, Efficient Setup
- Efficient integration: Standardized APIs; as few as two APIs are enough to build Digital Human capability
- Diverse combinations: Atomic capabilities can be combined freely to support business customization across different scenarios
- Multi-platform support: Compatible with Web, App, Mini Programs, and other platforms
- Multiple deployment options: Supports public cloud, private deployment, and other deployment forms
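As a rough illustration of the "as few as two APIs" claim, the sketch below models one possible flow: create a stream task, then drive it with text. It is an in-memory stand-in that makes no network calls. The class, method names, parameters, and status labels are all hypothetical, and the choice of which two calls make up the minimum is an assumption; the real endpoints and authentication are defined in the ZEGOCLOUD API reference.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StreamTask:
    """Hypothetical record of one Digital Human audio/video stream task."""
    task_id: str
    digital_human_id: str
    status: str = "running"  # assumed label, not an official status value
    drives: List[str] = field(default_factory=list)


class FakeDigitalHumanService:
    """In-memory stand-in for the server-side API (hypothetical names)."""

    def __init__(self) -> None:
        self._tasks: Dict[str, StreamTask] = {}
        self._ids = itertools.count(1)

    def create_stream_task(self, digital_human_id: str) -> StreamTask:
        # Assumed call 1: start a stream task for the chosen avatar.
        task = StreamTask(f"task-{next(self._ids)}", digital_human_id)
        self._tasks[task.task_id] = task
        return task

    def drive_by_text(self, task_id: str, text: str) -> None:
        # Assumed call 2: send text for the avatar to speak on the stream.
        task = self._tasks[task_id]
        if task.status != "running":
            raise RuntimeError(f"task {task_id} is not running")
        task.drives.append(text)

    def stop_stream_task(self, task_id: str) -> None:
        # Optional cleanup: mark the task as stopped.
        self._tasks[task_id].status = "stopped"


service = FakeDigitalHumanService()
task = service.create_stream_task("demo-avatar")
service.drive_by_text(task.task_id, "Welcome to the live room.")
service.stop_stream_task(task.task_id)
```

In a real integration each method would wrap an authenticated HTTP request, and the stream itself would be played through the Video Call capability.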
Product Features
| Feature Module | Feature Name | Feature Description |
|---|---|---|
| Customize Digital Human Avatar | Image Digital Human | Generate a Digital Human avatar quickly with just a single image. For image specifications, refer to the Image Digital Human Material Specification. |
| Customize Digital Human Avatar | Real-person Digital Human Avatar | Record a real-person video to customize a real-person Digital Human avatar. For recording guidelines, refer to the Video Digital Human Shooting Guide. |
| Customize Digital Human Avatar | Custom Action Library | When customizing an avatar, you can generate actions with special meanings, such as "finger heart", "wave hello", "show numbers", etc., which can drive the Digital Human to perform custom actions in specific scenarios. |
| Digital Human Management | Query Digital Human Avatar Information | Query available public/customized Digital Human avatars and timbre lists. |
| Real-Time Audio/Video Stream Output | Create/Stop Digital Human Audio/Video Stream | Supports creating/stopping Digital Human audio/video stream tasks based on business scenario needs. |
| Real-Time Audio/Video Stream Output | Custom Maximum Stream Duration | Set the maximum duration for a Digital Human video stream task. The task automatically ends when the duration reaches this value, with a maximum of 24 hours. |
| Real-Time Audio/Video Stream Output | Multi-Modal Digital Human Drive | Supports driving the Digital Human through text, audio files, RTC streaming, and WebSocket audio streaming. |
| Real-Time Audio/Video Stream Output | Custom Digital Human Actions | Supports driving the Digital Human to perform directional actions (requires generating directional actions during avatar customization first). |
| Real-Time Audio/Video Stream Output | Interrupt Digital Human Drive | Supports interrupting a currently driving Digital Human to start a new drive task. |
| Real-Time Audio/Video Stream Output | Get Audio/Video Stream Task Drive Status | Get the drive task status of the Digital Human video stream, including historical drive records. |
| Real-Time Audio/Video Stream Output | Get Audio/Video Stream Task Status | Get the task status of the audio/video stream. Supports querying completed video stream task statuses. |
| Real-Time Audio/Video Stream Output | Query All Running Digital Human Audio/Video Tasks | |
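
Two constraints from the feature table lend themselves to a small client-side check: the 24-hour ceiling on stream duration and the four listed drive inputs. The sketch below is illustrative only. The enum values, the function name, and the use of seconds as the duration unit are assumptions, and the real service may handle an out-of-range value differently.

```python
from enum import Enum

# 24-hour ceiling from the feature table, expressed in seconds (assumed unit).
MAX_STREAM_SECONDS = 24 * 60 * 60


class DriveMode(Enum):
    """The four drive inputs listed in the table; string values are hypothetical."""
    TEXT = "text"
    AUDIO_FILE = "audio_file"
    RTC_STREAM = "rtc_stream"
    WEBSOCKET_AUDIO = "websocket_audio"


def validate_max_duration(seconds: int) -> int:
    """Return seconds unchanged if it lies in (0, 24 h]; raise ValueError otherwise."""
    if not 0 < seconds <= MAX_STREAM_SECONDS:
        raise ValueError(
            f"max stream duration must be in (0, {MAX_STREAM_SECONDS}] seconds"
        )
    return seconds
```

Rejecting an out-of-range value before sending a request is a design choice of this sketch; clamping to the ceiling would be an equally plausible client-side policy.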
