Overview
Product Introduction
The ZEGOCLOUD Digital Human API is built on ZEGOCLOUD's self-developed AI engine and is integrated through server APIs. Developers can customize and generate vivid Digital Human avatars, then output video files or real-time audio/video streams based on those avatars. Typical scenarios include AI companions, Digital Human customer service, live commerce, and AI teaching.
Product Advantages
Lifelike Digital Human Avatars
Based on ZEGOCLOUD's self-developed Digital Human avatar generation engine, developers can implement two types of avatar customization services: real-person Digital Human and image Digital Human.
- Real-person Digital Human: Capture a video of a real person and, through AI training, generate a Digital Human whose expressions and movements rival those of a real person. Supports custom backgrounds, up to 2K ultra-HD quality, and custom action drive. The resulting avatar looks realistic and moves naturally.
- Image Digital Human: With just a single image, AI training can bring the image to "life". Supports various avatar types, including real people, cartoons, and virtual characters. The generated avatar has clear speech, natural expressions, and some natural movements, making it lively and vivid.
Multi-Modal Drive, High-Quality Content Generation
ZEGOCLOUD's self-developed Digital Human content generation engine supports asynchronous short video file generation and real-time audio/video stream output.
- Asynchronous short video file generation: Call the API to configure the background, avatar, text, and other settings for Digital Human short video production. Supports multiple output video formats and resolutions up to 2K ultra-HD.
- Real-time audio/video stream output:
  - Drive the Digital Human through text, audio, or real-time audio/video streams and output real-time audio/video streams with a minimum latency below 200 ms, meeting the needs of ultra-low-latency interactive scenarios such as live streaming and interactive conversations.
  - Drive directional actions through keywords to meet custom behavior needs, making the Digital Human more lively and natural.
  - Image Digital Human drive supports body movements, so that photos not only "speak" but also "move", creating a lifelike avatar.
  - Cost-effectiveness: low pricing and support for high concurrency suit high-volume scenarios such as interactive conversations and interactive teaching, helping businesses reduce costs.
Using the real-time audio/video stream output feature requires integration with the Video Call capability.
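The idea of keyword-driven directional actions can be illustrated with a small client-side sketch. This is not the service's implementation or API: the keyword table, the action names (`wave_hello`, `finger_heart`), and the `find_action_cues` helper are all hypothetical, and the matching is a naive case-insensitive substring search.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical keyword-to-action table. Real action names would come from
# the custom action library generated during avatar customization.
KEYWORD_ACTIONS = {
    "hello": "wave_hello",
    "love": "finger_heart",
}


@dataclass(frozen=True)
class ActionCue:
    offset: int  # character offset of the keyword in the text
    action: str  # hypothetical action name to trigger at that point


def find_action_cues(
    text: str, keyword_actions: dict[str, str] = KEYWORD_ACTIONS
) -> list[ActionCue]:
    """Return action cues for every keyword occurrence, ordered by position."""
    lowered = text.lower()
    cues: list[ActionCue] = []
    for keyword, action in keyword_actions.items():
        needle = keyword.lower()
        start = 0
        while True:
            idx = lowered.find(needle, start)
            if idx == -1:
                break
            cues.append(ActionCue(offset=idx, action=action))
            start = idx + len(needle)
    return sorted(cues, key=lambda cue: cue.offset)


if __name__ == "__main__":
    for cue in find_action_cues("Hello everyone, love you! hello again"):
        print(cue.offset, cue.action)
```

Because the search is substring-based, a keyword such as "love" would also match inside "glove"; a production integration would likely need word-boundary handling and would rely on whatever trigger mechanism the actual API defines.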
Flexible Integration, Efficient Setup
- Efficient integration: Standardized APIs; as few as two APIs are needed to build Digital Human capabilities
- Diverse combinations: Atomic capabilities can be combined freely to support business customization and adapt to different scenarios
- Multi-platform support: Compatible with Web, App, Mini Programs, and other platforms
- Multiple deployment options: Supports public cloud, private deployment, and other deployment forms
Product Features
| Feature Module | Feature Name | Feature Description |
|---|---|---|
| Customize Digital Human Avatar | Real-person Digital Human Avatar | Record a real-person video to customize a real-person Digital Human avatar. For recording guidelines, refer to the Video Digital Human Shooting Guide. |
| | Image Digital Human | Upload a single image through the image-to-Digital Human API to quickly generate a Digital Human avatar. Note: This feature is in beta. Please contact ZEGOCLOUD sales. |
| | Custom Action Library | When customizing an avatar, you can generate actions with special meanings, such as "finger heart", "wave hello", and "show numbers", which can drive the Digital Human to perform custom actions in specific scenarios. |
| | Digital Human Asset Query | Query available public/customized Digital Human avatars and timbre lists. |
| Asynchronous Short Video File Generation | Short Video File Generation | Note: Feature upgrade in progress. Please contact ZEGOCLOUD sales. |
| | Short Video Configuration | |
| Real-Time Audio/Video Stream Output | Create/Stop Digital Human Audio/Video Stream | Supports creating/stopping Digital Human audio/video stream tasks based on business scenario needs. |
| | Custom Maximum Stream Duration | Set the maximum duration for a Digital Human video stream task. The task automatically ends when the duration reaches this value, with a maximum of 24 hours. |
| | Multi-Modal Digital Human Drive | Supports driving the Digital Human through text, audio files, RTC streaming, and WebSocket audio streaming. |
| | Custom Digital Human Actions | Supports driving the Digital Human to perform directional actions (requires generating directional actions during avatar customization first). |
| | Interrupt Digital Human Drive | Supports interrupting a currently driving Digital Human to start a new drive task. |
| | Get Audio/Video Stream Task Drive Status | Get the drive task status of the Digital Human video stream, including historical drive records. Status includes: |
| | Get Audio/Video Stream Task Status | Get the task status of the audio/video stream. Supports querying completed video stream task statuses. Status includes: |
| | Query All Running Digital Human Audio/Video Tasks | |
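
The real-time stream features in the table (a capped maximum duration, four drive modes, interrupting a drive, and stopping a task) can be pictured as a small local state model. The sketch below is only an illustration under stated assumptions: `StreamTask`, `DriveMode`, and their methods are hypothetical names, no real endpoint or request field is shown, and the rule that a new drive is rejected until the current one is interrupted is an assumption of this sketch rather than documented service behavior.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum

# The feature table gives 24 hours as the upper bound for a stream task.
MAX_STREAM_SECONDS = 24 * 60 * 60


class DriveMode(Enum):
    """Drive inputs listed in the feature table (enum names are hypothetical)."""

    TEXT = "text"
    AUDIO_FILE = "audio_file"
    RTC_STREAM = "rtc_stream"
    WEBSOCKET_AUDIO = "websocket_audio"


@dataclass
class StreamTask:
    """Local model of one Digital Human stream task (not a real API client)."""

    max_duration_s: int
    running: bool = True
    current_drive: DriveMode | None = None
    drive_history: list[DriveMode] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 < self.max_duration_s <= MAX_STREAM_SECONDS:
            raise ValueError("max_duration_s must be between 1 second and 24 hours")

    def drive(self, mode: DriveMode) -> None:
        if not self.running:
            raise RuntimeError("stream task has been stopped")
        # Assumption of this sketch: one drive at a time per task.
        if self.current_drive is not None:
            raise RuntimeError("a drive is in progress; interrupt it first")
        self.current_drive = mode
        self.drive_history.append(mode)

    def interrupt(self) -> None:
        """End the current drive so that a new one can start."""
        self.current_drive = None

    def stop(self) -> None:
        self.running = False
        self.current_drive = None


if __name__ == "__main__":
    task = StreamTask(max_duration_s=2 * 60 * 60)
    task.drive(DriveMode.TEXT)
    task.interrupt()
    task.drive(DriveMode.AUDIO_FILE)
    task.stop()
    print([mode.value for mode in task.drive_history])
```

Validating the duration before sending a request keeps an out-of-range value from reaching the server, and recording each drive locally mirrors the idea of historical drive records mentioned in the table.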
