Overview
The Digital Human PaaS has been fully upgraded and rebranded as Digital Human AI, offering developers a cost-effective, high-quality digital human service:
- A newly developed in-house inference solution delivers ultra-low prices, high concurrency support, and exceptional cost performance.
- The image-based digital human solution has been enhanced with more natural movements, making avatars more vivid and expressive.
- Significantly improved inference performance: inference latency is under 500 ms, and when paired with an AI agent, end-to-end interactive latency for AI conversations stays under 1.5 seconds.
- Optimized server-side APIs provide more flexible and modular output capabilities to support highly customizable business integration.
Product Overview
ZEGOCLOUD Digital Human AI, based on ZEGOCLOUD's in-house AI engine, allows developers to quickly customize and generate lifelike digital human avatars through server-side APIs. These avatars can be used to create engaging video content or real-time audio-video streams, making them suitable for AI companionship, digital human customer service, live e-commerce, AI teaching, and other scenarios.
Advantages
Digital Human Avatars that Match Real People
Based on ZEGOCLOUD's in-house digital human avatar generation engine, developers can customize two types of digital human avatars:
- Real-life digital human: Record a video of a real person; after AI training, the digital human's facial expressions, movements, and gestures closely resemble the real person's. Backgrounds are customizable, 2K ultra-high-definition quality is supported, and action driving can be customized, producing an avatar that looks realistic and natural.
- Image-based digital human: Upload a single image and, after AI training, the image comes to life. Real-life, cartoon, and virtual human images are all supported. The generated avatars speak clearly, have natural expressions, and exhibit a degree of natural movement, making them lively and expressive.
Multi-modal Driving for High-quality Content Generation
ZEGOCLOUD's in-house digital human content generation engine supports asynchronous video file generation and real-time audio-video stream output.
- Asynchronous video file generation: Call the server-side API to configure the background, avatar, script text, and other settings for digital human video production. Videos can be output in multiple formats and at 2K ultra-high-definition quality.
- Real-time audio-video stream output:
- Drive digital humans to generate content through text, audio, or real-time audio-video streams, with a minimum output latency under 500 ms, meeting ultra-low-latency real-time interaction scenarios such as live streaming and interactive conversation.
- Enable keyword-driven actions for digital humans, satisfying custom behavior requirements and making digital humans more vivid and natural.
- Photo-based digital human driving supports body movements, so a photo can not only "speak" but also "move", creating a lifelike effect.
- Ultimate cost-effectiveness: ultra-low pricing with high-concurrency support suits high-volume scenarios such as interactive conversations and interactive teaching, helping businesses cut costs substantially.
To use the real-time audio-video stream output feature, you must also integrate ZEGOCLOUD's Express Video capabilities.
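To make the multi-modal driving model concrete, here is a minimal sketch of how a server might assemble driving requests for a running stream task. The gateway URL, action names, and parameter names below are hypothetical placeholders for illustration only; the real endpoint, fields, and authentication scheme come from the official server API reference.

```python
import json

# Hypothetical gateway URL -- not a real endpoint.
BASE_URL = "https://example-digital-human-api.invalid"

def build_request(action: str, body: dict) -> dict:
    """Assemble one server-API call: the action goes in the query
    string, the task configuration in a JSON body."""
    return {"url": f"{BASE_URL}/?Action={action}", "body": json.dumps(body)}

# Drive a running stream task by text...
drive_by_text = build_request("DriveByText", {
    "TaskId": "task-001",
    "Text": "Hello! How can I help you today?",
})

# ...or by a pre-recorded audio file...
drive_by_audio = build_request("DriveByAudio", {
    "TaskId": "task-001",
    "AudioUrl": "https://example.com/reply.wav",
})

# ...or interrupt the current driving so a new task can start.
interrupt = build_request("InterruptDrive", {"TaskId": "task-001"})
```

RTC-stream and WebSocket audio driving follow the same pattern, differing only in how the audio input reaches the service.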
Flexible Integration, Efficient Implementation
- Efficient Integration: The standard APIs require as few as two interfaces to build complete digital human capabilities.
- Diverse Combinations: Atomic capabilities can be freely combined to meet flexible business customization across different scenarios.
- Full Platform Support: Compatible with Web, App, and other platforms.
- Multiple Deployment Options: Supports public cloud, private deployment, and other deployment forms.
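The "two interfaces" claim above can be sketched as a minimal two-call flow: create a real-time stream task, then drive it with text. All endpoint, action, and field names here are illustrative assumptions rather than the real API surface; consult the official server API reference for the actual names and signing scheme.

```python
import json

# Hypothetical gateway URL -- not a real endpoint.
BASE_URL = "https://example-digital-human-api.invalid"

def build_request(action: str, body: dict) -> dict:
    """One server-API call: the action in the query string, the
    configuration as a JSON body."""
    return {"url": f"{BASE_URL}/?Action={action}", "body": json.dumps(body)}

# Call 1: create a digital human audio-video stream task in an RTC room.
create_task = build_request("CreateStreamTask", {
    "DigitalHumanId": "avatar-001",   # avatar from the asset query API
    "RoomId": "room-001",             # RTC room the stream publishes to
    "StreamId": "stream-001",
    "MaxDurationSeconds": 3600,       # auto-stop; capped at 24 hours
})

# Call 2: drive the running task with text, using the task ID returned
# by the create call (hard-coded here for the sketch).
drive_task = build_request("DriveByText", {
    "TaskId": "task-id-from-create-response",
    "Text": "Welcome! I'm your digital assistant.",
})
```

Everything else in the feature table below (custom durations, video parameters, actions, interruption) layers onto this same create-then-drive skeleton as additional atomic calls.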
Features
Module | Feature | Description |
---|---|---|
Custom Digital Human | Real Person-based Digital Human | Record a video of a real person to customize a real-person digital human avatar. For recording guidelines, refer to the Image and Audio Collection Guide. |
 | Image-based Digital Human | Upload a single image to the Image-to-Digital Human AI to quickly generate a digital human. Note: This feature is in beta testing. Please contact ZEGOCLOUD sales. |
 | Digital Human Asset Query | Query the lists of available public/custom digital human avatars and timbres. |
Asynchronous Short Video File Generation | Short Video File Generation | Note: This feature is being upgraded. Please contact ZEGOCLOUD sales. |
 | Short Video Configuration | |
Real-time Audio-Video Stream Output | Create/Stop Digital Human Audio-Video Stream | Create or stop digital human audio-video stream tasks according to business scenario requirements. |
 | Custom Maximum Stream Duration | Set the maximum duration of a digital human video stream task. The task ends automatically when this duration is reached; the maximum is 24 hours. |
 | Custom Digital Human | |
 | Custom Video Stream Parameters | Set the room ID and stream ID. |
 | Custom Video Parameters | Set the encoding method, resolution, and bitrate of audio-video streams. |
 | Multi-modal Digital Human Driving | Drive digital humans through text, audio files, RTC streaming, or WebSocket audio streaming. |
 | Custom Digital Human Actions | Call action names to make digital humans perform specified actions. |
 | Interrupt Digital Human Driving Behavior | Interrupt a digital human that is currently being driven so that a new driving task can start. |
 | Get Audio-Video Stream Task Driving Status | Get the driving task status of a digital human video stream, including historical driving records. |
 | Get Audio-Video Stream Task Status | Get the task status of audio-video streams, including queries for completed video stream tasks. |
 | Query All Running Digital Human Audio-Video Tasks | Query all digital human audio-video stream tasks that are currently running. |