Overview


Note

The Digital Human PaaS has been fully upgraded and rebranded as Digital Human AI, offering developers a cost-effective, high-quality digital human service:

  • A newly developed in-house inference solution delivers ultra-low prices, high concurrency support, and exceptional cost performance.
  • The image-based digital human solution has been enhanced with more natural movements, making avatars more vivid and expressive.
  • Significant improvements in inference performance: inference latency is under 500 ms. When paired with an AI agent, AI conversations run with interactive latency under 1.5 seconds.
  • Optimized server-side APIs provide more flexible and modular output capabilities to support highly customizable business integration.

Product Overview

ZEGOCLOUD Digital Human AI, based on ZEGOCLOUD's in-house AI engine, allows developers to quickly customize and generate lifelike digital human avatars through server-side APIs. These avatars can be used to create engaging video content or real-time audio-video streams, making them suitable for AI companionship, digital human customer service, live e-commerce, AI teaching, and other scenarios.

Advantages

Digital Human Avatars that Match Real People

Based on ZEGOCLOUD's in-house digital human avatar generation engine, developers can customize two types of digital human avatars:

  • Real-life digital human: Record a video of a real person; after AI training, the digital human's facial expressions, movements, and gestures closely resemble those of that person. Customizable backgrounds, 2K ultra-high-definition quality, and customizable action driving are supported, and the resulting avatar looks realistic and natural.
  • Image-based digital human: Upload a single image and, after AI training, the image comes to life. The source image can be a real person, a cartoon, or a virtual character. The generated avatar speaks clearly, has natural expressions, and shows a degree of natural movement, making it lively and expressive.

Multi-modal Driving for High-quality Content Generation

ZEGOCLOUD's in-house digital human content generation engine supports asynchronous video file generation and real-time audio-video stream output.

  • Asynchronous video file generation: Call the server API to configure the background, avatar, script text, and other settings needed for digital human video production (a hypothetical request shape is sketched after the note below). Videos can be output in different formats and at up to 2K ultra-high-definition quality.
  • Real-time audio-video stream output:
    • Drive digital humans through text, audio, and real-time audio-video streams, with output stream latency as low as under 500 ms, meeting the needs of ultra-low-latency interaction scenarios such as live streaming and interactive conversations.
    • Enable keyword-driven actions for digital humans, satisfying custom behavior requirements and making digital humans more vivid and natural.
    • Image-based digital human driving supports body movements, so a photo can not only "speak" but also "move", making it appear lifelike.
    • Ultimate cost-effectiveness: ultra-low pricing with high concurrency support, suitable for high-volume interactive conversations, interactive teaching, and other scenarios, helping businesses achieve maximum cost reduction.
Note

To use the real-time audio-video stream output feature, you need to integrate with Express Video capabilities.
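
As an illustration of the asynchronous generation options above, the sketch below shows roughly what a video-generation request body could contain. Every field name here is a hypothetical placeholder, not the actual Digital Human AI server API schema; take the real parameter names from the server API reference.

```python
# Hypothetical request body for asynchronous digital human video generation.
# All field names below are placeholders for illustration only.
video_request = {
    "DigitalHumanId": "your_digital_human_id",          # which avatar to render
    "TimbreId": "your_timbre_id",                        # voice used to speak the text
    "Text": "Welcome to our live stream!",               # script the avatar reads
    "BackgroundImageUrl": "https://example.com/bg.png",  # replaces the green-screen background
    "VideoFormat": "mp4",                                 # or "webm" for an alpha channel
    "Resolution": "2K",                                    # up to 2K ultra-high definition
}
```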

Flexible Integration, Efficient Implementation

  • Efficient Integration: As few as two standard API calls are enough to build out digital human capability (see the sketch after this list)
  • Diverse Combinations: Atomic capabilities can be freely combined for flexible business customization across different scenarios
  • Full Platform Support: Compatible with Web, App, and other platforms
  • Multiple Deployment Options: Supports public cloud, private deployment, and other deployment forms
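
As a concrete illustration of the two-call flow, the sketch below creates a real-time digital human stream task and then drives it with text. It assumes the signing scheme used by other ZEGOCLOUD server APIs; the host URL, Action names, and payload fields are hypothetical placeholders, so replace them with the values from the Digital Human AI server API reference.

```python
# Minimal sketch of the "two API calls" flow: create a real-time digital human
# stream task, then drive it with text. Host, Action names, and payload fields
# are hypothetical placeholders; take the real ones from the server API docs.
import hashlib
import time
import uuid

import requests

APP_ID = 1234567890                  # your ZEGOCLOUD AppId
SERVER_SECRET = "your_server_secret"
BASE_URL = "https://digitalhuman-api.example.com"  # placeholder host


def signed_params() -> dict:
    """Common query parameters, assuming ZEGOCLOUD's usual server API signing:
    Signature = md5(AppId + SignatureNonce + ServerSecret + Timestamp)."""
    nonce = uuid.uuid4().hex
    timestamp = int(time.time())
    raw = f"{APP_ID}{nonce}{SERVER_SECRET}{timestamp}"
    return {
        "AppId": APP_ID,
        "SignatureNonce": nonce,
        "Timestamp": timestamp,
        "SignatureVersion": "2.0",
        "Signature": hashlib.md5(raw.encode()).hexdigest(),
    }


def call(action: str, body: dict) -> dict:
    """POST one server API action and return the parsed JSON response."""
    params = {"Action": action, **signed_params()}
    resp = requests.post(BASE_URL, params=params, json=body, timeout=10)
    resp.raise_for_status()
    return resp.json()


# Call 1: create the real-time stream task (the digital human publishes an
# audio-video stream into an RTC room that Express Video clients can play).
task = call("CreateDigitalHumanStreamTask", {   # hypothetical Action name
    "DigitalHumanId": "your_digital_human_id",
    "RoomId": "room_1",
    "StreamId": "digital_human_stream_1",
})
task_id = task["Data"]["TaskId"]                # hypothetical response field

# Call 2: drive the digital human with text so it speaks in the live stream.
call("DriveByText", {                           # hypothetical Action name
    "TaskId": task_id,
    "Text": "Hello! I am your digital human assistant.",
})
```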

Features

Module: Custom Digital Human
  • Real Person-based Digital Human: Record a video of a real person to customize a real person digital human avatar. For recording guidelines, refer to the Image and Audio Collection Guide.
  • Image-based Digital Human: Upload a single image to the Image-to-Digital Human AI to quickly generate a digital human.
    Note: This feature is in beta test. Please contact ZEGOCLOUD sales.

Module: Digital Human Asset Query
  • Query available public/custom digital human avatars and timbre lists.

Module: Asynchronous Short Video File Generation
  • Short Video File Generation:
    • Supported output formats: MP4, WebM (supports Alpha transparency channel).
    • Supported video resolutions: 1080P, 2K.
    Note: This feature is being upgraded. Please contact ZEGOCLOUD sales.
  • Short Video Configuration:
    • Support for replacing video backgrounds (requires a green screen background); support for adding decorative images.
    • Support for custom layouts, custom short video text, video speed settings, etc.

Module: Real-time Audio-Video Stream Output
  • Create/Stop Digital Human Audio-Video Stream: Support for creating/stopping digital human audio-video stream tasks based on business scenario requirements.
  • Custom Maximum Stream Duration: Set the maximum duration for digital human video stream tasks. The task ends automatically when this duration is reached, up to a maximum of 24 hours.
  • Custom Digital Human:
    • Set digital human background images/background colors.
    • Set video content layout.
    • Digital human layout: set the coordinates, width, height, and layer order of each layer.
  • Custom Video Stream Parameters: Set the room ID and stream ID.
  • Custom Video Parameters: Set the encoding method, resolution, and bitrate for audio-video streams.
  • Multi-modal Digital Human Driving: Support for driving digital humans through text, audio files, RTC streaming, and WebSocket audio streaming.
  • Custom Digital Human Actions: Support for calling action names to drive digital humans to perform specified actions.
  • Interrupt Digital Human Driving Behavior: Support for interrupting a digital human that is currently being driven to enable a new driving task.
  • Get Audio-Video Stream Task Driving Status: Get the driving task status of digital human video streams, including historical driving records. Status includes:
    • 1: Queuing.
    • 2: Driving.
    • 3: Driving failed.
    • 4: Driving ended.
    • 5: Driving interrupted.
  • Get Audio-Video Stream Task Status: Get the task status of audio-video streams, supporting queries for completed video stream task status (see the polling sketch after this table). Status includes:
    • 1: Video stream task initializing.
    • 2: Video stream task initialization failed.
    • 3: Streaming.
    • 4: Stopping stream.
    • 5: Stream stopped.
  • Query All Running Digital Human Audio-Video Tasks: Get a list of all running digital human audio-video tasks and their streaming status. Status includes:
    • 1: Video stream task initializing.
    • 3: Streaming.
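
The stream task status codes above lend themselves to a small polling helper. The sketch below only encodes the codes listed in this table; fetch_status is a caller-supplied placeholder (for example, a wrapper around the status-query API, such as the signed call helper sketched earlier) and is an assumption, not part of the actual API.

```python
# Sketch of polling a digital human stream task until it reaches a terminal
# state, using the status codes listed above. `fetch_status` is a placeholder
# for whatever function queries the status API and returns the raw Status int.
import time
from enum import IntEnum
from typing import Callable


class StreamTaskStatus(IntEnum):
    INITIALIZING = 1   # 1: video stream task initializing
    INIT_FAILED = 2    # 2: video stream task initialization failed
    STREAMING = 3      # 3: streaming
    STOPPING = 4       # 4: stopping stream
    STOPPED = 5        # 5: stream stopped


TERMINAL_STATES = {StreamTaskStatus.INIT_FAILED, StreamTaskStatus.STOPPED}


def wait_until_terminal(fetch_status: Callable[[], int],
                        interval_s: float = 2.0) -> StreamTaskStatus:
    """Poll the task status every `interval_s` seconds until it fails or stops."""
    while True:
        status = StreamTaskStatus(fetch_status())
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
```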
