Overview


Note

The Digital Human PaaS has been fully upgraded and rebranded as Digital Human AI, offering developers a cost-effective, high-quality digital human service:

  • A newly developed in-house inference solution delivers ultra-low prices, high concurrency support, and exceptional cost performance.
  • The image-based digital human solution has been enhanced with more natural movements, making avatars more vivid and expressive.
  • Significant improvements in inference performance: inference latency is under 500 ms. When paired with an AI agent, AI conversations run with interactive latency under 1.5 seconds.
  • Optimized server-side APIs provide more flexible and modular output capabilities to support highly customizable business integration.

Product Overview

ZEGOCLOUD Digital Human AI, based on ZEGOCLOUD's in-house AI engine, allows developers to quickly customize and generate lifelike digital human avatars through server-side APIs. These avatars can be used to create engaging video content or real-time audio-video streams, making them suitable for AI companionship, digital human customer service, live e-commerce, AI teaching, and other scenarios.

Advantages

Digital Human Avatars that Match Real People

Based on ZEGOCLOUD's in-house digital human avatar generation engine, developers can customize two types of digital human avatars:

  • Real-life digital human: Record a video of a real person; after AI training, the digital human's facial expressions, movements, and gestures closely resemble those of that person. Customizable backgrounds, 2K ultra-high-definition quality, and customizable action driving are supported, and the resulting avatar looks realistic and natural.
  • Image-based digital human: Upload a single image and, after AI training, the image comes to life. The source image can be a real person, a cartoon, or a virtual character. The generated avatar speaks clearly, has natural expressions, and shows a degree of natural movement, making it lively and expressive.

Multi-modal Driving for High-quality Content Generation

ZEGOCLOUD's in-house digital human content generation engine supports asynchronous video file generation and real-time audio-video stream output.

  • Asynchronous video file generation: Call the server API to configure the background, avatar, script text, and other settings needed for digital human video production (a hypothetical request shape is sketched after the note below). Videos can be output in different formats and at up to 2K ultra-high-definition quality.
  • Real-time audio-video stream output:
    • Drive digital humans through text, audio, and real-time audio-video streams, with output stream latency as low as under 500 ms, meeting the needs of ultra-low-latency interaction scenarios such as live streaming and interactive conversations.
    • Enable keyword-driven actions for digital humans, satisfying custom behavior requirements and making digital humans more vivid and natural.
    • Image-based digital human driving supports body movements, so a photo can not only "speak" but also "move", making it appear lifelike.
    • Ultimate cost-effectiveness: ultra-low pricing with high concurrency support, suitable for high-volume interactive conversations, interactive teaching, and other scenarios, helping businesses achieve maximum cost reduction.
Note

To use the real-time audio-video stream output feature, you need to integrate with Express Video capabilities.
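
As an illustration of the asynchronous generation options above, the sketch below shows roughly what a video-generation request body could contain. Every field name here is a hypothetical placeholder, not the actual Digital Human AI server API schema; take the real parameter names from the server API reference.

```python
# Hypothetical request body for asynchronous digital human video generation.
# All field names below are placeholders for illustration only.
video_request = {
    "DigitalHumanId": "your_digital_human_id",          # which avatar to render
    "TimbreId": "your_timbre_id",                        # voice used to speak the text
    "Text": "Welcome to our live stream!",               # script the avatar reads
    "BackgroundImageUrl": "https://example.com/bg.png",  # replaces the green-screen background
    "VideoFormat": "mp4",                                 # or "webm" for an alpha channel
    "Resolution": "2K",                                    # up to 2K ultra-high definition
}
```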

Flexible Integration, Efficient Implementation

  • Efficient Integration: As few as two standard API calls are enough to build out digital human capability (see the sketch after this list)
  • Diverse Combinations: Atomic capabilities can be freely combined for flexible business customization across different scenarios
  • Full Platform Support: Compatible with Web, App, and other platforms
  • Multiple Deployment Options: Supports public cloud, private deployment, and other deployment forms
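
As a concrete illustration of the two-call flow, the sketch below creates a real-time digital human stream task and then drives it with text. It assumes the signing scheme used by other ZEGOCLOUD server APIs; the host URL, Action names, and payload fields are hypothetical placeholders, so replace them with the values from the Digital Human AI server API reference.

```python
# Minimal sketch of the "two API calls" flow: create a real-time digital human
# stream task, then drive it with text. Host, Action names, and payload fields
# are hypothetical placeholders; take the real ones from the server API docs.
import hashlib
import time
import uuid

import requests

APP_ID = 1234567890                  # your ZEGOCLOUD AppId
SERVER_SECRET = "your_server_secret"
BASE_URL = "https://digitalhuman-api.example.com"  # placeholder host


def signed_params() -> dict:
    """Common query parameters, assuming ZEGOCLOUD's usual server API signing:
    Signature = md5(AppId + SignatureNonce + ServerSecret + Timestamp)."""
    nonce = uuid.uuid4().hex
    timestamp = int(time.time())
    raw = f"{APP_ID}{nonce}{SERVER_SECRET}{timestamp}"
    return {
        "AppId": APP_ID,
        "SignatureNonce": nonce,
        "Timestamp": timestamp,
        "SignatureVersion": "2.0",
        "Signature": hashlib.md5(raw.encode()).hexdigest(),
    }


def call(action: str, body: dict) -> dict:
    """POST one server API action and return the parsed JSON response."""
    params = {"Action": action, **signed_params()}
    resp = requests.post(BASE_URL, params=params, json=body, timeout=10)
    resp.raise_for_status()
    return resp.json()


# Call 1: create the real-time stream task (the digital human publishes an
# audio-video stream into an RTC room that Express Video clients can play).
task = call("CreateDigitalHumanStreamTask", {   # hypothetical Action name
    "DigitalHumanId": "your_digital_human_id",
    "RoomId": "room_1",
    "StreamId": "digital_human_stream_1",
})
task_id = task["Data"]["TaskId"]                # hypothetical response field

# Call 2: drive the digital human with text so it speaks in the live stream.
call("DriveByText", {                           # hypothetical Action name
    "TaskId": task_id,
    "Text": "Hello! I am your digital human assistant.",
})
```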

Features

Module: Custom Digital Human
  • Real Person-based Digital Human: Record a video of a real person to customize a real person digital human avatar. For recording guidelines, refer to the Image and Audio Collection Guide.
  • Image-based Digital Human: Upload a single image to the Image-to-Digital Human AI to quickly generate a digital human.
    Note: This feature is in beta test. Please contact ZEGOCLOUD sales.

Module: Digital Human Asset Query
  • Query available public/custom digital human avatars and timbre lists.

Module: Asynchronous Short Video File Generation
  • Short Video File Generation:
    • Supported output formats: MP4, WebM (supports Alpha transparency channel).
    • Supported video resolutions: 1080P, 2K.
    Note: This feature is being upgraded. Please contact ZEGOCLOUD sales.
  • Short Video Configuration:
    • Support for replacing video backgrounds (requires a green screen background); support for adding decorative images.
    • Support for custom layouts, custom short video text, video speed settings, etc.

Module: Real-time Audio-Video Stream Output
  • Create/Stop Digital Human Audio-Video Stream: Support for creating/stopping digital human audio-video stream tasks based on business scenario requirements.
  • Custom Maximum Stream Duration: Set the maximum duration for digital human video stream tasks. The task ends automatically when this duration is reached, up to a maximum of 24 hours.
  • Custom Digital Human:
    • Set digital human background images/background colors.
    • Set video content layout.
    • Digital human layout: set the coordinates, width, height, and layer order of each layer.
  • Custom Video Stream Parameters: Set the room ID and stream ID.
  • Custom Video Parameters: Set the encoding method, resolution, and bitrate for audio-video streams.
  • Multi-modal Digital Human Driving: Support for driving digital humans through text, audio files, RTC streaming, and WebSocket audio streaming.
  • Custom Digital Human Actions: Support for calling action names to drive digital humans to perform specified actions.
  • Interrupt Digital Human Driving Behavior: Support for interrupting a digital human that is currently being driven to enable a new driving task.
  • Get Audio-Video Stream Task Driving Status: Get the driving task status of digital human video streams, including historical driving records. Status includes:
    • 1: Queuing.
    • 2: Driving.
    • 3: Driving failed.
    • 4: Driving ended.
    • 5: Driving interrupted.
  • Get Audio-Video Stream Task Status: Get the task status of audio-video streams, supporting queries for completed video stream task status (see the polling sketch after this table). Status includes:
    • 1: Video stream task initializing.
    • 2: Video stream task initialization failed.
    • 3: Streaming.
    • 4: Stopping stream.
    • 5: Stream stopped.
  • Query All Running Digital Human Audio-Video Tasks: Get a list of all running digital human audio-video tasks and their streaming status. Status includes:
    • 1: Video stream task initializing.
    • 3: Streaming.
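
The stream task status codes above lend themselves to a small polling helper. The sketch below only encodes the codes listed in this table; fetch_status is a caller-supplied placeholder (for example, a wrapper around the status-query API, such as the signed call helper sketched earlier) and is an assumption, not part of the actual API.

```python
# Sketch of polling a digital human stream task until it reaches a terminal
# state, using the status codes listed above. `fetch_status` is a placeholder
# for whatever function queries the status API and returns the raw Status int.
import time
from enum import IntEnum
from typing import Callable


class StreamTaskStatus(IntEnum):
    INITIALIZING = 1   # 1: video stream task initializing
    INIT_FAILED = 2    # 2: video stream task initialization failed
    STREAMING = 3      # 3: streaming
    STOPPING = 4       # 4: stopping stream
    STOPPED = 5        # 5: stream stopped


TERMINAL_STATES = {StreamTaskStatus.INIT_FAILED, StreamTaskStatus.STOPPED}


def wait_until_terminal(fetch_status: Callable[[], int],
                        interval_s: float = 2.0) -> StreamTaskStatus:
    """Poll the task status every `interval_s` seconds until it fails or stops."""
    while True:
        status = StreamTaskStatus(fetch_status())
        if status in TERMINAL_STATES:
            return status
        time.sleep(interval_s)
```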
