Release Notes
V2
2025-07-31
New Features
Feature | Description | Documentation |
---|---|---|
WindowSize、LoadMessageCount maximum value adjusted to 200 | The MessageHistory.WindowSize and MessageHistory.ZIM.LoadMessageCount fields of the create agent instance/create digital human agent instance interface are adjusted to 200. | Create Agent Instance Create Digital Human Agent Instance |
TTS adds TerminatorText field | The TTS field of the register agent/create/update agent instance interface adds a TerminatorText field. This field can be used to set the termination text of TTS. If the content in the input TTS text matches the TerminatorText string, the content from the TerminatorText string (including) will not be synthesized for this round of TTS. |
Improvements & Optimizations
- Optimized the sentence break logic of unidirectional streaming TTS.
2025-06-26
New Features
Feature | Description | Documentation |
---|---|---|
Digital Human Video Call | Support creating a digital human image in the Digital Human PaaS Service, and create a digital human agent instance to achieve real-time video interaction with the digital human.
| Implement Digital Human Video Call |
Multi-agent multi-voice output | Support multi-voice output when interacting with multiple AI agents, by actively calling TTS | Send Agent Instance TTS |
Improvements & Optimizations
- Updated the default model of MiniMax TTS (Text-to-Speech) to speech-02-turbo, and optimized its latency to approximately 300ms.
2025-06-19
New Features
Feature | Description | Documentation |
---|---|---|
Support retrieving average latency information when instance is destroyed | Latency information includes:
| Get Agent Service Status & Latency Data |
Support Alibaba CosyVoice TTS bidirectional streaming | By configuring the Vendor as Alibaba CosyVoice when creating an agent and setting up supported voice tones, you can achieve AI real-time voice calls based on CosyVoice. | - |
Support callbacks for agent instance creation success and destruction | Can be used in conjunction with agent instance status query, server exception callback, and agent interruption callback to manage the entire lifecycle process of the agent | Get Agent Service Status & Latency Data |
Improvements & Optimizations
- During the integration testing period, no separate account application and authentication are required to use services from some ZEGO-supported LLMs (Doubao, MiniMax, Tongyi Qianwen, Stepfun, etc.) and TTS vendors (MiniMax, BytePlus, Alibaba CosyVoice). For details, please refer to Quick Start.
- Updated support for MiniMax TTS WebSocket unidirectional streaming, further optimizing latency and voice tone effects.
- Reduced end-to-end latency by 100-200ms, can be reduced to under 1 second with technical support enablement.
2025-05-30
New Features
Feature | Description | Documentation |
---|---|---|
1 user vs multiple AI roles | Note Feature is in beta testing, please contact ZEGOCLOUD Business for details. | - |
Request body contains agent instance and user information when calling LLM | When creating an agent instance, if the AddAgentInfo field is set to true , the AI Agent backend will add the agent_info field to the request body parameters sent to the custom LLM, which includes room_id , user_id , and agent_instance_id information. This allows for personalized responses based on different users or agent instances, such as calling different function calling or memory based on user IDs. | Configuring LLM |
Callback for each round of user speech audio segment | When creating an agent instance, if the UserAudioData field of CallbackConfig is set to 1, the AI Agent backend will callback the audio data of the user's speech in the previous 1-1.5 seconds of each round of conversation (if less than 1 second, no callback will be sent). Business side can implement voiceprint recognition and other capabilities based on this audio information. | Receiving Callback |
Improvements & Optimizations
- Optimized the user experience problem caused by subtitle and LLM callback too early when ASR multi-sentence concatenation is enabled. For details, please refer to Speech Recognition Segmentation.
2025-05-16
New Features
Feature | Description | Documentation |
---|---|---|
Multi-user vs 1 Agent | Supports multiple users simultaneously interacting with one AI agent through voice. Features include voice interruption, manual interruption, proactive agent speech, and the agent's ability to distinguish and respond to different users. Note Contact ZEGOCLOUD Technical Support for details. | - |
Speech Recognition Segmentation | Supports voice detection threshold settings and pause duration settings to balance latency and speech recognition segmentation. | Speech Recognition Segmentation |
More TTS Service Providers | Added support for Alibaba Cloud and MiniMax, with bidirectional streaming API support for BytePlus. | Agent Parameter Description - TTS |
Interrupt Agent | Supports disabling voice interruption while enabling manual interruption, enabling scenarios like manual interruption and Push-to-talk intercom voice interaction. | Interrupt Agent |
Context Management | Supports agent instance-level context management capabilities, including context querying and resetting. | AI Short-term Memory (Agent Context) Management |
LLM Content Filtering | Supports filtering LLM output content, enabling emoji filtering and specific word replacement. Note Contact ZEGOCLOUD Technical Support for details. | - |
Callback Events | Enables developers to receive agent interruption events, user speech behavior, and agent speech behavior through server-side callbacks. |
Improvements & Optimizations
- Comprehensive optimization of integration examples, providing business service control pages and supporting client sample code. For details, refer to Quick Start.
- Further improved speech recognition and interruption accuracy, especially for external music sounds.
- Further optimized voice end-to-end latency, reducing 200ms+ delay.
- Added token authentication support for real-time audio and video (RTC), enhancing interaction security without affecting agent interaction.
2025-04-25
Version Update
- Enhanced onboarding experience, enabling voice calls with AI agents through less than 10 lines of code.
- Upgraded full-process audio handling capabilities, significantly improving the accuracy of speech interruption and recognition, especially in noisy environments, while playing BGM, or during cross talk (AI and user speaking simultaneously), covering various environments such as home, office, and public spaces for AI interaction.
- Supports for features including: custom third-party large language models (LLMs), natural speech interruptions within 500ms, real-time subtitles, AI agent status queries, proactive LLM invocation, and proactive TTS invocation.
- Upgraded architecture: ZEGOCLOUD AI agent supports multi-user vs multi-AI agent for more flexible interaction formats.
V1
2025-03-21
New Features
- Added a
Query Agent Status
server-side interface. - When creating a session, added a
Pass-through Third-party Parameters
field to the text-to-speech configuration object. - For Minimax text-to-speech services, the
Pass-through Third-party Parameters
now includes aModel
field. - The ASR configuration object has added
Hotwords
andExtended Parameters
fields. - Added a
Remove History
field to the request parameters of the server-side interface used for actively invoking text-to-speech services.
2025-02-10
New Features
- Added server-side callback for abnormal events.
- Added a
Sentence Pause Duration
field to the text-to-speech configuration object.
2025-01-16
New Features
- Added
Response Format Types
andResponse Message Name
fields to the large language model configuration object when creating a session. - Added a
User ID
(required) field to the request parameters of session and conversation-related server-side interfaces, as well as those used for actively invoking large language models and text-to-speech services. - Added
API Type
andResource ID
fields to the extended parameters of the text-to-speech configuration object.
2025-01-08
New Features
- Added a
Session ID
field to the server-side interface for obtaining session lists, supporting querying session details by session ID. - Added a
Conversation History Mode
field to the server-side interface for creating sessions, supporting whether to save session history messages.
Improvements & Optimizations
- Adjusted room event message protocol.
Deprecated & Removed
- Removed the
Account Source
field from large language model and text-to-speech configuration objects.
2024-12-31
Version Update
- Comprehensive service reliability & stability.
- Lower end-to-end latency and interruption delay.
- Updated audio processing capabilities, supporting noisy environments and meeting over 80% of scenarios.
- Agent template library.
- Supports active invocation of large language models.
- Supports active invocation of text-to-speech services.
- Supports custom RAG and other capabilities.
- Added an
Ignore Bracketed Text
field to the large language model configuration object, supporting filtering out emojis from large language model texts.
Beta
2024-12-16
New Features
- Added a server-side interface for proactively calling the text-to-speech service.
- Added a server-side interface for proactively calling the large language model service.
- Added a server-side callback interface for obtaining results from the large language model service.
- The session creation server-side interface added an
Enable Large Language Model Server Message
configuration. - The large language model configuration object added an
Ignore Bracketed Text
field, supporting filtering of emoticons in the large language model's text.
Improvements and Optimizations
- Unified the
Timestamp
field for customizing per-round conversation prompts with the large language model to Int type.
2024-12-05
New Features
- Added a
Conversation Configuration
field to server-side interfaces for creating, updating, and querying sessions. - Added a protocol for a custom pre-processing server-side interface for large language model prompts.
- The text-to-speech configuration object added
Ignore Bracketed Text
andIgnore Custom Bracketed Text
fields, supporting ignoring certain input content for text-to-speech services, such as content within Chinese and English brackets.
2024-11-26
New Features
- Added an
Extended Parameters
field applicable to text-to-speech services, supporting replicated voices from BytePlus and Minimax. - Added error codes such as
410003101
.
Bug Fixes
- Fixed an issue where the AI agent could not interrupt properly under certain scenarios.
2024-10-01
Version Release
- Supports basic scenarios such as AI real-time voice calls and IM text chats.
- Supports switching between large language models (LLMs), text-to-speech (TTS) service providers, and voice tones.