Release Notes
V2
2025-05-30
Server v2.2.0
New Features
Feature | Description | Documentation |
---|---|---|
1 user vs multiple AI roles | Note: This feature is in beta testing. Contact ZEGOCLOUD Business for details. | - |
LLM request body includes agent instance and user information | When creating an agent instance, if the AddAgentInfo field is set to true, the AI Agent backend adds an agent_info field to the request body sent to the custom LLM. This field contains room_id, user_id, and agent_instance_id, so responses can be personalized per user or per agent instance, for example by invoking different function calls or loading different memory based on the user ID. | Configuring LLM |
Callback with user speech audio for each round | When creating an agent instance, if the UserAudioData field of CallbackConfig is set to 1, the AI Agent backend sends a callback containing the last 1 to 1.5 seconds of the user's speech audio in each conversation round. No callback is sent if the speech is shorter than 1 second. The business side can use this audio for voiceprint recognition and similar capabilities. | Receiving Callback |
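A custom LLM service can use agent_info to personalize its reply. The following is a minimal sketch, assuming agent_info arrives as a top-level JSON object in the request body with the room_id, user_id, and agent_instance_id keys listed above; the in-memory USER_MEMORY store and the build_system_prompt function are hypothetical names for illustration, not part of the ZEGOCLOUD API.

```python
import json
from typing import Any, Dict, List

# Hypothetical per-user memory store; a real service would use a database.
USER_MEMORY: Dict[str, List[str]] = {
    "user_a": ["Prefers short answers."],
}


def build_system_prompt(request_body: str) -> str:
    """Build a per-user system prompt from a custom-LLM request body.

    Assumes agent_info is a top-level JSON object with room_id, user_id,
    and agent_instance_id keys (sent only when AddAgentInfo is true).
    """
    body: Dict[str, Any] = json.loads(request_body)
    agent_info = body.get("agent_info")
    if not agent_info:
        # AddAgentInfo was false or omitted: fall back to a generic prompt.
        return "You are a helpful assistant."
    user_id = agent_info.get("user_id", "")
    room_id = agent_info.get("room_id", "")
    prompt = f"You are talking to user {user_id} in room {room_id}."
    notes = USER_MEMORY.get(user_id, [])
    if notes:
        # Append whatever this user's memory holds.
        prompt += " Known preferences: " + " ".join(notes)
    return prompt
```

For a body containing `"agent_info": {"room_id": "room1", "user_id": "user_a", "agent_instance_id": "inst1"}`, the function returns a prompt that names user_a and room1 and appends that user's stored preference; without agent_info it falls back to the generic prompt.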
Improvements & Optimizations
- Fixed a user-experience issue where subtitles and LLM callbacks were triggered too early when ASR multi-sentence concatenation was enabled. For details, refer to Speech Recognition Segmentation.
2025-05-16
Server v2.1.0
New Features
Feature | Description | Documentation |
---|---|---|
Multi-user vs 1 Agent | Supports multiple users interacting with one AI agent by voice at the same time, including voice interruption, manual interruption, proactive agent speech, and the agent's ability to distinguish and respond to different users. Note: Contact ZEGOCLOUD Technical Support for details. | - |
Speech Recognition Segmentation | Supports voice detection threshold settings and pause duration settings to balance latency and speech recognition segmentation. | Speech Recognition Segmentation |
More TTS Service Providers | Added support for Alibaba Cloud and MiniMax, with bidirectional streaming API support for BytePlus. | Agent Parameter Description - TTS |
Interrupt Agent | Supports disabling voice interruption while keeping manual interruption available, enabling scenarios such as manual interruption and push-to-talk (intercom-style) voice interaction. | Interrupt Agent |
Context Management | Supports agent instance-level context management capabilities, including context querying and resetting. | AI Short-term Memory (Agent Context) Management |
LLM Content Filtering | Supports filtering LLM output content, including emoji filtering and replacement of specific words. Note: Contact ZEGOCLOUD Technical Support for details. | - |
Callback Events | Enables developers to receive agent interruption events, user speech events, and agent speech events through server-side callbacks. |
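To illustrate the latency versus segmentation trade-off that the voice detection threshold and pause duration settings control, here is a generic energy-threshold segmenter. It is a sketch under stated assumptions, not ZEGOCLOUD's algorithm: the per-frame energy input, the fixed frame length, and the segment_speech name are all hypothetical.

```python
from typing import List, Optional, Tuple


def segment_speech(
    frame_energies: List[float],
    frame_ms: int,
    threshold: float,
    pause_ms: int,
) -> List[Tuple[int, int]]:
    """Split per-frame energies into (start_ms, end_ms) speech segments.

    A frame is treated as speech when its energy >= threshold. A segment is
    closed only after silence has lasted at least pause_ms, so shorter gaps
    stay inside the same segment.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # start time of the open segment, if any
    last_voiced_end = 0  # end time of the most recent speech frame
    for i, energy in enumerate(frame_energies):
        frame_start = i * frame_ms
        frame_end = frame_start + frame_ms
        if energy >= threshold:
            if start is None:
                start = frame_start
            last_voiced_end = frame_end
        elif start is not None and frame_end - last_voiced_end >= pause_ms:
            # Silence has lasted long enough: close the open segment.
            segments.append((start, last_voiced_end))
            start = None
    if start is not None:
        # Input ended mid-utterance: close whatever is still open.
        segments.append((start, last_voiced_end))
    return segments
```

With energies `[0.9, 0.8, 0.1, 0.7, 0.1, 0.1, 0.1]`, 100 ms frames, and a threshold of 0.5, a 200 ms pause yields one segment `(0, 400)`, while a 100 ms pause splits it into `(0, 200)` and `(300, 400)`. A longer pause keeps hesitations inside one sentence but delays the end-of-turn decision by that pause; raising the threshold to 0.75 drops the quieter 0.7 frame and ends the segment at 200 ms.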
Improvements & Optimizations
- Comprehensively improved the integration examples, which now provide a business service control page and accompanying client sample code. For details, refer to Quick Start.
- Further improved speech recognition and interruption accuracy, especially when external music is playing.
- Further reduced end-to-end voice latency by more than 200 ms.
- Added token authentication support for real-time audio and video (RTC), enhancing interaction security without affecting agent interaction.
2025-04-25
Server v2.0.0
Version Update
- Enhanced onboarding experience: voice calls with AI agents can be started with fewer than 10 lines of code.
- Upgraded audio processing across the full pipeline, significantly improving the accuracy of speech interruption and recognition, especially in noisy environments, while BGM is playing, or during cross talk (AI and user speaking at the same time). This covers AI interaction in home, office, public spaces, and other environments.
- Supports custom third-party large language models (LLMs), natural speech interruption within 500 ms, real-time subtitles, AI agent status queries, proactive LLM invocation, and proactive TTS invocation.
- Upgraded architecture: ZEGOCLOUD AI agent supports multi-user vs multi-AI agent for more flexible interaction formats.
V1
2025-03-21
Server v1.4.0
New Features
- Added a Query Agent Status server-side interface.
- When creating a session, added a Pass-through Third-party Parameters field to the text-to-speech configuration object.
- For MiniMax text-to-speech services, the Pass-through Third-party Parameters now include a Model field.
- Added Hotwords and Extended Parameters fields to the ASR configuration object.
- Added a Remove History field to the request parameters of the server-side interface used for actively invoking text-to-speech services.
2025-02-10
Server v1.3.0
New Features
- Added server-side callback for abnormal events.
- Added a Sentence Pause Duration field to the text-to-speech configuration object.
2025-01-16
Server v1.2.0
New Features
- Added Response Format Types and Response Message Name fields to the large language model configuration object when creating a session.
- Added a required User ID field to the request parameters of session- and conversation-related server-side interfaces, as well as those used for actively invoking large language models and text-to-speech services.
- Added API Type and Resource ID fields to the extended parameters of the text-to-speech configuration object.
2025-01-08
Server v1.1.0
New Features
- Added a Session ID field to the server-side interface for obtaining session lists, supporting querying session details by session ID.
- Added a Conversation History Mode field to the server-side interface for creating sessions, controlling whether session history messages are saved.
Improvements & Optimizations
- Adjusted room event message protocol.
Deprecated & Removed
- Removed the Account Source field from the large language model and text-to-speech configuration objects.
2024-12-31
Server v1.0.0
Version Update
- Comprehensively improved service reliability and stability.
- Lower end-to-end latency and interruption delay.
- Upgraded audio processing capabilities, supporting noisy environments and covering over 80% of scenarios.
- Agent template library.
- Supports active invocation of large language models.
- Supports active invocation of text-to-speech services.
- Supports custom RAG and other capabilities.
- Added an Ignore Bracketed Text field to the large language model configuration object, supporting filtering out emojis from large language model text.
Beta
2024-12-16
Server v0.5.0
New Features
- Added a server-side interface for proactively calling the text-to-speech service.
- Added a server-side interface for proactively calling the large language model service.
- Added a server-side callback interface for obtaining results from the large language model service.
- Added an Enable Large Language Model Server Message configuration to the session creation server-side interface.
- Added an Ignore Bracketed Text field to the large language model configuration object, supporting filtering of emoticons in the large language model's text.
Improvements and Optimizations
- Unified the type of the Timestamp field used for customizing per-round conversation prompts with the large language model to Int.
2024-12-05
Server v0.3.0
New Features
- Added a Conversation Configuration field to the server-side interfaces for creating, updating, and querying sessions.
- Added a protocol for a custom server-side interface that pre-processes large language model prompts.
- Added Ignore Bracketed Text and Ignore Custom Bracketed Text fields to the text-to-speech configuration object, supporting ignoring certain input content for text-to-speech services, such as content within Chinese and English brackets.
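The bracket-ignoring behavior described above can be approximated on the client side, for example to preview what text will actually be spoken. This is a minimal sketch, assuming the default bracket set is ASCII () and [] plus full-width （） and 【】; the real set handled by the service, and the strip_bracketed name, are assumptions. The optional pairs argument stands in for the custom-bracket case.

```python
import re
from typing import Iterable, Tuple

# Assumed default pairs: ASCII () and [] plus full-width （） and 【】.
DEFAULT_PAIRS: Tuple[Tuple[str, str], ...] = (
    ("(", ")"),
    ("[", "]"),
    ("（", "）"),
    ("【", "】"),
)


def strip_bracketed(
    text: str, pairs: Iterable[Tuple[str, str]] = DEFAULT_PAIRS
) -> str:
    """Remove every span enclosed by one of the given bracket pairs.

    Delimiters may be multi-character, e.g. ("<<", ">>"). Matching is
    non-greedy and does not handle nested brackets.
    """
    for open_b, close_b in pairs:
        pattern = re.escape(open_b) + ".*?" + re.escape(close_b)
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    # Collapse the double spaces left behind by removed spans.
    return re.sub(r" {2,}", " ", text).strip()
```

For example, `strip_bracketed("Hello (smiles) there【笑】!")` returns `"Hello there!"`, and `strip_bracketed("Hi <<wave>> there", [("<<", ">>")])` returns `"Hi there"`.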
2024-11-26
Server v0.2.0
New Features
- Added an Extended Parameters field for text-to-speech services, supporting cloned voices from BytePlus and MiniMax.
- Added error codes such as 410003101.
Bug Fixes
- Fixed an issue where the AI agent could not interrupt properly under certain scenarios.
2024-10-01
Server v0.1.0
Version Release
- Supports basic scenarios such as AI real-time voice calls and IM text chats.
- Supports switching between large language models (LLMs), text-to-speech (TTS) service providers, and voice tones.