Release Notes


V2

2025-05-30

Server v2.2.0

New Features

| Feature | Description | Documentation |
| --- | --- | --- |
| 1 user vs multiple AI roles | Note: This feature is in beta testing. Contact ZEGOCLOUD Business for details. | - |
| Request body contains agent instance and user information when calling LLM | When creating an agent instance, if the AddAgentInfo field is set to true, the AI Agent backend adds the agent_info field to the request body sent to the custom LLM. The field includes room_id, user_id, and agent_instance_id. This allows responses to be personalized per user or agent instance, for example by using different function calls or memory based on the user ID. | Configuring LLM |
| Callback for each round of user speech audio segment | When creating an agent instance, if the UserAudioData field of CallbackConfig is set to 1, the AI Agent backend sends a callback with the audio data of the user's speech in the previous 1-1.5 seconds of each round of conversation (no callback is sent if the speech is shorter than 1 second). The business side can use this audio to implement voiceprint recognition and similar capabilities. | Receiving Callback |
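
As an illustration of how a custom LLM service might use agent_info, here is a minimal sketch. Only the agent_info field and its room_id, user_id, and agent_instance_id keys come from the notes above. The function names, the per-user profile table, and the prompt text are hypothetical, and the rest of the request body is not modeled here.

```python
from typing import Any

# Hypothetical per-user settings; a real service would load these from storage.
USER_PROFILES = {"user_42": {"memory": "prefers short answers"}}

AGENT_INFO_KEYS = ("room_id", "user_id", "agent_instance_id")


def extract_agent_info(body: dict[str, Any]) -> dict[str, str]:
    """Return room_id, user_id, and agent_instance_id from agent_info.

    agent_info is only sent when AddAgentInfo was set to true at agent
    instance creation, so missing values fall back to empty strings.
    """
    info = body.get("agent_info") or {}
    return {key: str(info.get(key, "")) for key in AGENT_INFO_KEYS}


def build_system_prompt(body: dict[str, Any]) -> str:
    """Choose a system prompt based on agent_info.user_id (illustrative only)."""
    info = extract_agent_info(body)
    profile = USER_PROFILES.get(info["user_id"])
    if profile is None:
        return "You are a helpful voice assistant."
    return f"You are a helpful voice assistant. User note: {profile['memory']}."
```

Falling back to empty strings keeps the handler usable for agent instances created without AddAgentInfo, at the cost of treating them all as anonymous.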

Improvements & Optimizations

  • Improved the user experience when ASR multi-sentence concatenation is enabled by addressing subtitles and LLM callbacks being triggered too early. For details, refer to Speech Recognition Segmentation.

2025-05-16

Server v2.1.0

New Features

| Feature | Description | Documentation |
| --- | --- | --- |
| Multi-user vs 1 Agent | Supports multiple users simultaneously interacting with one AI agent through voice. Features include voice interruption, manual interruption, proactive agent speech, and the agent's ability to distinguish and respond to different users. Note: Contact ZEGOCLOUD Technical Support for details. | - |
| Speech Recognition Segmentation | Supports voice detection threshold settings and pause duration settings to balance latency and speech recognition segmentation. | Speech Recognition Segmentation |
| More TTS Service Providers | Added support for Alibaba Cloud and MiniMax, with bidirectional streaming API support for BytePlus. | Agent Parameter Description - TTS |
| Interrupt Agent | Supports disabling voice interruption while enabling manual interruption, enabling scenarios like manual interruption and Push-to-talk intercom voice interaction. | Interrupt Agent |
| Context Management | Supports agent instance-level context management capabilities, including context querying and resetting. | AI Short-term Memory (Agent Context) Management |
| LLM Content Filtering | Supports filtering LLM output content, enabling emoji filtering and specific word replacement. Note: Contact ZEGOCLOUD Technical Support for details. | - |
| Callback Events | Enables developers to receive agent interruption events, user speech behavior, and agent speech behavior through server-side callbacks. | |

Improvements & Optimizations

  • Comprehensively optimized the integration examples, which now provide business service control pages and supporting client sample code. For details, refer to Quick Start.
  • Further improved speech recognition and interruption accuracy, especially when external music is playing.
  • Further reduced end-to-end voice latency by more than 200ms.
  • Added token authentication support for real-time audio and video (RTC), enhancing interaction security without affecting agent interaction.

2025-04-25

Server v2.0.0

Version Update

  • Enhanced onboarding experience, enabling voice calls with AI agents with fewer than 10 lines of code.
  • Upgraded full-process audio handling capabilities, significantly improving the accuracy of speech interruption and recognition, especially in noisy environments, while BGM is playing, or during cross talk (AI and user speaking simultaneously). This covers AI interaction in environments such as home, office, and public spaces.
  • Supports features including custom third-party large language models (LLMs), natural speech interruptions within 500ms, real-time subtitles, AI agent status queries, proactive LLM invocation, and proactive TTS invocation.
  • Upgraded architecture: the ZEGOCLOUD AI agent now supports multi-user vs multi-AI agent interaction for more flexible interaction formats.

V1

2025-03-21

Server v1.4.0

New Features

  • Added a Query Agent Status server-side interface.
  • When creating a session, added a Pass-through Third-party Parameters field to the text-to-speech configuration object.
  • For MiniMax text-to-speech services, the Pass-through Third-party Parameters now includes a Model field.
  • Added Hotwords and Extended Parameters fields to the ASR configuration object.
  • Added a Remove History field to the request parameters of the server-side interface used for actively invoking text-to-speech services.

2025-02-10

Server v1.3.0

New Features

  • Added server-side callback for abnormal events.
  • Added a Sentence Pause Duration field to the text-to-speech configuration object.

2025-01-16

Server v1.2.0

New Features

  • Added Response Format Types and Response Message Name fields to the large language model configuration object when creating a session.
  • Added a User ID (required) field to the request parameters of session and conversation-related server-side interfaces, as well as those used for actively invoking large language models and text-to-speech services.
  • Added API Type and Resource ID fields to the extended parameters of the text-to-speech configuration object.

2025-01-08

Server v1.1.0

New Features

  • Added a Session ID field to the server-side interface for obtaining session lists, supporting querying session details by session ID.
  • Added a Conversation History Mode field to the server-side interface for creating sessions, which controls whether session history messages are saved.

Improvements & Optimizations

  • Adjusted room event message protocol.

Deprecated & Removed

  • Removed the Account Source field from large language model and text-to-speech configuration objects.

2024-12-31

Server v1.0.0

Version Update

  • Comprehensively improved service reliability and stability.
  • Lower end-to-end latency and interruption delay.
  • Updated audio processing capabilities, supporting noisy environments and meeting the needs of over 80% of scenarios.
  • Agent template library.
  • Supports active invocation of large language models.
  • Supports active invocation of text-to-speech services.
  • Supports custom RAG and other capabilities.
  • Added an Ignore Bracketed Text field to the large language model configuration object, supporting filtering out emojis from large language model texts.

Beta

2024-12-16

Server v0.5.0

New Features

  • Added a server-side interface for proactively calling the text-to-speech service.
  • Added a server-side interface for proactively calling the large language model service.
  • Added a server-side callback interface for obtaining results from the large language model service.
  • Added an Enable Large Language Model Server Message configuration to the session creation server-side interface.
  • Added an Ignore Bracketed Text field to the large language model configuration object, supporting filtering of emoticons in the large language model's text.

Improvements & Optimizations

  • Unified the type of the Timestamp field, used when customizing per-round conversation prompts for the large language model, to Int.

2024-12-05

Server v0.3.0

New Features

  • Added a Conversation Configuration field to server-side interfaces for creating, updating, and querying sessions.
  • Added a protocol for a custom pre-processing server-side interface for large language model prompts.
  • Added Ignore Bracketed Text and Ignore Custom Bracketed Text fields to the text-to-speech configuration object, which let text-to-speech services ignore certain input content, such as content within Chinese and English brackets.
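
The effect of ignoring bracketed text can be sketched as follows. This is an illustration only, not the service's implementation: the function name, the default bracket pairs (ASCII and full-width parentheses), and the handling of custom pairs are assumptions, and nested brackets are not handled.

```python
import re

# Assumed defaults: ASCII parentheses and full-width (Chinese) parentheses.
DEFAULT_PAIRS = [("(", ")"), ("\uFF08", "\uFF09")]


def strip_bracketed(text: str, pairs: list[tuple[str, str]] = DEFAULT_PAIRS) -> str:
    """Remove each bracket pair together with the text inside it (non-nested)."""
    for open_bracket, close_bracket in pairs:
        # Non-greedy match so each opening bracket pairs with the nearest close.
        pattern = re.escape(open_bracket) + r".*?" + re.escape(close_bracket)
        text = re.sub(pattern, "", text, flags=re.DOTALL)
    # Collapse the extra whitespace left behind by removed spans.
    return re.sub(r"\s{2,}", " ", text).strip()
```

Passing a different `pairs` list, such as `[("[", "]")]`, mirrors the idea of a custom bracket setting: the caller decides which delimiters mark text that should not be spoken.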

2024-11-26

Server v0.2.0

New Features

  • Added an Extended Parameters field applicable to text-to-speech services, supporting replicated voices from BytePlus and MiniMax.
  • Added error codes such as 410003101.

Bug Fixes

  • Fixed an issue where the AI agent could not interrupt properly under certain scenarios.

2024-10-01

Server v0.1.0

Version Release

  • Supports basic scenarios such as AI real-time voice calls and IM text chats.
  • Supports switching between large language models (LLMs), text-to-speech (TTS) service providers, and voice tones.
