Round Mechanism and Callback Tracking
Overview
Round is the unique identifier for an AI Agent interaction chain. Each time a user speaks or the business side calls SendAgentInstanceLLM or SendAgentInstanceTTS, the server generates a Round value. All related callbacks that follow (ASR recognition, LLM response, TTS playback, status changes, interruption events, and so on) carry this value, so the business side can track exactly what happened in each interaction.
Applicable scenarios: AI chat companions, voice chat rooms, digital humans, intelligent customer service, and other scenarios that require tracking the complete conversation chain. Whether the user speaks proactively or the business side triggers AI responses, Round can link all callbacks together to easily handle complex situations like interruptions and queuing.
Core Concept: What is Round
Definition of Round
Round is an ascending sequence number generated by the server for each complete interaction. It never repeats.
- One interaction = Complete flow from "trigger" to "end"
- Trigger sources: User speech, calling SendAgentInstanceLLM, calling SendAgentInstanceTTS
- End marker: AI completes the current round response (TTS playback finished or LLM response completed)
Purpose of Round
Round spans the entire interaction lifecycle. All callback events carry the Round field:
Trigger → Round N starts
├─ ASRResult (Round: N) // ASR recognition result starts
├─ LLMResult (Round: N) // LLM response content
├─ AgentInstanceStatus (Round: N) // Status change (Listening→Thinking→Speaking→Idle)
├─ AgentInstanceMetaInfo (Round: N) // Metadata (voice, emotion, etc.)
└─ End → Round N completed
When Round is Generated
| Trigger Source | Round Generation Timing | Description |
|---|---|---|
| User speaks | User voice detected | Round is determined when ASR starts recognition |
| SendAgentInstanceLLM | API request succeeds | Server assigns Round immediately upon receiving the request |
| SendAgentInstanceTTS | API request succeeds | Server assigns Round immediately upon receiving the request |
Diagrams
Data Flow Overview
┌─────────────────────────────────────────────────────────────────┐
│                         Trigger Sources                         │
├───────────────────────────┬─────────────────────────────────────┤
│  User speaks proactively  │       Business side calls API       │
│     (RTC room voice)      │      SendAgentInstanceLLM/TTS       │
└─────────────┬─────────────┴──────────────────┬──────────────────┘
              │                                │
              ▼                                ▼
          ┌────────────────────────────────────────┐
          │          ZEGO AI Agent Server          │
          │  Generate Round (ascending sequence)   │
          └────────────────────────────────────────┘
                              │
                 ┌────────────┼────────────┬────────────┐
                 │            │            │            │
                 ▼            ▼            ▼            ▼
            ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
            │   ASR    │ │   LLM    │ │   TTS    │ │  Status  │
            │  Result  │ │  Result  │ │  Status  │ │  Change  │
            │ Round: N │ │ Round: N │ │ Round: N │ │ Round: N │
            └──────────┘ └──────────┘ └──────────┘ └──────────┘
                 │            │            │            │
                 └────────────┼────────────┴────────────┘
                              │
                              ▼
            ┌────────────────────────────────────┐
            │     Business Server Callbacks      │
            │ Correlate all events through Round │
            └────────────────────────────────────┘
Comparison of Two Data Flows
[Generation Timing 1: User speaks proactively]
User: "How's the weather today?" (Round 5)
↓
Server: ASR recognition → LLM thinking → TTS playback
↓
Callback sequence:
- ASRResult (Round: 5, text: "How's the weather today?")
- AgentInstanceStatus (Round: 5, status: "Thinking")
- LLMResult (Round: 5, text: "The weather is sunny...")
- AgentInstanceStatus (Round: 5, status: "Speaking")
- AgentInstanceStatus (Round: 5, status: "Idle")
[Generation Timing 2: Business side calls API]
Business: SendAgentInstanceTTS("Welcome") (Round 6)
↓
Server: Direct TTS playback
↓
Callback sequence:
- AgentInstanceStatus (Round: 6, status: "Speaking")
- AgentInstanceMetaInfo (Round: 6, voice: "zh_female")
- AgentInstanceStatus (Round: 6, status: "Idle")
Getting Full-Chain Round
Client SDK Callback
Important: Clients can obtain Round information through the experimental API callbacks of the ZEGO Express SDK and use it to drive status UI (for example, "Listening", "Thinking", "Speaking").
Cmd Types and Data Fields
| Cmd | Type | Data Fields | Description |
|---|---|---|---|
| 1 | User speech status | SpeakStatus, UserId | SpeakStatus: 1=speech starts, 2=speech ends |
| 3 | ASR text | Text, MessageId, UserId, StartFlag, EndFlag | Incrementally delivers ASR recognition results |
| 4 | LLM response | Text, MessageId, EndFlag | Incrementally delivers LLM responses |
| 6 | Agent status | Status, OldStatus, Reason | Status: 0=Idle, 1=Listening, 2=Thinking, 3=Speaking |
| 102 | Metadata info | Object | Metadata info (voice, emotion, action, etc.) |
Callback Field Description (Key Fields)
Common fields for all callback events:
{
  "Timestamp": 1765510379, // Timestamp in seconds
  "TimestampMs": 1765510379113, // Timestamp in milliseconds
  "SeqId": 278800715, // Packet sequence number; ascending (preserves ordering) but not consecutive
  "Round": 510359002, // [Core] Conversation round; ascending but not consecutive
  "Cmd": 1, // Command type
  "Data": { ... } // Command-specific content; see the Cmd table above
}
Detailed Documentation
For complete API documentation, refer to: AI Agent Instance SDK Callbacks
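The common fields and the Cmd table above can be turned into a small decoder. The following is a minimal sketch, assuming each message reaches the client as a JSON string; how the ZEGO Express SDK delivers that string is not shown here, and `AgentMessage`, `parse_message`, `describe`, and the sample payload values are hypothetical names and data chosen for illustration.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Status codes taken from the Cmd 6 row of the table above.
AGENT_STATUS = {0: "Idle", 1: "Listening", 2: "Thinking", 3: "Speaking"}


@dataclass
class AgentMessage:
    round_id: int
    seq_id: int
    cmd: int
    data: Dict[str, Any]


def parse_message(raw: str) -> AgentMessage:
    """Decode one JSON message into its common fields."""
    obj = json.loads(raw)
    return AgentMessage(obj["Round"], obj["SeqId"], obj["Cmd"], obj.get("Data", {}))


def describe(msg: AgentMessage) -> str:
    """Return a short label for UI or logging (hypothetical helper)."""
    d = msg.data
    if msg.cmd == 1:
        return "user speech starts" if d["SpeakStatus"] == 1 else "user speech ends"
    if msg.cmd == 3:
        return f"ASR: {d['Text']}"
    if msg.cmd == 4:
        return f"LLM: {d['Text']}"
    if msg.cmd == 6:
        return f"status: {AGENT_STATUS.get(d['Status'], 'Unknown')}"
    if msg.cmd == 102:
        return "metadata"
    return f"unhandled Cmd {msg.cmd}"


# Made-up sample payload that follows the field names documented above.
raw = (
    '{"Timestamp": 1765510379, "SeqId": 278800715, "Round": 510359002, '
    '"Cmd": 6, "Data": {"Status": 2, "OldStatus": 1, "Reason": ""}}'
)
msg = parse_message(raw)
print(msg.round_id, describe(msg))
```

Keeping `round_id` on every decoded message lets the UI layer decide per Round which status label to show.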
Server Callback
Callback Type List
| Callback Event | Trigger Timing | Purpose |
|---|---|---|
| ASRResult | User speech recognition completed | Get ASR recognition text |
| LLMResult | LLM generates response | Get AI response content |
| AgentInstanceStatus | Status change | Track AI current status (Listening/Thinking/Speaking/Idle) |
| AgentInstanceMetaInfo | Broadcast starts | Get metadata (voice, emotion, action, etc.) |
| Interrupted | Interrupted | Handle interruption logic |
| UserSpeakAction | User starts/stops speaking | Detect user speech |
| AgentSpeakAction | AI starts/stops speaking | Detect AI speech |
| UserAudioData | User starts speaking | Get audio data for the corresponding round |
| Exception | Error occurred | Error handling |
Callback Field Description (Key Fields)
Common fields for all callback events:
{
  "Event": "ASRResult", // Event type
  "RoomId": "room_123", // Room ID
  "AgentUserId": "agent_user_001", // Agent user ID
  "Sequence": 1234567890, // Event sequence number (globally incrementing)
  "Timestamp": 1746619200000, // Timestamp (milliseconds)
  "Data": { // Event-specific data
    "Round": 5, // [Core] Round number
    ...
  }
}
Detailed Documentation
For complete documentation, refer to: Server Callbacks
Getting Full-Chain Round Examples
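One way to rebuild the full chain on the business server is to group incoming callbacks by `Data.Round` and order each group by `Sequence`. The sketch below assumes the callbacks have already been parsed into dictionaries shaped like the common fields shown earlier; `group_by_round` is a hypothetical helper, and the sample values are made up.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_round(callbacks: List[Dict[str, Any]]) -> Dict[int, List[Dict[str, Any]]]:
    """Group server callbacks by Data.Round, each group ordered by Sequence."""
    groups: Dict[int, List[Dict[str, Any]]] = defaultdict(list)
    for cb in callbacks:
        round_id = cb.get("Data", {}).get("Round")
        if round_id is None:
            continue  # no Round in this payload; skip rather than guess
        groups[round_id].append(cb)
    for events in groups.values():
        events.sort(key=lambda cb: cb["Sequence"])
    return dict(sorted(groups.items()))


# Hypothetical payloads, deliberately delivered out of order.
callbacks = [
    {"Event": "LLMResult", "Sequence": 103, "Data": {"Round": 5}},
    {"Event": "AgentInstanceStatus", "Sequence": 201, "Data": {"Round": 6}},
    {"Event": "ASRResult", "Sequence": 101, "Data": {"Round": 5}},
]
chains = group_by_round(callbacks)
for round_id, events in chains.items():
    print(round_id, [cb["Event"] for cb in events])
```

Sorting inside each group means the chain reads in server order even when callbacks arrive at the business server in a different order.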
FAQ
Q1: When is Round generated?
A: Generated each time a new interaction is triggered, including:
- User starts speaking
- Calling SendAgentInstanceLLM
- Calling SendAgentInstanceTTS
In interruption scenarios, the new interaction triggers a new Round and the old Round terminates.
Q2: Will Round repeat?
A: No. Round is generated in ascending order and never repeats.
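Because Round values only increase, a receiver can compare an event's Round with the largest one seen so far to tell a new interaction from a late event of an earlier one. The following is a minimal sketch of that idea; `LatestRoundFilter` is a hypothetical helper, and whether late events should be ignored or still processed is an application decision that this document does not prescribe.

```python
from typing import Optional


class LatestRoundFilter:
    """Tracks the highest Round seen and flags events from older Rounds."""

    def __init__(self) -> None:
        self.latest: Optional[int] = None

    def accept(self, round_id: int) -> bool:
        """Return True if the event belongs to the newest Round seen so far."""
        if self.latest is None or round_id > self.latest:
            self.latest = round_id  # a larger value means a new interaction began
            return True
        return round_id == self.latest


round_filter = LatestRoundFilter()
results = [round_filter.accept(r) for r in (5, 5, 6, 5, 6)]
print(results)
```

In this run the fourth event (Round 5 arriving after Round 6 has started) is the only one flagged as stale.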
Q3: How to debug Round correlation issues?
A: It is recommended that the business side logs the Round value of each callback and views them grouped by Round:
[Round 5] ASRResult: "User question content"
[Round 5] LLMResult: "AI response content"
[Round 5] Status: Speaking → Idle
If a Round's events are incomplete or an expected Round is missing, check:
- Whether callback configuration is enabled (CallbackConfig)
- Whether the network is normal (whether callbacks are lost)
Q4: What is the difference between Round and Sequence?
A:
- Round: identifies which interaction an event belongs to; all events of the same interaction share the same Round
- Sequence: identifies the global order of events; it increases strictly across all events
The business side uses Round to correlate events of the same interaction and Sequence to detect missing events.
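A simple check that follows from Sequence increasing strictly is to flag any callback whose Sequence does not exceed the previous one. The sketch below does only that. It does not infer lost events from gaps, because this document does not state that consecutive Sequence values differ by exactly one. `SequenceMonitor` is a hypothetical helper.

```python
from typing import Optional


class SequenceMonitor:
    """Flags callbacks whose Sequence does not increase (hypothetical helper)."""

    def __init__(self) -> None:
        self.last: Optional[int] = None

    def check(self, sequence: int) -> str:
        """Classify a Sequence value relative to the last accepted one."""
        if self.last is not None and sequence <= self.last:
            return "duplicate" if sequence == self.last else "out_of_order"
        self.last = sequence
        return "ok"


monitor = SequenceMonitor()
statuses = [monitor.check(s) for s in (100, 101, 101, 99, 105)]
print(statuses)
```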
Q5: Will the interrupted Round still receive LLMResult?
A: No. If the LLM is still generating when the interruption occurs, the server stops generation and sends no further LLMResult callbacks for that Round.
After receiving the Interrupted event, the business side should clean up any pending state for the interrupted Round.
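The cleanup described above can be sketched as a small store keyed by Round. This is an illustration under assumptions: `PendingRounds` is a hypothetical class, the `Text` field inside the LLMResult `Data` is assumed rather than confirmed by this document, and the Interrupted payload is assumed to carry `Round` like the other callbacks.

```python
from typing import Any, Dict, List


class PendingRounds:
    """Holds partial per-Round state until the Round is interrupted."""

    def __init__(self) -> None:
        self.pending: Dict[int, List[str]] = {}

    def handle(self, callback: Dict[str, Any]) -> None:
        data = callback.get("Data", {})
        round_id = data.get("Round")
        if round_id is None:
            return
        event = callback["Event"]
        if event == "LLMResult":
            # Accumulate response text while the Round is still active.
            # "Text" is an assumed field name.
            self.pending.setdefault(round_id, []).append(data.get("Text", ""))
        elif event == "Interrupted":
            # No further LLMResult is expected for this Round, so drop its state.
            self.pending.pop(round_id, None)


state = PendingRounds()
state.handle({"Event": "LLMResult", "Data": {"Round": 5, "Text": "The weather"}})
state.handle({"Event": "LLMResult", "Data": {"Round": 6, "Text": "Welcome"}})
state.handle({"Event": "Interrupted", "Data": {"Round": 5}})
print(state.pending)
```

Only the interrupted Round is removed; state for other Rounds is left untouched.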
Appendix: Callback Configuration Example
When creating an Agent instance, enable relevant callbacks:
{
  "Action": "CreateAgentInstance",
  "AppId": 1234567890,
  "AgentId": "agent_001",
  "UserId": "user_001",
  "RTC": {
    "RoomId": "room_123",
    "AgentStreamId": "stream_001",
    "AgentUserId": "agent_user_001",
    "UserStreamId": "user_stream_001"
  },
  "CallbackConfig": {
    "ASRResult": 1, // Enable ASR callback
    "LLMResult": 1, // Enable LLM callback
    "Interrupted": 1, // Enable interruption callback
    "UserSpeakAction": 1, // Enable user speech callback
    "AgentSpeakAction": 1, // Enable AI speech callback
    "AgentInstanceStatus": 1 // Enable status callback
  }
}