Round Mechanism and Callback Tracking

2026-05-12

Overview

Round is the unique identifier for an AI Agent interaction chain. Every time a user speaks or an API is called, the server generates a Round value. All subsequent related callbacks (ASR recognition, LLM response, TTS playback, status changes, interruption events, etc.) carry this Round value, enabling the business side to accurately track "what happened in this interaction".

Applicable scenarios: AI chat companions, voice chat rooms, digital humans, intelligent customer service, and other scenarios that require tracking the complete conversation chain. Whether the user speaks proactively or the business side triggers AI responses, Round can link all callbacks together to easily handle complex situations like interruptions and queuing.

Core Concept: What is Round

Definition of Round

Round is an ascending sequence number generated by the server for each complete interaction. It never repeats.

One interaction = Complete flow from "trigger" to "end"
Trigger sources: User speech, calling SendAgentInstanceLLM, calling SendAgentInstanceTTS
End marker: AI completes the current round response (TTS playback finished or LLM response completed)

Purpose of Round

Round spans the entire interaction lifecycle. All callback events carry the Round field:

Trigger → Round N starts
  ├─ ASRResult (Round: N)        // ASR recognition result
  ├─ LLMResult (Round: N)        // LLM response content
  ├─ AgentInstanceStatus (Round: N)  // Status change (Listening→Thinking→Speaking→Idle)
  ├─ AgentInstanceMetaInfo (Round: N) // Metadata (voice, emotion, etc.)
  └─ End → Round N completed
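
As a rough illustration of how the shared Round field ties callbacks together, the following Python sketch groups already-parsed events by Round. The dictionary shape (`event`, `round`) and the function name `group_by_round` are assumptions made for this example, not part of the ZEGO API.

```python
from collections import defaultdict


def group_by_round(events):
    """Group event names by their "round" value, preserving arrival order."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[ev["round"]].append(ev["event"])
    return dict(grouped)


# Hypothetical, already-parsed events in arrival order.
events = [
    {"event": "ASRResult", "round": 5},
    {"event": "LLMResult", "round": 5},
    {"event": "AgentInstanceStatus", "round": 6},
    {"event": "AgentInstanceStatus", "round": 5},
]
timeline = group_by_round(events)
```

Keying on Round rather than on arrival time keeps events of overlapping interactions apart even when their callbacks interleave.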

When Round is Generated

Trigger Source	Round Generation Timing	Description
User speaks	User voice detected	Round is determined when ASR starts recognition
SendAgentInstanceLLM	API request succeeds	Server assigns Round immediately upon receiving the request
SendAgentInstanceTTS	API request succeeds	Server assigns Round immediately upon receiving the request

Diagrams

Data Flow Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Trigger Sources                          │
├─────────────────────────────┬───────────────────────────────────┤
│ User speaks proactively     │ Business side calls API           │
│ (RTC room voice)            │ SendAgentInstanceLLM/TTS          │
└───────────┬─────────────────┴──────────┬────────────────────────┘
            │                            │
            ▼                            ▼
    ┌────────────────────────────────────────┐
    │    ZEGO AI Agent Server                │
    │    Generate Round (ascending sequence) │
    └────────────────────────────────────────┘
                    │
        ┌───────────┼───────────┬───────────┐
        │           │           │           │
        ▼           ▼           ▼           ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
  │   ASR    │ │   LLM    │ │   TTS    │ │  Status  │
  │  Result  │ │  Result  │ │  Status  │ │  Change  │
  │ Round: N │ │ Round: N │ │ Round: N │ │ Round: N │
  └──────────┘ └──────────┘ └──────────┘ └──────────┘
        │           │           │           │
        └───────────┴───────────┴───────────┘
                    │
                    ▼
  ┌────────────────────────────────────┐
  │ Business Server Callbacks          │
  │ Correlate all events through Round │
  └────────────────────────────────────┘

Comparison of Two Data Flows

[Generation Timing 1: User speaks proactively]
User: "How's the weather today?" (Round 5)
  ↓
Server: ASR recognition → LLM thinking → TTS playback
  ↓
Callback sequence:
  - ASRResult (Round: 5, text: "How's the weather today?")
  - AgentInstanceStatus (Round: 5, status: "Thinking")
  - LLMResult (Round: 5, text: "The weather is sunny...")
  - AgentInstanceStatus (Round: 5, status: "Speaking")
  - AgentInstanceStatus (Round: 5, status: "Idle")

[Generation Timing 2: Business side calls API]
Business: SendAgentInstanceTTS("Welcome") (Round 6)
  ↓
Server: Direct TTS playback
  ↓
Callback sequence:
  - AgentInstanceStatus (Round: 6, status: "Speaking")
  - AgentInstanceMetaInfo (Round: 6, voice: "zh_female")
  - AgentInstanceStatus (Round: 6, status: "Idle")
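
The two flows above differ in which callbacks appear: the user-speech round contains an ASRResult, while the API-triggered round does not. Assuming that pattern holds in general (the source shows it only for these two examples), a round's trigger source could be guessed as in this sketch. `guess_trigger` is a hypothetical helper, not an API provided by the service.

```python
def guess_trigger(event_names):
    """Guess what started a round from the names of its callbacks.

    Heuristic only: assumes user-speech rounds always include ASRResult
    and API-triggered rounds never do.
    """
    return "user_speech" if "ASRResult" in event_names else "api_call"


# Callback names taken from the two example sequences above.
round_5 = ["ASRResult", "AgentInstanceStatus", "LLMResult", "AgentInstanceStatus"]
round_6 = ["AgentInstanceStatus", "AgentInstanceMetaInfo", "AgentInstanceStatus"]
```

When the business side itself calls SendAgentInstanceLLM or SendAgentInstanceTTS, recording that call alongside the resulting Round is likely more reliable than inferring the source afterwards.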

Getting Full-Chain Round

Client SDK Callback

Important: Clients can obtain Round information through the experimental API callbacks of the ZEGO Express SDK and use it to drive status-switching UI (for example, "Listening", "Thinking", "Speaking").

Cmd Types and Data Fields

Cmd	Type	Data Fields	Description
1	User speech status	SpeakStatus, UserId	SpeakStatus: 1=speech starts, 2=speech ends
3	ASR text	Text, MessageId, UserId, StartFlag, EndFlag	Incrementally delivers ASR recognition results
4	LLM response	Text, MessageId, EndFlag	Incrementally delivers LLM responses
6	Agent status	Status, OldStatus, Reason	Status: 0=Idle, 1=Listening, 2=Thinking, 3=Speaking
102	Metadata info	Object	Metadata info (voice, emotion, action, etc.)

Callback Field Description (Key Fields)

Common fields for all callback events:

{
  "Timestamp": 1765510379,           // Second-level timestamp
  "TimestampMs": 1765510379113,      // Millisecond-level timestamp
  "SeqId": 278800715,                // Packet sequence number; guarantees ordering, not consecutive
  "Round": 510359002,                // 【Core】Conversation round; generated in ascending order, not consecutive
  "Cmd": 1,                          // Command type (see the Cmd table above)
  "Data": { ... }                    // Command-specific content (see the Data Fields column above)
}
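
A minimal sketch of decoding a client-side message, assuming the payload arrives as a JSON string with the common fields shown above. The Cmd and Status mappings are copied from the table in this section; the function name `describe_message` and the sample payload are illustrative assumptions.

```python
import json

# Mappings copied from the Cmd table in this section.
CMD_NAMES = {
    1: "User speech status",
    3: "ASR text",
    4: "LLM response",
    6: "Agent status",
    102: "Metadata info",
}
STATUS_NAMES = {0: "Idle", 1: "Listening", 2: "Thinking", 3: "Speaking"}


def describe_message(raw):
    """Return (round, label) for one client-side message given as JSON text."""
    msg = json.loads(raw)
    cmd = msg["Cmd"]
    label = CMD_NAMES.get(cmd, f"Unknown Cmd {cmd}")
    if cmd == 6:
        # Agent status messages carry a numeric Status in Data.
        status = msg["Data"]["Status"]
        label += ": " + STATUS_NAMES.get(status, str(status))
    return msg["Round"], label


# Illustrative payload; real messages carry more fields.
sample = '{"SeqId": 278800715, "Round": 510359002, "Cmd": 6, "Data": {"Status": 2, "OldStatus": 1}}'
result = describe_message(sample)
```

A UI layer could key its "Listening", "Thinking", and "Speaking" indicators on the returned label and ignore messages whose Round is older than the one currently displayed.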

Detailed Documentation

For complete API documentation, refer to: AI Agent Instance SDK Callbacks

Server Callback

Callback Type List

Callback Event	Trigger Timing	Purpose
ASRResult	User speech recognition completed	Get ASR recognition text
LLMResult	LLM generates response	Get AI response content
AgentInstanceStatus	Status change	Track AI current status (Listening/Thinking/Speaking/Idle)
AgentInstanceMetaInfo	Broadcast starts	Get metadata (voice, emotion, action, etc.)
Interrupted	Interrupted	Handle interruption logic
UserSpeakAction	User starts/stops speaking	Detect user speech
AgentSpeakAction	AI starts/stops speaking	Detect AI speech
UserAudioData	User starts speaking	Get audio data for the corresponding round
Exception	Error occurred	Error handling

Callback Field Description (Key Fields)

Common fields for all callback events:

{
  "Event": "ASRResult",               // Event type
  "RoomId": "room_123",               // Room ID
  "AgentUserId": "agent_user_001",    // Agent user ID
  "Sequence": 1234567890,             // Event sequence number (globally incrementing)
  "Timestamp": 1746619200000,         // Timestamp (milliseconds)
  "Data": {                           // Event-specific data
    "Round": 5,                       // 【Core】Round number
    ...
  }
}
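
In the server callback, Round sits inside Data rather than at the top level. The sketch below shows one way to pull out the correlation keys, assuming the callback body is delivered as JSON text. `extract_keys` is a hypothetical helper, and the sample body is made up to match the fields shown above.

```python
import json


def extract_keys(body):
    """Return (event, round, sequence) from a server callback body (JSON text).

    Round is read from Data; it is None if the event carries no Round.
    """
    payload = json.loads(body)
    data = payload.get("Data") or {}
    return payload["Event"], data.get("Round"), payload["Sequence"]


# Sample body built from the common fields shown above.
sample_body = json.dumps({
    "Event": "ASRResult",
    "RoomId": "room_123",
    "AgentUserId": "agent_user_001",
    "Sequence": 1234567890,
    "Timestamp": 1746619200000,
    "Data": {"Round": 5},
})
keys = extract_keys(sample_body)
```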

Detailed Documentation

For complete documentation, refer to: Server Callbacks

Getting Full-Chain Round Examples

FAQ

Q1: When is Round generated?

A: Generated each time a new interaction is triggered, including:

User starts speaking
Calling SendAgentInstanceLLM
Calling SendAgentInstanceTTS

In interruption scenarios, the new interaction triggers a new Round and the old Round terminates.

Q2: Will Round repeat?

A: No. Round is generated in ascending order and never repeats.

Q3: How to debug Round correlation issues?

A: The business side is advised to log the Round value of each callback and view the logs grouped by Round:

[Round 5] ASRResult: "User question content"
[Round 5] LLMResult: "AI response content"
[Round 5] Status: Speaking → Idle

Round values are ascending but not consecutive, so a gap between two Round values is not an error by itself. If events within a Round are missing, check:

Whether callback configuration is enabled (CallbackConfig)
Whether the network is normal (whether callbacks are lost)
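
The grouped log view suggested in this answer could be produced along these lines. This is a sketch under the assumption that each callback has already been reduced to a dict with `round`, `sequence`, `event`, and `summary` keys; those key names are chosen for this example only.

```python
def format_grouped(events):
    """Return log lines sorted by Round, then by Sequence within a Round."""
    ordered = sorted(events, key=lambda e: (e["round"], e["sequence"]))
    return [f'[Round {e["round"]}] {e["event"]}: {e["summary"]}' for e in ordered]


# Hypothetical events, deliberately out of order.
events = [
    {"round": 6, "sequence": 14, "event": "Status", "summary": "Idle → Speaking"},
    {"round": 5, "sequence": 11, "event": "LLMResult", "summary": '"AI response content"'},
    {"round": 5, "sequence": 10, "event": "ASRResult", "summary": '"User question content"'},
]
lines = format_grouped(events)
```

Sorting on the pair (Round, Sequence) keeps each interaction contiguous in the output while preserving event order inside it, which makes a missing callback easier to spot.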

Q4: What is the difference between Round and Sequence?

Round: Identifies "which interaction", all events of the same interaction have the same Round
Sequence: Identifies "global order of events", all events strictly increment

The business side uses Round to correlate events of the same interaction and uses Sequence to detect missing events.
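
One way to use Sequence for loss detection is sketched below. It assumes that Sequence increases by exactly 1 per event, which the text above does not state; if values merely increase, only the ordering check is meaningful. `check_sequence` is a hypothetical helper.

```python
def check_sequence(sequences, step=1):
    """Scan Sequence values in arrival order.

    Returns (out_of_order, gaps):
      out_of_order: values that are not greater than the largest seen so far
      gaps: (previous, current) pairs where the jump exceeds `step`
    ASSUMPTION: a jump larger than `step` means an event has not arrived.
    """
    out_of_order, gaps = [], []
    highest = None
    for seq in sequences:
        if highest is not None and seq <= highest:
            out_of_order.append(seq)
            continue
        if highest is not None and seq - highest > step:
            gaps.append((highest, seq))
        highest = seq
    return out_of_order, gaps


late, missing = check_sequence([100, 101, 103, 102, 104])
```

In this run the value 102 arrives after 103, so it is reported both as a gap between 101 and 103 and as a late arrival. A caller would reconcile the two lists before treating an event as lost.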

Q5: Will the interrupted Round still receive LLMResult?

A: No. If the LLM is still generating when the interruption occurs, the server stops generation and does not send the LLMResult callback.

After receiving the Interrupted event, the business side should clean up any pending state for the interrupted Round.
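
A possible shape for that cleanup is sketched below. It assumes the Interrupted callback carries the Round of the interaction that was cut off, in the same way other events carry Round; the source does not spell this out. `PendingRounds` and its methods are hypothetical names.

```python
class PendingRounds:
    """Tracks rounds that are still waiting for an LLMResult."""

    def __init__(self):
        self._pending = {}

    def on_event(self, event, round_id, text=""):
        if event == "ASRResult":
            # A user question arrived; an LLM reply is now expected.
            self._pending[round_id] = text
        elif event == "LLMResult":
            # The reply arrived; nothing is pending for this round.
            self._pending.pop(round_id, None)
        elif event == "Interrupted":
            # ASSUMPTION: round_id is the Round that was interrupted.
            # No LLMResult will follow, so stop waiting for it.
            self._pending.pop(round_id, None)

    def waiting(self):
        """Return the Round values still waiting, in ascending order."""
        return sorted(self._pending)


tracker = PendingRounds()
tracker.on_event("ASRResult", 5, "How's the weather today?")
tracker.on_event("ASRResult", 6, "Never mind")
tracker.on_event("Interrupted", 5)
```

After these calls only Round 6 is still waiting, so a timeout or retry policy would no longer fire for the interrupted Round 5.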

Appendix: Callback Configuration Example

When creating an Agent instance, enable relevant callbacks:

{
  "Action": "CreateAgentInstance",
  "AppId": 1234567890,
  "AgentId": "agent_001",
  "UserId": "user_001",
  "RTC": {
    "RoomId": "room_123",
    "AgentStreamId": "stream_001",
    "AgentUserId": "agent_user_001",
    "UserStreamId": "user_stream_001"
  },
  "CallbackConfig": {
    "ASRResult": 1,              // Enable ASR callback
    "LLMResult": 1,              // Enable LLM callback
    "Interrupted": 1,            // Enable interruption callback
    "UserSpeakAction": 1,        // Enable user speech callback
    "AgentSpeakAction": 1,       // Enable AI speech callback
    "AgentInstanceStatus": 1     // Enable status callback
  }
}
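
The // comments in the example above are explanatory only; standard JSON does not allow comments, so they have to be left out of a real request body. Below is a sketch of building the CallbackConfig part programmatically. `build_callback_config` is a hypothetical helper, and 1 is used as the "enabled" value exactly as in the example.

```python
import json


def build_callback_config(enabled_events):
    """Map each callback name to 1, the value the example uses for 'enabled'."""
    return {name: 1 for name in enabled_events}


config = build_callback_config(
    ["ASRResult", "LLMResult", "Interrupted", "AgentInstanceStatus"]
)
# Serializes to comment-free JSON suitable for embedding in a request body.
body_fragment = json.dumps({"CallbackConfig": config}, indent=2)
```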
