Round Mechanism and Callback Tracking

2026-05-12

Overview

Round is the unique identifier for an AI Agent interaction chain. Each time a user speaks or the business side calls a triggering API, the server generates a new Round value. Every subsequent related callback (ASR recognition, LLM response, TTS playback, status change, interruption event, and so on) carries that Round value, so the business side can track exactly what happened in each interaction.

Applicable scenarios: AI chat companions, voice chat rooms, digital humans, intelligent customer service, and other scenarios that require tracking the complete conversation chain. Whether the user speaks proactively or the business side triggers the AI response, Round links all related callbacks together, which simplifies handling of complex situations such as interruptions and queuing.


Core Concept: What is Round

Definition of Round

Round is an ascending sequence number generated by the server for each complete interaction. It never repeats.

  • One interaction = Complete flow from "trigger" to "end"
  • Trigger sources: User speech, calling SendAgentInstanceLLM, calling SendAgentInstanceTTS
  • End marker: The AI completes its response for the current round (TTS playback finished or LLM response completed)

Purpose of Round

Round spans the entire interaction lifecycle. All callback events carry the Round field:

Trigger → Round N starts
  ├─ ASRResult (Round: N)        // ASR recognition result
  ├─ LLMResult (Round: N)        // LLM response content
  ├─ AgentInstanceStatus (Round: N)  // Status change (Listening→Thinking→Speaking→Idle)
  ├─ AgentInstanceMetaInfo (Round: N) // Metadata (voice, emotion, etc.)
  └─ End → Round N completed

When Round is Generated

| Trigger Source | Round Generation Timing | Description |
| --- | --- | --- |
| User speaks | User voice detected | Round is determined when ASR starts recognition |
| SendAgentInstanceLLM | API request succeeds | Server assigns Round immediately upon receiving the request |
| SendAgentInstanceTTS | API request succeeds | Server assigns Round immediately upon receiving the request |

Diagrams

Data Flow Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Trigger Sources                          │
├───────────────────────────┬─────────────────────────────────────┤
│  User speaks proactively  │  Business side calls API            │
│  (RTC room voice)         │  SendAgentInstanceLLM/TTS           │
└───────────┬───────────────┴────────────┬────────────────────────┘
            │                            │
            ▼                            ▼
    ┌────────────────────────────────────────┐
    │  ZEGO AI Agent Server                  │
    │  Generate Round (ascending sequence)   │
    └────────────────────────────────────────┘
                    │
        ┌───────────┼───────────┬───────────┐
        │           │           │           │
        ▼           ▼           ▼           ▼
  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
  │   ASR    │ │   LLM    │ │   TTS    │ │  Status  │
  │  Result  │ │  Result  │ │  Status  │ │  Change  │
  │ Round: N │ │ Round: N │ │ Round: N │ │ Round: N │
  └──────────┘ └──────────┘ └──────────┘ └──────────┘
        │           │           │           │
        └───────────┴───────────┴───────────┘
                    │
                    ▼
 ┌──────────────────────────────────────┐
 │  Business Server Callbacks           │
 │  Correlate all events through Round  │
 └──────────────────────────────────────┘

Comparison of Two Data Flows

[Generation Timing 1: User speaks proactively]
User: "How's the weather today?" (Round 5)
  ↓
Server: ASR recognition → LLM thinking → TTS playback
  ↓
Callback sequence:
  - ASRResult (Round: 5, text: "How's the weather today?")
  - AgentInstanceStatus (Round: 5, status: "Thinking")
  - LLMResult (Round: 5, text: "The weather is sunny...")
  - AgentInstanceStatus (Round: 5, status: "Speaking")
  - AgentInstanceStatus (Round: 5, status: "Idle")

[Generation Timing 2: Business side calls API]
Business: SendAgentInstanceTTS("Welcome") (Round 6)
  ↓
Server: Direct TTS playback
  ↓
Callback sequence:
  - AgentInstanceStatus (Round: 6, status: "Speaking")
  - AgentInstanceMetaInfo (Round: 6, voice: "zh_female")
  - AgentInstanceStatus (Round: 6, status: "Idle")

Getting Full-Chain Round

Client SDK Callback

Important: Clients can obtain Round information through the experimental API callbacks of the ZEGO Express SDK and use it to implement status-switching UI (for example, "Listening", "Thinking", "Speaking").

Cmd Types and Data Fields

| Cmd | Type | Data Fields | Description |
| --- | --- | --- | --- |
| 1 | User speech status | SpeakStatus, UserId | SpeakStatus: 1=speech starts, 2=speech ends |
| 3 | ASR text | Text, MessageId, UserId, StartFlag, EndFlag | Incrementally delivers ASR recognition results |
| 4 | LLM response | Text, MessageId, EndFlag | Incrementally delivers LLM responses |
| 6 | Agent status | Status, OldStatus, Reason | Status: 0=Idle, 1=Listening, 2=Thinking, 3=Speaking |
| 102 | Metadata info | Object | Metadata info (voice, emotion, action, etc.) |

Callback Field Description (Key Fields)

Common fields for all callback events:

{
  "Timestamp": 1765510379,           // Timestamp in seconds
  "TimestampMs": 1765510379113,      // Timestamp in milliseconds
  "SeqId": 278800715,                // Packet sequence number, guarantees ordering, not consecutive
  "Round": 510359002,                // [Core] Conversation round, generated in ascending order, not consecutive
  "Cmd": 1,                          // Command type
  "Data": { ... }                    // Cmd-specific content, see the Cmd table above
}
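
As a rough illustration of how a client might consume these messages, the following Python sketch parses one message and turns it into a readable label. The Cmd values and Status codes come from the Cmd table above. The names RoundMessage, parse_message, describe, and the labels in CMD_NAMES are hypothetical, and the sketch assumes the message arrives as a JSON string; the actual SDK callback signature is not covered here.

```python
import json
from dataclasses import dataclass
from typing import Any

# Cmd values and Status codes from the Cmd table; the label strings are made up.
CMD_NAMES = {1: "UserSpeechStatus", 3: "ASRText", 4: "LLMResponse",
             6: "AgentStatus", 102: "MetaInfo"}
STATUS_NAMES = {0: "Idle", 1: "Listening", 2: "Thinking", 3: "Speaking"}


@dataclass(frozen=True)
class RoundMessage:
    round_id: int
    seq_id: int
    cmd: int
    data: dict[str, Any]


def parse_message(raw: str) -> RoundMessage:
    """Parse one message (assumed to be a JSON string) into a typed record."""
    obj = json.loads(raw)
    return RoundMessage(
        round_id=obj["Round"],
        seq_id=obj["SeqId"],
        cmd=obj["Cmd"],
        data=obj.get("Data", {}),
    )


def describe(msg: RoundMessage) -> str:
    """Build a short label suitable for a status UI or a log line."""
    if msg.cmd == 6:  # Agent status change
        status = STATUS_NAMES.get(msg.data.get("Status"), "Unknown")
        return f"[Round {msg.round_id}] AgentStatus -> {status}"
    name = CMD_NAMES.get(msg.cmd, f"Unknown({msg.cmd})")
    return f"[Round {msg.round_id}] {name}"
```

A UI layer could call describe on every incoming message and switch its indicator whenever the label for the current Round changes.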

Detailed Documentation

For complete API documentation, refer to: AI Agent Instance SDK Callbacks


Server Callback

Callback Type List

| Callback Event | Trigger Timing | Purpose |
| --- | --- | --- |
| ASRResult | User speech recognition completed | Get ASR recognition text |
| LLMResult | LLM generates response | Get AI response content |
| AgentInstanceStatus | Status change | Track AI current status (Listening/Thinking/Speaking/Idle) |
| AgentInstanceMetaInfo | Broadcast starts | Get metadata (voice, emotion, action, etc.) |
| Interrupted | Interrupted | Handle interruption logic |
| UserSpeakAction | User starts/stops speaking | Detect user speech |
| AgentSpeakAction | AI starts/stops speaking | Detect AI speech |
| UserAudioData | User starts speaking | Get audio data for the corresponding round |
| Exception | Error occurred | Error handling |

Callback Field Description (Key Fields)

Common fields for all callback events:

{
  "Event": "ASRResult",               // Event type
  "RoomId": "room_123",               // Room ID
  "AgentUserId": "agent_user_001",    // Agent user ID
  "Sequence": 1234567890,             // Event sequence number (globally incrementing)
  "Timestamp": 1746619200000,         // Timestamp (milliseconds)
  "Data": {                           // Event-specific data
    "Round": 5,                       // [Core] Round number
    ...
  }
}
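
To show how the business server might correlate events through Round, here is a minimal Python sketch that files each callback under its Data.Round and returns the events of one Round ordered by Sequence. The class name RoundTimeline and its methods are hypothetical. The sketch assumes the callback body has already been decoded into a dict with the common fields shown above, and it leaves out HTTP handling and signature verification.

```python
from collections import defaultdict
from typing import Any, Optional


class RoundTimeline:
    """Groups server callbacks by Round (hypothetical helper)."""

    def __init__(self) -> None:
        self._events: dict[int, list[dict[str, Any]]] = defaultdict(list)

    def handle_callback(self, payload: dict[str, Any]) -> Optional[int]:
        """Store one decoded callback; return its Round, or None if absent."""
        round_id = payload.get("Data", {}).get("Round")
        if round_id is None:
            return None
        self._events[round_id].append(payload)
        return round_id

    def events_for(self, round_id: int) -> list[str]:
        """Event names of one Round, ordered by the Sequence field."""
        ordered = sorted(self._events.get(round_id, []),
                         key=lambda p: p["Sequence"])
        return [p["Event"] for p in ordered]
```

Sorting by Sequence inside a Round means the timeline stays readable even if callbacks reach the business server out of order.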

Detailed Documentation

For complete documentation, refer to: Server Callbacks


Getting Full-Chain Round Examples


FAQ

Q1: When is Round generated?

A: Generated each time a new interaction is triggered, including:

  • User starts speaking
  • Calling SendAgentInstanceLLM
  • Calling SendAgentInstanceTTS

In interruption scenarios, the new interaction triggers a new Round and the old Round terminates.

Q2: Will Round repeat?

A: No. Round is generated in ascending order and never repeats.

Q3: How do I debug Round correlation issues?

A: Log the Round value of every callback on the business side and review the logs grouped by Round:

[Round 5] ASRResult: "User question content"
[Round 5] LLMResult: "AI response content"
[Round 5] Status: Speaking → Idle

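Round values are ascending but not necessarily consecutive, so a gap between two Round values is not by itself a problem. If events are missing for a Round, check:

  1. Whether the corresponding callbacks are enabled (CallbackConfig)
  2. Whether the network is healthy (callbacks may have been lost)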
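
One way to spot missing events during debugging is to compare the event types seen per Round against a set the caller expects. The Python sketch below does that. The function name rounds_missing_events is hypothetical, and the required set is chosen by the caller: for example, a Round triggered by SendAgentInstanceTTS would not be expected to carry ASRResult.

```python
from collections import defaultdict
from typing import Any, Iterable


def rounds_missing_events(
    payloads: Iterable[dict[str, Any]], required: set[str]
) -> dict[int, list[str]]:
    """Map each Round to the required event types it never received."""
    seen: dict[int, set[str]] = defaultdict(set)
    for payload in payloads:
        round_id = payload.get("Data", {}).get("Round")
        if round_id is not None:
            seen[round_id].add(payload["Event"])
    return {
        round_id: sorted(required - events)
        for round_id, events in sorted(seen.items())
        if not required <= events  # keep only Rounds with something missing
    }
```

Rounds that appear in the result are the ones worth checking against CallbackConfig and the network logs.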

Q4: What is the difference between Round and Sequence?

A:

  • Round: Identifies which interaction an event belongs to; all events of the same interaction share the same Round
  • Sequence: Identifies the global order of events; it strictly increases from one event to the next

The business side uses Round to correlate events of the same interaction and uses Sequence to detect missing events.
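
A Sequence check could look like the following Python sketch. It always reports values that fail to increase (duplicates or out-of-order delivery). Gap detection is optional and rests on an assumption not stated in this document, namely that Sequence advances by exactly 1 within the stream being checked; leave expect_consecutive off if that does not hold. The function name audit_sequence is hypothetical.

```python
from typing import Iterable, Optional


def audit_sequence(
    sequences: Iterable[int], expect_consecutive: bool = False
) -> list[tuple[str, int, int]]:
    """Return (kind, previous, current) tuples for suspicious Sequence values."""
    anomalies: list[tuple[str, int, int]] = []
    last: Optional[int] = None
    for seq in sequences:
        if last is not None:
            if seq <= last:
                anomalies.append(("out_of_order_or_duplicate", last, seq))
                continue  # keep `last` as the highest value seen so far
            # ASSUMPTION: a step larger than 1 means an event was lost.
            if expect_consecutive and seq != last + 1:
                anomalies.append(("gap", last, seq))
        last = seq
    return anomalies
```

Feeding it the Sequence values in arrival order, per room or per agent instance, keeps unrelated streams from producing false gaps.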

Q5: Will an interrupted Round still receive LLMResult?

A: No. If the LLM is still generating when the interruption occurs, the server stops generation and does not send the LLMResult callback.

The business side should clean up the pending state of the interrupted Round after receiving the Interrupted event.
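
The cleanup described above might be sketched in Python as follows. The class PendingRounds and the state label awaiting_llm are hypothetical. The sketch assumes that the Round carried in the Interrupted event's Data identifies the interrupted Round, and it simplifies by treating the first LLMResult as resolving the Round.

```python
from typing import Any, Optional


class PendingRounds:
    """Tracks Rounds that are still waiting for an LLM response."""

    def __init__(self) -> None:
        self._pending: dict[int, str] = {}

    def on_callback(self, payload: dict[str, Any]) -> None:
        round_id: Optional[int] = payload.get("Data", {}).get("Round")
        if round_id is None:
            return
        event = payload.get("Event")
        if event == "ASRResult":
            # A user utterance was recognized; an LLM response is expected.
            self._pending[round_id] = "awaiting_llm"
        elif event in ("LLMResult", "Interrupted"):
            # Either the response arrived, or it never will: drop the state.
            # ASSUMPTION: Interrupted carries the Round that was interrupted.
            self._pending.pop(round_id, None)

    def pending_rounds(self) -> list[int]:
        return sorted(self._pending)
```

Without the Interrupted branch, an interrupted Round would stay in the pending map indefinitely, because its LLMResult is never delivered.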


Appendix: Callback Configuration Example

When creating an Agent instance, enable the relevant callbacks:

{
  "Action": "CreateAgentInstance",
  "AppId": 1234567890,
  "AgentId": "agent_001",
  "UserId": "user_001",
  "RTC": {
    "RoomId": "room_123",
    "AgentStreamId": "stream_001",
    "AgentUserId": "agent_user_001",
    "UserStreamId": "user_stream_001"
  },
  "CallbackConfig": {
    "ASRResult": 1,              // Enable ASR callback
    "LLMResult": 1,              // Enable LLM callback
    "Interrupted": 1,            // Enable interruption callback
    "UserSpeakAction": 1,        // Enable user speech callback
    "AgentSpeakAction": 1,       // Enable AI speech callback
    "AgentInstanceStatus": 1     // Enable status callback
  }
}
