
Get AI Agent Status and Latency Data


During real-time voice calls with AI Agents, you may need the AI agent instance's status, or real-time notifications when it changes, so that your business logic can respond promptly and remain stable. You can obtain this information through active API calls or by listening for the corresponding server callbacks.

The information includes the following types:

  • Server exception events: including AI Agent service errors, Real-Time Communication (RTC) related errors, Large Language Model (LLM) related errors, Text-to-Speech (TTS) related errors (such as TTS concurrency limit exceeded), etc.
  • AI agent instance status:
    • Status that can be queried via server API: idle, listening, thinking, speaking, etc.
    • Status that can be monitored via server callbacks: agent instance creation success, interruption, and deletion success events.
  • AI agent average latency data:
    • Large Language Model (LLM) related latency.
    • Text-to-Speech (TTS) related latency.
    • AI Agent server total latency.
    • Client and server latency, which can be obtained through the ZEGO Express SDK.

Listen for Server Exception Events

Note
Please contact ZEGOCLOUD Technical Support to configure the address for receiving AI Agent backend callbacks.

When an exception event occurs on the server, the AI Agent backend sends an exception event notification (Event is Exception) to the address configured above. Here is a sample callback payload:

{
    "AppId": 123456789,
    "Event": "Exception",
    "Nonce": "abcdd22113",
    "Timestamp": 1741221508000,
    "Signature": "XXXXXXX",
    "Sequence": 1921825797275873300,
    "RoomId": "test_room",
    "AgentUserId": "test_agent",
    "AgentInstanceId": "1912124734317838336",
    "Data": {
        "Code": 2203,
        "Message": "The API key in the request is missing or invalid"
    }
}

For more detailed information, please refer to the Receiving Callback and Exception Codes documentation.
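As a starting point, the callback receiver can be sketched as a small dispatcher that parses the payload and routes on the Event field. The field names below (Event, RoomId, Data.Code, Data.Message) follow the sample above; the handler function name and the returned alert string are illustrative, not part of the ZEGOCLOUD API.

```python
import json

def handle_agent_callback(body: str) -> str:
    """Parse an AI Agent backend callback and route it by Event type."""
    payload = json.loads(body)
    event = payload.get("Event")
    if event == "Exception":
        data = payload.get("Data", {})
        # Surface the error code/message to your own monitoring pipeline.
        return (f"exception in room {payload.get('RoomId')}: "
                f"code={data.get('Code')} message={data.get('Message')}")
    # Other events (e.g. AgentInstanceDeleted) can be routed here.
    return f"unhandled event: {event}"

# Exercise the dispatcher with the sample payload from the docs.
sample = json.dumps({
    "AppId": 123456789,
    "Event": "Exception",
    "RoomId": "test_room",
    "AgentInstanceId": "1912124734317838336",
    "Data": {"Code": 2203,
             "Message": "The API key in the request is missing or invalid"},
})
print(handle_agent_callback(sample))
```

Remember to verify the callback's Signature before trusting the payload; the signing rules are described in the Receiving Callback documentation.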

Get Agent Instance Status

Via Server API

Call the Query Agent Instance Status API (QueryAgentInstanceStatus), pass in the corresponding AgentInstanceId, and the server will return the current status of the AI agent instance (such as idle, listening, thinking, speaking, etc.).

Note
The AgentInstanceId field is included in the successful response when you create an agent instance (CreateAgentInstance).
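A status query can be sketched as building a signed request with Action set to QueryAgentInstanceStatus and the AgentInstanceId in the body. The base URL, common query parameters, and signature value below are placeholders based on the usual ZEGOCLOUD server API pattern; generate a real signature per the server API signing rules before sending.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; confirm against the server API reference.
BASE_URL = "https://aigc-aiagent-api.zegotech.cn"

def build_query_status_request(app_id: int, agent_instance_id: str,
                               signature: str, nonce: str, timestamp: int):
    """Return (url, json_body) for a QueryAgentInstanceStatus call."""
    params = {
        "Action": "QueryAgentInstanceStatus",
        "AppId": app_id,
        "SignatureNonce": nonce,
        "Timestamp": timestamp,
        "Signature": signature,  # placeholder; compute per the signing rules
    }
    body = json.dumps({"AgentInstanceId": agent_instance_id})
    return f"{BASE_URL}/?{urlencode(params)}", body

url, body = build_query_status_request(
    123456789, "1912124734317838336", "sig_placeholder", "abcdd22113", 1741221508)
```

The response's Status field then tells you whether the agent is idle, listening, thinking, or speaking, so you can gate business actions (for example, only allowing a new question once the agent is idle).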

Get Agent Latency Data

Note
Please contact ZEGOCLOUD Technical Support to configure the address for receiving AI Agent backend callbacks.

When an agent instance is successfully deleted, the AgentInstanceDeleted event will be triggered, which includes average latency data for conversations with the agent instance.

AgentInstanceDeleted callback data example
{
    "AppId": 1234567,
    "AgentInstanceId": "1912124734317838336",
    "AgentUserId": "agent_user_1",
    "RoomId": "room_1",
    "Sequence": 1234567890,
    "Data": {
        "Code": 0,
        "DeletedTimestamp": 1745502345138,
        "LatencyData": {
            "LLMTTFT": 613,
            "LLMTPS": 11.493,
            "TTSAudioFirstFrameTime": 783,
            "TotalCost": 1693
        }
    },
    "Event": "AgentInstanceDeleted",
    "Nonce": "7450395512627324902",
    "Signature": "fd9c1ce54e85bd92f48b0a805e82a52b0c0c6445",
    "Timestamp": 1745502313000
}

The latency data (average values) are defined as follows:

(Figure: AI Agent latency analysis)
  • LLMTTFT (Int): LLM first-token average latency in milliseconds. The time from sending the request to the Large Language Model until it returns the first non-empty token.
  • LLMTPS (Float64): LLM average output speed in tokens/second. The average number of tokens the Large Language Model outputs per second.
  • TTSAudioFirstFrameTime (Int): TTS audio first-frame average latency in milliseconds. The time from the first non-empty LLM token to the first non-silent TTS frame (including request establishment time).
  • TotalCost (Int): AI Agent server average total latency in milliseconds:
    • User speaking: the time from when the AI Agent server pulls the stream and determines the user has finished speaking to when TTS returns the first non-silent frame and stream pushing starts. This covers all server-side latency, including at least Automatic Speech Recognition (ASR), Large Language Model (LLM), and Text-to-Speech (TTS) latency.
    • Custom LLM/TTS calls: the time from the API call to the start of stream pushing.
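Putting the definitions above to work, a small helper can pull the average latency figures out of an AgentInstanceDeleted payload for logging or dashboards. The field names match the callback example earlier; the snake_case output keys are illustrative.

```python
def extract_latency(payload: dict) -> dict:
    """Map LatencyData fields from an AgentInstanceDeleted callback
    to descriptive metric names (all averages, times in ms)."""
    latency = payload.get("Data", {}).get("LatencyData", {})
    return {
        "llm_first_token_ms": latency.get("LLMTTFT"),
        "llm_tokens_per_sec": latency.get("LLMTPS"),
        "tts_first_frame_ms": latency.get("TTSAudioFirstFrameTime"),
        "server_total_ms": latency.get("TotalCost"),
    }

# Sample values taken from the callback example above.
deleted_event = {
    "Event": "AgentInstanceDeleted",
    "Data": {
        "Code": 0,
        "LatencyData": {"LLMTTFT": 613, "LLMTPS": 11.493,
                        "TTSAudioFirstFrameTime": 783, "TotalCost": 1693},
    },
}
metrics = extract_latency(deleted_event)
```

Because these values are averaged over the whole conversation, they are best used for trend monitoring rather than per-turn diagnostics.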
