This article explains how to display subtitles during a voice call between a user and an AI agent. Subtitles cover both sides of the conversation:
User's speech: Stream the user's spoken content as ASR recognizes it in real time.
AI agent's speech: Stream the AI agent's output as the LLM generates it in real time.
Prerequisites
You should have already integrated the ZEGO Express SDK and the ZEGOCLOUD AI Agent, and implemented a basic voice-call feature following the Quick Start doc.
Quick Implementation
During voice conversations between users and AI agents, the ZEGOCLOUD AI Agent server sends ASR recognition text and LLM response text via custom messages in the RTC room to the client. By listening for these custom messages, the client can parse the status events and render the UI.
The processing flowchart for RTC room custom messages is as follows:
Listening to Custom Room Messages
By listening to the onRecvExperimentalAPI callback, the client can obtain custom room messages whose method field is liveroom.room.on_recive_room_channel_message. Below is an example of the listener callback code:
// WARNING: Data received through custom room messages may arrive out of order; sort it by the SeqId field.
// Note: gson below is assumed to be a com.google.gson.Gson instance (for example, new Gson()).
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    public void onRecvExperimentalAPI(String content) {
        super.onRecvExperimentalAPI(content);
        try {
            // Step 1: Parse the content into a JSONObject
            JSONObject json = new JSONObject(content);
            // Step 2: Check the value of the method field
            if (json.has("method") && json.getString("method")
                    .equals("liveroom.room.on_recive_room_channel_message")) {
                // Step 3: Get and parse params
                JSONObject paramsObject = json.getJSONObject("params");
                String msgContent = paramsObject.getString("msg_content");
                // JSON string example: "{\"Timestamp\":1745224717,\"SeqId\":1467995418,\"Round\":2132219714,\"Cmd\":3,\"Data\":{\"MessageId\":\"2135894567\",\"Text\":\"你\",\"EndFlag\":false}}"
                // Parse the JSON string into an AudioChatMessage object
                AudioChatMessage chatMessage = gson.fromJson(msgContent, AudioChatMessage.class);
                if (chatMessage == null) {
                    // Gson returns null for an empty or "null" payload
                    return;
                }
                if (chatMessage.cmd == 3) {
                    // Cmd 3: ASR text
                    updateASRChatMessage(chatMessage);
                } else if (chatMessage.cmd == 4) {
                    // Cmd 4: LLM text
                    addOrUpdateLLMChatMessage(chatMessage);
                }
            }
        } catch (JSONException | com.google.gson.JsonSyntaxException e) {
            // Malformed outer JSON or malformed msg_content
            e.printStackTrace();
        }
    }
});

/**
 * Structure of the in-room chat messages that the backend server sends for the voice chat UI.
 */
public static class AudioChatMessage {
    @SerializedName("Timestamp")
    public long timestamp;
    @SerializedName("SeqId")
    public int seqId;
    @SerializedName("Round")
    public int round;
    @SerializedName("Cmd")
    public int cmd;
    @SerializedName("Data")
    public Data data;

    public static class Data {
        @SerializedName("SpeakStatus")
        public int speakStatus;
        @SerializedName("Text")
        public String text;
        @SerializedName("MessageId")
        public String messageId;
        @SerializedName("EndFlag")
        public boolean endFlag;
    }
}
By implementing the ZegoEventHandler protocol and listening to the onRecvExperimentalAPI callback, the client can obtain custom room messages whose method field is liveroom.room.on_recive_room_channel_message. Below is an example of the callback listener code:
import 'dart:convert';

import 'package:flutter/cupertino.dart';
import 'package:zego_express_engine/zego_express_engine.dart';

// YourMessageView, handleRecvAsrMessage, and handleRecvLLMMessage are placeholders for your own implementation.
class YourPage extends StatefulWidget {
  const YourPage({super.key});

  @override
  State<YourPage> createState() => _YourPageState();
}

class _YourPageState extends State<YourPage> {
  @override
  void initState() {
    super.initState();
    ZegoExpressEngine.onRecvExperimentalAPI = onRecvExperimentalAPI;
  }

  @override
  void dispose() {
    // Unregister the callback first; super.dispose() should be the last call.
    ZegoExpressEngine.onRecvExperimentalAPI = null;
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return YourMessageView();
  }

  /// WARNING: Data received through custom room messages may arrive out of order; sort it by the SeqId field.
  void onRecvExperimentalAPI(String content) {
    try {
      /// Check if it is a room message
      final contentMap = jsonDecode(content);
      if (contentMap['method'] !=
          'liveroom.room.on_recive_room_channel_message') {
        return;
      }
      final params = contentMap['params'];
      if (params == null) {
        return;
      }
      final msgContent = params['msg_content'];
      if (msgContent == null) {
        return;
      }
      handleMessageContent(msgContent);
    } catch (e) {
      // Log instead of silently swallowing malformed messages
      debugPrint('Failed to handle room message: $e');
    }
  }

  /// Handle message content
  void handleMessageContent(String msgContent) {
    final Map<String, dynamic> json = jsonDecode(msgContent);

    /// Parse the basic fields
    final int timestamp = json['Timestamp'] ?? 0;
    final int seqId = json['SeqId'] ?? 0;
    final int round = json['Round'] ?? 0;
    final int cmdType = json['Cmd'] ?? 0;
    final Map<String, dynamic> data =
        json['Data'] != null ? Map<String, dynamic>.from(json['Data']) : {};

    // Handle messages based on command type
    switch (cmdType) {
      case 3:
        /// ASR text
        handleRecvAsrMessage(data, seqId, round, timestamp);
        break;
      case 4:
        /// LLM text
        handleRecvLLMMessage(data, seqId, round, timestamp);
        break;
    }
  }
}
By listening to the recvExperimentalAPI callback, the client can obtain custom room messages whose method is onRecvRoomChannelMessage. Below is an example of the callback listener code:
// WARNING: Data received through custom room messages may arrive out of order; sort it by the SeqId field.
zg.on("recvExperimentalAPI", (result) => {
  const { method, content } = result;
  if (method === "onRecvRoomChannelMessage") {
    try {
      // Parse the message
      const recvMsg = JSON.parse(content.msgContent);
      const { Cmd, SeqId, Data, Round } = recvMsg;
    } catch (error) {
      console.error("Failed to parse the message:", error);
    }
  }
});
// Enable the experimental API for onRecvRoomChannelMessage
zg.callExperimentalAPI({ method: "onRecvRoomChannelMessage", params: {} });
Room Custom Message Protocol
The fields of the room custom message are described as follows:
| Field | Type | Description |
| --- | --- | --- |
| Timestamp | Number | Timestamp, in seconds. |
| SeqId | Number | Packet sequence number. Packets may arrive out of order, so sort messages by this number. In extreme cases, sequence numbers may not be consecutive. |
| Round | Number | Dialogue round; increases each time the user starts speaking. |
| Cmd | Number | 3: ASR text. 4: LLM text. |
| Data | Object | Message payload; its structure depends on Cmd. |
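As a rough illustration of the fields described above, the following sketch parses a msg_content string and checks that the top-level fields have the documented types before the message is used. parseRoomMessage is a hypothetical helper introduced here for illustration, not part of the ZEGO SDK; the sample string is the one shown in the listener example.

```javascript
// Sketch only: parseRoomMessage is a hypothetical helper, not an SDK API.
// It returns the parsed message, or null if the payload does not match
// the documented field types.
function parseRoomMessage(msgContent) {
  let msg;
  try {
    msg = JSON.parse(msgContent);
  } catch (error) {
    return null; // Not valid JSON
  }
  if (msg === null || typeof msg !== "object") {
    return null;
  }
  // Timestamp, SeqId, Round, and Cmd are documented as Number.
  for (const field of ["Timestamp", "SeqId", "Round", "Cmd"]) {
    if (typeof msg[field] !== "number") {
      return null;
    }
  }
  // Data is documented as Object.
  if (msg.Data === null || typeof msg.Data !== "object") {
    return null;
  }
  return msg;
}

const sampleContent =
  '{"Timestamp":1745224717,"SeqId":1467995418,"Round":2132219714,"Cmd":3,' +
  '"Data":{"MessageId":"2135894567","Text":"你","EndFlag":false}}';
const parsedSample = parseRoomMessage(sampleContent);
// Timestamp is in seconds, so convert to milliseconds for Date.
const sentAt = new Date(parsedSample.Timestamp * 1000);
console.log(parsedSample.Cmd, parsedSample.Data.Text, sentAt.toISOString());
```

Returning null for anything that does not fit the protocol keeps malformed packets out of the subtitle state; whether to log or drop them silently is a choice for your app.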
Data varies depending on the Cmd as follows:
Processing Logic
Determine the message type based on the Cmd field, and obtain the message content from the Data field.
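The dispatch step described above could be sketched as follows. The function dispatchRoomMessage, the constant names, and the handler names onAsrText and onLlmText are assumptions made for this example; only the Cmd values 3 and 4 come from the protocol.

```javascript
// Cmd values from the room custom message protocol.
const CMD_ASR_TEXT = 3;
const CMD_LLM_TEXT = 4;

// Hypothetical helper: routes an already-parsed message to a handler by Cmd.
// Returns true if the message was handled, false for any other Cmd value.
function dispatchRoomMessage(msg, handlers) {
  switch (msg.Cmd) {
    case CMD_ASR_TEXT:
      handlers.onAsrText(msg.Data, msg);
      return true;
    case CMD_LLM_TEXT:
      handlers.onLlmText(msg.Data, msg);
      return true;
    default:
      return false;
  }
}

// Usage: collect subtitle lines from two example messages.
const received = [];
const handlers = {
  onAsrText: (data) => received.push(`user: ${data.Text}`),
  onLlmText: (data) => received.push(`agent: ${data.Text}`),
};
dispatchRoomMessage(
  { Cmd: 3, SeqId: 1, Round: 1, Data: { MessageId: "m1", Text: "hello" } },
  handlers
);
dispatchRoomMessage(
  { Cmd: 4, SeqId: 2, Round: 1, Data: { MessageId: "m2", Text: "Hi" } },
  handlers
);
console.log(received);
```

Passing the whole message as a second argument lets handlers read SeqId and Round when they need to order or group subtitle updates.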
If you are working on a Vue project, you can download the subtitle component to your project and use it directly.
Vue Project Subtitle Component Usage Example
// Example code for using the subtitle component
// Import the useChat hook in your page
import { useChat } from "useChat";
import { onMounted, onBeforeUnmount } from 'vue';
// Call the useChat method, pass in the Express SDK instance. The messages will be rendered in your subtitle component.
const { messages, setupEventListeners, clearMessages } = useChat(zg);
onMounted(() => {
  // Register event listeners when the page loads
  setupEventListeners();
});
onBeforeUnmount(() => {
  // Clear messages when the page is destroyed
  clearMessages();
});
Precautions
Message Sorting Processing: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
Streaming Text Processing:
Each ASR text sent is the full text. Messages with the same MessageId should completely replace the previous content.
Each LLM text sent is incremental. Messages with the same MessageId need to be cumulatively displayed after sorting.
Memory Management: Please clear the cache of completed messages in time, especially when users engage in long conversations.
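The three precautions above can be combined in one small state holder. The following is a minimal sketch, not the subtitle component's actual implementation: SubtitleStore, its method names, and the maxCompleted cap are hypothetical, and it assumes that EndFlag set to true marks the last packet of a message, which this article does not state explicitly.

```javascript
// Hypothetical helper that tracks subtitle text per MessageId.
class SubtitleStore {
  constructor(maxCompleted = 50) {
    this.maxCompleted = maxCompleted; // Assumed cap on cached completed messages
    this.messages = new Map(); // MessageId -> entry, in creation order
  }

  // Applies one parsed room message and returns the text to display for it.
  apply(msg) {
    const { Cmd, SeqId, Data } = msg;
    let entry = this.messages.get(Data.MessageId);
    if (!entry) {
      entry = { id: Data.MessageId, cmd: Cmd, lastSeqId: -Infinity, chunks: new Map(), text: "", done: false };
      this.messages.set(entry.id, entry);
    }
    if (Cmd === 3) {
      // ASR text is the full text: the packet with the highest SeqId replaces
      // the previous content, and stale packets are ignored.
      if (SeqId > entry.lastSeqId) {
        entry.lastSeqId = SeqId;
        entry.text = Data.Text;
        entry.done = Boolean(Data.EndFlag);
      }
    } else if (Cmd === 4) {
      // LLM text is incremental: keep every chunk, then join them in SeqId order.
      entry.chunks.set(SeqId, Data.Text);
      entry.text = [...entry.chunks.keys()]
        .sort((a, b) => a - b)
        .map((seq) => entry.chunks.get(seq))
        .join("");
      if (Data.EndFlag) {
        entry.done = true;
      }
    }
    const text = entry.text;
    this.prune();
    return text;
  }

  // Drops the earliest-created completed messages once the cap is exceeded.
  prune() {
    const completed = [...this.messages.values()].filter((e) => e.done);
    const excess = completed.length - this.maxCompleted;
    for (let i = 0; i < excess; i++) {
      this.messages.delete(completed[i].id);
    }
  }
}

// Usage: LLM chunks arriving out of order are displayed in SeqId order.
const store = new SubtitleStore();
store.apply({ Cmd: 4, SeqId: 11, Data: { MessageId: "llm-1", Text: " there", EndFlag: false } });
const shown = store.apply({ Cmd: 4, SeqId: 10, Data: { MessageId: "llm-1", Text: "Hi", EndFlag: false } });
console.log(shown);
```

Re-sorting all chunks on every packet is simple rather than fast; for long LLM replies you may prefer to insert each chunk at its sorted position instead.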