This article explains how to display subtitles during a voice call between a user and an AI agent. Subtitles cover both sides of the conversation:
User's speech: Stream the user's spoken content as it is being recognized by ASR in real time.
AI agent's speech: Stream the AI agent's output content as it is being generated by LLM in real time.
Prerequisites
You should have already integrated the ZEGO Express SDK and the ZEGOCLOUD AI Agent, and implemented a basic voice-call feature following the Quick Start doc.
Quick Implementation
During voice conversations between users and AI agents, the ZEGOCLOUD AI Agent server sends ASR recognition text and LLM response text via custom messages in the RTC room to the client. By listening for these custom messages, the client can parse the status events and render the UI.
The processing flowchart for RTC room custom messages is as follows:
Listening to Custom Room Messages
By listening to the onRecvExperimentalAPI callback, the client can obtain custom room messages with method as liveroom.room.on_recive_room_channel_message. Below is an example of the listener callback code:
// WARNING!!!: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    public void onRecvExperimentalAPI(String content) {
        super.onRecvExperimentalAPI(content);
        try {
            // Step 1: Parse the content into a JSONObject
            JSONObject json = new JSONObject(content);
            // Step 2: Check the value of the method field
            if (json.has("method") && json.getString("method").equals("liveroom.room.on_recive_room_channel_message")) {
                // Step 3: Get and parse params
                JSONObject paramsObject = json.getJSONObject("params");
                String msgContent = paramsObject.getString("msg_content");
                // JSON string example: "{\"Timestamp\":1745224717,\"SeqId\":1467995418,\"Round\":2132219714,\"Cmd\":3,\"Data\":{\"MessageId\":\"2135894567\",\"Text\":\"你\",\"EndFlag\":false}}"
                // Parse the JSON string into an AudioChatMessage object
                AudioChatMessage chatMessage = gson.fromJson(msgContent, AudioChatMessage.class);
                if (chatMessage.cmd == 3) {
                    updateASRChatMessage(chatMessage);
                } else if (chatMessage.cmd == 4) {
                    addOrUpdateLLMChatMessage(chatMessage);
                }
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
});

/**
 * Voice chat UI, structure of chat messages within the room sent by the backend server
 */
public static class AudioChatMessage {
    @SerializedName("Timestamp")
    public long timestamp;
    @SerializedName("SeqId")
    public int seqId;
    @SerializedName("Round")
    public int round;
    @SerializedName("Cmd")
    public int cmd;
    @SerializedName("Data")
    public Data data;

    public static class Data {
        @SerializedName("SpeakStatus")
        public int speakStatus;
        @SerializedName("Text")
        public String text;
        @SerializedName("MessageId")
        public String messageId;
        @SerializedName("EndFlag")
        public boolean endFlag;
    }
}
By implementing the ZegoEventHandler protocol and listening to the onRecvExperimentalAPI callback, the client can obtain room custom messages with method as liveroom.room.on_recive_room_channel_message. Below is an example of the callback listener code:
YourService.h/m
YourViewController.h/m
// Implement ZegoEventHandler protocol
@interface YourService () <ZegoEventHandler>
@property (nonatomic, strong) YourViewController *youViewController;
@end

@implementation YourService

// Handle messages received from express onRecvExperimentalAPI
- (void)onRecvExperimentalAPI:(NSString *)content {
    // Forward to view for message content parsing
    [self.youViewController handleExpressExperimentalAPIContent:content];
}

@end // YourService implementation
// Implement ZegoEventHandler protocol in the header file
@interface YourViewController ()
@end

@implementation YourViewController

// WARNING!!!: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
// Parse custom signaling messages
- (void)handleExpressExperimentalAPIContent:(NSString *)content {
    // Parse JSON content
    NSError *error;
    NSData *jsonData = [content dataUsingEncoding:NSUTF8StringEncoding];
    NSDictionary *contentDict = [NSJSONSerialization JSONObjectWithData:jsonData
                                                                options:NSJSONReadingMutableContainers
                                                                  error:&error];
    if (error || !contentDict) {
        NSLog(@"JSON parsing failed: %@", error);
        return;
    }

    // Check if it's a room message
    NSString *method = contentDict[@"method"];
    if (![method isEqualToString:@"liveroom.room.on_recive_room_channel_message"]) {
        return;
    }

    // Get message parameters
    NSDictionary *params = contentDict[@"params"];
    if (!params) {
        return;
    }
    NSString *msgContent = params[@"msg_content"];
    NSString *sendIdName = params[@"send_idname"];
    NSString *sendNickname = params[@"send_nickname"];
    NSString *roomId = params[@"roomid"];
    if (!msgContent || !sendIdName || !roomId) {
        NSLog(@"parseExperimentalAPIContent Parameters incomplete: msgContent=%@, sendIdName=%@, roomId=%@",
              msgContent, sendIdName, roomId);
        return;
    }

    // JSON string example: "{\"Timestamp\":1745224717,\"SeqId\":1467995418,\"Round\":2132219714,\"Cmd\":3,\"Data\":{\"MessageId\":\"2135894567\",\"Text\":\"你\",\"EndFlag\":false}}"
    // Parse message content
    [self handleMessageContent:msgContent userID:sendIdName userName:sendNickname ?: @""];
}

// Handle message content
- (void)handleMessageContent:(NSString *)command userID:(NSString *)userID userName:(NSString *)userName {
    NSDictionary *msgDict = [self dictFromJson:command];
    if (!msgDict) {
        return;
    }

    // Parse basic information
    int cmd = [msgDict[@"Cmd"] intValue];
    int64_t seqId = [msgDict[@"SeqId"] longLongValue];
    int64_t round = [msgDict[@"Round"] longLongValue];
    int64_t timestamp = [msgDict[@"Timestamp"] longLongValue];
    NSDictionary *data = msgDict[@"Data"];

    // Handle messages based on command type
    switch (cmd) {
        case 3: // ASR text
            [self handleAsrText:data seqId:seqId round:round timestamp:timestamp];
            break;
        case 4: // LLM text
            [self handleLlmText:data seqId:seqId round:round timestamp:timestamp];
            break;
    }
}

@end // YourViewController implementation
import 'dart:convert';

import 'package:flutter/cupertino.dart';
import 'package:zego_express_engine/zego_express_engine.dart';

class YourPage extends StatefulWidget {
  const YourPage({super.key});

  @override
  State<YourPage> createState() => _YourPageState();
}

class _YourPageState extends State<YourPage> {
  @override
  void initState() {
    super.initState();
    ZegoExpressEngine.onRecvExperimentalAPI = onRecvExperimentalAPI;
  }

  @override
  void dispose() {
    super.dispose();
    ZegoExpressEngine.onRecvExperimentalAPI = null;
  }

  @override
  Widget build(BuildContext context) {
    return YourMessageView();
  }

  /// WARNING!!!: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
  void onRecvExperimentalAPI(String content) {
    try {
      /// Check if it is a room message
      final contentMap = jsonDecode(content);
      if (contentMap['method'] != 'liveroom.room.on_recive_room_channel_message') {
        return;
      }
      final params = contentMap['params'];
      if (params == null) {
        return;
      }
      final msgContent = params['msg_content'];
      if (msgContent == null) {
        return;
      }
      handleMessageContent(msgContent);
    } catch (e) {}
  }

  /// Handle message content
  void handleMessageContent(String msgContent) {
    final Map<String, dynamic> json = jsonDecode(msgContent);

    /// Parse basic information
    final int timestamp = json['Timestamp'] ?? 0;
    final int seqId = json['SeqId'] ?? 0;
    final int round = json['Round'] ?? 0;
    final int cmdType = json['Cmd'] ?? 0;
    final Map<String, dynamic> data =
        json['Data'] != null ? Map<String, dynamic>.from(json['Data']) : {};

    // Handle messages based on command type
    switch (cmdType) {
      case 3:
        /// ASR text
        handleRecvAsrMessage(data, seqId, round, timestamp);
        break;
      case 4:
        /// LLM text
        handleRecvLLMMessage(data, seqId, round, timestamp);
        break;
    }
  }
}
By listening to the recvExperimentalAPI callback, the client can obtain room custom messages with method as onRecvRoomChannelMessage. Below is an example of the callback listener code:
// WARNING!!!: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
zg.on("recvExperimentalAPI", (result) => {
  const { method, content } = result;
  if (method === "onRecvRoomChannelMessage") {
    try {
      // Parse the message
      const recvMsg = JSON.parse(content.msgContent);
      const { Cmd, SeqId, Data, Round } = recvMsg;
    } catch (error) {
      console.error("Failed to parse the message:", error);
    }
  }
});

// Enable the experimental API for onRecvRoomChannelMessage
zg.callExperimentalAPI({ method: "onRecvRoomChannelMessage", params: {} });
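The Web listener above parses the message but stops before routing it. The following is a minimal sketch of how the parsed payload could be dispatched by Cmd, mirroring the Android, iOS, and Flutter examples. The function name routeRoomChannelMessage and the handlers object (onAsrText, onLlmText) are hypothetical names used for illustration only and are not part of the ZEGO SDK.

```javascript
// Hypothetical helper (not an SDK API): parse one msgContent string and
// route it to the matching handler based on the Cmd field.
// Returns the parsed message, or null when the payload cannot be used.
function routeRoomChannelMessage(msgContent, handlers) {
  let message;
  try {
    message = JSON.parse(msgContent);
  } catch (error) {
    return null; // payload is not valid JSON
  }
  if (message === null || typeof message !== "object") {
    return null;
  }
  if (message.Data === null || typeof message.Data !== "object") {
    return null; // Data is required for both Cmd 3 and Cmd 4
  }
  switch (message.Cmd) {
    case 3: // ASR text
      handlers.onAsrText(message.Data, message.SeqId, message.Round);
      break;
    case 4: // LLM text
      handlers.onLlmText(message.Data, message.SeqId, message.Round);
      break;
    default:
      return null; // unknown Cmd, ignored in this sketch
  }
  return message;
}
```

Inside the recvExperimentalAPI callback, this could be called as routeRoomChannelMessage(content.msgContent, handlers), with handlers supplied by your own UI layer.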
Room Custom Message Protocol
The fields of the room custom message are described as follows:
| Field | Type | Description |
| --- | --- | --- |
| Timestamp | Number | Timestamp, in seconds. |
| SeqId | Number | Packet sequence number. Packets may arrive out of order, so sort messages by this number. In extreme cases, the IDs may not be consecutive. |
| Round | Number | Dialogue round. Increases each time the user starts speaking. |
| Cmd | Number | 3: ASR text. 4: LLM text. |
| Data | Object | Specific content. Different Cmd values correspond to different Data. |
Data varies depending on the Cmd as follows:
Cmd is 3
Cmd is 4
| Field | Type | Description |
| --- | --- | --- |
| Text | String | ASR text of user speech. Each issuance is the full text, supporting text correction. |
| MessageId | String | Message ID. It is unique for each round of ASR text message. |
| EndFlag | Bool | End flag. true indicates that the ASR text of this round has been processed. |
Processing Logic
Determine the message type based on the Cmd field, and obtain the message content from the Data field.
Cmd is 3, ASR Text
Cmd is 4, LLM Text
// Handle user message
function handleUserMessage(data, seqId, round) {
  if (data.EndFlag) {
    // User has finished speaking
  }
  const content = data.Text;
  if (content) {
    // Use the ASR text corresponding to the latest seqId as the latest speech recognition result and update the UI
  }
}
- (void)handleAsrText:(NSDictionary *)data seqId:(int64_t)seqId round:(int64_t)round timestamp:(int64_t)timestamp {
    NSString *content = data[@"Text"];
    NSString *messageId = data[@"MessageId"];
    BOOL endFlag = [data[@"EndFlag"] boolValue];
    if (content && content.length > 0) {
        // Process ASR message and update UI
    }
}
void handleRecvAsrMessage(
  Map<String, dynamic> data,
  int seqId,
  int round,
  int timestamp,
) {
  String text = '';
  String messageId = '';
  bool endFlag = false;
  if (data.containsKey('Text')) {
    text = data['Text'] ?? '';
  }
  if (data.containsKey('MessageId')) {
    messageId = data['MessageId'] ?? '';
  }
  if (data.containsKey('EndFlag')) {
    var val = data['EndFlag'];
    if (val is bool) {
      endFlag = val;
    } else if (val is int) {
      endFlag = val != 0;
    } else if (val is String) {
      endFlag = val == 'true' || val == '1';
    }
  }

  /// Process ASR message and update UI
}
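The ASR handlers above leave the actual UI update as a comment. Since each ASR message carries the full text and packets may arrive out of order, one possible approach is to keep only the entry with the highest SeqId per MessageId. The sketch below illustrates that idea. createAsrTracker and its update method are hypothetical names, and keeping the previous text when an update carries an empty Text is an assumption of this sketch rather than documented behavior.

```javascript
// Hypothetical helper (not an SDK API): remembers the newest ASR text
// for each MessageId and rejects packets that arrive late.
function createAsrTracker() {
  const latest = new Map(); // MessageId -> { messageId, seqId, text, done }

  return {
    // Returns the entry to render, or null when the packet is stale.
    update(data, seqId) {
      const previous = latest.get(data.MessageId);
      if (previous && seqId <= previous.seqId) {
        return null; // an older packet arrived after a newer one
      }
      const entry = {
        messageId: data.MessageId,
        seqId,
        // ASR text is the full text, so it replaces the previous content.
        // Assumption: an empty Text keeps the text already shown.
        text: data.Text || (previous ? previous.text : ""),
        // Once EndFlag has been seen, the message stays finished.
        done: Boolean(data.EndFlag) || (previous ? previous.done : false),
      };
      latest.set(data.MessageId, entry);
      return entry;
    },
  };
}
```

A caller would render the returned entry as the current subtitle line for that MessageId and skip the update when null is returned.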
The corresponding message processing flow is shown in the figure below:
If you are working on a Vue project, you can download the subtitle component to your project and use it directly.
Vue Project Subtitle Component Usage Example
Precautions
Message Sorting Processing: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
Streaming Text Processing:
Each ASR text sent is the full text. Messages with the same MessageId should completely replace the previous content.
Each LLM text sent is incremental. Messages with the same MessageId need to be cumulatively displayed after sorting.
Memory Management: Clear the cache of completed messages promptly, especially when users engage in long conversations.
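The precautions above can be sketched for LLM text as follows: chunks are stored per MessageId, joined in SeqId order, and the cache of finished messages is bounded. createLlmAccumulator, its add and size methods, and the maxFinishedMessages limit are hypothetical names chosen for illustration. The sketch also assumes that the Data of Cmd 4 carries MessageId, Text, and EndFlag in the same way as Cmd 3, which should be checked against the Cmd 4 field description.

```javascript
// Hypothetical helper (not an SDK API): accumulates incremental LLM chunks
// per MessageId and keeps only a bounded number of finished messages.
function createLlmAccumulator(maxFinishedMessages = 50) {
  const messages = new Map(); // MessageId -> { chunks: Map(SeqId -> text), done }
  const finishedOrder = []; // MessageIds in the order they finished

  function evictOldFinished() {
    while (finishedOrder.length > maxFinishedMessages) {
      messages.delete(finishedOrder.shift());
    }
  }

  return {
    // Stores one chunk and returns the text assembled in SeqId order.
    add(data, seqId) {
      let entry = messages.get(data.MessageId);
      if (!entry) {
        entry = { chunks: new Map(), done: false };
        messages.set(data.MessageId, entry);
      }
      // A repeated SeqId overwrites itself, so duplicates are not appended twice.
      entry.chunks.set(seqId, data.Text || "");
      if (data.EndFlag && !entry.done) {
        entry.done = true;
        finishedOrder.push(data.MessageId);
        evictOldFinished();
      }
      const text = [...entry.chunks.keys()]
        .sort((a, b) => a - b) // SeqId may have gaps, so sort instead of waiting
        .map((id) => entry.chunks.get(id))
        .join("");
      return { messageId: data.MessageId, text, done: entry.done };
    },
    // Number of messages currently held in the cache.
    size() {
      return messages.size;
    },
  };
}
```

Finished messages are retained for a while rather than dropped immediately so that a chunk arriving after the EndFlag packet can still be merged into the right place. The limit of 50 is arbitrary and can be tuned to the expected conversation length.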