Display Subtitles


This article describes how to display subtitles during a voice call between a user and an AI agent:

  • User's speech: Stream the user's spoken content in real time as ASR recognizes it.
  • AI agent's speech: Stream the AI agent's output in real time as the LLM generates it.

Prerequisites

You should have already integrated the ZEGO Express SDK and the ZEGOCLOUD AI Agent, and implemented a basic voice-call feature following the Quick Start doc.

Quick Implementation

During voice conversations between users and AI agents, the ZEGOCLOUD AI Agent server sends ASR recognition text and LLM response text via custom messages in the RTC room to the client. By listening for these custom messages, the client can parse the status events and render the UI.

The processing flowchart for RTC room custom messages is as follows:

flowchart TD
    Start([Start]) --> Init[Implement onRecvExperimentalAPI callback and initialize subtitle UI component]
    Init --> ParseMessage[Parse RTC room custom messages]
    ParseMessage --> |Cmd=3| ProcessASR[Process ASR text]
    ParseMessage --> |Cmd=4| ProcessLLM[Process LLM text]
    ProcessASR --> UpdateSubtitles1[Update user subtitles]
    ProcessLLM --> UpdateSubtitles2[Update AI agent subtitles]
    UpdateSubtitles1 --> HandleEndFlags[Clear message cache after message ends]
    UpdateSubtitles2 --> HandleEndFlags
    HandleEndFlags --> End([End])

Listening to Custom Room Messages

Implement the ZegoEventHandler protocol and listen for the onRecvExperimentalAPI callback. Room custom messages arrive through this callback with the method field set to liveroom.room.on_recive_room_channel_message. Below is an example of the callback listener code:

YourService.h/m
#import <ZegoExpressEngine/ZegoExpressEngine.h>
#import "YourService.h"
#import "YourViewController.h"

// Implement ZegoEventHandler protocol
@interface YourService () <ZegoEventHandler>
@property (nonatomic, strong) YourViewController *yourViewController;
@end

@implementation YourService

// Handle messages received from express onRecvExperimentalAPI
- (void)onRecvExperimentalAPI:(NSString *)content {
    // Forward to view for message content parsing
    [self.yourViewController handleExpressExperimentalAPIContent:content];
}

@end // YourService implementation

YourViewController.h/m
// Class extension in YourViewController.m; declare handleExpressExperimentalAPIContent: in YourViewController.h so YourService can call it
@interface YourViewController () <ZegoEventHandler>

@end

@implementation YourViewController

// Parse custom signaling messages
- (void)handleExpressExperimentalAPIContent:(NSString *)content {
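    // Parse JSON content; JSONObjectWithData: raises an exception for nil data, so guard first
    NSData *jsonData = [content dataUsingEncoding:NSUTF8StringEncoding];
    if (!jsonData) {
        return;
    }
    NSError *error = nil;
    NSDictionary *contentDict = [NSJSONSerialization JSONObjectWithData:jsonData
                                                                options:0
                                                                  error:&error];
    // Reject parse failures and top-level values that are not JSON objects
    if (error || ![contentDict isKindOfClass:[NSDictionary class]]) {
        NSLog(@"JSON parsing failed: %@", error);
        return;
    }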
    // Check if it's a room message
    NSString *method = contentDict[@"method"];
    if (![method isEqualToString:@"liveroom.room.on_recive_room_channel_message"]) {
        return;
    }
    // Get message parameters
    NSDictionary *params = contentDict[@"params"];
    if (!params) {
        return;
    }
    NSString *msgContent = params[@"msg_content"];
    NSString *sendIdName = params[@"send_idname"];
    NSString *sendNickname = params[@"send_nickname"];
    NSString *roomId = params[@"roomid"];
    if (!msgContent || !sendIdName || !roomId) {
        NSLog(@"handleExpressExperimentalAPIContent: incomplete parameters: msgContent=%@, sendIdName=%@, roomId=%@",
              msgContent, sendIdName, roomId);
        return;
    }
    
    // JSON string example: "{\"Timestamp\":1745224717,\"SeqId\":1467995418,\"Round\":2132219714,\"Cmd\":3,\"Data\":{\"MessageId\":\"2135894567\",\"Text\":\"你\",\"EndFlag\":false}}"
    // Parse message content
    [self handleMessageContent:msgContent userID:sendIdName userName:sendNickname ?: @""];
}

// Handle message content
- (void)handleMessageContent:(NSString *)command userID:(NSString *)userID userName:(NSString *)userName {
    NSDictionary* msgDict = [self dictFromJson:command];
    if (!msgDict) {
        return;
    }
  
    // Parse basic information
    int cmd = [msgDict[@"Cmd"] intValue];
    int64_t seqId = [msgDict[@"SeqId"] longLongValue];
    int64_t round = [msgDict[@"Round"] longLongValue];
    int64_t timestamp = [msgDict[@"Timestamp"] longLongValue];
    NSDictionary *data = msgDict[@"Data"];
  
    // Handle messages based on command type
    switch (cmd) {
        case 3: // ASR text
            [self handleAsrText:data seqId:seqId round:round timestamp:timestamp];
            break;
        case 4: // LLM text
            [self handleLlmText:data seqId:seqId round:round timestamp:timestamp];
            break;
        default: // Ignore any other command
            break;
    }
}

@end // YourViewController implementation
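
The parsing code above calls dictFromJson:, which is not shown on this page. The following is a minimal sketch of what such a helper might look like, assuming it only needs to turn a JSON string into an NSDictionary and return nil on any failure. The function name YourDictFromJson is hypothetical and is not part of the ZEGO SDK.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper (not an SDK API): converts a JSON string into a dictionary.
// Returns nil if the string is empty, is not valid JSON, or is not a JSON object.
static NSDictionary *YourDictFromJson(NSString *json) {
    if (json.length == 0) {
        return nil;
    }
    NSData *data = [json dataUsingEncoding:NSUTF8StringEncoding];
    if (!data) {
        return nil;
    }
    NSError *error = nil;
    id object = [NSJSONSerialization JSONObjectWithData:data options:0 error:&error];
    if (error || ![object isKindOfClass:[NSDictionary class]]) {
        NSLog(@"YourDictFromJson failed: %@", error);
        return nil;
    }
    return (NSDictionary *)object;
}
```

Inside YourViewController, an instance method named dictFromJson: could simply return YourDictFromJson(json).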

Room Custom Message Protocol

The fields of the room custom message are described as follows:

| Field | Type | Description |
| --- | --- | --- |
| Timestamp | Number | Timestamp, in seconds. |
| SeqId | Number | Packet sequence number. Messages may arrive out of order, so sort them by this number. In extreme cases, the numbers may not be consecutive. |
| Round | Number | Dialogue round. Increases each time the user starts speaking. |
| Cmd | Number | 3: ASR text. 4: LLM text. |
| Data | Object | Message content. The structure depends on Cmd. |

Data varies depending on the Cmd as follows:

Processing Logic

Determine the message type based on the Cmd field, and obtain the message content from the Data field.
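
As a rough illustration of this step, the sketch below turns one parsed message dictionary into a small value object. The Data field names MessageId, Text, and EndFlag are taken from the sample JSON in the parsing code above. That Cmd 4 messages use the same three fields is an assumption, and SubtitleUpdate and SubtitleUpdateFromMessage are hypothetical names, not SDK types.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical value object for one subtitle update (not an SDK type).
@interface SubtitleUpdate : NSObject
@property (nonatomic, copy) NSString *messageId;
@property (nonatomic, copy) NSString *text;
@property (nonatomic, assign) BOOL endFlag;
@property (nonatomic, assign) BOOL fromUser; // YES for Cmd 3 (ASR), NO for Cmd 4 (LLM)
@end

@implementation SubtitleUpdate
@end

// Returns nil for unknown Cmd values or malformed Data.
// Assumption: Cmd 3 and Cmd 4 both carry MessageId, Text, and EndFlag in Data.
static SubtitleUpdate *SubtitleUpdateFromMessage(NSDictionary *msgDict) {
    id cmd = msgDict[@"Cmd"];
    id data = msgDict[@"Data"];
    if (![cmd isKindOfClass:[NSNumber class]] || ![data isKindOfClass:[NSDictionary class]]) {
        return nil;
    }
    NSInteger cmdValue = [cmd integerValue];
    if (cmdValue != 3 && cmdValue != 4) {
        return nil;
    }
    id messageId = data[@"MessageId"];
    id text = data[@"Text"];
    id endFlag = data[@"EndFlag"];
    if (![messageId isKindOfClass:[NSString class]] || ![text isKindOfClass:[NSString class]]) {
        return nil;
    }
    SubtitleUpdate *update = [SubtitleUpdate new];
    update.messageId = messageId;
    update.text = text;
    // A missing or non-numeric EndFlag is treated as NO.
    update.endFlag = [endFlag isKindOfClass:[NSNumber class]] && [endFlag boolValue];
    update.fromUser = (cmdValue == 3);
    return update;
}
```

Validating types before use keeps a malformed message from crashing the subtitle view; such messages are simply skipped.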

Use the subtitle component

You can also download the subtitle component source code and add it to your project.

Precautions

  • Message sorting: Custom room messages may arrive out of order, so sort them by the SeqId field before processing.
  • Streaming text:
      • Each ASR message carries the full text. A message with the same MessageId completely replaces the previous content.
      • Each LLM message is incremental. Messages with the same MessageId must be sorted by SeqId and then concatenated for display.
  • Memory management: Clear the cache of completed messages promptly, especially during long conversations.
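
The rules above can be sketched as a small per-message buffer. This is only an illustration under stated assumptions: SubtitleBuffer and its methods are hypothetical names, not part of the subtitle component or the SDK, and one buffer instance is assumed to hold the state for a single MessageId.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical buffer for one MessageId (not an SDK type).
@interface SubtitleBuffer : NSObject
// Cmd 3: stores the full ASR text if seqId is the newest seen; returns the text to display.
- (NSString *)applyAsrText:(NSString *)text seqId:(int64_t)seqId;
// Cmd 4: stores one incremental LLM fragment; returns all fragments joined in SeqId order.
- (NSString *)appendLlmText:(NSString *)text seqId:(int64_t)seqId;
@end

@implementation SubtitleBuffer {
    int64_t _latestAsrSeqId;
    NSString *_asrText;
    NSMutableDictionary<NSNumber *, NSString *> *_llmFragments;
}

- (instancetype)init {
    self = [super init];
    if (self) {
        _latestAsrSeqId = INT64_MIN;
        _asrText = @"";
        _llmFragments = [NSMutableDictionary dictionary];
    }
    return self;
}

- (NSString *)applyAsrText:(NSString *)text seqId:(int64_t)seqId {
    // ASR messages carry the full text, so a late packet with an older SeqId is ignored.
    if (seqId > _latestAsrSeqId) {
        _latestAsrSeqId = seqId;
        _asrText = [text copy] ?: @"";
    }
    return _asrText;
}

- (NSString *)appendLlmText:(NSString *)text seqId:(int64_t)seqId {
    // Keying by SeqId also makes a duplicate packet harmless.
    _llmFragments[@(seqId)] = [text copy] ?: @"";
    NSArray<NSNumber *> *ordered =
        [_llmFragments.allKeys sortedArrayUsingSelector:@selector(compare:)];
    NSMutableString *joined = [NSMutableString string];
    for (NSNumber *key in ordered) {
        [joined appendString:_llmFragments[key]];
    }
    return joined;
}

@end
```

One way to use it is to keep the buffers in an NSMutableDictionary keyed by MessageId and to remove an entry once the message whose EndFlag is true has been rendered, which keeps memory bounded during long conversations.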
