Display Subtitles

This article introduces how to display subtitles during a voice call between a user and an AI agent. As follows:

User's speech: Stream the user's spoken content as it is being recognized by ASR in real time.
AI agent's speech: Stream the AI agent's output content as it is being generated by LLM in real time.

Prerequisites

You should have already integrated the ZEGO Express SDK and the ZEGOCLOUD AI Agent, and implemented a basic voice-call feature following the Quick Start doc.

Use the subtitle component

You can also download the subtitle component source code to your project for use.

Subtitle Component Usage Example

private AudioChatMessageParser audioChatMessageParser = new AudioChatMessageParser();

ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    public void onRecvExperimentalAPI(String content) {
        super.onRecvExperimentalAPI(content);
        try {
            // Step 1: Parse content into a JSONObject
            JSONObject json = new JSONObject(content);

            // Step 2: Check the value of the method field
            if (json.has("method") && json.getString("method")
                .equals("liveroom.room.on_recive_room_channel_message")) {
                // Step 3: Get params and parse them
                JSONObject paramsObject = json.getJSONObject("params");
                String msgContent = paramsObject.getString("msg_content");

                // AudioChatTextMessage will parse the JSON string
                audioChatMessageParser.parseAudioChatMessage(msgContent);
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
});

audioChatMessageParser.setAudioChatMessageListListener(new AudioChatMessageListListener() {
    @Override
    public void onMessageListUpdated(List<AudioChatMessage> messagesList) {
        // Update the UI list
        binding.messageList.onMessageListUpdated(messagesList);
    }
});

private AudioChatMessageParser audioChatMessageParser = new AudioChatMessageParser();

ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
    public void onRecvExperimentalAPI(String content) {
        super.onRecvExperimentalAPI(content);
        try {
            // Step 1: Parse content into a JSONObject
            JSONObject json = new JSONObject(content);

            // Step 2: Check the value of the method field
            if (json.has("method") && json.getString("method")
                .equals("liveroom.room.on_recive_room_channel_message")) {
                // Step 3: Get params and parse them
                JSONObject paramsObject = json.getJSONObject("params");
                String msgContent = paramsObject.getString("msg_content");

                // AudioChatTextMessage will parse the JSON string
                audioChatMessageParser.parseAudioChatMessage(msgContent);
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
});

audioChatMessageParser.setAudioChatMessageListListener(new AudioChatMessageListListener() {
    @Override
    public void onMessageListUpdated(List<AudioChatMessage> messagesList) {
        // Update the UI list
        binding.messageList.onMessageListUpdated(messagesList);
    }
});

Quick Implementation

If you don't want to use the subtitle component, you can also implement the subtitle component functionality yourself. Details are as follows:

During voice conversations between users and AI agents, the ZEGOCLOUD AI Agent server sends ASR recognition text and LLM response text via custom messages in the RTC room to the client. By listening for these custom messages, the client can parse the status events and render the UI.

The processing flowchart for RTC room custom messages is as follows:

Listening to Custom Room Messages

By listening to the onRecvExperimentalAPI callback, the client can obtain custom room messages with method as liveroom.room.on_recive_room_channel_message. Below is an example of the listener callback code(click to view the complete example code):

// WARNING!!!: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
// !mark
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
// !mark
    public void onRecvExperimentalAPI(String content) {
        super.onRecvExperimentalAPI(content);
        try {
            // Step 1: Parse the content into a JSONObject
            JSONObject json = new JSONObject(content);

            // Step 2: Check the value of the method field
            if (json.has("method") && json.getString("method")
                .equals("liveroom.room.on_recive_room_channel_message")) {
                // Step 3: Get and parse params
                JSONObject paramsObject = json.getJSONObject("params");
// !mark
                String msgContent = paramsObject.getString("msg_content");

                // JSON string example: "{\"Timestamp\":1745224717,\"SeqId\":1467995418,\"Round\":2132219714,\"Cmd\":3,\"Data\":{\"MessageId\":\"2135894567\",\"Text\":\"你\",\"EndFlag\":false}}"
                // Parse the JSON string into an AudioChatMessage object
                AudioChatMessage chatMessage = gson.fromJson(msgContent, AudioChatMessage.class);
                if (chatMessage.cmd == 3) {
                    updateASRChatMessage(chatMessage);
                } else if (chatMessage.cmd == 4) {
                    addOrUpdateLLMChatMessage(chatMessage);
                }
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
});

/**
 * Voice chat UI, structure of chat messages within the room sent by the backend server
 */
public static class AudioChatMessage {
    @SerializedName("Timestamp")
    public long timestamp;
    @SerializedName("SeqId")
    public int seqId;
    @SerializedName("Round")
    public int round;
    @SerializedName("Cmd")
    public int cmd;
    @SerializedName("Data")
    public Data data;
    public static class Data {
        @SerializedName("SpeakStatus")
        public int speakStatus;
        @SerializedName("Text")
        public String text;
        @SerializedName("MessageId")
        public String messageId;
        @SerializedName("EndFlag")
        public boolean endFlag;
    }
}

// WARNING!!!: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
// !mark
ZegoExpressEngine.getEngine().setEventHandler(new IZegoEventHandler() {
    @Override
// !mark
    public void onRecvExperimentalAPI(String content) {
        super.onRecvExperimentalAPI(content);
        try {
            // Step 1: Parse the content into a JSONObject
            JSONObject json = new JSONObject(content);

            // Step 2: Check the value of the method field
            if (json.has("method") && json.getString("method")
                .equals("liveroom.room.on_recive_room_channel_message")) {
                // Step 3: Get and parse params
                JSONObject paramsObject = json.getJSONObject("params");
// !mark
                String msgContent = paramsObject.getString("msg_content");

                // JSON string example: "{\"Timestamp\":1745224717,\"SeqId\":1467995418,\"Round\":2132219714,\"Cmd\":3,\"Data\":{\"MessageId\":\"2135894567\",\"Text\":\"你\",\"EndFlag\":false}}"
                // Parse the JSON string into an AudioChatMessage object
                AudioChatMessage chatMessage = gson.fromJson(msgContent, AudioChatMessage.class);
                if (chatMessage.cmd == 3) {
                    updateASRChatMessage(chatMessage);
                } else if (chatMessage.cmd == 4) {
                    addOrUpdateLLMChatMessage(chatMessage);
                }
            }
        } catch (JSONException e) {
            e.printStackTrace();
        }
    }
});

/**
 * Voice chat UI, structure of chat messages within the room sent by the backend server
 */
public static class AudioChatMessage {
    @SerializedName("Timestamp")
    public long timestamp;
    @SerializedName("SeqId")
    public int seqId;
    @SerializedName("Round")
    public int round;
    @SerializedName("Cmd")
    public int cmd;
    @SerializedName("Data")
    public Data data;
    public static class Data {
        @SerializedName("SpeakStatus")
        public int speakStatus;
        @SerializedName("Text")
        public String text;
        @SerializedName("MessageId")
        public String messageId;
        @SerializedName("EndFlag")
        public boolean endFlag;
    }
}

Room Custom Message Protocol

The fields of the room custom message are described as follows:

Field	Type	Description
Timestamp	Number	Timestamp, at the second level
SeqId	Number	Packet sequence number, may be out of order. Please sort the messages according to the sequence number. In extreme cases, the Id may not be continuous.
Round	Number	Dialogue round, increases each time the user starts speaking
Cmd	Number	3: ASR text. 4: LLM text.
Data	Object	Specific content, different Cmds correspond to different Data

Data varies depending on the Cmd as follows:

Processing Logic

Determine the message type based on the Cmd field, and obtain the message content from the Data field.

Precautions

Message Sorting Processing: The data received through custom room messages may be out of order, and sorting needs to be performed based on the SeqId field.
Streaming Text Processing:

Each ASR text sent is the full text. Messages with the same MessageId should completely replace the previous content.

Each LLM text sent is incremental. Messages with the same MessageId need to be cumulatively displayed after sorting.

Memory Management: Please clear the cache of completed messages in time, especially when users engage in long conversations.