logo
On this page

Proactive Invocation of LLM and TTS

Large Language Models (LLMs) do not output text and voice proactively. Therefore, developers need to trigger the AI agent to speak based on certain rules, thereby making the real-time interactions more engaging. For example, if the user has not spoken for 5 seconds, the AI agent can speak a sentence through Text-to-Speech (TTS).

Ways for AI Agents to speak proactively:

  • Trigger LLM: You can simulate a user to initiate a message, thereby enabling the AI agent to output text and voice based on context.
  • Trigger TTS: You can make the AI agent speak a segment of text content, usually in a fixed pattern, such as "Hello, welcome to use ZEGOCLOUD AI Agent service."

Trigger LLM

Call the SendAgentInstanceLLM API to trigger the LLM to output text and voice.

When calling SendAgentInstanceLLM, the AI Agent server will concatenate a context, which consists of three parts:

  • Placed at the front is the SystemPrompt, the temporary system prompt for this conversation.
  • In the middle are the previous conversation records, the number of which is determined by WindowSize.
  • At the end is the Text set in this interface.

The text information passed to this method will not be recorded in the conversation message history, nor will it be delivered through RTC room messages. However, the responses generated by the LLM will be recorded in the conversation message history and will be delivered through RTC room messages.

The interface parameters are as follows:

ParameterTypeRequiredDescription
AgentInstanceIdStringYesThe unique identifier of the agent instance, obtained through the response parameter of the Create An Agent Instance interface.
TextStringYesThe text content sent to the LLM service.
SystemPromptStringNoThe temporary system prompt for this conversation. If not provided, it will use the SystemPrompt in the LLM parameters from Register An Agent or Create An Agent Instance.

Example request:

Untitled
{
    "AgentInstanceId": "1907755175297171456",
    "Text": "How's the weather today?"
}
1
Copied!

Trigger TTS

Call the SendAgentInstanceTTS API to make the agent speak a segment of text content.

The text message passed to this interface will be recorded in the conversation message history based on the AddHistory parameter as context input for the LLM, and this message will also be delivered through RTC room messages.

The interface parameters are as follows:

ParameterTypeRequiredDescription
AgentInstanceIdStringYesThe unique identifier of the agent instance, obtained through the response parameter of the Create An Agent Instance interface.
TextStringYesThe text content for TTS, with a maximum length of no more than 300 characters.
AddHistoryBooleanNoWhether to record the text message in the conversation message history as context input for the LLM. The default value is true.

Example request:

Untitled
{
    "AgentInstanceId": "1907780504753553408",
    "Text": "Hello, welcome to use ZEGOCLOUD AI Agent service."
}
1
Copied!

Previous

Configure LLM

Next

Configure ASR Hot Word