Configuring TTS

Overview

To match different personas and scenarios, you may need to:

- Select a different text-to-speech (TTS) vendor, such as Volcano Engine, MiniMax, or Aliyun.
- Configure a different voice.
- Customize the TTS audio, such as its volume, speed, and tone.
- Filter content out of the text before synthesis by using special rules. For example, in "(happily) The weather is really nice today", the content inside the parentheses is filtered out.

Before configuring TTS, enable the AI Agent service and the TTS service of the vendor you want to use. The vendor's TTS service can be enabled in one of three ways:
- Method 1: Try the service directly with the zego_test account.
- Method 2: Purchase the TTS service through ZEGO. Contact ZEGOCLOUD sales to obtain an account and authentication information.
- Method 3: Purchase the TTS service from the vendor yourself and obtain the key information.

Currently, TTS-related parameters can be set through the following four interfaces:

Interface	Description
Register Agent	Set the vendor, voice, speed, and other parameters.
Create Agent Instance / Create Digital Human Agent Instance	Set the vendor, voice, speed, and other parameters. Note: If not set, the TTS parameters of the registered agent (RegisterAgent) are used by default.
Update Agent Instance	Set the voice, speed, and other parameters. Note: The `FilterText` parameter cannot be modified.
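The update restrictions on this page (`FilterText` and `Vendor` cannot be changed when updating an agent instance) can be checked on the client before a request is sent. The following is a minimal sketch, not part of any ZEGO SDK: the helper name `check_tts_update` is hypothetical, and the `speed` key inside `Params` is only a placeholder for a vendor-specific parameter.

```python
# Fields that this page documents as not updatable on an existing agent instance.
IMMUTABLE_ON_UPDATE = {"Vendor", "FilterText"}


def check_tts_update(tts_update):
    """Return the sorted names of fields that must not appear in an update payload.

    Hypothetical client-side helper; the service performs its own validation.
    """
    return sorted(IMMUTABLE_ON_UPDATE & set(tts_update))


# "speed" is a placeholder for a vendor-specific parameter (assumption).
update = {"Params": {"speed": 1.2}, "FilterText": []}
print(check_tts_update(update))  # ['FilterText']
```

An empty result means the payload only touches fields that this page does not mark as fixed.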

Parameter Name	Type	Required	Description
Vendor	String	Yes	Text-to-speech (TTS) service provider. Valid values: Aliyun (Aliyun TTS; note that this is standard speech synthesis, not CosyVoice); CosyVoice (Aliyun CosyVoice TTS); ByteDance (Volcano Engine unidirectional streaming TTS); ByteDanceV3 (Volcano Engine V3 unidirectional streaming TTS); ByteDanceFlowing (Volcano Engine bidirectional streaming TTS); MiniMax (MiniMax TTS). Note: This parameter cannot be updated when updating the agent instance.
Params	Object	Yes	TTS configuration parameters, in JSON object format. Contains the app parameter (for authentication) and other parameters (for adjusting TTS effects). Please refer to the Params parameter description below.
FilterText	Array of Object	No	Removes the text enclosed by the specified punctuation marks before speech synthesis. For example, to filter out the content in [], set it to `[{"BeginCharacters": "[", "EndCharacters": "]"}]`. Note: This parameter cannot be updated when updating the agent instance.
TerminatorText	String	No	Sets the termination text for TTS. If the TerminatorText string appears in the text sent to TTS, that string and everything after it are not synthesized. Note: Only one character can be set for bidirectional streaming. Maximum length: 4 characters.
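To illustrate the TerminatorText behavior described above, the following sketch cuts the input at the terminator so that the terminator and everything after it are dropped. It is an illustration of the documented effect, not the service's implementation. The function name `apply_terminator` and the `##` marker are hypothetical, and cutting at the first occurrence is an assumption.

```python
def apply_terminator(text, terminator):
    """Drop `terminator` and everything after it (illustrative only).

    Assumption: the cut happens at the first occurrence of the terminator.
    """
    if not terminator:
        return text
    if len(terminator) > 4:
        # The parameter table gives a maximum length of 4 characters.
        raise ValueError("TerminatorText is limited to 4 characters")
    index = text.find(terminator)
    return text if index == -1 else text[:index]


print(apply_terminator("Goodbye for now.##internal notes", "##"))  # Goodbye for now.
```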

Parameter Name	Type	Required	Description
app	object	Yes	Used for TTS service authentication. The structure of the app parameter required varies depending on the value of Vendor. See the app parameter instructions for each vendor below.
Other Params	-	No	Besides the `app` parameter, you can provide additional TTS configuration parameters to further customize the speech synthesis effect. These parameters are passed through directly to the associated TTS service provider. For detailed parameter information, refer to the official documentation of the vendor selected in Vendor. Aliyun: Speech Synthesis - API Reference; CosyVoice: "Payload request parameters" in CosyVoice WebSocket API; ByteDance: Speech Synthesis API - Parameter List - Request Parameters; ByteDanceV3: Unidirectional streaming WebSocket V3 - Parameter List - Request Parameters; ByteDanceFlowing: "Payload request parameters" in Bidirectional Streaming API - WebSocket Binary Protocol; MiniMax: Speech Model - T2A v2 - WebSocket - API Request Parameters.

The definitions of the app parameter and other TTS parameters vary by vendor. Please refer to the parameter instructions for each vendor below.
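As a rough sketch of how the fields in the two tables above fit together, the helper below assembles a TTS configuration object and prints it as JSON. The helper name `build_tts_config` is hypothetical. The content of `app` is a placeholder, because its real structure depends on the vendor, and `speed` stands in for any vendor-specific pass-through parameter.

```python
import json

# Vendor values listed in the parameter table on this page.
SUPPORTED_VENDORS = {
    "Aliyun", "CosyVoice", "ByteDance",
    "ByteDanceV3", "ByteDanceFlowing", "MiniMax",
}


def build_tts_config(vendor, app, extra_params=None,
                     filter_text=None, terminator_text=None):
    """Assemble a TTS configuration dict (hypothetical helper)."""
    if vendor not in SUPPORTED_VENDORS:
        raise ValueError(f"unsupported Vendor: {vendor!r}")
    # `app` is written last so pass-through parameters cannot overwrite it.
    params = {**(extra_params or {}), "app": app}
    config = {"Vendor": vendor, "Params": params}
    if filter_text:
        config["FilterText"] = filter_text
    if terminator_text is not None:
        config["TerminatorText"] = terminator_text
    return config


config = build_tts_config(
    "MiniMax",
    app={"placeholder_key": "<vendor-specific credentials>"},  # assumption
    extra_params={"speed": 1.0},  # placeholder pass-through parameter
    filter_text=[{"BeginCharacters": "(", "EndCharacters": ")"}],
)
print(json.dumps(config, indent=2))
```

Optional fields are omitted when they are not supplied, which matches their "Required: No" status in the table.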

FilterText is an array of objects. Each object contains two string parameters: BeginCharacters and EndCharacters.

Parameter Name	Type	Required	Description
BeginCharacters	string	Yes	The starting punctuation of the filtered text. For example, if you want to filter the content in (), set it to (.
EndCharacters	string	Yes	The ending punctuation of the filtered text. For example, if you want to filter the content in (), set it to ).
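The effect of a FilterText rule can be sketched as follows. This only illustrates the documented behavior on the example from the overview and is not the service's implementation. The function name `apply_filter_text` is hypothetical, and leaving an unmatched opening mark untouched is an assumption.

```python
def apply_filter_text(text, rules):
    """Remove every span enclosed by a rule's begin/end marks (illustrative only)."""
    for rule in rules:
        begin, end = rule["BeginCharacters"], rule["EndCharacters"]
        if not begin or not end:
            continue  # skip incomplete rules
        kept = []
        pos = 0
        while True:
            start = text.find(begin, pos)
            if start == -1:
                break
            stop = text.find(end, start + len(begin))
            if stop == -1:
                break  # assumption: an unmatched opening mark is left as-is
            kept.append(text[pos:start])
            pos = stop + len(end)
        kept.append(text[pos:])
        text = "".join(kept)
    return text


rules = [{"BeginCharacters": "(", "EndCharacters": ")"}]
print(repr(apply_filter_text("(happily) The weather is really nice today", rules)))
# ' The weather is really nice today'
```

The sketch keeps the whitespace around the removed span; how the service treats that whitespace is not stated on this page.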