Configuring TTS

Function Introduction

To match different personas and scenarios, you may need to:

  • Select different text-to-speech (TTS) vendors, such as Volcano Engine, MiniMax, and Aliyun.
  • Configure different voices.
  • Customize TTS audio properties, such as volume, speed, and tone.
  • Filter content out of the text sent to TTS by using special rules. For example, in "(happily) The weather is really nice today", the content inside the parentheses is filtered out.

Prerequisites

  1. Enable the AI Agent service.
  2. Enable the corresponding TTS vendor service in one of the following ways:
    • Method 1: Try the service directly with the zego_test account.
    • Method 2: Purchase the TTS service through ZEGO. Please contact ZEGOCLOUD sales to obtain an account and authentication information.
    • Method 3: Purchase the TTS service on your own and obtain the key and related authentication information.

Usage Method

Currently, TTS-related parameters can be set through four interfaces:

| Interface | Description |
| --- | --- |
| Register Agent | Set parameters such as vendor, voice, and speed. |
| Create Agent Instance; Create Digital Human Agent Instance | Set parameters such as vendor, voice, and speed. Note: If not set, the TTS parameters carried by the registered Agent (RegisterAgent) are used by default. |
| Update Agent Instance | Set parameters such as voice and speed. Note: Does not support modifying the FilterText parameter. |
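Because the notes in this document state that Vendor and FilterText cannot be changed when updating an agent instance, a client may choose to strip those fields before building an update request. The following is a minimal sketch of that idea, assuming the TTS configuration is held as a plain Python dictionary. The helper name `tts_fields_for_update` is hypothetical and not part of any ZEGOCLOUD SDK, and this document does not say whether the update interface rejects or ignores these fields, so treat this as one possible client-side convention.

```python
# Fields that, per the notes in this document, cannot be changed
# through Update Agent Instance.
NON_UPDATABLE_TTS_FIELDS = {"Vendor", "FilterText"}


def tts_fields_for_update(tts_config: dict) -> dict:
    """Return a copy of tts_config without the non-updatable fields (hypothetical helper)."""
    return {
        key: value
        for key, value in tts_config.items()
        if key not in NON_UPDATABLE_TTS_FIELDS
    }


registered = {
    "Vendor": "MiniMax",
    # Placeholder: the real structure of app depends on the vendor.
    "Params": {"app": {}},
    "FilterText": [{"BeginCharacters": "(", "EndCharacters": ")"}],
}

update_payload = tts_fields_for_update(registered)
print(sorted(update_payload))
```

The original dictionary is left unchanged, so it can still be reused when registering an agent or creating a new instance.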

TTS Parameters Description

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| Vendor | String | Yes | Text-to-speech (TTS) service provider. The optional values are listed after this table. Note: This parameter cannot be updated when updating the agent instance. |
| Params | Object | Yes | TTS configuration parameters, in JSON object format. Contains the app parameter (for authentication) and other parameters (for adjusting TTS effects). See the Params parameter description below. |
| FilterText | Array of Object | No | Removes the text enclosed in the specified punctuation marks before speech synthesis. For example, to filter the content in [], set it to [{"BeginCharacters": "[", "EndCharacters": "]"}]. Note: This parameter cannot be updated when updating the agent instance. |
| TerminatorText | String | No | Sets the termination text for TTS. If the text input to TTS contains a string matching TerminatorText, that string and everything after it are not synthesized. Note: Maximum length: 4 characters. For bidirectional streaming, only one character can be set. |

Optional values for Vendor:

  • Aliyun: Aliyun TTS (note: this is standard speech synthesis, not CosyVoice).
  • CosyVoice: Aliyun CosyVoice TTS.
  • ByteDance: Volcano Engine unidirectional streaming TTS.
  • ByteDanceFlowing: Volcano Engine bidirectional streaming TTS.
  • MiniMax: MiniMax TTS.
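To illustrate how the four parameters above fit together, the following sketch builds a TTS configuration as a Python dictionary and serializes it to JSON. All values are placeholders, and the empty app object stands in for vendor-specific authentication fields that are not described here. The function `apply_terminator` is a hypothetical local illustration of the TerminatorText behavior described above; the actual truncation is performed by the service, not by client code.

```python
import json

# Placeholder TTS configuration; the app object is left empty because
# its structure depends on the vendor.
tts_config = {
    "Vendor": "ByteDance",
    "Params": {"app": {}},
    "FilterText": [{"BeginCharacters": "[", "EndCharacters": "]"}],
    "TerminatorText": "#",
}
print(json.dumps(tts_config, indent=2))


def apply_terminator(text: str, terminator: str) -> str:
    """Local illustration: drop the terminator and everything after it."""
    if not terminator:
        return text
    index = text.find(terminator)
    if index == -1:
        # No terminator in the text, so nothing is cut off.
        return text
    return text[:index]


print(apply_terminator("Hello there.#internal note", "#"))  # prints "Hello there."
```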

Params Parameters Description

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| app | Object | Yes | Used for TTS service authentication. The structure of the app parameter varies depending on the value of Vendor. See the app parameter instructions for each vendor below. |
| Other Params | - | No | In addition to the app parameter, you can pass other TTS configuration parameters to adjust the speech synthesis effect. These parameters are forwarded directly to the corresponding TTS service provider. |

Depending on the value of Vendor, refer to the official documentation of the corresponding service provider for the required information.

The definitions of the app parameter and other TTS parameters vary by vendor. Please refer to the parameter instructions for each vendor below.
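A client-side pre-check can catch violations of the rules stated in the tables above before a request is sent. The sketch below is one way to do this, assuming the configuration is a Python dictionary. The function name `check_tts_config` is hypothetical. It only checks what this document states (allowed Vendor values, a required Params object containing an app object, and the 4-character limit on TerminatorText) and does not validate vendor-specific fields.

```python
# Vendor values listed in the TTS parameters description.
ALLOWED_VENDORS = {"Aliyun", "CosyVoice", "ByteDance", "ByteDanceFlowing", "MiniMax"}


def check_tts_config(tts: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    if tts.get("Vendor") not in ALLOWED_VENDORS:
        problems.append("Vendor must be one of: " + ", ".join(sorted(ALLOWED_VENDORS)))

    params = tts.get("Params")
    if not isinstance(params, dict):
        problems.append("Params is required and must be an object")
    elif not isinstance(params.get("app"), dict):
        problems.append("Params.app is required and must be an object")

    terminator = tts.get("TerminatorText")
    if terminator is not None and len(terminator) > 4:
        problems.append("TerminatorText must be at most 4 characters")

    return problems


# Example: a configuration that passes these checks.
print(check_tts_config({"Vendor": "MiniMax", "Params": {"app": {}}}))
```

Returning a list of messages rather than raising on the first failure lets a caller report every problem at once.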


FilterText Parameters Description

FilterText is an array of objects. Each object contains two string parameters: BeginCharacters and EndCharacters.

| Parameter Name | Type | Required | Description |
| --- | --- | --- | --- |
| BeginCharacters | String | Yes | The starting punctuation of the filtered text. For example, to filter the content in (), set it to (. |
| EndCharacters | String | Yes | The ending punctuation of the filtered text. For example, to filter the content in (), set it to ). |
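The effect of a FilterText rule can be approximated locally to preview what text would reach speech synthesis. The sketch below is only an illustration under stated assumptions: the function name `apply_filter_text` is hypothetical, the real filtering is performed by the service, and this document does not specify how nested or unbalanced delimiters or surrounding whitespace are handled.

```python
import re


def apply_filter_text(text: str, filter_rules: list[dict]) -> str:
    """Remove every span enclosed by each rule's begin and end characters."""
    for rule in filter_rules:
        begin = re.escape(rule["BeginCharacters"])
        end = re.escape(rule["EndCharacters"])
        # Non-greedy match so each delimited span is removed separately.
        text = re.sub(f"{begin}.*?{end}", "", text)
    return text


rules = [{"BeginCharacters": "(", "EndCharacters": ")"}]
filtered = apply_filter_text("(happily) The weather is really nice today", rules)
print(repr(filtered))
```

In this approximation the space that followed the closing parenthesis is kept; whether the service trims it is not stated here.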
