Configuring ASR


Function Introduction

To improve speech recognition (speech-to-text) accuracy in different scenarios, you can:

  • Select the appropriate vendor/recognition model: Supported options include Tencent ASR, Aliyun Paraformer, Aliyun Gummy, and Microsoft.
  • Select the appropriate language: By default, the Tencent and Aliyun Paraformer models recognize Chinese, and Microsoft recognizes English.
  • Set recognition hot words: Some scenarios involve specialized terms, such as character names, user IDs, or function names. Set these as temporary hot words when creating an agent instance to improve recognition accuracy.

Prerequisites

Currently, Tencent is the vendor that is supported and enabled by default. To use Aliyun, Microsoft, or another vendor, contact the ZEGOCLOUD business team to have it enabled.

Usage Method

Currently, ASR-related parameters can be set through four interfaces:

| Interface | Description |
| --- | --- |
| Register Agent | Set the vendor, hot words, language, and other parameters. |
| Create Agent Instance, Create Digital Human Agent Instance | Set the vendor, hot words, language, and other parameters. Note: if these are not set, the ASR parameters carried by the registered agent (RegisterAgent) are used by default. |
| Update Agent Instance | Note: supports modifying hot words and language. For other parameters, contact technical support for confirmation. |
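
The fallback described in the note can be pictured as follows. This is only an illustration of the documented behavior, not the service's implementation: `resolve_asr` is a hypothetical name, and the sketch assumes the fallback applies to the ASR parameters as a whole rather than field by field, which this page does not specify.

```python
from typing import Any, Dict, Optional


def resolve_asr(
    registered_asr: Dict[str, Any],
    instance_asr: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: return the ASR parameters an instance would run with."""
    # Assumption: an instance that supplies ASR parameters uses them as given;
    # otherwise the parameters carried by the registered agent apply.
    if instance_asr is None:
        return dict(registered_asr)
    return dict(instance_asr)


registered = {"Vendor": "Tencent", "VADSilenceSegmentation": 500}
print(resolve_asr(registered))                           # no instance-level ASR
print(resolve_asr(registered, {"Vendor": "Microsoft"}))  # instance-level ASR set
```

The first call falls back to the registered agent's parameters; the second uses the instance's own.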

ASR Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Vendor | String | No | ASR vendor. Default: Tencent. Allowed values: Tencent (Tencent), AliyunParaformer (Aliyun Paraformer), AliyunGummy (Aliyun Gummy), Microsoft (Microsoft ASR). |
| HotWord | String | No | Deprecated. Set hot words through the Params extended parameters instead; refer to the hot word setting instructions for each vendor below. |
| Params | Object | No | Vendor parameters. Refer to the parameter setting instructions for each vendor below. |
| VADSilenceSegmentation | number | No | Silence duration, in milliseconds, after which two sentences are no longer treated as one. Range [200, 2000], default 500. See Speech Segmentation Control for details. |
| PauseInterval | number | No | Time window, in milliseconds, within which two sentences are treated as one (ASR multi-sentence concatenation). Range [200, 2000]. Concatenation is enabled only when this value is greater than VADSilenceSegmentation. See Speech Segmentation Control for details. |
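
The ranges given for VADSilenceSegmentation and PauseInterval, and the rule that concatenation needs PauseInterval to exceed VADSilenceSegmentation, can be checked before a request is sent. The following is a minimal sketch, assuming the ASR settings are assembled as a plain dictionary keyed by the parameter names above. The helper names `build_asr_config` and `concatenation_enabled` are hypothetical and not part of any ZEGOCLOUD SDK, and the sketch does not show how the object is placed in a request.

```python
from typing import Any, Dict, Optional

# Vendor identifiers and millisecond limits as listed in the parameter descriptions.
ALLOWED_VENDORS = {"Tencent", "AliyunParaformer", "AliyunGummy", "Microsoft"}
MIN_MS, MAX_MS = 200, 2000
DEFAULT_VAD_MS = 500


def build_asr_config(
    vendor: str = "Tencent",
    vad_silence_segmentation: int = DEFAULT_VAD_MS,
    pause_interval: Optional[int] = None,
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate the values and return an ASR parameter dict."""
    if vendor not in ALLOWED_VENDORS:
        raise ValueError(f"unknown ASR vendor: {vendor!r}")
    if not MIN_MS <= vad_silence_segmentation <= MAX_MS:
        raise ValueError("VADSilenceSegmentation must be in [200, 2000]")

    config: Dict[str, Any] = {
        "Vendor": vendor,
        "VADSilenceSegmentation": vad_silence_segmentation,
    }
    if pause_interval is not None:
        if not MIN_MS <= pause_interval <= MAX_MS:
            raise ValueError("PauseInterval must be in [200, 2000]")
        config["PauseInterval"] = pause_interval
    if params is not None:
        # Vendor-specific content; its structure is not modeled in this sketch.
        config["Params"] = params
    return config


def concatenation_enabled(config: Dict[str, Any]) -> bool:
    """True only when PauseInterval is set and greater than VADSilenceSegmentation."""
    pause = config.get("PauseInterval")
    vad = config.get("VADSilenceSegmentation", DEFAULT_VAD_MS)
    return pause is not None and pause > vad


config = build_asr_config(vad_silence_segmentation=500, pause_interval=800)
print(config)
print(concatenation_enabled(config))
```

With a PauseInterval of 800 and a VADSilenceSegmentation of 500, concatenation is enabled; a PauseInterval of 300 against the same 500 would leave it disabled.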

The Params parameters for each vendor are as follows:
