Configuring ASR

Function Introduction

To improve speech recognition (speech-to-text) accuracy in different scenarios, you can use the following methods:

Select the appropriate vendor/recognition model: Tencent ASR, Aliyun Paraformer, Aliyun Gummy, Microsoft, and others are supported.
Select the appropriate language: By default, the Tencent and Aliyun Paraformer models recognize Chinese, and Microsoft recognizes English.
Set recognition hot words: Some scenarios involve specialized words, such as character names, user IDs, and function names. You can set these as temporary hot words when creating an agent instance to improve recognition accuracy.

Prerequisites

Currently, Tencent is the only vendor enabled by default. If you need Aliyun, Microsoft, or another vendor, contact the ZEGOCLOUD business team to have it enabled.

Usage Method

Currently, ASR-related parameters can be set through four interfaces:

Interface	Description
Register Agent	Sets parameters such as vendor, hot words, and language.
Create Agent Instance, Create Digital Human Agent Instance	Sets parameters such as vendor, hot words, and language. Note: If not set, the ASR parameters of the registered agent (RegisterAgent) are used by default.
Update Agent Instance	Note: Supports modifying hot words and language. For other parameters, contact technical support to confirm.
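The fallback described in the table can be pictured with a small client-side sketch. The function name `effective_asr` is hypothetical, and the sketch assumes the fallback applies to the ASR object as a whole; the source does not say whether individual fields are merged, so treat this as an illustration rather than the service's actual logic.

```python
from typing import Any, Dict, Optional


def effective_asr(
    registered_asr: Dict[str, Any],
    instance_asr: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return the ASR settings an instance would run with.

    Assumption: if the instance request carries no ASR object, the one
    supplied at RegisterAgent time is used unchanged (no per-field merge).
    """
    if instance_asr:
        return instance_asr
    return registered_asr


# ASR object supplied when the agent was registered.
registered = {"Vendor": "Tencent", "VADSilenceSegmentation": 500}

# Instance created without ASR settings: the registered ones apply.
print(effective_asr(registered))

# Instance created with its own ASR settings: those take precedence.
print(effective_asr(registered, {"Vendor": "Microsoft"}))
```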

ASR Parameters

Parameters	Type	Required	Description
Vendor	String	No	ASR vendor. Default is Tencent. Supported values: Tencent (Tencent), AliyunParaformer (Aliyun Paraformer), AliyunGummy (Aliyun Gummy), Microsoft (Microsoft ASR).
~~HotWord~~	String	No	This parameter is deprecated. Set hot words through the Params extended parameters instead; see the hot word setting instructions for each vendor below.
Params	Object	No	Vendor-specific parameters; see the parameter setting instructions for each vendor below.
VADSilenceSegmentation	Number	No	Silence duration, in milliseconds, after which two sentences are no longer treated as one. Range [200, 2000], default is 500. See Speech Segmentation Control for details.
PauseInterval	Number	No	Time window, in milliseconds, within which two sentences are treated as one, i.e., ASR multi-sentence concatenation. Range [200, 2000]. Multi-sentence concatenation is enabled only when this value is greater than VADSilenceSegmentation. See Speech Segmentation Control for details.
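As a minimal sketch of how the fields above fit together, the following builds an ASR object and checks the documented ranges before it is sent. The field names (Vendor, Params, VADSilenceSegmentation, PauseInterval) come from the table; the helper names `build_asr_config` and `concatenation_enabled` are hypothetical, and the validation is a local convenience, not a description of how the service validates requests.

```python
from typing import Any, Dict, Optional

# Vendor values and millisecond range as listed in the parameter table.
VENDORS = {"Tencent", "AliyunParaformer", "AliyunGummy", "Microsoft"}
MIN_MS, MAX_MS = 200, 2000


def build_asr_config(
    vendor: str = "Tencent",
    params: Optional[Dict[str, Any]] = None,
    vad_silence_segmentation: int = 500,
    pause_interval: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an ASR object, rejecting values outside the documented ranges."""
    if vendor not in VENDORS:
        raise ValueError(f"unknown vendor: {vendor}")
    if not MIN_MS <= vad_silence_segmentation <= MAX_MS:
        raise ValueError("VADSilenceSegmentation must be in [200, 2000]")

    asr: Dict[str, Any] = {
        "Vendor": vendor,
        "VADSilenceSegmentation": vad_silence_segmentation,
    }
    if params:
        # Vendor-specific settings (hot words, language, ...) go here.
        asr["Params"] = params
    if pause_interval is not None:
        if not MIN_MS <= pause_interval <= MAX_MS:
            raise ValueError("PauseInterval must be in [200, 2000]")
        asr["PauseInterval"] = pause_interval
    return asr


def concatenation_enabled(asr: Dict[str, Any]) -> bool:
    """Multi-sentence concatenation applies only if PauseInterval > VADSilenceSegmentation."""
    pause = asr.get("PauseInterval")
    vad = asr.get("VADSilenceSegmentation", 500)
    return pause is not None and pause > vad


asr = build_asr_config(vad_silence_segmentation=500, pause_interval=800)
print(asr, concatenation_enabled(asr))
```

With these values, PauseInterval (800) is greater than VADSilenceSegmentation (500), so the check reports concatenation as enabled; setting both to the same value would leave it disabled.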

The Params parameters for each vendor are as follows: