Configuring ASR
Function Introduction
To improve speech recognition (speech-to-text) accuracy in different scenarios, you can:
- Select the appropriate vendor/recognition model: supported options include Tencent ASR, Aliyun Paraformer, Aliyun Gummy, Microsoft, etc.
- Select the appropriate language: by default, the Tencent and Aliyun Paraformer models recognize Chinese, and Microsoft recognizes English.
- Set recognition hot words: some scenarios involve specialized terms, such as character names, user IDs, and function names. You can set these as temporary hot words when creating an agent instance to improve recognition accuracy.
Prerequisites
Currently, Tencent is the default vendor and is the only one enabled out of the box. To use Aliyun, Microsoft, or another vendor, contact the ZEGOCLOUD business team to have it enabled.
Usage Method
Currently, ASR-related parameters can be set through 4 interfaces:
Interface | Description |
---|---|
Register Agent | Set the vendor, hot words, language, and other parameters. |
Create Agent Instance / Create Digital Human Agent Instance | Set the vendor, hot words, language, and other parameters. Note: if not set, the ASR parameters carried by the registered Agent (RegisterAgent) are used by default. |
Update Agent Instance | Note: supports modifying hot words and language. For other parameters, contact technical support for confirmation. |
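The fallback described in the note for the instance-creation interfaces can be sketched as follows. This is only an illustration: `effective_asr` is a hypothetical helper, not part of any SDK, and it assumes the fallback is all-or-nothing (an instance that carries its own ASR object replaces the registered Agent's ASR settings entirely). Whether the service merges individual fields instead is not stated here.

```python
from typing import Any, Dict, Optional


def effective_asr(registered_asr: Dict[str, Any],
                  instance_asr: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the ASR settings an agent instance would run with.

    Assumption: if the instance sets no ASR parameters, the ones carried by
    the registered Agent apply; otherwise the instance's settings are used
    as a whole (no per-field merge).
    """
    if not instance_asr:
        return dict(registered_asr)
    return dict(instance_asr)


# ASR settings carried by the registered Agent (field name from the parameter table).
registered = {"VADSilenceSegmentation": 500}

# An instance created without ASR settings inherits the registered ones.
inherited = effective_asr(registered, None)

# An instance created with its own ASR settings uses those instead.
overridden = effective_asr(registered, {"VADSilenceSegmentation": 800})
```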
ASR Parameters
Parameters | Type | Required | Description |
---|---|---|---|
Vendor | String | No | ASR vendor. The default is Tencent. |
 | String | No | This parameter has been deprecated. Set hot words through the Params extended parameters instead; see the hot word setting instructions for each vendor below. |
Params | Object | No | Vendor parameters, please refer to the parameter setting instructions for each vendor below. |
VADSilenceSegmentation | number | No | Silence duration, in milliseconds, after which two sentences are no longer considered one. Range: [200, 2000]. Default: 500. See Speech Segmentation Control for details. |
PauseInterval | number | No | Time window, in milliseconds, within which two sentences are considered one (ASR multi-sentence concatenation). Range: [200, 2000]. Concatenation is enabled only when this value is greater than VADSilenceSegmentation. See Speech Segmentation Control for details. |
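As a minimal sketch of how the two segmentation parameters interact, the snippet below assembles an ASR object using the field names from the table and checks the documented range and the concatenation rule. `build_asr_config` and `concatenation_enabled` are hypothetical helper names introduced for illustration; the actual request format should be taken from the interface reference.

```python
from typing import Any, Dict, Optional

# Range documented for both VADSilenceSegmentation and PauseInterval (milliseconds).
MIN_MS, MAX_MS = 200, 2000
DEFAULT_VAD_MS = 500  # documented default for VADSilenceSegmentation


def build_asr_config(
    params: Optional[Dict[str, Any]] = None,
    vad_silence_segmentation: int = DEFAULT_VAD_MS,
    pause_interval: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an ASR object and validate the ranges."""
    checks = (
        ("VADSilenceSegmentation", vad_silence_segmentation),
        ("PauseInterval", pause_interval),
    )
    for name, value in checks:
        if value is not None and not MIN_MS <= value <= MAX_MS:
            raise ValueError(
                f"{name} must be within [{MIN_MS}, {MAX_MS}] ms, got {value}"
            )

    asr: Dict[str, Any] = {"VADSilenceSegmentation": vad_silence_segmentation}
    if params is not None:
        asr["Params"] = params  # vendor-specific parameters
    if pause_interval is not None:
        asr["PauseInterval"] = pause_interval
    return asr


def concatenation_enabled(asr: Dict[str, Any]) -> bool:
    """Concatenation applies only when PauseInterval > VADSilenceSegmentation."""
    pause = asr.get("PauseInterval")
    vad = asr.get("VADSilenceSegmentation", DEFAULT_VAD_MS)
    return pause is not None and pause > vad


# PauseInterval (800) exceeds VADSilenceSegmentation (500): concatenation is on.
merged = build_asr_config(pause_interval=800)

# PauseInterval (300) does not exceed VADSilenceSegmentation (500): concatenation is off.
split = build_asr_config(pause_interval=300)
```

Vendor is omitted in this sketch so that the documented default applies.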
The Params parameters for each vendor are as follows: