Speech Segmentation Control
Since the LLM (Large Language Model) does not support streaming input, the service must determine from real-time ASR (Automatic Speech Recognition) results whether the user has finished speaking, and only then send the LLM a new round of Q&A.
To determine whether the user has finished speaking, check these parameters:
- VADSilenceSegmentation
- PauseInterval
Parameter Description
The two parameters that determine whether the user is considered to have finished speaking are part of the ASR parameters for registering/updating agents and for creating/updating agent instances. They are described in detail below:
| Parameter Name | Type | Required | Description |
|---|---|---|---|
| VADSilenceSegmentation | Number | No | Sets the silence duration (in milliseconds) after which two utterances are no longer considered one. Range: [200, 2000]. Default: 500. |
| PauseInterval | Number | No | Sets the duration (in milliseconds) within which two utterances are considered one, enabling ASR multi-sentence concatenation. Range: [200, 2000]. Multi-sentence concatenation takes effect only when this value is greater than VADSilenceSegmentation. |
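
For illustration, the snippet below shows how these two parameters might be set inside the ASR parameters of a create/update request. The surrounding payload structure (the `asr_config` dict name and request shape) is a hypothetical sketch; only the two parameter names, their ranges, and the default come from the table above.

```python
# Hypothetical sketch of the ASR parameters in an agent/instance request.
# Only "VADSilenceSegmentation" and "PauseInterval" come from the docs;
# the surrounding structure is assumed for illustration.
asr_config = {
    # Silence (ms) after which two utterances are no longer considered one.
    # Range [200, 2000]; defaults to 500 when omitted.
    "VADSilenceSegmentation": 500,
    # Window (ms) within which utterances are concatenated into one.
    # Takes effect only when greater than VADSilenceSegmentation.
    "PauseInterval": 1000,
}
```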
Scenario Examples

| Configuration | Q&A Results |
|---|---|
| VADSilenceSegmentation = 500 ms, PauseInterval not set | The user is determined to have spoken twice, resulting in 2 rounds of Q&A.<br>Round 1:<br>- user: The weather is nice today. I want to go out<br>- assistant: Response 1 (interrupted by round 2)<br>- Context: empty<br>Round 2:<br>- user: What about you?<br>- assistant: Response 2<br>- Context: the first Q&A round |
| VADSilenceSegmentation = 500 ms, PauseInterval = 1000 ms | The user is determined to have spoken once, resulting in 1 round of Q&A.<br>- user: The weather is nice today. I want to go out. What about you? <br>- assistant: Response 1<br>- Context: empty |
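
The decision rule behind these examples can be summarized in a few lines. The helper below is a hypothetical sketch of the logic the service applies internally, assuming the silence gap between two sentences is known:

```python
from typing import Optional

def user_finished(silence_ms: int,
                  vad_silence_segmentation: int = 500,
                  pause_interval: Optional[int] = None) -> bool:
    """Hypothetical sketch: does this silence gap end the user's turn?

    Mirrors the rules above: multi-sentence concatenation is active only
    when PauseInterval is set and greater than VADSilenceSegmentation.
    """
    if pause_interval is not None and pause_interval > vad_silence_segmentation:
        # Concatenation enabled: the turn ends only after the longer window.
        return silence_ms >= pause_interval
    # Concatenation disabled: VAD segmentation alone ends the turn.
    return silence_ms >= vad_silence_segmentation

# A 700 ms pause between "I want to go out" and "What about you?":
print(user_finished(700))                       # True  -> split into 2 rounds
print(user_finished(700, pause_interval=1000))  # False -> concatenated, 1 round
```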
Best Practice Configurations
Note
If you're unsure which configuration works better, we recommend using the Scenario 2 configuration.
| Scenario | VADSilenceSegmentation | PauseInterval |
|---|---|---|
| Scenario 1: Users speak in short, frequent bursts (e.g., companionship scenarios) | 500 ms | Not set |
| Scenario 2: Users produce mixed-length content and are sensitive to latency (e.g., customer service scenarios) | 500 ms | 1000~1500 ms |
| Scenario 3: Users typically speak for longer durations and are less sensitive to latency | 1000 ms | Not set |
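
If you adopt the Scenario 2 recommendation, the hypothetical `asr_config` sketch from the parameter description above would look like this:

```python
# Scenario 2 (mixed-length speech, latency-sensitive): recommended values
# from the table above, expressed in the hypothetical asr_config sketch.
asr_config = {
    "VADSilenceSegmentation": 500,   # ms
    "PauseInterval": 1000,           # ms; any value in 1000~1500 works
}
```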