CreateDigitalHumanAgentInstance
https://aigc-aiagent-api.zegotech.cn/
With this interface, you can create a digital human agent instance and join the agent instance into a voice (RTC) conversation. 📌 Note: If the RTC room is not occupied by a real user after 120 seconds, the agent instance will be automatically destroyed, and the Event will be AgentInstanceDeleted callback, and the Data.Code will be 1202.
Request
Query Parameters
Possible values: [CreateDigitalHumanAgentInstance
]
Interface prototype parameters
https://aigc-aiagent-api.zegotech.cn?Action=CreateDigitalHumanAgentInstance
The unique Application ID assigned to your project by ZEGOCLOUD. Get it from the ZEGOCLOUD Admin Console.
Random string.
Unix timestamp, in seconds. The maximum allowed error is 10 minutes.
Signature, used to verify the legitimacy of the request. Refer to Signing the requests for how to generate an API request signature.
Possible values: [2.0
]
Signature version number, default value is 2.0.
- application/json
Body
required
- MiniMax:https://api.minimax.chat/v1/text/chatcompletion_v2
- Volcano Engine (Doubao): https://ark.cn-beijing.volces.com/api/v3/chat/completions
- Aliyun Bailei (Tongyi Qianwen): https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
- Stepfun: https://api.stepfun.com/v1/chat/completions
- MiniMax:
- MiniMax-Text-01
- Volcano Engine (Doubao):
- doubao-1-5-pro-32k-250115
- doubao-1-5-lite-32k-250115
- Aliyun Bailei (Tongyi Qianwen):
- qwen-plus
- Stepfun:
- step-2-16k
- room_id: RTC room ID
- user_id: User ID
- agent_instance_id: AI agent instance ID
- Aliyun: Aliyun
- ByteDance: ByteDance (Volcano Voice - Large Model Speech Synthesis API)
- ByteDanceFlowing: ByteDance (Volcano Voice - Streaming Speech Synthesis API (WebSocket))
- MiniMax: MiniMax
- CosyVoice: Aliyun CosyVoice
- Aliyun
- ByteDance
- ByteDanceFlowing
- MiniMax
- CosyVoice
- Array[
- ]
- 0: Synchronize from the In-app Chat (ZIM)
- 1: Synchronize through the Messages parameter
- Array[
- user: User
- assistant: AI agent
- ]
- 0: Interrupt immediately. If the user speaks while the AI is speaking, the AI will be immediately interrupted and stop speaking.
- 1: Do not interrupt. If the user speaks while the AI is speaking, the AI will not be affected until the content is finished.
The unique identifier of the registered AI agent.
Possible values: <= 32 characters
The real user ID used to log in to the RTC room. Only numbers, English characters, '-', and '_' are supported.
RTC objectrequired
RTC related information
📌 Important Note
All attribute character restrictions: only numbers, English characters, '_', '-', and '.' are supported.
Possible values: <= 128 characters
RTC room ID.
Possible values: <= 128 characters
The stream ID used by the AI agent instance for streaming.
📌 Important Note
Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different stream IDs, otherwise the streaming of the later created AI agent instance will fail.
Possible values: <= 32 characters
The user ID of the AI agent instance.
📌 Important Note
Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different user IDs, otherwise the earlier created AI agent instance will be kicked out of the RTC room.
Possible values: <= 128 characters
The stream ID used by the real user for streaming.
LLM object
The endpoint that receives the request (can be your own service or any LLM service provider's service) and must be compatible with OpenAI Chat Completions API.
For example: https://api.openai.com/v1/chat/completions
📌 Important Note
If ApiKey is set to "zego_test", you must use one of the following Url addresses:
The parameter used for authentication by the LLM service provider. It is empty by default, but must be provided in production environments.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
The LLM model. Different LLM service providers support different models, please refer to their official documentation to select the appropriate model.
📌 Important Note
If ApiKey is set to "zego_test", you must use one of the following models:
The system prompt of the AI agent. It is the predefined information that is added at the beginning when calling the LLM, used to control the output of the LLM. It can be role settings, prompts, and answer examples.
Possible values: >= 0
and <= 2
Default value: 0.7
The higher the value, the more random the output; the lower the value, the more concentrated and determined the output.
Possible values: >= 0
and <= 1
Default value: 0.9
The sampling method. The smaller the value, the stronger the determinism; the larger the value, the stronger the randomness.
Other parameters supported by the LLM service provider, such as the maximum token limit. Different LLM providers support different parameters, please refer to their official documentation and fill in as needed.
Default value: false
If this value is true, the AI Agent server will include the AI agent information in the request parameters when requesting the LLM service. You can use this parameter to execute additional business logic in your custom LLM service.
The structure of agent_info is as follows:
TTS object
Possible values: [Aliyun
, ByteDance
, ByteDanceFlowing
, MiniMax
, CosyVoice
]
The TTS service provider. Options:
Params objectrequired
TTS configuration parameters, in JSON object format. Contains app parameters (for authentication) and other parameters (for adjusting TTS effects).
In addition to the app parameter, you can also pass in other TTS configuration parameters to adjust the speech synthesis effect. These parameters will be directly passed to the corresponding TTS service provider.
You can refer to the official documentation of the service provider for the required information according to the value of Vendor.
- Aliyun
: Intelligent Speech Interaction - Overview of speech synthesis - 2. Start the synthesis task
- ByteDance
: Large Model Speech Synthesis API - Parameter List - Request Parameters
- ByteDanceFlowing
: "Payload request parameters" in Streaming Text-to-Speech API (WebSocket) - WebSocket Binary Protocol
- MiniMax
: Voice Model - T2A v2 - WebSocket API - Interface Parameters
- CosyVoice
: "Payload request parameters" in CosyVoice WebSocket API for Speech Synthesis
app object required
Read the Alibaba Cloud docs Intelligent Speech Interaction - Create a project to create a project and get the AppKey from the Intelligent Speech Interaction console, and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Read the Alibaba Cloud docs Intelligent Speech Interaction - Activate Intelligent Speech Interaction - Procedure to obtain the AccessKey ID and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Read the Alibaba Cloud docs Intelligent Speech Interaction - Activate Intelligent Speech Interaction - Procedure to obtain the AccessKey Secret and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Read the BytePlus docs Voice Technology - Quick Start - Console Usage FAQ under "Where can I get the following parameters: appid, cluster, token, authorization_type, secret_key?".
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Read the BytePlus docs Voice Technology - Quick Start - Console Usage FAQ under "Where can I get the following parameters: appid, cluster, token, authorization_type, secret_key?".
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Possible values: [volcano_tts
, volcano_mega
, volcano_icl
]
Default value: volcano_tts
BytePlus cluster configuration
📌 Important Note
This parameter must match the audio.voice_type parameter.
Read the BytePlus Voice docs Voice Console Guide - Step 2: Trial Use to get the App ID, and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Read the BytePlus Voice docs Voice Console Guide - Step 2: Trial Use to get the App ID, and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Possible values: [volc.service_type.10029
, volc.megatts.default
, volc.megatts.concurr
]
Default value: volc.service_type.10029
BytePlus resource ID
📌 Important Note
This parameter must match the req_params.speaker parameter.
Read the MiniMax docs Quick Start to get the API Key, and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
Read the CosyVoice docs Get API Key to get the API Key, and pass it here.
📌 Important Note
During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.
The voice of the speaker, default is xiaoyun. Except for the app parameter, the other parameters are only for demonstration of the transparent transmission parameter level. You can add or delete them according to your own needs.
Possible values: >= 0
and <= 100
The volume, range [0, 100]. Except for the app parameter, the other parameters are only for demonstration of the transparent transmission parameter level. You can add or delete them according to your own needs.
Possible values: >= -500
and <= 500
The speech rate, range [-500, 500]. Except for the app parameter, the other parameters are only for demonstration of the transparent transmission parameter level. You can add or delete them according to your own needs.
FilterText object[]
The start punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to (.
The end punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to ).
ASR object
The hot word list is used to improve the recognition accuracy. Format: Hotword1|Weight1,Hotword2|Weight2,Hotword3|Weight3
A single hot word cannot exceed 30 characters, cannot contain spaces, and the weight range is [-1, 11]. Up to 128 hot words are supported.
📌 Important Note
When the weight is 11, it means that the word is a super hot word. It is recommended to set only the important and must-take-effect hot words to 11, and too many hot words with a weight of 11 will affect the recognition effect.
Extended parameters, please contact ZEGOCLOUD technical support for details.
Possible values: >= 200
and <= 2000
Default value: 500
Set the time after which the user's speech is no longer considered as a sentence. The unit is ms, range [200, 2000], default is 500.
Possible values: >= 200
and <= 2000
Set the time within which two sentences are considered as one sentence, i.e., ASR multi-sentence concatenation. The unit is ms, range [200, 2000]. Only when this value is greater than VADSilenceSegmentation, ASR multi-sentence concatenation will be enabled.
MessageHistory object
Possible values: [0
, 1
]
Default value: 0
Message synchronization mode:
Messages object[]
<= 100
Possible values: [user
, assistant
]
The role of the message sender:
Message content
Possible values: >= 0
and <= 100
Default value: 20
The number of recent history messages used when calling the LLM service. It affects the LLM context understanding ability, and it is recommended to set it to 10-30.
ZIM object
📌 Important Note
- Only effective when MessageHistory.SyncMode is 0.
- Please ensure that your project has enabled the ZIM service.
- Please ensure that you have called the ZIM robot registration interface, and set the returned UserInfo.UserId as the RobotId.
- It is recommended to register the robot in advance to improve the user information settings and enhance the efficiency of creating AI agent instances.
ZIM robot ID. That is, the UserInfo.UserId returned by calling the ZIM register robot interface. It is used to load the chat context between the user and the ZIM robot, and synchronize the messages generated during the conversation to ZIM. If this parameter is empty, the real-time interactive AI Agent backend will randomly generate one.
Possible values: >= 0
and <= 100
The number of messages to be fetched from the ZIM service as context when creating an AI agent instance. The default is the value of WindowSize (the upper limit).
CallbackConfig object
📌 Important Note
Before configuring the following parameters, you need to set the callback address according to Receiving Callback, and understand the specific field descriptions.
Possible values: [0
, 1
]
Default value: 0
Whether to enable server-side callback for ASR results.
Possible values: [0
, 1
]
Default value: 0
Whether to enable server-side callback for LLM results. If enabled, the ZEGOCLOUD server will return the LLM output result for each sentence.
Possible values: [0
, 1
]
Default value: 0
Whether to enable server-side callback for the AI agent being interrupted.
Possible values: [0
, 1
]
Default value: 0
Whether to enable server-side callback for user speech.
Possible values: [0
, 1
]
Default value: 0
Whether to enable server-side callback for the AI agent speaking.
Possible values: [0
, 1
]
Default value: 0
Whether to enable server-side callback for user speech audio data.
AdvancedConfig object
Possible values: [0
, 1
]
Default value: 0
The mode of interrupting the AI agent when the user speaks:
DigitalHuman objectrequired
Digital human ID
Possible values: [mobile
, web
]
Digital human configuration ID
Possible values: [H264
, VP8
]
Default value: H264
Digital human video encoding format
Responses
- 200
- application/json
- Schema
- Example (from schema)
Schema
Return code. 0 indicates success, other values indicate failure. For more information on error codes and response handling recommendations, please refer to Return Codes.
Explanation of the request result
Request ID
Data object
The unique identifier of the AI agent instance.
Digital human configuration, used by the digital human mobile SDK.
{
"Code": 0,
"Message": "Success",
"RequestId": "8825223157230377926",
"Data": {
"AgentInstanceId": "1912122918452641792",
"DigitalHumanConfig": "eyJEaWdpdGFsSHVtYW5JZCI6ImU1ODNkMzVmLTk3OTMtNDJiNC1hYjFiLTE4OWEzNWI4OGQxYyIsIlN0cmVhbXMiOlt7IlJvb21JZCI6ImlyXzU1NTd5bDVoIiwiU3RyZWFtSWQiOiJpcl81NTU3eWw1aF8xNzEwMl9hZ2VudCIsIkVuY29kZUNvZGUiOiJIMjY0IiwiQ29uZmlnSWQiOiJ3ZWIifV19"
}
}
- curl
- python
- go
- nodejs
- ruby
- csharp
- php
- java
- powershell
- CURL
Click the "Send" button above and see the response here!