CreateDigitalHumanAgentInstance

POST

https://aigc-aiagent-api.zegotech.cn/

Use this interface to create a digital human agent instance and have it join a voice (RTC) conversation. 📌 Note: If no real user has occupied the RTC room after 120 seconds, the agent instance is automatically destroyed. You then receive a callback whose Event is AgentInstanceDeleted and whose Data.Code is 1202.
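The idle-room rule above can be detected in a callback handler. The following is a minimal sketch, assuming the callback arrives as a JSON object with top-level `Event` and `Data` keys; the function name `is_idle_room_deletion` is hypothetical, and the authoritative payload layout is described in Receiving Callback.

```python
from typing import Any, Mapping

# Data.Code reported when no real user occupied the RTC room within 120 seconds.
IDLE_ROOM_CODE = 1202


def is_idle_room_deletion(callback: Mapping[str, Any]) -> bool:
    """Return True if the callback reports the automatic idle-room destruction.

    Assumption: the payload exposes "Event" and "Data" at the top level.
    """
    if callback.get("Event") != "AgentInstanceDeleted":
        return False
    data = callback.get("Data") or {}
    return data.get("Code") == IDLE_ROOM_CODE
```

A handler could use this check to decide whether to recreate the instance once a real user is ready to join.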

Request

Query Parameters

    Action string required

    Possible values: [CreateDigitalHumanAgentInstance]

    Interface prototype parameters

    https://aigc-aiagent-api.zegotech.cn?Action=CreateDigitalHumanAgentInstance

    AppId uint32 required

    The unique Application ID assigned to your project by ZEGOCLOUD. Get it from the ZEGOCLOUD Admin Console.

    SignatureNonce string required

    Random string.

    Timestamp int64 required

    Unix timestamp, in seconds. It may deviate from the current time by at most 10 minutes.

    Signature string required

    Signature, used to verify the legitimacy of the request. Refer to Signing the requests for how to generate an API request signature.

    SignatureVersion string required

    Possible values: [2.0]

    Signature version number, default value is 2.0.
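The query parameters above can be assembled as follows. This is a minimal sketch: the function name `build_request_url` and the sample values are hypothetical, and the signature is passed in as an argument because its algorithm is defined in Signing the requests, not here.

```python
import secrets
import time
from urllib.parse import urlencode

BASE_URL = "https://aigc-aiagent-api.zegotech.cn"


def build_request_url(app_id: int, signature: str, nonce: str, timestamp: int) -> str:
    """Build the request URL carrying the six required query parameters."""
    query = {
        "Action": "CreateDigitalHumanAgentInstance",
        "AppId": app_id,
        "SignatureNonce": nonce,
        "Timestamp": timestamp,
        "Signature": signature,
        "SignatureVersion": "2.0",
    }
    return f"{BASE_URL}?{urlencode(query)}"


nonce = secrets.token_hex(8)   # random string
timestamp = int(time.time())   # Unix timestamp in seconds
# "placeholder-signature" stands in for a value computed per "Signing the requests".
url = build_request_url(1234567890, "placeholder-signature", nonce, timestamp)
```

Generating the nonce and timestamp right before signing keeps the timestamp inside the 10-minute tolerance.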

Body

required
    AgentId string required

    The unique identifier of the registered AI agent.

    UserId string required

    Possible values: <= 32 characters

    The real user ID used to log in to the RTC room. Only numbers, English characters, '-', and '_' are supported.

    RTC object required

    RTC related information


    📌 Important Note

    All attribute character restrictions: only numbers, English characters, '_', '-', and '.' are supported.

    RoomId string required

    Possible values: <= 128 characters

    RTC room ID.

    AgentStreamId string required

    Possible values: <= 128 characters

    The stream ID used by the AI agent instance for streaming.

    📌 Important Note

    Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different stream IDs, otherwise the streaming of the later created AI agent instance will fail.

    AgentUserId string required

    Possible values: <= 32 characters

    The user ID of the AI agent instance.

    📌 Important Note

    Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different user IDs, otherwise the earlier created AI agent instance will be kicked out of the RTC room.

    UserStreamId string required

    Possible values: <= 128 characters

    The stream ID used by the real user for streaming.
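The length and character restrictions on UserId and the RTC attributes can be checked before sending the request. The following is a minimal client-side sketch; the helper name `validate_ids` is hypothetical, and the server remains the final authority on what it accepts.

```python
import re

# RTC attributes: numbers, English characters, '_', '-', and '.'.
RTC_ID_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")
# UserId: numbers, English characters, '-', and '_' (no '.').
USER_ID_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

RTC_MAX_LENGTHS = {
    "RoomId": 128,
    "AgentStreamId": 128,
    "AgentUserId": 32,
    "UserStreamId": 128,
}


def validate_ids(user_id: str, rtc: dict) -> list[str]:
    """Return the names of the fields that violate the documented limits."""
    errors = []
    if not (len(user_id) <= 32 and USER_ID_PATTERN.fullmatch(user_id)):
        errors.append("UserId")
    for field, max_len in RTC_MAX_LENGTHS.items():
        value = rtc.get(field, "")
        if not (len(value) <= max_len and RTC_ID_PATTERN.fullmatch(value)):
            errors.append(f"RTC.{field}")
    return errors
```

An empty result means every ID passed; a missing or empty required field is reported the same way as an invalid one.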

    LLM object
    Url string required

    The endpoint that receives the request (can be your own service or any LLM service provider's service) and must be compatible with OpenAI Chat Completions API.

    For example: https://api.openai.com/v1/chat/completions

    📌 Important Note

    If ApiKey is set to "zego_test", you must use one of the following Url addresses:

    • MiniMax: https://api.minimax.chat/v1/text/chatcompletion_v2
    • Volcano Engine (Doubao): https://ark.cn-beijing.volces.com/api/v3/chat/completions
    • Aliyun Bailian (Tongyi Qianwen): https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
    • Stepfun: https://api.stepfun.com/v1/chat/completions
    ApiKey string

    The parameter used for authentication by the LLM service provider. It is empty by default, but must be provided in production environments.

    📌 Important Note

    During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

    Model string required

    The LLM model. Different LLM service providers support different models, please refer to their official documentation to select the appropriate model.

    📌 Important Note

    If ApiKey is set to "zego_test", you must use one of the following models:

    • MiniMax:
      • MiniMax-Text-01
    • Volcano Engine (Doubao):
      • doubao-1-5-pro-32k-250115
      • doubao-1-5-lite-32k-250115
    • Aliyun Bailian (Tongyi Qianwen):
      • qwen-plus
    • Stepfun:
      • step-2-16k
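The "zego_test" restrictions on Url and Model can be checked locally before a request is sent. This is a sketch only: the helper name `check_test_llm` is hypothetical, and pairing each model with its own provider's Url is an assumption drawn from how the two lists are grouped.

```python
# Endpoints and models listed above as usable with ApiKey "zego_test".
TEST_ENDPOINTS = {
    "https://api.minimax.chat/v1/text/chatcompletion_v2": {"MiniMax-Text-01"},
    "https://ark.cn-beijing.volces.com/api/v3/chat/completions": {
        "doubao-1-5-pro-32k-250115",
        "doubao-1-5-lite-32k-250115",
    },
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions": {"qwen-plus"},
    "https://api.stepfun.com/v1/chat/completions": {"step-2-16k"},
}


def check_test_llm(llm: dict) -> None:
    """Raise ValueError if a "zego_test" LLM config uses an unlisted Url or Model."""
    if llm.get("ApiKey") != "zego_test":
        return  # the restrictions only apply to the test key
    allowed_models = TEST_ENDPOINTS.get(llm.get("Url"))
    if allowed_models is None:
        raise ValueError(f"Url not available with zego_test: {llm.get('Url')}")
    # Assumption: the model must belong to the same provider as the Url.
    if llm.get("Model") not in allowed_models:
        raise ValueError(f"Model not available with zego_test: {llm.get('Model')}")
```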
    SystemPrompt string

    The system prompt of the AI agent. It is the predefined information that is added at the beginning when calling the LLM, used to control the output of the LLM. It can be role settings, prompts, and answer examples.

    Temperature number

    Possible values: >= 0 and <= 2

    Default value: 0.7

    The higher the value, the more random the output; the lower the value, the more focused and deterministic the output.

    TopP number

    Possible values: >= 0 and <= 1

    Default value: 0.9

    Controls top-p sampling. The smaller the value, the more deterministic the output; the larger the value, the more random the output.

    Params object

    Other parameters supported by the LLM service provider, such as the maximum token limit. Different LLM providers support different parameters, please refer to their official documentation and fill in as needed.

    AddAgentInfo boolean

    Default value: false

    If this value is true, the AI Agent server will include the AI agent information in the request parameters when requesting the LLM service. You can use this parameter to execute additional business logic in your custom LLM service.

    The structure of agent_info is as follows:

    • room_id: RTC room ID
    • user_id: User ID
    • agent_instance_id: AI agent instance ID
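A custom LLM service could read these fields as sketched below. The exact position of agent_info inside the request is not stated above, so this example assumes it arrives as a top-level "agent_info" key in the JSON body; the function name `extract_agent_info` is hypothetical.

```python
from typing import Any, Mapping, Optional

AGENT_INFO_KEYS = ("room_id", "user_id", "agent_instance_id")


def extract_agent_info(chat_request: Mapping[str, Any]) -> Optional[dict]:
    """Return the agent_info fields from a chat-completions request body.

    Assumption: agent_info is a top-level object in the request body.
    Returns None when it is absent (for example, when AddAgentInfo is false).
    """
    info = chat_request.get("agent_info")
    if not isinstance(info, Mapping):
        return None
    return {key: info.get(key) for key in AGENT_INFO_KEYS}
```

The returned dictionary can then drive business logic such as per-room routing or per-user logging.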
    TTS object
    Vendor string required

    Possible values: [Aliyun, ByteDance, ByteDanceFlowing, MiniMax, CosyVoice]

    The TTS service provider. Options:

    • Aliyun: Aliyun
    • ByteDance: ByteDance (Volcano Voice - Large Model Speech Synthesis API)
    • ByteDanceFlowing: ByteDance (Volcano Voice - Streaming Speech Synthesis API (WebSocket))
    • MiniMax: MiniMax
    • CosyVoice: Aliyun CosyVoice
    Params object required

    TTS configuration parameters, in JSON object format. Contains app parameters (for authentication) and other parameters (for adjusting TTS effects).


    In addition to the app parameter, you can also pass in other TTS configuration parameters to adjust the speech synthesis effect. These parameters will be directly passed to the corresponding TTS service provider.

    You can refer to the official documentation of the service provider for the required information according to the value of Vendor.

    - Aliyun: Intelligent Speech Interaction - Overview of speech synthesis - 2. Start the synthesis task

    - ByteDance: Large Model Speech Synthesis API - Parameter List - Request Parameters

    - ByteDanceFlowing: "Payload request parameters" in Streaming Text-to-Speech API (WebSocket) - WebSocket Binary Protocol

    - MiniMax: Voice Model - T2A v2 - WebSocket API - Interface Parameters

    - CosyVoice: "Payload request parameters" in CosyVoice WebSocket API for Speech Synthesis

    app object required
    The TTS service authentication parameter, which is different for different Vendor values. Please refer to the requirements of each vendor for the structure of the app parameter.
    oneOf
    app_key string required

    Follow the Alibaba Cloud document "Intelligent Speech Interaction - Create a project" to create a project, obtain the AppKey from the Intelligent Speech Interaction console, and pass it here.

    📌 Important Note

    During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

    ak_id string required

    Follow the Alibaba Cloud document "Intelligent Speech Interaction - Activate Intelligent Speech Interaction - Procedure" to obtain the AccessKey ID and pass it here.

    📌 Important Note

    During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

    ak_key string required

    Follow the Alibaba Cloud document "Intelligent Speech Interaction - Activate Intelligent Speech Interaction - Procedure" to obtain the AccessKey Secret and pass it here.

    📌 Important Note

    During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

    voice string

    The speaker voice, default is xiaoyun. Apart from app, the parameters listed here only illustrate where pass-through parameters are placed; add or remove them as needed.

    volume number

    Possible values: >= 0 and <= 100

    The volume, range [0, 100]. Apart from app, the parameters listed here only illustrate where pass-through parameters are placed; add or remove them as needed.

    speech_rate number

    Possible values: >= -500 and <= 500

    The speech rate, range [-500, 500]. Apart from app, the parameters listed here only illustrate where pass-through parameters are placed; add or remove them as needed.
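The Aliyun variant of the TTS object can be assembled as in this sketch. The helper name `build_aliyun_tts` is hypothetical, and placing voice, volume, and speech_rate directly under Params (next to app) is an assumption based on the parameter listing above.

```python
from typing import Optional


def build_aliyun_tts(
    app_key: str,
    ak_id: str,
    ak_key: str,
    voice: str = "xiaoyun",
    volume: Optional[int] = None,
    speech_rate: Optional[int] = None,
) -> dict:
    """Build a TTS object for Vendor "Aliyun", checking the documented ranges."""
    params: dict = {
        "app": {"app_key": app_key, "ak_id": ak_id, "ak_key": ak_key},
        "voice": voice,
    }
    if volume is not None:
        if not 0 <= volume <= 100:
            raise ValueError("volume must be within [0, 100]")
        params["volume"] = volume
    if speech_rate is not None:
        if not -500 <= speech_rate <= 500:
            raise ValueError("speech_rate must be within [-500, 500]")
        params["speech_rate"] = speech_rate
    return {"Vendor": "Aliyun", "Params": params}


# During the test period, all three credentials may be set to "zego_test".
tts = build_aliyun_tts("zego_test", "zego_test", "zego_test", volume=50)
```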

    FilterText object[]
    Filter the text within the specified punctuation marks from the content returned by the LLM, and then perform speech synthesis.

    Note:

    - The content that should be placed within the specified punctuation marks must be defined in LLM > SystemPrompt.

    - This parameter cannot be updated when updating the AI agent instance.
  • Array[
    BeginCharacters string required

    The start punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to (.

    EndCharacters string required

    The end punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to ).

  • ]
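To make the effect of FilterText concrete, the sketch below removes every span enclosed by a BeginCharacters/EndCharacters pair from a piece of LLM output. It only illustrates the described behavior and is not the server's implementation; the function name `filter_text` and the handling of unclosed marks are assumptions.

```python
def filter_text(text: str, rules: list[dict]) -> str:
    """Remove the spans enclosed by each rule's begin and end marks."""
    for rule in rules:
        begin, end = rule["BeginCharacters"], rule["EndCharacters"]
        if not begin or not end:
            continue  # skip empty marks to avoid matching everywhere
        kept = []
        pos = 0
        while True:
            start = text.find(begin, pos)
            if start == -1:
                break
            stop = text.find(end, start + len(begin))
            if stop == -1:
                break  # assumption: an unclosed mark leaves the text untouched
            kept.append(text[pos:start])
            pos = stop + len(end)
        kept.append(text[pos:])
        text = "".join(kept)
    return text
```

With the rule `{"BeginCharacters": "(", "EndCharacters": ")"}`, stage directions that the system prompt asks the LLM to wrap in parentheses are dropped before synthesis.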
    ASR object
    HotWord string

    The hot word list is used to improve the recognition accuracy. Format: Hotword1|Weight1,Hotword2|Weight2,Hotword3|Weight3

    A single hot word cannot exceed 30 characters, cannot contain spaces, and the weight range is [-1, 11]. Up to 128 hot words are supported.

    📌 Important Note

    A weight of 11 marks the word as a super hot word. Set only important hot words that must take effect to 11; too many hot words with a weight of 11 degrade recognition.
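The HotWord string can be generated from a word-to-weight mapping while enforcing the limits above. This is a minimal sketch; the helper name `build_hot_word` is hypothetical, and treating weights as integers is an assumption.

```python
def build_hot_word(weights: dict[str, int]) -> str:
    """Format hot words as "Hotword1|Weight1,Hotword2|Weight2,..."."""
    if len(weights) > 128:
        raise ValueError("at most 128 hot words are supported")
    parts = []
    for word, weight in weights.items():
        if not word or len(word) > 30:
            raise ValueError(f"hot word must be 1 to 30 characters: {word!r}")
        if any(ch.isspace() for ch in word):
            raise ValueError(f"hot word cannot contain spaces: {word!r}")
        if not -1 <= weight <= 11:
            raise ValueError(f"weight must be within [-1, 11]: {weight}")
        parts.append(f"{word}|{weight}")
    return ",".join(parts)
```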

    Params object

    Extended parameters, please contact ZEGOCLOUD technical support for details.

    VADSilenceSegmentation number

    Possible values: >= 200 and <= 2000

    Default value: 500

    Set the duration of silence after which the user's speech is treated as a finished sentence. The unit is ms, range [200, 2000], default is 500.

    PauseInterval number

    Possible values: >= 200 and <= 2000

    Set the time within which two sentences are considered as one sentence, i.e., ASR multi-sentence concatenation. The unit is ms, range [200, 2000]. ASR multi-sentence concatenation is enabled only when this value is greater than VADSilenceSegmentation.
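The relationship between the two timing parameters can be expressed as a small check. This is a sketch with a hypothetical function name, `concatenation_enabled`; it validates both ranges and reports whether multi-sentence concatenation would be active.

```python
def concatenation_enabled(vad_silence_ms: int, pause_interval_ms: int) -> bool:
    """Return True if PauseInterval enables ASR multi-sentence concatenation."""
    checks = (
        ("VADSilenceSegmentation", vad_silence_ms),
        ("PauseInterval", pause_interval_ms),
    )
    for name, value in checks:
        if not 200 <= value <= 2000:
            raise ValueError(f"{name} must be within [200, 2000] ms, got {value}")
    # Concatenation only takes effect when PauseInterval exceeds the VAD silence.
    return pause_interval_ms > vad_silence_ms
```

For example, with VADSilenceSegmentation at 500 and PauseInterval at 800, concatenation is enabled; with both at 500, it is not.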

    MessageHistory object
    Configuration of the history messages used by the AI agent instance
    SyncMode integer

    Possible values: [0, 1]

    Default value: 0

    Message synchronization mode:

    • 0: Synchronize from the In-app Chat (ZIM)
    • 1: Synchronize through the Messages parameter
    Messages object[]
    Possible values: <= 100
    Message list
  • Array[
    Role string required

    Possible values: [user, assistant]

    The role of the message sender:

    • user: User
    • assistant: AI agent
    Content string required

    Message content

  • ]
    WindowSize integer

    Possible values: >= 0 and <= 100

    Default value: 20

    The number of recent history messages used when calling the LLM service. It affects the LLM context understanding ability, and it is recommended to set it to 10-30.

    ZIM object
    ZIM-related information.

    📌 Important Note

    - Only effective when MessageHistory.SyncMode is 0.

    - Please ensure that your project has enabled the ZIM service.

    - Please ensure that you have called the ZIM robot registration interface, and set the returned UserInfo.UserId as the RobotId.

    - It is recommended to register the robot in advance to improve the user information settings and enhance the efficiency of creating AI agent instances.

    RobotId string

    ZIM robot ID. That is, the UserInfo.UserId returned by calling the ZIM register robot interface. It is used to load the chat context between the user and the ZIM robot, and synchronize the messages generated during the conversation to ZIM. If this parameter is empty, the real-time interactive AI Agent backend will randomly generate one.

    LoadMessageCount integer

    Possible values: >= 0 and <= 100

    The number of messages to be fetched from the ZIM service as context when creating an AI agent instance. The default is the value of WindowSize (the upper limit).
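When history is supplied through the Messages parameter (SyncMode 1), the object can be built as in this sketch. The helper name `build_message_history` is hypothetical; it checks the documented limits on roles, message count, and WindowSize, and omits ZIM because ZIM only takes effect when SyncMode is 0.

```python
ALLOWED_ROLES = ("user", "assistant")


def build_message_history(messages: list[tuple[str, str]], window_size: int = 20) -> dict:
    """Build a MessageHistory object that syncs through the Messages parameter."""
    if not 0 <= window_size <= 100:
        raise ValueError("WindowSize must be within [0, 100]")
    if len(messages) > 100:
        raise ValueError("Messages supports at most 100 entries")
    formatted = []
    for role, content in messages:
        if role not in ALLOWED_ROLES:
            raise ValueError(f"Role must be one of {ALLOWED_ROLES}, got {role!r}")
        formatted.append({"Role": role, "Content": content})
    return {"SyncMode": 1, "Messages": formatted, "WindowSize": window_size}
```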

    CallbackConfig object
    Server-side callback configuration

    📌 Important Note

    Before configuring the following parameters, set the callback address as described in Receiving Callback and review the field descriptions there.

    ASRResult integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for ASR results.

    LLMResult integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for LLM results. If enabled, the ZEGOCLOUD server will return the LLM output result for each sentence.

    Interrupted integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for the AI agent being interrupted.

    UserSpeakAction integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for user speech.

    AgentSpeakAction integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for the AI agent speaking.

    UserAudioData integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for user speech audio data.
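The six switches can be generated from a set of callback names, as in this sketch. The helper name `build_callback_config` is hypothetical, and it assumes that 1 enables a callback and 0 (the default) disables it.

```python
CALLBACK_FLAGS = (
    "ASRResult",
    "LLMResult",
    "Interrupted",
    "UserSpeakAction",
    "AgentSpeakAction",
    "UserAudioData",
)


def build_callback_config(enabled: set[str]) -> dict[str, int]:
    """Return a CallbackConfig object with the requested callbacks set to 1."""
    unknown = set(enabled) - set(CALLBACK_FLAGS)
    if unknown:
        raise ValueError(f"unknown callback names: {sorted(unknown)}")
    # Assumption: 1 means enabled, 0 means disabled.
    return {name: int(name in enabled) for name in CALLBACK_FLAGS}
```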

    AdvancedConfig object
    InterruptMode integer

    Possible values: [0, 1]

    Default value: 0

    The mode of interrupting the AI agent when the user speaks:

    • 0: Interrupt immediately. If the user speaks while the AI is speaking, the AI will be immediately interrupted and stop speaking.
    • 1: Do not interrupt. If the user speaks while the AI is speaking, the AI continues speaking until it finishes its current content.
    DigitalHuman object required
    DigitalHumanId string

    Digital human ID

    ConfigId string

    Possible values: [mobile, web]

    Digital human configuration ID

    EncodeCode string

    Possible values: [H264, VP8]

    Default value: H264

    Digital human video encoding format

Responses

Success
Schema
    Code integer

    Return code. 0 indicates success, other values indicate failure. For more information on error codes and response handling recommendations, please refer to Return Codes.

    Message string

    Explanation of the request result

    RequestId string

    Request ID

    Data object
    AgentInstanceId string

    The unique identifier of the AI agent instance.

    DigitalHumanConfig string

    Digital human configuration, used by the digital human mobile SDK.
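A caller can unpack the response as sketched below. The names `AgentApiError` and `parse_create_response` are hypothetical; the sketch treats any non-zero Code as a failure, as the schema describes, and leaves the choice of recovery action to Return Codes.

```python
import json


class AgentApiError(Exception):
    """Raised when the API returns a non-zero Code."""


def parse_create_response(raw: str) -> tuple[str, str]:
    """Return (AgentInstanceId, DigitalHumanConfig) from a response body."""
    payload = json.loads(raw)
    if payload.get("Code") != 0:
        raise AgentApiError(
            f"Code={payload.get('Code')} Message={payload.get('Message')} "
            f"RequestId={payload.get('RequestId')}"
        )
    data = payload.get("Data") or {}
    return data["AgentInstanceId"], data.get("DigitalHumanConfig", "")
```

Including RequestId in the error message makes it easier to trace a failed call with technical support.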


Example request

Base URL: https://aigc-aiagent-api.zegotech.cn (unified access address, no regional distinction)

Body
{
    "AgentId": "Jacky",
    "UserId": "user_1",
    "RTC": {
        "RoomId": "room_1",
        "AgentStreamId": "agent_stream_1",
        "AgentUserId": "agent_user_1",
        "UserStreamId": "user_stream_1"
    },
    "LLM": {
        "Url": "https://ark.cn-beijing.volces.com/api/v3/chat/completions",
        "ApiKey": "zego_test",
        "Model": "doubao-1-5-lite-32k-250115",
        "SystemPrompt": "You are a friendly assistant",
        "Temperature": 0.7,
        "TopP": 0.9,
        "Params": {
            "max_tokens": 16384
        },
        "AddAgentInfo": false
    },
    "TTS": {
        "Vendor": "ByteDance",
        "Params": {
            "app": {
                "appid": "zego_test",
                "token": "zego_test",
                "cluster": "volcano_tts"
            },
            "audio": {
                "voice_type": "zh_female_qingxinnvsheng_mars_bigtts",
                "loudness_ratio": 1,
                "speed_ratio": 1
            }
        },
        "FilterText": [
            {
                "BeginCharacters": "(",
                "EndCharacters": ")"
            }
        ]
    },
    "ASR": {
        "HotWord": "ZEGO|10,AI|10,Agent|10",
        "Params": {},
        "VADSilenceSegmentation": 500,
        "PauseInterval": 800
    },
    "MessageHistory": {
        "SyncMode": 0,
        "Messages": [
            {
                "Role": "user",
                "Content": "Hello, I want to know about the product information"
            }
        ],
        "WindowSize": 20,
        "ZIM": {
            "RobotId": "@RBT#robot_123",
            "LoadMessageCount": 20
        }
    },
    "CallbackConfig": {
        "ASRResult": 0,
        "LLMResult": 0,
        "Interrupted": 0,
        "UserSpeakAction": 0,
        "AgentSpeakAction": 0,
        "UserAudioData": 0
    },
    "AdvancedConfig": {
        "InterruptMode": 0
    },
    "DigitalHuman": {
        "DigitalHumanId": "xiaozhi",
        "ConfigId": "mobile",
        "EncodeCode": "H264"
    }
}
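A body like the one above can be wrapped in a POST request with the Python standard library. This is a sketch under stated assumptions: the function name `build_post_request` is hypothetical, the `Content-Type: application/json` header is assumed and not stated on this page, and the URL must already carry the signed query parameters.

```python
import json
import urllib.request


def build_post_request(url: str, body: dict) -> urllib.request.Request:
    """Wrap a JSON body in a POST request without sending it."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; a real call needs the full signed query string.
request = build_post_request(
    "https://aigc-aiagent-api.zegotech.cn?Action=CreateDigitalHumanAgentInstance",
    {"AgentId": "Jacky", "UserId": "user_1"},
)
# To send it: urllib.request.urlopen(request, timeout=10)
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network call is made.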