
CreateDigitalHumanAgentInstance

POST

https://aigc-aiagent-api.zegotech.cn/

With this interface, you can create a digital human agent instance and add it to a voice (RTC) conversation.

Note
  1. If no real user is in the RTC room for 120 seconds, the agent instance is automatically destroyed and a callback is triggered with Event set to AgentInstanceDeleted and Data.Code set to 1202.
  2. By default, each account can have at most 10 digital human agent instances. If the limit is exceeded, the creation of a digital human agent instance will fail. If you need to adjust this limit, please contact ZEGOCLOUD Technical Support.

Request

Query Parameters

    Action string required

    Possible values: [CreateDigitalHumanAgentInstance]

    Interface prototype parameters

    https://aigc-aiagent-api.zegotech.cn?Action=CreateDigitalHumanAgentInstance

    AppId uint32 required

    💡Public parameter. Application ID, assigned by ZEGOCLOUD. Get it from the ZEGOCLOUD Admin Console.

    SignatureNonce string required

    💡Public parameter. A 16-character hexadecimal random string (hex encoding of 8-byte random number). Refer to Signature sample code for how to generate.

    Timestamp int64 required

    💡Public parameter. Current Unix timestamp, in seconds. The maximum allowed error is 10 minutes. Refer to Signature sample code for how to generate it.

    Signature string required

    💡Public parameter. Signature, used to verify the legitimacy of the request. Refer to Signing the requests for how to generate an API request signature.

    SignatureVersion string required

    Possible values: [2.0]

    Default value: 2.0

    💡Public parameter. Signature version number.
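The public parameters above can be assembled into the request URL roughly as follows. This is a minimal sketch, not the official sample: the function name `build_request_url` is hypothetical, and the signature algorithm is deliberately left to a caller-supplied `sign` callable, because this page does not define it (see Signing the requests).

```python
import secrets
import time
from typing import Callable
from urllib.parse import urlencode

BASE_URL = "https://aigc-aiagent-api.zegotech.cn/"


def build_request_url(app_id: int, sign: Callable[[int, str, int], str]) -> str:
    """Build the query string with the public parameters.

    `sign` is a hypothetical callable that returns the Signature for
    (app_id, nonce, timestamp); implement it per "Signing the requests".
    """
    nonce = secrets.token_hex(8)   # 8 random bytes -> 16 hexadecimal characters
    timestamp = int(time.time())   # current Unix timestamp, in seconds
    params = {
        "Action": "CreateDigitalHumanAgentInstance",
        "AppId": app_id,
        "SignatureNonce": nonce,
        "Timestamp": timestamp,
        "Signature": sign(app_id, nonce, timestamp),
        "SignatureVersion": "2.0",
    }
    return BASE_URL + "?" + urlencode(params)
```

Generate a fresh nonce and timestamp for every request rather than reusing them, since the timestamp is only accepted within the 10-minute error window.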

Body

required
    AgentId string required

    The unique identifier of the registered AI agent.

    UserId string required

    Possible values: <= 32 characters

    The real user ID used to interact with this AI Agent instance. Only numbers, English characters, '-', and '_' are supported.

    RTC object required

    RTC related information


    📌 Important Note

    All attribute character restrictions: only numbers, English characters, '_', '-', and '.' are supported.

    RoomId string required

    Possible values: <= 128 characters

    RTC room ID.

    AgentStreamId string required

    Possible values: <= 128 characters

    The stream ID used by the AI agent instance for streaming.

    📌 Important Note

    Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different stream IDs, otherwise the streaming of the later created AI agent instance will fail.

    AgentUserId string required

    Possible values: <= 32 characters

    The user ID of the AI agent instance.

    📌 Important Note

    Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different user IDs, otherwise the earlier created AI agent instance will be kicked out of the RTC room.

    UserStreamId string required

    Possible values: <= 128 characters

    The stream ID used by the real user for streaming.
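The length and character limits of the RTC object can be checked before sending the request. The sketch below is illustrative only: `validate_rtc` is a hypothetical helper, and it assumes "English characters" means ASCII letters.

```python
import re

# Allowed characters for all RTC attributes: numbers, English letters, '_', '-', '.'
# (assumption: "English characters" means ASCII letters A-Z and a-z).
RTC_PATTERN = re.compile(r"[A-Za-z0-9_.\-]+")

# Maximum lengths taken from the parameter descriptions above.
RTC_MAX_LENGTH = {
    "RoomId": 128,
    "AgentStreamId": 128,
    "AgentUserId": 32,
    "UserStreamId": 128,
}


def validate_rtc(rtc: dict) -> list[str]:
    """Return a list of problems found in the RTC object (empty if none)."""
    errors = []
    for field, max_length in RTC_MAX_LENGTH.items():
        value = rtc.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} is required")
        elif len(value) > max_length:
            errors.append(f"{field} exceeds {max_length} characters")
        elif not RTC_PATTERN.fullmatch(value):
            errors.append(f"{field} contains unsupported characters")
    return errors
```

This check cannot detect AgentStreamId or AgentUserId values that collide with other running agent instances; uniqueness has to be guaranteed by your own ID allocation.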

    LLM object
    Url string required

    The endpoint that receives the LLM request. It can be your own service or any LLM service provider's service, and it must be compatible with the OpenAI Chat Completions API.

    For example: https://api.openai.com/v1/chat/completions

    📌 Important Note

    If ApiKey is set to "zego_test", you must use one of the following Url addresses:

    • MiniMax: https://api.minimax.chat/v1/text/chatcompletion_v2
    • Volcano Engine (Doubao): https://ark.cn-beijing.volces.com/api/v3/chat/completions
    • Aliyun Bailian (Tongyi Qianwen): https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions
    • Stepfun: https://api.stepfun.com/v1/chat/completions
    ApiKey string

    The parameter used for authentication by the LLM service provider. It is empty by default, but must be provided in production environments.

    📌 Important Note

    During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

    Model string required

    The LLM model. Different LLM service providers support different models. Refer to their official documentation to select an appropriate model.

    📌 Important Note

    If ApiKey is set to "zego_test", you must use one of the following models:

    • MiniMax:
      • MiniMax-Text-01
    • Volcano Engine (Doubao):
      • doubao-1-5-pro-32k-250115
      • doubao-1-5-lite-32k-250115
    • Aliyun Bailian (Tongyi Qianwen):
      • qwen-plus
    • Stepfun:
      • step-2-16k
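    The "zego_test" restrictions on Url and Model can be expressed as a lookup table. This is a sketch under one assumption: each listed model is paired with the URL of the same provider. `check_llm_config` is a hypothetical helper name.

```python
# URL -> models allowed when ApiKey is "zego_test", copied from the lists above.
# Assumption: each model is only valid with the URL of the same provider.
ZEGO_TEST_OPTIONS = {
    "https://api.minimax.chat/v1/text/chatcompletion_v2": {"MiniMax-Text-01"},
    "https://ark.cn-beijing.volces.com/api/v3/chat/completions": {
        "doubao-1-5-pro-32k-250115",
        "doubao-1-5-lite-32k-250115",
    },
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions": {"qwen-plus"},
    "https://api.stepfun.com/v1/chat/completions": {"step-2-16k"},
}


def check_llm_config(llm: dict) -> bool:
    """Return True if the LLM object satisfies the zego_test restrictions."""
    if llm.get("ApiKey") != "zego_test":
        return True  # the restrictions only apply to the test key
    allowed_models = ZEGO_TEST_OPTIONS.get(llm.get("Url"), set())
    return llm.get("Model") in allowed_models
```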
    SystemPrompt string

    The system prompt of the AI agent. It is the predefined information that is added at the beginning when calling the LLM, used to control the output of the LLM. It can be role settings, prompts, and answer examples.

    Temperature number

    Possible values: >= 0 and <= 2

    Default value: 0.7

    The higher the value, the more random the output; the lower the value, the more focused and deterministic the output.

    TopP number

    Possible values: >= 0 and <= 1

    Default value: 0.9

    The sampling method. The smaller the value, the stronger the determinism; the larger the value, the stronger the randomness.

    Params object

    Other parameters supported by the LLM service provider, such as the maximum token limit. Different LLM providers support different parameters. Refer to their official documentation and fill in as needed.

    AddAgentInfo boolean

    Default value: false

    If this value is true, the AI Agent server will include the AI agent information (agent_info) in the request parameters when requesting the LLM service. For an example of the AI agent information, see Using Custom LLM. You can use this parameter to execute additional business logic in your custom LLM service.

    The structure of agent_info is as follows:

    • room_id: RTC room ID
    • user_id: User ID
    • agent_instance_id: AI agent instance ID
    AgentExtraInfo object
    Agent extra information. The server will pass this parameter in the request parameters when requesting the LLM service. For an example of the extra information, see Using Custom LLM. You can use this parameter to execute additional business logic in your custom LLM service.
    key string

    Extra information key.

    value

    Extra information value, can be of any type.

    TTS object
    Vendor string required

    Possible values: [Aliyun, ByteDance, ByteDanceV3, ByteDanceFlowing, MiniMax, CosyVoice]

    The TTS service provider. Please refer to Configuring TTS > TTS Parameters for details.

    Params object required
    TTS configuration parameters, in JSON object format. Contains app parameters (for authentication) and other parameters (for adjusting TTS effects). Please refer to Configuring TTS > Params Parameters for details.
    app object required

    Used for TTS service authentication. The structure of the app parameter differs for each Vendor value. Please refer to Configuring TTS > Params Parameters for details.

    other_params string

    📌 Important Note

    other_params is not a real parameter; it only illustrates how to pass vendor parameters. All parameters other than app are passed directly to the vendor. Please refer to Configuring TTS > Params Parameters for details.

    FilterText object[]

    Filters out the text within the specified punctuation marks from the content returned by the LLM before speech synthesis is performed. Note:

    • The content that should be placed within the specified punctuation marks must be defined in LLM > SystemPrompt.
    • This parameter cannot be updated when updating the AI agent instance.
    Array [
    BeginCharacters string required

    The start punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to (.

    EndCharacters string required

    The end punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to ).

    ]
    TerminatorText string

    Possible values: <= 4 characters

    Sets the termination text of TTS. If the input TTS text contains the TerminatorText string, the content from the TerminatorText string onward (including the TerminatorText string itself) will not be synthesized for this round of TTS.

    📌 Important Note

    Only one character can be set for bidirectional streaming.
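The effect of FilterText and TerminatorText on the text sent to speech synthesis can be illustrated locally. The sketch below is only an approximation of the behavior described above: `apply_tts_text_rules` is a hypothetical function, and the order in which the service applies the two rules is an assumption.

```python
def apply_tts_text_rules(text: str, filter_pairs: list, terminator: str = "") -> str:
    """Approximate how FilterText and TerminatorText shape the synthesized text.

    filter_pairs: list of (BeginCharacters, EndCharacters) tuples.
    Assumption: filtering runs first, then truncation at the terminator.
    """
    for begin, end in filter_pairs:
        kept = []
        pos = 0
        while True:
            start = text.find(begin, pos)
            if start == -1:
                break
            stop = text.find(end, start + len(begin))
            if stop == -1:
                break  # unmatched begin mark: leave the rest untouched
            kept.append(text[pos:start])   # keep text before the marked span
            pos = stop + len(end)          # skip the span and its end mark
        kept.append(text[pos:])
        text = "".join(kept)

    if terminator:
        index = text.find(terminator)
        if index != -1:
            text = text[:index]  # drop the terminator and everything after it
    return text
```

For example, with the pair `("(", ")")`, an LLM reply such as `Hello (smiles) there!` would be synthesized without the `(smiles)` segment, provided SystemPrompt instructs the LLM to put such content in parentheses.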

    ASR object
    Vendor string

    Possible values: [Tencent, AliyunParaformer, AliyunGummy, Microsoft]

    Default value: Tencent

    ASR provider. Please refer to Configuring ASR > ASR Parameters for details.

    Params object

    Vendor parameters, please refer to Configuring ASR > Params Parameters for details.

    VADSilenceSegmentation number

    Possible values: >= 200 and <= 2000

    Default value: 500

    The silence duration after which two sentences are no longer considered as one. Unit: ms. Range: [200, 2000]. Default: 500. Please refer to Speech Segmentation Control for details.

    PauseInterval number

    Possible values: >= 200 and <= 2000

    The pause duration within which two sentences are considered as one, i.e., ASR multi-sentence concatenation. Unit: ms. Range: [200, 2000]. ASR multi-sentence concatenation is enabled only when this value is greater than VADSilenceSegmentation. Please refer to Speech Segmentation Control for details.
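The relationship between VADSilenceSegmentation and PauseInterval can be checked with a few lines. This is a hedged sketch; the function name is hypothetical and the check runs on the client side only.

```python
from typing import Optional


def multi_sentence_concatenation_enabled(
    vad_silence_ms: int = 500, pause_interval_ms: Optional[int] = None
) -> bool:
    """Return True if ASR multi-sentence concatenation would be enabled.

    Both values are in milliseconds and must lie in [200, 2000].
    """
    values = (("VADSilenceSegmentation", vad_silence_ms), ("PauseInterval", pause_interval_ms))
    for name, value in values:
        if value is not None and not 200 <= value <= 2000:
            raise ValueError(f"{name} must be between 200 and 2000 ms")
    # Concatenation only applies when PauseInterval exceeds VADSilenceSegmentation.
    return pause_interval_ms is not None and pause_interval_ms > vad_silence_ms
```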

    HotWord string deprecated

    This parameter has been deprecated. Please set it through the Params vendor parameters.

    MessageHistory object
    Configuration of the history messages used by the AI agent instance
    SyncMode integer

    Possible values: [0, 1]

    Default value: 0

    Message synchronization mode:

    • 0: Synchronize from the In-app Chat (ZIM)
    • 1: Synchronize through the Messages parameter
    Messages object[]
    Possible values: <= 100
    Message list
    Array [
    Role string required

    Possible values: [user, assistant]

    The role of the message sender:

    • user: User
    • assistant: AI agent
    Content string required

    Message content

    ]
    WindowSize integer

    Possible values: >= 0 and <= 500

    Default value: 20

    The number of recent history messages used when calling the LLM service. It affects the LLM context understanding ability, and it is recommended to set it to 10-30.
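A MessageHistory object that supplies history through the Messages parameter (SyncMode 1) might be built as below. This is a minimal sketch: `build_message_history` is a hypothetical helper, and it assumes "Possible values: <= 100" for Messages refers to the number of list items.

```python
ALLOWED_ROLES = {"user", "assistant"}


def build_message_history(messages: list, window_size: int = 20) -> dict:
    """Build a MessageHistory object that passes history via Messages (SyncMode 1)."""
    if len(messages) > 100:  # assumption: the limit applies to the item count
        raise ValueError("Messages supports at most 100 items")
    if not 0 <= window_size <= 500:
        raise ValueError("WindowSize must be between 0 and 500")
    for message in messages:
        if message.get("Role") not in ALLOWED_ROLES:
            raise ValueError("Role must be 'user' or 'assistant'")
        if not isinstance(message.get("Content"), str):
            raise ValueError("Content must be a string")
    return {"SyncMode": 1, "Messages": messages, "WindowSize": window_size}
```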

    ZIM object
    ZIM-related information.

    📌 Important Note

    - Only effective when MessageHistory.SyncMode is 0.

    - Please ensure that your project has enabled the ZIM service.

    - Please ensure that you have called the ZIM robot registration interface, and set the returned UserInfo.UserId as the RobotId.

    - It is recommended to register the robot in advance and complete its user information, which makes creating AI agent instances more efficient.

    RobotId string

    ZIM robot ID, i.e., the UserInfo.UserId returned by calling the ZIM register robot interface. It is used to load the chat context between the user and the ZIM robot, and to synchronize the messages generated during the conversation to ZIM. If this parameter is empty, the real-time interactive AI Agent backend will randomly generate one.

    LoadMessageCount integer

    Possible values: >= 0 and <= 500

    The number of messages to be fetched from the ZIM service as context when creating an AI agent instance. The default, which is also the upper limit, is the value of WindowSize.

    CallbackConfig object
    Server-side callback configuration

    📌 Important Note

    Before configuring the following parameters, you need to set the callback address according to Receiving Callback, and understand the specific field descriptions.

    ASRResult integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for ASR results.

    LLMResult integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for LLM results. If enabled, the ZEGOCLOUD server will return the LLM output result for each sentence.

    Interrupted integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for the AI agent being interrupted.

    UserSpeakAction integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for user speech.

    AgentSpeakAction integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for the AI agent speaking.

    UserAudioData integer

    Possible values: [0, 1]

    Default value: 0

    Whether to enable server-side callback for user speech audio data.

    AdvancedConfig object
    InterruptMode integer

    Possible values: [0, 1]

    Default value: 0

    The mode of interrupting the AI agent when the user speaks:

    • 0: Interrupt immediately. If the user speaks while the AI is speaking, the AI will be immediately interrupted and stop speaking.
    • 1: Do not interrupt. If the user speaks while the AI is speaking, the AI will not be affected until the content is finished.
    MaxIdleTime integer

    Possible values: >= 10 and <= 1800

    Default value: 120

    The automatic destruction time of the AI agent instance, in seconds. If the user (UserId) of the conversation is absent from the room for longer than MaxIdleTime, the AI agent instance is automatically deleted by the backend and the 1202 exception callback event is triggered. MaxIdleTime defaults to 120s, with a value range of [10, 1800].

    DisableTTS boolean

    Default value: false

    Whether to disable the TTS function. If set to true, the AI agent instance will not perform speech synthesis.

    📌 Important Note

    When DisableTTS is true, the Create Digital Human Agent Instance and Trigger TTS interfaces will report errors.

    DigitalHuman object required
    DigitalHumanId string

    Digital human ID

    ConfigId string

    Possible values: [mobile, web]

    Digital human configuration ID

    EncodeCode string

    Possible values: [H264, VP8]

    Default value: H264

    Digital human video encoding format
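Putting the body parameters together, a request might be prepared as in the sketch below. All IDs are placeholder values, the `Content-Type: application/json` header is an assumption, and the URL omits the public query parameters (AppId, SignatureNonce, Timestamp, Signature, SignatureVersion) for brevity. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_create_request(url: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST request with a JSON body."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},  # assumption: JSON body
        method="POST",
    )


# Placeholder values for illustration only; replace them with your own IDs.
body = {
    "AgentId": "my_agent",
    "UserId": "user_1",
    "RTC": {
        "RoomId": "room_1",
        "AgentStreamId": "agent_stream_1",
        "AgentUserId": "agent_user_1",
        "UserStreamId": "user_stream_1",
    },
    "AdvancedConfig": {"InterruptMode": 0, "MaxIdleTime": 120},
    "DigitalHuman": {
        "DigitalHumanId": "your_digital_human_id",
        "ConfigId": "web",
        "EncodeCode": "H264",
    },
}

request = build_create_request(
    "https://aigc-aiagent-api.zegotech.cn/?Action=CreateDigitalHumanAgentInstance",
    body,
)
# To send it: urllib.request.urlopen(request) after adding the public parameters.
```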

Responses

Success
Schema
    Code integer

    Return code. 0 indicates success, other values indicate failure. For more information on error codes and response handling recommendations, please refer to Return Codes.

    Message string

    Explanation of the request result

    RequestId string

    Request ID

    Data object
    AgentInstanceId string

    The unique identifier of the AI agent instance.

    DigitalHumanConfig string

    Digital human configuration, used by the digital human mobile SDK.
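A response following the schema above can be handled as in this sketch. The exception class and function name are hypothetical, and the sample payload is made up for illustration.

```python
import json


class AgentApiError(Exception):
    """Raised when the response Code is not 0 (hypothetical helper class)."""


def parse_create_response(raw: str) -> dict:
    """Return the Data object on success, or raise with Code, Message, and RequestId."""
    payload = json.loads(raw)
    if payload.get("Code") != 0:
        raise AgentApiError(
            f"Code={payload.get('Code')} "
            f"Message={payload.get('Message')} "
            f"RequestId={payload.get('RequestId')}"
        )
    return payload["Data"]


# Made-up sample response that follows the schema above.
sample = json.dumps({
    "Code": 0,
    "Message": "Succeed",
    "RequestId": "example-request-id",
    "Data": {"AgentInstanceId": "example-instance-id", "DigitalHumanConfig": "example-config"},
})
data = parse_create_response(sample)
```

Logging RequestId together with Code makes it easier to look up a failed call in Return Codes or when contacting ZEGOCLOUD Technical Support.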
