CreateDigitalHumanAgentInstance

POST

https://aigc-aiagent-api.zegotech.cn/

Use this interface to create a digital human agent instance and have it join a voice (RTC) conversation. 📌 Note: If no real user has entered the RTC room after 120 seconds, the agent instance is destroyed automatically. You then receive a callback whose Event is AgentInstanceDeleted and whose Data.Code is 1202.
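
A minimal sketch of how a callback receiver might detect this automatic destruction. It assumes the callback body has already been parsed from JSON into a dict and that Event and Data.Code sit at the paths named in the note; the function name is hypothetical.

```python
IDLE_TIMEOUT_CODE = 1202  # Data.Code named in the note above


def is_idle_timeout(callback: dict) -> bool:
    """Return True when the callback reports an instance destroyed for idling."""
    data = callback.get("Data") or {}
    return (
        callback.get("Event") == "AgentInstanceDeleted"
        and data.get("Code") == IDLE_TIMEOUT_CODE
    )


# Hypothetical payload, reduced to the two fields the note mentions.
sample = {"Event": "AgentInstanceDeleted", "Data": {"Code": 1202}}
print(is_idle_timeout(sample))
```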

Request

Query Parameters

Action string required

Possible values: [CreateDigitalHumanAgentInstance]

Interface prototype parameters

https://aigc-aiagent-api.zegotech.cn?Action=CreateDigitalHumanAgentInstance

AppId uint32 required

The unique Application ID assigned to your project by ZEGOCLOUD. Get it from the ZEGOCLOUD Admin Console.

SignatureNonce string required

Random string.

Timestamp int64 required

Unix timestamp, in seconds. It may deviate from the server time by at most 10 minutes.

Signature string required

Signature, used to verify the legitimacy of the request. Refer to Signing the requests for how to generate an API request signature.

SignatureVersion string required

Possible values: [2.0]

Signature version number, default value is 2.0.
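
The query parameters above could be assembled roughly as follows. This is a sketch under a stated assumption: this page does not describe the signature algorithm, so the code assumes the signature is the lowercase hex MD5 of AppId, SignatureNonce, a server secret, and Timestamp concatenated in that order. Confirm the exact scheme in Signing the requests before relying on it. The function name and the server_secret argument are hypothetical.

```python
import hashlib
import secrets
import time
from typing import Optional


def build_query(app_id: int, server_secret: str, action: str,
                nonce: Optional[str] = None,
                timestamp: Optional[int] = None) -> dict:
    """Assemble the common query parameters for one request."""
    nonce = nonce or secrets.token_hex(8)  # 16 random hex characters
    timestamp = int(time.time()) if timestamp is None else timestamp
    # ASSUMPTION: MD5 over AppId + SignatureNonce + ServerSecret + Timestamp.
    raw = f"{app_id}{nonce}{server_secret}{timestamp}"
    signature = hashlib.md5(raw.encode("utf-8")).hexdigest()
    return {
        "Action": action,
        "AppId": app_id,
        "SignatureNonce": nonce,
        "Timestamp": timestamp,
        "Signature": signature,
        "SignatureVersion": "2.0",
    }
```

Generating a fresh nonce and timestamp for every call keeps each request within the 10-minute timestamp tolerance.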

application/json

Body

required

AgentId string required

The unique identifier of the registered AI agent.

UserId string required

Possible values: <= 32 characters

The real user ID used to log in to the RTC room. Only numbers, English characters, '-', and '_' are supported.

RTC object required

RTC related information

📌 Important Note

Character restrictions for all attributes below: only digits, English letters, '_', '-', and '.' are supported.

RoomId string required

Possible values: <= 128 characters

RTC room ID.

AgentStreamId string required

Possible values: <= 128 characters

The stream ID used by the AI agent instance for streaming.

📌 Important Note

Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different stream IDs, otherwise the streaming of the later created AI agent instance will fail.

AgentUserId string required

Possible values: <= 32 characters

The user ID of the AI agent instance.

📌 Important Note

Ensure that multiple AI agent instances (even if they are not in the same RTC room) use different user IDs, otherwise the earlier created AI agent instance will be kicked out of the RTC room.

UserStreamId string required

Possible values: <= 128 characters

The stream ID used by the real user for streaming.
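
The length and character limits on UserId and the RTC attributes can be checked on the client before the request is sent. The sketch below is one way to do that; it assumes empty strings are invalid because every field is required, and the helper name is hypothetical.

```python
import re

# UserId: up to 32 characters from digits, English letters, '-' and '_'.
USER_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,32}")
# RTC attributes additionally allow '.'; limits are taken from the field list.
RTC_CHARS = r"[A-Za-z0-9_.-]"
RTC_LIMITS = {
    "RoomId": 128,
    "AgentStreamId": 128,
    "AgentUserId": 32,
    "UserStreamId": 128,
}


def validate_ids(user_id: str, rtc: dict) -> list:
    """Return the names of the fields that break a documented limit."""
    errors = []
    if not isinstance(user_id, str) or not USER_ID_RE.fullmatch(user_id):
        errors.append("UserId")
    for field, limit in RTC_LIMITS.items():
        value = rtc.get(field)
        pattern = RTC_CHARS + "{1,%d}" % limit
        if not isinstance(value, str) or not re.fullmatch(pattern, value):
            errors.append("RTC." + field)
    return errors
```

An empty list means every identifier passed the local checks; the server may still reject IDs that collide with another agent instance.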

LLM object

Url string required

The endpoint that receives the request. It can be your own service or any LLM service provider's service, and it must be compatible with the OpenAI Chat Completions API.

For example: https://api.openai.com/v1/chat/completions

📌 Important Note

If ApiKey is set to "zego_test", you must use one of the following Url addresses:

MiniMax: https://api.minimax.chat/v1/text/chatcompletion_v2

Volcano Engine (Doubao): https://ark.cn-beijing.volces.com/api/v3/chat/completions

Aliyun Bailian (Tongyi Qianwen): https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions

Stepfun: https://api.stepfun.com/v1/chat/completions

ApiKey string

The parameter used for authentication by the LLM service provider. It is empty by default, but must be provided in production environments.

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

Model string required

The LLM model. Different LLM service providers support different models, please refer to their official documentation to select the appropriate model.

📌 Important Note

If ApiKey is set to "zego_test", you must use one of the following models:

MiniMax:

MiniMax-Text-01

Volcano Engine (Doubao):

doubao-1-5-pro-32k-250115

doubao-1-5-lite-32k-250115

Aliyun Bailian (Tongyi Qianwen):

qwen-plus

Stepfun:

step-2-16k
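
The "zego_test" restrictions on Url and Model can be expressed as a lookup table. The sketch below copies the URLs and model names from the lists above and assumes the model must belong to the same provider as the URL; the function name is hypothetical.

```python
# Url -> models allowed while ApiKey is "zego_test" (copied from the lists above).
ZEGO_TEST_OPTIONS = {
    "https://api.minimax.chat/v1/text/chatcompletion_v2": {"MiniMax-Text-01"},
    "https://ark.cn-beijing.volces.com/api/v3/chat/completions": {
        "doubao-1-5-pro-32k-250115",
        "doubao-1-5-lite-32k-250115",
    },
    "https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions": {"qwen-plus"},
    "https://api.stepfun.com/v1/chat/completions": {"step-2-16k"},
}


def check_test_llm(llm: dict) -> bool:
    """Return True if the LLM object is consistent with the zego_test rules."""
    if llm.get("ApiKey") != "zego_test":
        return True  # the restrictions only apply to the test key
    allowed = ZEGO_TEST_OPTIONS.get(llm.get("Url"), set())
    return llm.get("Model") in allowed
```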

SystemPrompt string

The system prompt of the AI agent. It is the predefined information that is added at the beginning when calling the LLM, used to control the output of the LLM. It can be role settings, prompts, and answer examples.

Temperature number

Possible values: >= 0 and <= 2

Default value: 0.7

The higher the value, the more random the output; the lower the value, the more focused and deterministic the output.

TopP number

Possible values: >= 0 and <= 1

Default value: 0.9

The sampling method. The smaller the value, the stronger the determinism; the larger the value, the stronger the randomness.

Params object

Other parameters supported by the LLM service provider, such as the maximum token limit. Different LLM providers support different parameters, please refer to their official documentation and fill in as needed.

AddAgentInfo boolean

Default value: false

If this value is true, the AI Agent server will include the AI agent information in the request parameters when requesting the LLM service. You can use this parameter to execute additional business logic in your custom LLM service.

The structure of agent_info is as follows:

room_id: RTC room ID
user_id: User ID
agent_instance_id: AI agent instance ID
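
A custom LLM service might read these fields as sketched below. This page does not say where agent_info sits in the request, so the code assumes it is a top-level key of the JSON body; the function name is hypothetical.

```python
def extract_agent_info(request_body: dict) -> dict:
    """Pull the three documented agent_info fields out of a parsed request."""
    # ASSUMPTION: agent_info is a top-level key of the request body.
    info = request_body.get("agent_info") or {}
    return {
        "room_id": info.get("room_id"),
        "user_id": info.get("user_id"),
        "agent_instance_id": info.get("agent_instance_id"),
    }
```

When AddAgentInfo is false the key is absent, and every value in the returned dict is None.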

TTS object

Vendor string required

Possible values: [Aliyun, ByteDance, ByteDanceFlowing, MiniMax, CosyVoice]

The TTS service provider. Options:

Aliyun: Aliyun
ByteDance: ByteDance (Volcano Voice - Large Model Speech Synthesis API)
ByteDanceFlowing: ByteDance (Volcano Voice - Streaming Speech Synthesis API (WebSocket))
MiniMax: MiniMax
CosyVoice: Aliyun CosyVoice

Params object required

TTS configuration parameters, in JSON object format. Contains app parameters (for authentication) and other parameters (for adjusting TTS effects).

In addition to the app parameter, you can also pass in other TTS configuration parameters to adjust the speech synthesis effect. These parameters will be directly passed to the corresponding TTS service provider.

You can refer to the official documentation of the service provider for the required information according to the value of Vendor.

- Aliyun: Intelligent Speech Interaction - Overview of speech synthesis - 2. Start the synthesis task

- ByteDance: Large Model Speech Synthesis API - Parameter List - Request Parameters

- ByteDanceFlowing: "Payload request parameters" in Streaming Text-to-Speech API (WebSocket) - WebSocket Binary Protocol

- MiniMax: Voice Model - T2A v2 - WebSocket API - Interface Parameters

- CosyVoice: "Payload request parameters" in CosyVoice WebSocket API for Speech Synthesis

app object required

The TTS service authentication parameter, which is different for different Vendor values. Please refer to the requirements of each vendor for the structure of the app parameter.

oneOf

Aliyun
ByteDance
ByteDanceFlowing
MiniMax
CosyVoice

Aliyun:

app_key string required

Read the Alibaba Cloud docs Intelligent Speech Interaction - Create a project to create a project and get the AppKey from the Intelligent Speech Interaction console, and pass it here.

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

ak_id string required

Read the Alibaba Cloud docs Intelligent Speech Interaction - Activate Intelligent Speech Interaction - Procedure to obtain the AccessKey ID and pass it here.

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

ak_key string required

Read the Alibaba Cloud docs Intelligent Speech Interaction - Activate Intelligent Speech Interaction - Procedure to obtain the AccessKey Secret and pass it here.

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

ByteDance:

appid string required

Read the BytePlus docs Voice Technology - Quick Start - Console Usage FAQ under "Where can I get the following parameters: appid, cluster, token, authorization_type, secret_key?".

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

token string required

Read the BytePlus docs Voice Technology - Quick Start - Console Usage FAQ under "Where can I get the following parameters: appid, cluster, token, authorization_type, secret_key?".

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

cluster required

Possible values: [volcano_tts, volcano_mega, volcano_icl]

Default value: volcano_tts

BytePlus cluster configuration

📌 Important Note

This parameter must match the audio.voice_type parameter.

ByteDanceFlowing:

appid string required

Read the BytePlus Voice docs Voice Console Guide - Step 2: Trial Use to get the App ID, and pass it here.

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

token string required

Read the BytePlus Voice docs Voice Console Guide - Step 2: Trial Use to get the App ID, and pass it here.

📌 Important Note

During the test period (within 2 weeks after the AI Agent service is enabled), you can set this parameter value to "zego_test" to use this service.

resource_id required

Possible values: [volc.service_type.10029, volc.megatts.default, volc.megatts.concurr]

Default value: volc.service_type.10029

BytePlus resource ID

📌 Important Note

This parameter must match the req_params.speaker parameter.

voice string

The voice of the speaker. The default is xiaoyun. Apart from app, the parameters shown here only illustrate the level at which pass-through parameters are placed. Add or remove them as needed.

volume number

Possible values: >= 0 and <= 100

The volume, range [0, 100]. Apart from app, the parameters shown here only illustrate the level at which pass-through parameters are placed. Add or remove them as needed.

speech_rate number

Possible values: >= -500 and <= 500

The speech rate, range [-500, 500]. Apart from app, the parameters shown here only illustrate the level at which pass-through parameters are placed. Add or remove them as needed.

FilterText object[]

Filter the text within the specified punctuation marks from the content returned by the LLM, and then perform speech synthesis.

Note:

- The content that should be placed within the specified punctuation marks must be defined in LLM > SystemPrompt.

- This parameter cannot be updated when updating the AI agent instance.

Array[

BeginCharacters string required

The start punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to (.

EndCharacters string required

The end punctuation mark of the filtered text. For example, if you want to filter the content in (), set it to ).

]
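
To make the effect of FilterText concrete, the sketch below imitates it locally. It is not the service's implementation: it assumes the delimiters are removed together with the enclosed text and that delimiters are not nested. The function name is hypothetical.

```python
import re


def filter_text(text: str, rules: list) -> str:
    """Drop every span enclosed by a BeginCharacters/EndCharacters pair."""
    for rule in rules:
        begin = re.escape(rule["BeginCharacters"])
        end = re.escape(rule["EndCharacters"])
        # Non-greedy match so that each pair is removed separately.
        text = re.sub(begin + r".*?" + end, "", text, flags=re.DOTALL)
    return text


rules = [{"BeginCharacters": "(", "EndCharacters": ")"}]
print(filter_text("Hello(smiles) there", rules))
```

With a SystemPrompt that tells the LLM to put stage directions in parentheses, only the spoken part would reach speech synthesis.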

ASR object

HotWord string

The hot word list is used to improve the recognition accuracy. Format: Hotword1|Weight1,Hotword2|Weight2,Hotword3|Weight3

A single hot word cannot exceed 30 characters, cannot contain spaces, and the weight range is [-1, 11]. Up to 128 hot words are supported.

📌 Important Note

A weight of 11 marks the word as a super hot word. Reserve it for important hot words that must take effect; too many hot words with a weight of 11 will degrade recognition.
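
The HotWord string can be built from a mapping of words to weights while enforcing the limits stated above. A minimal sketch, with a hypothetical function name:

```python
def build_hot_word(words: dict) -> str:
    """Serialize {word: weight} into the Hotword|Weight,... format."""
    if len(words) > 128:
        raise ValueError("at most 128 hot words are supported")
    parts = []
    for word, weight in words.items():
        if not word or len(word) > 30 or " " in word:
            raise ValueError(f"invalid hot word: {word!r}")
        if not -1 <= weight <= 11:
            raise ValueError(f"weight out of range for {word!r}: {weight}")
        parts.append(f"{word}|{weight}")
    return ",".join(parts)


print(build_hot_word({"ZEGO": 10, "AI": 10, "Agent": 10}))
```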

Params object

Extended parameters, please contact ZEGOCLOUD technical support for details.

VADSilenceSegmentation number

Possible values: >= 200 and <= 2000

Default value: 500

The silence duration after which the user's speech is treated as a complete sentence. Unit: ms. Range: [200, 2000]. Default: 500.

PauseInterval number

Possible values: >= 200 and <= 2000

The time window within which two sentences are merged into one sentence (ASR multi-sentence concatenation). Unit: ms. Range: [200, 2000]. Concatenation is enabled only when this value is greater than VADSilenceSegmentation.
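
The relationship between the two timing parameters can be checked with a few lines. This sketch only restates the documented ranges and the "greater than" rule; the function name is hypothetical.

```python
def concatenation_enabled(vad_silence_ms: int, pause_interval_ms: int) -> bool:
    """Return True if PauseInterval turns on multi-sentence concatenation."""
    for name, value in (("VADSilenceSegmentation", vad_silence_ms),
                        ("PauseInterval", pause_interval_ms)):
        if not 200 <= value <= 2000:
            raise ValueError(f"{name} must be within [200, 2000] ms, got {value}")
    # Concatenation requires PauseInterval to exceed VADSilenceSegmentation.
    return pause_interval_ms > vad_silence_ms
```

For example, a VADSilenceSegmentation of 500 with a PauseInterval of 800 enables concatenation, while equal values do not.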

MessageHistory object

Configuration of the history messages used by the AI agent instance

SyncMode integer

Possible values: [0, 1]

Default value: 0

Message synchronization mode:

0: Synchronize from the In-app Chat (ZIM)
1: Synchronize through the Messages parameter

Messages object[]

Possible values: <= 100

Message list

Array[

Role string required

Possible values: [user, assistant]

The role of the message sender:

user: User
assistant: AI agent

Content string required

Message content

]

WindowSize integer

Possible values: >= 0 and <= 100

Default value: 20

The number of recent history messages used when calling the LLM service. It affects the LLM context understanding ability, and it is recommended to set it to 10-30.

ZIM object

ZIM-related information.

📌 Important Note

- Only effective when MessageHistory.SyncMode is 0.

- Please ensure that your project has enabled the ZIM service.

- Please ensure that you have called the ZIM robot registration interface, and set the returned UserInfo.UserId as the RobotId.

- It is recommended to register the robot in advance to improve the user information settings and enhance the efficiency of creating AI agent instances.

RobotId string

ZIM robot ID. That is, the UserInfo.UserId returned by calling the ZIM register robot interface. It is used to load the chat context between the user and the ZIM robot, and synchronize the messages generated during the conversation to ZIM. If this parameter is empty, the real-time interactive AI Agent backend will randomly generate one.

LoadMessageCount integer

Possible values: >= 0 and <= 100

The number of messages to be fetched from the ZIM service as context when creating an AI agent instance. The default is the value of WindowSize (the upper limit).
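
When history is supplied through the Messages parameter (SyncMode 1), the MessageHistory object can be built and checked locally first. A sketch that applies only the limits listed above; the function name is hypothetical.

```python
def build_message_history(messages: list, window_size: int = 20) -> dict:
    """Build a MessageHistory object that passes history via Messages."""
    if len(messages) > 100:
        raise ValueError("Messages accepts at most 100 entries")
    if not 0 <= window_size <= 100:
        raise ValueError("WindowSize must be within [0, 100]")
    for message in messages:
        if message.get("Role") not in ("user", "assistant"):
            raise ValueError(f"invalid Role: {message.get('Role')!r}")
        if not isinstance(message.get("Content"), str):
            raise ValueError("Content must be a string")
    return {"SyncMode": 1, "Messages": messages, "WindowSize": window_size}
```

The ZIM object is left out here because it only takes effect when SyncMode is 0.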

CallbackConfig object

Server-side callback configuration

📌 Important Note

Before configuring the following parameters, you need to set the callback address according to Receiving Callback, and understand the specific field descriptions.

ASRResult integer

Possible values: [0, 1]

Default value: 0

Whether to enable server-side callback for ASR results.

LLMResult integer

Possible values: [0, 1]

Default value: 0

Whether to enable server-side callback for LLM results. If enabled, the ZEGOCLOUD server will return the LLM output result for each sentence.

Interrupted integer

Possible values: [0, 1]

Default value: 0

Whether to enable server-side callback for the AI agent being interrupted.

UserSpeakAction integer

Possible values: [0, 1]

Default value: 0

Whether to enable server-side callback for user speech.

AgentSpeakAction integer

Possible values: [0, 1]

Default value: 0

Whether to enable server-side callback for the AI agent speaking.

UserAudioData integer

Possible values: [0, 1]

Default value: 0

Whether to enable server-side callback for user speech audio data.

AdvancedConfig object

InterruptMode integer

Possible values: [0, 1]

Default value: 0

The mode of interrupting the AI agent when the user speaks:

0: Interrupt immediately. If the user speaks while the AI is speaking, the AI will be immediately interrupted and stop speaking.
1: Do not interrupt. If the user speaks while the AI is speaking, the AI will not be affected until the content is finished.

DigitalHuman object required

DigitalHumanId string

Digital human ID

ConfigId string

Possible values: [mobile, web]

Digital human configuration ID

EncodeCode string

Possible values: [H264, VP8]

Default value: H264

Digital human video encoding format

Responses

Success

application/json

Schema

Code integer

Return code. 0 indicates success, other values indicate failure. For more information on error codes and response handling recommendations, please refer to Return Codes.

Message string

Explanation of the request result

RequestId string

Request ID

Data object

AgentInstanceId string

The unique identifier of the AI agent instance.

DigitalHumanConfig string

Digital human configuration, used by the digital human mobile SDK.

{
  "Code": 0,
  "Message": "Success",
  "RequestId": "8825223157230377926",
  "Data": {
    "AgentInstanceId": "1912122918452641792",
    "DigitalHumanConfig": "eyJEaWdpdGFsSHVtYW5JZCI6ImU1ODNkMzVmLTk3OTMtNDJiNC1hYjFiLTE4OWEzNWI4OGQxYyIsIlN0cmVhbXMiOlt7IlJvb21JZCI6ImlyXzU1NTd5bDVoIiwiU3RyZWFtSWQiOiJpcl81NTU3eWw1aF8xNzEwMl9hZ2VudCIsIkVuY29kZUNvZGUiOiJIMjY0IiwiQ29uZmlnSWQiOiJ3ZWIifV19"
  }
}
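
A caller might handle this response as sketched below: treat any non-zero Code as a failure and otherwise pick out the two Data fields. The exception class and function name are hypothetical, and the response is assumed to be already parsed from JSON.

```python
class AgentApiError(Exception):
    """Raised when the API returns a non-zero Code."""


def parse_create_response(response: dict) -> dict:
    """Return the instance ID and digital human config from a success response."""
    if response.get("Code") != 0:
        raise AgentApiError(
            f"Code={response.get('Code')} Message={response.get('Message')} "
            f"RequestId={response.get('RequestId')}"
        )
    data = response.get("Data") or {}
    return {
        "agent_instance_id": data.get("AgentInstanceId"),
        # Passed on unchanged to the digital human mobile SDK.
        "digital_human_config": data.get("DigitalHumanConfig"),
    }
```

Including RequestId in the error message makes it easier to report a failed call to technical support.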

Base URL

https://aigc-aiagent-api.zegotech.cn

Unified access address (no regional distinction)

Example

{
  "AgentId": "Jacky",
  "UserId": "user_1",
  "RTC": {
    "RoomId": "room_1",
    "AgentStreamId": "agent_stream_1",
    "AgentUserId": "agent_user_1",
    "UserStreamId": "user_stream_1"
  },
  "LLM": {
    "Url": "https://ark.cn-beijing.volces.com/api/v3/chat/completions",
    "ApiKey": "zego_test",
    "Model": "doubao-1-5-lite-32k-250115",
    "SystemPrompt": "You are a friendly assistant",
    "Temperature": 0.7,
    "TopP": 0.9,
    "Params": {
      "max_tokens": 16384
    },
    "AddAgentInfo": false
  },
  "TTS": {
    "Vendor": "ByteDance",
    "Params": {
      "app": {
        "appid": "zego_test",
        "token": "zego_test",
        "cluster": "volcano_tts"
      },
      "audio": {
        "voice_type": "zh_female_qingxinnvsheng_mars_bigtts",
        "loudness_ratio": 1,
        "speed_ratio": 1
      }
    },
    "FilterText": [
      {
        "BeginCharacters": "(",
        "EndCharacters": ")"
      }
    ]
  },
  "ASR": {
    "HotWord": "ZEGO|10,AI|10,Agent|10",
    "Params": {},
    "VADSilenceSegmentation": 500,
    "PauseInterval": 800
  },
  "MessageHistory": {
    "SyncMode": 0,
    "Messages": [
      {
        "Role": "user",
        "Content": "Hello, I want to know about the product information"
      }
    ],
    "WindowSize": 20,
    "ZIM": {
      "RobotId": "@RBT#robot_123",
      "LoadMessageCount": 20
    }
  },
  "CallbackConfig": {
    "ASRResult": 0,
    "LLMResult": 0,
    "Interrupted": 0,
    "UserSpeakAction": 0,
    "AgentSpeakAction": 0,
    "UserAudioData": 0
  },
  "AdvancedConfig": {
    "InterruptMode": 0
  },
  "DigitalHuman": {
    "DigitalHumanId": "xiaozhi",
    "ConfigId": "mobile",
    "EncodeCode": "H264"
  }
}
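
One way to turn the example body into an HTTP request is sketched below using only the Python standard library. It builds the request object without sending it; the query dict is assumed to already hold the six query parameters (Action, AppId, SignatureNonce, Timestamp, Signature, SignatureVersion), and the function name is hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://aigc-aiagent-api.zegotech.cn"


def build_request(query: dict, body: dict) -> urllib.request.Request:
    """Build the POST request: query parameters in the URL, JSON in the body."""
    url = BASE_URL + "/?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(build_request(query, body), timeout=10) as resp:
#     result = json.loads(resp.read())
```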
