Clone Voice for AI Agent

During a real-time voice interaction, you can switch the AI agent's voice to a target voice, such as the user's own. From a recording only a few seconds long, voice cloning reproduces the target speaker's timbre, speaking style, accent, and acoustic environment.

Voice cloning is a value-added capability. For pricing details, please contact ZEGOCLOUD business staff.

Note

Currently, ZEGOCLOUD supports voice cloning and text-to-speech capabilities from service providers including BytePlus, MiniMax, and Alibaba Cloud.

Prerequisites

You have integrated the ZEGOCLOUD AI Agent service as shown in the Quick Start.
Contact ZEGOCLOUD technical support to select a service provider, activate that provider's TTS service (text-to-speech, also called speech synthesis, including voice cloning), and obtain the required sub-account or API authentication information.

Steps

Clone the voice according to the service provider's instructions

Use the cloned voice in voice chats

When registering an AI agent or creating an AI agent instance, set the Params field of the TTS structure. The contents of this field are passed through to the third-party TTS interface, so it must include the vendor-specific field that identifies the cloned voice:

MiniMax: Fill in voice_id
BytePlus: Fill in speaker_id

MiniMax

// MiniMax: fill in voice_id to use the cloned voice
"TTS": {
    "Vendor": "MiniMax",
    "Params": {
        "app": {
            "group_id": "your_group_id",
            "api_key":  "your_api_key"
        },
        "model": "speech-02-turbo",
        "voice_setting": {
            "voice_id": "clone_voice_id"
        }
    }
}
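
The MiniMax configuration above can also be assembled in code before it is placed in the register or create-instance request. The following is a minimal Python sketch, not part of any ZEGOCLOUD SDK: the helper name build_minimax_tts is hypothetical, the field layout simply mirrors the MiniMax example, and the credential and voice ID values are placeholders. How the resulting structure is embedded in the request body is not shown here.

```python
import json


def build_minimax_tts(group_id: str, api_key: str, voice_id: str,
                      model: str = "speech-02-turbo") -> dict:
    """Build a TTS structure that selects a cloned MiniMax voice.

    Hypothetical helper: the layout mirrors the MiniMax example in this guide.
    """
    if not voice_id:
        # Without a voice_id there is nothing to identify the cloned voice.
        raise ValueError("voice_id must be the ID of a cloned voice")
    return {
        "Vendor": "MiniMax",
        # Params is passed through to the third-party TTS interface.
        "Params": {
            "app": {"group_id": group_id, "api_key": api_key},
            "model": model,
            "voice_setting": {"voice_id": voice_id},
        },
    }


# Placeholder values; replace them with the credentials and voice ID
# obtained from the service provider.
tts = build_minimax_tts("your_group_id", "your_api_key", "clone_voice_id")
print(json.dumps({"TTS": tts}, indent=4))
```

Keeping the voice ID as a function argument makes it straightforward to register several agents that share credentials but use different cloned voices.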
