
Quick Start Voice Call

This document explains how to quickly integrate the client SDK (ZEGO Express SDK) and achieve voice interaction with an AI Agent.

Prerequisites

  • Contact ZEGOCLOUD technical support to get the version of ZEGO Express SDK that supports AI noise reduction and integrate it.

Sample Codes

The following sections show sample code for the core functionality. You can refer to it when implementing your own business logic.

The following video demonstrates how to run the server and client (Web) sample code and interact with an AI agent by voice.

Overall Business Process

  1. Server side: Follow the Server Quick Start guide to run the server sample code and deploy your server.
    • Integrate ZEGOCLOUD AI Agent APIs to manage AI agents.
  2. Client side: Run the sample code
    • Create and manage AI agents through your server.
    • Integrate ZEGO Express SDK for real-time communication.

After completing these two steps, you can add an AI agent to a room for real-time interaction with real users.

    sequenceDiagram
        participant Client
        participant Your Server
        participant ZEGOCLOUD AI Agent Server
        Your Server->>Your Server: Register an AI agent
        Your Server->>ZEGOCLOUD AI Agent Server: Register an AI agent
        ZEGOCLOUD AI Agent Server-->>Your Server: 
        Client->>Your Server: Notify server to start call
        Your Server->>ZEGOCLOUD AI Agent Server: Create an AI agent instance
        ZEGOCLOUD AI Agent Server->>ZEGOCLOUD AI Agent Server: The AI agent logs into the RTC room, publishes a stream, and plays the user stream
        ZEGOCLOUD AI Agent Server-->>Your Server: 
        Your Server-->>Client: 
        Client->>Your Server: Request Token
        Your Server-->>Client: Token
        Client->>Client: Initialize ZEGO Express SDK, log into the room, and start publishing the stream
        Client->>Client: User plays the AI agent stream
        Client->>Your Server: Notify server to stop call
        Your Server->>ZEGOCLOUD AI Agent Server: Delete the AI agent instance
        ZEGOCLOUD AI Agent Server-->>Your Server: 
        Your Server-->>Client: 
        Client->>Client: User stops publishing the stream and exits the room
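The client-side part of this sequence can be sketched as a small orchestration function. This is a minimal sketch, not ZEGO code: every step (`startCall`, `getToken`, `joinAndPublish`, `stopCall`, `leaveRoom`) is a hypothetical placeholder that you inject with your own implementation. It overlaps the start-call request with the token request, which is one way to apply the asynchronous-call advice given later on this page.

```javascript
// Hypothetical orchestration of the client-side steps in the sequence diagram.
// All step functions are injected placeholders, not ZEGO APIs.
async function runVoiceCall(steps) {
  // Ask your server to create the AI agent instance while fetching the token,
  // so the two network round trips overlap instead of running back to back.
  const [agentInfo, token] = await Promise.all([steps.startCall(), steps.getToken()]);
  // Log into the room and publish the local audio stream.
  await steps.joinAndPublish(token);
  return {
    agentInfo,
    // End the call in the order shown in the diagram:
    // notify your server first, then leave the room locally.
    async hangUp() {
      await steps.stopCall();
      await steps.leaveRoom();
    },
  };
}
```

Injecting the steps keeps the ordering logic separate from SDK calls, so it can be checked without a browser or a live room.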

Core Capability Implementation

Integrate ZEGO Express SDK

Please refer to Import the SDK > Method 2 to manually integrate the SDK. After integrating the SDK, follow these steps to initialize ZegoExpressEngine.

    1. Load the AI noise reduction module
    2. Instantiate ZegoExpressEngine
    3. Check system requirements (WebRTC support and microphone permissions)
    import { ZegoExpressEngine } from "zego-express-engine-webrtc";
    import { VoiceChanger } from "zego-express-engine-webrtc/voice-changer";
    
    const appID = 1234567; // Obtain from the ZEGOCLOUD Console
    const server = "xxx"; // Obtain from the ZEGOCLOUD Console
    // Load the AI noise reduction module
    ZegoExpressEngine.use(VoiceChanger);
    // Instantiate ZegoExpressEngine with the appID and server configuration
    const zg = new ZegoExpressEngine(appID, server);
    // Check system requirements
    const checkSystemRequirements = async () => {
      // Detect WebRTC support
      const rtcSupport = await zg.checkSystemRequirements("webRTC");
      if (!rtcSupport.result) {
        // The browser does not support WebRTC
        console.warn("WebRTC is not supported in this browser");
      }
      // Detect microphone permission status
      const micSupport = await zg.checkSystemRequirements("microphone");
      if (!micSupport.result) {
        // Microphone permission has not been granted
        console.warn("Microphone permission has not been granted");
      }
    };
    checkSystemRequirements();
    
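If you want to show the user a single message listing everything that is missing, the checks can be collected into one list. This is a minimal sketch that only assumes what the snippet above already relies on: `checkSystemRequirements(name)` resolves to an object with a boolean `result`. The helper name `collectUnsupportedFeatures` is hypothetical.

```javascript
// Hypothetical helper: run several system checks and return the names that failed.
// `engine` is the ZegoExpressEngine instance (zg); only checkSystemRequirements is used.
async function collectUnsupportedFeatures(engine, features = ["webRTC", "microphone"]) {
  const missing = [];
  for (const feature of features) {
    const res = await engine.checkSystemRequirements(feature);
    // Treat a missing response the same as a failed check
    if (!res || !res.result) missing.push(feature);
  }
  return missing;
}
```

For example, `const missing = await collectUnsupportedFeatures(zg);` followed by a check on `missing.length` lets the page block the call button until the problems are fixed.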

    Notify Your Server to Start Call

    You can notify your server to start the call as soon as the real user enters the room on the client. Sending this request asynchronously, without waiting for it to finish before continuing with room login, helps reduce call connection time. After receiving the start-call notification, your server creates an AI agent instance using the same roomID as the client together with the associated userID and streamID, so that the AI agent and the real user can interact in the same room by publishing and playing each other's streams.
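This page does not show the start-call request itself. The sketch below assumes a hypothetical `POST /api/start` endpoint on your server that mirrors the `/api/stop` request used at the end of this page, and a hypothetical JSON body carrying the room and stream identifiers; adapt both to whatever your server actually expects. `fetchImpl` is injectable only so the function can be tested without a network.

```javascript
// Hypothetical sketch: notify your server to start the call.
// The /api/start path and the body fields are assumptions, not a ZEGOCLOUD API.
async function startCall(serverUrl, { roomId, userId, userStreamId }, fetchImpl = fetch) {
  const response = await fetchImpl(`${serverUrl}/api/start`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    // Send the identifiers your server needs to put the AI agent in the same room
    body: JSON.stringify({ roomId, userId, userStreamId }),
  });
  if (!response.ok) {
    throw new Error(`Start call failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

To get the time saving described above, start the request without awaiting it first, for example `const started = startCall(YOUR_SERVER_URL, { roomId, userId, userStreamId });`, then log into the room and `await started` afterwards.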

    User Logs into an RTC Room and Starts Publishing a Stream

    After logging into the room, the real user starts publishing a stream.

    Note

    In this scenario, AI noise reduction should be enabled to achieve better results.

    The token used for login must be obtained from your server; refer to the complete sample code for details.
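The login code below calls `Api.getToken()` without defining it. One way to back that placeholder is a plain request to your own server, which generates the token so that your server secret never reaches the browser. The `/api/token` path, the `userId` query parameter, and the `token` response field are all hypothetical; your server defines the real contract.

```javascript
// Hypothetical sketch of the Api.getToken() placeholder: fetch an RTC token from your server.
// The endpoint path, query parameter, and response shape are assumptions.
async function getToken(serverUrl, userId, fetchImpl = fetch) {
  const url = `${serverUrl}/api/token?userId=${encodeURIComponent(userId)}`;
  const response = await fetchImpl(url);
  if (!response.ok) {
    throw new Error(`Token request failed with HTTP ${response.status}`);
  }
  const data = await response.json();
  // Fail early instead of passing an empty token to loginRoom
  if (typeof data.token !== "string" || data.token === "") {
    throw new Error("Token missing from server response");
  }
  return data.token;
}
```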

    Note

    Ensure that the roomID, userID, and streamID are unique within a single ZEGOCLOUD AppID.

    • roomID: Defined by you according to your own rules and used to log into the Express SDK room. Only numbers, English letters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''' (single quote), ',', '.', '<', '>', and '\' (backslash) are supported. If interoperability with the Web SDK is required, do not use '%'.
    • userID: Must not exceed 32 bytes. Only numbers, English letters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''' (single quote), ',', '.', '<', '>', and '\' (backslash) are supported. If interoperability with the Web SDK is required, do not use '%'.
    • streamID: Must not exceed 256 bytes. Only numbers, English letters, '-', and '_' are supported.
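The format and length rules above can be checked on the client before calling `loginRoom`. This is a minimal sketch: it checks characters and byte lengths only (uniqueness within the AppID is still up to you), it does not check a roomID length because none is stated here, and it treats the last special character in the roomID/userID list as a backslash. `validateIds` and `checkId` are hypothetical helper names.

```javascript
// Hypothetical client-side validation of the ID rules listed above.
const utf8Length = (s) => new TextEncoder().encode(s).length;

// roomID / userID: numbers, English letters, and the listed special characters (including backslash)
const ID_CHARS = /^[A-Za-z0-9~!@#$%^&*()_+=\-`;',.<>\\]+$/;
// streamID: numbers, English letters, '-' and '_'
const STREAM_ID_CHARS = /^[A-Za-z0-9_-]+$/;

function checkId(name, value, pattern, maxBytes) {
  if (!value) return [`${name} is empty`];
  const errors = [];
  if (!pattern.test(value)) errors.push(`${name} contains unsupported characters`);
  // Limits are in bytes, so measure the UTF-8 encoding rather than the string length
  if (maxBytes !== undefined && utf8Length(value) > maxBytes) {
    errors.push(`${name} exceeds ${maxBytes} bytes`);
  }
  return errors;
}

function validateIds({ roomId, userId, streamId }) {
  return [
    ...checkId("roomID", roomId, ID_CHARS),
    ...checkId("userID", userId, ID_CHARS, 32),
    ...checkId("streamID", streamId, STREAM_ID_CHARS, 256),
  ];
}
```

An empty array means the IDs passed every check; otherwise each entry describes one problem.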
    Client request to log into the room and publish a stream
    const userId = ""; // User ID for logging into the Express SDK room
    const roomId = ""; // RTC room ID
    const userStreamId = ""; // ID of the stream the user publishes
    let localStream = null; // Kept at module scope so it can be destroyed when the call ends
    async function enterRoom() {
      try {
        // Get the RTC token from your server [Reference Documentation](https://www.zegocloud.com/docs/video-call/token?platform=web&language=javascript)
        // Api.getToken() is a placeholder for your own request to your server
        const token = await Api.getToken();
        // Log into the room
        await zg.loginRoom(roomId, token, {
          userID: userId,
          userName: "",
        });
    
        // Create a local audio-only stream
        localStream = await zg.createZegoStream({
          camera: {
            video: false,
            audio: true,
          },
        });
        if (localStream) {
          // Publish the local stream
          await zg.startPublishingStream(userStreamId, localStream);
          // Enable AI noise reduction (requires the specially packaged ZEGO Express SDK)
          const enableResult = await zg.enableAiDenoise(localStream, true);
          if (enableResult.errorCode === 0) {
            // Set the AI noise reduction mode
            return zg.setAiDenoiseMode(localStream, 1);
          }
        }
      } catch (error) {
        console.error("Failed to enter room:", error);
        throw error;
      }
    }
    enterRoom();
    

    Play the AI Agent Stream

    In this scenario, the room contains only one real user and one AI agent by default, so any newly added stream is assumed to be the AI agent's stream.

    Client request to play the AI agent stream
    // Listen for remote stream update events
    function setupEvent() {
      zg.on("roomStreamUpdate", async (roomID, updateType, streamList) => {
        if (updateType === "ADD" && streamList.length > 0) {
          try {
            for (const stream of streamList) {
              // Play the AI agent stream
              const mediaStream = await zg.startPlayingStream(stream.streamID);
              // Skip this stream if playback did not start, but keep checking the rest
              if (!mediaStream) continue;
              const remoteView = await zg.createRemoteStreamView(mediaStream);
              if (remoteView) {
                // The page needs a container with the id "remoteStreamView" to render the AI agent stream [Reference Documentation](https://www.zegocloud.com/article/api?doc=Express_Video_SDK_API~javascript_web~class~ZegoStreamView)
                remoteView.play("remoteStreamView", {
                  enableAutoplayDialog: false,
                });
              }
            }
          } catch (error) {
            console.error("Failed to play stream:", error);
          }
        }
      });
    }
    // Register the listener before logging into the room so the initial stream update is not missed
    setupEvent();
    
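If your room might ever contain more than one remote stream, you can select the AI agent's stream explicitly instead of relying on the one-user-one-agent assumption. This sketch assumes your server returns the agent's stream ID in its start-call response (a hypothetical `agentStreamId` field) and uses only the `streamID` property of the entries in `streamList`; `findAgentStream` is a hypothetical helper.

```javascript
// Hypothetical helper: pick the AI agent stream from a roomStreamUpdate stream list.
// With a known agentStreamId, only an exact match is accepted; without one,
// fall back to this page's default assumption that the first new stream is the agent's.
function findAgentStream(streamList, agentStreamId) {
  if (!Array.isArray(streamList) || streamList.length === 0) return null;
  if (agentStreamId) {
    return streamList.find((s) => s.streamID === agentStreamId) || null;
  }
  return streamList[0];
}
```

Inside the `roomStreamUpdate` handler you would call `findAgentStream(streamList, agentStreamId)` and pass the result's `streamID` to `zg.startPlayingStream` only when it is not `null`.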

    Congratulations🎉! After completing this step, you can ask the AI agent any question by voice, and the AI agent will answer your questions by voice!

    Delete the AI Agent Instance and Exit the Room

    The client notifies your server to end the call, stops publishing and playing streams, and calls logoutRoom to exit the room. After receiving the end-call notification, your server deletes the AI agent instance, which then automatically exits the room and stops publishing and playing streams. This completes one full interaction.

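    // Notify your server to end the call
    async function stopCall() {
      try {
        // YOUR_SERVER_URL is the address of your server
        const response = await fetch(`${YOUR_SERVER_URL}/api/stop`, {
          method: "POST",
          headers: {
            "Content-Type": "application/json",
          },
        });
    
        const data = await response.json();
        console.log("End call result:", data);
        return data;
      } catch (error) {
        console.error("Failed to end call:", error);
        throw error;
      }
    }
    
    // End the call, then stop publishing, release the local stream, and exit the room
    async function exitRoom() {
      try {
        await stopCall();
      } catch {
        // The error is already logged in stopCall(); still clean up locally
      }
      zg.stopPublishingStream(userStreamId);
      // localStream is the stream created in enterRoom()
      zg.destroyLocalStream(localStream);
      zg.logoutRoom();
    }
    exitRoom();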
    
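If the user closes the tab without hanging up, the stop request may never be sent and the AI agent instance stays in the room until your server cleans it up. One possible safety net, sketched here as an assumption rather than a ZEGO recommendation, is to send a beacon to the same `/api/stop` endpoint on the browser's `pagehide` event. `navigator.sendBeacon(url)` queues a POST (with an empty body here) that the browser tries to deliver even while the page unloads; your server must accept that request. `registerStopOnPageHide` is a hypothetical helper, and the window and `sendBeacon` function are injected so it can be tested outside a browser.

```javascript
// Hypothetical safety net: notify your server to end the call if the page is closed.
// target: normally window. sendBeacon: normally navigator.sendBeacon.bind(navigator).
function registerStopOnPageHide(target, sendBeacon, stopUrl) {
  let sent = false;
  const onPageHide = () => {
    if (sent) return;
    // sendBeacon returns true once the browser has queued the request
    sent = sendBeacon(stopUrl);
  };
  target.addEventListener("pagehide", onPageHide);
  // Call the returned function after a normal exitRoom() so the beacon is not sent as well
  return () => target.removeEventListener("pagehide", onPageHide);
}
```

In a page this would be registered after entering the room, for example `registerStopOnPageHide(window, navigator.sendBeacon.bind(navigator), `${YOUR_SERVER_URL}/api/stop`)`. Binding `sendBeacon` to `navigator` matters because calling it unbound throws an error.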

    This is the complete core process for you to achieve real-time voice interaction with an AI agent.
