
Quick Start Voice Call

2026-02-05

This document explains how to quickly integrate the client SDK (ZEGO Express SDK) and achieve voice interaction with an AI Agent.

Prerequisites

Sample Codes

The following is sample code for a business backend that integrates the real-time interactive AI Agent API. You can refer to it when implementing your own business logic.

Below is the client sample code; you can refer to it when implementing your own client logic.

Overall Business Process

  1. Server side: Follow the Server Quick Start guide to run the server sample code and deploy your server.
    • Integrate ZEGOCLOUD AI Agent APIs to manage AI agents.
  2. Client side: Run the sample code
    • Create and manage AI agents through your server.
    • Integrate ZEGO Express SDK for real-time communication.

After completing these two steps, you can add an AI agent to a room for real-time interaction with real users.

Core Capability Implementation

Integrate ZEGO Express SDK

Please refer to Integrate the SDK > Method 1 to integrate the SDK using "pub" dependency. After integrating the SDK, follow these steps to initialize ZegoExpressEngine.

If including web platform, please refer to Import the SDK for Flutter Web projects to manually import JS files.

Warning

You must use the ZEGO Express SDK version optimized for AI Agent from the Download SDK and Demo page, otherwise subtitles will not display properly.
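If you integrate via pub, the dependency declaration in pubspec.yaml looks like the following sketch. The version constraints are deliberately left open here: per the warning above, use the AI-Agent-optimized SDK version, and pin versions according to your project's needs.

```yaml
dependencies:
  flutter:
    sdk: flutter
  # Replace with the AI-Agent-optimized SDK version (see the warning above).
  zego_express_engine: any
  # Used later in this guide for runtime microphone permission.
  permission_handler: any
  # Used later in this guide for calling your business server.
  http: any
```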

1. Go to the android/app/src/main directory, open the AndroidManifest.xml file, and add the following permissions:

AndroidManifest.xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.BLUETOOTH" />
<uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
2. Go to the ios/Runner directory, open the Info.plist file, and add the following permissions:

Info.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    ...
    <key>UIBackgroundModes</key>
    <array>
        <string>audio</string>
    </array>
    <key>NSMicrophoneUsageDescription</key>
    <string>Need to access microphone to chat</string>
</dict>
</plist>
3. Go to the ios directory, open the Podfile, and add the permission_handler build settings:

Podfile
post_install do |installer|
  installer.pods_project.targets.each do |target|
    flutter_additional_ios_build_settings(target)

    # Start of the permission_handler configuration
    target.build_configurations.each do |config|
      config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [
        '$(inherited)',
        'PERMISSION_MICROPHONE=1',
      ]
    end
    # End of the permission_handler configuration
  end
end
4. Request microphone permission at runtime:

import 'package:permission_handler/permission_handler.dart';

void main() {
  WidgetsFlutterBinding.ensureInitialized();

  Permission.microphone.request().then((status) {
    runApp(const MyApp());
  });
}
5. Create and initialize ZegoExpressEngine:

await ZegoExpressEngine.createEngineWithProfile(
  /// This scenario avoids requesting camera permission; choose the value
  /// that matches your own business scenario.
  ZegoEngineProfile(ZegoKey.appId, ZegoScenario.HighQualityChatroom),
);

Notify Your Server to Start Call

You can notify your server to start the call immediately after the real user enters the room on the client side; making this call asynchronously helps reduce call connection time. After receiving the start-call notification, your server creates an AI agent instance using the same roomID and the associated userID and streamID, so that the AI agent can interact with real users in the same room through mutual stream publishing and playing.

Note
By default, each account can have at most 10 AI agent instances. If the limit is exceeded, the creation of an AI agent instance will fail. If you need to adjust this limit, please contact ZEGOCLOUD Technical Support.
Note
In the following examples, roomID, userID, streamID and other parameters are not passed when notifying your server to start the call because fixed values have been agreed between the client and your server in this example. In actual use, please pass the real parameters according to your business requirements.
// Notify your server to start call
Future<Map<String, dynamic>> startCall() async {
  try {
    // The server calls the createAgentInstance interface inside its /api/start interface
    final response = await http.post(
      Uri.parse('$_currentBaseUrl/api/start'),
      headers: {'Content-Type': 'application/json'},
    );

    if (response.statusCode == 200) {
      final json = jsonDecode(response.body);
      return json;
    }
    return {'code': -1, 'message': 'Request failed'};
  } catch (e) {
    return {'code': -1, 'message': e.toString()};
  }
}

User Logs in to an RTC Room and Starts Publishing a Stream

After a real user logs into the room, they start publishing streams.

The token used for login needs to be obtained from your server; please refer to the complete sample code.

Note

Please ensure that the roomID, userID, and streamID are unique under one ZEGOCLOUD APPID.

  • roomID: Generated by the user according to their own rules; it will be used to log in to the Express SDK room. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • userID: Length should not exceed 32 bytes. Only numbers, English characters, and '~', '!', '@', '#', '$', '%', '^', '&', '*', '(', ')', '_', '+', '=', '-', '`', ';', ''', ',', '.', '<', '>', '/', '\' are supported. If interoperability with the Web SDK is required, do not use '%'.
  • streamID: Length should not exceed 256 bytes. Only numbers, English characters, and '-', '_' are supported.
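These constraints can be checked on the client before logging in. The following is a minimal illustrative sketch; the helper names are ours, not part of the ZEGO SDK, and the length checks assume ASCII IDs (where character count equals byte count):

```dart
/// Illustrative helpers for the ID rules above (not part of the ZEGO SDK).
final RegExp _streamIdPattern = RegExp(r'^[0-9a-zA-Z_-]+$');

bool isValidStreamId(String streamId) {
  // streamID: at most 256 bytes; only numbers, English characters, '-' and '_'.
  return streamId.isNotEmpty &&
      streamId.length <= 256 &&
      _streamIdPattern.hasMatch(streamId);
}

bool isValidUserId(String userId) {
  // userID: at most 32 bytes; also check the allowed character set if your
  // IDs may contain punctuation, and avoid '%' for Web SDK interoperability.
  return userId.isNotEmpty && userId.length <= 32;
}
```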
Client request to log in to the room and publish a stream
final String _userId = 'user_id_1';
final String _roomId = 'room_id_1';
final String _userStreamId = 'user_stream_id_1';

/// Generate RTC Token [Reference Documentation](https://www.zegocloud.com/docs/video-call/token?platform=flutter&language=dart)
final token = await getToken();
if (token.isEmpty) {
  return false;
}

/// The following settings optimize answering latency. They require the corresponding
/// version of the ZegoExpressEngine SDK; please contact ZEGOCLOUD technical support.
ZegoExpressEngine.setEngineConfig(
  ZegoEngineConfig(
    advancedConfig: {
      /**This configuration is used for volume ducking**/
      'set_audio_volume_ducking_mode': '1',
      /**This configuration is used for adaptive playback volume**/
      'enable_rnd_volume_adaptive': 'true'
    },
  ),
);


/// Enable 3A
ZegoExpressEngine.instance.enableAGC(true);
ZegoExpressEngine.instance.enableAEC(true);
if (!kIsWeb) {
  ZegoExpressEngine.instance.setAECMode(ZegoAECMode.AIAggressive2);

  /// This setting only affects AEC (echo cancellation). ModeGeneral uses ZEGO's
  /// proprietary echo cancellation, which is more controllable. Other options may
  /// use the system's echo cancellation, which can work better on iPhone but may
  /// be less effective on some Android devices.
  ZegoExpressEngine.instance.setAudioDeviceMode(
    ZegoAudioDeviceMode.General,
  );
}
ZegoExpressEngine.instance.enableANS(true);
ZegoExpressEngine.instance.setANSMode(ZegoANSMode.Medium);

/// Login to room
final user = ZegoUser(_userId, _userId);
final roomConfig = ZegoRoomConfig.defaultConfig()
  ..isUserStatusNotify = true
  ..token = token;
final loginResult = await ZegoExpressEngine.instance.loginRoom(
  _roomId,
  user,
  config: roomConfig,
);
if (0 != loginResult.errorCode && 1002001 != loginResult.errorCode) {
  return false;
}

/// Start publishing stream (open microphone)
await ZegoExpressEngine.instance.muteMicrophone(false);
await ZegoExpressEngine.instance.startPublishingStream(_userStreamId);

Play the AI Agent Stream

By default, there is only one real user and one AI agent in the same room, so any new stream added is assumed to be the AI agent stream.

Client request to play the AI agent stream
  ZegoExpressEngine.onRoomStreamUpdate = _onRoomStreamUpdate;

  void _onRoomStreamUpdate(
    String roomID,
    ZegoUpdateType updateType,
    List<ZegoStream> streamList,
    Map<String, dynamic> extendedData,
  ) {
    if (updateType == ZegoUpdateType.Add) {
      for (var stream in streamList) {
        ZegoExpressEngine.instance.startPlayingStream(stream.streamID);
      }
    } else if (updateType == ZegoUpdateType.Delete) {
      for (var stream in streamList) {
        ZegoExpressEngine.instance.stopPlayingStream(stream.streamID);
      }
    }
  }

Congratulations🎉! After completing this step, you can ask the AI agent any question by voice, and the AI agent will answer your questions by voice!

Delete the Agent Instance and Exit the Room

The client calls the logout interface to exit the room and stops publishing and playing streams. At the same time, it notifies your server to end the call. After receiving the end call notification, your server will delete the AI agent instance, and the AI agent instance will automatically exit the room and stop publishing and playing streams. This completes a full interaction.

// Notify your server to end the call
Future<Map<String, dynamic>> stopCall() async {
  try {
    final response = await http.post(
      Uri.parse('$_currentBaseUrl/api/stop'),
      headers: {'Content-Type': 'application/json'},
    );

    if (response.statusCode == 200) {
      final json = jsonDecode(response.body);
      return json;
    }
    return {'code': -1, 'message': 'Request failed'};
  } catch (e) {
    return {'code': -1, 'message': e.toString()};
  }
}

/// Stop the conversation with the AI agent
Future<bool> stop() async {
  // Notify the server to end the call; the result is not awaited here.
  stopCall();

  final String _roomId = 'room_id_1';

  final engine = ZegoExpressEngine.instance;

  /// Stop publishing stream
  await engine.stopPublishingStream();

  /// Log out of the room
  await engine.logoutRoom(_roomId);

  return true;
}

This is the complete core process for you to achieve real-time voice interaction with an AI agent.

Best Practices for ZEGO Express SDK Configuration

To achieve the best audio call experience, it is recommended to configure the ZEGO Express SDK according to the following best practices. These configurations can significantly improve the quality of AI agent voice interactions.

Additional Optimization Recommendations

  • Browser Compatibility (Flutter Web): Use the latest versions of modern browsers such as Chrome, Firefox, and Safari.
  • Network Environment: Ensure a stable network connection; a wired network or Wi-Fi with a strong signal is recommended.
  • Audio Equipment: Use high-quality microphones and speakers.
  • Page Optimization (Flutter Web): Avoid running too many JavaScript tasks on the same page, as this may affect audio processing performance.
  • HTTPS Environment: Use HTTPS in production to ensure microphone permission access.


Listen for Exception Callback

Note
Because LLM and TTS involve a large number of parameters, configuration errors can easily cause problems during testing, such as the AI agent not answering or not speaking. We strongly recommend listening for exception callbacks during testing so you can quickly troubleshoot problems based on the callback information.
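As a starting point on the client, you can hook the Express SDK's state callbacks and log any non-zero error codes. The sketch below is illustrative (the function name is ours; `debugPrint` comes from Flutter's foundation library) and covers only client-side SDK errors; consult the AI Agent documentation for agent-specific exception reporting:

```dart
void registerDebugCallbacks() {
  // Room connection state changes (e.g. token expired, network interrupted).
  ZegoExpressEngine.onRoomStateChanged = (String roomID,
      ZegoRoomStateChangedReason reason,
      int errorCode,
      Map<String, dynamic> extendedData) {
    if (errorCode != 0) {
      debugPrint('[room] $roomID reason=$reason error=$errorCode');
    }
  };

  // Publishing failures (e.g. the user stream could not be pushed).
  ZegoExpressEngine.onPublisherStateUpdate = (String streamID,
      ZegoPublisherState state,
      int errorCode,
      Map<String, dynamic> extendedData) {
    if (errorCode != 0) {
      debugPrint('[publish] $streamID state=$state error=$errorCode');
    }
  };

  // Playback failures (e.g. the AI agent stream could not be played).
  ZegoExpressEngine.onPlayerStateUpdate = (String streamID,
      ZegoPlayerState state,
      int errorCode,
      Map<String, dynamic> extendedData) {
    if (errorCode != 0) {
      debugPrint('[play] $streamID state=$state error=$errorCode');
    }
  };
}
```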
