Bringing conversational AI into Flutter apps is becoming a practical way to enhance user experience, especially as voice-driven interfaces gain wider adoption. One of Flutter’s strongest advantages is its unified Dart codebase, allowing voice interaction logic to run seamlessly on both Android and iOS without separate native implementations.
Traditionally, implementing real-time voice conversation required assembling multiple components: speech recognition, natural language processing, text-to-speech, audio streaming, and platform-specific audio handling. Managing these pieces individually often resulted in complex architecture and long development cycles.
Using ZEGOCLOUD’s Conversational AI SDK, these capabilities are packaged into a single workflow. The SDK provides real-time audio streaming, automatic speech recognition, LLM-based response generation, and natural voice synthesis, all accessible directly from Flutter. This significantly reduces the engineering effort required to build a production-ready voice conversational system.
In this guide, we will develop a complete conversational AI app with Flutter. The app will listen to the user’s voice, process it through the AI chain, speak the response, and present real-time subtitles during the conversation.
Flutter Conversational AI Built by ZEGOCLOUD
ZEGOCLOUD is a real-time audio and video communication platform that provides fully managed infrastructure, SDKs, and APIs for building interactive experiences. Its Conversational AI solution combines low-latency RTC streaming with AI processing pipelines, enabling apps to support natural voice interaction without maintaining separate speech recognition or language model services.
For Flutter developers, the key advantage is that all user-AI interactions can be handled through a unified Dart integration. ZEGOCLOUD manages the full chain internally, including:
- Real-time audio transmission between the device and the cloud
- Automatic Speech Recognition for converting voice to text
- LLM-based response generation to deliver contextual replies
- Text-to-Speech output that produces natural spoken responses
In addition to handling the conversational pipeline, ZEGOCLOUD provides UIKits that simplify the interface setup for voice calls, video calls, live interaction, and in-app chat. These components reduce repetitive UI work and allow developers to focus on application logic instead of audio routing, low-latency networking, or media engine configuration.
By integrating ZEGOCLOUD’s AI and RTC capabilities, Flutter applications can deliver continuous, streaming-based conversations with near real-time response, making it suitable for education, customer support, social interaction, and other voice-driven use cases.
Get the Code for Flutter Conversational AI
If you want to skip ahead or just see the finished product:
| What | Where |
|---|---|
| Server + Web App | github.com/ZEGOCLOUD/blog-aiagent-server-and-web |
| Flutter App | github.com/ZEGOCLOUD/blog-aiagent-flutter |
Before We Start
Before moving into the implementation, ensure the following prerequisites are prepared:
- Flutter 3.0 or above. Confirm your version by running flutter --version.
- A ZEGOCLOUD account. Sign up on the official console if you have not registered yet.
- Basic understanding of Dart. Fundamental syntax knowledge is sufficient for following this guide.
- A backend service. We will use a simple server built with Next.js and hosted on Vercel; the free tier is enough for this tutorial.
Once these are ready, you can proceed with the setup and integration steps.
How Does Conversational AI Work?
Before implementing the conversational flow, it is useful to understand the overall architecture and how the different components interact.
The Two-Part Setup
ZEGOCLOUD handles the core AI pipeline in the cloud, including speech recognition, language model processing, and text-to-speech output. These processes run on the server side, allowing the application to deliver end-to-end voice interaction without maintaining local AI models.
But we need two pieces on our end:
| Piece | What It Does |
|---|---|
| Your Server | Keeps your API keys safe, hands out tokens, tells ZEGO when to start/stop the AI |
| Flutter App | Captures your voice, plays the AI's audio, and renders the UI and subtitles |
| ZEGO Cloud | The brains: converts speech to text, thinks, and speaks back |
Architecture Overview
This architecture connects the Flutter client, a secure backend server, and ZEGOCLOUD’s AI and RTC services to deliver end-to-end real-time voice conversations. The client handles audio input and output, the server manages credentials and tokens, and ZEGOCLOUD processes speech recognition, LLM responses, and speech synthesis in the cloud. This separation keeps sensitive data secure and ensures a smooth conversational experience.

Your secrets stay on your server, and the Flutter app just deals with the audio and UI.
How to Build a Conversational AI with Flutter
With the architecture and core components understood, we can now begin implementing the real-time voice interaction features in Flutter. The following steps will guide you through backend setup, SDK integration, audio streaming, and AI response handling to complete a fully working conversational AI experience.
Step 1: Setting Up the Server
First, we need a backend service. Next.js works well for this, and deploying to Vercel is very simple. You only need to push your code to Git, and it will handle the deployment process for you.
1.1 Environment Variables
Chuck these in a .env.local file:
# ZEGO Configuration (from ZEGOCLOUD Console: https://console.zegocloud.com/)
NEXT_PUBLIC_ZEGO_APP_ID=your_app_id
ZEGO_SERVER_SECRET=your_server_secret_32_chars
# AI Agent Configuration
ZEGO_AGENT_ID=aiAgent1
ZEGO_AGENT_NAME=AI Assistant
# System Prompt - Define your AI's personality
SYSTEM_PROMPT="You are my best friend who I can talk to about anything. You're warm, understanding, and always there for me. Respond naturally like a close friend would - be supportive, share opinions, and engage in genuine conversation."
# LLM Configuration (Large Language Model)
LLM_URL=https://your-llm-provider.com/api/chat/completions
LLM_API_KEY=your_llm_api_key
LLM_MODEL=your_model_name
# TTS Configuration (Text-to-Speech)
TTS_VENDOR=ByteDance
TTS_APP_ID=zego_test
TTS_TOKEN=zego_test
TTS_CLUSTER=volcano_tts
TTS_VOICE_TYPE=zh_female_wanwanxiaohe_moon_bigtts
How to obtain these values:
| Variable | Description | How to Obtain |
|---|---|---|
| NEXT_PUBLIC_ZEGO_APP_ID | Your ZEGO application ID | ZEGOCLOUD Console → Project Settings |
| ZEGO_SERVER_SECRET | 32-character secret key | ZEGOCLOUD Console → Project Settings |
| SYSTEM_PROMPT | AI personality definition | Customize based on your use case |
| LLM_URL | LLM API endpoint | From your LLM provider (OpenAI, Doubao, Claude, etc.) |
| LLM_API_KEY | LLM authentication key | From your LLM provider dashboard |
| LLM_MODEL | Model name to use | e.g., gpt-4, doubao-1-5-pro-32k |
| TTS_* | Text-to-Speech config | Use zego_test for testing, or your own TTS service |
1.2 Token Generation API
The token API generates authentication tokens for clients to connect to ZEGO’s RTC service:
// app/api/zego/token/route.ts
import { NextRequest, NextResponse } from 'next/server';
import crypto from 'crypto';
function generateToken(appId: number, userId: string, secret: string,
                       effectiveTimeInSeconds: number): string {
  const tokenInfo = {
    app_id: appId,
    user_id: userId,
    nonce: Math.floor(Math.random() * 2147483647),
    ctime: Math.floor(Date.now() / 1000),
    expire: Math.floor(Date.now() / 1000) + effectiveTimeInSeconds,
    payload: ''
  };

  const plainText = JSON.stringify(tokenInfo);
  const nonce = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', secret, nonce);
  const encrypted = Buffer.concat([cipher.update(plainText, 'utf8'),
    cipher.final(), cipher.getAuthTag()]);

  const buf = Buffer.concat([
    Buffer.alloc(8).fill(0),
    Buffer.from([0, 12]), nonce,
    Buffer.from([encrypted.length >> 8, encrypted.length & 0xff]), encrypted,
    Buffer.from([1]) // GCM mode
  ]);
  buf.writeBigInt64BE(BigInt(tokenInfo.expire), 0);
  return '04' + buf.toString('base64');
}

export async function POST(request: NextRequest) {
  const { userId } = await request.json();

  const token = generateToken(
    parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!),
    userId,
    process.env.ZEGO_SERVER_SECRET!,
    3600
  );

  return NextResponse.json({ code: 0, data: { token } });
}
export async function POST(request: NextRequest) {
const { userId } = await request.json();
const token = generateToken(
parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!),
userId,
process.env.ZEGO_SERVER_SECRET!,
3600
);
return NextResponse.json({ code: 0, data: { token } });
}
1.3 AI Agent API with Signature
The signature mechanism ensures secure communication with ZEGO’s server APIs:
// app/api/zego/utils.ts
import crypto from 'crypto';

// Generate API signature: MD5(AppId + SignatureNonce + ServerSecret + Timestamp)
export function generateSignature(appId: number, signatureNonce: string,
                                   serverSecret: string, timestamp: number): string {
  const str = appId.toString() + signatureNonce + serverSecret + timestamp.toString();
  return crypto.createHash('md5').update(str).digest('hex');
}

// Send request to ZEGO AI Agent API
export async function sendZegoRequest<T>(action: string, body: object): Promise<T> {
  const appId = parseInt(process.env.NEXT_PUBLIC_ZEGO_APP_ID!);
  const serverSecret = process.env.ZEGO_SERVER_SECRET!;
  const signatureNonce = crypto.randomBytes(8).toString('hex');
  const timestamp = Math.floor(Date.now() / 1000);
  const signature = generateSignature(appId, signatureNonce, serverSecret, timestamp);

  const url = new URL('https://aigc-aiagent-api.zegotech.cn');
  url.searchParams.set('Action', action);
  url.searchParams.set('AppId', appId.toString());
  url.searchParams.set('SignatureNonce', signatureNonce);
  url.searchParams.set('Timestamp', timestamp.toString());
  url.searchParams.set('Signature', signature);
  url.searchParams.set('SignatureVersion', '2.0');

  const response = await fetch(url.toString(), {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body)
  });

  const result = await response.json();
  return result.Data as T;
}
1.4 Deploy to Vercel
Deploy your Next.js server to Vercel with these steps:
- Push your code to GitHub
- Go to Vercel and import your repository
- Add all environment variables in Vercel’s project settings
- Click “Deploy”
Your server will be available at https://your-project.vercel.app.
Step 2: Build the Flutter Client
Now, let’s build conversational AI for Flutter step by step.
2.1 Create Flutter Project
Create a new Flutter project:
flutter create --org com.zegocloud --project-name aiagent_demo .
2.2 Configure Dependencies
Add the required dependencies to pubspec.yaml:
dependencies:
  flutter:
    sdk: flutter
  cupertino_icons: ^1.0.8

  # ZEGO Express SDK for real-time audio/video
  zego_express_engine: ^3.22.0

  # HTTP client for API calls
  http: ^1.2.0

  # Permission handling
  permission_handler: ^11.3.0
Then run:
flutter pub get
2.3 Platform Configuration
Android Permissions
Add these permissions to android/app/src/main/AndroidManifest.xml:
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <!-- Permissions for ZEGO Express SDK -->
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
    <uses-permission android:name="android.permission.ACCESS_WIFI_STATE" />
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
    <uses-permission android:name="android.permission.BLUETOOTH" />
    ...
</manifest>
iOS Permissions
Add these to ios/Runner/Info.plist:
<!-- Microphone permission for voice chat -->
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access to enable voice conversation with AI Agent.</string>
<!-- Background audio mode -->
<key>UIBackgroundModes</key>
<array>
<string>audio</string>
</array>
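Declaring the permission in the manifest and Info.plist is not enough on its own; the permission_handler package we added to pubspec.yaml is what requests microphone access at runtime. A minimal helper you can call before joining a room (the function name ensureMicrophonePermission is our own):

import 'package:permission_handler/permission_handler.dart';

/// Ask the user for microphone access and report whether it was granted.
Future<bool> ensureMicrophonePermission() async {
  final status = await Permission.microphone.request();
  if (status.isGranted) return true;
  if (status.isPermanentlyDenied) {
    // The user must enable the permission manually in system settings.
    await openAppSettings();
  }
  return false;
}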
2.4 App Configuration
Create a configuration file to centralize all settings:
// lib/config/app_config.dart
class AppConfig {
  // ZEGO Express SDK App ID - must match your backend server
  static const int appID = 0; // TODO: Replace with your App ID

  // Backend server URL (your Vercel deployment)
  static const String serverURL = 'https://your-project.vercel.app';

  // Generate unique IDs for testing
  static String generateRoomId() {
    final random = DateTime.now().millisecondsSinceEpoch % 100000;
    return 'room$random';
  }

  static String generateUserId() {
    final random = DateTime.now().millisecondsSinceEpoch % 100000;
    return 'user$random';
  }
}
Important: The appID here must match NEXT_PUBLIC_ZEGO_APP_ID in your backend's .env.local file.
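If you would rather not commit the App ID to source control, one option (not required by the SDK) is to inject it at build time with --dart-define. A sketch of that variant; the environment variable names are just examples, and the ID generators above stay the same:

// lib/config/app_config.dart (alternative sketch)
// Run with:
// flutter run --dart-define=ZEGO_APP_ID=1234567890 --dart-define=SERVER_URL=https://your-project.vercel.app
class AppConfig {
  static const int appID = int.fromEnvironment('ZEGO_APP_ID', defaultValue: 0);
  static const String serverURL = String.fromEnvironment(
    'SERVER_URL',
    defaultValue: 'https://your-project.vercel.app',
  );
  // generateRoomId() and generateUserId() are unchanged from the version above.
}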
2.5 API Service
Now let’s hook up to our backend. Nothing fancy here — just some HTTP calls to get tokens and manage the AI agent:
// lib/services/api_service.dart
import 'dart:convert';
import 'package:http/http.dart' as http;
import '../config/app_config.dart';
class ApiService {
  /// Step 1-2: Request authentication token from backend
  static Future<String?> getToken(String userId) async {
    try {
      final response = await http.post(
        Uri.parse('${AppConfig.serverURL}/api/zego/token'),
        headers: {'Content-Type': 'application/json'},
        body: jsonEncode({'userId': userId}),
      );
      if (response.statusCode == 200) {
        final data = jsonDecode(response.body);
        return data['data']['token'] as String?;
      }
    } catch (e) {
      print('Failed to get token: $e');
    }
    return null;
  }

  /// Step 4, 5-8: Request backend to create AI Agent instance
  static Future<String?> startAgent({
    required String roomId,
    required String userId,
    required String userStreamId,
  }) async {
    try {
      final response = await http.post(
        Uri.parse('${AppConfig.serverURL}/api/zego/start'),
        headers: {'Content-Type': 'application/json'},
        body: jsonEncode({
          'roomId': roomId,
          'userId': userId,
          'userStreamId': userStreamId,
        }),
      );
      if (response.statusCode == 200) {
        final data = jsonDecode(response.body);
        // Server registers agent with ZEGO, creates instance, and returns stream ID
        return data['data']['agentStreamId'] as String?;
      }
    } catch (e) {
      print('Failed to start agent: $e');
    }
    return null;
  }

  /// Step 10-11: Request backend to stop AI Agent
  static Future<bool> stopAgent(String roomId) async {
    try {
      final response = await http.post(
        Uri.parse('${AppConfig.serverURL}/api/zego/stop'),
        headers: {'Content-Type': 'application/json'},
        body: jsonEncode({'roomId': roomId}),
      );
      return response.statusCode == 200;
    } catch (e) {
      print('Failed to stop agent: $e');
    }
    return false;
  }
}
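Before wiring up any UI, it can help to confirm the app can actually reach your deployment. A throwaway check along these lines (the user ID and function name are arbitrary) prints whether the token endpoint responds:

// Assumes api_service.dart is imported; call this from a temporary main() or a debug button.
Future<void> checkBackend() async {
  final token = await ApiService.getToken('test_user_1');
  print(token != null
      ? 'Backend reachable, token length: ${token.length}'
      : 'Token request failed - check AppConfig.serverURL and the Vercel logs');
}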
2.6 Initialize ZEGO Express Engine
Here’s where we set up ZEGO. I like using a singleton pattern for this — keeps things tidy:
// lib/services/zego_express_manager.dart
import 'dart:convert';
import 'package:zego_express_engine/zego_express_engine.dart';
import '../config/app_config.dart';
import '../models/subtitle_message.dart';
class ZegoExpressManager {
  static final ZegoExpressManager _instance = ZegoExpressManager._internal();
  factory ZegoExpressManager() => _instance;
  ZegoExpressManager._internal();

  Function(String, ZegoRoomStateChangedReason, int)? onRoomStateChanged;
  Function(SubtitleMessage)? onSubtitleReceived;

  Future<void> initEngine() async {
    // Set engine config for AI conversation
    final engineConfig = ZegoEngineConfig(advancedConfig: {
      'set_audio_volume_ducking_mode': '1',
      'enable_rnd_volume_adaptive': 'true',
    });
    await ZegoExpressEngine.setEngineConfig(engineConfig);

    // Create engine profile
    final profile = ZegoEngineProfile(
      AppConfig.appID,
      ZegoScenario.HighQualityChatroom,
      enablePlatformView: false,
    );
    await ZegoExpressEngine.createEngineWithProfile(profile);

    _setupEventHandlers();
    await _configureAudioSettings();
  }

  void _setupEventHandlers() {
    ZegoExpressEngine.onRoomStateChanged = (roomID, reason, errorCode, _) {
      onRoomStateChanged?.call(roomID, reason, errorCode);
    };

    // Handle experimental API for subtitle messages
    ZegoExpressEngine.onRecvExperimentalAPI = (content) {
      _parseSubtitleMessage(content);
    };
  }

  void _parseSubtitleMessage(String content) {
    // Raw experimental-API payload; the official subtitles component
    // in Step 2.8 parses the ASR/LLM messages it contains.
  }

  Future<void> _configureAudioSettings() async {
    final engine = ZegoExpressEngine.instance;
    await engine.enableAGC(true);
    await engine.enableAEC(true);
    await engine.setAECMode(ZegoAECMode.AIBalanced);
    await engine.enableANS(true);
    await engine.setANSMode(ZegoANSMode.Medium);
  }
}
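Typical usage from a page or ViewModel might look like the sketch below; prepareEngine is our own name, and the callback simply logs room state changes:

import 'package:flutter/foundation.dart';
import 'services/zego_express_manager.dart'; // adjust to your file layout

Future<void> prepareEngine() async {
  final manager = ZegoExpressManager();
  // React to room connection changes (e.g. update a status indicator).
  manager.onRoomStateChanged = (roomId, reason, errorCode) {
    debugPrint('Room $roomId state: $reason (error $errorCode)');
  };
  await manager.initEngine();
}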
2.7 Room Login, Publishing & Playing Streams
The RTC bits are pretty straightforward. Login to a room, publish your audio, play the AI’s audio:
// Login to room with token
Future<int> loginRoom(String roomId, String userId, String token) async {
  final user = ZegoUser(userId, userId);
  final config = ZegoRoomConfig(0, true, token);
  final result = await ZegoExpressEngine.instance.loginRoom(
    roomId,
    user,
    config: config,
  );
  return result.errorCode;
}

// Start publishing local audio (user's voice)
Future<void> startPublishing(String streamId) async {
  await ZegoExpressEngine.instance.muteMicrophone(false);
  await ZegoExpressEngine.instance.startPublishingStream(streamId);
}

// Start playing remote audio (AI agent's voice)
Future<void> startPlaying(String streamId) async {
  await ZegoExpressEngine.instance.startPlayingStream(streamId);
}
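The endCall flow in Step 2.10 also expects a logoutRoom method on the manager. A minimal teardown counterpart to the snippets above might look like this (the extra stop methods are optional conveniences):

// Stop sending the user's audio
Future<void> stopPublishing() async {
  await ZegoExpressEngine.instance.stopPublishingStream();
}

// Stop playing the AI agent's audio
Future<void> stopPlaying(String streamId) async {
  await ZegoExpressEngine.instance.stopPlayingStream(streamId);
}

// Leave the room (stop publishing first so the stream is removed cleanly)
Future<void> logoutRoom(String roomId) async {
  await ZegoExpressEngine.instance.stopPublishingStream();
  await ZegoExpressEngine.instance.logoutRoom(roomId);
}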
2.8 Display Subtitles
Good news — ZEGOCLOUD provides a ready-made subtitles component! No need to write all that parsing logic yourself. Grab it from their GitHub repo and drop it into lib/audio/subtitles/.
The component includes:
- message_dispatcher.dart: dispatches subtitle messages to handlers
- message_protocol.dart: protocol models for ASR/LLM messages
- model.dart: ViewModel for managing subtitle state
- view.dart: UI widget for displaying subtitles
Integrate the Subtitles Component
In your chat page, implement ZegoSubtitlesEventHandler and register the callbacks:
// lib/widgets/chat_page.dart
import '../audio/subtitles/view.dart';
import '../audio/subtitles/model.dart';
import '../audio/subtitles/message_protocol.dart';
import '../audio/subtitles/message_dispatcher.dart';
class _ChatPageState extends State<ChatPage> implements ZegoSubtitlesEventHandler {
  late ZegoSubtitlesViewModel _subtitlesModel;

  @override
  void initState() {
    super.initState();
    _subtitlesModel = ZegoSubtitlesViewModel();

    // Register subtitle event handler
    ZegoExpressEngine.onRecvExperimentalAPI = _onRecvExperimentalAPI;
    ZegoSubtitlesMessageDispatcher().registerEventHandler(this);
  }

  @override
  void dispose() {
    ZegoExpressEngine.onRecvExperimentalAPI = null;
    ZegoSubtitlesMessageDispatcher().unregisterEventHandler(this);
    super.dispose();
  }

  void _onRecvExperimentalAPI(String content) {
    ZegoSubtitlesMessageDispatcher.handleExpressExperimentalAPIContent(content);
  }

  @override
  void onRecvAsrChatMsg(ZegoSubtitlesMessageProtocol message) {
    _subtitlesModel.handleRecvAsrMessage(message);
  }

  @override
  void onRecvLLMChatMsg(ZegoSubtitlesMessageProtocol message) {
    _subtitlesModel.handleRecvLLMMessage(message);
  }
}
The component automatically handles:
- ASR messages (Cmd=3): Full text replacement for user speech
- LLM messages (Cmd=4): Incremental text accumulation for AI responses
- Message ordering: Sorts by SeqId to handle out-of-order delivery
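To make the incremental-accumulation idea concrete, here is a simplified illustration of what the component does internally (this is not its actual implementation): LLM fragments are keyed by SeqId and joined in order, so out-of-order delivery still produces the right sentence.

// Simplified illustration only - the official component implements the real protocol.
class LlmTextAccumulator {
  final Map<int, String> _fragments = {}; // SeqId -> text fragment

  void add(int seqId, String fragment) => _fragments[seqId] = fragment;

  String get text {
    final ids = _fragments.keys.toList()..sort();
    return ids.map((id) => _fragments[id]).join();
  }
}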
2.9 UI Layout
The layout follows the same pattern as Android/iOS: top section for status and controls, bottom for subtitles using the official component:
// lib/widgets/chat_page.dart
@override
Widget build(BuildContext context) {
  return Scaffold(
    body: SafeArea(
      child: Column(
        children: [
          // Top: Control Panel
          _buildControlPanel(),
          const Divider(height: 1),
          // Bottom: Subtitles View (using official component)
          Expanded(child: ZegoSubtitlesView(model: _subtitlesModel)),
        ],
      ),
    ),
  );
}

Widget _buildControlPanel() {
  return Container(
    padding: const EdgeInsets.symmetric(vertical: 40, horizontal: 20),
    color: Colors.grey[100],
    child: Column(
      children: [
        const Text('ZEGO AI Agent',
            style: TextStyle(fontSize: 24, fontWeight: FontWeight.bold)),
        const SizedBox(height: 20),

        // Status indicator
        Row(
          mainAxisAlignment: MainAxisAlignment.center,
          children: [
            Container(
              width: 12, height: 12,
              decoration: BoxDecoration(
                shape: BoxShape.circle,
                color: _viewModel.isConnected ? Colors.green : Colors.grey,
              ),
            ),
            const SizedBox(width: 8),
            Text(_viewModel.statusText),
          ],
        ),
        const SizedBox(height: 20),

        // Call button
        SizedBox(
          width: 120, height: 120,
          child: ElevatedButton(
            onPressed: _viewModel.isConnected
                ? _viewModel.endCall
                : _viewModel.startCall,
            style: ElevatedButton.styleFrom(
              shape: const CircleBorder(),
              backgroundColor: _viewModel.isConnected ? Colors.red : Colors.green,
            ),
            child: Icon(
              _viewModel.isConnected ? Icons.call_end : Icons.call,
              size: 48, color: Colors.white,
            ),
          ),
        ),
      ],
    ),
  );
}
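The control panel reads _viewModel.isConnected and _viewModel.statusText, and the ViewModel in Step 2.10 calls notifyListeners(), so the page needs to rebuild on those notifications. One simple option, assuming ChatViewModel extends ChangeNotifier (Provider or ListenableBuilder work just as well), is to extend the initState/dispose shown in Step 2.8:

// Inside _ChatPageState, alongside _subtitlesModel:
late final ChatViewModel _viewModel;

// In initState(), after the subtitle setup:
//   _viewModel = ChatViewModel();
//   _viewModel.addListener(_onViewModelChanged);
//
// In dispose(), before super.dispose():
//   _viewModel.removeListener(_onViewModelChanged);

void _onViewModelChanged() {
  if (mounted) setState(() {});
}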
2.10 Complete Call Flow
Here’s how all the pieces work together. The ViewModel orchestrates the complete flow when the user taps “Start Call”:
// lib/viewmodels/chat_viewmodel.dart
Future<void> startCall() async {
  _isLoading = true;
  notifyListeners();

  try {
    // Generate unique IDs for this session
    final roomId = AppConfig.generateRoomId();
    final userId = AppConfig.generateUserId();
    final userStreamId = '${roomId}_${userId}_main';

    // Initialize ZEGO Engine
    await _zegoManager.initEngine();

    // Step 1-2: Request token from backend
    final token = await ApiService.getToken(userId);
    if (token == null) throw Exception('Failed to get token');

    // Step 3: Login to ZEGO room
    final loginResult = await _zegoManager.loginRoom(roomId, userId, token);
    if (loginResult != 0) throw Exception('Failed to login room');

    // Step 3: Publish local audio stream (user's voice)
    await _zegoManager.startPublishing(userStreamId);

    // Step 4-8: Request backend to create AI Agent
    final agentStreamId = await ApiService.startAgent(
      roomId: roomId,
      userId: userId,
      userStreamId: userStreamId,
    );
    if (agentStreamId == null) throw Exception('Failed to start agent');

    // Step 9: Play AI Agent's audio stream
    await _zegoManager.startPlaying(agentStreamId);

    _isConnected = true;
    _currentRoomId = roomId;
  } catch (e) {
    _errorMessage = e.toString();
  } finally {
    _isLoading = false;
    notifyListeners();
  }
}

// Step 10-11: End call and cleanup
Future<void> endCall() async {
  if (_currentRoomId != null) {
    await ApiService.stopAgent(_currentRoomId!);
    await _zegoManager.logoutRoom(_currentRoomId!);
  }
  _isConnected = false;
  _currentRoomId = null;
  notifyListeners();
}
This flow corresponds to steps 1-11 in the System Architecture diagram:
| Step | Action | Code |
|---|---|---|
| 1-2 | Request Token | ApiService.getToken() |
| 3 | Login Room & Publish Stream | loginRoom() + startPublishing() |
| 4-8 | Start AI Agent | ApiService.startAgent() |
| 9 | Play Agent Stream | startPlaying() |
| 10-11 | Stop AI Agent | ApiService.stopAgent() |
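Finally, the app needs an entry point that shows the chat page. A minimal main.dart along these lines is enough (the import path assumes the file layout used in this guide):

// lib/main.dart
import 'package:flutter/material.dart';

import 'widgets/chat_page.dart';

void main() {
  runApp(MaterialApp(
    title: 'ZEGO AI Agent Demo',
    debugShowCheckedModeBanner: false,
    home: ChatPage(),
  ));
}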
Running the App
# Run on iOS simulator
flutter run -d ios
# Run on Android emulator
flutter run -d android
# Run on connected device
flutter run
Conclusion
Using Flutter together with ZEGOCLOUD makes it much easier to add conversational AI features to an app. With real-time audio, speech recognition, language response, and voice output all handled in the cloud, the client-side code stays simple and focused on the user interface.
The same approach can be used in many scenarios, such as virtual assistants, customer support, learning tools, or subtitle display. As your project grows, you can continue exploring more APIs, UIKits, and features from ZEGOCLOUD to build richer voice interaction experiences.
Let’s Build APP Together
Start building with real-time video, voice & chat SDK for apps today!