Voice assistants are everywhere now. People talk to their phones, ask questions of smart speakers, and expect apps to understand what they say. Building a conversational AI used to be genuinely hard: you needed separate services for speech recognition, natural language understanding, and speech synthesis, and managing audio streams while keeping voices clear across devices was a nightmare.
ZEGOCLOUD's conversational AI solution makes this much simpler. You can add voice conversations to your app without dealing with complex audio processing or expensive backend systems. Your app can listen to users, understand what they want, and respond with natural-sounding speech.
This guide shows you how to build a conversational AI that actually works, one your users can hold real voice conversations with.
Conversational AI Solutions Built by ZEGOCLOUD
ZEGOCLOUD treats AI agents like real participants in your app. Instead of building separate chatbots, you invite AI directly into voice calls, video rooms, or live streams. The AI joins as an active participant and talks with users in real-time.
Multiple people can speak with the same AI agent during group calls. The AI recognizes different voices, gives personalized responses, and even suggests topics to keep conversations flowing. It handles interruptions naturally and responds just like a human participant would.
This approach makes conversational AI feel more natural. Users don’t switch between talking to people and talking to bots. The AI agent participates in the same conversation using the same voice streams as everyone else in the room.

Prerequisites
Before building the conversational AI functionality, ensure you have these essential components:
- ZEGOCLOUD developer account with the AI Agent service activated – sign up here.
- Node.js 18+ with npm for package management and development tooling.
- Valid AppID and ServerSecret credentials from the ZEGOCLOUD admin console for authentication.
- OpenAI API key for AI responses, or any OpenAI-compatible LLM provider.
- Physical device with microphone access for voice testing, since emulators and simulators cannot reliably capture audio.
Steps for Building a Conversational AI
1. Project Structure and Backend Setup
Begin by creating the complete project structure with separated client and server components. This organization enables independent development of frontend and backend while maintaining clean separation of concerns.
Create a new project directory and initialize the backend server:
mkdir conversational-ai
cd conversational-ai
mkdir server client
cd server
npm init -y
Update your server/package.json with the required dependencies and configuration:
{
"name": "conversational-ai-server",
"version": "1.0.0",
"type": "module",
"scripts": {
"dev": "tsx watch src/server.ts",
"build": "tsc",
"start": "node dist/server.js",
"type-check": "tsc --noEmit"
},
"dependencies": {
"express": "^5.1.0",
"cors": "^2.8.5",
"dotenv": "^17.2.1",
"axios": "^1.11.0"
},
"devDependencies": {
"@types/express": "^5.0.3",
"@types/cors": "^2.8.19",
"@types/node": "^24.3.0",
"typescript": "^5.9.2",
"tsx": "^4.20.4"
}
}
This package configuration establishes a modern Node.js server with TypeScript support, hot-reloading capabilities for development, and essential packages for building REST APIs and integrating with ZEGOCLOUD services.
Install the dependencies:
npm install
Create the TypeScript configuration at server/tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"allowSyntheticDefaultImports": true,
"esModuleInterop": true,
"forceConsistentCasingInFileNames": true,
"strict": true,
"noImplicitAny": true,
"strictNullChecks": true,
"strictFunctionTypes": true,
"noImplicitReturns": true,
"noFallthroughCasesInSwitch": true,
"noUncheckedIndexedAccess": true,
"skipLibCheck": true,
"declaration": true,
"declarationMap": true,
"sourceMap": true,
"outDir": "./dist",
"rootDir": "./src",
"removeComments": false,
"resolveJsonModule": true,
"isolatedModules": true,
"moduleDetection": "force"
},
"include": ["src/**/*"],
"exclude": ["node_modules", "dist"]
}
This TypeScript configuration enables modern JavaScript features, strict type checking, and proper module resolution for the latest Node.js environments while generating source maps for debugging.
2. Environment Configuration and ZEGOCLOUD Credentials
Set up your environment variables by creating server/.env:
# ZEGOCLOUD Configuration
ZEGO_APP_ID=your_zego_app_id_here
ZEGO_SERVER_SECRET=your_zego_server_secret_here
ZEGO_API_BASE_URL=https://aigc-aiagent-api.zegotech.cn
# LLM Provider Configuration
LLM_URL=https://api.openai.com/v1/chat/completions
LLM_API_KEY=your_openai_api_key_here
LLM_MODEL=gpt-4o-mini
# Server Configuration
PORT=8080
Replace the placeholder values with your actual ZEGOCLOUD App ID and Server Secret from the console, plus your OpenAI API key.
3. ZEGOCLOUD Token Generation Implementation
Create the ZEGOCLOUD token generator at server/zego-token.cjs:
"use strict";
Object.defineProperty(exports, "__esModule", { value: true });
exports.generateToken04 = generateToken04;
var crypto_1 = require("crypto");
// Generate random number in int32 range
function makeNonce() {
var min = -Math.pow(2, 31); // -2^31
var max = Math.pow(2, 31) - 1; // 2^31 - 1
return Math.floor(Math.random() * (max - min + 1)) + min;
}
// AES encryption using GCM mode
function aesGcmEncrypt(plainText, key) {
// Ensure valid key length (16, 24 or 32 bytes)
if (![16, 24, 32].includes(key.length)) {
throw createError(5, 'Invalid Secret length. Key must be 16, 24, or 32 bytes.');
}
// Pick the GCM variant that matches the key length (a 32-byte secret uses aes-256-gcm)
var algorithm = 'aes-' + key.length * 8 + '-gcm';
// Generate random 12-byte nonce for AES encryption
var nonce = (0, crypto_1.randomBytes)(12);
var cipher = (0, crypto_1.createCipheriv)(algorithm, key, nonce);
cipher.setAutoPadding(true);
var encrypted = cipher.update(plainText, 'utf8');
var encryptBuf = Buffer.concat([encrypted, cipher.final(), cipher.getAuthTag()]);
return { encryptBuf: encryptBuf, nonce: nonce };
}
function createError(errorCode, errorMessage) {
return {
errorCode: errorCode,
errorMessage: errorMessage
};
}
function generateToken04(appId, userId, secret, effectiveTimeInSeconds, payload) {
if (!appId || typeof appId !== 'number') {
throw createError(1, 'appID invalid');
}
if (!userId || typeof userId !== 'string' || userId.length > 64) {
throw createError(3, 'userId invalid');
}
if (!secret || typeof secret !== 'string' || secret.length !== 32) {
throw createError(5, 'secret must be a 32 byte string');
}
if (!(effectiveTimeInSeconds > 0)) {
throw createError(6, 'effectiveTimeInSeconds invalid');
}
var VERSION_FLAG = '04';
var createTime = Math.floor(new Date().getTime() / 1000);
var tokenInfo = {
app_id: appId,
user_id: userId,
nonce: makeNonce(),
ctime: createTime,
expire: createTime + effectiveTimeInSeconds,
payload: payload || ''
};
// Convert token info to JSON
var plainText = JSON.stringify(tokenInfo);
// Perform encryption
var _a = aesGcmEncrypt(plainText, secret), encryptBuf = _a.encryptBuf, nonce = _a.nonce;
// Binary token assembly: expire time + Base64(nonce length + nonce + encrypted info length + encrypted info + encryption mode)
var _b = [new Uint8Array(8), new Uint8Array(2), new Uint8Array(2), new Uint8Array(1)], b1 = _b[0], b2 = _b[1], b3 = _b[2], b4 = _b[3];
new DataView(b1.buffer).setBigInt64(0, BigInt(tokenInfo.expire), false);
new DataView(b2.buffer).setUint16(0, nonce.byteLength, false);
new DataView(b3.buffer).setUint16(0, encryptBuf.byteLength, false);
new DataView(b4.buffer).setUint8(0, 1);
var buf = Buffer.concat([
Buffer.from(b1),
Buffer.from(b2),
Buffer.from(nonce),
Buffer.from(b3),
Buffer.from(encryptBuf),
Buffer.from(b4),
]);
var dv = new DataView(Uint8Array.from(buf).buffer);
return VERSION_FLAG + Buffer.from(dv.buffer).toString('base64');
}
This token generator creates secure, time-limited authentication tokens using ZEGOCLOUD’s token04 format. It encrypts user session data with AES-GCM encryption and packages it in a binary format that ZEGOCLOUD’s servers can verify and decode.
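To sanity-check the generator before wiring it into the server, you can call it directly from a Node script. The credentials below are made up for illustration; substitute your own AppID and ServerSecret:
const { generateToken04 } = require('./zego-token.cjs')
// Hypothetical credentials for illustration only
const token = generateToken04(
123456789, // numeric AppID from the ZEGOCLOUD console
'demo_user', // any user ID up to 64 characters
'0123456789abcdef0123456789abcdef', // 32-character ServerSecret
3600 // token lifetime in seconds
)
console.log(token.startsWith('04')) // token04 strings always begin with the version flag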
4. Main Server Implementation
Create the main server file at server/src/server.ts:
import express from 'express'
import cors from 'cors'
import dotenv from 'dotenv'
import axios from 'axios'
import { generateToken04 } from '../zego-token.cjs'
dotenv.config()
const app = express()
const PORT = process.env.PORT || 8080
// Middleware
app.use(cors({
origin: ['http://localhost:5173', 'http://localhost:3000'],
credentials: true
}))
app.use(express.json())
// Environment validation
const requiredEnvVars = ['ZEGO_APP_ID', 'ZEGO_SERVER_SECRET', 'LLM_API_KEY']
const missingVars = requiredEnvVars.filter(varName => !process.env[varName])
if (missingVars.length > 0) {
console.error('❌ Missing required environment variables:', missingVars)
process.exit(1)
}
const ZEGO_APP_ID = parseInt(process.env.ZEGO_APP_ID!)
const ZEGO_SERVER_SECRET = process.env.ZEGO_SERVER_SECRET!
const ZEGO_API_BASE_URL = process.env.ZEGO_API_BASE_URL!
const LLM_URL = process.env.LLM_URL!
const LLM_API_KEY = process.env.LLM_API_KEY!
const LLM_MODEL = process.env.LLM_MODEL || 'gpt-4o-mini'
// Health check endpoint
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
timestamp: new Date().toISOString(),
environment: {
hasZegoAppId: !!process.env.ZEGO_APP_ID,
hasZegoSecret: !!process.env.ZEGO_SERVER_SECRET,
hasLLMKey: !!process.env.LLM_API_KEY,
nodeVersion: process.version
}
})
})
// Generate ZEGO token for authentication
app.get('/api/token', (req, res) => {
try {
const { user_id } = req.query
if (!user_id || typeof user_id !== 'string') {
return res.status(400).json({
success: false,
error: 'user_id is required and must be a string'
})
}
const effectiveTimeInSeconds = 7200 // 2 hours
const payload = ''
const token = generateToken04(
ZEGO_APP_ID,
user_id,
ZEGO_SERVER_SECRET,
effectiveTimeInSeconds,
payload
)
console.log(`✅ Generated token for user: ${user_id}`)
res.json({
success: true,
token,
expires_in: effectiveTimeInSeconds
})
} catch (error) {
console.error('❌ Token generation failed:', error)
res.status(500).json({
success: false,
error: 'Failed to generate token'
})
}
})
// Start AI agent session
app.post('/api/start', async (req, res) => {
try {
const { room_id, user_id, user_stream_id } = req.body
if (!room_id || !user_id) {
return res.status(400).json({
success: false,
error: 'room_id and user_id are required'
})
}
console.log(`🚀 Starting AI session for room: ${room_id}, user: ${user_id}`)
const response = await axios.post(`${ZEGO_API_BASE_URL}/v1/ai_agent/start`, {
app_id: ZEGO_APP_ID,
room_id: room_id,
user_id: user_id,
user_stream_id: user_stream_id || `${user_id}_stream`,
ai_agent_config: {
llm_config: {
url: LLM_URL,
api_key: LLM_API_KEY,
model: LLM_MODEL,
context: [
{
role: "system",
content: "You are a helpful AI assistant. Be conversational, friendly, and helpful. Keep responses concise but informative. You can engage in natural conversation and help with various topics."
}
]
},
tts_config: {
provider: "elevenlabs",
voice_id: "pNInz6obpgDQGcFmaJgB",
model: "eleven_turbo_v2_5"
},
asr_config: {
provider: "deepgram",
language: "en"
}
}
}, {
headers: {
'Content-Type': 'application/json'
},
timeout: 30000
})
if (response.data && response.data.data && response.data.data.ai_agent_instance_id) {
const agentInstanceId = response.data.data.ai_agent_instance_id
console.log(`✅ AI agent started successfully: ${agentInstanceId}`)
res.json({
success: true,
agentInstanceId: agentInstanceId,
room_id: room_id,
user_id: user_id
})
} else {
throw new Error('Invalid response from ZEGO API')
}
} catch (error: any) {
console.error('❌ Failed to start AI session:', error.response?.data || error.message)
res.status(500).json({
success: false,
error: error.response?.data?.message || error.message || 'Failed to start AI session'
})
}
})
// Send message to AI agent
app.post('/api/send-message', async (req, res) => {
try {
const { agent_instance_id, message } = req.body
if (!agent_instance_id || !message) {
return res.status(400).json({
success: false,
error: 'agent_instance_id and message are required'
})
}
console.log(`💬 Sending message to agent ${agent_instance_id}: ${message.substring(0, 50)}...`)
const response = await axios.post(`${ZEGO_API_BASE_URL}/v1/ai_agent/chat`, {
ai_agent_instance_id: agent_instance_id,
messages: [
{
role: "user",
content: message
}
]
}, {
headers: {
'Content-Type': 'application/json'
},
timeout: 30000
})
console.log(`✅ Message sent successfully to agent: ${agent_instance_id}`)
res.json({
success: true,
message: 'Message sent successfully'
})
} catch (error: any) {
console.error('❌ Failed to send message:', error.response?.data || error.message)
res.status(500).json({
success: false,
error: error.response?.data?.message || error.message || 'Failed to send message'
})
}
})
// Stop AI agent session
app.post('/api/stop', async (req, res) => {
try {
const { agent_instance_id } = req.body
if (!agent_instance_id) {
return res.status(400).json({
success: false,
error: 'agent_instance_id is required'
})
}
console.log(`🛑 Stopping AI session: ${agent_instance_id}`)
const response = await axios.post(`${ZEGO_API_BASE_URL}/v1/ai_agent/stop`, {
ai_agent_instance_id: agent_instance_id
}, {
headers: {
'Content-Type': 'application/json'
},
timeout: 30000
})
console.log(`✅ AI session stopped successfully: ${agent_instance_id}`)
res.json({
success: true,
message: 'AI session stopped successfully'
})
} catch (error: any) {
console.error('❌ Failed to stop AI session:', error.response?.data || error.message)
res.status(500).json({
success: false,
error: error.response?.data?.message || error.message || 'Failed to stop AI session'
})
}
})
app.listen(PORT, () => {
console.log(`🚀 Server running on port ${PORT}`)
console.log(`🏥 Health check: http://localhost:${PORT}/health`)
})
The /api/start endpoint creates ZEGOCLOUD rooms and initializes AI agents with the configured language model, text-to-speech, and speech recognition providers. The /api/send-message endpoint forwards user messages to active AI agents, while /api/stop properly terminates sessions and cleans up resources.
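Once the server is running (npm run dev inside server/), a quick smoke test confirms the wiring. This sketch assumes the default port 8080 and Node 18+'s built-in fetch; run it inside an ES module or an async function:
const base = 'http://localhost:8080'
const health = await fetch(`${base}/health`).then(r => r.json())
console.log(health.status) // "healthy"
const tokenRes = await fetch(`${base}/api/token?user_id=test_user`).then(r => r.json())
console.log(tokenRes.token.slice(0, 2)) // "04"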
5. Frontend Project Initialization
Now set up the React frontend. Navigate to the root directory and create the client Vite project:
cd ..
npm create vite@latest client -- --template react-ts
This scaffolds a React + TypeScript project in the client directory. (If you run npm create vite@latest without arguments, choose React, then TypeScript, and name the project client.)
Update client/package.json with the required frontend dependencies:
{
"name": "zego-convo-ai",
"private": true,
"version": "0.0.0",
"type": "module",
"scripts": {
"dev": "vite",
"build": "tsc -b && vite build",
"lint": "eslint .",
"preview": "vite preview"
},
"dependencies": {
"@tailwindcss/vite": "^4.1.11",
"@types/dom-speech-recognition": "^0.0.6",
"@types/node": "^24.2.0",
"axios": "^1.11.0",
"framer-motion": "^12.23.12",
"lucide-react": "^0.536.0",
"react": "^19.1.0",
"react-dom": "^19.1.0",
"react-speech-kit": "^3.0.1",
"tailwindcss": "^4.1.11",
"zego-express-engine-webrtc": "^3.10.0",
"zod": "^4.0.15"
},
"devDependencies": {
"@eslint/js": "^9.30.1",
"@types/react": "^19.1.8",
"@types/react-dom": "^19.1.6",
"@vitejs/plugin-react": "^4.6.0",
"eslint": "^9.30.1",
"eslint-plugin-react-hooks": "^5.2.0",
"eslint-plugin-react-refresh": "^0.4.20",
"globals": "^16.3.0",
"typescript": "~5.8.3",
"typescript-eslint": "^8.35.1",
"vite": "^7.0.4"
}
}
Install the frontend dependencies:
cd client
npm install
Update the Vite configuration at client/vite.config.ts:
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
export default defineConfig({
plugins: [react(), tailwindcss()],
define: {
global: 'globalThis',
},
server: {
host: true,
},
optimizeDeps: {
include: ['zego-express-engine-webrtc'],
}
})
6. Frontend Type Definitions
Now create the type definitions at client/src/types/index.ts:
export interface Message {
id: string
content: string
sender: 'user' | 'ai'
timestamp: number
type: 'text' | 'voice'
isStreaming?: boolean
audioUrl?: string
duration?: number
transcript?: string
}
export interface ConversationMemory {
id: string
title: string
messages: Message[]
createdAt: number
updatedAt: number
metadata: {
totalMessages: number
lastAIResponse: string
topics: string[]
}
}
export interface VoiceSettings {
isEnabled: boolean
autoPlay: boolean
speechRate: number
speechPitch: number
preferredVoice?: string
}
export interface ChatSession {
roomId: string
userId: string
agentInstanceId?: string
isActive: boolean
conversationId?: string
voiceSettings: VoiceSettings
}
export interface AIAgent {
id: string
name: string
personality: string
voiceCharacteristics: {
language: 'en-US' | 'en-GB'
gender: 'male' | 'female'
speed: number
pitch: number
}
}
These TypeScript interfaces define the data structures for messages, conversations, chat sessions, and voice settings.
The Message interface supports both text and voice messages with streaming capabilities. ConversationMemory handles local storage of chat history with metadata for organization and search functionality.
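For reference, here is what a finished voice message looks like under these types (values are illustrative, and the import path assumes the file above):
import type { Message } from './types'
const example: Message = {
id: 'voice_1700000000000_ab12cd', // generated per message
content: 'What can you help me with?',
sender: 'user',
timestamp: Date.now(),
type: 'voice',
transcript: 'What can you help me with?', // final ASR transcript
}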
7. Environment Configuration and Service Setup
Create the frontend environment configuration at client/.env:
VITE_ZEGO_APP_ID=your_zego_app_id_here
VITE_ZEGO_SERVER=wss://webliveroom-api.zegocloud.com/ws
# Use your deployed backend URL here in production
VITE_API_BASE_URL=http://localhost:8080
Create the configuration service at client/src/config.ts:
import { z } from 'zod'
const configSchema = z.object({
ZEGO_APP_ID: z.string().min(1, 'ZEGO App ID is required'),
ZEGO_SERVER: z.string().url('Valid ZEGO server URL required'),
API_BASE_URL: z.string().url('Valid API base URL required'),
})
const rawConfig = {
ZEGO_APP_ID: import.meta.env.VITE_ZEGO_APP_ID,
ZEGO_SERVER: import.meta.env.VITE_ZEGO_SERVER,
API_BASE_URL: import.meta.env.VITE_API_BASE_URL,
}
export const config = configSchema.parse(rawConfig)
export const STORAGE_KEYS = {
CONVERSATIONS: 'ai_conversations',
USER_PREFERENCES: 'ai_user_preferences',
SESSION_HISTORY: 'ai_session_history',
} as const
This configuration module validates environment variables using Zod schemas to ensure the required ZEGOCLOUD credentials are present and properly formatted. It also defines localStorage keys for persisting conversation data and user preferences across browser sessions.
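As a usage sketch, routing all persistence through these constants keeps storage keys in one place:
import { STORAGE_KEYS } from './config'
// Store and retrieve user preferences under the shared key
localStorage.setItem(
STORAGE_KEYS.USER_PREFERENCES,
JSON.stringify({ autoPlay: true, speechRate: 1.0 })
)
const prefs = JSON.parse(localStorage.getItem(STORAGE_KEYS.USER_PREFERENCES) ?? '{}')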
8. API Service Layer Implementation
Create the API service layer at client/src/services/api.ts:
import axios from 'axios'
import { config } from '../config'
const api = axios.create({
baseURL: config.API_BASE_URL,
timeout: 30000,
headers: {
'Content-Type': 'application/json'
}
})
api.interceptors.request.use(
(config) => {
console.log('🌐 API Request:', config.method?.toUpperCase(), config.url)
if (config.data && config.method !== 'get') {
console.log('📤 Request Data:', config.data)
}
return config
},
(error) => {
console.error('❌ API Request Error:', error)
return Promise.reject(error)
}
)
api.interceptors.response.use(
(response) => {
console.log('✅ API Response:', response.status, response.config.url)
if (response.data) {
console.log('📥 Response Data:', response.data)
}
return response
},
(error) => {
console.error('❌ API Response Error:', {
status: error.response?.status,
statusText: error.response?.statusText,
data: error.response?.data,
url: error.config?.url,
method: error.config?.method
})
return Promise.reject(error)
}
)
export const agentAPI = {
async startSession(roomId: string, userId: string): Promise<{ agentInstanceId: string }> {
try {
const requestData = {
room_id: roomId,
user_id: userId,
user_stream_id: `${userId}_stream`,
}
console.log('🚀 Starting session with data:', requestData)
const response = await api.post('/api/start', requestData)
if (!response.data || !response.data.success) {
throw new Error(response.data?.error || 'Session start failed')
}
if (!response.data.agentInstanceId) {
throw new Error('No agent instance ID returned')
}
console.log('✅ Session started successfully:', response.data.agentInstanceId)
return {
agentInstanceId: response.data.agentInstanceId
}
} catch (error: any) {
console.error('❌ Start session failed:', error.response?.data || error.message)
throw new Error(error.response?.data?.error || error.message || 'Failed to start session')
}
},
async sendMessage(agentInstanceId: string, message: string): Promise<void> {
if (!agentInstanceId) {
throw new Error('Agent instance ID is required')
}
if (!message || !message.trim()) {
throw new Error('Message content is required')
}
try {
const requestData = {
agent_instance_id: agentInstanceId,
message: message.trim(),
}
console.log('💬 Sending message:', {
agentInstanceId,
messageLength: message.length,
messagePreview: message.substring(0, 50) + (message.length > 50 ? '...' : '')
})
const response = await api.post('/api/send-message', requestData)
if (!response.data || !response.data.success) {
throw new Error(response.data?.error || 'Message send failed')
}
console.log('✅ Message sent successfully')
} catch (error: any) {
console.error('❌ Send message failed:', error.response?.data || error.message)
throw new Error(error.response?.data?.error || error.message || 'Failed to send message')
}
},
async stopSession(agentInstanceId: string): Promise<void> {
if (!agentInstanceId) {
console.warn('⚠️ No agent instance ID provided for stop session')
return
}
try {
const requestData = {
agent_instance_id: agentInstanceId,
}
console.log('🛑 Stopping session:', agentInstanceId)
const response = await api.post('/api/stop', requestData)
if (!response.data || !response.data.success) {
console.warn('⚠️ Session stop returned non-success:', response.data)
} else {
console.log('✅ Session stopped successfully')
}
} catch (error: any) {
console.error('❌ Stop session failed:', error.response?.data || error.message)
throw new Error(error.response?.data?.error || error.message || 'Failed to stop session')
}
},
async getToken(userId: string): Promise<{ token: string }> {
if (!userId) {
throw new Error('User ID is required')
}
try {
console.log('🔑 Getting token for user:', userId)
const response = await api.get(`/api/token?user_id=${encodeURIComponent(userId)}`)
if (!response.data || !response.data.token) {
throw new Error('No token returned')
}
console.log('✅ Token received successfully')
return { token: response.data.token }
} catch (error: any) {
console.error('❌ Get token failed:', error.response?.data || error.message)
throw new Error(error.response?.data?.error || error.message || 'Failed to get token')
}
},
async healthCheck(): Promise<{ status: string }> {
try {
console.log('🏥 Checking backend health')
const response = await api.get('/health')
console.log('✅ Backend health check successful:', response.data)
return response.data
} catch (error: any) {
console.error('❌ Backend health check failed:', error.response?.data || error.message)
throw new Error(error.response?.data?.error || error.message || 'Backend health check failed')
}
}
}
This API service provides a clean interface for communicating with your backend server. It includes comprehensive error handling and request/response logging for every call.
The service handles session management, message sending, and token generation while providing detailed console output for debugging.
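Taken together, a full session round-trip through this service looks roughly like this (room and user IDs are placeholders; run inside an async function):
import { agentAPI } from './services/api'
const { agentInstanceId } = await agentAPI.startSession('room_demo', 'user_demo')
await agentAPI.sendMessage(agentInstanceId, 'Hello! What can you do?')
await agentAPI.stopSession(agentInstanceId)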
9. ZEGOCLOUD Real-Time Communication Service
Create the ZEGOCLOUD service at client/src/services/zego.ts:
import { ZegoExpressEngine } from 'zego-express-engine-webrtc'
import { config } from '../config'
import { agentAPI } from './api'
export class ZegoService {
private static instance: ZegoService
private zg: ZegoExpressEngine | null = null
private isInitialized = false
private currentRoomId: string | null = null
private currentUserId: string | null = null
private localStream: any = null
private isJoining = false
private audioElement: HTMLAudioElement | null = null
static getInstance(): ZegoService {
if (!ZegoService.instance) {
ZegoService.instance = new ZegoService()
}
return ZegoService.instance
}
async initialize(): Promise<void> {
if (this.isInitialized || this.isJoining) return
this.isJoining = true
try {
this.zg = new ZegoExpressEngine(
parseInt(config.ZEGO_APP_ID),
config.ZEGO_SERVER
)
this.setupEventListeners()
this.setupAudioElement()
this.isInitialized = true
console.log('✅ ZEGO initialized successfully')
} catch (error) {
console.error('❌ ZEGO initialization failed:', error)
throw error
} finally {
this.isJoining = false
}
}
private setupAudioElement(): void {
this.audioElement = document.getElementById('ai-audio-output') as HTMLAudioElement
if (!this.audioElement) {
this.audioElement = document.createElement('audio')
this.audioElement.id = 'ai-audio-output'
this.audioElement.autoplay = true
this.audioElement.controls = false
this.audioElement.style.display = 'none'
document.body.appendChild(this.audioElement)
}
this.audioElement.volume = 0.8
this.audioElement.muted = false
this.audioElement.addEventListener('loadstart', () => {
console.log('🔊 Audio loading started')
})
this.audioElement.addEventListener('canplay', () => {
console.log('🔊 Audio ready to play')
})
this.audioElement.addEventListener('play', () => {
console.log('🔊 Audio playback started')
})
this.audioElement.addEventListener('error', (e) => {
console.error('❌ Audio error:', e)
})
}
private setupEventListeners(): void {
if (!this.zg) return
this.zg.on('recvExperimentalAPI', (result: any) => {
const { method, content } = result
if (method === 'onRecvRoomChannelMessage') {
try {
const message = JSON.parse(content.msgContent)
console.log('🎯 Room message received:', message)
this.handleRoomMessage(message)
} catch (error) {
console.error('Failed to parse room message:', error)
}
}
})
this.zg.on('roomStreamUpdate', async (_roomID: string, updateType: string, streamList: any[]) => {
console.log('📡 Stream update:', updateType, streamList.length, 'streams')
if (updateType === 'ADD' && streamList.length > 0) {
for (const stream of streamList) {
const userStreamId = this.currentUserId ? `${this.currentUserId}_stream` : null
if (userStreamId && stream.streamID === userStreamId) {
console.log('🚫 Skipping user\'s own stream:', stream.streamID)
continue
}
try {
console.log('🔗 Playing AI agent stream:', stream.streamID)
const mediaStream = await this.zg!.startPlayingStream(stream.streamID)
if (mediaStream) {
console.log('✅ Media stream received:', mediaStream)
const remoteView = await this.zg!.createRemoteStreamView(mediaStream)
if (remoteView && this.audioElement) {
try {
await remoteView.play(this.audioElement, {
enableAutoplayDialog: false,
muted: false
})
console.log('✅ AI agent audio connected and playing')
this.audioElement.muted = false
this.audioElement.volume = 0.8
} catch (playError) {
console.error('❌ Failed to play audio through element:', playError)
try {
if (this.audioElement) {
this.audioElement.srcObject = mediaStream
await this.audioElement.play()
console.log('✅ Fallback audio play successful')
}
} catch (fallbackError) {
console.error('❌ Fallback audio play failed:', fallbackError)
}
}
}
}
} catch (error) {
console.error('❌ Failed to play agent stream:', error)
}
}
} else if (updateType === 'DELETE') {
console.log('📴 Agent stream disconnected')
if (this.audioElement) {
this.audioElement.srcObject = null
}
}
})
this.zg.on('roomUserUpdate', (_roomID: string, updateType: string, userList: any[]) => {
console.log('👥 Room user update:', updateType, userList.length, 'users')
})
this.zg.on('roomStateChanged', (roomID: string, reason: string, errorCode: number) => {
console.log('🏠 Room state changed:', { roomID, reason, errorCode })
})
this.zg.on('networkQuality', (userID: string, upstreamQuality: number, downstreamQuality: number) => {
if (upstreamQuality > 2 || downstreamQuality > 2) {
console.warn('📶 Network quality issues:', { userID, upstreamQuality, downstreamQuality })
}
})
this.zg.on('publisherStateUpdate', (result: any) => {
console.log('📤 Publisher state update:', result)
})
this.zg.on('playerStateUpdate', (result: any) => {
console.log('📥 Player state update:', result)
})
}
private messageCallback: ((message: any) => void) | null = null
private handleRoomMessage(message: any): void {
if (this.messageCallback) {
this.messageCallback(message)
}
}
async joinRoom(roomId: string, userId: string): Promise<boolean> {
if (!this.zg) {
console.error('❌ ZEGO not initialized')
return false
}
if (this.currentRoomId === roomId && this.currentUserId === userId) {
console.log('ℹ️ Already in the same room')
return true
}
try {
if (this.currentRoomId) {
console.log('🔄 Leaving previous room before joining new one')
await this.leaveRoom()
}
this.currentRoomId = roomId
this.currentUserId = userId
console.log('🔑 Getting token for user:', userId)
const { token } = await agentAPI.getToken(userId)
console.log('🚪 Logging into room:', roomId)
await this.zg.loginRoom(roomId, token, {
userID: userId,
userName: userId
})
console.log('📢 Enabling room message reception')
this.zg.callExperimentalAPI({
method: 'onRecvRoomChannelMessage',
params: {}
})
console.log('🎤 Creating local stream with enhanced audio settings')
const localStream = await this.zg.createZegoStream({
camera: {
video: false,
audio: true
}
})
if (localStream) {
this.localStream = localStream
const streamId = `${userId}_stream`
console.log('📤 Publishing stream:', streamId)
await this.zg.startPublishingStream(streamId, localStream, {
enableAutoSwitchVideoCodec: true
})
console.log('✅ Room joined successfully')
return true
} else {
throw new Error('Failed to create local stream')
}
} catch (error) {
console.error('❌ Failed to join room:', error)
this.currentRoomId = null
this.currentUserId = null
return false
}
}
async enableMicrophone(enabled: boolean): Promise<boolean> {
if (!this.zg || !this.localStream) {
console.warn('⚠️ Cannot toggle microphone: no stream available')
return false
}
try {
if (this.localStream.getAudioTracks) {
const audioTrack = this.localStream.getAudioTracks()[0]
if (audioTrack) {
audioTrack.enabled = enabled
console.log(`🎤 Microphone ${enabled ? 'enabled' : 'disabled'}`)
return true
}
}
console.warn('⚠️ No audio track found in local stream')
return false
} catch (error) {
console.error('❌ Failed to toggle microphone:', error)
return false
}
}
async leaveRoom(): Promise<void> {
if (!this.zg || !this.currentRoomId) {
console.log('ℹ️ No room to leave')
return
}
try {
console.log('🚪 Leaving room:', this.currentRoomId)
if (this.currentUserId && this.localStream) {
const streamId = `${this.currentUserId}_stream`
console.log('📤 Stopping stream publication:', streamId)
await this.zg.stopPublishingStream(streamId)
}
if (this.localStream) {
console.log('🗑️ Destroying local stream')
this.zg.destroyStream(this.localStream)
this.localStream = null
}
await this.zg.logoutRoom()
if (this.audioElement) {
this.audioElement.srcObject = null
}
this.currentRoomId = null
this.currentUserId = null
console.log('✅ Left room successfully')
} catch (error) {
console.error('❌ Failed to leave room:', error)
this.currentRoomId = null
this.currentUserId = null
this.localStream = null
}
}
onRoomMessage(callback: (message: any) => void): void {
this.messageCallback = callback
}
getCurrentRoomId(): string | null {
return this.currentRoomId
}
getCurrentUserId(): string | null {
return this.currentUserId
}
getEngine(): ZegoExpressEngine | null {
return this.zg
}
isInRoom(): boolean {
return !!this.currentRoomId && !!this.currentUserId
}
async destroy(): Promise<void> {
if (this.zg) {
// Wait for the room to be fully left before tearing the engine down
await this.leaveRoom()
this.zg = null
this.isInitialized = false
if (this.audioElement && this.audioElement.parentNode) {
this.audioElement.parentNode.removeChild(this.audioElement)
this.audioElement = null
}
console.log('🗑️ ZEGO service destroyed')
}
}
}
This service manages all ZEGOCLOUD WebRTC communication, including room joining, audio stream handling, and real-time message reception.
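The expected call order is initialize, join, then subscribe and unmute. A minimal sketch (IDs are placeholders, and the token endpoint from step 4 must be running, since joinRoom fetches a token internally):
const zego = ZegoService.getInstance()
await zego.initialize()
const joined = await zego.joinRoom('room_demo', 'user_demo')
if (joined) {
// Receive ASR transcripts and LLM output forwarded through the room channel
zego.onRoomMessage((msg) => console.log('room message:', msg))
await zego.enableMicrophone(true)
}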
10. Conversation Memory Service
Now, create the memory service at client/src/services/memory.ts:
import type { ConversationMemory, Message } from '../types'
import { STORAGE_KEYS } from '../config'
class MemoryService {
private static instance: MemoryService
private conversations: Map<string, ConversationMemory> = new Map()
static getInstance(): MemoryService {
if (!MemoryService.instance) {
MemoryService.instance = new MemoryService()
}
return MemoryService.instance
}
constructor() {
this.loadFromStorage()
}
private loadFromStorage(): void {
try {
const stored = localStorage.getItem(STORAGE_KEYS.CONVERSATIONS)
if (stored) {
const conversations: ConversationMemory[] = JSON.parse(stored)
conversations.forEach(conv => {
this.conversations.set(conv.id, conv)
})
}
} catch (error) {
console.error('Failed to load conversations from storage:', error)
}
}
private saveToStorage(): void {
try {
const conversations = Array.from(this.conversations.values())
localStorage.setItem(STORAGE_KEYS.CONVERSATIONS, JSON.stringify(conversations))
} catch (error) {
console.error('Failed to save conversations to storage:', error)
}
}
createOrGetConversation(id?: string): ConversationMemory {
const conversationId = id || this.generateConversationId()
if (this.conversations.has(conversationId)) {
return this.conversations.get(conversationId)!
}
const newConversation: ConversationMemory = {
id: conversationId,
title: 'New Conversation',
messages: [],
createdAt: Date.now(),
updatedAt: Date.now(),
metadata: {
totalMessages: 0,
lastAIResponse: '',
topics: []
}
}
this.conversations.set(conversationId, newConversation)
this.saveToStorage()
return newConversation
}
addMessage(conversationId: string, message: Message): void {
const conversation = this.conversations.get(conversationId)
if (!conversation) return
const existingIndex = conversation.messages.findIndex(m => m.id === message.id)
if (existingIndex >= 0) {
conversation.messages[existingIndex] = message
} else {
conversation.messages.push(message)
}
conversation.updatedAt = Date.now()
conversation.metadata.totalMessages = conversation.messages.length
if (message.sender === 'ai') {
conversation.metadata.lastAIResponse = message.content
}
if (conversation.messages.length === 1 && message.sender === 'user') {
conversation.title = message.content.slice(0, 50) + (message.content.length > 50 ? '...' : '')
}
this.saveToStorage()
}
deleteMessage(conversationId: string, messageId: string): void {
const conversation = this.conversations.get(conversationId)
if (!conversation) return
conversation.messages = conversation.messages.filter(m => m.id !== messageId)
conversation.updatedAt = Date.now()
conversation.metadata.totalMessages = conversation.messages.length
if (conversation.messages.length > 0) {
const lastAIMessage = conversation.messages
.filter(m => m.sender === 'ai')
.pop()
conversation.metadata.lastAIResponse = lastAIMessage?.content || ''
} else {
conversation.metadata.lastAIResponse = ''
}
this.saveToStorage()
}
getConversation(conversationId: string): ConversationMemory | null {
return this.conversations.get(conversationId) || null
}
getAllConversations(): ConversationMemory[] {
return Array.from(this.conversations.values())
.sort((a, b) => b.updatedAt - a.updatedAt)
}
deleteConversation(conversationId: string): void {
this.conversations.delete(conversationId)
this.saveToStorage()
}
updateConversation(conversationId: string, updates: Partial<ConversationMemory>): void {
const conversation = this.conversations.get(conversationId)
if (!conversation) return
Object.assign(conversation, updates, { updatedAt: Date.now() })
this.saveToStorage()
}
clearAllConversations(): void {
this.conversations.clear()
this.saveToStorage()
}
private generateConversationId(): string {
return `conv_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`
}
}
export const memoryService = MemoryService.getInstance()
This memory service handles local storage of conversation history, enabling users to resume chats across browser sessions. You can switch this to your storage of choice.
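A short usage sketch: create a conversation, add one message, and read it back (IDs and content are illustrative):
const conv = memoryService.createOrGetConversation()
memoryService.addMessage(conv.id, {
id: 'text_1',
content: 'Hi there!',
sender: 'user',
timestamp: Date.now(),
type: 'text',
})
console.log(memoryService.getConversation(conv.id)?.messages.length) // 1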
11. React Chat Hook Implementation
The chat hook is the most important integration after the ZEGOCLOUD service itself, since it controls the entire chat flow. Create the main chat hook at client/src/hooks/useChat.ts:
import { useCallback, useRef, useEffect, useReducer } from 'react'
import type { Message, ChatSession, ConversationMemory, VoiceSettings } from '../types'
import { ZegoService } from '../services/zego'
import { agentAPI } from '../services/api'
import { memoryService } from '../services/memory'
interface ChatState {
messages: Message[]
session: ChatSession | null
conversation: ConversationMemory | null
isLoading: boolean
isConnected: boolean
isRecording: boolean
currentTranscript: string
agentStatus: 'idle' | 'listening' | 'thinking' | 'speaking'
error: string | null
}
type ChatAction =
| { type: 'SET_MESSAGES'; payload: Message[] }
| { type: 'ADD_MESSAGE'; payload: Message }
| { type: 'UPDATE_MESSAGE'; payload: { id: string; updates: Partial<Message> } }
| { type: 'SET_SESSION'; payload: ChatSession | null }
| { type: 'SET_CONVERSATION'; payload: ConversationMemory | null }
| { type: 'SET_LOADING'; payload: boolean }
| { type: 'SET_CONNECTED'; payload: boolean }
| { type: 'SET_RECORDING'; payload: boolean }
| { type: 'SET_TRANSCRIPT'; payload: string }
| { type: 'SET_AGENT_STATUS'; payload: 'idle' | 'listening' | 'thinking' | 'speaking' }
| { type: 'SET_ERROR'; payload: string | null }
| { type: 'RESET_CHAT' }
const initialState: ChatState = {
messages: [],
session: null,
conversation: null,
isLoading: false,
isConnected: false,
isRecording: false,
currentTranscript: '',
agentStatus: 'idle',
error: null
}
function chatReducer(state: ChatState, action: ChatAction): ChatState {
switch (action.type) {
case 'SET_MESSAGES':
return { ...state, messages: action.payload }
case 'ADD_MESSAGE': {
const exists = state.messages.some(m => m.id === action.payload.id)
if (exists) {
return {
...state,
messages: state.messages.map(m =>
m.id === action.payload.id ? action.payload : m
)
}
}
return { ...state, messages: [...state.messages, action.payload] }
}
case 'UPDATE_MESSAGE':
return {
...state,
messages: state.messages.map(m =>
m.id === action.payload.id ? { ...m, ...action.payload.updates } : m
)
}
case 'SET_SESSION':
return { ...state, session: action.payload }
case 'SET_CONVERSATION':
return { ...state, conversation: action.payload }
case 'SET_LOADING':
return { ...state, isLoading: action.payload }
case 'SET_CONNECTED':
return { ...state, isConnected: action.payload }
case 'SET_RECORDING':
return { ...state, isRecording: action.payload }
case 'SET_TRANSCRIPT':
return { ...state, currentTranscript: action.payload }
case 'SET_AGENT_STATUS':
return { ...state, agentStatus: action.payload }
case 'SET_ERROR':
return { ...state, error: action.payload }
case 'RESET_CHAT':
return {
...initialState,
isLoading: state.isLoading
}
default:
return state
}
}
export const useChat = () => {
const [state, dispatch] = useReducer(chatReducer, initialState)
const zegoService = useRef(ZegoService.getInstance())
const processedMessageIds = useRef(new Set<string>())
const messageHandlerSetup = useRef(false)
const cleanupFunctions = useRef<(() => void)[]>([])
const currentConversationRef = useRef<string | null>(null)
const streamingMessages = useRef(new Map<string, string>())
const defaultVoiceSettings: VoiceSettings = {
isEnabled: true,
autoPlay: true,
speechRate: 1.0,
speechPitch: 1.0,
}
const cleanup = useCallback(() => {
cleanupFunctions.current.forEach(fn => fn())
cleanupFunctions.current = []
processedMessageIds.current.clear()
messageHandlerSetup.current = false
streamingMessages.current.clear()
}, [])
const addMessageSafely = useCallback((message: Message, conversationId: string) => {
if (processedMessageIds.current.has(message.id)) {
console.log('Skipping duplicate message:', message.id)
return
}
processedMessageIds.current.add(message.id)
dispatch({ type: 'ADD_MESSAGE', payload: message })
try {
memoryService.addMessage(conversationId, message)
} catch (error) {
console.error('Failed to save message to memory:', error)
}
}, [])
const initializeConversation = useCallback((conversationId?: string) => {
try {
const conv = memoryService.createOrGetConversation(conversationId)
dispatch({ type: 'SET_CONVERSATION', payload: conv })
dispatch({ type: 'SET_MESSAGES', payload: [...conv.messages] })
processedMessageIds.current.clear()
streamingMessages.current.clear()
conv.messages.forEach(msg => {
processedMessageIds.current.add(msg.id)
})
dispatch({ type: 'SET_ERROR', payload: null })
currentConversationRef.current = conv.id
return conv
} catch (error) {
console.error('Failed to initialize conversation:', error)
dispatch({ type: 'SET_ERROR', payload: 'Failed to load conversation' })
return null
}
}, [])
const resetConversation = useCallback(() => {
cleanup()
dispatch({ type: 'RESET_CHAT' })
currentConversationRef.current = null
}, [cleanup])
const setupMessageHandlers = useCallback((conv: ConversationMemory) => {
if (messageHandlerSetup.current) {
console.log('Message handlers already setup')
return
}
console.log('Setting up message handlers for conversation:', conv.id)
messageHandlerSetup.current = true
const handleRoomMessage = (data: any) => {
try {
const { Cmd, Data: msgData } = data
console.log('Room message received:', { Cmd, msgData })
if (currentConversationRef.current !== conv.id) {
console.log('Ignoring message for different conversation')
return
}
if (Cmd === 3) {
const { Text: transcript, EndFlag, MessageId } = msgData
if (transcript && transcript.trim()) {
dispatch({ type: 'SET_TRANSCRIPT', payload: transcript })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'listening' })
if (EndFlag) {
const messageId = MessageId || `voice_${Date.now()}_${Math.random().toString(36).substr(2, 6)}`
const userMessage: Message = {
id: messageId,
content: transcript.trim(),
sender: 'user',
timestamp: Date.now(),
type: 'voice',
transcript: transcript.trim()
}
addMessageSafely(userMessage, conv.id)
dispatch({ type: 'SET_TRANSCRIPT', payload: '' })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'thinking' })
}
}
} else if (Cmd === 4) {
const { Text: content, MessageId, EndFlag } = msgData
if (!content || !MessageId) return
if (EndFlag) {
const currentStreaming = streamingMessages.current.get(MessageId) || ''
const finalContent = currentStreaming + content
dispatch({ type: 'UPDATE_MESSAGE', payload: {
id: MessageId,
updates: {
content: finalContent,
isStreaming: false
}
}})
streamingMessages.current.delete(MessageId)
dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
try {
const finalMessage: Message = {
id: MessageId,
content: finalContent,
sender: 'ai',
timestamp: Date.now(),
type: 'text'
}
memoryService.addMessage(conv.id, finalMessage)
} catch (error) {
console.error('Failed to save final message to memory:', error)
}
} else {
const currentStreaming = streamingMessages.current.get(MessageId) || ''
const updatedContent = currentStreaming + content
streamingMessages.current.set(MessageId, updatedContent)
if (!processedMessageIds.current.has(MessageId)) {
const streamingMessage: Message = {
id: MessageId,
content: updatedContent,
sender: 'ai',
timestamp: Date.now(),
type: 'text',
isStreaming: true
}
processedMessageIds.current.add(MessageId)
dispatch({ type: 'ADD_MESSAGE', payload: streamingMessage })
} else {
dispatch({ type: 'UPDATE_MESSAGE', payload: {
id: MessageId,
updates: { content: updatedContent, isStreaming: true }
}})
}
dispatch({ type: 'SET_AGENT_STATUS', payload: 'speaking' })
}
}
} catch (error) {
console.error('Error handling room message:', error)
dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
}
}
zegoService.current.onRoomMessage(handleRoomMessage)
cleanupFunctions.current.push(() => {
zegoService.current.onRoomMessage(() => {})
})
}, [addMessageSafely])
const startSession = useCallback(async (existingConversationId?: string): Promise<boolean> => {
if (state.isLoading || state.isConnected) {
console.log('Session start blocked - already loading or connected')
return false
}
dispatch({ type: 'SET_LOADING', payload: true })
dispatch({ type: 'SET_ERROR', payload: null })
try {
if (state.session?.isActive) {
console.log('Ending existing session before starting new one')
await endSession()
await new Promise(resolve => setTimeout(resolve, 1000))
}
const roomId = `room_${Date.now()}_${Math.random().toString(36).substr(2, 6)}`
const userId = `user_${Date.now()}_${Math.random().toString(36).substr(2, 6)}`
console.log('Initializing ZEGO service...')
await zegoService.current.initialize()
console.log('Joining room:', roomId)
const joinResult = await zegoService.current.joinRoom(roomId, userId)
if (!joinResult) throw new Error('Failed to join ZEGO room')
console.log('Starting AI agent session...')
const result = await agentAPI.startSession(roomId, userId)
const conv = initializeConversation(existingConversationId)
if (!conv) throw new Error('Failed to initialize conversation')
const newSession: ChatSession = {
roomId,
userId,
agentInstanceId: result.agentInstanceId,
isActive: true,
conversationId: conv.id,
voiceSettings: defaultVoiceSettings
}
dispatch({ type: 'SET_SESSION', payload: newSession })
dispatch({ type: 'SET_CONNECTED', payload: true })
setupMessageHandlers(conv)
console.log('Session started successfully')
return true
} catch (error) {
console.error('Failed to start session:', error)
dispatch({ type: 'SET_ERROR', payload: error instanceof Error ? error.message : 'Failed to start session' })
return false
} finally {
dispatch({ type: 'SET_LOADING', payload: false })
}
}, [state.isLoading, state.isConnected, state.session, initializeConversation, setupMessageHandlers])
const sendTextMessage = useCallback(async (content: string) => {
if (!state.session?.agentInstanceId || !state.conversation) {
dispatch({ type: 'SET_ERROR', payload: 'No active session' })
return
}
const trimmedContent = content.trim()
if (!trimmedContent) return
try {
const messageId = `text_${Date.now()}_${Math.random().toString(36).substr(2, 6)}`
const userMessage: Message = {
id: messageId,
content: trimmedContent,
sender: 'user',
timestamp: Date.now(),
type: 'text'
}
addMessageSafely(userMessage, state.conversation.id)
dispatch({ type: 'SET_AGENT_STATUS', payload: 'thinking' })
await agentAPI.sendMessage(state.session.agentInstanceId, trimmedContent)
} catch (error) {
console.error('Failed to send message:', error)
dispatch({ type: 'SET_ERROR', payload: 'Failed to send message' })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
}
}, [state.session, state.conversation, addMessageSafely])
const toggleVoiceRecording = useCallback(async () => {
if (!state.isConnected) return
try {
if (state.isRecording) {
await zegoService.current.enableMicrophone(false)
dispatch({ type: 'SET_RECORDING', payload: false })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
} else {
const success = await zegoService.current.enableMicrophone(true)
if (success) {
dispatch({ type: 'SET_RECORDING', payload: true })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'listening' })
}
}
} catch (error) {
console.error('Failed to toggle recording:', error)
dispatch({ type: 'SET_RECORDING', payload: false })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
}
}, [state.isConnected, state.isRecording])
const toggleVoiceSettings = useCallback(() => {
if (state.session) {
const updatedSession = {
...state.session,
voiceSettings: {
...state.session.voiceSettings,
isEnabled: !state.session.voiceSettings.isEnabled
}
}
dispatch({ type: 'SET_SESSION', payload: updatedSession })
}
}, [state.session])
const endSession = useCallback(async () => {
if (!state.session && !state.isConnected) return
try {
dispatch({ type: 'SET_LOADING', payload: true })
if (state.isRecording) {
await zegoService.current.enableMicrophone(false)
dispatch({ type: 'SET_RECORDING', payload: false })
}
if (state.session?.agentInstanceId) {
await agentAPI.stopSession(state.session.agentInstanceId)
}
await zegoService.current.leaveRoom()
cleanup()
dispatch({ type: 'SET_SESSION', payload: null })
dispatch({ type: 'SET_CONNECTED', payload: false })
dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
dispatch({ type: 'SET_TRANSCRIPT', payload: '' })
dispatch({ type: 'SET_ERROR', payload: null })
currentConversationRef.current = null
console.log('Session ended successfully')
} catch (error) {
console.error('Failed to end session:', error)
} finally {
dispatch({ type: 'SET_LOADING', payload: false })
}
}, [state.session, state.isConnected, state.isRecording, cleanup])
const clearError = useCallback(() => {
dispatch({ type: 'SET_ERROR', payload: null })
}, [])
useEffect(() => {
const handleConversationChange = async () => {
if (currentConversationRef.current === (state.conversation?.id || null)) {
return
}
if (state.isConnected) {
await endSession()
if (state.conversation?.id) {
await startSession(state.conversation.id)
} else {
resetConversation()
}
}
}
handleConversationChange()
}, [state.conversation?.id])
useEffect(() => {
return () => {
if (state.session?.isActive || state.isConnected) {
endSession()
}
cleanup()
}
}, [])
return {
...state,
startSession,
sendTextMessage,
toggleVoiceRecording,
toggleVoiceSettings,
endSession,
initializeConversation,
resetConversation,
clearError
}
}
With this hook, we can manage the complete chat state, including session management, message handling, voice recording, and ZEGOCLOUD integration.
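A minimal consumer component shows the intended shape of the hook's API (a sketch only; the full UI comes in the next steps):
import { useChat } from './hooks/useChat'
function ChatControls() {
const { isConnected, isLoading, startSession, endSession } = useChat()
return isConnected ? (
<button onClick={() => endSession()}>End conversation</button>
) : (
<button disabled={isLoading} onClick={() => startSession()}>Start conversation</button>
)
}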
12. UI Components Implementation
Now, let’s create the shared UI components that the rest of the app will build on. Create the button component at client/src/components/UI/Button.tsx:
import { motion } from 'framer-motion'
import { forwardRef } from 'react'
interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
variant?: 'primary' | 'secondary' | 'ghost'
size?: 'sm' | 'md' | 'lg'
isLoading?: boolean
}
export const Button = forwardRef<HTMLButtonElement, ButtonProps>(
({ variant = 'primary', size = 'md', isLoading, children, className = '', ...props }, ref) => {
const baseClasses = 'inline-flex items-center justify-center rounded-lg font-medium transition-colors focus:outline-none focus:ring-2'
const variants = {
primary: 'bg-blue-600 text-white hover:bg-blue-700 focus:ring-blue-500',
secondary: 'bg-gray-200 text-gray-900 hover:bg-gray-300 focus:ring-gray-500',
ghost: 'text-gray-600 hover:text-gray-900 hover:bg-gray-100 focus:ring-gray-500'
}
const sizes = {
sm: 'px-3 py-2 text-sm',
md: 'px-4 py-2.5 text-sm',
lg: 'px-6 py-3 text-base'
}
return (
<motion.button
ref={ref}
whileHover={{ scale: 1.02 }}
whileTap={{ scale: 0.98 }}
className={`${baseClasses} ${variants[variant]} ${sizes[size]} ${className}`}
disabled={isLoading || props.disabled}
{...(props as any)}
>
{isLoading ? (
<div className="animate-spin rounded-full h-4 w-4 border-2 border-current border-t-transparent mr-2" />
) : null}
{children}
</motion.button>
)
}
)
Create the message bubble component at client/src/components/Chat/MessageBubble.tsx:
import { motion } from 'framer-motion'
import type { Message } from '../../types'
import { Volume2, User, Bot, Clock } from 'lucide-react'
interface MessageBubbleProps {
message: Message
onPlayVoice?: (messageId: string) => void
showTimestamp?: boolean
}
export const MessageBubble = ({ message, onPlayVoice, showTimestamp = false }: MessageBubbleProps) => {
const isUser = message.sender === 'user'
const isVoice = message.type === 'voice'
const formatTime = (timestamp: number) => {
return new Date(timestamp).toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })
}
return (
<motion.div
initial={{ opacity: 0, y: 20, scale: 0.95 }}
animate={{ opacity: 1, y: 0, scale: 1 }}
transition={{ duration: 0.3, ease: "easeOut" }}
className={`flex w-full mb-6 group ${isUser ? 'justify-end' : 'justify-start'}`}
>
<div className={`flex items-end space-x-3 max-w-[75%] ${isUser ? 'flex-row-reverse space-x-reverse' : 'flex-row'}`}>
{/* Avatar */}
<motion.div
whileHover={{ scale: 1.05 }}
className={`flex-shrink-0 w-10 h-10 rounded-full flex items-center justify-center shadow-md ${
isUser
? 'bg-gradient-to-br from-blue-500 to-blue-600'
: 'bg-gradient-to-br from-gray-700 to-gray-800'
}`}
>
{isUser ? (
<User className="w-5 h-5 text-white" />
) : (
<Bot className="w-5 h-5 text-white" />
)}
</motion.div>
{/* Message Content */}
<div className={`flex flex-col ${isUser ? 'items-end' : 'items-start'}`}>
<motion.div
className={`px-4 py-3 rounded-2xl shadow-sm break-words ${
isUser
? 'bg-blue-600 text-white rounded-br-md'
: 'bg-white text-gray-900 border border-gray-200 rounded-bl-md'
} ${message.isStreaming ? 'animate-pulse' : ''} ${
isVoice ? 'border-2 border-dashed border-purple-300' : ''
}`}
layout
whileHover={{ scale: 1.02 }}
>
{/* Voice indicator */}
{isVoice && (
<div className={`flex items-center space-x-2 mb-2 ${
isUser ? 'text-blue-200' : 'text-purple-600'
}`}>
<Volume2 className="w-4 h-4" />
<span className="text-xs font-medium">Voice Message</span>
{message.duration && (
<span className="text-xs opacity-75">{message.duration}s</span>
)}
</div>
)}
{/* Message text */}
<p className="text-sm leading-relaxed whitespace-pre-wrap">
{isVoice ? message.transcript || message.content : message.content}
</p>
{/* Voice playback button */}
{isVoice && message.audioUrl && (
<button
onClick={() => onPlayVoice?.(message.id)}
className={`mt-3 flex items-center space-x-2 text-xs transition-opacity duration-200 hover:opacity-100 ${
isUser ? 'text-blue-200 opacity-75' : 'text-purple-600 opacity-75'
}`}
>
<Volume2 className="w-3 h-3" />
<span>Play Audio</span>
</button>
)}
</motion.div>
{/* Timestamp */}
{showTimestamp && (
<motion.div
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
transition={{ delay: 0.2 }}
className={`flex items-center space-x-1 mt-1 text-xs text-gray-500 opacity-0 group-hover:opacity-100 transition-opacity ${
isUser ? 'flex-row-reverse space-x-reverse' : 'flex-row'
}`}
>
<Clock className="w-3 h-3" />
<span>{formatTime(message.timestamp)}</span>
</motion.div>
)}
</div>
</div>
</motion.div>
)
}
This message component displays both text and voice messages with smooth animations, avatar icons, and optional timestamps. It supports streaming message updates and voice message playback controls.
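Rendering the component is straightforward; for example (message values are illustrative):
<MessageBubble
message={{
id: 'm1',
content: 'Hello! How can I help?',
sender: 'ai',
timestamp: Date.now(),
type: 'text',
}}
showTimestamp
/>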
Next, create the voice input component at client/src/components/Voice/VoiceMessageInput.tsx:
import { useState, useEffect, useRef } from 'react'
import { motion, AnimatePresence } from 'framer-motion'
import { Send, Mic, MicOff, Volume2, VolumeX } from 'lucide-react'
import { Button } from '../UI/Button'
interface VoiceMessageInputProps {
onSendMessage: (content: string) => Promise<void>
isRecording: boolean
onToggleRecording: () => void
currentTranscript: string
isConnected: boolean
voiceEnabled: boolean
onToggleVoice: () => void
agentStatus?: 'idle' | 'listening' | 'thinking' | 'speaking'
}
export const VoiceMessageInput = ({
onSendMessage,
isRecording,
onToggleRecording,
currentTranscript,
isConnected,
voiceEnabled,
onToggleVoice,
agentStatus = 'idle'
}: VoiceMessageInputProps) => {
const [message, setMessage] = useState('')
const [isFocused, setIsFocused] = useState(false)
const [isSending, setIsSending] = useState(false)
const textareaRef = useRef<HTMLTextAreaElement>(null)
// Auto-resize textarea
useEffect(() => {
if (textareaRef.current) {
// Reset height to auto to get the correct scrollHeight
textareaRef.current.style.height = 'auto'
// Set height based on content, with min and max limits
const scrollHeight = textareaRef.current.scrollHeight
const minHeight = 44 // Minimum height (about 1 line)
const maxHeight = 120 // Maximum height (about 5 lines)
textareaRef.current.style.height = Math.min(Math.max(scrollHeight, minHeight), maxHeight) + 'px'
}
}, [message])
const handleSubmit = async (e: React.FormEvent) => {
e.preventDefault()
const trimmedMessage = message.trim()
if (!trimmedMessage || !isConnected || isSending) return
setIsSending(true)
try {
await onSendMessage(trimmedMessage)
setMessage('')
} catch (error) {
console.error('Failed to send message:', error)
} finally {
setIsSending(false)
}
}
const handleKeyPress = (e: React.KeyboardEvent) => {
if (e.key === 'Enter' && !e.shiftKey && !isSending) {
e.preventDefault()
handleSubmit(e as any)
}
}
const isDisabled = !isConnected || agentStatus === 'thinking' || agentStatus === 'speaking'
const isVoiceDisabled = isDisabled || !voiceEnabled
const getPlaceholderText = () => {
if (!isConnected) return "Connect to start chatting..."
if (agentStatus === 'thinking') return "AI is processing..."
if (agentStatus === 'speaking') return "AI is responding..."
if (isRecording) return "Recording... speak now"
return "Type your message or use voice..."
}
const getRecordingButtonState = () => {
if (isVoiceDisabled) return 'disabled'
if (agentStatus === 'listening' || isRecording) return 'recording'
return 'idle'
}
const recordingState = getRecordingButtonState()
return (
<motion.div
initial={{ y: 20, opacity: 0 }}
animate={{ y: 0, opacity: 1 }}
className="bg-white border-t border-gray-200 p-4"
>
<AnimatePresence>
{(currentTranscript || agentStatus === 'listening') && (
<motion.div
initial={{ height: 0, opacity: 0 }}
animate={{ height: 'auto', opacity: 1 }}
exit={{ height: 0, opacity: 0 }}
className="mb-3 p-3 bg-green-50 rounded-lg border border-green-200"
>
<div className="flex items-center space-x-2">
<motion.div
animate={{ scale: [1, 1.2, 1] }}
transition={{ repeat: Infinity, duration: 1.5 }}
className="flex-shrink-0"
>
<div className="w-3 h-3 rounded-full bg-green-500" />
</motion.div>
<p className="text-sm text-green-700 flex-1">
{currentTranscript || 'Listening... speak now'}
</p>
</div>
</motion.div>
)}
</AnimatePresence>
<form onSubmit={handleSubmit} className="flex items-end space-x-3">
{/* Text Input Container */}
<div className="flex-1 min-w-0">
<div className={`relative rounded-xl border-2 transition-all duration-200 ${
isFocused ? 'border-blue-500 bg-blue-50' : 'border-gray-200 bg-gray-50'
} ${isDisabled ? 'opacity-50' : ''}`}>
<textarea
ref={textareaRef}
value={message}
onChange={(e) => setMessage(e.target.value)}
onKeyDown={handleKeyPress}
onFocus={() => setIsFocused(true)}
onBlur={() => setIsFocused(false)}
placeholder={getPlaceholderText()}
disabled={isDisabled || isSending}
className="w-full px-4 py-3 bg-transparent border-none focus:outline-none resize-none placeholder-gray-500 disabled:cursor-not-allowed text-sm leading-relaxed"
style={{
minHeight: '44px',
maxHeight: '120px',
overflow: 'hidden'
}}
rows={1}
/>
{/* Character counter */}
{message.length > 800 && (
<div className="absolute bottom-2 right-2 text-xs text-gray-400 bg-white px-1 rounded">
{message.length}/1000
</div>
)}
</div>
</div>
{/* Control Buttons */}
<div className="flex items-center space-x-2">
{/* Voice Toggle */}
<Button
type="button"
variant="ghost"
size="md"
onClick={onToggleVoice}
disabled={!isConnected}
className="text-gray-600 hover:text-gray-900 disabled:opacity-50"
title={voiceEnabled ? "Disable voice" : "Enable voice"}
>
{voiceEnabled ? <Volume2 className="w-5 h-5" /> : <VolumeX className="w-5 h-5" />}
</Button>
{/* Voice Recording Button */}
<Button
type="button"
variant="ghost"
size="md"
onClick={onToggleRecording}
disabled={recordingState === 'disabled'}
className={`transition-all duration-200 ${
recordingState === 'recording'
? 'bg-red-500 text-white hover:bg-red-600 shadow-lg scale-110'
: recordingState === 'disabled'
? 'text-gray-400 cursor-not-allowed opacity-50'
: 'text-gray-600 hover:text-blue-600 hover:bg-blue-50'
}`}
title={
recordingState === 'disabled'
? "Voice not available"
: recordingState === 'recording'
? "Stop recording"
: "Start voice input"
}
>
<motion.div
animate={recordingState === 'recording' ? { scale: [1, 1.1, 1] } : {}}
transition={{ repeat: Infinity, duration: 1 }}
>
{recordingState === 'recording' ? (
<MicOff className="w-5 h-5" />
) : (
<Mic className="w-5 h-5" />
)}
</motion.div>
</Button>
{/* Send Button */}
<Button
type="submit"
disabled={!message.trim() || isDisabled || isSending}
size="md"
className="bg-blue-600 hover:bg-blue-700 text-white px-6 disabled:opacity-50 disabled:cursor-not-allowed min-w-[60px]"
isLoading={isSending}
>
<Send className="w-4 h-4" />
</Button>
</div>
</form>
{!isConnected && (
<p className="text-xs text-gray-500 mt-2 text-center">
Start a conversation to enable voice and text input
</p>
)}
</motion.div>
)
}
13. Main Chat Container Component
Create the chat container at client/src/components/Chat/ChatContainer.tsx:
import { useEffect, useRef } from 'react'
import { motion, AnimatePresence } from 'framer-motion'
import { MessageBubble } from './MessageBubble'
import { VoiceMessageInput } from '../Voice/VoiceMessageInput'
import { Button } from '../UI/Button'
import { useChat } from '../../hooks/useChat'
import { Phone, PhoneOff, Bot } from 'lucide-react'
interface ChatContainerProps {
conversationId?: string
onConversationUpdate?: () => void
onNewConversation?: () => void
}
export const ChatContainer = ({ conversationId, onConversationUpdate, onNewConversation }: ChatContainerProps) => {
const messagesEndRef = useRef<HTMLDivElement>(null)
const {
messages,
isLoading,
isConnected,
isRecording,
currentTranscript,
agentStatus,
session,
conversation,
startSession,
sendTextMessage,
toggleVoiceRecording,
toggleVoiceSettings,
endSession,
resetConversation,
initializeConversation
} = useChat()
const scrollToBottom = () => {
messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' })
}
useEffect(() => {
scrollToBottom()
}, [messages])
useEffect(() => {
if (onConversationUpdate) {
onConversationUpdate()
}
}, [messages, onConversationUpdate])
useEffect(() => {
if (conversationId && conversationId !== conversation?.id) {
initializeConversation(conversationId)
} else if (!conversationId && conversation) {
resetConversation()
}
}, [conversationId, conversation?.id, initializeConversation, resetConversation])
const handleStartChat = async () => {
const success = await startSession(conversationId)
if (success && onNewConversation && !conversationId) {
onNewConversation()
}
}
const handleEndChat = async () => {
await endSession()
if (onNewConversation) {
onNewConversation()
}
}
const getStatusText = () => {
if (!isConnected) return 'Click Start Chat to begin'
switch (agentStatus) {
case 'listening':
return 'Listening for your voice...'
case 'thinking':
return 'AI is processing your message...'
case 'speaking':
return 'AI is responding...'
default:
return 'Connected - Ready to chat'
}
}
const getStatusColor = () => {
if (!isConnected) return 'text-gray-500'
switch (agentStatus) {
case 'listening':
return 'text-green-600'
case 'thinking':
return 'text-blue-600'
case 'speaking':
return 'text-purple-600'
default:
return 'text-green-600'
}
}
return (
<motion.div
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
className="flex flex-col h-full bg-gray-50"
>
<audio
id="ai-audio-output"
autoPlay
style={{ display: 'none' }}
controls={false}
playsInline
/>
<motion.div
initial={{ y: -20 }}
animate={{ y: 0 }}
className="bg-white border-b border-gray-200 px-6 py-4"
>
<div className="flex items-center justify-between">
<div className="flex items-center space-x-3">
<div className="w-10 h-10 bg-gradient-to-br from-blue-500 to-blue-600 rounded-full flex items-center justify-center">
<Bot className="w-5 h-5 text-white" />
</div>
<div>
<h1 className="text-xl font-semibold text-gray-900">AI Assistant</h1>
<p className={`text-sm ${getStatusColor()}`}>
{getStatusText()}
</p>
</div>
</div>
{isConnected ? (
<Button onClick={handleEndChat} variant="secondary" size="sm" disabled={isLoading}>
<PhoneOff className="w-4 h-4 mr-2" />
End Chat
</Button>
) : (
<Button onClick={handleStartChat} isLoading={isLoading} size="sm">
<Phone className="w-4 h-4 mr-2" />
Start Chat
</Button>
)}
</div>
</motion.div>
<div className="flex-1 overflow-y-auto px-4 py-6">
{messages.length === 0 && (
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
className="flex flex-col items-center justify-center h-full text-center"
>
<div className="w-16 h-16 bg-gradient-to-br from-blue-500 to-blue-600 rounded-full flex items-center justify-center mb-4">
<Bot className="w-8 h-8 text-white" />
</div>
<h3 className="text-lg font-semibold text-gray-900 mb-2">
{isConnected ? 'Ready to Chat' : 'Welcome to AI Assistant'}
</h3>
<p className="text-gray-600 mb-6 max-w-md">
{isConnected
? 'You can type messages or use voice input to start chatting with the AI assistant.'
: 'Start a conversation with our AI assistant. You can type messages or use voice input for a more natural experience.'
}
</p>
{!isConnected && (
<div className="space-y-2 text-sm text-gray-500 mb-6">
<p>🎤 Voice conversations with real-time responses</p>
<p>💬 Natural interruption support</p>
<p>🧠 Context-aware conversations</p>
</div>
)}
{!isConnected && (
<Button onClick={handleStartChat} isLoading={isLoading}>
<Phone className="w-4 h-4 mr-2" />
Start New Conversation
</Button>
)}
</motion.div>
)}
<AnimatePresence mode="popLayout">
{messages.map((message) => (
<MessageBubble
key={message.id}
message={message}
showTimestamp={true}
/>
))}
</AnimatePresence>
{agentStatus === 'thinking' && (
<motion.div
initial={{ opacity: 0, y: 20 }}
animate={{ opacity: 1, y: 0 }}
exit={{ opacity: 0, y: -20 }}
className="flex justify-start mb-6"
>
<div className="flex items-center space-x-3">
<div className="w-10 h-10 bg-gradient-to-br from-gray-700 to-gray-800 rounded-full flex items-center justify-center">
<Bot className="w-5 h-5 text-white" />
</div>
<div className="bg-white border border-gray-200 rounded-2xl px-5 py-3">
<div className="flex items-center space-x-2">
<div className="flex space-x-1">
<div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" />
<div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{ animationDelay: '0.1s' }} />
<div className="w-2 h-2 bg-gray-400 rounded-full animate-bounce" style={{ animationDelay: '0.2s' }} />
</div>
<span className="text-sm text-gray-500">AI is thinking...</span>
</div>
</div>
</div>
</motion.div>
)}
<div ref={messagesEndRef} />
</div>
{isConnected && (
<VoiceMessageInput
onSendMessage={sendTextMessage}
isRecording={isRecording}
onToggleRecording={toggleVoiceRecording}
currentTranscript={currentTranscript}
isConnected={isConnected}
voiceEnabled={session?.voiceSettings.isEnabled || false}
onToggleVoice={toggleVoiceSettings}
agentStatus={agentStatus}
/>
)}
</motion.div>
)
}
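Note the hidden audio element with id ai-audio-output at the top of the layout: it is the playback target for the agent's voice. How the stream reaches it depends on the useChat hook built earlier; as a rough sketch, assuming the RTC layer hands back a standard MediaStream once the agent starts publishing (for example, the result of ZEGO's startPlayingStream):
// Sketch: route the agent's remote audio into the hidden element.
// Assumes `remoteStream` is a standard MediaStream from the RTC layer.
const attachAgentAudio = (remoteStream: MediaStream) => {
  const audioEl = document.getElementById('ai-audio-output') as HTMLAudioElement | null
  if (!audioEl) return
  audioEl.srcObject = remoteStream // the element already sets autoPlay and playsInline
  audioEl.play().catch(() => {
    // Browsers may block autoplay until a user gesture; the Start Chat
    // click normally satisfies that requirement.
  })
}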
14. Conversation List Component
We will now create the conversation list at client/src/components/Memory/ConversationList.tsx to show our stored conversations:
import { motion, AnimatePresence } from 'framer-motion'
import type { ConversationMemory } from '../../types'
import { MessageSquare, Clock, Trash2 } from 'lucide-react'
import { Button } from '../UI/Button'
interface ConversationListProps {
conversations: ConversationMemory[]
onSelectConversation: (id: string) => void
onDeleteConversation: (id: string) => void
currentConversationId?: string
}
export const ConversationList = ({
conversations,
onSelectConversation,
onDeleteConversation,
currentConversationId
}: ConversationListProps) => {
const formatDate = (timestamp: number) => {
const date = new Date(timestamp)
const now = new Date()
const diffInHours = (now.getTime() - date.getTime()) / (1000 * 60 * 60)
if (diffInHours < 24) {
return date.toLocaleTimeString([], { hour: '2-digit', minute: '2-digit' })
} else if (diffInHours < 24 * 7) {
return date.toLocaleDateString([], { weekday: 'short' })
} else {
return date.toLocaleDateString([], { month: 'short', day: 'numeric' })
}
}
return (
<div className="w-80 bg-gray-50 border-r border-gray-200 flex flex-col">
<div className="p-4 border-b border-gray-200">
<h2 className="font-semibold text-gray-900 flex items-center">
<MessageSquare className="w-5 h-5 mr-2" />
Conversations
</h2>
<p className="text-sm text-gray-500 mt-1">{conversations.length} total</p>
</div>
<div className="flex-1 overflow-y-auto">
<AnimatePresence>
{conversations.map((conv) => (
<motion.div
key={conv.id}
initial={{ opacity: 0, x: -20 }}
animate={{ opacity: 1, x: 0 }}
exit={{ opacity: 0, x: -20 }}
whileHover={{ backgroundColor: '#f8fafc' }}
className={`p-4 border-b border-gray-100 cursor-pointer transition-colors group ${
currentConversationId === conv.id ? 'bg-blue-50 border-blue-200' : ''
}`}
onClick={() => onSelectConversation(conv.id)}
>
<div className="flex items-start justify-between">
<div className="flex-1 min-w-0">
<h3 className="font-medium text-gray-900 truncate mb-1">
{conv.title}
</h3>
<p className="text-sm text-gray-600 line-clamp-2 mb-2">
{conv.metadata.lastAIResponse || 'No messages yet'}
</p>
<div className="flex items-center space-x-3 text-xs text-gray-500">
<span className="flex items-center">
<MessageSquare className="w-3 h-3 mr-1" />
{conv.metadata.totalMessages}
</span>
<span className="flex items-center">
<Clock className="w-3 h-3 mr-1" />
{formatDate(conv.updatedAt)}
</span>
</div>
</div>
<Button
variant="ghost"
size="sm"
onClick={(e) => {
e.stopPropagation()
onDeleteConversation(conv.id)
}}
className="text-gray-400 hover:text-red-600 opacity-0 group-hover:opacity-100 transition-opacity"
>
<Trash2 className="w-4 h-4" />
</Button>
</div>
</motion.div>
))}
</AnimatePresence>
{conversations.length === 0 && (
<div className="p-8 text-center text-gray-500">
<MessageSquare className="w-12 h-12 mx-auto mb-4 opacity-50" />
<p className="text-sm">No conversations yet</p>
<p className="text-xs mt-1">Start a new chat to begin</p>
</div>
)}
</div>
</div>
)
}
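This list renders each stored conversation with a title, a preview of the last AI response, a message count, and a relative timestamp, and reveals a delete button on hover. It touches only a handful of fields on ConversationMemory (defined in the client types module earlier); the shape it assumes looks roughly like this reference sketch, so defer to your actual type definition:
// Fields ConversationList reads — a reference sketch, not the full type
interface ConversationMemory {
  id: string                // selection and deletion key
  title: string             // truncated heading
  updatedAt: number         // epoch milliseconds, passed to formatDate()
  metadata: {
    totalMessages: number   // count beside the message icon
    lastAIResponse?: string // two-line preview under the title
  }
}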
15. Main Application Component
Now, create the main app component at client/src/App.tsx:
import { useState, useEffect, useCallback } from 'react'
import { motion, AnimatePresence } from 'framer-motion'
import { ChatContainer } from './components/Chat/ChatContainer'
import { ConversationList } from './components/Memory/ConversationList'
import { memoryService } from './services/memory'
import type { ConversationMemory } from './types'
import { Plus, Menu, X, MessageSquare } from 'lucide-react'
import { Button } from './components/UI/Button'
function App() {
const [conversations, setConversations] = useState<ConversationMemory[]>([])
const [currentConversationId, setCurrentConversationId] = useState<string | undefined>(undefined)
const [sidebarOpen, setSidebarOpen] = useState(window.innerWidth >= 1024) // Desktop open by default
const [isCreatingNewConversation, setIsCreatingNewConversation] = useState(false)
// Load conversations on mount and set up periodic refresh
useEffect(() => {
const loadConversations = () => {
try {
setConversations(memoryService.getAllConversations())
} catch (error) {
console.error('Failed to load conversations:', error)
}
}
loadConversations()
const interval = setInterval(() => {
// A functional update avoids a stale closure over `conversations`,
// so the comparison always runs against the latest state
setConversations(prev => {
const current = memoryService.getAllConversations()
const hasChanges = current.length !== prev.length ||
current.some((conv, index) => conv.updatedAt !== prev[index]?.updatedAt)
return hasChanges ? current : prev
})
}, 5000)
return () => clearInterval(interval)
}, [])
// Handle responsive sidebar behavior
useEffect(() => {
const handleResize = () => {
if (window.innerWidth >= 1024) {
setSidebarOpen(true)
}
}
window.addEventListener('resize', handleResize)
return () => window.removeEventListener('resize', handleResize)
}, [])
const handleNewConversation = useCallback(async () => {
if (isCreatingNewConversation) return
setIsCreatingNewConversation(true)
try {
// Clear current conversation
setCurrentConversationId(undefined)
// On mobile, close sidebar after action
if (window.innerWidth < 1024) {
setSidebarOpen(false)
}
} finally {
setIsCreatingNewConversation(false)
}
}, [isCreatingNewConversation])
const handleSelectConversation = useCallback((id: string) => {
if (id !== currentConversationId && !isCreatingNewConversation) {
setCurrentConversationId(id)
// On mobile, close sidebar after selection
if (window.innerWidth < 1024) {
setSidebarOpen(false)
}
}
}, [currentConversationId, isCreatingNewConversation])
const handleDeleteConversation = useCallback((id: string) => {
try {
memoryService.deleteConversation(id)
// Force refresh conversations
const updatedConversations = memoryService.getAllConversations()
setConversations(updatedConversations)
// If we deleted the current conversation, clear it
if (currentConversationId === id) {
setCurrentConversationId(undefined)
}
} catch (error) {
console.error('Failed to delete conversation:', error)
}
}, [currentConversationId])
const refreshConversations = useCallback(() => {
try {
const updatedConversations = memoryService.getAllConversations()
setConversations(updatedConversations)
} catch (error) {
console.error('Failed to refresh conversations:', error)
}
}, [])
const handleConversationCreated = useCallback(() => {
try {
const latestConversations = memoryService.getAllConversations()
setConversations(latestConversations)
// Auto-select the newest conversation
if (latestConversations.length > 0) {
const newestConv = latestConversations[0]
if (newestConv.id !== currentConversationId) {
setCurrentConversationId(newestConv.id)
}
}
} catch (error) {
console.error('Failed to handle conversation creation:', error)
}
}, [currentConversationId])
const toggleSidebar = useCallback(() => {
setSidebarOpen(prev => !prev)
}, [])
const closeSidebar = useCallback(() => {
// Only allow closing on mobile
if (window.innerWidth < 1024) {
setSidebarOpen(false)
}
}, [])
return (
<div className="flex h-screen bg-gray-50 overflow-hidden">
{/* Mobile overlay */}
<AnimatePresence>
{sidebarOpen && window.innerWidth < 1024 && (
<motion.div
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
exit={{ opacity: 0 }}
className="fixed inset-0 bg-black bg-opacity-50 z-40 lg:hidden"
onClick={closeSidebar}
/>
)}
</AnimatePresence>
{/* Sidebar */}
<AnimatePresence>
{sidebarOpen && (
<motion.div
initial={{ x: window.innerWidth < 1024 ? -320 : 0 }}
animate={{ x: 0 }}
exit={{ x: -320 }}
transition={{ type: "spring", damping: 25, stiffness: 300 }}
className="fixed left-0 top-0 h-full w-80 bg-white z-50 lg:relative lg:z-auto shadow-xl border-r border-gray-200 flex flex-col"
>
{/* Sidebar Header */}
<div className="p-4 border-b border-gray-200 bg-gradient-to-r from-blue-50 to-indigo-50">
<div className="flex items-center justify-between mb-4">
<div className="flex items-center space-x-3">
<div className="w-10 h-10 bg-gradient-to-br from-blue-500 to-blue-600 rounded-xl flex items-center justify-center shadow-md">
<MessageSquare className="w-5 h-5 text-white" />
</div>
<div>
<h1 className="text-lg font-bold text-gray-900">AI Assistant</h1>
<p className="text-xs text-gray-600">{conversations.length} conversations</p>
</div>
</div>
{/* Close button - only on mobile */}
<Button
variant="ghost"
size="sm"
onClick={closeSidebar}
className="lg:hidden text-gray-500 hover:text-gray-700"
>
<X className="w-5 h-5" />
</Button>
</div>
{/* New Conversation Button */}
<Button
onClick={handleNewConversation}
className="w-full bg-gradient-to-r from-blue-600 to-blue-700 hover:from-blue-700 hover:to-blue-800 text-white shadow-md"
disabled={isCreatingNewConversation}
isLoading={isCreatingNewConversation}
>
<Plus className="w-4 h-4 mr-2" />
New Conversation
</Button>
</div>
{/* Conversation List */}
<div className="flex-1 overflow-hidden">
<ConversationList
conversations={conversations}
onSelectConversation={handleSelectConversation}
onDeleteConversation={handleDeleteConversation}
currentConversationId={currentConversationId}
/>
</div>
{/* Sidebar Footer */}
<div className="p-4 border-t border-gray-200 bg-gray-50">
<p className="text-xs text-gray-500 text-center">
{conversations.length} conversations
</p>
</div>
</motion.div>
)}
</AnimatePresence>
{/* Main Content */}
<div className="flex-1 flex flex-col min-w-0">
{/* Mobile Header - Always Visible */}
<div className="bg-white border-b border-gray-200 p-3 flex items-center justify-between lg:hidden">
<Button
variant="ghost"
size="sm"
onClick={toggleSidebar}
className="text-gray-600 hover:text-gray-900"
>
<Menu className="w-5 h-5" />
<span className="ml-2 text-sm font-medium">
{sidebarOpen ? 'Close' : 'Conversations'}
</span>
</Button>
<div className="flex items-center space-x-2">
<div className="w-6 h-6 bg-gradient-to-br from-blue-500 to-blue-600 rounded-lg flex items-center justify-center">
<MessageSquare className="w-3 h-3 text-white" />
</div>
<h1 className="font-semibold text-gray-900">AI Assistant</h1>
</div>
<Button
variant="ghost"
size="sm"
onClick={handleNewConversation}
className="text-blue-600 hover:text-blue-700"
disabled={isCreatingNewConversation}
isLoading={isCreatingNewConversation}
>
<Plus className="w-5 h-5" />
</Button>
</div>
{/* Desktop Header - Only visible when sidebar is closed */}
{!sidebarOpen && (
<div className="bg-white border-b border-gray-200 p-4 hidden lg:flex items-center justify-between">
<Button
variant="ghost"
size="sm"
onClick={toggleSidebar}
className="text-gray-600 hover:text-gray-900"
>
<Menu className="w-5 h-5 mr-2" />
Show Conversations
</Button>
<Button
onClick={handleNewConversation}
className="bg-blue-600 hover:bg-blue-700 text-white"
disabled={isCreatingNewConversation}
isLoading={isCreatingNewConversation}
>
<Plus className="w-4 h-4 mr-2" />
New Chat
</Button>
</div>
)}
{/* Chat Container */}
<div className="flex-1 overflow-hidden">
<ChatContainer
key={currentConversationId || 'new'}
conversationId={currentConversationId}
onConversationUpdate={refreshConversations}
onNewConversation={handleConversationCreated}
/>
</div>
</div>
</div>
)
}
export default App
This main application component coordinates the entire UI, managing sidebar state, conversation selection, and mobile responsiveness.
16. Styling Setup
Finally, create the CSS entry point at client/src/index.css:
@import "tailwindcss";
body {
margin: 0;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', sans-serif;
}
This minimal CSS imports Tailwind’s utility classes and sets up the base font family for consistent typography across the application.
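These styles only take effect because index.css is imported from the application entry point. A Vite React scaffold generates that file automatically; for reference, a minimal client/src/main.tsx looks like the sketch below (keep whatever your scaffold created):
// client/src/main.tsx — standard Vite React entry point (sketch)
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import App from './App'
import './index.css' // pulls in Tailwind and the base font rules

createRoot(document.getElementById('root')!).render(
  <StrictMode>
    <App />
  </StrictMode>,
)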
17. Running and Testing the Application
Start the backend server first:
cd server
npm run dev
The server will start on port 8080 and display health check information. You should see confirmation that all environment variables are properly configured.
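To confirm the backend is reachable, you can hit its health endpoint from another terminal. This assumes the route is exposed at /health; adjust the path to match the route you defined during the server setup:
curl http://localhost:8080/health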
In a new terminal, start the frontend development server:
cd client
npm run dev
The frontend will start on port 5173 and automatically open your browser. Grant microphone access when the browser prompts so voice input works. You can now test the complete conversational AI system.
Run a Demo
Open the frontend and you’ll see an interface like the one shown below. Click “Start Chat” to begin a conversation with the AI agent.
Conclusion
That’s it! You’ve successfully built a complete conversational AI application using ZEGOCLOUD’s real-time communication platform. The system handles voice recognition, AI response generation, and natural conversation flow with persistent memory across sessions.
The application treats AI agents as real participants in voice calls, enabling natural interruption and real-time responses. Users can seamlessly switch between text and voice input while maintaining conversation context.
The modular architecture makes it easy to extend functionality and customize the experience for specific use cases. Your conversational AI system now provides professional-grade voice communication with the intelligence and responsiveness users expect from modern AI applications.
FAQ
Q1. What technologies are required to build a conversational AI?
You typically need natural language processing (NLP/LLM), automatic speech recognition (ASR), text-to-speech (TTS), and real-time communication (RTC) technologies. Together, they create a seamless loop for listening, understanding, and responding.
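To make that loop concrete, here is a conceptual sketch of a single turn in TypeScript; ASR, LLM, and TTS are hypothetical interfaces standing in for whichever providers you wire together:
// One conversational turn: listen, understand, respond.
// ASR, LLM, and TTS are placeholder interfaces, not a real SDK surface.
interface ASR { transcribe(audio: Blob): Promise<string> }
interface LLM { complete(prompt: string): Promise<string> }
interface TTS { synthesize(text: string): Promise<Blob> }

async function conversationalTurn(asr: ASR, llm: LLM, tts: TTS, audioIn: Blob): Promise<Blob> {
  const userText = await asr.transcribe(audioIn)  // speech -> text
  const replyText = await llm.complete(userText)  // text -> response
  return tts.synthesize(replyText)                // text -> speech, played back over RTC
}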
Q2. How do I integrate conversational AI into existing apps or platforms?
Most providers offer SDKs and APIs that support cross-platform integration (iOS, Android, Web). A well-documented, all-in-one SDK can significantly speed up development.
Q3. What are the main use cases of conversational AI?
Typical use cases include customer service bots, AI voice assistants, live streaming interactions, in-game NPCs, virtual classrooms, healthcare assistants, and enterprise collaboration tools.
Q4. How do I choose the right conversational AI provider?
Evaluate providers based on latency performance, global coverage, ease of integration, scalability, security standards, cost transparency, and proven case studies.