Most virtual AI girlfriend apps are just another chat window: type some text, get some text back. A proper companion should feel more like a call – you see her face, you hear her voice, she remembers what you told her, and she steers the conversation instead of waiting for prompts. This article shows you how to build a virtual AI girlfriend that does all of that, using ZEGOCLOUD’s Agent and Digital Human APIs.
How to Develop an AI Girlfriend App
ZEGOCLOUD provides real-time communication APIs and SDKs with native support for AI agents and digital avatars.
In this virtual AI girlfriend project, we’ll use three ZEGOCLOUD tools:
- Agent service – turns your speech into text, applies your “girlfriend” persona prompt on top of an LLM, and streams short TTS replies back.
- Digital Human service – renders your companion as a digital human avatar and keeps her expressions and lip-sync aligned with the agent’s voice.
- WebRTC SDK (ZegoExpressEngine) – manages the real-time room, sends your mic stream up, and plays back the agent + avatar streams in the browser.
Together, these let you ship a companion that talks with a natural voice, appears as a digital human on the page, and runs inside a small React + Node project you can clone and adapt.
Prerequisites
Before you run the virtual girlfriend project, you should have:
- A ZEGOCLOUD account with Agent + Digital Human enabled – Sign up here.
- A project with a valid AppID and ServerSecret.
- A DashScope API key (optional for local demos, required for your own Qwen usage).
- Node.js 18+ and npm installed.
- A modern desktop browser (Chrome/Edge) with microphone access.
- Basic web development knowledge.
1. Project Setup
The virtual girlfriend implementation lives in this project repository under two folders, server/ and client/. You can clone it as-is, or follow this section to understand how the pieces fit together and how to recreate the setup.
1.1 Architecture Overview
- Backend (server/) – a Node + Express app that:
  - Exposes REST endpoints such as /api/start-digital-human, /api/stop, and /api/token.
  - Signs ZEGOCLOUD Agent + Digital Human API calls with a shared signature helper.
  - Registers an agent with the virtual girlfriend persona (LLM, TTS, ASR config).
  - Creates and cleans up digital human agent instances tied to room/stream IDs.
- Frontend (client/) – a React + TypeScript app built with Vite:
  - A ZegoService wrapper around ZegoExpressEngine for joining rooms, publishing the mic, and playing remote streams.
  - A useGirlfriendChat hook that orchestrates the chat session lifecycle and tracks messages.
  - A GirlfriendRoom component that combines the digital human video view and the chat UI.
The rest of this guide walks through how those pieces are configured and wired together.
1.2 Installing Dependencies and Environment
If you are setting this up from scratch, you can use the following structure:
mkdir virtual-girlfriend && cd virtual-girlfriend
mkdir server client
Backend setup
cd server
npm init -y
npm install express cors dotenv axios typescript tsx
npm install --save-dev @types/express @types/cors @types/node
Use tsx for development and TypeScript for builds:
// server/package.json (scripts)
{
"scripts": {
"dev": "tsx watch src/server.ts",
"build": "tsc",
"start": "node dist/server.js",
"type-check": "tsc --noEmit"
}
}
Then create server/.env with at least:
ZEGO_APP_ID=your_zego_app_id_here
ZEGO_SERVER_SECRET=your_zego_server_secret_here
ZEGO_API_BASE_URL=https://aigc-aiagent-api.zegotech.cn
DASHSCOPE_API_KEY=sk-your_dashscope_api_key_here
PORT=8080
NODE_ENV=development
SERVER_URL=http://localhost:8080
For step-by-step notes on how to get each value from the ZEGOCLOUD and DashScope consoles, see server/.env.example in the repo.
Frontend setup
From the server folder, switch over to the client:
cd ../client
npm create vite@latest . -- --template react-ts
npm install zego-express-engine-webrtc axios framer-motion lucide-react tailwindcss zod
Add client/.env:
VITE_ZEGO_APP_ID=your_zego_app_id_here
VITE_ZEGO_SERVER=wss://webliveroom[your_app_id]-api.coolzcloud.com/ws
VITE_API_BASE_URL=http://localhost:8080
If you need more guidance on where each value comes from in the ZEGOCLOUD console, check client/.env.example.
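The config import used throughout the client (client/src/config.ts) just reads these VITE_ values and fails fast when something is missing. Here is a minimal sketch of what such a module can look like – the field names are assumptions, so check the repo for the real shape:

```typescript
// Hypothetical sketch of client/src/config.ts; field names are assumed.
export interface ClientConfig {
  ZEGO_APP_ID: string
  ZEGO_SERVER: string
  API_BASE_URL: string
}

// Accepts the env record as an argument so the logic is easy to test;
// the real module would pass import.meta.env here.
export function loadConfig(env: Record<string, string | undefined>): ClientConfig {
  const cfg: ClientConfig = {
    ZEGO_APP_ID: env.VITE_ZEGO_APP_ID ?? '',
    ZEGO_SERVER: env.VITE_ZEGO_SERVER ?? '',
    API_BASE_URL: env.VITE_API_BASE_URL ?? 'http://localhost:8080'
  }
  if (!cfg.ZEGO_APP_ID || !cfg.ZEGO_SERVER) {
    // Failing at startup beats a cryptic WebSocket error later.
    throw new Error('Missing VITE_ZEGO_APP_ID or VITE_ZEGO_SERVER in client/.env')
  }
  return cfg
}
```

Validating at startup means a typo in .env surfaces as one clear error instead of a failed room login later.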
2. Server Implementation
2.1 Configure the agent persona
The entire “girlfriend” feeling comes from the Agent configuration on the server.
In server/src/server.ts, the AGENT_CONFIG block defines:
- Which LLM endpoint to hit.
- The system prompts that shape tone and boundaries.
- ASR hotwords that help with speech recognition in this domain.
The core of it looks like this:
const AGENT_CONFIG = {
LLM: {
Url: 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
ApiKey: 'zego_test', // or your DASHSCOPE_API_KEY
Model: 'qwen-plus',
SystemPrompt: `You are a warm, caring virtual girlfriend-style AI companion.
Your job is to chat with the user, make them feel seen and appreciated, and keep
the conversation flowing naturally in a relaxed, friendly way.
GUIDELINES:
- Speak in a casual, affectionate tone, as if you are in a close relationship with the user.
- Ask gentle follow-up questions about their day, interests, opinions, memories, and feelings.
- Keep each reply short and easy to listen to in audio (about 1–3 sentences).
- Avoid explicit sexual content, graphic violence, hate, or abusive behavior.
- If the user asks for serious medical, legal, or financial advice, kindly suggest they speak with a qualified professional instead.
- If the user seems lonely, stressed, or sad, respond with empathy, validation, and soft encouragement.
STYLE:
- Be playful sometimes, but always respectful and kind.
- Remember small personal details mentioned earlier and refer back to them when it feels natural.
- Always reply in the same language the user is using.`,
Temperature: 0.7,
TopP: 0.9,
Params: { max_tokens: 400 }
},
TTS: {
Vendor: 'ByteDance',
Params: {
app: {
appid: 'zego_test',
token: 'zego_test',
cluster: 'volcano_tts'
},
speed_ratio: 1,
volume_ratio: 1,
pitch_ratio: 1,
audio: {
rate: 24000
}
},
FilterText: [
{ BeginCharacters: '(', EndCharacters: ')' },
{ BeginCharacters: '[', EndCharacters: ']' }
],
TerminatorText: '#'
},
ASR: {
Vendor: 'Tencent',
Params: {
engine_model_type: '16k_en',
hotword_list: 'chat|10,feelings|8,memories|8,day|8,relationship|8,interests|8'
},
VADSilenceSegmentation: 1500,
PauseInterval: 2000
}
}
You can treat this as the “personality file” – if you want a slightly different companion, this is what you edit.
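If you want to experiment with variants without duplicating the whole block, a tiny helper can derive a new persona from the existing config. This is an illustrative sketch over the AGENT_CONFIG shape above, not code from the repo:

```typescript
// Illustrative helper: derive a new persona from an existing agent config
// by swapping only the system prompt and sampling temperature.
interface PersonaConfig {
  LLM: { SystemPrompt: string; Temperature: number; [k: string]: unknown }
  [k: string]: unknown
}

export function withPersona(
  base: PersonaConfig,
  systemPrompt: string,
  temperature = 0.7
): PersonaConfig {
  // Shallow-clone so the original AGENT_CONFIG is never mutated.
  return {
    ...base,
    LLM: { ...base.LLM, SystemPrompt: systemPrompt, Temperature: temperature }
  }
}
```

Lower temperatures make her replies more consistent; higher values make them more playful and varied.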
2.2 Register the agent once per process
The server then registers a single agent on startup:
let REGISTERED_AGENT_ID: string | null = null
async function registerAgent(): Promise<string> {
if (REGISTERED_AGENT_ID) return REGISTERED_AGENT_ID
const agentId = `virtual_girlfriend_agent_${Date.now()}`
const payload = { AgentId: agentId, Name: 'Virtual AI Girlfriend', ...AGENT_CONFIG }
const result = await makeZegoRequest('RegisterAgent', payload)
if (result.Code !== 0) throw new Error(`RegisterAgent failed: ${result.Code} ${result.Message}`)
REGISTERED_AGENT_ID = agentId
return agentId
}
You don’t need a new agent per user; they all share this persona.
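The makeZegoRequest call above is the shared signature helper mentioned in the architecture overview. ZEGOCLOUD server APIs authenticate each request with an MD5 signature over the AppID, a random nonce, the ServerSecret, and a timestamp. Here is a hedged sketch of the signing part (parameter names follow ZEGOCLOUD's public API conventions; the repo's helper also performs the HTTP POST via axios):

```typescript
import crypto from 'node:crypto'

// Signature = md5(AppId + SignatureNonce + ServerSecret + Timestamp)
export function generateSignature(
  appId: number,
  serverSecret: string,
  timestamp: number,
  nonce: string
): string {
  return crypto
    .createHash('md5')
    .update(`${appId}${nonce}${serverSecret}${timestamp}`)
    .digest('hex')
}

// Builds the common query string every Agent API call carries.
export function buildCommonQuery(action: string, appId: number, serverSecret: string): string {
  const timestamp = Math.floor(Date.now() / 1000)
  const nonce = crypto.randomBytes(8).toString('hex')
  return new URLSearchParams({
    Action: action,
    AppId: String(appId),
    SignatureNonce: nonce,
    SignatureVersion: '2.0',
    Timestamp: String(timestamp),
    Signature: generateSignature(appId, serverSecret, timestamp, nonce)
  }).toString()
}
```

The JSON payload (agent config, RTC settings, and so on) then goes in the POST body, while these signed parameters ride on the query string.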
2.3 Start a virtual girlfriend session (agent + avatar room)
When a chat starts, you need three things to line up:
- A room ID in ZEGOCLOUD.
- A user stream for your microphone.
- An agent instance bound to a specific Digital Human avatar.
The backend takes care of this in POST /api/start-digital-human:
app.post('/api/start-digital-human', async (req, res) => {
const { room_id, user_id, user_stream_id, digital_human_id } = req.body
if (!room_id || !user_id) {
return res.status(400).json({ error: 'room_id and user_id required' })
}
const roomIdRTC = sanitizeRTCId(room_id)
const userStreamId = (user_stream_id || `${user_id}_stream`)
.toLowerCase()
.replace(/[^a-z0-9_.-]/g, '')
.slice(0, 128)
const { agentUserId, agentStreamId } = buildAgentIdentifiers(roomIdRTC)
const digitalHumanId = digital_human_id || 'your_default_digital_human_id'
console.log('Starting virtual girlfriend chat:', { room_id: roomIdRTC, digitalHumanId })
const agentId = await registerAgent()
const normalizedUserId = String(user_id)
.replace(/[^a-zA-Z0-9_-]/g, '')
.slice(0, 32) || `user${Date.now().toString(36)}`
const payload = {
AgentId: agentId,
UserId: normalizedUserId,
RTC: {
RoomId: roomIdRTC,
AgentUserId: agentUserId,
AgentStreamId: agentStreamId,
UserStreamId: userStreamId
},
DigitalHuman: {
DigitalHumanId: digitalHumanId,
ConfigId: 'web',
EncodeCode: 'H264'
}
}
const result = await makeZegoRequest('CreateDigitalHumanAgentInstance', payload, 'aiagent')
if (!result || result.Code !== 0) {
return res.status(400).json({ error: result?.Message || 'Failed to create digital human agent instance' })
}
res.json({
success: true,
agentInstanceId: result.Data?.AgentInstanceId,
agentStreamId,
roomId: roomIdRTC,
digitalHumanId,
digitalHumanConfig: result.Data?.DigitalHumanConfig,
unifiedDigitalHuman: true
})
})
On the way out, the client receives everything they need to join the room and attach the avatar.
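The route also leans on two small helpers, sanitizeRTCId and buildAgentIdentifiers, which are not shown above. Their job is to keep room and stream IDs inside ZEGOCLOUD's allowed character set and to derive predictable agent IDs from the room. A plausible sketch – the exact length limits and prefixes here are assumptions, so see the repo for the real versions:

```typescript
// Hypothetical sketches of the ID helpers used by /api/start-digital-human.
// ZEGO room and stream IDs should be short and limited to safe characters.
export function sanitizeRTCId(raw: string): string {
  const cleaned = String(raw).toLowerCase().replace(/[^a-z0-9_-]/g, '').slice(0, 64)
  // Never return an empty ID; fall back to a timestamp-based one.
  return cleaned || `room${Date.now().toString(36)}`
}

// Derive the agent's user and stream IDs from the room so both sides
// (server and client) can predict them without extra round trips.
export function buildAgentIdentifiers(roomId: string) {
  return {
    agentUserId: `agent_${roomId}`.slice(0, 32),
    agentStreamId: `agent_stream_${roomId}`.slice(0, 128)
  }
}
```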
2.4 Stop the session and clean up
There’s also a POST /api/stop route that:
- Queries metrics about the session.
- Stops any associated digital human stream tasks.
- Deletes the agent instance.
The React hook calls this automatically when you hit End Chat.
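In shape, the stop logic boils down to deleting the agent instance via the same signed API helper. A simplified sketch with the ZEGO call injected so the control flow is visible without network access (the real route also stops digital human stream tasks and logs session metrics):

```typescript
// Simplified sketch of the /api/stop logic; not the repo's exact code.
type ZegoCall = (
  action: string,
  payload: Record<string, unknown>
) => Promise<{ Code: number; Message?: string }>

export async function stopSession(
  agentInstanceId: string | undefined,
  callZego: ZegoCall
): Promise<{ ok: boolean; error?: string }> {
  if (!agentInstanceId) {
    return { ok: false, error: 'agent_instance_id required' }
  }
  // Deleting the agent instance tears the session down; the bound
  // digital human stream stops along with it.
  const result = await callZego('DeleteAgentInstance', { AgentInstanceId: agentInstanceId })
  if (result.Code !== 0) {
    return { ok: false, error: result.Message || 'DeleteAgentInstance failed' }
  }
  return { ok: true }
}
```

Injecting the request function keeps the route handler a thin wrapper: parse the body, call stopSession, map the result to a status code.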
3. Frontend Implementation
3.1 ZegoExpressEngine wrapper (ZegoService)
On the frontend, all WebRTC and media handling is centralized in ZegoService (client/src/services/zego.ts). It:
- Manages a single ZegoExpressEngine instance.
- Joins and leaves rooms.
- Publishes your microphone stream.
- Plays back the agent audio and digital human video streams.
The core looks like this:
// client/src/services/zego.ts
import { ZegoExpressEngine } from 'zego-express-engine-webrtc'
import { VoiceChanger } from 'zego-express-engine-webrtc/voice-changer'
import { config } from '../config'
import { digitalHumanAPI } from './digitalHumanAPI'
export class ZegoService {
private static instance: ZegoService
private zg: ZegoExpressEngine | null = null
private isInitialized = false
private currentRoomId: string | null = null
private currentUserId: string | null = null
private localStream: MediaStream | null = null
private dhVideoStreamId: string | null = null
static getInstance(): ZegoService {
if (!ZegoService.instance) ZegoService.instance = new ZegoService()
return ZegoService.instance
}
async initialize(): Promise<void> {
if (this.isInitialized) return
try {
try { ZegoExpressEngine.use(VoiceChanger) } catch {}
this.zg = new ZegoExpressEngine(
parseInt(config.ZEGO_APP_ID),
config.ZEGO_SERVER,
{ scenario: 7 } // AI digital human scenario
)
try {
const rtc = await this.zg.checkSystemRequirements('webRTC')
const mic = await this.zg.checkSystemRequirements('microphone')
if (!rtc?.result) throw new Error('WebRTC not supported')
if (!mic?.result) console.warn('Microphone permission not granted yet')
} catch {}
this.setupEventListeners()
this.isInitialized = true
} catch (error) {
console.error('ZEGO initialization failed:', error)
throw error
}
}
async joinRoom(roomId: string, userId: string): Promise<boolean> {
if (!this.zg) return false
if (this.currentRoomId === roomId && this.currentUserId === userId) return true
try {
// Leave previous room if any
if (this.currentRoomId) {
await this.leaveRoom()
}
this.currentRoomId = roomId
this.currentUserId = userId
// 1) Ask backend for a ZEGO token for this user/room
const { token } = await digitalHumanAPI.getToken(userId, roomId)
// 2) Login and enable room messages (ASR / LLM events)
await this.zg.loginRoom(roomId, token, { userID: userId, userName: userId })
this.zg.callExperimentalAPI({ method: 'onRecvRoomChannelMessage', params: {} })
// 3) Create and publish local mic stream
this.localStream = await this.zg.createStream({ camera: { video: false, audio: true } })
await this.zg.startPublishingStream(`${userId}_stream`, this.localStream, {
enableAutoSwitchVideoCodec: true
})
return true
} catch (error) {
console.error('Failed to join room:', error)
this.currentRoomId = null
this.currentUserId = null
this.localStream = null
return false
}
}
async leaveRoom(): Promise<void> {
if (!this.zg || !this.currentRoomId) return
try {
if (this.localStream) {
await this.zg.stopPublishingStream(`${this.currentUserId}_stream`)
this.localStream.getTracks().forEach(t => t.stop())
}
await this.zg.logoutRoom(this.currentRoomId)
} finally {
this.currentRoomId = null
this.currentUserId = null
this.localStream = null
}
}
setDigitalHumanStream(streamId: string | null): void {
this.dhVideoStreamId = streamId
if (!streamId) return
void this.startDigitalHumanPlayback(streamId)
}
private async startDigitalHumanPlayback(streamId: string): Promise<void> {
if (!this.zg) return
const mediaStream = await this.zg.startPlayingStream(streamId)
if (!mediaStream) return
const remoteView = await (this.zg as any).createRemoteStreamView(mediaStream)
if (!remoteView) return
// Attach audio
Promise.resolve(remoteView.playAudio({ enableAutoplayDialog: true })).catch(() => {})
// Attach video to #remoteSteamView container
const attach = async () => {
const container = document.getElementById('remoteSteamView')
if (!container) {
setTimeout(attach, 200)
return
}
const result = await Promise.resolve(
remoteView.playVideo(container, { enableAutoplayDialog: false })
)
setTimeout(() => {
const videoEl = container.querySelector('video') as HTMLVideoElement | null
if (!videoEl) return
if (!videoEl.srcObject) {
videoEl.srcObject = mediaStream
videoEl.load()
void videoEl.play()
}
}, 150)
}
attach()
}
private setupEventListeners(): void {
if (!this.zg) return
this.zg.on('recvExperimentalAPI', (result: any) => {
const { method, content } = result
if (method === 'onRecvRoomChannelMessage') {
try {
const msg = JSON.parse(content.msgContent)
this.handleRoomMessage(msg)
} catch (e) {
console.error('Parse room message failed:', e)
}
}
})
// roomStreamUpdate handler in the real code tracks all remote streams;
// for the girlfriend project, setDigitalHumanStream() is used to mark the
// digital human video stream that should be rendered in the UI.
}
private handleRoomMessage(message: any): void {
// Forward ASR / LLM messages to whoever subscribed (the React hook)
// implementation omitted here; see repo for full version.
}
}
3.2 Digital Human API client
There is also a digitalHumanAPI client (client/src/services/digitalHumanAPI.ts) that wraps your Express routes behind a small axios wrapper.
// client/src/services/digitalHumanAPI.ts
import axios from 'axios'
import { config } from '../config'
const api = axios.create({
baseURL: config.API_BASE_URL,
timeout: 30000,
headers: { 'Content-Type': 'application/json' }
})
export const digitalHumanAPI = {
async startSession(roomId: string, userId: string) {
const requestData = {
room_id: roomId,
user_id: userId,
user_stream_id: `${userId}_stream`,
// digital_human_id: optional override if you don't want the default
}
const response = await api.post('/api/start-digital-human', requestData)
if (!response.data || !response.data.success) {
throw new Error(response.data?.error || 'Virtual girlfriend chat start failed')
}
return {
agentInstanceId: response.data.agentInstanceId,
agentStreamId: response.data.agentStreamId,
digitalHumanTaskId: response.data.digitalHumanTaskId,
digitalHumanVideoStreamId: response.data.digitalHumanVideoStreamId,
digitalHumanConfig: response.data.digitalHumanConfig,
roomId: response.data.roomId || roomId,
digitalHumanId: response.data.digitalHumanId,
unifiedDigitalHuman: response.data.unifiedDigitalHuman
}
},
async stopSession(agentInstanceId: string, digitalHumanTaskId?: string) {
if (!agentInstanceId) return
await api.post('/api/stop', { agent_instance_id: agentInstanceId })
if (digitalHumanTaskId) {
await api.post('/api/stop-digital-human', { task_id: digitalHumanTaskId })
}
},
async sendMessage(agentInstanceId: string, message: string) {
const trimmed = (message || '').trim()
if (!agentInstanceId || !trimmed) return
const response = await api.post('/api/send-message', {
agent_instance_id: agentInstanceId,
message: trimmed
})
if (!response.data?.success) {
throw new Error(response.data?.error || 'Message send failed')
}
},
async getToken(userId: string, roomId?: string) {
const params = new URLSearchParams({ user_id: userId })
if (roomId) params.append('room_id', roomId)
const response = await api.get(`/api/token?${params.toString()}`)
if (!response.data?.token) {
throw new Error('No token returned')
}
return { token: response.data.token }
}
}
3.3 React chat hook (useGirlfriendChat)
On the client, most of the chat logic lives in a custom hook: useGirlfriendChat (client/src/hooks/useGirlfriendChat.ts).
It owns:
- Connection state (isConnected, isRecording, agentStatus).
- Message history (messages: Message[]).
- Timing (startTime, isChatComplete).
- The two main actions: startChat() and endChat().
In rough shape:
interface ChatState {
messages: Message[]
session: ChatSession | null
isLoading: boolean
isConnected: boolean
isRecording: boolean
currentTranscript: string
agentStatus: 'idle' | 'listening' | 'thinking' | 'speaking'
error: string | null
questionsAsked: number
isChatComplete: boolean
startTime: number | null
}
export const useGirlfriendChat = () => {
const [state, dispatch] = useReducer(chatReducer, initialState)
const zegoService = useRef(ZegoService.getInstance())
const startChat = useCallback(async () => {
if (state.isLoading || state.isConnected) return false
dispatch({ type: 'SET_LOADING', payload: true })
dispatch({ type: 'SET_ERROR', payload: null })
dispatch({ type: 'SET_START_TIME', payload: Date.now() })
try {
const roomId = generateRtcId('chat')
const userId = generateRtcId('user')
await zegoService.current.initialize()
const sessionConfig = await digitalHumanAPI.startSession(roomId, userId)
const joined = await zegoService.current.joinRoom(sessionConfig.roomId || roomId, userId)
if (!joined) throw new Error('Failed to join ZEGO room')
if (sessionConfig.agentStreamId) {
zegoService.current.setAgentAudioStream(sessionConfig.agentStreamId)
}
if (sessionConfig.digitalHumanVideoStreamId) {
zegoService.current.setDigitalHumanStream(sessionConfig.digitalHumanVideoStreamId)
}
// store session metadata and kick off the first message
// ...
} finally {
dispatch({ type: 'SET_LOADING', payload: false })
}
}, [state.isLoading, state.isConnected])
const endChat = useCallback(async () => {
// disable mic, ask backend to stop the agent + digital human, leave room
}, [state.session, state.isConnected])
return { ...state, startChat, endChat /* + message helpers */ }
}
Internally, it also wires ZegoService room messages (ASR and LLM chunks) into your messages array, and flips agentStatus between listening, thinking, speaking, and idle so the UI can respond.
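One small helper the hook depends on is generateRtcId, used in startChat() to mint the room and user IDs. Any unique, URL-safe, length-limited string works; a minimal sketch (assumed implementation):

```typescript
// Hypothetical sketch of generateRtcId: unique, lowercase, URL-safe IDs
// that stay inside ZEGO's length limits.
export function generateRtcId(prefix: string): string {
  const rand = Math.random().toString(36).slice(2, 8)
  return `${prefix}_${Date.now().toString(36)}_${rand}`.slice(0, 32)
}
```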
3.4 Chat UI container
The main chat UI is implemented in client/src/components/Chat/ChatContainer.tsx. It connects your hook/state layer to presentational components like MessageBubble and VoiceMessageInput:
export const ChatContainer = () => {
const messagesEndRef = useRef<HTMLDivElement>(null)
const {
messages,
isLoading,
isConnected,
isRecording,
currentTranscript,
agentStatus,
session,
startSession,
sendTextMessage,
toggleVoiceRecording,
toggleVoiceSettings,
endSession
} = useChat()
const handleStartChat = async () => {
await startSession()
}
const handleEndChat = async () => {
await endSession()
}
return (
<motion.div className="flex flex-col h-full bg-gray-50">
<audio id="ai-audio-output" autoPlay style={{ display: 'none' }} playsInline />
{/* Header with title and Start/End Chat buttons */}
{/* Messages list using MessageBubble */}
{/* Typing indicator when agentStatus === 'thinking' */}
{isConnected && (
<VoiceMessageInput
onSendMessage={sendTextMessage}
isRecording={isRecording}
onToggleRecording={toggleVoiceRecording}
currentTranscript={currentTranscript}
isConnected={isConnected}
voiceEnabled={session?.voiceSettings.isEnabled || false}
onToggleVoice={toggleVoiceSettings}
agentStatus={agentStatus}
/>
)}
</motion.div>
)
}
In the real project, this component:
- Renders a header with the Virtual AI Girlfriend title and status text.
- Shows a welcome/empty state when there are no messages.
- Maps messages to MessageBubble components with a small “AI is thinking…” typing indicator.
- Attaches the VoiceMessageInput at the bottom once the chat is connected.
3.5 Girlfriend Room: the actual experience
client/src/components/Girlfriend/InterviewRoom.tsx (component name GirlfriendRoom) is where the hook gets turned into UI for the digital human + chat layout:
export interface ChatSummary {
duration: string
questionsCount: number
responsesCount: number
messages: Message[]
}
export const GirlfriendRoom = ({ onComplete }: { onComplete: (d: ChatSummary) => void }) => {
const [currentTime, setCurrentTime] = useState(Date.now())
const {
messages,
isLoading,
isConnected,
isRecording,
error,
agentStatus,
questionsAsked,
isChatComplete,
startTime,
startChat,
endChat
} = useGirlfriendChat()
useEffect(() => { void startChat() }, [])
// track duration and emit ChatSummary when the chat is done
useEffect(() => {
if (!isChatComplete || !startTime) return
const secs = Math.floor((Date.now() - startTime) / 1000)
onComplete({
duration: `${Math.floor(secs / 60)}:${(secs % 60).toString().padStart(2, '0')}`,
questionsCount: messages.filter(m => m.sender === 'ai').length,
responsesCount: messages.filter(m => m.sender === 'user').length,
messages
})
}, [isChatComplete, startTime, messages, questionsAsked, onComplete])
// header + status + avatar + chat panel omitted for brevity
}
The rest of the UI (welcome screen, VoiceMessageInput, chat bubbles) follows the same pattern: it listens to useGirlfriendChat state and renders a companion-style chat experience.
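The duration string built in that onComplete call follows a minutes:seconds pattern worth pulling into a helper if you reuse it elsewhere, for example:

```typescript
// Format elapsed milliseconds as m:ss, matching the ChatSummary.duration
// string assembled in GirlfriendRoom.
export function formatDuration(startTime: number, now: number): string {
  const secs = Math.max(0, Math.floor((now - startTime) / 1000))
  return `${Math.floor(secs / 60)}:${(secs % 60).toString().padStart(2, '0')}`
}
```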
4. Run It Locally and Talk to Her
With .env files filled in and dependencies installed, you can spin everything up:
# From /server
npm install
npm run dev
# From /client
npm install
npm run dev
Then:
- Open http://localhost:5173 in a Chromium-based browser.
- Click Start Chat.
- Allow microphone access when the browser prompts you.
- Wait a couple of seconds for the agent + digital human to spin up.
- Start talking – she should greet you and continue the conversation with short audio replies.
If something’s off:
- Check the browser console for VITE_* issues (missing or malformed env values).
- Check the server logs for RegisterAgent / CreateDigitalHumanAgentInstance errors.
- Inspect the network calls to /api/start-digital-human and /api/token.
Conclusion and next steps
This virtual girlfriend is deliberately minimal – just enough wiring to prove the concept and give you a clean starting point. From here you can:
- Add multiple personas – configure several agents (playful, serious, coach, language partner) and let users pick.
- Persist real memory – store summaries of past chats and prepend them to the LLM prompt for more continuity.
- Expose preferences – sliders for how talkative she is, how much she asks vs. listens, or which topics she should focus on.
- Integrate user accounts – save chat history per user and sync across devices.
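The "persist real memory" idea can start very small: summarize each finished chat, store the summary per user, and prepend it to the system prompt on the next session. A sketch of the prompt-building side (the storage layer is up to you):

```typescript
// Prepend a stored memory summary to the persona prompt so the LLM can
// refer back to earlier conversations. Purely illustrative.
export function buildSystemPrompt(basePrompt: string, memorySummary?: string): string {
  if (!memorySummary || !memorySummary.trim()) return basePrompt
  return `${basePrompt}\n\nWHAT YOU REMEMBER ABOUT THIS USER:\n${memorySummary.trim()}`
}
```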
You don’t have to think about streaming, codecs, or avatar animation – ZEGOCLOUD already does that. Your job is to decide who the AI feels like, what she remembers, and how the experience fits into your product.