Most virtual AI girlfriend apps are just another chat window: type some text, get some text back. A proper companion should feel more like a call – you see her face, you hear her voice, she remembers what you told her, and she steers the conversation instead of waiting for prompts. This article shows you how to build a virtual AI girlfriend that does all of that, using ZEGOCLOUD’s Agent and Digital Human APIs.
How to Develop an AI Girlfriend App
ZEGOCLOUD provides real-time communication APIs and SDKs with native support for AI agents and digital avatars.
In this virtual AI girlfriend project, we’ll use three ZEGOCLOUD tools:
- Agent service – turns your speech into text, applies your “girlfriend” persona prompt on top of an LLM, and streams short TTS replies back.
- Digital Human service – renders your companion as a digital human avatar and keeps her expressions and lip-sync aligned with the agent’s voice.
- WebRTC SDK (ZegoExpressEngine) – manages the real-time room, sends your mic stream up, and plays back the agent + avatar streams in the browser.
Together, these let you ship a companion that talks with a natural voice, appears as a digital human on the page, and runs inside a small React + Node project you can clone and adapt.
Prerequisites
Before you run the virtual girlfriend project, you should have:
- A ZEGOCLOUD account with Agent + Digital Human enabled – Sign up here.
- A project with a valid AppID and ServerSecret.
- A DashScope API key (optional for local demos, required for your own Qwen usage).
- Node.js 18+ and npm installed.
- A modern desktop browser (Chrome/Edge) with microphone access.
- Basic web development knowledge.
1. Project Setup
The virtual girlfriend implementation lives in this project repository under two folders, server/ and client/. You can clone it as-is, or follow this section to understand how the pieces fit together and how to recreate the setup.
1.1 Architecture Overview
- Backend (server/) – a Node + Express app that:
  - Exposes REST endpoints such as /api/start-digital-human, /api/stop, and /api/token.
  - Signs ZEGOCLOUD Agent + Digital Human API calls with a shared signature helper.
  - Registers an agent with the virtual girlfriend persona (LLM, TTS, ASR config).
  - Creates and cleans up digital human agent instances tied to room/stream IDs.
- Frontend (client/) – a React + TypeScript app built with Vite:
  - A ZegoService wrapper around ZegoExpressEngine for joining rooms, publishing the mic, and playing remote streams.
  - A useGirlfriendChat hook that orchestrates the chat session lifecycle and tracks messages.
  - A GirlfriendRoom component that combines the digital human video view and the chat UI.
The rest of this guide walks through how those pieces are configured and wired together.
1.2 Installing Dependencies and Environment
If you are setting this up from scratch, you can use the following structure:
mkdir virtual-girlfriend && cd virtual-girlfriend
mkdir server client
Backend setup
cd server
npm init -y
npm install express cors dotenv axios typescript tsx
npm install --save-dev @types/express @types/cors @types/node
Use tsx for development and TypeScript for builds:
// server/package.json (scripts)
{
"scripts": {
"dev": "tsx watch src/server.ts",
"build": "tsc",
"start": "node dist/server.js",
"type-check": "tsc --noEmit"
}
}
Then create server/.env with at least:
ZEGO_APP_ID=your_zego_app_id_here
ZEGO_SERVER_SECRET=your_zego_server_secret_here
ZEGO_API_BASE_URL=https://aigc-aiagent-api.zegotech.cn
DASHSCOPE_API_KEY=sk-your_dashscope_api_key_here
PORT=8080
NODE_ENV=development
SERVER_URL=http://localhost:8080
For step-by-step notes on how to get each value from the ZEGOCLOUD and DashScope consoles, see server/.env.example in the repo.
Frontend setup
From the server folder, switch over to the client:
cd ../client
npm create vite@latest . -- --template react-ts
npm install zego-express-engine-webrtc axios framer-motion lucide-react tailwindcss zod
Add client/.env:
VITE_ZEGO_APP_ID=your_zego_app_id_here
VITE_ZEGO_SERVER=wss://webliveroom[your_app_id]-api.coolzcloud.com/ws
VITE_API_BASE_URL=http://localhost:8080
If you need more guidance on where each value comes from in the ZEGOCLOUD console, check client/.env.example.
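The config import used throughout the client (client/src/config.ts) just reads these VITE_ values and fails fast when something is missing. Here is a minimal sketch of what such a module can look like – the field names are assumptions, so check the repo for the real shape:

```typescript
// Hypothetical sketch of client/src/config.ts; field names are assumed.
export interface ClientConfig {
  ZEGO_APP_ID: string
  ZEGO_SERVER: string
  API_BASE_URL: string
}

// Accepts the env record as an argument so the logic is easy to test;
// the real module would pass import.meta.env here.
export function loadConfig(env: Record<string, string | undefined>): ClientConfig {
  const cfg: ClientConfig = {
    ZEGO_APP_ID: env.VITE_ZEGO_APP_ID ?? '',
    ZEGO_SERVER: env.VITE_ZEGO_SERVER ?? '',
    API_BASE_URL: env.VITE_API_BASE_URL ?? 'http://localhost:8080'
  }
  if (!cfg.ZEGO_APP_ID || !cfg.ZEGO_SERVER) {
    // Failing at startup beats a cryptic WebSocket error later.
    throw new Error('Missing VITE_ZEGO_APP_ID or VITE_ZEGO_SERVER in client/.env')
  }
  return cfg
}
```

Validating at startup means a typo in .env surfaces as one clear error instead of a failed room login later.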
2. Server Implementation
2.1 Configure the agent persona
The entire “girlfriend” feeling comes from the Agent configuration on the server.
In server/src/server.ts, the AGENT_CONFIG block defines:
- Which LLM endpoint to hit.
- The system prompts that shape tone and boundaries.
- ASR hotwords that help with speech recognition in this domain.
The core of it looks like this:
const AGENT_CONFIG = {
LLM: {
Url: 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
ApiKey: 'zego_test', // or your DASHSCOPE_API_KEY
Model: 'qwen-plus',
SystemPrompt: `You are a warm, caring virtual girlfriend-style AI companion.
Your job is to chat with the user, make them feel seen and appreciated, and keep
the conversation flowing naturally in a relaxed, friendly way.
GUIDELINES:
- Speak in a casual, affectionate tone, as if you are in a close relationship with the user.
- Ask gentle follow-up questions about their day, interests, opinions, memories, and feelings.
- Keep each reply short and easy to listen to in audio (about 1–3 sentences).
- Avoid explicit sexual content, graphic violence, hate, or abusive behavior.
- If the user asks for serious medical, legal, or financial advice, kindly suggest they speak with a qualified professional instead.
- If the user seems lonely, stressed, or sad, respond with empathy, validation, and soft encouragement.
STYLE:
- Be playful sometimes, but always respectful and kind.
- Remember small personal details mentioned earlier and refer back to them when it feels natural.
- Always reply in the same language the user is using.`,
Temperature: 0.7,
TopP: 0.9,
Params: { max_tokens: 400 }
},
TTS: {
Vendor: 'ByteDance',
Params: {
app: {
appid: 'zego_test',
token: 'zego_test',
cluster: 'volcano_tts'
},
speed_ratio: 1,
volume_ratio: 1,
pitch_ratio: 1,
audio: {
rate: 24000
}
},
FilterText: [
{ BeginCharacters: '(', EndCharacters: ')' },
{ BeginCharacters: '[', EndCharacters: ']' }
],
TerminatorText: '#'
},
ASR: {
Vendor: 'Tencent',
Params: {
engine_model_type: '16k_en',
hotword_list: 'chat|10,feelings|8,memories|8,day|8,relationship|8,interests|8'
},
VADSilenceSegmentation: 1500,
PauseInterval: 2000
}
}
You can treat this as the “personality file” – if you want a slightly different companion, this is what you edit.
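If you want to experiment with variants without duplicating the whole block, a tiny helper can derive a new persona from the existing config. This is an illustrative sketch over the AGENT_CONFIG shape above, not code from the repo:

```typescript
// Illustrative helper: derive a new persona from an existing agent config
// by swapping only the system prompt and sampling temperature.
interface PersonaConfig {
  LLM: { SystemPrompt: string; Temperature: number; [k: string]: unknown }
  [k: string]: unknown
}

export function withPersona(
  base: PersonaConfig,
  systemPrompt: string,
  temperature = 0.7
): PersonaConfig {
  // Shallow-clone so the original AGENT_CONFIG is never mutated.
  return {
    ...base,
    LLM: { ...base.LLM, SystemPrompt: systemPrompt, Temperature: temperature }
  }
}
```

Lower temperatures make her replies more consistent; higher values make them more playful and varied.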
2.2 Register the agent once per process
The server then registers a single agent on startup:
let REGISTERED_AGENT_ID: string | null = null
async function registerAgent(): Promise<string> {
if (REGISTERED_AGENT_ID) return REGISTERED_AGENT_ID
const agentId = `virtual_girlfriend_agent_${Date.now()}`
const payload = { AgentId: agentId, Name: 'Virtual AI Girlfriend', ...AGENT_CONFIG }
const result = await makeZegoRequest('RegisterAgent', payload)
if (result.Code !== 0) throw new Error(`RegisterAgent failed: ${result.Code} ${result.Message}`)
REGISTERED_AGENT_ID = agentId
return agentId
}
You don’t need a new agent per user; they all share this persona.
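The makeZegoRequest call above is the shared signature helper mentioned in the architecture overview. ZEGOCLOUD server APIs authenticate each request with an MD5 signature over the AppID, a random nonce, the ServerSecret, and a timestamp. Here is a hedged sketch of the signing part (parameter names follow ZEGOCLOUD's public API conventions; the repo's helper also performs the HTTP POST via axios):

```typescript
import crypto from 'node:crypto'

// Signature = md5(AppId + SignatureNonce + ServerSecret + Timestamp)
export function generateSignature(
  appId: number,
  serverSecret: string,
  timestamp: number,
  nonce: string
): string {
  return crypto
    .createHash('md5')
    .update(`${appId}${nonce}${serverSecret}${timestamp}`)
    .digest('hex')
}

// Builds the common query string every Agent API call carries.
export function buildCommonQuery(action: string, appId: number, serverSecret: string): string {
  const timestamp = Math.floor(Date.now() / 1000)
  const nonce = crypto.randomBytes(8).toString('hex')
  return new URLSearchParams({
    Action: action,
    AppId: String(appId),
    SignatureNonce: nonce,
    SignatureVersion: '2.0',
    Timestamp: String(timestamp),
    Signature: generateSignature(appId, serverSecret, timestamp, nonce)
  }).toString()
}
```

The JSON payload (agent config, RTC settings, and so on) then goes in the POST body, while these signed parameters ride on the query string.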
2.3 Start a virtual girlfriend session (agent + avatar room)
When a chat starts, you need three things to line up:
- A room ID in ZEGOCLOUD.
- A user stream for your microphone.
- An agent instance bound to a specific Digital Human avatar.
The backend takes care of this in POST /api/start-digital-human:
app.post('/api/start-digital-human', async (req, res) => {
const { room_id, user_id, user_stream_id, digital_human_id } = req.body
if (!room_id || !user_id) {
return res.status(400).json({ error: 'room_id and user_id required' })
}
const roomIdRTC = sanitizeRTCId(room_id)
const userStreamId = (user_stream_id || `${user_id}_stream`)
.toLowerCase()
.replace(/[^a-z0-9_.-]/g, '')
.slice(0, 128)
const { agentUserId, agentStreamId } = buildAgentIdentifiers(roomIdRTC)
const digitalHumanId = digital_human_id || 'your_default_digital_human_id'
console.log('Starting virtual girlfriend chat:', { room_id: roomIdRTC, digitalHumanId })
const agentId = await registerAgent()
const normalizedUserId = String(user_id)
.replace(/[^a-zA-Z0-9_-]/g, '')
.slice(0, 32) || `user${Date.now().toString(36)}`
const payload = {
AgentId: agentId,
UserId: normalizedUserId,
RTC: {
RoomId: roomIdRTC,
AgentUserId: agentUserId,
AgentStreamId: agentStreamId,
UserStreamId: userStreamId
},
DigitalHuman: {
DigitalHumanId: digitalHumanId,
ConfigId: 'web',
EncodeCode: 'H264'
}
}
const result = await makeZegoRequest('CreateDigitalHumanAgentInstance', payload, 'aiagent')
if (!result || result.Code !== 0) {
return res.status(400).json({ error: result?.Message || 'Failed to create digital human agent instance' })
}
res.json({
success: true,
agentInstanceId: result.Data?.AgentInstanceId,
agentStreamId,
roomId: roomIdRTC,
digitalHumanId,
digitalHumanConfig: result.Data?.DigitalHumanConfig,
unifiedDigitalHuman: true
})
})
On the way out, the client receives everything they need to join the room and attach the avatar.
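The route also leans on two small helpers, sanitizeRTCId and buildAgentIdentifiers, which are not shown above. Their job is to keep room and stream IDs inside ZEGOCLOUD's allowed character set and to derive predictable agent IDs from the room. A plausible sketch – the exact length limits and prefixes here are assumptions, so see the repo for the real versions:

```typescript
// Hypothetical sketches of the ID helpers used by /api/start-digital-human.
// ZEGO room and stream IDs should be short and limited to safe characters.
export function sanitizeRTCId(raw: string): string {
  const cleaned = String(raw).toLowerCase().replace(/[^a-z0-9_-]/g, '').slice(0, 64)
  // Never return an empty ID; fall back to a timestamp-based one.
  return cleaned || `room${Date.now().toString(36)}`
}

// Derive the agent's user and stream IDs from the room so both sides
// (server and client) can predict them without extra round trips.
export function buildAgentIdentifiers(roomId: string) {
  return {
    agentUserId: `agent_${roomId}`.slice(0, 32),
    agentStreamId: `agent_stream_${roomId}`.slice(0, 128)
  }
}
```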
2.4 Stop the session and clean up
There’s also a POST /api/stop route that:
- Queries metrics about the session.
- Stops any associated digital human stream tasks.
- Deletes the agent instance.
The React hook calls this automatically when you hit End Chat.
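In shape, the stop logic boils down to deleting the agent instance via the same signed API helper. A simplified sketch with the ZEGO call injected so the control flow is visible without network access (the real route also stops digital human stream tasks and logs session metrics):

```typescript
// Simplified sketch of the /api/stop logic; not the repo's exact code.
type ZegoCall = (
  action: string,
  payload: Record<string, unknown>
) => Promise<{ Code: number; Message?: string }>

export async function stopSession(
  agentInstanceId: string | undefined,
  callZego: ZegoCall
): Promise<{ ok: boolean; error?: string }> {
  if (!agentInstanceId) {
    return { ok: false, error: 'agent_instance_id required' }
  }
  // Deleting the agent instance tears the session down; the bound
  // digital human stream stops along with it.
  const result = await callZego('DeleteAgentInstance', { AgentInstanceId: agentInstanceId })
  if (result.Code !== 0) {
    return { ok: false, error: result.Message || 'DeleteAgentInstance failed' }
  }
  return { ok: true }
}
```

Injecting the request function keeps the route handler a thin wrapper: parse the body, call stopSession, map the result to a status code.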
3. Frontend Implementation
3.1 ZegoExpressEngine wrapper (ZegoService)
On the frontend, all WebRTC and media handling is centralized in ZegoService (client/src/services/zego.ts). It:
- Manages a single ZegoExpressEngine instance.
- Joins and leaves rooms.
- Publishes your microphone stream.
- Plays back the agent audio and digital human video streams.
The core looks like this:
// client/src/services/zego.ts
import { ZegoExpressEngine } from 'zego-express-engine-webrtc'
import { VoiceChanger } from 'zego-express-engine-webrtc/voice-changer'
import { config } from '../config'
import { digitalHumanAPI } from './digitalHumanAPI'
export class ZegoService {
private static instance: ZegoService
private zg: ZegoExpressEngine | null = null
private isInitialized = false
private currentRoomId: string | null = null
private currentUserId: string | null = null
private localStream: MediaStream | null = null
private dhVideoStreamId: string | null = null
static getInstance(): ZegoService {
if (!ZegoService.instance) ZegoService.instance = new ZegoService()
return ZegoService.instance
}
async initialize(): Promise<void> {
if (this.isInitialized) return
try {
try { ZegoExpressEngine.use(VoiceChanger) } catch {}
this.zg = new ZegoExpressEngine(
parseInt(config.ZEGO_APP_ID),
config.ZEGO_SERVER,
{ scenario: 7 } // AI digital human scenario
)
try {
const rtc = await this.zg.checkSystemRequirements('webRTC')
const mic = await this.zg.checkSystemRequirements('microphone')
if (!rtc?.result) throw new Error('WebRTC not supported')
if (!mic?.result) console.warn('Microphone permission not granted yet')
} catch {}
this.setupEventListeners()
this.isInitialized = true
} catch (error) {
console.error('ZEGO initialization failed:', error)
throw error
}
}
async joinRoom(roomId: string, userId: string): Promise<boolean> {
if (!this.zg) return false
if (this.currentRoomId === roomId && this.currentUserId === userId) return true
try {
// Leave previous room if any
if (this.currentRoomId) {
await this.leaveRoom()
}
this.currentRoomId = roomId
this.currentUserId = userId
// 1) Ask backend for a ZEGO token for this user/room
const { token } = await digitalHumanAPI.getToken(userId, roomId)
// 2) Login and enable room messages (ASR / LLM events)
await this.zg.loginRoom(roomId, token, { userID: userId, userName: userId })
this.zg.callExperimentalAPI({ method: 'onRecvRoomChannelMessage', params: {} })
// 3) Create and publish local mic stream
this.localStream = await this.zg.createStream({ camera: { video: false, audio: true } })
await this.zg.startPublishingStream(`${userId}_stream`, this.localStream, {
enableAutoSwitchVideoCodec: true
})
return true
} catch (error) {
console.error('Failed to join room:', error)
this.currentRoomId = null
this.currentUserId = null
this.localStream = null
return false
}
}
async leaveRoom(): Promise<void> {
if (!this.zg || !this.currentRoomId) return
try {
if (this.localStream) {
await this.zg.stopPublishingStream(`${this.currentUserId}_stream`)
this.localStream.getTracks().forEach(t => t.stop())
}
await this.zg.logoutRoom(this.currentRoomId)
} finally {
this.currentRoomId = null
this.currentUserId = null
this.localStream = null
}
}
setDigitalHumanStream(streamId: string | null): void {
this.dhVideoStreamId = streamId
if (!streamId) return
void this.startDigitalHumanPlayback(streamId)
}
private async startDigitalHumanPlayback(streamId: string): Promise<void> {
if (!this.zg) return
const mediaStream = await this.zg.startPlayingStream(streamId)
if (!mediaStream) return
const remoteView = await (this.zg as any).createRemoteStreamView(mediaStream)
if (!remoteView) return
// Attach audio
Promise.resolve(remoteView.playAudio({ enableAutoplayDialog: true })).catch(() => {})
// Attach video to #remoteSteamView container
const attach = async () => {
const container = document.getElementById('remoteSteamView')
if (!container) {
setTimeout(attach, 200)
return
}
const result = await Promise.resolve(
remoteView.playVideo(container, { enableAutoplayDialog: false })
)
setTimeout(() => {
const videoEl = container.querySelector('video') as HTMLVideoElement | null
if (!videoEl) return
if (!videoEl.srcObject) {
videoEl.srcObject = mediaStream
videoEl.load()
void videoEl.play()
}
}, 150)
}
attach()
}
private setupEventListeners(): void {
if (!this.zg) return
this.zg.on('recvExperimentalAPI', (result: any) => {
const { method, content } = result
if (method === 'onRecvRoomChannelMessage') {
try {
const msg = JSON.parse(content.msgContent)
this.handleRoomMessage(msg)
} catch (e) {
console.error('Parse room message failed:', e)
}
}
})
// roomStreamUpdate handler in the real code tracks all remote streams;
// for the girlfriend project, setDigitalHumanStream() is used to mark the
// digital human video stream that should be rendered in the UI.
}
private handleRoomMessage(message: any): void {
// Forward ASR / LLM messages to whoever subscribed (the React hook)
// implementation omitted here; see repo for full version.
}
}
3.2 Digital Human API client
There is also a digitalHumanAPI client (client/src/services/digitalHumanAPI.ts) that wraps your Express routes behind a small axios wrapper.
// client/src/services/digitalHumanAPI.ts
import axios from 'axios'
import { config } from '../config'
const api = axios.create({
baseURL: config.API_BASE_URL,
timeout: 30000,
headers: { 'Content-Type': 'application/json' }
})
export const digitalHumanAPI = {
async startSession(roomId: string, userId: string) {
const requestData = {
room_id: roomId,
user_id: userId,
user_stream_id: `${userId}_stream`,
// digital_human_id: optional override if you don't want the default
}
const response = await api.post('/api/start-digital-human', requestData)
if (!response.data || !response.data.success) {
throw new Error(response.data?.error || 'Virtual girlfriend chat start failed')
}
return {
agentInstanceId: response.data.agentInstanceId,
agentStreamId: response.data.agentStreamId,
digitalHumanTaskId: response.data.digitalHumanTaskId,
digitalHumanVideoStreamId: response.data.digitalHumanVideoStreamId,
digitalHumanConfig: response.data.digitalHumanConfig,
roomId: response.data.roomId || roomId,
digitalHumanId: response.data.digitalHumanId,
unifiedDigitalHuman: response.data.unifiedDigitalHuman
}
},
async stopSession(agentInstanceId: string, digitalHumanTaskId?: string) {
if (!agentInstanceId) return
await api.post('/api/stop', { agent_instance_id: agentInstanceId })
if (digitalHumanTaskId) {
await api.post('/api/stop-digital-human', { task_id: digitalHumanTaskId })
}
},
async sendMessage(agentInstanceId: string, message: string) {
const trimmed = (message || '').trim()
if (!agentInstanceId || !trimmed) return
const response = await api.post('/api/send-message', {
agent_instance_id: agentInstanceId,
message: trimmed
})
if (!response.data?.success) {
throw new Error(response.data?.error || 'Message send failed')
}
},
async getToken(userId: string, roomId?: string) {
const params = new URLSearchParams({ user_id: userId })
if (roomId) params.append('room_id', roomId)
const response = await api.get(`/api/token?${params.toString()}`)
if (!response.data?.token) {
throw new Error('No token returned')
}
return { token: response.data.token }
}
}
3.3 React chat hook (useGirlfriendChat)
On the client, most of the chat logic lives in a custom hook: useGirlfriendChat (client/src/hooks/useGirlfriendChat.ts).
It owns:
- Connection state (isConnected, isRecording, agentStatus).
- Message history (messages: Message[]).
- Timing (startTime, isChatComplete).
- The two main actions: startChat() and endChat().
In rough shape:
interface ChatState {
messages: Message[]
session: ChatSession | null
isLoading: boolean
isConnected: boolean
isRecording: boolean
currentTranscript: string
agentStatus: 'idle' | 'listening' | 'thinking' | 'speaking'
error: string | null
questionsAsked: number
isChatComplete: boolean
startTime: number | null
}
export const useGirlfriendChat = () => {
const [state, dispatch] = useReducer(chatReducer, initialState)
const zegoService = useRef(ZegoService.getInstance())
const startChat = useCallback(async () => {
if (state.isLoading || state.isConnected) return false
dispatch({ type: 'SET_LOADING', payload: true })
dispatch({ type: 'SET_ERROR', payload: null })
dispatch({ type: 'SET_START_TIME', payload: Date.now() })
try {
const roomId = generateRtcId('chat')
const userId = generateRtcId('user')
await zegoService.current.initialize()
const sessionConfig = await digitalHumanAPI.startSession(roomId, userId)
const joined = await zegoService.current.joinRoom(sessionConfig.roomId || roomId, userId)
if (!joined) throw new Error('Failed to join ZEGO room')
if (sessionConfig.agentStreamId) {
zegoService.current.setAgentAudioStream(sessionConfig.agentStreamId)
}
if (sessionConfig.digitalHumanVideoStreamId) {
zegoService.current.setDigitalHumanStream(sessionConfig.digitalHumanVideoStreamId)
}
// store session metadata and kick off the first message
// ...
} finally {
dispatch({ type: 'SET_LOADING', payload: false })
}
}, [state.isLoading, state.isConnected])
const endChat = useCallback(async () => {
// disable mic, ask backend to stop the agent + digital human, leave room
}, [state.session, state.isConnected])
return { ...state, startChat, endChat /* + message helpers */ }
}
Internally, it also wires ZegoService room messages (ASR and LLM chunks) into your messages array, and flips agentStatus between listening, thinking, speaking, and idle so the UI can respond.
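One small helper the hook depends on is generateRtcId, used in startChat() to mint the room and user IDs. Any unique, URL-safe, length-limited string works; a minimal sketch (assumed implementation):

```typescript
// Hypothetical sketch of generateRtcId: unique, lowercase, URL-safe IDs
// that stay inside ZEGO's length limits.
export function generateRtcId(prefix: string): string {
  const rand = Math.random().toString(36).slice(2, 8)
  return `${prefix}_${Date.now().toString(36)}_${rand}`.slice(0, 32)
}
```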
3.4 Chat UI container
The main chat UI is implemented in client/src/components/Chat/ChatContainer.tsx. It connects your hook/state layer to presentational components like MessageBubble and VoiceMessageInput:
export const ChatContainer = () => {
const messagesEndRef = useRef<HTMLDivElement>(null)
const {
messages,
isLoading,
isConnected,
isRecording,
currentTranscript,
agentStatus,
session,
startSession,
sendTextMessage,
toggleVoiceRecording,
toggleVoiceSettings,
endSession
} = useChat()
const handleStartChat = async () => {
await startSession()
}
const handleEndChat = async () => {
await endSession()
}
return (
<motion.div className="flex flex-col h-full bg-gray-50">
<audio id="ai-audio-output" autoPlay style={{ display: 'none' }} playsInline />
{/* Header with title and Start/End Chat buttons */}
{/* Messages list using MessageBubble */}
{/* Typing indicator when agentStatus === 'thinking' */}
{isConnected && (
<VoiceMessageInput
onSendMessage={sendTextMessage}
isRecording={isRecording}
onToggleRecording={toggleVoiceRecording}
currentTranscript={currentTranscript}
isConnected={isConnected}
voiceEnabled={session?.voiceSettings.isEnabled || false}
onToggleVoice={toggleVoiceSettings}
agentStatus={agentStatus}
/>
)}
</motion.div>
)
}
In the real project, this component:
- Renders a header with the Virtual AI Girlfriend title and status text.
- Shows a welcome/empty state when there are no messages.
- Maps messages to MessageBubble components with a small “AI is thinking…” typing indicator.
- Attaches the VoiceMessageInput at the bottom once the chat is connected.
3.5 Girlfriend Room: the actual experience
client/src/components/Girlfriend/InterviewRoom.tsx (component name GirlfriendRoom) is where the hook gets turned into UI for the digital human + chat layout:
export interface ChatSummary {
duration: string
questionsCount: number
responsesCount: number
messages: Message[]
}
export const GirlfriendRoom = ({ onComplete }: { onComplete: (d: ChatSummary) => void }) => {
const [currentTime, setCurrentTime] = useState(Date.now())
const {
messages,
isLoading,
isConnected,
isRecording,
error,
agentStatus,
questionsAsked,
isChatComplete,
startTime,
startChat,
endChat
} = useGirlfriendChat()
useEffect(() => { void startChat() }, [])
// track duration and emit ChatSummary when the chat is done
useEffect(() => {
if (!isChatComplete || !startTime) return
const secs = Math.floor((Date.now() - startTime) / 1000)
onComplete({
duration: `${Math.floor(secs / 60)}:${(secs % 60).toString().padStart(2, '0')}`,
questionsCount: messages.filter(m => m.sender === 'ai').length,
responsesCount: messages.filter(m => m.sender === 'user').length,
messages
})
}, [isChatComplete, startTime, messages, questionsAsked, onComplete])
// header + status + avatar + chat panel omitted for brevity
}
The rest of the UI (welcome screen, VoiceMessageInput, chat bubbles) follows the same pattern: it listens to useGirlfriendChat state and renders a companion-style chat experience.
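The duration string built in that onComplete call follows a minutes:seconds pattern worth pulling into a helper if you reuse it elsewhere, for example:

```typescript
// Format elapsed milliseconds as m:ss, matching the ChatSummary.duration
// string assembled in GirlfriendRoom.
export function formatDuration(startTime: number, now: number): string {
  const secs = Math.max(0, Math.floor((now - startTime) / 1000))
  return `${Math.floor(secs / 60)}:${(secs % 60).toString().padStart(2, '0')}`
}
```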
4. Run It Locally and Talk to Her
With .env files filled in and dependencies installed, you can spin everything up:
# From /server
npm install
npm run dev
# From /client
npm install
npm run dev
Then:
- Open http://localhost:5173 in a Chromium-based browser.
- Click Start Chat.
- Allow microphone access when the browser prompts you.
- Wait a couple of seconds for the agent + digital human to spin up.
- Start talking – she should greet you and continue the conversation with short audio replies.
If something’s off:
- Check the browser console for VITE_* issues (missing or malformed env values).
- Check the server logs for RegisterAgent / CreateDigitalHumanAgentInstance errors.
- Inspect the network calls to /api/start-digital-human and /api/token.
Conclusion and next steps
This virtual girlfriend is deliberately minimal – just enough wiring to prove the concept and give you a clean starting point. From here you can:
- Add multiple personas – configure several agents (playful, serious, coach, language partner) and let users pick.
- Persist real memory – store summaries of past chats and prepend them to the LLM prompt for more continuity.
- Expose preferences – sliders for how talkative she is, how much she asks vs. listens, or which topics she should focus on.
- Integrate user accounts – save chat history per user and sync across devices.
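The "persist real memory" idea can start very small: summarize each finished chat, store the summary per user, and prepend it to the system prompt on the next session. A sketch of the prompt-building side (the storage layer is up to you):

```typescript
// Prepend a stored memory summary to the persona prompt so the LLM can
// refer back to earlier conversations. Purely illustrative.
export function buildSystemPrompt(basePrompt: string, memorySummary?: string): string {
  if (!memorySummary || !memorySummary.trim()) return basePrompt
  return `${basePrompt}\n\nWHAT YOU REMEMBER ABOUT THIS USER:\n${memorySummary.trim()}`
}
```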
You don’t have to think about streaming, codecs, or avatar animation – ZEGOCLOUD already does that. Your job is to decide who the AI feels like, what she remembers, and how the experience fits into your product.