How to Create an AI Therapist

Mental health support is becoming more accessible through technology. AI-powered therapy companions can provide 24/7 emotional support, active listening, and a judgment-free space for people to express their feelings. These systems combine real-time voice communication, natural language processing, and empathetic AI responses to create meaningful therapeutic conversations. In this guide, we will use ZEGOCLOUD to build a complete AI therapist application that listens to users through voice or text, processes their emotions with an LLM configured for therapeutic responses, and replies with natural, supportive guidance.

How to Develop an AI Therapist with ZEGOCLOUD

The architecture treats the AI therapist as a participant in a real-time communication room:

  • The user joins a ZEGOCLOUD room and publishes their microphone stream.
  • The AI agent joins the same room, listens to the user’s voice through automatic speech recognition (ASR), processes it with an LLM configured for therapy, and responds with text-to-speech (TTS).
  • Your web application manages the UI, displays the conversation history, and handles session state.

ZEGOCLOUD’s Conversational AI handles the integration between ASR, LLM, and TTS. You register an agent with your preferred LLM provider, configure voice settings, and create agent instances that automatically process voice input and generate spoken responses.

Prerequisites

Before starting, ensure you have:

  • A ZEGOCLOUD account with AI Agent services enabled (sign up in the ZEGOCLOUD console)
  • Node.js 18+ and npm installed
  • A valid AppID and ServerSecret from the ZEGOCLOUD console
  • A DashScope API key for the LLM (you can use zego_test during the trial period)
  • A modern browser with microphone access (Chrome or Edge recommended)
  • Basic knowledge of web development

Step 1. Project Setup

The complete implementation for this guide is available in the zego-therapist repository, along with a live demo.

1.1 Architecture Overview

The project is structured as:

  • Backend (server)
      • Express API with endpoints for /api/start, /api/stop, /api/send-message, and /api/token
      • ZEGOCLOUD signature generation for API authentication
      • Agent registration with therapeutic LLM configuration
      • Session management and cleanup
  • Frontend (client)
      • React application built with Vite and TypeScript
      • ZegoExpressEngine for WebRTC communication
      • Real-time message handling for ASR and LLM events
      • Conversation memory stored in browser localStorage
      • Voice and text input modes

The backend only provides REST endpoints. All real-time audio and message streaming happens through ZEGOCLOUD’s infrastructure.

1.2 Installing Dependencies and Environment

Create the project structure:

mkdir zego-therapist && cd zego-therapist
mkdir server client

Backend setup

cd server
npm init -y
npm install express cors dotenv axios typescript tsx
npm install --save-dev @types/express @types/cors @types/node

Create server/.env:

ZEGO_APP_ID=your_numeric_app_id
ZEGO_SERVER_SECRET=your_32_character_secret
DASHSCOPE_API_KEY=your_dashscope_api_key
PORT=8080

Add development scripts to server/package.json:

{
  "scripts": {
    "dev": "tsx watch src/server.ts",
    "build": "tsc",
    "start": "node dist/server.js"
  }
}

Frontend setup

cd ../client
npm create vite@latest . -- --template react-ts
npm install zego-express-engine-webrtc axios framer-motion lucide-react tailwindcss zod

Create client/.env:

VITE_ZEGO_APP_ID=your_numeric_app_id
VITE_ZEGO_SERVER=wss://webrtc-api.zegocloud.com/ws
VITE_API_BASE_URL=http://localhost:8080

Validate configuration with Zod:

// client/src/config.ts
import { z } from 'zod'

const configSchema = z.object({
  ZEGO_APP_ID: z.string().min(1, 'ZEGO App ID is required'),
  API_BASE_URL: z.string().url('Valid API base URL required'),
  ZEGO_SERVER: z.string().url('Valid ZEGO server URL required'),
})

const rawConfig = {
  ZEGO_APP_ID: import.meta.env.VITE_ZEGO_APP_ID,
  API_BASE_URL: import.meta.env.VITE_API_BASE_URL,
  ZEGO_SERVER: import.meta.env.VITE_ZEGO_SERVER || 'wss://webrtc-api.zegocloud.com/ws',
}

export const config = configSchema.parse(rawConfig)

This ensures the application fails immediately if environment variables are missing or invalid.

Step 2. Building the Therapy Agent Server

The backend manages ZEGOCLOUD authentication, agent registration, and session lifecycle.

2.1 ZEGOCLOUD API Authentication

ZEGOCLOUD APIs use MD5-based signature authentication:

// server/src/server.ts
import crypto from 'crypto'
import axios from 'axios'
import dotenv from 'dotenv'
dotenv.config()

const CONFIG = {
  ZEGO_APP_ID: process.env.ZEGO_APP_ID!,
  ZEGO_SERVER_SECRET: process.env.ZEGO_SERVER_SECRET!,
  ZEGO_API_BASE_URL: 'https://aigc-aiagent-api.zegotech.cn/',
  PORT: Number(process.env.PORT) || 8080, // used by app.listen() below
}

function generateZegoSignature(action: string) {
  const timestamp = Math.floor(Date.now() / 1000)
  const nonce = crypto.randomBytes(8).toString('hex')

  const signString = CONFIG.ZEGO_APP_ID + nonce + CONFIG.ZEGO_SERVER_SECRET + timestamp
  const signature = crypto.createHash('md5').update(signString).digest('hex')

  return {
    Action: action,
    AppId: CONFIG.ZEGO_APP_ID,
    SignatureNonce: nonce,
    SignatureVersion: '2.0',
    Timestamp: timestamp,
    Signature: signature
  }
}

async function makeZegoRequest(action: string, body: object = {}) {
  const queryParams = generateZegoSignature(action)
  const queryString = Object.entries(queryParams)
    .map(([k, v]) => `${k}=${encodeURIComponent(String(v))}`)
    .join('&')

  const url = `${CONFIG.ZEGO_API_BASE_URL}?${queryString}`
  const response = await axios.post(url, body, {
    headers: { 'Content-Type': 'application/json' },
    timeout: 30000
  })
  return response.data
}

This function generates the required signature for every ZEGOCLOUD API call.
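
Every management call in the rest of this guide (RegisterAgent, CreateAgentInstance, SendAgentInstanceLLM, DeleteAgentInstance) goes through this helper, and all responses share the same envelope, so a call follows this shape (the instance ID below is a placeholder):

// Illustrative usage of the helper; AgentInstanceId is a placeholder value
const result = await makeZegoRequest('DeleteAgentInstance', {
  AgentInstanceId: 'some_instance_id'
})
if (result.Code !== 0) {
  throw new Error(`DeleteAgentInstance failed: ${result.Message}`)
}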

2.2 Registering the Therapeutic AI Agent

The agent configuration defines how the AI therapist behaves:

// server/src/server.ts
let REGISTERED_AGENT_ID: string | null = null

async function registerAgent(): Promise<string> {
  if (REGISTERED_AGENT_ID) return REGISTERED_AGENT_ID

  const agentId = `agent_${Date.now()}`
  const agentConfig = {
    AgentId: agentId,
    Name: 'AI Therapist',
    LLM: {
      Url: 'https://dashscope.aliyuncs.com/compatible-mode/v1/chat/completions',
      ApiKey: 'zego_test',
      Model: 'qwen-plus',
      SystemPrompt: `You are a compassionate AI therapist. Listen actively, ask thoughtful questions, and provide supportive guidance. Use empathetic language and validate emotions. Keep responses conversational and under 100 words for natural voice flow. Focus on helping users explore their feelings and find their own solutions.`,
      Temperature: 0.7,
      TopP: 0.9,
      Params: { max_tokens: 200 }
    },
    TTS: {
      Vendor: 'ByteDance',
      Params: {
        app: { appid: 'zego_test', token: 'zego_test', cluster: 'volcano_tts' },
        speed_ratio: 1,
        volume_ratio: 1,
        pitch_ratio: 1,
        audio: { rate: 24000 }
      }
    },
    ASR: {
      Vendor: 'Tencent',
      Params: {
        engine_model_type: '16k_en',
        hotword_list: 'therapist|10,feelings|8,emotions|8,support|8'
      },
      VADSilenceSegmentation: 1500,
      PauseInterval: 2000
    }
  }

  const result = await makeZegoRequest('RegisterAgent', agentConfig)
  if (result.Code !== 0) {
    throw new Error(`RegisterAgent failed: ${result.Message}`)
  }

  REGISTERED_AGENT_ID = agentId
  return agentId
}

Key configuration points:

  • SystemPrompt: Defines the therapeutic personality and response style
  • Temperature: 0.7 provides balanced creativity and consistency
  • max_tokens: 200 keeps responses concise for natural voice flow
  • VADSilenceSegmentation: 1500ms pause before processing speech
  • PauseInterval: 2000ms wait time before finalizing transcription

The agent is registered once per server process and reused across all sessions.

2.3 Starting a Therapy Session

The /api/start endpoint creates an agent instance and connects it to a ZEGOCLOUD room:

// server/src/server.ts
import express from 'express'
import cors from 'cors'
import { createRequire } from 'module'
const require = createRequire(import.meta.url)
const { generateToken04 } = require('../zego-token.cjs')

const app = express()
app.use(express.json())
app.use(cors())

app.post('/api/start', async (req, res) => {
  const { room_id, user_id, user_stream_id } = req.body

  if (!room_id || !user_id) {
    res.status(400).json({ error: 'room_id and user_id required' })
    return
  }

  try {
    const agentId = await registerAgent()
    const userStreamId = user_stream_id || `${user_id}_stream`
    const agentUserId = `agent_${room_id}`
    const agentStreamId = `agent_stream_${room_id}`

    const instanceConfig = {
      AgentId: agentId,
      UserId: user_id,
      RTC: {
        RoomId: room_id,
        AgentUserId: agentUserId,
        AgentStreamId: agentStreamId,
        UserStreamId: userStreamId
      },
      MessageHistory: {
        SyncMode: 1,
        Messages: [],
        WindowSize: 10
      },
      AdvancedConfig: {
        InterruptMode: 0
      }
    }

    const result = await makeZegoRequest('CreateAgentInstance', instanceConfig)

    if (result.Code !== 0) {
      res.status(400).json({ error: result.Message })
      return
    }

    res.json({
      success: true,
      agentInstanceId: result.Data?.AgentInstanceId,
      agentUserId,
      agentStreamId,
      userStreamId
    })
  } catch (error) {
    // registerAgent and makeZegoRequest can throw; surface the failure instead of hanging the request
    res.status(500).json({ error: error instanceof Error ? error.message : 'Failed to start session' })
  }
})

The response includes the agentInstanceId needed for sending messages and cleanup.
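
Before wiring up the frontend, you can smoke-test the endpoint. Here is a sketch using the built-in fetch in Node 18+ (the room and user IDs are arbitrary test values):

// Quick smoke test for /api/start, assuming the server runs on localhost:8080
const res = await fetch('http://localhost:8080/api/start', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ room_id: 'room_test_1', user_id: 'user_test_1' })
})
console.log(await res.json()) // expect { success: true, agentInstanceId: ... }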

2.4 Sending Text Messages to the Agent

Users can type messages when they prefer text over voice:

// server/src/server.ts
app.post('/api/send-message', async (req, res) => {
  const { agent_instance_id, message } = req.body

  if (!agent_instance_id || !message) {
    res.status(400).json({ error: 'agent_instance_id and message required' })
    return
  }

  const result = await makeZegoRequest('SendAgentInstanceLLM', {
    AgentInstanceId: agent_instance_id,
    Text: message,
    AddQuestionToHistory: true,
    AddAnswerToHistory: true
  })

  if (result.Code !== 0) {
    res.status(400).json({ error: result.Message })
    return
  }

  res.json({ success: true })
})

The agent processes text messages the same way it processes voice transcriptions, maintaining conversation context through the message history.

2.5 Token Generation for WebRTC

The frontend needs a token to join ZEGOCLOUD rooms:

// server/src/server.ts
app.get('/api/token', (req, res) => {
  const userId = req.query.user_id as string
  const roomId = req.query.room_id as string

  if (!userId) {
    res.status(400).json({ error: 'user_id required' })
    return
  }

  const payload = {
    room_id: roomId || '',
    privilege: { 1: 1, 2: 1 },
    stream_id_list: null
  }

  const token = generateToken04(
    parseInt(CONFIG.ZEGO_APP_ID, 10),
    userId,
    CONFIG.ZEGO_SERVER_SECRET,
    3600,
    JSON.stringify(payload)
  )

  res.json({ token })
})

Tokens are valid for 3600 seconds (1 hour) and grant both publish and play privileges.

2.6 Stopping the Session

When the user ends the session, clean up the agent instance:

// server/src/server.ts
app.post('/api/stop', async (req, res) => {
  const { agent_instance_id } = req.body

  if (!agent_instance_id) {
    res.status(400).json({ error: 'agent_instance_id required' })
    return
  }

  const result = await makeZegoRequest('DeleteAgentInstance', {
    AgentInstanceId: agent_instance_id
  })

  if (result.Code !== 0) {
    res.status(400).json({ error: result.Message })
    return
  }

  res.json({ success: true })
})

app.listen(CONFIG.PORT, () => {
  console.log(`Server running on port ${CONFIG.PORT}`)
})

This releases resources and stops the AI agent from processing further input.
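
If the server process exits while sessions are still active, their agent instances are never explicitly deleted. A best-effort shutdown hook can help; this sketch assumes you also add each instance ID to activeInstances in /api/start and remove it in /api/stop (not shown in the handlers above):

// server/src/server.ts — sketch: best-effort cleanup on shutdown
// Assumes activeInstances is populated in /api/start and pruned in /api/stop
const activeInstances = new Set<string>()

process.on('SIGINT', async () => {
  for (const id of activeInstances) {
    await makeZegoRequest('DeleteAgentInstance', { AgentInstanceId: id }).catch(() => {})
  }
  process.exit(0)
})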

Step 3. WebRTC Integration with ZegoExpressEngine

The frontend uses ZegoExpressEngine to handle all real-time communication. A service class wraps the SDK to provide a clean interface for the React application.

3.1 Initializing the ZEGO Service

// client/src/services/zego.ts
import { ZegoExpressEngine } from 'zego-express-engine-webrtc'
import { agentAPI } from './api' // used in joinRoom() to fetch the room token
import { config } from '../config'

export class ZegoService {
  private static instance: ZegoService
  private zg: ZegoExpressEngine | null = null
  private isInitialized = false
  private currentRoomId: string | null = null
  private currentUserId: string | null = null
  private localStream: any = null
  private audioElement: HTMLAudioElement | null = null

  static getInstance(): ZegoService {
    if (!ZegoService.instance) {
      ZegoService.instance = new ZegoService()
    }
    return ZegoService.instance
  }

  async initialize(): Promise<void> {
    if (this.isInitialized) return

    this.zg = new ZegoExpressEngine(
      parseInt(config.ZEGO_APP_ID), 
      config.ZEGO_SERVER
    )

    this.setupEventListeners()
    this.setupAudioElement()
    this.isInitialized = true
  }

  private setupAudioElement(): void {
    this.audioElement = document.getElementById('ai-audio-output') as HTMLAudioElement
    if (!this.audioElement) {
      this.audioElement = document.createElement('audio')
      this.audioElement.id = 'ai-audio-output'
      this.audioElement.autoplay = true
      this.audioElement.style.display = 'none'
      document.body.appendChild(this.audioElement)
    }
    this.audioElement.volume = 0.8
  }
}

The service uses a singleton pattern to ensure only one ZEGO engine instance exists per browser tab.
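
Consumers never construct the class directly; they always go through getInstance, for example:

// Anywhere in the client: same engine instance every time
const zego = ZegoService.getInstance()
await zego.initialize() // no-op after the first call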

3.2 Joining Rooms and Publishing Audio

// client/src/services/zego.ts (continued)
async joinRoom(roomId: string, userId: string): Promise<boolean> {
  if (!this.zg) return false

  if (this.currentRoomId === roomId && this.currentUserId === userId) {
    return true
  }

  try {
    if (this.currentRoomId) {
      await this.leaveRoom()
    }

    this.currentRoomId = roomId
    this.currentUserId = userId

    const { token } = await agentAPI.getToken(userId)

    await this.zg.loginRoom(roomId, token, {
      userID: userId,
      userName: userId
    })

    this.zg.callExperimentalAPI({ 
      method: 'onRecvRoomChannelMessage', 
      params: {} 
    })

    const localStream = await this.zg.createZegoStream({
      camera: { video: false, audio: true }
    })

    if (localStream) {
      this.localStream = localStream
      const streamId = `${userId}_stream`

      await this.zg.startPublishingStream(streamId, localStream)
      return true
    }

    throw new Error('Failed to create local stream')
  } catch (error) {
    console.error('Failed to join room:', error)
    this.currentRoomId = null
    this.currentUserId = null
    return false
  }
}

async enableMicrophone(enabled: boolean): Promise<boolean> {
  if (!this.localStream) return false

  const audioTrack = this.localStream.getAudioTracks()[0]
  if (audioTrack) {
    audioTrack.enabled = enabled
    return true
  }
  return false
}

The enableMicrophone method controls whether the user’s voice is transmitted to the AI agent.

3.3 Handling Remote Streams and Room Messages

// client/src/services/zego.ts (continued)
private setupEventListeners(): void {
  if (!this.zg) return

  this.zg.on('recvExperimentalAPI', (result: any) => {
    const { method, content } = result
    if (method === 'onRecvRoomChannelMessage') {
      try {
        const message = JSON.parse(content.msgContent)
        this.handleRoomMessage(message)
      } catch (error) {
        console.error('Failed to parse room message:', error)
      }
    }
  })

  this.zg.on('roomStreamUpdate', async (_roomID, updateType, streamList) => {
    if (updateType === 'ADD') {
      for (const stream of streamList) {
        const userStreamId = this.currentUserId ? `${this.currentUserId}_stream` : null

        if (userStreamId && stream.streamID === userStreamId) {
          continue
        }

        try {
          const mediaStream = await this.zg!.startPlayingStream(stream.streamID)
          if (mediaStream) {
            const remoteView = await this.zg!.createRemoteStreamView(mediaStream)
            if (remoteView && this.audioElement) {
              await remoteView.play(this.audioElement, { 
                enableAutoplayDialog: false,
                muted: false
              })
            }
          }
        } catch (error) {
          console.error('Failed to play agent stream:', error)
        }
      }
    }
  })
}

private messageCallback: ((message: any) => void) | null = null

private handleRoomMessage(message: any): void {
  if (this.messageCallback) {
    this.messageCallback(message)
  }
}

onRoomMessage(callback: (message: any) => void): void {
  this.messageCallback = callback
}

Room messages contain ASR transcriptions and LLM responses. The callback pattern allows React components to handle these events without tight coupling to the ZEGO service.
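
A hook subscribes once and receives every parsed message; a minimal sketch:

// Sketch: listening for agent events outside the service
ZegoService.getInstance().onRoomMessage((message) => {
  // Cmd 3 = ASR transcription, Cmd 4 = LLM response (handled in Step 4.2)
  console.log('Room message:', message.Cmd, message.Data)
})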

Step 4. React Chat Interface

The React application manages conversation state, displays messages, and provides voice and text input options.

4.1 Chat State Management with useReducer

// client/src/hooks/useChat.ts
import { useCallback, useRef, useReducer } from 'react'
import { ZegoService } from '../services/zego'
import { agentAPI } from '../services/api'
import { memoryService } from '../services/memory'
import type { Message, ChatSession } from '../types'

interface ChatState {
  messages: Message[]
  session: ChatSession | null
  isLoading: boolean
  isConnected: boolean
  isRecording: boolean
  currentTranscript: string
  agentStatus: 'idle' | 'listening' | 'thinking' | 'speaking'
  error: string | null
}

type ChatAction = 
  | { type: 'ADD_MESSAGE'; payload: Message }
  | { type: 'SET_CONNECTED'; payload: boolean }
  | { type: 'SET_RECORDING'; payload: boolean }
  | { type: 'SET_TRANSCRIPT'; payload: string }
  | { type: 'SET_AGENT_STATUS'; payload: 'idle' | 'listening' | 'thinking' | 'speaking' }
  // ... other actions

function chatReducer(state: ChatState, action: ChatAction): ChatState {
  switch (action.type) {
    case 'ADD_MESSAGE':
      return { ...state, messages: [...state.messages, action.payload] }
    case 'SET_CONNECTED':
      return { ...state, isConnected: action.payload }
    case 'SET_RECORDING':
      return { ...state, isRecording: action.payload }
    case 'SET_TRANSCRIPT':
      return { ...state, currentTranscript: action.payload }
    case 'SET_AGENT_STATUS':
      return { ...state, agentStatus: action.payload }
    default:
      return state
  }
}

export const useChat = () => {
  const [state, dispatch] = useReducer(chatReducer, {
    messages: [],
    session: null,
    isLoading: false,
    isConnected: false,
    isRecording: false,
    currentTranscript: '',
    agentStatus: 'idle',
    error: null
  })

  const zegoService = useRef(ZegoService.getInstance())
  const processedMessageIds = useRef(new Set<string>())

  // ... implementation continues
}

Using useReducer provides predictable state updates and makes it easier to debug complex state transitions.

4.2 Processing ASR and LLM Events

// client/src/hooks/useChat.ts (continued)
const setupMessageHandlers = useCallback((conversationId: string) => {
  const handleRoomMessage = (data: any) => {
    const { Cmd, Data: msgData } = data

    // Cmd 3: ASR transcription events
    if (Cmd === 3) {
      const { Text: transcript, EndFlag, MessageId } = msgData

      if (transcript && transcript.trim()) {
        dispatch({ type: 'SET_TRANSCRIPT', payload: transcript })
        dispatch({ type: 'SET_AGENT_STATUS', payload: 'listening' })

        if (EndFlag) {
          const userMessage: Message = {
            id: MessageId || `voice_${Date.now()}`,
            content: transcript.trim(),
            sender: 'user',
            timestamp: Date.now(),
            type: 'voice'
          }

          dispatch({ type: 'ADD_MESSAGE', payload: userMessage })
          memoryService.addMessage(conversationId, userMessage)
          dispatch({ type: 'SET_TRANSCRIPT', payload: '' })
          dispatch({ type: 'SET_AGENT_STATUS', payload: 'thinking' })
        }
      }
    }

    // Cmd 4: LLM response events
    if (Cmd === 4) {
      const { Text: content, MessageId, EndFlag } = msgData
      if (!content || !MessageId) return

      dispatch({ type: 'SET_AGENT_STATUS', payload: 'speaking' })

      if (EndFlag) {
        const aiMessage: Message = {
          id: MessageId,
          content,
          sender: 'ai',
          timestamp: Date.now(),
          type: 'text'
        }

        dispatch({ type: 'ADD_MESSAGE', payload: aiMessage })
        memoryService.addMessage(conversationId, aiMessage)
        dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
      }
    }
  }

  zegoService.current.onRoomMessage(handleRoomMessage)
}, [])

The agent status transitions through listening → thinking → speaking → idle, providing visual feedback to the user about what the AI is doing.
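
Summarizing the mapping the handler above implements:

// Room-message events -> UI status (as implemented in handleRoomMessage):
//   Cmd 3, partial transcript -> 'listening'
//   Cmd 3 with EndFlag        -> 'thinking'  (user turn finalized)
//   Cmd 4, streaming response -> 'speaking'
//   Cmd 4 with EndFlag        -> 'idle'      (AI turn complete)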

4.3 Starting and Ending Sessions

// client/src/hooks/useChat.ts (continued)
const startSession = useCallback(async (): Promise<boolean> => {
  if (state.isLoading || state.isConnected) return false

  dispatch({ type: 'SET_LOADING', payload: true })

  try {
    const roomId = `room_${Date.now()}_${Math.random().toString(36).substr(2, 6)}`
    const userId = `user_${Date.now()}_${Math.random().toString(36).substr(2, 6)}`

    await zegoService.current.initialize()

    const joinResult = await zegoService.current.joinRoom(roomId, userId)
    if (!joinResult) throw new Error('Failed to join ZEGO room')

    const result = await agentAPI.startSession(roomId, userId)

    const conversation = memoryService.createOrGetConversation()

    const newSession: ChatSession = {
      roomId,
      userId,
      agentInstanceId: result.agentInstanceId,
      isActive: true,
      conversationId: conversation.id
    }

    dispatch({ type: 'SET_SESSION', payload: newSession })
    dispatch({ type: 'SET_CONNECTED', payload: true })

    setupMessageHandlers(conversation.id)

    return true
  } catch (error) {
    dispatch({ type: 'SET_ERROR', payload: error instanceof Error ? error.message : 'Failed to start session' })
    return false
  } finally {
    dispatch({ type: 'SET_LOADING', payload: false })
  }
}, [state.isLoading, state.isConnected, setupMessageHandlers])

const endSession = useCallback(async () => {
  if (!state.session) return

  try {
    if (state.isRecording) {
      await zegoService.current.enableMicrophone(false)
      dispatch({ type: 'SET_RECORDING', payload: false })
    }

    if (state.session.agentInstanceId) {
      await agentAPI.stopSession(state.session.agentInstanceId)
    }

    await zegoService.current.leaveRoom()

    dispatch({ type: 'SET_SESSION', payload: null })
    dispatch({ type: 'SET_CONNECTED', payload: false })
    dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
  } catch (error) {
    console.error('Failed to end session:', error)
  }
}, [state.session, state.isRecording])

Sessions are isolated by unique room IDs, allowing multiple users to have concurrent therapy sessions without interference.

4.4 Voice and Text Input

// client/src/hooks/useChat.ts (continued)
const sendTextMessage = useCallback(async (content: string) => {
  if (!state.session?.agentInstanceId) return

  const trimmedContent = content.trim()
  if (!trimmedContent) return

  try {
    const userMessage: Message = {
      id: `text_${Date.now()}`,
      content: trimmedContent,
      sender: 'user',
      timestamp: Date.now(),
      type: 'text'
    }

    dispatch({ type: 'ADD_MESSAGE', payload: userMessage })
    // The conversation ID lives on the session object created in startSession
    memoryService.addMessage(state.session.conversationId, userMessage)
    dispatch({ type: 'SET_AGENT_STATUS', payload: 'thinking' })

    await agentAPI.sendMessage(state.session.agentInstanceId, trimmedContent)
  } catch (error) {
    dispatch({ type: 'SET_ERROR', payload: 'Failed to send message' })
    dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
  }
}, [state.session])

const toggleVoiceRecording = useCallback(async () => {
  if (!state.isConnected) return

  try {
    if (state.isRecording) {
      await zegoService.current.enableMicrophone(false)
      dispatch({ type: 'SET_RECORDING', payload: false })
      dispatch({ type: 'SET_AGENT_STATUS', payload: 'idle' })
    } else {
      const success = await zegoService.current.enableMicrophone(true)
      if (success) {
        dispatch({ type: 'SET_RECORDING', payload: true })
        dispatch({ type: 'SET_AGENT_STATUS', payload: 'listening' })
      }
    }
  } catch (error) {
    console.error('Failed to toggle recording:', error)
  }
}, [state.isConnected, state.isRecording])

return {
  ...state,
  startSession,
  sendTextMessage,
  toggleVoiceRecording,
  endSession
}

The hook exposes a clean API that React components can use without understanding the underlying ZEGOCLOUD or API details.

4.5 Conversation Memory Service

Conversations are stored in browser localStorage to persist across page refreshes:

// client/src/services/memory.ts
import type { ConversationMemory, Message } from '../types'

class MemoryService {
  private conversations: Map<string, ConversationMemory> = new Map()

  constructor() {
    this.loadFromStorage()
  }

  private loadFromStorage(): void {
    try {
      const stored = localStorage.getItem('ai_conversations')
      if (stored) {
        const conversations: ConversationMemory[] = JSON.parse(stored)
        conversations.forEach(conv => {
          this.conversations.set(conv.id, conv)
        })
      }
    } catch {
      // Corrupted storage should not break the app; start with an empty history
      localStorage.removeItem('ai_conversations')
    }
  }

  private saveToStorage(): void {
    const conversations = Array.from(this.conversations.values())
    localStorage.setItem('ai_conversations', JSON.stringify(conversations))
  }

  createOrGetConversation(id?: string): ConversationMemory {
    const conversationId = id || `conv_${Date.now()}_${Math.random().toString(36).substr(2, 9)}`

    if (this.conversations.has(conversationId)) {
      return this.conversations.get(conversationId)!
    }

    const newConversation: ConversationMemory = {
      id: conversationId,
      title: 'New Conversation',
      messages: [],
      createdAt: Date.now(),
      updatedAt: Date.now()
    }

    this.conversations.set(conversationId, newConversation)
    this.saveToStorage()
    return newConversation
  }

  addMessage(conversationId: string, message: Message): void {
    const conversation = this.conversations.get(conversationId)
    if (!conversation) return

    conversation.messages.push(message)
    conversation.updatedAt = Date.now()

    if (conversation.messages.length === 1 && message.sender === 'user') {
      conversation.title = message.content.slice(0, 50)
    }

    this.saveToStorage()
  }

  getAllConversations(): ConversationMemory[] {
    return Array.from(this.conversations.values())
      .sort((a, b) => b.updatedAt - a.updatedAt)
  }

  deleteConversation(conversationId: string): void {
    this.conversations.delete(conversationId)
    this.saveToStorage()
  }
}

export const memoryService = new MemoryService()

This allows users to review past therapy sessions and continue previous conversations.
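
A history view can read straight from the service, for example:

// Sketch: listing past sessions, most recent first
for (const conv of memoryService.getAllConversations()) {
  console.log(conv.title, new Date(conv.updatedAt).toLocaleString(), `${conv.messages.length} messages`)
}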

Step 5. Building the UI Components

The interface provides a clean, distraction-free environment for therapy sessions.

5.1 Therapy Session Component

// client/src/components/TherapySession.tsx
import { useEffect, useRef } from 'react'
import { motion } from 'framer-motion'
import { MessageBubble } from './Chat/MessageBubble'
import { VoiceInput } from './VoiceInput'
import { useChat } from '../hooks/useChat'
import { Heart, Phone, PhoneOff } from 'lucide-react'

export const TherapySession = () => {
  const messagesEndRef = useRef<HTMLDivElement>(null)
  const { 
    messages, 
    isLoading, 
    isConnected, 
    isRecording,
    currentTranscript,
    agentStatus,
    startSession, 
    sendTextMessage, 
    toggleVoiceRecording,
    endSession
  } = useChat()

  useEffect(() => {
    messagesEndRef.current?.scrollIntoView({ behavior: 'smooth' })
  }, [messages])

  if (!isConnected && messages.length === 0) {
    return (
      <div className="flex flex-col h-full bg-black">
        <audio id="ai-audio-output" autoPlay style={{ display: 'none' }} />

        <div className="flex-1 flex flex-col items-center justify-center">
          <motion.div initial={{ opacity: 0, y: 20 }} animate={{ opacity: 1, y: 0 }}>
            <div className="w-24 h-24 bg-gradient-to-br from-purple-600 to-purple-700 rounded-full flex items-center justify-center mb-8 mx-auto">
              <Heart className="w-12 h-12 text-white" />
            </div>

            <h2 className="text-3xl font-semibold mb-4">Welcome to Your Safe Space</h2>
            <p className="text-gray-400 mb-10 max-w-md">
              A judgment-free zone where you can express yourself freely.
            </p>

            <button
              onClick={startSession}
              disabled={isLoading}
              className="px-8 py-4 bg-purple-600 hover:bg-purple-700 rounded-full flex items-center space-x-3 mx-auto"
            >
              <Phone className="w-5 h-5" />
              <span>{isLoading ? 'Starting...' : 'Start Session'}</span>
            </button>
          </motion.div>
        </div>
      </div>
    )
  }

  return (
    <div className="flex flex-col h-full bg-black">
      <audio id="ai-audio-output" autoPlay style={{ display: 'none' }} />

      {/* Status Bar */}
      <div className="bg-gray-900/50 border-b border-gray-800 px-6 py-3">
        <div className="flex items-center justify-between">
          <div className="flex items-center space-x-3">
            <div className={`w-3 h-3 rounded-full ${isConnected ? 'bg-green-400 animate-pulse' : 'bg-gray-600'}`} />
            <span className="text-sm text-gray-400">
              {agentStatus === 'listening' && 'Listening...'}
              {agentStatus === 'thinking' && 'Processing...'}
              {agentStatus === 'speaking' && 'Responding...'}
              {agentStatus === 'idle' && 'Connected'}
            </span>
          </div>

          {isConnected && (
            <button
              onClick={endSession}
              className="px-4 py-2 bg-red-600/80 hover:bg-red-600 rounded-lg flex items-center space-x-2"
            >
              <PhoneOff className="w-4 h-4" />
              <span>End Session</span>
            </button>
          )}
        </div>
      </div>

      {/* Messages */}
      <div className="flex-1 overflow-y-auto px-6 py-6">
        {messages.map((message) => (
          <MessageBubble key={message.id} message={message} />
        ))}

        {agentStatus === 'thinking' && (
          <motion.div
            initial={{ opacity: 0, y: 20 }}
            animate={{ opacity: 1, y: 0 }}
            className="flex justify-start mb-6"
          >
            <div className="flex items-center space-x-3">
              <div className="w-10 h-10 bg-gradient-to-br from-purple-600 to-purple-700 rounded-full flex items-center justify-center">
                <Heart className="w-5 h-5 text-white" />
              </div>
              <div className="bg-gray-800 rounded-2xl px-5 py-3">
                <div className="flex space-x-1">
                  <div className="w-2 h-2 bg-purple-400 rounded-full animate-bounce" />
                  <div className="w-2 h-2 bg-purple-400 rounded-full animate-bounce" style={{ animationDelay: '0.1s' }} />
                  <div className="w-2 h-2 bg-purple-400 rounded-full animate-bounce" style={{ animationDelay: '0.2s' }} />
                </div>
              </div>
            </div>
          </motion.div>
        )}

        <div ref={messagesEndRef} />
      </div>

      {/* Input */}
      {isConnected && (
        <VoiceInput 
          onSendMessage={sendTextMessage}
          isRecording={isRecording}
          onToggleRecording={toggleVoiceRecording}
          currentTranscript={currentTranscript}
          agentStatus={agentStatus}
        />
      )}
    </div>
  )
}

The component handles three states: welcome screen, active session, and message display.

5.2 Voice Input Component

// client/src/components/VoiceInput.tsx
import { useState } from 'react'
import { motion } from 'framer-motion'
import { Mic, MicOff, Send, Type } from 'lucide-react'

interface VoiceInputProps {
  onSendMessage: (message: string) => void
  isRecording: boolean
  onToggleRecording: () => void
  currentTranscript: string
  agentStatus: 'idle' | 'listening' | 'thinking' | 'speaking'
}

export const VoiceInput = ({ 
  onSendMessage, 
  isRecording, 
  onToggleRecording, 
  currentTranscript,
  agentStatus 
}: VoiceInputProps) => {
  const [textInput, setTextInput] = useState('')
  const [inputMode, setInputMode] = useState<'voice' | 'text'>('voice')

  const handleSendText = () => {
    if (textInput.trim()) {
      onSendMessage(textInput.trim())
      setTextInput('')
    }
  }

  const isDisabled = agentStatus === 'thinking' || agentStatus === 'speaking'

  return (
    <div className="bg-gray-900 border-t border-gray-800 p-4">
      <div className="max-w-4xl mx-auto">
        {/* Mode Toggle */}
        <div className="flex justify-center mb-4">
          <div className="bg-gray-800 rounded-lg p-1 flex">
            <button
              onClick={() => setInputMode('voice')}
              className={`px-4 py-2 rounded-md flex items-center space-x-2 ${
                inputMode === 'voice' ? 'bg-purple-600 text-white' : 'text-gray-400'
              }`}
            >
              <Mic className="w-4 h-4" />
              <span>Voice</span>
            </button>
            <button
              onClick={() => setInputMode('text')}
              className={`px-4 py-2 rounded-md flex items-center space-x-2 ${
                inputMode === 'text' ? 'bg-purple-600 text-white' : 'text-gray-400'
              }`}
            >
              <Type className="w-4 h-4" />
              <span>Text</span>
            </button>
          </div>
        </div>

        {inputMode === 'voice' ? (
          <div className="flex flex-col items-center space-y-4">
            {currentTranscript && (
              <motion.div
                initial={{ opacity: 0, y: 10 }}
                animate={{ opacity: 1, y: 0 }}
                className="bg-gray-800 rounded-lg p-4 max-w-2xl w-full"
              >
                <p className="text-gray-300">{currentTranscript}</p>
              </motion.div>
            )}

            <motion.button
              onClick={onToggleRecording}
              disabled={isDisabled}
              className={`w-16 h-16 rounded-full flex items-center justify-center ${
                isRecording ? 'bg-red-600 hover:bg-red-700' : 'bg-purple-600 hover:bg-purple-700'
              } ${isDisabled ? 'opacity-50 cursor-not-allowed' : ''}`}
              whileTap={{ scale: 0.95 }}
            >
              {isRecording ? <MicOff className="w-6 h-6 text-white" /> : <Mic className="w-6 h-6 text-white" />}
            </motion.button>

            <p className="text-sm text-gray-400">
              {isRecording ? 'Tap to stop recording' : 'Tap to start speaking'}
            </p>
          </div>
        ) : (
          <div className="flex items-end space-x-3">
            <textarea
              value={textInput}
              onChange={(e) => setTextInput(e.target.value)}
              onKeyDown={(e) => {
                // Send on Enter; Shift+Enter inserts a newline
                if (e.key === 'Enter' && !e.shiftKey) {
                  e.preventDefault()
                  handleSendText()
                }
              }}
              placeholder="Type your message..."
              disabled={isDisabled}
              className="flex-1 bg-gray-800 border border-gray-700 rounded-lg px-4 py-3 text-white resize-none focus:outline-none focus:border-purple-500"
              rows={1}
            />
            <button
              onClick={handleSendText}
              disabled={!textInput.trim() || isDisabled}
              className="w-12 h-12 bg-purple-600 hover:bg-purple-700 disabled:opacity-50 rounded-lg flex items-center justify-center"
            >
              <Send className="w-5 h-5 text-white" />
            </button>
          </div>
        )}
      </div>
    </div>
  )
}

Users can switch between voice and text input based on their preference or environment.

5.3 Message Bubble Component

// client/src/components/Chat/MessageBubble.tsx
import { motion } from 'framer-motion'
import { User, Heart } from 'lucide-react'
import type { Message } from '../../types'

export const MessageBubble = ({ message }: { message: Message }) => {
  const isUser = message.sender === 'user'

  return (
    <motion.div
      initial={{ opacity: 0, y: 20 }}
      animate={{ opacity: 1, y: 0 }}
      className={`flex mb-6 ${isUser ? 'justify-end' : 'justify-start'}`}
    >
      <div className={`flex items-start space-x-3 max-w-2xl ${isUser ? 'flex-row-reverse space-x-reverse' : ''}`}>
        <div className={`w-10 h-10 rounded-full flex items-center justify-center ${
          isUser ? 'bg-gradient-to-br from-blue-600 to-blue-700' : 'bg-gradient-to-br from-purple-600 to-purple-700'
        }`}>
          {isUser ? <User className="w-5 h-5 text-white" /> : <Heart className="w-5 h-5 text-white" />}
        </div>

        <div className={`rounded-2xl px-5 py-3 ${
          isUser ? 'bg-blue-600 text-white' : 'bg-gray-800 text-gray-100'
        }`}>
          <p className="text-sm leading-relaxed whitespace-pre-wrap">{message.content}</p>
        </div>
      </div>
    </motion.div>
  )
}

Messages animate in smoothly and use distinct colors for user and AI responses.

Step 6. API Client Service

The frontend communicates with the backend through a centralized API service:

// client/src/services/api.ts
import axios from 'axios'
import { config } from '../config'

const api = axios.create({
  baseURL: config.API_BASE_URL,
  timeout: 30000,
  headers: { 'Content-Type': 'application/json' }
})

export const agentAPI = {
  async startSession(roomId: string, userId: string) {
    const response = await api.post('/api/start', {
      room_id: roomId,
      user_id: userId,
      user_stream_id: `${userId}_stream`,
    })

    if (!response.data?.success) {
      throw new Error(response.data?.error || 'Session start failed')
    }

    return {
      agentInstanceId: response.data.agentInstanceId
    }
  },

  async sendMessage(agentInstanceId: string, message: string) {
    const response = await api.post('/api/send-message', {
      agent_instance_id: agentInstanceId,
      message: message.trim(),
    })

    if (!response.data?.success) {
      throw new Error(response.data?.error || 'Message send failed')
    }
  },

  async stopSession(agentInstanceId: string) {
    await api.post('/api/stop', {
      agent_instance_id: agentInstanceId,
    })
  },

  async getToken(userId: string) {
    const response = await api.get(`/api/token?user_id=${encodeURIComponent(userId)}`)

    if (!response.data?.token) {
      throw new Error('No token returned')
    }

    return { token: response.data.token }
  }
}

This abstraction makes it easy to add error handling, retry logic, or request interceptors in one place.
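
For example, a response interceptor registered here applies to every call above (a sketch; tailor the handling to your needs):

// Sketch: centralized error logging for all agentAPI requests
api.interceptors.response.use(
  (response) => response,
  (error) => {
    console.error('API request failed:', error.response?.data ?? error.message)
    return Promise.reject(error)
  }
)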

Step 7. Running and Testing the Application

7.1 Starting the Backend

From the server directory:

npm install
npm run dev

Verify the server is running by checking http://localhost:8080/health. You should see:

{
  "status": "healthy",
  "timestamp": "2025-12-21T08:00:00.000Z",
  "registered": false,
  "config": {
    "appId": true,
    "serverSecret": true,
    "dashscope": true
  }
}

The registered field will become true after the first session starts and the agent is registered.
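
The server code in Step 2 doesn't show this route; here is a minimal sketch that produces the response above:

// server/src/server.ts — assumed /health route matching the response shown above
app.get('/health', (_req, res) => {
  res.json({
    status: 'healthy',
    timestamp: new Date().toISOString(),
    registered: REGISTERED_AGENT_ID !== null,
    config: {
      appId: Boolean(CONFIG.ZEGO_APP_ID),
      serverSecret: Boolean(CONFIG.ZEGO_SERVER_SECRET),
      dashscope: Boolean(process.env.DASHSCOPE_API_KEY)
    }
  })
})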

7.2 Starting the Frontend

From the client directory:

npm install
npm run dev

Open http://localhost:5173 in Chrome or Edge. You should see the welcome screen with a purple heart icon and “Start Session” button.

7.3 Testing the Therapy Session

  • Click “Start Session”: The application will initialize ZEGOCLOUD, join a room, and create an AI agent instance. This takes 2-3 seconds.
  • Grant microphone permission: Your browser will request microphone access. Click “Allow”.
  • Choose input mode: The default is voice input. You can switch to text by clicking the “Text” button.
  • Voice input: Click the purple microphone button and speak. You will see your words transcribed in real-time. When you stop speaking for 1.5 seconds, the transcription is finalized and sent to the AI.
  • AI response: The status indicator changes to “Processing…” while the LLM generates a response, then “Responding…” while the TTS audio plays. You will hear the AI therapist’s voice through your speakers.
  • Text input: Switch to text mode and type a message. Press Enter or click the send button. The AI will respond the same way as with voice input.
  • End session: Click “End Session” to stop the AI agent and leave the room. Your conversation is saved in browser localStorage.

Step 8. Customizing the Therapeutic Experience

8.1 Adjusting the AI Personality

Modify the SystemPrompt in server/src/server.ts:

SystemPrompt: `You are a cognitive behavioral therapist specializing in anxiety management. 
Guide users through identifying negative thought patterns and developing coping strategies. 
Ask specific questions about triggers and physical symptoms. 
Suggest evidence-based techniques like progressive muscle relaxation and cognitive restructuring.
Keep responses under 100 words.`

Different prompts create different therapeutic approaches (see the sketch after this list):

  • Mindfulness coach: Focus on present-moment awareness and breathing exercises
  • Solution-focused therapist: Help users identify goals and actionable steps
  • Psychodynamic therapist: Explore past experiences and their influence on current feelings
  • Crisis counselor: Provide immediate support and safety planning
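
One way to manage several personas is a prompt map selected by an environment variable. Here is a sketch (THERAPIST_PERSONA is a hypothetical variable, and the prompts are starting points to refine):

// server/src/server.ts — sketch: selecting a persona at registration time
const PERSONA_PROMPTS: Record<string, string> = {
  cbt: 'You are a cognitive behavioral therapist specializing in anxiety management. Keep responses under 100 words.',
  mindfulness: 'You are a mindfulness coach. Guide users toward present-moment awareness and breathing exercises. Keep responses under 100 words.',
  solution_focused: 'You are a solution-focused therapist. Help users identify goals and actionable next steps. Keep responses under 100 words.'
}

const persona = process.env.THERAPIST_PERSONA || 'cbt'
const systemPrompt = PERSONA_PROMPTS[persona] ?? PERSONA_PROMPTS.cbt
// Pass systemPrompt as LLM.SystemPrompt inside registerAgent()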

Conclusion

Your AI therapist is ready to provide 24/7 mental health support through natural voice conversations. The system combines ZEGOCLOUD’s real-time infrastructure with therapeutic LLM prompts to create a safe, judgment-free space for users.

You can extend this foundation with emotion detection, session analytics, or multi-language support. The same pattern works for coaching apps, meditation guides, or peer support platforms where empathetic conversation matters most.

FAQ

Q1. Can I build my own AI app?

Yes, you can build your own AI app by combining AI models with application logic and user interfaces. Many developers use existing AI APIs and SDKs to speed up development without building everything from scratch.

Q2. How to turn AI into a therapist?

To turn AI into a therapist, you need to design conversational flows focused on emotional support, guided questions, and safe responses. This often involves using conversational AI models, clear boundaries, and supportive language rather than medical diagnosis.

Q3. How much does it cost to build an AI app?

The cost of building an AI app depends on features, scale, and infrastructure. Simple apps may cost a few thousand dollars, while more advanced AI apps with real-time interaction and compliance requirements can cost significantly more.

Q4. How to create your own mental health app?

You can create a mental health app by combining AI-driven conversations with features like mood tracking, journaling, or guided exercises. It is also important to consider user privacy, data security, and ethical guidelines during development.
