Video Digital Human Shooting Guide
This article introduces how to capture your avatar and voice samples.
Avatar and voice capture can be done separately; you don't need to use a camera to record audio.
Prerequisites
Please contact your business or pre-sales representative to discuss your use cases and requirements. We will provide shooting recommendations based on your specific needs.
Avatar Capture
The avatar capture process consists of four steps: prepare hardware, set up the site, shoot the model, and submit files.
1 Prepare Hardware
Please configure your shooting hardware to meet either of the following two sets of parameter requirements.
| | Parameter Requirement 1 | Parameter Requirement 2 |
|---|---|---|
| Recording Specification |
|
|
| Recording Duration | Greater than 12 min | Greater than 12 min |
| Camera Encoding Format, Bitrate, Sampling Standard |
|
|
| Notes | - | When recording at 1080p resolution, try to have the model occupy more pixels in the frame while ensuring body movements don't extend beyond the frame |
2 Set Up the Site
Set up a green screen on site so that the footage can be chroma keyed in post-production. Make sure the green screen is flat, with no obvious wrinkles. Either a paper roll background or green screen cloth works.
If you use a cloth screen, stretch it as flat as possible with multiple heavy-duty clips. Wrinkles cause uneven lighting, which makes keying harder in post-production and degrades the final result.
3 Shoot the Model
During the shooting process, the model and director need to complete the following tasks to achieve the best results.
Model Requirements
| Note | Details |
|---|---|
| Styling |
|
| Beginning and End | At the start and at the end of the recording, the model must hold a neutral pose for 10s. The neutral pose is up to you, but the mouth must be closed and the limbs must remain still, with no body movement other than the pose itself. The same neutral pose is used for rhythmic pauses during recording. |
| Rhythmic Pause | The model should pause for about 2s after every 3-4 sentences, with the mouth closed and the body returning to the neutral pose. |
| Recording and Lip Sync |
|
| Head Movements |
|
| Body Movements | During shooting, the model can use body movements to make the overall appearance vivid and expressive. However, if any of the following rules are violated, reshooting is required:
|
Director Notes
| Note | Details |
|---|---|
| Actor Should Perform Naturally and Expressively |
|
| Details Affecting Green Screen Shooting Results |
|
| Establish Basic Rapport with the Model |
|
| Watch for Model's Makeup Changes | After multiple NG (no-good) takes, the model's face becomes oilier and their appearance on camera starts to change. The director should promptly remind the model to touch up makeup or apply powder. |
4 Submit Files
After recording is complete, please submit the video files to ZEGO personnel and indicate the camera brand used and whether log mode was used.
Lighting Setup Reference
Here is a lighting setup for reference: 4 Steps to Create High-Quality Green Screen Keying for Live Streaming. This setup uses backlights on both sides to outline the model's silhouette, which helps eliminate green reflections on the model when space is limited (the model stands 4m or less from the green screen). The main light and fill light in front of the model can be adjusted according to the shooting theme.
Voice Capture
The voice capture process consists of four steps: prepare script, prepare recording equipment, start recording, and end recording and submit.
1 Prepare Script
The script used for voice capture must meet the following requirements:
- More than 6000 characters.
- Content should match the digital human's application industry/scenario context.
- Please refer to the template below to adjust the script format, inserting pauses and instructional notes.
Script Template: Game Script
Opening lines (Can be more enthusiastic)
Hello hello, welcome new viewers! Today is a special livestream for [Game Name]. Many benefits will be given to everyone. Let me announce in advance, today's group buying products include various items - food, drinks, and entertainment. We won't disappoint anyone and will definitely surprise you all. (Pause 2s, close mouth naturally)
Today we're going to play a very interesting game together - "[Game Name]"! In the game, we need to build our own army to fight against the opposing army and ultimately destroy the enemy castle. Viewers with any questions can send messages in the chat at any time! (Pause 2s, close mouth naturally)
【Gameplay Introduction】
"[Game Name]" is a livestream bullet screen game set in a medieval fantasy continent. Viewers in the livestream chat can input commands to join one side, then recruit soldiers to participate in the battle. During combat, you can summon legions, giants, snowmen, elephant soldiers, and even dragons by gifting items to gain strategic advantage. The goal is to destroy the enemy castle. The game features a fresh and vibrant cartoon art style. The game offers various camera modes for streamers to choose from. Whether soldiers, buildings, or dragons, all have detailed art models. Players can feel the intensity and excitement of a real battlefield fantasy army battle. (Pause 2s, close mouth naturally)
Welcome to all new viewers in the livestream! Please follow, like, and share our livestream. Love you all! Thank you to our top supporter for the gift support. I hope you have fun in the game and continue to support game livestreams. (Pause 2s, close mouth naturally)
Next, I'll briefly introduce the game rules. In the game, we're divided into red and blue teams. Each team has its own soldiers, buildings, dragons, etc. Viewers can summon legions, giants, snowmen, elephant soldiers, and even dragons by gifting items to gain strategic advantage. The ultimate goal is to destroy the enemy castle. (Pause 2s, close mouth naturally)
Tactical strategy is very important in the game. We need to arrange and adjust based on factors like the enemy's formation and terrain. Brothers can also choose different unit combinations based on their preferences and strategies. For example, we can choose units with ranged attack capabilities to protect our position, while also choosing units with strong attack capabilities to directly attack the enemy castle. (Pause 2s, close mouth naturally)
[Continue with rest of script...] (Pause 2s, close mouth naturally)
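The script requirements above (length and inserted pause notes) can be pre-checked before recording. The following is a minimal sketch, not an official ZEGO tool. The function name `check_script` is hypothetical, the pause-note pattern is inferred from the sample template, and the choice to exclude pause notes and whitespace from the character count is an assumption; confirm the actual counting rule with your ZEGO contact.

```python
import re

# From this guide: the script must contain more than 6000 characters.
MIN_CHARS = 6000

# Assumption: pause notes look like the ones in the sample template,
# e.g. "(Pause 2s, close mouth naturally)".
PAUSE_NOTE = re.compile(r"\(Pause\s+\d+s[^)]*\)")


def check_script(text: str) -> dict:
    """Return basic statistics for a voice-capture script."""
    # Assumption: pause notes and whitespace do not count toward the length.
    # Other instructional notes, such as "(Can be more enthusiastic)",
    # are not removed by this sketch.
    spoken = PAUSE_NOTE.sub("", text)
    spoken_chars = len("".join(spoken.split()))
    pause_notes = len(PAUSE_NOTE.findall(text))
    return {
        "spoken_chars": spoken_chars,
        "pause_notes": pause_notes,
        "long_enough": spoken_chars > MIN_CHARS,
    }
```

Calling `check_script(open("script.txt", encoding="utf-8").read())` on a draft (the file name is only an example) gives a quick indication of whether more content or more pause notes are needed.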
2 Prepare Recording Equipment
- It is recommended to use professional microphones from brands like Rode, DJI, Sony, or Moman.
- If using a camera for recording, please set the camera recording to manual mode.
- If using a computer-connected microphone for recording, please adjust the microphone or audio interface settings.
- Adjust the distance and position from the microphone to ensure no popping when speaking.
3 Start Recording
After starting recording, please ensure the following requirements are met:
- No background noise or ambient sounds.
- The emotion of reading the script should match expectations and remain consistent.
- Clear pronunciation, distinct enunciation, clear sentence breaks, with 2s pauses between each sentence.
4 End Recording and Submit
After recording ends, please play back and check once to ensure the following valid audio standards are met.
| Standard Item | Details |
|---|---|
| Audio Duration, Format, and Other Parameters |
|
| Audio Quality |
|
| Voice Recording |
|
FAQ
- High background noise and clipping may be caused by leaving the camera's audio recording level on automatic. Adjust the recording level on the device manually so that the signal-to-noise ratio is adequate and there is no obvious background noise. If there is electrical buzzing, ask the device provider how to eliminate it, or seek help from ZEGO pre-sales service.
- Keep the recording environment quiet, and avoid background voices and traffic noise.
You can refer to the following steps:
- Adjust the microphone position. Lavalier microphones can be clipped higher on the collar.
- Adjust recording device settings. Refer to How to fix voice being too loud, clipping, or high background noise?.
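Clipping can also be spotted by inspecting the recorded samples during playback review. The sketch below is one possible check, assuming the test recording is a 16-bit PCM WAV file; the function name `clipped_fraction` and the threshold of 32700 are illustrative choices, not values from this guide.

```python
import array
import sys
import wave


def clipped_fraction(path: str, threshold: int = 32700) -> float:
    """Return the fraction of samples at or above `threshold` in magnitude.

    Assumption: the file is 16-bit PCM WAV. Samples near the 16-bit limit
    (32767) usually indicate that the input level was too high.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= threshold)
    return clipped / len(samples)
```

A result well above zero on a short test recording suggests lowering the input level or moving the microphone before the formal session.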
A loose 3.5mm connection cable or recording device malfunction can cause electronic distortion. Please check if the cable is inserted correctly or replace the recording device.
The main cause of microphone popping is when the microphone head is in the airflow of plosive sounds during speech. You can adjust using any of these methods:
- Adjust the microphone position, such as clipping the microphone on the collar.
- Adjust the microphone angle to avoid airflow hitting the microphone. Generally, place the microphone diagonally below or above the mouth. You can monitor while adjusting.
Recording in an empty, bare room produces quite severe echo. Record in a furnished room instead, and position yourself in a corner rather than in the center of the room.
Both device issues and personal pronunciation can cause muffled sound. If the device is the cause, use the recommended recording equipment. If pronunciation is the cause, drinking water to moisten the throat can partially help.
Please refer to the following suggestions:
- Add pause markers in the script to control speaking speed.
- Try to enunciate clearly and accurately.
- Keep your personal speaking habits in mind. There are no strict requirements here, but the cloned voice may reproduce traits such as swallowed words and soft pronunciation.
Generally, using a phone to record audio is not recommended because recording quality is relatively poor. If you must use a phone, we recommend using iPhone 12 or later models. Note that phone recording requires the following:
- Go to Settings, select "Voice Memos > Audio Quality > Lossless".
- Use the iPhone's built-in Voice Memos for audio recording.
Using a phone for recording is prone to microphone popping. Please be sure to do a test recording and confirm there are no issues before formal recording.
You can record in segments, provided the voice tone and emotion show no obvious differences from one segment to the next. The total effective recording duration must be at least 20 minutes. Audio with mixed emotions or inconsistent voice tone will be judged unqualified and cannot proceed to the cloning process.
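When recording in segments, the combined length can be summed before submission. The following sketch assumes the segments are WAV files and uses hypothetical helper names (`wav_seconds`, `total_duration_ok`). Note that file length is only an upper bound on effective duration, since this sketch does not subtract silence or discarded takes.

```python
import wave

# From this guide: at least 20 minutes of effective recording.
MIN_SECONDS = 20 * 60


def wav_seconds(path: str) -> float:
    """Length of one WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_duration_ok(paths: list[str]) -> tuple[float, bool]:
    """Return (total seconds, whether the total reaches MIN_SECONDS)."""
    total = sum(wav_seconds(p) for p in paths)
    return total, total >= MIN_SECONDS
```

For example, `total_duration_ok(["part1.wav", "part2.wav"])` (example file names) returns the summed length and a pass/fail flag against the 20-minute minimum.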
We recommend estimating the script's reading duration in advance. If the estimate is less than 20 min, prepare more content. If you finish the script during recording and the audio is still too short, continue with a different script. You cannot reread content that you have already read.
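One way to estimate the reading duration is to divide the character count by your own reading speed and add the scripted pauses. This is a rough sketch only: the function name `estimate_minutes` is hypothetical, and the reading speed is not specified in this guide, so measure it yourself by timing a short test read.

```python
def estimate_minutes(
    char_count: int,
    chars_per_minute: float,
    pause_count: int = 0,
    pause_seconds: float = 2.0,
) -> float:
    """Estimate reading time in minutes.

    chars_per_minute is an assumption supplied by the caller, ideally
    measured from a timed test read. pause_seconds defaults to the 2s
    pause used in the script template.
    """
    if chars_per_minute <= 0:
        raise ValueError("chars_per_minute must be positive")
    reading = char_count / chars_per_minute
    pauses = pause_count * pause_seconds / 60
    return reading + pauses
```

If the estimate comes out below 20 minutes, extend the script before the session rather than discovering the shortfall during recording.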







