Implementation Flow
Overview
This document describes the principles and steps for using the ZIM SDK together with the ZEGO Express SDK's object segmentation, Alpha data transmission, and rendering capabilities to implement multi-user, same-scene real-time video interaction and mic position management in a room.
In most video interactions, participants are separated into their own rectangular video areas, and the subject often occupies less than half of each picture. Because every user has a different background, the overall picture looks cluttered and it is difficult to create an immersive interaction experience, as shown in the following figure:

To improve the interaction experience, in addition to the object segmentation capability, the ZEGO Express SDK has pioneered an Alpha data transmission function. Its principle is to stitch the Alpha information obtained from object segmentation below the original video, producing a frame of twice the original height. After encoding, this combined frame is transmitted to the playing end, which separates the original video from the Alpha data and uses the Alpha data during rendering, creating the visual effect of showing only the subject in the picture.
As shown in the following figures, in the Alpha information stitched below the original video at the publishing end, black indicates that the corresponding content is transparent. After decoding, the playing end normalizes the black region into Alpha data so that the corresponding area of the original picture is rendered as transparent. In the view, only the person (the subject) is displayed, and the user's real background is not shown.
For information on the implementation principle of the rendering part, please refer to Play Transparent Gift Effects.
| Publishing Picture | Playing Picture | Display Effect After Rendering |
|---|---|---|
| ![]() | ![]() | ![]() |
Through this method, the subjects of multiple users can be rendered onto the same background image or background video. Although they are in different spaces, they can still interact in real time in the same scene.
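The split-and-composite principle described above can be sketched in pure Python, without the SDK. The frame layout, pixel values, and grayscale Alpha encoding below are illustrative assumptions for the sketch, not the SDK's internal format:

```python
# Illustrative sketch: the top half of the decoded frame is the color
# picture, the bottom half holds Alpha data as grayscale values
# (black = 0 = transparent, white = 255 = opaque). The layout and the
# sample pixel values are assumptions for demonstration only.

def split_stacked_frame(frame):
    """Split a height*2 stacked frame into (color_half, alpha_half)."""
    h = len(frame) // 2
    return frame[:h], frame[h:]

def composite(color, alpha, background):
    """Alpha-blend the segmented subject over a shared background."""
    out = []
    for color_row, alpha_row, bg_row in zip(color, alpha, background):
        row = []
        for (r, g, b), a8, (br, bg_, bb) in zip(color_row, alpha_row, bg_row):
            a = a8 / 255.0  # normalize the grayscale value to [0, 1]
            row.append((round(r * a + br * (1 - a)),
                        round(g * a + bg_ * (1 - a)),
                        round(b * a + bb * (1 - a))))
        out.append(row)
    return out

# A 2x2 color picture stacked on top of its 2x2 Alpha map (height 4 total):
stacked = [
    [(200, 50, 50), (200, 50, 50)],  # color half: the "subject" pixels
    [(10, 10, 10), (10, 10, 10)],
    [255, 0],                        # alpha half: opaque / transparent
    [0, 255],
]
color, alpha = split_stacked_frame(stacked)
background = [[(0, 0, 255)] * 2 for _ in range(2)]  # shared blue background

result = composite(color, alpha, background)
```

Wherever the Alpha map is black (0), the shared background shows through; wherever it is white (255), the subject pixel is kept, which is exactly the effect that lets multiple users' subjects share one background.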
Application Scenarios
| Application Scenario | Immersive Conference, Watch Movies Together | Anchor Co-hosting | Large Events, such as Press Conferences, etc. |
|---|---|---|---|
| Illustration | ![]() | ![]() | ![]() |
Solution Architecture
The overall architecture of the in-room business in this best practice is shown in the following figure. Because the developer's business backend only manages the room list and is not involved in in-room business, it is not shown in the figure.

Prerequisites
- You have created a project in the ZEGOCLOUD Console and applied for a valid AppID and AppSign. For details, please refer to Console - Project Management - Project Information.
- You have contacted ZEGO sales to enable the object segmentation permission.
- You have contacted ZEGO technical support to obtain the real-time audio and video SDK that includes the object segmentation feature, and integrated it by referring to Real-time Audio and Video's Quick Start - Integration. If your project previously integrated the version of the real-time audio and video SDK from the official website, you need to replace it.
- You have enabled the ZIM service in the ZEGOCLOUD Console (for details, refer to the Console's Service Configuration - Instant Messaging - Enable Service). If you cannot enable the ZIM service yourself, contact ZEGO technical support.
- You have integrated the ZIM SDK. For details, please refer to "Integrate SDK" in Quick Start - Implement Basic Message Sending and Receiving.
Implementation Flow
The implementation flow consists of six steps: initializing the SDKs, joining a room, managing mic positions, using object segmentation, playing the stream, and leaving the room.
1 Initialize SDK
- Before managing mic positions in the room, you need to initialize the ZIM SDK first and set notification callbacks to listen to ZIM events. For interface call details, please refer to "Create ZIM Instance" and "Use EventHandler Protocol" in Instant Messaging - Implement Basic Message Sending and Receiving.
- Before implementing object segmentation, you need to initialize the ZEGO Express SDK first and set notification callbacks to listen to Express events. For interface calls, please refer to "Initialize" in Real-time Audio and Video - Implementing Video Call.
2 Join room
- To implement publishing stream, users need to log in to the RTC room first. For interface call details, please refer to "Login Room" in Real-time Audio and Video - Implementing Video Call.
- This best practice implements the business logic of users getting on and off the mic by modifying ZIM room attributes. Therefore, users need to:
  - Log in to the ZIM service. For interface call details, refer to "Login ZIM" in Instant Messaging - Implement Basic Message Sending and Receiving.
  - Create or join a ZIM room. For details, refer to "Create Room, Join Room" in Instant Messaging - Room Management.
3 Manage mic positions
- After joining the ZIM room, users can understand the mic position information in the room by querying all room attributes. For interface call details, please refer to "Get Room Attributes" in Instant Messaging - Room Attribute Management.
- If users need to get on the mic, they can modify the mic position information by modifying room attributes. For interface call details, please refer to "Set Room Attributes" in Instant Messaging - Room Attribute Management.
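The get-on/off-mic logic built on room attributes can be modeled locally as a simple key-value map. The `seat_<index>` key scheme and the "earlier occupant wins" rule below are assumptions for illustration; ZIM itself only provides the attribute storage (set via Set Room Attributes, read via Get Room Attributes) and change notifications:

```python
# Local model of mic seats stored as a key-value map, mirroring how this
# best practice keeps them in ZIM room attributes. The "seat_<index>" key
# naming and the occupancy rule are illustrative assumptions.

class MicSeatModel:
    def __init__(self, seat_count):
        # An empty string means the seat is free.
        self.attributes = {f"seat_{i}": "" for i in range(seat_count)}

    def take_seat(self, index, user_id):
        """Occupy a seat if it is free; returns True on success."""
        key = f"seat_{index}"
        if self.attributes.get(key):
            return False  # already occupied: the earlier occupant keeps it
        self.attributes[key] = user_id
        return True

    def leave_seat(self, index, user_id):
        """Free a seat, but only if this user currently holds it."""
        key = f"seat_{index}"
        if self.attributes.get(key) != user_id:
            return False
        self.attributes[key] = ""
        return True

    def occupied(self):
        """Map of seat index -> user ID, for rendering the mic layout."""
        return {int(k.split("_")[1]): v
                for k, v in self.attributes.items() if v}

seats = MicSeatModel(seat_count=4)
seats.take_seat(0, "alice")
seats.take_seat(1, "bob")
```

In practice, each `take_seat`/`leave_seat` corresponds to one room attribute update, and other clients rebuild the same map from the room attribute change notification.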
4 Use object segmentation
- To transmit the segmented subject when publishing the stream, you need to enable the alpha channel. Refer to "Use Alpha Channel to Transmit Segmented Subject" in Real-time Audio and Video - Object Segmentation to learn how to call enableAlphaChannelVideoEncoder.
- Since the picture captured by the mobile phone's front camera is mirrored left-to-right compared with reality, you need to enable mirroring to obtain a correctly oriented picture when previewing or playing the stream. For interface call details, refer to "Set Mirror Mode" in Real-time Audio and Video - Common Video Configuration.
- To keep the picture properly oriented when the phone screen rotates, you need to set the orientation of the captured video. For interface call details, refer to Real-time Audio and Video - Video Capture Rotation.
- If you need to preview the local picture, first set the backgroundColor property of the View used for rendering to clearColor (transparent). For interface call details, refer to "Special Settings for View" in Real-time Audio and Video - Object Segmentation. Then preview and publish the local picture; for interface call details, refer to "Start Preview and Publishing Stream" in Real-time Audio and Video - Object Segmentation.
- Enable object segmentation and receive the segmentation results. For interface call details, refer to "Listen to Object Segmentation Status Callback" and "Use Object Segmentation to Implement Multiple Business Functions" in Real-time Audio and Video - Object Segmentation.

Note: Since the RTC room is only related to stream publishing, developers can also preview the object segmentation effect outside the RTC room.
5 Play stream
After the user receives a ZIM event notification indicating that someone has gotten on a mic in the room:
- Call the real-time audio and video interface to set the backgroundColor property of the View used for rendering to clearColor (transparent). For interface call details, refer to "Special Settings for View" in Real-time Audio and Video - Object Segmentation.

  Note: If the View was already configured during preview, this step can be omitted.
- Play the other party's object-segmented video stream, achieving the visual effect of two users facing each other in the "same space". For interface call details, refer to "Start Playing Stream" in Real-time Audio and Video - Object Segmentation.
- If you need to stop playing the stream, refer to "Stop Audio and Video Call" in Real-time Audio and Video - Implementing Video Call.
6 Leave room
- Stop preview and publishing stream. For interface call details, please refer to "Stop Audio and Video Call" in Real-time Audio and Video - Implementing Video Call.
- Leave the ZIM room. For interface call details, please refer to "Leave Room" in Instant Messaging - Room Management.
- Log out of the RTC room. For interface call details, please refer to "Stop Audio and Video Call" in Real-time Audio and Video - Implementing Video Call.





