Image matting separates an image into foreground and background. The foreground is usually the part people pay most attention to, while the background can be ignored in most circumstances.
Image matting demands fine detail along the edges of the subject, especially intricate edges such as hair and translucent tulle clothing.
1. Use Cases of Image Matting
1) Background replacement of live video streaming
The development of RTC technology has helped the live video streaming industry grow rapidly. A host is most likely to stream from home. Generally there is no dedicated studio, so the background of the video is simply the host’s home. Protecting personal privacy therefore becomes a major pain point for hosts.
To address this, ZEGOCLOUD matting technology quickly extracts the host’s portrait and replaces the real background with a suitable image, protecting the host’s privacy.
2) Background replacement of credential photo
In daily life, we often need various certificates, and different certificates have different requirements for personal photos. For example, the photo may need a specific background color, such as red, blue, or white.
Going to a photo studio for credential photos costs both money and time, and it is difficult for non-professionals to process photos themselves with professional graphics software.
ZEGOCLOUD smart credential photo matting technology is simple to use and requires no professional knowledge. It detects and locates the key points of the face, automatically crops the original image to keep the head and shoulders, and finally uses the image matting algorithm to replace the background with the desired color.
The ZEGOCLOUD matting algorithm is lightweight. The whole deep learning model file is only 6 MB, and it takes only 100 milliseconds to process a credential photo on an Intel i5 CPU.
3) Bokeh background for art performance
In art examination scenes such as dance and music, advertisements or other information irrelevant to the examination often appear in the background and need to be removed. At the same time, objects related to the performance, including instruments, props, and some special costumes, are easily erased by matting, although they should be retained.
ZEGOCLOUD matting technology can not only finely extract the portrait and replace the background, but also retain objects related to the performer, such as musical instruments, props, and clothing.
To preserve the integrity and sense of depth of the picture and help judges concentrate on the performers, we propose a bokeh background approach. The algorithm blurs the background while keeping the portrait sharp. It also uses the temporal continuity between video frames, which suppresses picture flicker well.
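The two ideas in this paragraph, blurring only the background and using information from neighboring frames, can be illustrated with a minimal sketch. It is not the ZEGOCLOUD algorithm, whose details the article does not give. The sketch assumes grayscale frames stored as nested lists, a simple box blur, and an exponential moving average over alpha masks as one common way to damp flicker. The function names and the `momentum` parameter are hypothetical.

```python
def box_blur(image, radius=1):
    """Blur a grayscale image (list of rows) with a (2r+1)x(2r+1) mean filter."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            # Average over the window, clipped at the image border.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        blurred.append(row)
    return blurred


def smooth_alpha(previous, current, momentum=0.5):
    """Exponential moving average of alpha masks across consecutive frames.

    Assumption: an illustrative stand-in for using temporal information.
    """
    if previous is None:  # first frame: nothing to smooth against
        return current
    return [
        [momentum * p + (1.0 - momentum) * c for p, c in zip(p_row, c_row)]
        for p_row, c_row in zip(previous, current)
    ]


def bokeh(frame, alpha, radius=1):
    """Keep the portrait sharp (alpha near 1) and blur the rest (alpha near 0)."""
    blurred = box_blur(frame, radius)
    return [
        [a * sharp + (1.0 - a) * soft for a, sharp, soft in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha, frame, blurred)
    ]
```

In a video loop, each frame's predicted alpha would first pass through `smooth_alpha` together with the previous smoothed mask, and the result would then be handed to `bokeh`. A higher `momentum` gives a steadier mask but makes it lag behind fast motion.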
The whole deep learning model file of the algorithm is only 3 MB, making it extremely lightweight. It takes only 20 milliseconds to process a frame on a MacBook Pro with an M1 chip.
2. A Brief Introduction to Image Matting Technology
Image matting can be expressed by the following formula:
$$R_i = A_i F_i + (1 - A_i) B_i$$
Here $R_i$ represents the final result, $A_i$ represents the transparency mask required for matting, $B_i$ represents the new background, and $F_i$ represents the foreground, which generally refers to people and related objects. In $A_i$, foreground positions have values greater than 0 (up to 1 for fully opaque pixels), while background positions have a value of 0. Matting is essentially the problem of obtaining a high-quality transparency mask (alpha map).
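As a concrete illustration of how the result, mask, foreground, and new background relate, here is a minimal sketch. It assumes the standard alpha-compositing form R = A * F + (1 - A) * B, RGB values in the range 0 to 1, and images stored as nested lists to stay dependency-free. The function names are hypothetical, and a real pipeline would more likely use an array library.

```python
from typing import List, Tuple

Pixel = Tuple[float, float, float]  # RGB values in [0, 1]


def composite_pixel(alpha: float, fg: Pixel, bg: Pixel) -> Pixel:
    """Blend one pixel: R = A * F + (1 - A) * B, channel by channel."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


def composite(
    alpha: List[List[float]],
    foreground: List[List[Pixel]],
    background: List[List[Pixel]],
) -> List[List[Pixel]]:
    """Apply the blend to every pixel of equally sized row-major images."""
    return [
        [composite_pixel(a, f, b) for a, f, b in zip(a_row, f_row, b_row)]
        for a_row, f_row, b_row in zip(alpha, foreground, background)
    ]
```

Where the mask is 1 the foreground pixel is kept unchanged, where it is 0 the new background shows through, and fractional values (for example along hair) mix the two.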
Most traditional algorithms generate the alpha map with the help of a manually drawn trimap. A trimap contains three pixel values: 0 marks definite background, 1 marks definite foreground, and 0.5 marks the undetermined area, whose pixels may belong to either the foreground or the background.
Within the undetermined region, the matting algorithm decides whether each pixel belongs to the foreground or the background, using methods such as random walk, KNN, and closed-form matting.
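To make the three trimap values concrete, the sketch below derives a trimap from a hard 0/1 mask instead of from manual drawing. That derivation is an assumption made purely for illustration: any pixel whose neighborhood mixes foreground and background is marked 0.5. The `band` parameter and both function names are hypothetical.

```python
def make_trimap(mask, band=1):
    """Turn a hard 0/1 mask (list of rows) into a trimap with values 0, 0.5, 1."""
    height, width = len(mask), len(mask[0])
    trimap = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the distinct mask values in a window around the pixel.
            neighbours = {
                mask[j][i]
                for j in range(max(0, y - band), min(height, y + band + 1))
                for i in range(max(0, x - band), min(width, x + band + 1))
            }
            if neighbours == {1}:
                row.append(1.0)  # definite foreground
            elif neighbours == {0}:
                row.append(0.0)  # definite background
            else:
                row.append(0.5)  # mixed neighbourhood: undetermined
        trimap.append(row)
    return trimap


def unknown_pixels(trimap):
    """Coordinates (row, column) that a matting solver still has to resolve."""
    return [
        (y, x)
        for y, row in enumerate(trimap)
        for x, value in enumerate(row)
        if value == 0.5
    ]
```

A solver such as random walk, KNN, or closed-form matting would then estimate alpha only at the coordinates returned by `unknown_pixels`, treating the 0 and 1 regions as fixed constraints.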
Drawing a trimap requires some professional knowledge, so few people can do it. It also requires human interaction with the computer, so matting cannot run automatically in real time.
ZEGOCLOUD uses deep learning to build its proprietary image matting technology. It adopts an encoder-decoder structure that takes only an image as input and produces the final alpha map.
The encoder-decoder structure encodes and compresses the input image to extract deep features, then fits the ground-truth alpha samples through the decoding process. Our encoder uses the lightweight MobileNetV3-Small architecture, which enables real-time computing on edge devices.
3. Wrapping up
We will stop here for now. This article focused on use cases, and the next article in this series will cover image matting technology in more depth.
If you happen to work on a product that fits into related use cases or scenarios, please don’t hesitate to contact us to discuss it. Our architects might have a solution that can help you.