An experiment with image processing

Image processing is a set of techniques used to enhance an image and prepare it for further use. Adjusting brightness or contrast and smoothing an image (noise removal) are examples of image processing. Computer vision, the field that deals with recognizing objects in images and video, depends heavily on image processing. Computer vision has been around since the 1950s and has matured over time into what it is today. Augmented reality, virtual reality and mixed reality all use computer vision in one way or another. That makes image processing all the more important.

Wouldn't it be cool to try out something in image processing? How about extracting the background from footage of a busy scene? To keep the experiment simple, we'll use footage from cameras that do not move. Trail cams, security cams and microscope cameras are examples of cameras that are usually locked in a fixed position.

Experiment
Human vision has limitations. Firstly, we cannot focus on all areas of a scene at the same time. Secondly, we cannot "see" the full background behind a busy foreground. For this experiment, we found some stunning footage that we could use. We're thankful to the creators of these videos and have provided credits and links below for anyone who wants to use the videos or reach out to the creators. The videos come from cameras locked in position, looking at busy scenes with moving people and vehicles. We will try to write a program that extracts the background from the footage.

The program must be told what counts as background, as opposed to foreground. Let's keep the definition simple: anything that generally does not move for the length of the video can be labeled as background.

Logic
Moving objects in the foreground obstruct our view of the background. Nonetheless, if something is truly stationary (by our simplified definition) we're more likely to see it in the same place, off and on, over a period of time. Since we are not concerned with detecting specific objects or labeling them, we will only examine blocks of pixels to find their consistency over the length of the video.

To do that, we'll take the following steps:

Set a maximum limit of video duration that will be used for analysis. Say, 30 seconds. If the video is shorter, go with the shorter length.

Grab 50 frames at equal intervals across the analyzed duration (the clip length or 30 seconds, whichever is shorter)

Cut all frames into blocks of a specific dimension

Examine the same block across all the 50 frames

Give a high score for a block if it is similar to a corresponding block on another frame and a low score otherwise

For each block position, find the frame whose block gets the highest score at the end of the analysis

Render the highest-scoring block at each position to build the background
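The scoring steps above can be sketched roughly as follows. This is a minimal illustration and not the program we actually wrote: the function names `blockDifference` and `extractBackground`, the similarity threshold of 10, and the choice to count "number of similar frames" as the score are all assumptions made for the example. To keep it short, frames are treated as single-channel (grayscale) typed arrays instead of RGBA data.

```javascript
// Hypothetical sketch: frames are Uint8ClampedArray grayscale images,
// one value per pixel, all with the same width and height.

// Mean absolute difference between the same block in two frames.
function blockDifference(frameA, frameB, width, x0, y0, blockW, blockH) {
  let total = 0;
  for (let y = y0; y < y0 + blockH; y++) {
    for (let x = x0; x < x0 + blockW; x++) {
      const i = y * width + x;
      total += Math.abs(frameA[i] - frameB[i]);
    }
  }
  return total / (blockW * blockH);
}

// For each block position, score every frame by how many other frames hold
// a similar block, then copy the winning frame's block into the output.
// `threshold` is an assumed tuning value, not one taken from the experiment.
function extractBackground(frames, width, height, blockSize, threshold = 10) {
  const output = new Uint8ClampedArray(width * height);
  for (let y0 = 0; y0 < height; y0 += blockSize) {
    for (let x0 = 0; x0 < width; x0 += blockSize) {
      // Blocks on the right and bottom edges may be smaller than blockSize.
      const blockW = Math.min(blockSize, width - x0);
      const blockH = Math.min(blockSize, height - y0);
      const scores = new Array(frames.length).fill(0);
      for (let a = 0; a < frames.length; a++) {
        for (let b = a + 1; b < frames.length; b++) {
          const diff = blockDifference(frames[a], frames[b], width, x0, y0, blockW, blockH);
          if (diff <= threshold) {
            scores[a]++;
            scores[b]++;
          }
        }
      }
      // The first frame with the highest score wins this block position.
      const best = scores.indexOf(Math.max(...scores));
      for (let y = y0; y < y0 + blockH; y++) {
        const start = y * width + x0;
        output.set(frames[best].subarray(start, start + blockW), start);
      }
    }
  }
  return output;
}
```

A block that shows the true background tends to match the same block in many other frames, while a block covered by a passing car matches few, so the background version usually collects the highest score.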

Method
The MP4 was rendered to an HTML canvas of 1280 x 720. Frames were extracted from the canvas with JavaScript and sent to a Web Worker. The Web Worker computed the scores and returned the data of the highest-scoring blocks to the main script, which then rendered them onto a canvas.
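For readers who want to try something similar, the capture side might look roughly like the sketch below. It is an illustration under assumptions, not our actual code: `sampleTimes`, `seekTo` and `captureFrames` are hypothetical names, the message shape sent to the worker is made up, and sampling at the midpoint of each interval is just one way to space 50 frames evenly. `seekTo` and `captureFrames` only run in a browser, where `video` is a loaded video element and `worker` is an already created Web Worker.

```javascript
// Timestamps (in seconds) for `count` frames spread evenly over the analyzed
// part of the clip, which is capped at `maxSeconds`.
function sampleTimes(durationSeconds, maxSeconds = 30, count = 50) {
  const span = Math.min(durationSeconds, maxSeconds);
  const step = span / count;
  // Assumption: sample the middle of each interval so no frame sits at the very end.
  return Array.from({ length: count }, (_, i) => (i + 0.5) * step);
}

// Browser-only: seek the video and resolve once the new frame is available.
function seekTo(video, time) {
  return new Promise((resolve) => {
    video.addEventListener('seeked', resolve, { once: true });
    video.currentTime = time;
  });
}

// Browser-only: draw each sampled frame onto the canvas, read back its RGBA
// pixels, and hand all the pixel buffers to the worker for scoring.
async function captureFrames(video, canvas, worker) {
  const ctx = canvas.getContext('2d', { willReadFrequently: true });
  const buffers = [];
  for (const t of sampleTimes(video.duration)) {
    await seekTo(video, t);
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const { data } = ctx.getImageData(0, 0, canvas.width, canvas.height);
    buffers.push(data.buffer);
  }
  // Passing the buffers as the transfer list moves them instead of copying them.
  worker.postMessage({ width: canvas.width, height: canvas.height, buffers }, buffers);
}
```

Doing the scoring in a worker keeps the page responsive, since comparing 50 frames block by block would otherwise tie up the main thread.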

Result
We tried this logic on several videos. Here are some of the results.

This first video is a screen recording of the footage, with the program replacing it with the extracted background after the analysis. Take a look.

Credits: Video by George Morina: https://www.pexels.com/video/street-in-a-city-on-a-rainy-night-5687577/

This second video is of a busy street.

Footage:

Credits: Video by George Morina: https://www.pexels.com/video/a-busy-street-on-a-business-district-at-daytime-2944634/

Extracted Background:

Observation:
The final output was generally clean. Interestingly, the program treated the following as background because they stayed in place for most of the clip:

A bicycle on the sidewalk. We think of a bicycle as a moving object and don't necessarily associate it with a background. The program wouldn't know.

A motorcyclist who stopped at the traffic signal on the left and left only towards the end.

A few people who were standing in the same place.

This third video is a top view of a street with moving vehicles and people.
Footage:

Credits: Video by CESAR A RAMIREZ V TRAPHITHO: https://www.pexels.com/video/people-and-cars-on-street-13258882/

Extracted Background:

Observation:
The final output was fairly good. The head of the person directing the traffic (who appears to be a traffic cop) shows up as a visible error.

Conclusion
From the results, the following can be concluded:

The right block size for analysis depends on how busy the scene is. If the scene is not very busy, as in the rainy night footage, larger blocks are worth trying. We used 50px x 50px blocks in that case.

Smaller blocks (1x1, 2x2, 4x4, etc.) worked better when the moving objects overlapped and very little of the background was visible at any given time.

Smaller blocks came with the risk of the final output looking pixelated or having rough edges instead of smooth ones.

The chances of a better output improved when the moving objects did not overlap, as in the top-view footage (or, say, an army of ants).

If the camera was moving and there was almost nothing stationary (static), the output with 1x1 blocks looked like a painting of sorts.
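To put these block sizes in perspective, the short sketch below counts how many blocks a 1280 x 720 frame splits into for a given block size. The helper `gridForBlockSize` is a hypothetical name, and rounding up so that the edges get smaller partial blocks is an assumption; cropping or padding the edges would be equally valid choices.

```javascript
// Number of blocks a frame is cut into for a given square block size.
// Assumption: edge blocks that do not fit fully are kept as smaller blocks.
function gridForBlockSize(width, height, blockSize) {
  const columns = Math.ceil(width / blockSize);
  const rows = Math.ceil(height / blockSize);
  return {
    columns,
    rows,
    blocks: columns * rows,
    // True when the last row or column holds blocks smaller than blockSize.
    hasPartialBlocks: width % blockSize !== 0 || height % blockSize !== 0,
  };
}

for (const size of [50, 4, 2, 1]) {
  console.log(size, gridForBlockSize(1280, 720, size));
}
```

With 50px blocks the frame becomes a 26 x 15 grid of 390 blocks, with partial blocks in the last row and column. With 1x1 blocks every one of the 921,600 pixels is chosen independently, which helps explain why the output can look rough or painterly.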

Final thoughts
The entire experience of building this was great. The program we wrote is rudimentary, and there is room for more options. One could be the ability to render different parts of the frame with different block sizes. Another could be to use the data of neighboring blocks to make better assumptions. A tool such as this could benefit photographers who wait long hours for a crowd to clear before they can capture their masterpiece.

We wonder whether background detection such as this would also make foreground detection easier.

Tuesday, October 25, 2022
