VideoHelp Forum
  1. RealImage (Member since Jun 2018, East Coast)
    I'd appreciate some clarification about digital video capture. A typical consumer camcorder might record 1080p at 30 fps (frames per second) and 60 Mbps (megabits per second). Each frame is 1/30 sec in duration, so there are 60/30 = 2 Mb available for each frame. A 1080p frame has roughly 2 million pixels sharing those 2 million bits of data, or on average one bit per pixel. Under the same conditions, a 4K frame (four times the pixels) would average about 1/4 bit of data per pixel.
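    Spelling that arithmetic out in a quick Python sketch (same numbers as above, just the back-of-the-envelope math):
    Code:
    bitrate_mbps = 60            # camcorder bitrate, megabits per second
    fps = 30                     # frames per second
    pixels_1080p = 1920 * 1080   # ~2.07 million pixels
    pixels_4k = 3840 * 2160      # ~8.29 million pixels (4x 1080p)

    bits_per_frame = bitrate_mbps * 1_000_000 / fps    # 2,000,000 bits
    print(bits_per_frame / pixels_1080p)   # ~0.96 bits per pixel at 1080p
    print(bits_per_frame / pixels_4k)      # ~0.24 bits per pixel at 4K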

    I understand that there is compression going on, but that doesn’t seem like a lot of data for each frame of color video. So how do they do it and end up with what looks like high quality video? Thanks.
  2. Chroma subsampling: The human eye has less color resolution than greyscale resolution. So RGB video is converted to a greyscale (luma) image and two color-difference (chroma) images. The chroma images are reduced in resolution by half in each dimension. So a 1920x1080 video has a 1920x1080 luma channel and two 960x540 chroma channels. This cuts the data per frame in half.
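    A toy numpy sketch of the sample-count math (BT.601-style weights here just for illustration; real encoders may use BT.709 or others, and this is only counting samples, not a real converter):
    Code:
    import numpy as np

    # Made-up 1080p frame, float for the math below.
    rgb = np.random.randint(0, 256, (1080, 1920, 3)).astype(np.float32)

    # BT.601-style luma plus two color-difference (chroma) planes.
    y  = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    cb = (rgb[..., 2] - y) * 0.564   # blue-difference chroma
    cr = (rgb[..., 0] - y) * 0.713   # red-difference chroma

    # 4:2:0 -- halve chroma resolution in each dimension.
    cb_sub, cr_sub = cb[::2, ::2], cr[::2, ::2]

    full = 3 * y.size                # three full-resolution channels
    subsampled = y.size + cb_sub.size + cr_sub.size
    print(subsampled / full)         # -> 0.5, half the samples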

    Interframe encoding: Most frames are not encoded as entire pictures but only as the difference between that frame and other frames. In a talking head shot, maybe only the speaker's lips move from one frame to the next, so only that small part of the picture needs to be encoded for that frame. I.e. "copy the last frame, just change this small section..."
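    Rough toy sketch in Python/numpy of the "only encode what changed" idea (a real codec is far more sophisticated, but this is the gist):
    Code:
    import numpy as np

    # Two made-up 720x1280 greyscale frames where only a small region changes.
    prev = np.zeros((720, 1280), dtype=np.uint8)
    curr = prev.copy()
    curr[300:330, 600:700] += 40     # pretend only the "lips" region moved

    # Find which 16x16 blocks actually differ from the previous frame.
    B = 16
    changed = []
    for by in range(0, curr.shape[0], B):
        for bx in range(0, curr.shape[1], B):
            if not np.array_equal(curr[by:by+B, bx:bx+B],
                                  prev[by:by+B, bx:bx+B]):
                changed.append((by, bx))

    total = (720 // B) * (1280 // B)
    print(f"{len(changed)} of {total} blocks need to be re-encoded")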

    Discrete Cosine Transform: Blocks of pixels are converted from the spatial domain to the frequency domain. Low amplitude, high frequency data can be removed without adversely affecting the picture. The remaining frequency data is more easily compressed later (see below).
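    Toy round trip with scipy's DCT on a single 8x8 block (the block contents and the threshold are made up for illustration):
    Code:
    import numpy as np
    from scipy.fft import dctn, idctn

    # One 8x8 block with a smooth gradient, like a patch of sky.
    block = np.add.outer(np.arange(8.0), np.arange(8.0)) * 8.0

    coeffs = dctn(block, norm='ortho')       # spatial -> frequency domain
    coeffs[np.abs(coeffs) < 10] = 0          # drop low-amplitude coefficients
    restored = idctn(coeffs, norm='ortho')   # back to pixels

    print(np.count_nonzero(coeffs), "of 64 coefficients kept")
    print("max pixel error:", np.abs(block - restored).max().round(2))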

    Motion vectors: Often in video something moves from one location to another. Motion vectors, "move this block of pixels from here to there...", are used to reduce the amount of information needed for successive frames.
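    A naive exhaustive block-matching sketch (real encoders use much faster search strategies, but the idea is the same):
    Code:
    import numpy as np

    # Find where a 16x16 block of the previous frame reappears in the
    # current frame, by minimizing the sum of absolute differences (SAD).
    def find_motion_vector(prev, curr, by, bx, B=16, search=8):
        ref = prev[by:by+B, bx:bx+B].astype(np.int32)
        best_sad, best_mv = None, (0, 0)
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                y, x = by + dy, bx + dx
                if 0 <= y <= curr.shape[0] - B and 0 <= x <= curr.shape[1] - B:
                    sad = np.abs(ref - curr[y:y+B, x:x+B].astype(np.int32)).sum()
                    if best_sad is None or sad < best_sad:
                        best_sad, best_mv = sad, (dy, dx)
        return best_mv   # "move this block from here to there"

    prev = np.zeros((64, 64), dtype=np.uint8)
    prev[16:32, 16:32] = 200                 # a bright block...
    curr = np.zeros_like(prev)
    curr[20:36, 18:34] = 200                 # ...that moved down 4, right 2
    print(find_motion_vector(prev, curr, 16, 16))   # -> (4, 2)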

    Entropy encoding: After all the other compression techniques are used the final step is to losslessly compress what's left. Much like a ZIP file is losslessly compressed.
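    Quick illustration with zlib (real codecs use Huffman or arithmetic/CABAC coding rather than zlib, but the principle -- lossless compression of the leftover data -- is the same; the coefficient values are made up):
    Code:
    import zlib
    import numpy as np

    # After quantization most coefficients are zero.
    coeffs = np.zeros(64, dtype=np.int16)
    coeffs[:5] = [312, -47, 12, -3, 1]   # a few surviving low-frequency values

    raw = coeffs.tobytes()
    packed = zlib.compress(raw, level=9)
    print(len(raw), "bytes ->", len(packed), "bytes")   # lossless, reversible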
  3. RealImage
    Interesting and thank you. I can see where a head shot would not require a lot of data from one frame to the next. What about, for instance, a landscape video with trees and vegetation moving in the wind? In that case, almost everything is changing, in a somewhat random way, from one frame to the next. It's impressive to think that only 2 million bits of data is sufficient to more or less faithfully capture each frame of such scenery.
  4. Member (since Nov 2007, Minneapolis MN)
    Originally Posted by RealImage:
    It's impressive to think that only 2 million bits of data is sufficient to more or less faithfully capture each frame of such scenery.
    Actually our ATSC 1.0 is far from faithful when capturing things like a babbling brook or a wheat field blowing in the wind. If you look carefully (most of the time it's not even that hard to spot) you'll see lots of macroblocking in scenes like that. Another thing that's almost impossible to reproduce without macroblocking (and most of the time it's god awful!) is strobes. Things like the Grammys look like absolute garbage once the strobes start, with our current ATSC 1.0. I hope ATSC 3.0 will be better, but I'm skeptical...
  5. Originally Posted by RealImage:
    What about, for instance, a landscape video with trees and vegetation moving in the wind? In that case, almost everything is changing, in a somewhat random way, from one frame to the next.
    Yes, more "complex" frames require more bits. This is partially handled by using variable bitrates -- some shots are given more bitrate, some less, as required. But when the complexity of the frame exceeds the allowable bitrate, picture quality starts to suffer. You'll see more loss of detail, blocky artifacts, blurring, etc.
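    A toy illustration of the idea -- the shot names and complexity scores are made up, but splitting a fixed bit budget in proportion to complexity is roughly what a VBR rate-control pass does:
    Code:
    complexity = {"talking head": 1.0, "windy landscape": 4.0, "strobes": 6.0}
    budget_bits = 60_000_000          # one second's worth at 60 Mbps

    total = sum(complexity.values())
    for shot, c in complexity.items():
        print(f"{shot}: {budget_bits * c / total / 1e6:.1f} Mb")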


