I'd appreciate some clarification about digital video capture. A typical consumer camcorder might record 1080p at 30 fps (frames per second) and 60 Mbps (megabits per second). Each frame lasts 1/30 sec, so there are 60/30 = 2 Mb available for each frame. A 1080p frame has roughly 2 million pixels, which have to share those 2 million bits of data, or on average one bit per pixel. Under the same conditions, a 4K frame, with four times as many pixels, would on average have about 1/4 bit of data for each pixel.
I understand that there is compression going on, but that doesn’t seem like a lot of data for each frame of color video. So how do they do it and end up with what looks like high quality video? Thanks.
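The per-pixel budget in the question can be checked with a few lines of Python. This is only a sketch of the arithmetic: the function name `bits_per_pixel` is made up for illustration, and "4K" is assumed here to mean UHD at 3840x2160.

```python
def bits_per_pixel(bitrate_mbps, fps, width, height):
    """Average number of encoded bits available per pixel in one frame."""
    bits_per_frame = bitrate_mbps * 1_000_000 / fps
    return bits_per_frame / (width * height)

# 60 Mbps at 30 fps gives 2,000,000 bits per frame.
bpp_1080p = bits_per_pixel(60, 30, 1920, 1080)  # just under 1 bit per pixel
bpp_4k = bits_per_pixel(60, 30, 3840, 2160)     # assumed UHD: about a quarter bit

print(round(bpp_1080p, 2), round(bpp_4k, 2))
```

Since UHD has four times the pixels of 1080p, the same bitrate leaves a quarter of the bits for each pixel.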
Last edited by RealImage; 20th Jun 2018 at 16:10.
Chroma subsampling: The human eye has less color resolution than greyscale resolution. So RGB video is converted to a greyscale image and two color-difference images. The color images are reduced in resolution by half in each dimension. So a 1920x1080 video has a 1920x1080 greyscale (luma) channel, and two 960x540 color (chroma) channels. This reduces the size of the data per frame by half.
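The "by half" figure can be verified by counting samples. A small sketch, where `samples_per_frame` is a hypothetical helper name and the halving in each dimension follows the description above:

```python
def samples_per_frame(width, height, subsample=True):
    """Count luma plus chroma samples in one frame.

    With subsample=True each of the two chroma planes is halved in both
    dimensions; otherwise all three planes are stored at full resolution.
    """
    luma = width * height
    if subsample:
        chroma = 2 * (width // 2) * (height // 2)
    else:
        chroma = 2 * width * height
    return luma + chroma

full = samples_per_frame(1920, 1080, subsample=False)  # three full-size planes
sub = samples_per_frame(1920, 1080)                    # one full plane + two 960x540
ratio = sub / full

print(full, sub, ratio)
```

Each chroma plane shrinks to a quarter of its size, so the total drops from 3 samples per pixel to 1.5.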
Interframe encoding: Most frames are not encoded as entire pictures but only the difference between that frame and other frames. So in a talking head shot maybe only the speaker's lips move from one frame to the next. So only that small part of the picture needs to be encoded for that frame. I.e. "copy the last frame, just change this small section..."
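The "copy the last frame, just change this small section" idea can be sketched with tiny greyscale frames stored as lists of lists. The names `frame_delta` and `apply_delta` are invented for this example, and real encoders work on predicted blocks and residuals rather than per-pixel change lists, so treat this as a toy model only.

```python
def frame_delta(prev, curr):
    """Return (row, col, new_value) for every pixel that differs from prev."""
    return [(r, c, curr[r][c])
            for r in range(len(curr))
            for c in range(len(curr[0]))
            if curr[r][c] != prev[r][c]]

def apply_delta(prev, delta):
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = [row[:] for row in prev]
    for r, c, value in delta:
        frame[r][c] = value
    return frame

# An 8x8 "talking head" frame where only a 2x2 patch (the lips) changes.
prev_frame = [[100] * 8 for _ in range(8)]
curr_frame = [row[:] for row in prev_frame]
for r in (5, 6):
    for c in (3, 4):
        curr_frame[r][c] = 180

delta = frame_delta(prev_frame, curr_frame)
print(len(delta), "of", 8 * 8, "pixels need to be sent")
```

The decoder side only needs the previous frame and the short change list to recover the new frame exactly.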
Discrete Cosine Transform: Blocks of pixels are converted from the spatial domain to the frequency domain. Low amplitude, high frequency data can be removed without adversely affecting the picture. The remaining frequency data is more easily compressed later (see below).
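A rough illustration of the transform step, using only the standard library. It applies an orthonormal 8x8 DCT to a smooth test block, zeroes every coefficient below a cutoff, and transforms back. The cutoff value and the test block are arbitrary choices for this sketch, and real codecs quantize coefficients (divide and round) rather than applying a hard threshold.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a list of samples."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def idct_1d(coeffs):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k in range(n):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            s += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def dct_2d(block):
    """2D DCT done separably: transform rows, then columns."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(c) for c in transpose(rows)])

def idct_2d(coeffs):
    rows = [idct_1d(r) for r in coeffs]
    return transpose([idct_1d(c) for c in transpose(rows)])

# A smooth gradient block, typical of sky or a wall.
block = [[100 + 4 * r + 2 * c for c in range(8)] for r in range(8)]
coeffs = dct_2d(block)

threshold = 1.0  # arbitrary cutoff chosen for illustration
kept = [[v if abs(v) >= threshold else 0.0 for v in row] for row in coeffs]
nonzero = sum(1 for row in kept for v in row if v != 0.0)

restored = idct_2d(kept)
max_error = max(abs(restored[r][c] - block[r][c])
                for r in range(8) for c in range(8))
print(nonzero, "of 64 coefficients kept; max pixel error", round(max_error, 3))
```

For smooth content most of the 64 coefficients come out near zero, which is what makes the later lossless stage effective.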
Motion vectors: Often in video something moves from one location to another. Motion vectors, "move this block of pixels from here to there...", are used to reduce the amount of information needed for successive frames.
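One simple way an encoder might find such a vector is exhaustive block matching: try every offset in a small window and keep the one with the lowest sum of absolute differences (SAD). The sketch below assumes that approach; the function names, block size, and search range are illustrative, and production encoders use much faster search strategies.

```python
def sad(prev, curr, top, left, dy, dx, size):
    """Sum of absolute differences between a block in curr and a shifted block in prev."""
    total = 0
    for r in range(size):
        for c in range(size):
            total += abs(curr[top + r][left + c] - prev[top + r + dy][left + c + dx])
    return total

def find_motion_vector(prev, curr, top, left, size=4, search=2):
    """Search prev for the block that best matches curr's block at (top, left).

    Returns ((dy, dx), cost), where (dy, dx) points from the block's position
    in curr to the matching position in prev.
    """
    height, width = len(prev), len(prev[0])
    best, best_cost = (0, 0), sad(prev, curr, top, left, 0, 0, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= height and
                      0 <= left + dx and left + dx + size <= width)
            if not inside:
                continue
            cost = sad(prev, curr, top, left, dy, dx, size)
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, best_cost

def paste_patch(frame, top, left, size=4):
    """Draw a small textured object with distinct pixel values."""
    for r in range(size):
        for c in range(size):
            frame[top + r][left + c] = 10 + 4 * r + c

# The object sits at row 2, col 3 in the old frame and at row 4, col 4 in the new one.
prev = [[0] * 12 for _ in range(12)]
curr = [[0] * 12 for _ in range(12)]
paste_patch(prev, 2, 3)
paste_patch(curr, 4, 4)

vector, cost = find_motion_vector(prev, curr, top=4, left=4)
print(vector, cost)
```

When the match is perfect the cost is zero, so the block can be described by the vector alone with no extra pixel data.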
Entropy encoding: After all the other compression techniques are used the final step is to losslessly compress what's left. Much like a ZIP file is losslessly compressed.
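To get a feel for the ZIP comparison, Python's `zlib` module (the DEFLATE method used in ZIP files) can stand in for a codec's entropy coder. That substitution is an assumption made for illustration: H.264, for example, uses its own schemes (CAVLC or CABAC) rather than DEFLATE. The residual data below is invented.

```python
import zlib

# After the earlier stages, what is left for a frame tends to be mostly zeros
# with a few small values. This made-up byte string imitates that.
residual = bytes([0] * 1000 + [3, 0, 0, 1] + [0] * 1000)

packed = zlib.compress(residual, 9)
unpacked = zlib.decompress(packed)

print(len(residual), "bytes in,", len(packed), "bytes out")
```

The round trip returns the original bytes exactly, which is what "lossless" means here: all the quality loss happens in the earlier steps.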
Last edited by jagabo; 20th Jun 2018 at 14:29.
Interesting and thank you. I can see where a head shot would not require a lot of data from one frame to the next. What about, for instance, a landscape video with trees and vegetation moving in the wind? In that case, almost everything is changing, in a somewhat random way, from one frame to the next. It's impressive to think that only 2 million bits of data are sufficient to more or less faithfully capture each frame of such scenery.