Understanding fundamentals of MPEG encoding

12th Aug 2002 16:21 #1
fatcatfan

Member

Join Date
Jul 2002
Would those in the know help me sort things out? I've been reading
http://bmrc.berkeley.edu/research/mpeg/mpeg_overview.html
and elsewhere trying to piece together an understanding. I'm looking at things from a TMPGEnc perspective because that's what I use for encoding.

Concerning "P" and "B" pictures: are they more lossy than "I" pictures? I understand they take macroblock(s) from other pictures, move them, and add an "error term" which means to me the difference between what is desired in this picture and what is being pulled from another picture. But is that "error term" lossy?

How does that relate to P/B picture "spoilage" in TMPG? Does a positive or negative value for spoilage result in better quality? (the sign convention is vague)

What role does the quantization matrix play in encoding? What effect does it have on compression ratio and perceived quality? I understand quantization - a reduction in the amount of data used to represent something. For example, to move from 24bpp RGB to 15bpp RGB, you can just drop the 3 least significant bits of each component.
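The 24bpp-to-15bpp example can be sketched in a few lines of Python. This is only an illustration of quantization in general, not of what an MPEG encoder does; the function names and the sample pixel are made up for the example.

```python
def rgb888_to_rgb555(r, g, b):
    """Quantize 8-bit components to 5 bits by dropping the 3 least significant bits."""
    return (r >> 3, g >> 3, b >> 3)


def rgb555_to_rgb888(r5, g5, b5):
    """Scale 5-bit components back to 8 bits; the dropped bits cannot be recovered."""
    return (r5 << 3, g5 << 3, b5 << 3)


original = (200, 117, 5)  # arbitrary sample pixel
restored = rgb555_to_rgb888(*rgb888_to_rgb555(*original))
print(original, "->", restored)
```

Any component that was not already a multiple of 8 comes back changed, which is the lossy part.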

What is the point of the "MAX frames in a GOP" setting in TMPG? Does this have some relationship to the VBV buffer? I've noted that when I set max frames to 0, TMPG takes an incredibly long time to encode. Does this setting control how far ahead/back a P/B frame can look for macroblocks? It doesn't seem to make sense otherwise. If I set a 1 I / 10 P / 5 B GOP, that's 61 frames per GOP. Why would I then turn around and limit this GOP by setting a max number of frames? :shrug:
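One way to arrive at the 61-frame figure is to count one I picture followed by 10 P pictures, each preceded by a run of 5 B pictures. The sketch below just builds that pattern in display order; whether TMPGEnc lays the GOP out exactly this way is an assumption, and `gop_pattern` is a hypothetical helper.

```python
def gop_pattern(num_p, b_per_group):
    """One I picture, then num_p groups of (B pictures followed by a P picture)."""
    return "I" + ("B" * b_per_group + "P") * num_p


pattern = gop_pattern(num_p=10, b_per_group=5)
print(pattern)
print(len(pattern), "frames")
```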

I apologize for not being more descriptive in some of these questions, or elaborating. I had typed out a nice long post, and submitted it, but apparently my session cookie for login had expired by that point. I had to re-login, and everything I had typed was discarded, even when I went back in my browser. Now I'm just disgusted and don't want to type it all again.

12th Aug 2002 16:59 #2
circleking

Member

Join Date
Jul 2001
These questions also interest me. Especially....

What role does the quantization matrix play in encoding? What effect does it have on compression ratio and perceived quality?

I'd like to know how to tweak the quantization matrix for maximum effect.

Up to now, I've simply used the tools and more-or-less blindly experimented with these settings. I lack a decent understanding of the internal workings.

12th Aug 2002 17:39 #3
slopoke

Member

Join Date
Apr 2002
Originally Posted by circleking

These questions also interest me.

Myself also.

I am quite interested in the I, P and B frames in the GOP.
Say, what if you use fewer B frames and more P frames, since P frames can be either I or P when encoded?
Or I may be way offside on this, in which case disregard this post, but I will still read any answers with interest.

12th Aug 2002 17:50 #4
fatcatfan

Member

Join Date
Jul 2002
I've found another document of some help:
http://www.tektronix.com/Measurement/App_Notes/mpegfund/index.html#Contents

Concerning what you said, slopoke: From what I understand, both P and B frames can contain some intra-coded data - self-contained data just like an I frame. But they can also use non-intra-coded data. B frames are the "most highly compressed". Not necessarily the lowest quality, but the lowest bitrate cost. So you want more B frames. But there are limitations on just how many are effective, and the more B frames, the bigger a decoding buffer you need.

At least that's how I understand things so far.

In my post that got lost in the ether, I described the GOP and stuff something like this:

First, think of an I picture as a keyframe (if you understand that concept). By setting the number of P and B pictures in a GOP, you are setting the length of the GOP and therefore the recurrence interval of I, or key, frames. Ideally you probably only want I frames at scene changes/cuts. Smooth fade transitions generally don't need them. The rest of the time you want to be using P and B frames, to the extent allowed by the buffers on the target decoder.

TMPG's automatic scene detection works fairly well, but misses some cuts. Dunno why exactly. You can force I pictures in TMPG without doing manual bitrates.

I've been experimenting with encoding the Episode II Clone Wars trailer downloaded from starwars.com, using different TMPG settings. That seemed to be a very clean video source for experimentation. When I looked into the manual picture settings, TMPG had missed some scene cuts where the colors in the adjacent frames were similar.

Now I've got another question. For P and B pictures, are the frames they reference ahead/back in terms of frame order, or stream order? Because of B picture dependency, sometimes frame 6 will come before frame 2 in the stream. So is a B picture looking back at the last P in the stream, or the last P frame in display sequence?
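The reordering behind this question can be sketched as follows. In MPEG-1/2 a B picture is predicted from the nearest I or P picture before it and after it in display order, and both of those anchors are sent ahead of it in the stream. The helper below is a simplified illustration (the name `display_to_coded_order` and the frame labels are made up), not an encoder's actual scheduling code.

```python
def display_to_coded_order(display):
    """Reorder frame labels such as 'B1' from display order to stream (coded) order.

    Each run of B pictures is held back until the anchor (I or P) that follows it
    in display order has been emitted.
    """
    coded = []
    pending_b = []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:
            coded.append(frame)      # anchor goes out first
            coded.extend(pending_b)  # then the B pictures that sit before it on screen
            pending_b = []
    # Simplification: trailing B pictures would really wait for the next GOP's I picture.
    coded.extend(pending_b)
    return coded


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
print(display_to_coded_order(display))
```

In this toy sequence frame 6 reaches the stream before frames 4 and 5, the same effect as frame 6 arriving before frame 2 with longer B runs.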

12th Aug 2002 19:25 #5
slopoke

Member

Join Date
Apr 2002
Thanks fatcatfan for that excellent link @tektronix, my head has now officially exploded..

Your comments are definitely helping me get a tighter grip on the situation.

12th Aug 2002 22:09 #6
fatcatfan

Member

Join Date
Jul 2002
You're welcome.... I just hope I've got it right...

-=[EDIT]=-
Oooh... new discovery:
http://www.tektronix.com/Measurement/App_Notes/mpegfund/sect2-3.html
Go to the last paragraph on that page. Then go back and read the whole page. It at the very least establishes the (exceptional) importance of the quantizing matrix.

....

Hrm.. apparently, other than any initial reduction in the color space, quantization is the only lossy process in MPEG - further stressing the importance of the Q matrix. Does this mean a Q matrix of all 1s would be lossless compression? Well, technically not compression... you'd just be removing the quantization process. All that remains would be analogous to zip compression.

13th Aug 2002 00:50 #7
circleking

Member

Join Date
Jul 2001
Good links fatcatfan. Hope I can absorb some of this.

13th Aug 2002 01:24 #8
vitualis

Member

Join Date
Oct 2000
There is a little bit more complexity to the "I" frames.

Having additional I-frames at scene changes doesn't necessarily improve picture quality. This is because I-frames take a lot more bitrate than the others. By having additional I frames, you will have improved picture quality at scene changes, but it also means that between scene changes (at P and B frames -- which are the vast majority of video frames), the average bitrate is reduced.

Thus, counter-intuitively, additional I frames at "scene changes" can in fact REDUCE overall video quality.
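The bit-budget argument can be made concrete with a rough sketch. All numbers here are hypothetical (a 10-second clip at 30 fps and 2500 kbit/s, with every I frame costing a flat 400,000 bits), and the model ignores the fact that a P frame sitting on a scene cut is itself expensive.

```python
def bits_per_pb_frame(total_bits, total_frames, num_i, bits_per_i):
    """Average bits left for each P/B frame after the I frames take their share."""
    remaining = total_bits - num_i * bits_per_i
    return remaining / (total_frames - num_i)


TOTAL_FRAMES = 300               # hypothetical: 10 s at 30 fps
TOTAL_BITS = 2_500_000 * 10      # hypothetical: 2500 kbit/s for 10 s
BITS_PER_I = 400_000             # hypothetical flat cost of one I frame

for num_i in (20, 30):
    share = bits_per_pb_frame(TOTAL_BITS, TOTAL_FRAMES, num_i, BITS_PER_I)
    print(f"{num_i} I frames -> about {share:.0f} bits per P/B frame")
```

With the total fixed, every extra I frame lowers the average left over for the P and B frames.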

BTW, I frames are also used for random access on players, so reducing their number can affect FFW and REW functions.

Regards.

Michael Tam
w: Morsels of Evidence

13th Aug 2002 02:13 #9
fatcatfan

Member

Join Date
Jul 2002
Okay... that makes sense now that you've brought it to my attention. I did recognize the need for I frames in seeking, but I hadn't considered that putting bits into an I frame at a scene change reduces the bits available for the other frames.

So if you don't care about being able to seek, you really only want one "I" frame in the whole stream. The only other "I" frames would be P/B frames that wound up being entirely intra-coded. Or something like that. I understand what quantization is, but I haven't yet really integrated it into my perception of the encoding process.

13th Aug 2002 02:23 #10
kwag

Member

Join Date
Jun 2001
Originally Posted by vitualis

Thus, counter-intuitively, additional I frames at "scene changes" can in fact REDUCE overall video quality.

Regards,
Michael Tam.

Michael, I have to disagree on this one. When you have a scene change, you can't have past predictive frames for compression referenced on the new GOP, because there are no "pre-empted" past frames to compare to! So an I frame is needed on a scene change, to pre-empt or "refresh" a GOP. If you don't insert an I frame on a scene change, the encoder can't properly encode with "fading" old frames referenced from the previous GOP. That's just the way MPEG encoders work.

-kwag

KVCD.Net - Advanced Video Conversion
http://www.kvcd.net

13th Aug 2002 02:33 #11
Sulik

Member

Join Date
Dec 2001

Location
San Jose, CA
- Inserting I-frames at a scene change boundary WILL improve the quality, but usually will not make any noticeable difference. The reason why is that if you have a P frame on a scene change, it will end up being mostly intra-coded just like an I-frame, but will require more bits due to the extra macroblock type information stored in the P-frame.
The real benefit is to start a new GOP on a scene change so that it can be used for seeking (chapter points).

- Using a quantization matrix with all 1s will NOT be lossless (in fact, it can reduce your quality if you have a bitrate limit). This is because the actual quantization level is the quantization matrix value multiplied by the quantization level: Q*qmatrix[i].
However, MPEG can be lossless if you put all 1s in the quantization matrix AND use a fixed quantization of 1 as well (bitrate will be very high), although it may be better to use higher values because it could cause some overflow in the coefficients (put all 16s in the Q matrix except 8 for the first coefficient (DC) and use fixed quantization of 1)

- Quantization is actually a simple process: for a value x, the value y=x/N is stored in the MPEG file, and the decoder restores the original value x = y*N. Since x and y are integers, precision is lost if N > 1 (For example (1/2)*2=0 )
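The points above can be sketched in Python under the simplified model given in the post: each coefficient is divided by a step equal to the quantizer scale times its matrix entry, truncated to an integer, and multiplied back on decode. Real MPEG quantization has extra scaling and rounding rules that are left out here, and the coefficient values, the four-entry matrix, and the function names are all made up for illustration.

```python
def quantize(x, step):
    """Integer division as in y = x/N, truncating toward zero."""
    return int(x / step)


def dequantize(y, step):
    """Decoder side: x = y*N."""
    return y * step


def roundtrip(coeffs, qmatrix, q_scale):
    """Quantize and restore each coefficient with step = q_scale * qmatrix[i]."""
    restored = []
    for x, m in zip(coeffs, qmatrix):
        step = q_scale * m
        restored.append(dequantize(quantize(x, step), step))
    return restored


coeffs = [120, 37, -9, 1]   # hypothetical DCT coefficients
qmatrix = [8, 16, 16, 16]   # hypothetical matrix entries (8 for DC, 16 elsewhere)

print(roundtrip(coeffs, [1, 1, 1, 1], 1))  # step of 1 everywhere
print(roundtrip(coeffs, qmatrix, 1))
print(roundtrip(coeffs, qmatrix, 2))       # same matrix, larger quantizer scale
```

Only the all-ones matrix with a scale of 1 returns the input unchanged; raising either the matrix entries or the scale zeroes out more of the small coefficients.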

13th Aug 2002 02:34 #12
kinneera

Member

Join Date
Aug 2001
A P-frame can be encoded entirely intra if it determines that there are no useful motion vectors. However, it will not be allowed to do so at the bitrate that an I-frame would. Thus a trade-off occurs in which Michael's statement is entirely correct. If it were true that it was technically impossible, encoders wouldn't have options for I-frames on scene detections since it would be forced to happen automatically.

13th Aug 2002 02:37 #13
vitualis

Member

Join Date
Oct 2000
@ kwag:
To some degree this is true but you must consider that what the MPEG encoder "sees" compared to what your eyes/brain/psychological layer "sees" may be quite different.

For instance, why doesn't an MPEG encoder that is set to put I frames at scene changes necessarily put them at what most people would consider scene changes (e.g., fatcatfan's example up above)? It is because according to the algorithm (or what the MPEG encoder sees mathematically) the frame didn't pass its cut-off for a "scene change".

There is usually video material before and after a scene change so the MPEG encoder definitely has stuff to compare. Whether it is beneficial to have an I frame at the scene change (i.e., it would improve overall quality) is not necessarily clear.

For example, I remember a very old MPEG quality test from ages back that compared the same encoder with and without automatic scene change selection (it may have been an old version of LSX but I honestly can't remember). The result from that test was the automatic scene detection actually worsened the quality for that clip.

@ fatcatfan:
No, you will need more than one I frame. The truth lies somewhere in between. I was just alerting you to an issue that most people don't recognise. What kwag says is mostly correct and I frames "refresh" the GOP.

If you have unlimited bitrate, obviously, an I-frame only video stream will yield the best quality. However, with bitrate constraints, it becomes a balance. Too few I frames will probably lead to poor quality but too many will starve the MPEG encoder of bits (I frames don't employ the benefits of temporal compression).

Regards.

Michael Tam
w: Morsels of Evidence

13th Aug 2002 09:13 #14
fatcatfan

Member

Join Date
Jul 2002
Originally Posted by vitualis

If you have unlimited bitrate, obviously, an I-frame only video stream will yield the best quality. However, with bitrate constraints, it becomes a balance. Too few I frames will probably lead to poor quality but too many will starve the MPEG encoder of bits (I frames don't employ the benefits of temporal compression).

In my searches on this forum I came across a long, drawn out debate on all I frame encoding versus IPB encoding... I don't want this to get into that. Everything I read in that debate was essentially true, on both sides. They just had different goals in encoding.

Originally Posted by Sulik

This is because the actual quantization level is the quantization matrix value multiplied by the quantization level: Q*qmatrix[i].

Thanks. This is very useful information for understanding quantization. And this is where bitrate is varied. The matrix has something to do with how many DCT coefficients may be discarded as negligible, but the Q level magnifies the effect of the matrix. The Q level is constant for each element in a matrix, correct? Q level can vary from frame to frame (or maybe block to block) but it is essentially just a scalar of the matrix, not a matrix itself. Right?

Ultimately my goal is to produce an XSVCD template with no practical upper limit on bitrate (so 15Mbit for MP@ML), then fine tuned to fit about 40-45 minutes on 1 CD. 2 Pass VBR, or even CQ, tends to reach a high level of quality, in the range necessary to accomplish this goal. I'm experimenting with a new approach. I want to look at getting good quality 720/704x480 video rather than 480x480 on this XSVCD. I understand the difficulties in this - I'm spreading my bits across more pixels by going for a larger frame size. That's my reason for wanting to learn more and experiment. Maybe it's "pointless" to some. To me it's just fun :P

It is becoming clearer to me the importance of the encoder being used. Depending on its algorithms, you may not get "optimal" results. If I was writing one, I'd take a brute force approach because my ability in this level of math is limited. Put a new GOP (and thus I frame) at each scene change, and only there... then try every combination of P/B frames between I frames. See which required the least bits. Then use that configuration and increase the Q level if needed to reduce bitrate. The problem with this approach is that if you have even only 128 frames between "I"s, there'd be 2^128 possible combinations. Then you'd have to repeat that throughout the entire video. Needless to say, mine would be a slooooow encoder.
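The size of that search space is easy to check. The sketch below counts every P/B labelling of the frames between two I frames and enumerates them for a tiny case; it treats each frame as a free choice between P and B, which is the counting used in the post, and does not filter out patterns a real encoder would reject. The function names are hypothetical.

```python
from itertools import product


def count_patterns(num_frames):
    """Number of ways to label num_frames frames as either P or B."""
    return 2 ** num_frames


def enumerate_patterns(num_frames):
    """List every P/B labelling; only practical for very small num_frames."""
    return ["".join(p) for p in product("PB", repeat=num_frames)]


print(enumerate_patterns(3))
print(count_patterns(128))
```

Even at a billion trial encodes per second, 2^128 candidates for a single GOP would take far longer than anyone could wait, which is why practical encoders rely on heuristics instead.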

It's just like most things I do in engineering. You must make an initial assumption and work from there. There are too many variables in MPEG encoding for an encoder to practically look at all possibilities.

13th Aug 2002 09:27 #15

Guest

Can one of you please explain, or point me in the direction of some info on, open and closed GOPs?

Cheers

13th Aug 2002 10:40 #16
fatcatfan

Member

Join Date
Jul 2002
A closed GOP is self contained. It doesn't rely on following GOPs for prediction (B frames). That's the short answer.

13th Aug 2002 12:04 #17
adam

Member

Join Date
Sep 2000

Location
United States
Originally Posted by fatcatfan

Ultimately my goal is to produce an XSVCD template with no practical upper limit on bitrate (so 15Mbit for MP@ML), then fine tuned to fit about 40-45 minutes on 1 CD. 2 Pass VBR, or even CQ, tends to reach a high level of quality, in the range necessary to accomplish this goal. I'm experimenting with a new approach. I want to look at getting good quality 720/704x480 video rather than 480x480 on this XSVCD. I understand the difficulties in this - I'm spreading my bits across more pixels by going for a larger frame size. That's my reason for wanting to learn more and experiment. Maybe it's "pointless" to some. To me it's just fun :P

I'd say that's a noble cause but I think there are even more basic fundamentals of MPEG encoding preventing this from happening. To fit 40-45 mins onto 1 CD-R you are going to get an avg bitrate of around 2.6 Mbit/s tops. No matter how you tweak your template this is a given. With a max bitrate setting of 15 Mbps you will still only see your actual bitrate reach levels as high as maybe 6 or 7 Mbit/s, and that's probably being very optimistic. You probably already know this, but for any bitrate allocated above your avg (peaks) the encoder must first free up that bitrate by allocating less than your avg elsewhere. With an avg of 2.6 and a practical min of say 1 Mbit/s (most scenes are going to need at least this much bitrate even though your min may be set as low as 0) you're just not going to be able to free up enough bitrate to sustain levels as high as 15 Mbps, or anywhere close to that.
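The budget argument can be put into numbers with a back-of-the-envelope sketch. It assumes a two-level model in which the stream sits either at a practical floor or at a peak, and asks what fraction of the running time can be spent at the peak while keeping the stated average. The 2.6, 1 and 15 Mbit/s values come from the post; the two-level model and the function name are simplifications introduced here.

```python
def max_peak_fraction(avg, floor, peak):
    """Fraction f of running time at `peak` when the rest runs at `floor`.

    Solves f * peak + (1 - f) * floor = avg for f.
    """
    return (avg - floor) / (peak - floor)


for peak in (15.0, 8.0):
    f = max_peak_fraction(avg=2.6, floor=1.0, peak=peak)
    print(f"peak {peak} Mbit/s: at most {f:.1%} of the running time")
```

Under these assumptions only a small share of the video could ever sit near 15 Mbit/s, and only if everything else were squeezed down to the floor.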

That's not to say that you couldn't aim for something a little more realistic such as 0/2.6mbit/8mbit and still achieve excellent results. And that's also not to say that using a max of 15mbps will hurt anything, it just won't help anything necessarily.

Good luck.

13th Aug 2002 12:22 #18
fatcatfan

Member

Join Date
Jul 2002
Originally Posted by adam

That's not to say that you couldn't aim for something a little more realistic such as 0/2.6mbit/8mbit and still achieve excellent results. And that's also not to say that using a max of 15mbps will hurt anything, it just won't help anything necessarily.

That's the sort of thing I'm shooting for. I realize I'm not likely to ever hit 15Mbps, or even need to in order to have acceptable quality. I just mean I'm going to let the encoder have free rein in the bitrate. Since I'm looking for the highest attainable perceived quality within the constraints I've set, the tweaking will have to take place in the quant matrix and GOP structure. And in finding the best encoding method (multi-pass VBR, CQ, etc.) to facilitate it. CQ seems to be the most fitting. I'm still exploring the differences between CQ and CQ_VBR. All else being equal, CQ_VBR tends to give higher avg bitrate, in my experience.

It's certainly no trivial task. But I like challenges.

-=[EDIT]=-
I've just done two runs on the previously mentioned Star Wars trailer. These are the settings/steps I took.

1. Used mov2avi.exe to convert the original into uncompressed bitmap frames avi. (so it only had to be decoded once rather than each time I ran a test)

2. First run: 2-Pass VBR, Settings:
MPEG2 720x480 4:3DAR 23.976FPS
50/2500/15000 min/avg/max, no padding, P and B picture spoilage 0
Automatic VBV
MP@ML NTSC 3:2 Pulldown on playback 4:2:0 DC10bit
High Quality (slow) motion search

Source centered at custom size of 624x416. This is to compensate for overscan on my TV. This also screwed the original aspect ratio, but I just wanted to have dynamic video in the entire 624x416 frame.
Original was 24FPS, so I checked "no framerate conversion" just to speed up the encode process.

GOP 1I/55P/3B Max length 218.

Now, here's where I really tweaked. I set a closed GOP (not really necessary, I think, as you'll see). Then I went into manual picture settings, let it autodetect scene changes, added the ones it missed, and also forced a new GOP at each I picture. I thought an "I" was the beginning of a GOP, but apparently not to TMPG, if you study some encode logs. After this, I looked at the longest string of frames between "I" frames. It was 218 frames long, and that's where I pulled the GOP structure from. The reason I say the closed GOP probably wasn't necessary is that every GOP was starting with a vastly different looking frame. Previous Bs were unlikely to use it.

TMPG Default quant matrices were used.

Result: 2489.31 avg bitrate. Several highly noticeable artifacts.

3. Second run: Constant quality 75 -> the only changed setting. min/max and spoilage, everything, same as before.

Result: 2493.71 avg bitrate. Previous artifacts eliminated. No new ones that I noticed.

Now, I don't know what this proves other than that I think I prefer TMPG's CQ over its 2-Pass VBR.

...

Did a third run just now. Same as the CQ version before but without the closed GOP. Got 2493.09 avg kbps. That's a 0.62 kbps improvement. Not much, but over the long term it could help.

Well, I learned something.

13th Aug 2002 15:50 #19
Sulik

Member

Join Date
Dec 2001

Location
San Jose, CA
I wouldn't use too long a GOP (the DVD maximum is 18 frames for NTSC, 15 for PAL).

Each P-frame is predicted from the previous P-frame, so as you increase the length of the GOP, the reference frames become increasingly bad.
You will not see much improvement from increasing the GOP size above 15 - if anything, it could actually degrade quality.

Similarly, IBBBP may end up looking worse than IBBP, because the reference frames are too spaced out.

13th Aug 2002 16:22 #20
fatcatfan

Member

Join Date
Jul 2002
I understand what you're saying, but I think (though I'm a newb at this) that since I've forced "I" pictures and new GOPs at scene cuts, what you say isn't as applicable. The P frame is referencing a picture only four frames away (when using 3 Bs). The Bs aren't even looking as far away as the Ps. And at constant quality, with 0 spoilage, a P frame should look just as good as an "I" frame... :sheepishly: I think.

-=[EDIT]=-
Okay, just to support my point, I did two more runs. I kept the CQ and everything the same except that I went back to the standard GOP with max length of 18 frames, keeping my forced GOPs/I pictures.

Resulting CQ avg bitrate: 2606.72 - over 100 kbps greater than before.

Then I turned off my forced GOPs/Is and also left off scene detection.

Resulting CQ avg bitrate: 2647.35

Though this doesn't really prove anything except that with this particular video, my way was best. I say that because at CQ... the quality should be... constant. That is, the perceived acceptability of the picture.

13th Aug 2002 23:51 #21
Sulik

Member

Join Date
Dec 2001

Location
San Jose, CA
It is a good test to compare different settings: encode at CQ with very high maximum bitrate (to make sure the encoder doesn't increase the Q because of bitrate limitations).

If the file size is smaller, then it usually also means that the quality will be better when using the same bitrate.
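Applied to the constant-quality runs reported earlier in the thread, the comparison might look like the sketch below. The three averages are the figures fatcatfan posted, taken here as kbit/s (an assumption about the units); the short labels and the `savings_percent` helper are made up for the example.

```python
# Average bitrates from the CQ runs posted above (assumed to be kbit/s).
runs = {
    "max GOP 18, no forced I, no scene detection": 2647.35,
    "max GOP 18, forced I at cuts": 2606.72,
    "long GOP, forced I at cuts": 2493.09,
}


def savings_percent(baseline, candidate):
    """Percentage by which candidate undercuts baseline at the same CQ setting."""
    return 100.0 * (baseline - candidate) / baseline


baseline = runs["max GOP 18, no forced I, no scene detection"]
for name, kbps in runs.items():
    print(f"{name}: {kbps} ({savings_percent(baseline, kbps):.2f}% below baseline)")
```

A setting that yields a smaller stream at the same CQ level leaves more headroom when the same material is later encoded at a fixed bitrate.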
