VideoHelp Forum
  1. poisondeathray,
    Thanks for all the info. I'll read it again later to make sure I took it all in.

    I take your point regarding my quality assumption, although I still suspect it'd be correct. I'm running the encodes again with --psnr & --ssim and I'll report back with the results later but I don't know if I want to get too carried away with this.

    I know CRF isn't an exact science, but given the bitrate doesn't seem to change much when removing the duplicates and encoding at a variable frame rate, it is somewhat tempting to assume the x264 encoder knows what it's doing and the quality doesn't change much either. If the bitrate doesn't change much I'd be interested to see tests that indicate the quality has improved.

    Not that I've ever had much of a need for VFR encoding myself. This was just a bit of an exercise. On the odd occasion when I do re-encode VFR video I invariably convert it to CFR. Maybe if I lived in NTSC land and spent a lot more time re-encoding hybrid DVDs I'd give VFR serious consideration though, because it makes sense. If it's okay for a progressive-scan DVD player to output different frame rates as the source type changes, I can't see why the sky should fall if similar logic is applied to an MKV when re-encoding.

    Anyway, once I've run the sample encodes again I'll post back with the --psnr & --ssim and then see if I've got the motivation to do anything else.
  2. Originally Posted by hello_hello View Post
    poisondeathray,
    Thanks for all the info. I'll read it again later to make sure I took it all in. [...]



    I won't blame you if you don't get around to it; it takes a lot of time, and at minimum you need to do 2 encodes for each to create a "line". More data points are better, of course.

    Yes, the trend will be the same as what you've shown. VFR always makes a difference. BUT whether or not it's substantial enough to offset the potential negatives is the real question. Is 1-2% enough? Is 5% enough? When you make a statement saying the quality is reduced - well, it wouldn't be at the same actual bitrate or filesize. It can only be higher in quality, if the VFR was done properly and we make those assumptions. Even if we ignore the effect of using more references for efficiency, duplicate frames still cost something, despite how efficient x264 is at temporal compression. Encoding something has a higher bitrate cost than encoding nothing.

    VFR makes a larger difference when there is something like a slideshow presentation. Enormous amounts of duplicates. Yeah, x264 b-frames, blah, blah, but a static frame might be held for a few minutes as a presenter is talking. Even a photo slideshow will typically hold a frame for a few seconds, besides the transition segments. The cost of those frames adds up. But a typical old-style animation DVD might only have 6 or 12 fps sections along with the "normal" 24 fps sections - that amount of duplicates is tiny compared to the slideshow scenario. For old animation DVDs that are "dirty" it also makes a larger difference than for "clean" ones. The "duplicates" "eat up" more bitrate on "dirty" frames because x264 has to store the differences. You saw this trend when using Dup (not DeDup).
  3. Maybe I should have rerun the tests at a much higher CRF value if that'd make any quality differences more pronounced?

    I'm a bit confused as to why the quality appears to have dropped a little for the second encode (compared to the first), given that for the second encode the duplicates were replaced with copies to aid compression. I'm not sure why that'd matter in respect to quality. The quality comparison would be between Avisynth's output and the encoded video, wouldn't it?

    I was wondering why, of the three encodes with duplicate frames removed, number four at 19.19fps had a slightly different number of I, P and B frames. Eventually it occurred to me I'd left MeGUI to configure the keyint settings, so it'd used --keyint 192 and --min-keyint 19. Given all the other encodes were run with --keyint 240 and --min-keyint 23, I ran the 19.19fps encode again to make it fair (I think). That resulted in the same number of I, P and B frames as encodes three and five, so I've posted those results. I've also replaced the statistics for encode four in my previous post. Not that anything changed significantly. The bitrate dropped by about 12kbps.

    It's very late here and my brain shut down an hour ago, so I'll post these now and think about them tomorrow, or preferably return and benefit from someone else's in-depth analysis.
    There's a slight anomaly with encode two I don't understand, but on the face of it encodes 1, 4, and 5 seem to indicate it doesn't matter too much whether you encode the duplicate frames or remove them. Then again, the numbers below are looking like one of those Magic Eye 3D pictures to me at the moment. If I look hard I can see a hidden picture of a cat. Time for bed....

    Encode 1, no duplicate frame filtering:
    frame I:295 Avg QP:12.21 size:138912 PSNR Mean Y:53.41 U:57.66 V:57.71 Avg:54.38 Global:54.06
    frame P:8454 Avg QP:15.03 size: 18000 PSNR Mean Y:50.41 U:55.13 V:55.20 Avg:51.46 Global:51.17
    frame B:22538 Avg QP:21.50 size: 1939 PSNR Mean Y:50.30 U:55.12 V:55.20 Avg:51.36 Global:51.05
    SSIM Mean Y:0.9948547 (22.886db)
    PSNR Mean Y:50.361 U:55.148 V:55.224 Avg:51.416 Global:51.104

    Encode 2, duplicate frames replaced by Dup:
    frame I:295 Avg QP:11.80 size:140748 PSNR Mean Y:53.33 U:57.63 V:57.68 Avg:54.31 Global:53.98
    frame P:7573 Avg QP:15.03 size: 18984 PSNR Mean Y:50.27 U:54.97 V:55.05 Avg:51.32 Global:51.01
    frame B:23419 Avg QP:21.35 size: 1813 PSNR Mean Y:50.21 U:55.01 V:55.09 Avg:51.26 Global:50.93
    SSIM Mean Y:0.9947845 (22.827db)
    PSNR Mean Y:50.252 U:55.024 V:55.103 Avg:51.306 Global:50.972

    Encode 3, duplicate frames removed by DeDup, encoded at 23.976fps:
    frame I:286 Avg QP:12.63 size:134917 PSNR Mean Y:53.09 U:57.43 V:57.48 Avg:54.07 Global:53.83
    frame P:6870 Avg QP:15.60 size: 19723 PSNR Mean Y:49.87 U:54.53 V:54.62 Avg:50.92 Global:50.66
    frame B:17886 Avg QP:21.93 size: 2453 PSNR Mean Y:49.65 U:54.38 V:54.48 Avg:50.71 Global:50.46
    SSIM Mean Y:0.9943151 (22.453db)
    PSNR Mean Y:49.748 U:54.456 V:54.552 Avg:50.808 Global:50.539

    Encode 4, duplicate frames removed by DeDup, encoded at 19.19fps:
    frame I:286 Avg QP:11.86 size:142396 PSNR Mean Y:53.70 U:57.99 V:58.04 Avg:54.68 Global:54.43
    frame P:6870 Avg QP:14.83 size: 21892 PSNR Mean Y:50.42 U:55.04 V:55.12 Avg:51.47 Global:51.20
    frame B:17886 Avg QP:20.96 size: 2720 PSNR Mean Y:50.18 U:54.88 V:54.98 Avg:51.24 Global:50.98
    SSIM Mean Y:0.9948908 (22.916db)
    PSNR Mean Y:50.285 U:54.961 V:55.052 Avg:51.340 Global:51.069

    Encode 5, duplicate frames removed by DeDup, encoded as VFR:
    frame I:286 Avg QP:11.67 size:144360 PSNR Mean Y:53.60 U:58.01 V:58.02 Avg:54.60 Global:54.18
    frame P:6870 Avg QP:14.95 size: 21655 PSNR Mean Y:50.68 U:55.43 V:55.51 Avg:51.73 Global:51.35
    frame B:17886 Avg QP:21.69 size: 2507 PSNR Mean Y:50.15 U:55.03 V:55.13 Avg:51.23 Global:50.91
    SSIM Mean Y:0.9948847 (22.911db)
    PSNR Mean Y:50.337 U:55.169 V:55.262 Avg:51.402 Global:51.054
    Last edited by hello_hello; 28th Jul 2016 at 06:18.
    Come back later, it's not that important. Sleep is more important.

    You still can't comment properly on "quality" because you haven't adjusted for bitrate. All things being equal, a higher bitrate will yield higher quality, right? It's one of those general positive relationships, but not precisely exact.

    You have just 1 data point for each. You need to vary the bitrate (or if using CRF, change the CRF), then plot quality (or whatever measure) vs. actual bitrate or filesize; then it will make more sense. The values by themselves, as you presented them, are not as useful because they aren't compared to a bitrate. That's the problem with CRF encoding - the filesize (thus bitrate) isn't always predictable. That's why tests are usually done with 2 pass encoding, despite taking longer. Along the bottom of your graph you will have "nice round" data points that are comparable, like 500kb/s, 1000kb/s etc., instead of something like 546.46 kb/s for one but 578.463 kb/s for another.
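
    The harness for that only needs a few lines. A rough sketch - it assumes an x264 build with Avisynth input support on Windows (hence "NUL"), and the script name and bitrate list are placeholders:

    import re
    import subprocess

    def two_pass(avs, kbps):
        # shared settings for both passes; --psnr makes x264 print the summary
        common = ["x264", "--bitrate", str(kbps), "--stats", "x264.stats", "--psnr"]
        subprocess.run(common + ["--pass", "1", "-o", "NUL", avs], check=True)
        done = subprocess.run(common + ["--pass", "2", "-o", f"out_{kbps}.264", avs],
                              check=True, capture_output=True, text=True)
        # x264 ends its log with a line like "PSNR Mean Y:... Global:51.104 kb/s:..."
        return float(re.search(r"Global:([\d.]+)", done.stderr).group(1))

    points = [(kbps, two_pass("encode.avs", kbps)) for kbps in (500, 1000, 1500, 2000)]
    print(points)  # one line of the graph; repeat with the VFR script for the other

    Run it once per script (CFR, Dup, DeDup+VFR) and you have one line per method to plot.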
  5. hello_hello, I admire your penchant for doing comprehensive tests and providing all details. But something is wrong here because the results aren't adding up. One of them was expanded despite the deduplication. Can I know which South Park episode this was so I can try to replicate the results?
  6. Originally Posted by poisondeathray View Post
    You still can't comment properly on "quality" because you haven't adjusted for bitrate. All things being equal, a higher bitrate will yield higher quality, right? It's one of those general positive relationships, but not precisely exact.
    Isn't it the "all things being equal" part that makes comparing constant and variable frame rates somewhat hard? How do you determine the percentage of the bitrate that's spent encoding duplicate frames?
    Mind you, I'm not sure how much bitrate adjusting there is to do when comparing encodes 1, 4 and 5, which in a perfect world would be the same quality using CRF encoding. The respective bitrates were 1452.04, 1469.95 and 1440.09 (from post #26). That's probably similar to a 2 pass encoding margin of error, or a "CRF isn't an exact science" margin of error, and given the reported quality varied by a similarly tiny amount.... it all seems to point to CRF doing its thing and getting it right.

    I probably won't get a chance until later today, but I'll run them all again using 2 pass encoding to see what happens.

    I understand what you're saying and it makes perfect sense if you're encoding the same video each time, but..... as an example....
    If I encoded the same noisy video at a constant frame rate twice, once with a noise filter and once without, then for 2 pass encoding at the same bitrate the encoding quality would have to be different. Therefore we mere mortals rely on CRF encoding to give us roughly the same quality instead, and as a result the bitrate changes. Can exactly the same logic be applied to 2 pass encoding where the difference is duplicate frames? And the "frame rate aware" thing, does that go out the window for 2 pass encoding? I guess I'll find out.

    Originally Posted by -Habanero- View Post
    hello_hello, I admire your penchant for doing comprehensive tests and providing all details. But something is wrong here because the results aren't adding up. One of them was expanded despite the deduplication. Can I know which South Park episode this was so I can try to replicate the results?
    What do you mean by "expanded"? Of those encodes the one that seems a bit odd to me is number two. Duplicate frames were replaced with exact copies, the average quantizers were all a little lower, yet apparently the quality dropped.

    Anyway, it was episode six, season 19, "Tweek x Craig".

    The script for encode 2 was this:
    LoadPlugin("D:\Dup.dll")
    LoadPlugin("C:\Program Files\MeGUI\tools\ffms\ffms2.dll")
    FFVideoSource("E:\video.mkv", cachefile="D:\video", threads=1)
    Dup(maxcopies=10)

    Script 3:
    LoadPlugin("D:\DeDup.dll")
    LoadPlugin("C:\Program Files\MeGUI\tools\ffms\ffms2.dll")
    FFVideoSource("E:\video.mkv", cachefile="D:\video", threads=1)
    DeDup(log="dup.txt", maxcopies=10, maxdrops=10, decwhich=3)

    Script 4:
    LoadPlugin("D:\DeDup.dll")
    LoadPlugin("C:\Program Files\MeGUI\tools\ffms\ffms2.dll")
    FFVideoSource("E:\video.mkv", cachefile="D:\video", threads=1)
    DeDup(log="dup.txt", maxcopies=10, maxdrops=10, decwhich=3)
    AssumeFPS(19.19)

    Script 5 was the same as script 3 but the x264 command line included --tcfile-in D:\times.txt
    After initially creating the dup log file with DupMC(log="dup.txt") I ran another non-encode pass to create the timecodes file using the following script, which is why there's no creation of a timecodes file included in the scripts I used for encoding.

    LoadPlugin("D:\DeDup.dll")
    LoadPlugin("C:\Program Files\MeGUI\tools\ffms\ffms2.dll")
    FFVideoSource("E:\video.mkv", cachefile="D:\video", threads=1)
    DeDup(log="dup.txt", times="times.txt", maxcopies=10, maxdrops=10, decwhich=3)

    I used decwhich=3 so when there's a string of duplicates the last frame is retained, but only because the Dup help file says that's the frame it uses for copies and there's no option to change it. I wanted Dup and DeDup to duplicate or keep the same frames if possible. I'm not sure I understand the logic behind keeping the last frame in a string of duplicates as I'd have thought that's likely to be the lowest quality one, but that's the way Dup does it.

    Edit: The Dup help file says
    "The last frame is used instead of the first because often the first frame after a scene change has more blocking artifacts, etc."
    Last edited by hello_hello; 27th Jul 2016 at 22:00.
  7. Originally Posted by hello_hello View Post

    Isn't it the "all things being equal" part that makes comparing constant and variable frame rates somewhat hard?
    Wow, you are a light sleeper aren't you

    Yes, it's problematic - I already mentioned that. You have different frame counts. The sources are "different". That's why those assumptions mentioned earlier were made - you're testing against the remaining frames and assessing their quality, and under controlled testing you CAN ensure that the VFR is correct, and that correct frames are decimated, not incorrect ones. Otherwise you can make other assumptions and design the test the other way (converting VFR to CFR), to the same full framecount - which almost never matches perfectly (some frames are mismatched in timing, and a "pure" duplicate won't match unless you started with a "pure" duplicate from a lossless source - which is almost never the case).

    And that brings up the other big issue with the "easy" x264 internal metrics method and average PSNR/SSIM. Boiling down the quality of an ENTIRE video to 1 number can be problematic and misleading. You will find when you compare against other encoders that some excel in some sections, areas or conditions, but have weaknesses in others. For example, x264 and fades. When you do a more detailed per frame analysis on the remaining frames (ie. the method usually used), the individual per frame PSNRs / SSIMs or other metrics can be compared directly. You can assess trends, weaknesses, strengths etc., and easily see if there was a methodology error like mismatched frames, a non frame-accurate source filter etc. (a large deviation usually indicates this). If the individual frame graph shows something interesting, you can assess individual frame quality between the VFR and CFR directly and subjectively (ie. with your eyes) under CFR timing instead of looking at every frame. That's the main purpose of a per frame plot - to see problem areas and trends.
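
    The plot itself is the easy part. A sketch - it assumes you've already exported per frame PSNR for each encode to a CSV with a "psnr" column (MSU VQMT and ffmpeg's psnr filter can both write per frame logs, though the exact layout varies):

    import csv
    import matplotlib.pyplot as plt

    def load(path):
        with open(path) as f:
            return [float(row["psnr"]) for row in csv.DictReader(f)]

    cfr, vfr = load("cfr_per_frame.csv"), load("vfr_per_frame.csv")
    plt.plot(cfr, label="CFR (decimated)")
    plt.plot(vfr, label="VFR")
    plt.xlabel("frame")
    plt.ylabel("PSNR (dB)")
    plt.legend()
    plt.show()
    # a large isolated spike usually means a methodology error (mismatched
    # frames, non frame-accurate source filter) rather than a real difference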


    How do you determine the percentage of the bitrate that's spent encoding duplicate frames?
    I'm not quite sure what you're asking, but I'll try to guess what you're thinking:

    It's not 1 number. You have to plot your graph and extrapolate. More data points make the graph more accurate and smoother. The relationship isn't necessarily linear. At higher bitrate ranges the relationship might be different than at lower bitrate ranges. For example, if the CFR encode needs 1100kbps to attain quality level "C" (some dB value, for example), but the VFR encode needs 1000kbps to attain quality "C", you can make the statement that at quality level "C", under those encoding settings, for that source, the CFR encode requires 10% more bitrate than the VFR encode to achieve similar quality, and the major reason is the duplicates.
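
    To put a number on it, the interpolation is simple. A sketch, with made-up data points purely for illustration:

    def bitrate_at_quality(points, target):
        # points: [(kbps, dB), ...]; linear interpolation between the two
        # measurements that bracket the target quality
        pts = sorted(points)
        for (b0, q0), (b1, q1) in zip(pts, pts[1:]):
            if q0 <= target <= q1:
                return b0 + (b1 - b0) * (target - q0) / (q1 - q0)
        raise ValueError("target quality outside measured range")

    cfr = [(900, 50.1), (1100, 51.0), (1300, 51.6)]  # invented CFR data points
    vfr = [(900, 50.6), (1000, 51.0), (1200, 51.7)]  # invented VFR data points
    saving = 1 - bitrate_at_quality(vfr, 51.0) / bitrate_at_quality(cfr, 51.0)
    print(f"VFR needs {saving:.0%} less bitrate at quality C")  # ~9% here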

    You can also do a spreadsheet that tallies up the actual per frame values (frame size in bytes) at a given "quality level", and subtract the difference. This way is not fun to do.


    I understand what you're saying and it makes perfect sense if you're encoding the same video each time, but..... as an example....
    If I encoded the same noisy video at a constant frame rate twice, once with a noise filter and once without, then for 2 pass encoding at the same bitrate the encoding quality would have to be different.
    Yes. But think of what you're testing against. Encoding "quality" compared to "what"? Are you testing against the original, or the filtered original? Those internal x264 PSNR/SSIM metrics measure against the immediate source.

    Therefore we mere mortals rely on CRF encoding to give us roughly the same quality instead, and as a result the bitrate changes. Can exactly the same logic be applied to 2 pass encoding where the difference is duplicate frames?
    I don't understand what you're asking in the 2nd part? Can you clarify?



    And the "frame rate aware" thing, does that go out the window for 2 pass encoding? I guess I'll find out.
    Yes, because the bitrate converges to whatever you set the bitrate as (or at least it should, or pretty close hopefully - if you enter 1000kbps, hopefully x264 gives you close to 1000kbps).
    Last edited by poisondeathray; 27th Jul 2016 at 21:55.
  8. Originally Posted by poisondeathray View Post
    Wow, you are a light sleeper aren't you
    Not as a general rule, but I was woken up early this morning and it was one of those days where for some reason I felt good without a lot of sleep. Normally I'd be captain grumpy. Having said that, now I've sat down for a while I can feel it catching up.

    Originally Posted by poisondeathray View Post
    It's not 1 number. You have to plot your graph and extrapolate. More data points make the graph more accurate and smoother. The relationship isn't necessarily linear. At higher bitrate ranges the relationship might be different than at lower bitrate ranges. For example, if the CFR encode needs 1100kbps to attain quality level "C" (some dB value, for example), but the VFR encode needs 1000kbps to attain quality "C", you can make the statement that at quality level "C", under those encoding settings, for that source, the CFR encode requires 10% more bitrate than the VFR encode to achieve similar quality, and the major reason is the duplicates.

    You can also do a spreadsheet that tallies up the actual per frame values (frame size in bytes) at a given "quality level", and subtract the difference. This way is not fun to do.
    Another thought regarding 2 pass, because I'm sure I've seen it happen when comparing 2 pass and CRF encodes a fair while ago (I was running CRF encodes and using the resulting bitrates for 2 pass encodes). I'm not sure just because you're comparing identical bitrates you can always assume the bits were perfectly distributed for uniform quality. I can't remember how x264's rate control works but I think in 2 pass mode it's constantly fiddling with the quality a little to make sure it hits the target bitrate. Not enough to be concerned about it, but I do recall one comparison where the bitrate of the 2 pass encode increased quite substantially towards the end compared to the CRF encode. I remember thinking the quality of the end credits should be exceptionally high.
    And I've seen examples, admittedly fairly extreme, where 2 pass and CRF looked visually different. The bitrate was ridiculously low and the opening scene very complex and it wasn't representative of real world encoding, but I think because CRF has to guess as to the keyframe quantisers it uses to a certain extent, 2 pass managed a slightly better job. They both looked quite horrible, but obviously 2 pass and CRF aren't exactly the same even if as a general rule there's so little difference it's not a factor, but still, 2 pass encoding for quality comparisons as a basis for CRF encoding isn't perfect either.

    Originally Posted by poisondeathray View Post
    Yes. But think of what you're testing against. Encoding "quality" compared to "what"? Are you testing against the original, or the filtered original? Those internal x264 PSNR/SSIM metrics measure against the immediate source.
    Yeah that's what I meant. You're comparing the quality of each encode with its source, so you're measuring encoding quality, and in the case of a noise-filtered source the "encoding quality" would be better at a given bitrate.

    Originally Posted by poisondeathray View Post
    I don't understand what you're asking in the 2nd part? Can you clarify?
    What I mean is, as a general rule we tend to trust CRF to give us a constant quality relative to the source when using the same encoder settings, and the encoder uses whatever bitrate is necessary to achieve that "relative" quality.
    We accept it's not an exact science - or do people, as a rule, run test encodes on different sources to plot graphs and extrapolate data, then conclude a clean source needs to be encoded at CRF19 in order to produce the same quality as a noisier source at CRF18.2? No doubt that sort of thing would be true, but most of us use CRF as a guidepost, so when it comes to encoding or not encoding duplicate frames I'm wondering why we should stop.

    Originally Posted by poisondeathray View Post
    Yes, because the bitrate converges to whatever you set the bitrate as (or at least it should, or pretty close hopefully - if you enter 1000kbps, hopefully x264 gives you close to 1000kbps).
    That's exactly what I assumed would happen until I compared encodes and realised it didn't.
    It took me a while to work out why because my brain is slowing down again, but I can't specify a bitrate for encode three if I want the file size to be the same because the duration is shorter (the file size ended up 45MB less than the others). For that one I need to specify a file size and account for container overhead etc.....
    I didn't encode it again because by specifying a bitrate rather than a file size I reduced the quality compared to the other encodes, and to a certain extent I accidentally proved CRF is smarter than I am and probably got it right. In the previous tests instead of producing a similar bitrate to the other encodes, CRF increased it for encode three by about 250kbps and that kept the quality on a par. It seems you were correct to question my original assumption that the quality decreased, and I was wrong and CRF knows best.

    Here's the highlights from the 2 pass encodes. I specified a bitrate of 1450kbps as that was the bitrate of the CRF encode of the original video, before fiddling with the duplicate frames (to be precise it was 1448.67, but I rounded up).
    The first thought that comes to mind is 2 pass doesn't produce exactly the same result as CRF so do we ignore that, or should we run encodes at different CRF values, use the resulting bitrates for 2 pass encodes and confirm any difference between 2 pass and CRF is linear and that it's consistent for both constant and variable frame rates....
    I'm not trying to make light of what you've said.... it's all perfectly reasonable and logical... but I'm wondering if it's practical to draw a line in the sand somewhere.

    Encode 1, no duplicate frame filtering:
    frame I:297 Avg QP:12.53 size:135719 PSNR Mean Y:53.14 U:57.41 V:57.46 Avg:54.12 Global:53.79
    frame P:8531 Avg QP:15.05 size: 17918 PSNR Mean Y:50.37 U:55.07 V:55.15 Avg:51.42 Global:51.11
    frame B:22459 Avg QP:21.41 size: 1927 PSNR Mean Y:50.30 U:55.10 V:55.18 Avg:51.35 Global:51.05
    SSIM Mean Y:0.9948384 (22.872db)
    PSNR Mean Y:50.344 U:55.116 V:55.194 Avg:51.397 Global:51.083
    kb/s:1449.55

    Encode 2, duplicate frames replaced by Dup:
    frame I:295 Avg QP:11.79 size:140876 PSNR Mean Y:53.34 U:57.63 V:57.67 Avg:54.31 Global:53.98
    frame P:7660 Avg QP:14.72 size: 19731 PSNR Mean Y:50.46 U:55.11 V:55.19 Avg:51.50 Global:51.18
    frame B:23332 Avg QP:20.93 size: 1875 PSNR Mean Y:50.43 U:55.19 V:55.27 Avg:51.49 Global:51.16
    SSIM Mean Y:0.9950063 (23.016db)
    PSNR Mean Y:50.467 U:55.195 V:55.275 Avg:51.515 Global:51.181
    kb/s:1449.53

    Encode 3, duplicate frames removed by DeDup, encoded at 23.976fps (specifying a bitrate resulted in a smaller file size than the other encodes due to the shorter duration):
    frame I:287 Avg QP:13.97 size:121778 PSNR Mean Y:51.99 U:56.35 V:56.38 Avg:52.98 Global:52.71
    frame P:6985 Avg QP:16.71 size: 16755 PSNR Mean Y:49.04 U:53.73 V:53.82 Avg:50.10 Global:49.83
    frame B:17770 Avg QP:23.16 size: 2095 PSNR Mean Y:48.89 U:53.66 V:53.76 Avg:49.96 Global:49.70
    SSIM Mean Y:0.9934290 (21.824db)
    PSNR Mean Y:48.965 U:53.712 V:53.805 Avg:50.031 Global:49.762
    kb/s:1449.33

    Encode 4, duplicate frames removed by DeDup, encoded at 19.19fps:
    frame I:287 Avg QP:12.32 size:137879 PSNR Mean Y:53.32 U:57.62 V:57.68 Avg:54.30 Global:54.04
    frame P:6985 Avg QP:14.95 size: 21426 PSNR Mean Y:50.30 U:54.89 V:54.98 Avg:51.34 Global:51.08
    frame B:17770 Avg QP:21.01 size: 2657 PSNR Mean Y:50.11 U:54.79 V:54.89 Avg:51.16 Global:50.91
    SSIM Mean Y:0.9948032 (22.843db)
    PSNR Mean Y:50.197 U:54.848 V:54.947 Avg:51.249 Global:50.985
    kb/s:1449.54

    Encode 5, duplicate frames removed by DeDup, encoded as VFR:
    frame I:287 Avg QP:11.98 size:141178 PSNR Mean Y:53.46 U:57.87 V:57.87 Avg:54.45 Global:54.00
    frame P:6985 Avg QP:14.90 size: 21610 PSNR Mean Y:50.59 U:55.32 V:55.41 Avg:51.64 Global:51.27
    frame B:17770 Avg QP:21.48 size: 2529 PSNR Mean Y:50.19 U:55.04 V:55.13 Avg:51.25 Global:50.96
    SSIM Mean Y:0.9948885 (22.915db)
    PSNR Mean Y:50.331 U:55.145 V:55.237 Avg:51.393 Global:51.066
    kb/s:1449.32

    On the face of it I think these encodes support the theory that CRF is doing its thing even when encoding VFR video.
    The bits were obviously distributed a little differently when comparing encodes four and five. While I wouldn't pretend to be able to interpret a stats file correctly, it does seem that for encode number five, even though it has the same number of frames, and even though it put I, B and P frames in all the same places, the encoder was making different quantizer decisions according to the frame durations.

    The beginning of the time-codes file where duplicate frames were plentiful:
    # timecode format v2
    0.000000
    41.708333
    83.416667
    333.666667
    542.208333
    875.875000
    1126.125000
    1376.375000
    1668.333333
    1876.875000
    2210.541667
    2419.083333

    The beginning of the stats file for encode 4:
    in:0 out:0 type:I dur:2 cpbdur:2 q:27.34 aq:17.59 tex:363199 mv:34781 misc:6516 imb:3600 pmb:0 smb:0 d:- ref:;
    in:1 out:1 type:P dur:2 cpbdur:2 q:31.27 aq:24.58 tex:111130 mv:1435 misc:1147 imb:98 pmb:1034 smb:2468 d:- ref:0 w:4,19,-3 ;
    in:2 out:2 type:P dur:2 cpbdur:2 q:31.34 aq:26.02 tex:64519 mv:931 misc:1382 imb:63 pmb:948 smb:2589 d:- ref:0 w:6,73,-2 ;
    in:3 out:3 type:P dur:2 cpbdur:2 q:31.32 aq:26.22 tex:7577 mv:333 misc:1098 imb:19 pmb:164 smb:3417 d:- ref:0 ;
    in:7 out:4 type:P dur:2 cpbdur:2 q:31.21 aq:26.21 tex:1999 mv:127 misc:650 imb:4 pmb:57 smb:3539 d:- ref:0 ;
    in:5 out:5 type:B dur:2 cpbdur:2 q:31.32 aq:21.00 tex:0 mv:0 misc:248 imb:0 pmb:0 smb:3600 d:- ref:0 ;
    in:4 out:6 type:b dur:2 cpbdur:2 q:31.32 aq:22.00 tex:15 mv:80 misc:249 imb:8 pmb:1 smb:3591 d:- ref:0 ;
    in:6 out:7 type:b dur:2 cpbdur:2 q:31.32 aq:22.00 tex:17 mv:95 misc:248 imb:10 pmb:1 smb:3589 d:- ref:0 ;
    in:13 out:8 type:P dur:2 cpbdur:2 q:35.21 aq:23.00 tex:3 mv:10 misc:259 imb:1 pmb:0 smb:3599 d:- ref:0 ;
    in:10 out:9 type:B dur:2 cpbdur:2 q:35.16 aq:22.00 tex:17 mv:95 misc:264 imb:10 pmb:1 smb:3589 d:- ref:0 ;
    in:8 out:10 type:b dur:2 cpbdur:2 q:35.13 aq:23.00 tex:7 mv:32 misc:249 imb:1 pmb:8 smb:3591 d ref:0 ;
    in:9 out:11 type:b dur:2 cpbdur:2 q:35.15 aq:24.00 tex:7 mv:23 misc:250 imb:0 pmb:9 smb:3591 d ref:0 ;

    The beginning of the stats file for encode 5:
    in:0 out:0 type:I dur:2 cpbdur:2 q:30.04 aq:16.86 tex:386306 mv:35520 misc:6518 imb:3600 pmb:0 smb:0 d:- ref:;
    in:1 out:1 type:P dur:2 cpbdur:2 q:34.31 aq:23.23 tex:128433 mv:1621 misc:1026 imb:109 pmb:1070 smb:2421 d:- ref:0 w:4,19,-3 ;
    in:2 out:2 type:P dur:12 cpbdur:12 q:30.31 aq:25.29 tex:51578 mv:956 misc:1570 imb:80 pmb:909 smb:2611 d:- ref:0 w:6,73,-2 ;
    in:3 out:3 type:P dur:10 cpbdur:10 q:28.33 aq:22.45 tex:64995 mv:883 misc:1626 imb:50 pmb:935 smb:2615 d:- ref:0 ;
    in:7 out:4 type:P dur:14 cpbdur:14 q:26.67 aq:22.09 tex:31207 mv:463 misc:1738 imb:19 pmb:502 smb:3079 d:- ref:0 ;
    in:5 out:5 type:B dur:12 cpbdur:12 q:28.31 aq:17.00 tex:4 mv:16 misc:268 imb:0 pmb:2 smb:3598 d:- ref:0 ;
    in:4 out:6 type:b dur:16 cpbdur:16 q:28.32 aq:19.00 tex:20 mv:96 misc:260 imb:12 pmb:1 smb:3587 d ref:0 ;
    in:6 out:7 type:b dur:12 cpbdur:12 q:28.30 aq:18.00 tex:20 mv:90 misc:250 imb:12 pmb:0 smb:3588 d:- ref:0 ;
    in:13 out:8 type:P dur:14 cpbdur:14 q:21.67 aq:17.59 tex:110890 mv:1193 misc:1165 imb:62 pmb:1103 smb:2435 d:- ref:0 ;
    in:10 out:9 type:B dur:10 cpbdur:10 q:26.61 aq:12.00 tex:24 mv:97 misc:279 imb:12 pmb:1 smb:3587 d ref:0 ;
    in:8 out:10 type:b dur:10 cpbdur:10 q:26.65 aq:17.00 tex:10 mv:39 misc:247 imb:0 pmb:12 smb:3588 d ref:0 ;
    in:9 out:11 type:b dur:16 cpbdur:16 q:26.63 aq:17.46 tex:39 mv:61 misc:252 imb:1 pmb:11 smb:3588 d ref:0 ;
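
    If anyone wants to stare at these fields without going cross-eyed, they pull apart easily enough. A sketch - I'm assuming tex/mv/misc are the first pass's bit estimates for texture, motion vectors and overhead, which is how they read:

    import re

    LINE = re.compile(r"in:(\d+) out:(\d+) type:(\w) dur:(\d+) .*?q:([\d.]+) "
                      r"aq:([\d.]+) tex:(\d+) mv:(\d+) misc:(\d+)")

    def parse_stats(path):
        frames = []
        with open(path) as f:
            for line in f:
                m = LINE.search(line)
                if m:
                    _, out, ftype, dur, q, aq, tex, mv, misc = m.groups()
                    frames.append({"out": int(out), "type": ftype, "dur": int(dur),
                                   "q": float(q),
                                   "bits": int(tex) + int(mv) + int(misc)})
        return frames

    # e.g. compare how frame duration changes the quantizer decisions:
    for fr in parse_stats("encode5.stats")[:12]:
        print(fr)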

    You'd have to assume the encoder does the same thing in CRF mode. Not that it proves the quality must be the same when the duplicates are kept, but even if it's distributed a little differently, is there a clever way to measure any differences in the way that quality is perceived when the frames are rushing by at normal speed?

    That's my enthusiasm quota for today spent. I'll have another look tomorrow.
    Last edited by hello_hello; 28th Jul 2016 at 06:20.
  9. Originally Posted by hello_hello View Post

    Another thought regarding 2 pass, because I'm sure I've seen it happen when comparing 2 pass and CRF encodes a fair while ago (I was running CRF encodes and using the resulting bitrates for 2 pass encodes). I'm not sure just because you're comparing identical bitrates you can always assume the bits were perfectly distributed for uniform quality. I can't remember how x264's rate control works but I think in 2 pass mode it's constantly fiddling with the quality a little to make sure it hits the target bitrate. Not enough to be concerned about it, but I do recall one comparison where the bitrate of the 2 pass encode increased quite substantially towards the end compared to the CRF encode. I remember thinking the quality of the end credits should be exceptionally high.
    And I've seen examples, admittedly fairly extreme, where 2 pass and CRF looked visually different. The bitrate was ridiculously low and the opening scene very complex and it wasn't representative of real world encoding, but I think because CRF has to guess as to the keyframe quantisers it uses to a certain extent, 2 pass managed a slightly better job. They both looked quite horrible, but obviously 2 pass and CRF aren't exactly the same even if as a general rule there's so little difference it's not a factor, but still, 2 pass encoding for quality comparisons as a basis for CRF encoding isn't perfect either.

    Yes, they both have potential issues and do not produce exactly identical results at the same bitrate. But that's the assumption made for that testing methodology - that if you run a CRF encode and get "x" bitrate, doing a 2pass encode with "x" bitrate gives you close to the same thing. Doing runs of CRF encodes is perfectly valid too. If you mostly use CRF encoding, it's more valid to use CRF encoding; your test results will have higher positive predictive value. It doesn't matter, because you're extrapolating from the curve and looking at trends. So what if your actual data points aren't perfectly spaced.

    You just have to decide from the outset what you're intending to test, what you're actually testing, and what assumptions you're making. Then you design your test. When you do per frame analysis graphs, you can say whether or not your encodes match up well - you can easily see if there is a bump or drop in sections.



    Yeah that's what I meant. You're comparing the quality of each encode with its source, so you're measuring encoding quality, and in the case of a noise-filtered source the "encoding quality" would be better at a given bitrate.
    Yes, noisy vs. noisy, clean vs. clean - at a given bitrate the "clean" output will score higher according to objective metrics


    What I mean is, as a general rule we tend to trust CRF to give us a constant quality relative to the source when using the same encoder settings, and the encoder uses whatever bitrate is necessary to achieve that "relative" quality.
    We accept it's not an exact science - or do people, as a rule, run test encodes on different sources to plot graphs and extrapolate data, then conclude a clean source needs to be encoded at CRF19 in order to produce the same quality as a noisier source at CRF18.2? No doubt that sort of thing would be true, but most of us use CRF as a guidepost, so when it comes to encoding or not encoding duplicate frames I'm wondering why we should stop.
    You can choose to do whatever you want. You don't have to stop using CRF as a "guide". It's a good rough indicator, or at least shows the trend. But you have to be careful: the CRF trend only applies to the same source with the same settings. You cannot say "CRF 18 produces the same average quality across all videos." Duplicate vs. non-duplicate are 2 different videos, 2 different framecounts. When you jump through the hoops you will see there are measurable, demonstrable differences between frame quality. Do the tests, plot the graphs and you will see. Do the per frame graphs and you will see exactly what sections need attention, strengths/weaknesses. You can even tweak encodes by using --qpfile over sections to override x264 decisions, so it's a feedback tool as well. Pick a frame that has a larger difference between CFR and VFR and then look at it with your eyes, and you will see the CFR encode is lower quality at the same filesize (you will see this in lower bitrate ranges; in higher bitrate ranges it will be too difficult to tell with the human eye. It's true with small differences too - that's another reason we resort to using metrics).


    Originally Posted by poisondeathray View Post
    Yes, because the bitrate converges to whatever you set the bitrate as (or at least it should, or pretty close hopefully - if you enter 1000kbps, hopefully x264 gives you close to 1000kbps).
    That's exactly what I assumed would happen until I compared encodes and realised it didn't.
    It took me a while to work out why because my brain is slowing down again, but I can't specify a bitrate for encode three if I want the file size to be the same because the duration is shorter (the file size ended up 45MB less than the others). For that one I need to specify a file size and account for container overhead etc.....
    I didn't encode it again because by specifying a bitrate rather than a file size I reduced the quality compared to the other encodes, and to a certain extent I accidentally proved CRF is smarter than I am and probably got it right. In the previous tests instead of producing a similar bitrate to the other encodes, CRF increased it for encode three by about 250kbps and that kept the quality on a par. It seems you were correct to question my original assumption that the quality decreased, and I was wrong and CRF knows best.

    Here's the highlights from the 2 pass encodes. I specified a bitrate of 1450kbps as that was the bitrate of the CRF encode of the original video, before fiddling with the duplicate frames (to be precise it was 1448.67, but I rounded up).
    The first thought that comes to mind is 2 pass doesn't produce exactly the same result as CRF so do we ignore that, or should we run encodes at different CRF values, use the resulting bitrates for 2 pass encodes and confirm any difference between 2 pass and CRF is linear and that it's consistent for both constant and variable frame rates....
    I'm not trying to make light of what you've said.... it's all perfectly reasonable and logical... but I'm wondering if it's practical to draw a line in the sand somewhere.
    Yes, you have to be careful to plot the actual bitrate (or filesize).

    As I said above, testing CRF encodes is perfectly valid - you just plot the actual bitrate or filesize on the x-axis. Just decide what you want to do, evaluate your testing methodology and assumptions, and do it. If you feel 2pass cannot accurately predict CRF results at the same filesize, then test CRF directly instead. State the rationale for why you structured the test that way - it's perfectly valid because that's what you set out to test. Post the assumptions / details and everything required for 3rd parties to replicate the results, and that's testing transparency.
  10. Originally Posted by -Habanero- View Post
    The core of it all is the fact that x264 has a limited number of frames it can reference, which is 16. Doubling the frame rate halves the amount of content it can reference.
    Yes, this is the reason usually given why VFR gives better compression than CFR with animated material (which has lots of duplicate frames). And it seems to make a lot of sense.

    But you can easily test this by encoding with fewer reference frames, say 8 or even 4. If the greater reach of reference frames is the primary reason for the better compression, you would expect the file size (using CRF encoding) to balloon back up toward the same size as a CFR encode. In my experience it doesn't. If a VFR encoding at ref=16 is 20 percent smaller than the CFR encoding, a VFR encoding at ref=8 will still be about 19 percent smaller. At ref=4 it will still be about 17 percent smaller. Obviously this isn't the perfect way of testing, but I think it's pretty clear that the wider reach of reference frames is not responsible for the better compression.
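
    For anyone who wants to repeat it, the sweep is a one-liner per ref value. A sketch - the script name and CRF are placeholders, and it assumes your x264 build reads .avs directly:

    import os
    import subprocess

    for ref in (16, 8, 4):
        out = f"vfr_ref{ref}.264"
        subprocess.run(["x264", "--crf", "18", "--ref", str(ref),
                        "--tcfile-in", "times.txt", "-o", out, "dedup.avs"],
                       check=True)
        print(ref, os.path.getsize(out), "bytes")
    # if reference reach were the main source of the VFR saving, ref=4 would
    # balloon back toward the CFR size; in my tests it barely grows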
  11. Guys, I gotta eat my fückin words here. I did a test to settle this once and for all.

    I couldn't use the same episode as you hello_hello because I don't have the Blu-ray for Season 19 so I did a test on the 12th of Season 15.

    CRF18 gives about 470 kb/s so I used 500 kb/s 2pass to have an objective test. Results:

    No deduplication, 2pass
    31474 frames
    81,815 KB
    0.99245 SSIM

    Deduplicated (exact duplicates), 2pass
    19389 frames
    81,680 KB
    0.99242 SSIM

    I owe an apology to cornucopia. But I swear to God that previous tests I've done showed a clear benefit. I guess it's more beneficial for content with non-exact duplicates, and I'm still convinced that for platformer video game footage they are a must for further reducing bandwidth. Tons of tests on tasvideos corroborate this, but I'll be revisiting that as well because I don't trust my gut instinct at this point.

    In the meantime, 10 innocent puppies are gonna suffer for this.
  12. Originally Posted by -Habanero- View Post
    Guys, I gotta eat my fückin words here. I did a test to settle this once and for all. [...]

    These are using internal metrics, correct? If so, you're not doing the test correctly. I explained it (probably poorly) in one of the earlier posts. Think of what is actually being measured with those numbers, and what the real question was you set out to answer. I'll explain in more detail tomorrow, as it's getting late.
  13. No, I used MSU to calculate the SSIMs.
  14. Originally Posted by -Habanero- View Post
    No, I used MSU to calculate the SSIMs.
    It's still "wrong" or you're not measuring what you set out to measure.

    But quickly: you wanted to assess "bitrate savings" for VFR vs. CFR, right? You made some claims, or expressed it as x% better compression for z% removal of frames, or something like that? What is that MSU test showing? What is being compared to what? Sorry to leave you hanging, I'll explain in more detail tomorrow, my turn to zzz.
    You're comparing a CFR encode to its direct CFR source, and a VFR encode to its direct decimated source. You cannot compare them to each other (even at the same actual bitrate or filesize), because the frames mismatch and don't line up. To an objective metric, they are completely different videos.

    The "Raison d'être" for the VFR encode in the first place (at least for this scenario) was to dump duplicates to save bitrate. It logically follows that you should test those frames and assess their quality, and compare them to their CFR counterpart both with objective metrics and subjective evaluation.

    The assumption is that you've correctly created the VFR file - i.e. the correct duplicates were chosen and decimated, thus the cadence is exactly the same as in the original CFR file. This can be verified as you use the debug mode to adjust the thresholds and settings. The assumption is that a true duplicate should be exactly the same. If a source has a duplicate which looks the same to the human eye, temporal compression and IBP differences will often make it look "different" to a metric. This is the same idea behind Dup (not DeDup), but DeDup takes it a step farther. The premise for this type of VFR is that duplicates "cost" something to encode. They have an encoded frame size, and it's measurable. If you decimate them, the frame quality of the remaining frames should increase, because you can spread the bitrate that would have been "wasted" on duplicates. This was mentioned in the discussion about how to compare - method "A" vs. method "B" - and both ways have pros/cons, as already mentioned.

    You already have the decimation log and script, and you can apply them to the CFR encode and source to test against. Thus every video lines up and you can compare frame vs. frame, with both objective and subjective testing, when they are the same actual bitrate or filesize. Importantly, you can correlate what the metrics are saying with human eyes. For example, you can flip tabs in avspmod, or use Interleave in avisynth and step through. You can choose which video to compare to which video with 3rd party metric tools (not with internal x264 metrics, which only test against the immediate source). 3rd party tools like MSU VQMT and ffmpeg have ssim and psnr and can read from .avs directly.

    The bottom line is we want to see if the actual frame quality is higher or lower at a given actual bitrate (or filesize) when using VFR vs. CFR. We can extrapolate the "bitrate savings" from the graph at different target actual bitrates and filesizes once the aggregate points are plotted (ie. "x" bitrate is required to achieve "y" quality), provided there were no major issues with the per frame analysis.

    The actual bitrate is (physical filesize) / (running time). The running time is the same in all the videos, because of the VFR timecodes. The reason 2pass is commonly used for testing is that you can target the actual bitrate, specifying --tcfile-in to generate a VFR file at that actual bitrate (the actual running time is the same, thus the filesize is the same, or very close). This makes it easier to have "nice" numbers, or to match the resultant bitrate of a CRF encode so you can do direct comparisons (that is, if you're willing to accept that 2pass and CRF will give approximately the same results at the same bitrate if everything else is equal - if you accept that, then you can directly compare them because they are the same filesize, and you have fewer encodes to do. If not, you need to do serial encodes, e.g. CRF 18.2, 18.4, 18.7 ....etc, until the filesize matches, ie. lots of trial and error. That's the reason - laziness and not wanting to do as many serial encodes). Technically, you should do all CRF, or all 2pass, to eliminate the differences, however minor. In the end, it really doesn't matter how you do it or what the actual bitrate or filesize of an individual run is (unless you want to do direct comparisons) - because when you plot all the data points with any method on a graph, you get the same trend. The points won't be in the exact same location on the x-axis, but they should be on the same trend line when you connect them, so you can extrapolate any value. But when the points do line up, you can do direct comparisons because they are the same filesize.
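
    In code form the arithmetic is trivial. A sketch - it assumes format v2 timecodes (one presentation time in milliseconds per frame, as in the file posted earlier), and takes the last frame's duration to be the same as the one before it:

    import os

    def actual_bitrate_kbps(video_path, timecodes_path):
        times = [float(t) for t in open(timecodes_path)
                 if t.strip() and not t.startswith("#")]
        seconds = (times[-1] + (times[-1] - times[-2])) / 1000.0
        return os.path.getsize(video_path) * 8 / seconds / 1000.0

    print(actual_bitrate_kbps("encode5.mkv", "times.txt"))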

    There are major issues with using objective measures too. It's a big topic, discussed extensively. The biggest complaint is that they don't correlate perfectly with human visual perception. Technically you shouldn't even use SSIM or PSNR testing when using psy or AQ options. x264 even prints a warning message when you do. You can inflate your scores by using --tune ssim or --tune psnr. Subjectively it looks worse, but the numbers are usually better. It's easy to confuse or trick the metrics as well - so everything has to be assessed in context.
    But quickly: you wanted to assess "bitrate savings" for VFR vs. CFR, right? You made some claims, or expressed it as x% better compression for z% removal of frames, or something like that? What is that MSU test showing? What is being compared to what? Sorry to leave you hanging, I'll explain in more detail tomorrow, my turn to zzz.
    In the beginning, yes, but that was solely trusting CRF to judge the quality. This time I did a 2pass comparison. For the non-decimated episode I did 500 kb/s 2pass, and for the decimated encode I did 828 kb/s to match the filesize of the non-decimated one. It was actually 200 KB smaller, but that's only a difference of 1 kb/s overall so I let it slide, as my PC is worn out. I can't stress it with encodes running all day anymore.

    Yes, I compared the non-decimated x264 encode to the non-decimated source and the decimated x264 encode to the decimated source with MSU VQMT, but you have a problem with this, apparently? I guess I could use DirectShowSource to include the dropped frames in the decimated encode and compare that to the original source, but I don't see how this would change the average SSIM.

    But I do want to be wrong about my latest conclusions, more than ever. So I'll do anything you ask that might invalidate this.

    The dropped frames were all pretty much exact duplicates. In the debug, most of them were 0% with the occasional 0.06% up to 0.10%. For comparison, a character's mouth moving in the far distance that can barely be discerned was 0.65%.
  17. Originally Posted by -Habanero- View Post
    In the beginning, yes, but that was solely trusting CRF to judge the quality. This time I did a 2pass comparison. [...]


    Something doesn't make sense in your mind, right? The results are not what you expected - so you should re-evaluate what your test really shows.




    Did you read my last post? I thought it was pretty clear. I'll try again -

    You were comparing how each did against itself - VFR vs. VFR, CFR vs. CFR - not VFR vs. CFR.

    So you're not answering the question that was asked. You're not interested in comparing frames that were dropped (ie. duplicates) - because they aren't even there! Compare the frames that ARE there. The assumption is that the selected duplicates were correct, thus the file cadence is the same between VFR and CFR. We don't care about duplicates; they are supposed to look the "same" in the first place. We're concerned about unique frames, and we want to quantify the negative impact encoding duplicates has on quality with metrics, then verify with subjective human eyes to assess whether the metrics correlate with subjective "quality". So the VFR data point is ok, but the CFR data point isn't.

    Another way of thinking of it: x264 is pretty efficient with the right settings and does a good job of encoding duplicates. If you've seen typical per frame bitrate graphs, normal x264 b-frames are tiny in coded frame size in general, and the visual b-frame quality is much higher than other AVC encoders - but the b-frames that are also duplicates are even smaller. But how good a job does it really do? How good a job do other encoders do? How much is "wasted" on encoding duplicates? That's basically the same question about VFR bitrate savings, rephrased.

    The CFR encode minus the duplicates is what is left over - ie. how much was spent on unique frames. You subtract out the duplicates with the same decimation script, and now you're assessing the effect of having to encode those duplicates. Encoded duplicates still cost something, even a few bits - each frame has a coded frame size. This means fewer bits were allocated to the unique frames. The VFR encode had no duplicates in the first place, so 100% of the bitrate was allocated to the unique frames. Plain logic will tell you that at the same filesize, since some bitrate was diverted to encoding duplicates, the quality of the remaining frames will be lower for the CFR encode. Not every frame will necessarily be worse - a few might be better, that's the way encodes work out, as you know - but the majority will be worse, and the mean average will be worse. So you want to quantify HOW much worse. That's what the metrics, data points and graphs are for. The reason you ideally want the x-axis points to line up is not only that it makes the graph "pretty" with nice round numbers; you can also do direct comparisons at those actual bitrates because they are the same.
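
    With the per frame stats parsed (as in the sketch a few posts back), the tally is a few lines. The DeDup log parsing is left out here because the format varies - building the set of dropped frame numbers is up to you:

    def duplicate_cost(frames, dropped):
        # frames: parse_stats() output for the CFR encode (out: = display order)
        # dropped: set of display-order frame numbers DeDup decimated
        dup_bits = sum(f["bits"] for f in frames if f["out"] in dropped)
        all_bits = sum(f["bits"] for f in frames)
        return dup_bits / all_bits  # fraction of the stream spent on duplicates

    # e.g. duplicate_cost(parse_stats("cfr.stats"), dropped_frames)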

    It's not a perfect testing model, but there are more pros than cons compared to the alternative testing models, as discussed earlier. One argument is that the CFR encode is excessively penalized by that method, because it "wastes" bitrate on non-identical duplicates (eg. IBP differences if the source used lossy temporal compression). But that is the whole rationale for using VFR in the first place! You're not trying to replicate IBP differences in duplicates that were in the source - the assumption is duplicates are the "same"; that's why you drop them to make a VFR file. You want to see how much was "wasted" in the CFR encode by encoding duplicates, because the visual cadence and timing are the same between VFR and CFR.
  18. Ok so just to clarify, you want me to compare the VFR encode to the CFR source (by including duplicates via times.txt)?

    Or you want me to remove duplicates from the CFR encode and compare it to the VFR source?
  19. Originally Posted by -Habanero- View Post
    Ok so just to clarify, you want me to compare the VFR encode to the CFR source (by including duplicates via times.txt)?

    Or you want me to remove duplicates from the CFR encode and compare it to the VFR source?



    You can do whatever you want

    (The 2nd one.)

    Just explain why you chose to structure a comparison a certain way. Think about what is actually being measured, and whether your assumptions are valid. If an assumption isn't valid, try to restructure the test or assumptions so they are more valid. There is no "perfect" way. Pros/cons.



    Dropping duplicates in the CFR encode is more consistent with the reasoning behind this type of VFR. That's the "cost" associated with having to encode duplicates in the first place. You compare the VFR encode as you normally do, but the CFR encode is decimated and compared to the decimated source. You want to measure the impact on the remaining frames that are there.

    Testing the other way, by adding duplicates to a VFR source (converting VFR to CFR), has more "cons" than "pros". A) It's sometimes difficult to perfectly match up, because VFR to CFR can be approximate, so you can't do direct comparisons or direct assessments without a lot of manual work. B) A true lossless duplicate generated by VFR to CFR will never match up with an IBP "lossy duplicate" in the source (ie. a source that uses lossy temporal compression). C) The goal of this type of VFR isn't to replicate the "IBP or lossy duplicates"; you're making the assumption that duplicates were supposed to be true duplicates in the first place, so you can drop them. Dropping duplicates and saving bitrate is the reason for this type of VFR.



    SSIM and PSNR have serious issues too. It's easy to "massage" results a certain way. Technically, you shouldn't use psy when doing metrics either - ie. the psy options (and sometimes AQ; SSIM is supposed to use AQ mode 2) should be disabled, because they cause "deviation from the original". x264 even gives you a warning when you try to use its internal metrics with psy enabled. This, of course, limits the value of metric testing, because in real life you use psy and AQ. In most cases they definitely increase subjective quality.

    Also be careful about aggregate or "average" PSNR or SSIM, because the formula used to calculate it can vary. On one hand, metrics are supposed to be objective; on the other hand, they are also supposed to model human visual perception. So Y' should be "weighted" more than U,V, because humans are more sensitive to luma than to colour (that's the whole concept behind why chroma subsampling is used in the first place). Some formulas do a simple mean, others do a weighting.
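
    A toy example of how much the formula matters - the MSE numbers are invented, and the 6:1:1 Y:U:V weighting is one common convention, not the only one:

    import math

    def psnr(mse, peak=255.0):
        return 10 * math.log10(peak * peak / mse)

    frame_mses = [2.0, 2.5, 40.0]  # two easy frames, one hard one
    mean_psnr = sum(psnr(m) for m in frame_mses) / len(frame_mses)
    global_psnr = psnr(sum(frame_mses) / len(frame_mses))  # pool the MSE, then log
    print(f"mean {mean_psnr:.2f} dB vs global {global_psnr:.2f} dB")  # global is lower

    y, u, v = 50.3, 55.1, 55.2  # plane PSNRs like the ones in the logs above
    print("simple mean:", (y + u + v) / 3, "weighted:", (6 * y + u + v) / 8)

    That pooled-vs-per-frame difference is essentially why the "Mean" and "Global" figures in the x264 logs above never agree.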

Interpret everything in context. Don't rely on a single number or test.
    Quote Quote  
  20. You know what, this m2ts source is not being opened frame-accurately with ffvideosource because I'm getting different SSIM results when running the SAME comparison again. Looks like I have to do all this all over again.

    Everything you wrote is good advice.
    Quote Quote  
Newer versions of L-SMASH handle transport streams better (LWLibavVideoSource); also try threads=1.

Or demux/remux into MKV or MP4 (I'm assuming it's AVC). If it's MPEG-2, use DGIndex.


A more popular quality metric these days is PSNR-HVS-M. It has a much higher correlation with subjective quality than SSIM or PSNR, and it's gaining traction and frequent use in newer testbeds, such as those for AV1, Daala and HEVC research.

    http://www.ponomarenko.info/psnrhvsm.htm

PSNR is still the "go to" and familiar territory for signal engineers and broadcast. SSIM was purported to be more in line with human subjective perception, but it's been shown to fail pretty badly in some situations - not much better, and usually a worse predictor, than PSNR IMO. I've started using PSNR-HVS-M more frequently, and so far it predicts better. I haven't used it enough to comfortably recommend it for everything yet, but it's looking that way. The cases that fool SSIM especially, and sometimes PSNR, don't seem to "trick" it as much.
    Quote Quote  
  22. Originally Posted by -Habanero- View Post
    You know what, this m2ts source is not being opened frame-accurately with ffvideosource because I'm getting different SSIM results when running the SAME comparison again. Looks like I have to do all this all over again.
When I do this kind of testing with sources like that, I first re-encode them to AVI with a lossless intra codec, and all further testing is done using that AVI file as the source. That accesses the original file linearly and usually avoids out-of-order frames. The resulting AVI file has no problems returning the exact same frame every time it is read, regardless of the order in which the frames are read. Even if some frames are out of order in the AVI file (from the original conversion), they are in the same order every time the AVI file is read.
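Something along those lines (a sketch under my own assumptions - Ut Video is just one lossless intra codec ffmpeg can write to AVI, and the file names are placeholders):
Code:
import subprocess

# Re-encode once, linearly, to a lossless intra-only AVI; all further
# metric runs then read this file instead of the .m2ts
subprocess.run([
    "ffmpeg", "-i", "source.m2ts",
    "-c:v", "utvideo",  # lossless, intra-only codec
    "-an",              # video only; audio isn't needed for metric tests
    "lossless_intermediate.avi",
], check=True)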
    Last edited by jagabo; 1st Aug 2016 at 06:29.
    Quote Quote  
All good advice. I just remuxed to an MKV and ffvideosource did its trick, but in the future I might have to start using lsmash or whatever the latest trend is.

    New tests:
    CFR video, 81,823 KB, SSIM 0.99247
    VFR video, 81,839 KB, SSIM 0.99285

More sensible, but still way too small a difference. Only 5% better (in SSIM error terms) for 60% fewer frames. This still throws my previous assertion out the window. I'll have to do a proper test like this on a platformer game to see how much benefit this really has.

    I looked at random frames and the VFR does look slightly better. I've attached the comparison slideshow.

Yes, objective metrics aren't perfect. As far as PSNR goes, it was the very first quality metric I tried, and I got really bad values when I compared 2 videos that I thought were identical. After exhaustively trying to find out what was so wrong with the second video on the frames where I got the bad values, I discovered the pixel column on the right border of the frame was tinted slightly green. PSNR blows.
    The only contender to SSIM I've heard about was VQM but I never really tried it.
    Image Attached Files
    Last edited by -Habanero-; 11th Aug 2016 at 02:12.
    Quote Quote  
  24. If you use ffms2 or lsmash, set threads=1 to make it more consistent (but slower)

What were the encoding settings? Did you use --tune SSIM (which adds --no-psy --aq-mode 2)?

It's just 1 test, 1 data point and 1 metric, so I wouldn't draw any firm conclusions or try to apply those observations to other scenarios.

Just for fun, can you post the PSNR values? PSNR is slightly better than SSIM in my experience.



    Did you take a look at the correlation values for SSIM and PSNR in the link above... ouch!

    Definitely try PSNR-HVS-M, it's looking significantly more accurate than either so far
    Quote Quote  
  25. Encoding settings for both except bitrate:
    Code:
    cabac=1 / ref=16 / deblock=1:1:1 / analyse=0x3:0x133 / me=umh / subme=11 / psy=0 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=0 / threads=9 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=16 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=3 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=23 / scenecut=40 / intra_refresh=0 / rc_lookahead=240 / rc=2pass / mbtree=1 / bitrate=829 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=2:1.00
    I never use -tune SSIM.

    PSNRs:

    CFR: 40.31060
    VFR: 40.70827

I looked at both briefly and I have no idea whether higher or lower values mean better quality. Completely black scenes received a score of 100, while the second highest-rated frame got a score of 26 - a completely black frame with an 8x8 patch on the far left whose pixel values had 1 or 2 more red in them. An invisible difference, completely obliterated in the compressed version of course.
And apparently this is supposed to mean a 400% difference according to human perception? GTFO here. PSNR blows, just like the last time I tried it 8 years ago.

    Did you take a look at the correlation values for SSIM and PSNR in the link above... ouch!
    EDIT: Sorry, I missed your link. I'll read it in a bit.

    Definitely try PSNR-HVS-M, it's looking significantly more accurate than either so far
    MSU VQMT does not have this, just regular PSNR or PSNR256, whatever that is.
    Last edited by -Habanero-; 3rd Aug 2016 at 14:13.
    Quote Quote  
  26. Originally Posted by -Habanero- View Post
    I never use -tune SSIM.
Unless you copy-pasted the wrong info, you actually did:

    psy=0 aq=2:1.00


    Originally Posted by -Habanero- View Post
    And apparently this is supposed to mean a 400% difference according to human perception?
PSNR doesn't measure human subjective "perception", nor was it ever intended to. That was never part of the original "job description" or model; it estimates absolute errors. People only "abuse" it in that fashion, using it as an approximate measure of "perception". (That's why there are PSNR variants that add models of contrast sensitivity and contrast masking.)

In contrast, SSIM is based on a perception model - that's what the original paper marketed it as. But it actually doesn't correlate very well in real tests, especially compared to newer metrics. That's why there are SSIM variants like MS-SSIM.

Surely you've looked at the numbers and the encodes - often there are major discrepancies. That's why everyone keeps saying "use your eyes". PSNR and SSIM have a positive correlation with subjective "quality", but it's only moderate. You have to jump through hoops and handicap the encoder with specific settings just to tweak the results towards higher "numbers", forget about similarity or "quality". Despite that, PSNR and SSIM are still the most widely used. I'd like a metric you could just use under any circumstance, with high correlation and predictive value; something closer to what your "eyes" tell you. We're not there yet.




    Originally Posted by -Habanero- View Post
    MSU VQMT does not have this, just regular PSNR or PSNR256, whatever that is.

MSU comes out with a new version quite frequently. I just updated to the 7.0 beta and it looks like they dropped a bunch of measures; IIRC they used to have some CUDA-accelerated ones. It seems the free version is still limited to <720p, and still doesn't have PSNR-HVS-M, but it does have MS-SSIM.


For PSNR-HVS-M, you can find a Windows-compiled version here:
    http://mmspg.epfl.ch/vqmt

It's CLI and not as user friendly, and it only takes raw YUV input, but you don't have the <720p limitation.
    Quote Quote  
  27. Originally Posted by poisondeathray View Post
Unless you copy-pasted the wrong info, you actually did:

    psy=0 aq=2:1.00
I never use psychovisuals because in my experience they make the quality worse, and I've been using autovariance AQ ever since mb-tree came out because that's the combination that was recommended. I don't use --tune ssim, or you would've found that exact string in there.

And apparently this is supposed to mean a 400% difference according to human perception?

    Originally Posted by poisondeathray View Post
PSNR doesn't measure human subjective "perception", nor was it ever intended to.
    Duh, I'm saying it sucks and should never be used for this.

    Originally Posted by poisondeathray View Post
Surely you've looked at the numbers and the encodes - often there are major discrepancies. That's why everyone keeps saying "use your eyes". PSNR and SSIM have a positive correlation with subjective "quality", but it's only moderate.
SSIM is way better than PSNR, and I don't have to use my eyes that often with SSIM. I only have to look once to get an idea of whether a scene looks good at 0.98500 or 0.99500, and the rest of the frames stay consistent with that "floor". Spikes in either direction are a great indicator. They don't mislead as profoundly as PSNR does.
    When it comes to PSNR, I may as well survey the quality with my eyes and not bother with the metric at all.

    Originally Posted by poisondeathray View Post
    MSU comes out with a new version quite frequently. I just updated to the 7.0 beta and it looks like they dropped a bunch of measures. IIRC they used to have some CUDA accelerated ones. It seems the free version is still limited to < 720p , and still doesn't have PSNR-HVS-M, but it does have MS-SSIM
Should I use MS-SSIM? Do you recommend it? I've been using SSIM (precise), and only that, for 8 straight years, so I can't imagine changing the habit.

    Originally Posted by poisondeathray View Post
For PSNR-HVS-M, you can find a Windows-compiled version here:
http://mmspg.epfl.ch/vqmt

It's CLI and not as user friendly, and it only takes raw YUV input, but you don't have the <720p limitation.
Thanks, I'll check it out. I read that link you provided earlier, and it's an intriguing idea, but if it rates PSNR as more consistent than SSIM then the paper has no credibility as far as I'm concerned, and I have no idea what kind of crappy methodologies were involved in coming to that conclusion.
    Quote Quote  
  28. Originally Posted by -Habanero- View Post

Should I use MS-SSIM? Do you recommend it? I've been using SSIM (precise), and only that, for 8 straight years, so I can't imagine changing the habit.
I haven't tested MS-SSIM enough to say either way. You might like SSIM better than PSNR, but both are bad, especially when used alone. If you provide both measures and they say similar things, then at least there's some evidence the trend is in the right direction. I don't think MSU lets you test a bunch of metrics at once with the free version; you have to do them one by one.

SSIM and PSNR are only useful for trends, if you have many data points, or if you can correlate with your eyes. And for testing losslessness - that's what they're really good for.


Originally Posted by -Habanero- View Post
Thanks, I'll check it out. I read that link you provided earlier, and it's an intriguing idea, but if it rates PSNR as more consistent than SSIM then the paper has no credibility as far as I'm concerned, and I have no idea what kind of crappy methodologies were involved in coming to that conclusion.
Use your eyes. I wouldn't trust it "blindly" either. But both PSNR and SSIM rate rather poorly according to those correlations (which seem to be correct so far).

The only reason I started checking out PSNR-HVS-M is that I saw it being used for HEVC, AV1 and Daala comparisons. It's supposed to be much better. I'm not entirely convinced yet, but it doesn't fail as badly in the prototypical scenarios where SSIM and PSNR fail.

But the subset of test images used in those scientific papers isn't necessarily representative of the type of image artifacts we commonly see with HEVC, AVC, MPEG-2 etc, ie. video-based compression artifacts (or wavelet artifacts when comparing something like Cineform or JPEG2000). So those results aren't necessarily directly applicable either.
    Quote Quote  
  29. Originally Posted by -Habanero- View Post
    And apparently this is supposed to mean a 400% difference according to human perception?
No, it's a 400 percent bigger error. But when errors are very small, a 400 percent increase may be imperceptible, whereas when errors are large, a 400 percent increase is obvious.
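The point being that PSNR is a log scale, so a fixed dB gap always means a fixed multiplicative error ratio, not a fixed perceptual difference. A quick sketch (my own, with made-up example values):
Code:
def rms_error_ratio(psnr_high_db, psnr_low_db):
    """How many times larger the RMS error is at the lower PSNR."""
    return 10 ** ((psnr_high_db - psnr_low_db) / 20.0)

# A frame at 26 dB has roughly 5x the RMS error of one at 40 dB,
# whether or not your eyes can actually see the difference
print(rms_error_ratio(40.0, 26.0))  # ~5.01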
    Quote Quote  
  30. I was stuck in the real world but I see this discussion continued. Some thoughts....

    Originally Posted by -Habanero- View Post
    Encoding settings for both except bitrate:
    Code:
    cabac=1 / ref=16 / deblock=1:1:1 / analyse=0x3:0x133 / me=umh / subme=11 / psy=0 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=0 / chroma_qp_offset=0 / threads=9 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=16 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=3 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=23 / scenecut=40 / intra_refresh=0 / rc_lookahead=240 / rc=2pass / mbtree=1 / bitrate=829 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=2:1.00
    I never use -tune SSIM.
    Could I ask why the bitrates were different?
    I assume you were removing duplicates and encoding at the original frame rate, which necessitated increasing the bitrate due to the reduced duration, but....

    Would that be the best way to do it? If you encoded at a constant frame rate the encoder would assume each frame is going by at the same speed, but after they're encoded and made variable with a timecodes file, they won't be.
When I tested giving the timecodes file to the x264 encoder, I could specify the same bitrate as for a constant frame rate encode (no duplicates removed), or if I used the same CRF value, the resulting bitrates were pretty much the same. So logically.... an exaggerated example.... what if there was a succession of duplicate frames lasting ten seconds? If they weren't removed, in a perfect world the quality of the first frame would dictate the quality of the rest: the encoder would ensure it's of high quality and spend enough bits on the rest to maintain that, but the assumption is those duplicates wouldn't be very expensive.
If you remove the duplicates and encode at a constant frame rate, the encoder has no idea the remaining frame will eventually display for ten seconds, and therefore it can't decide on quality accordingly.
When you remove the duplicates and give the encoder the timecodes file so it knows what's going on, it can decide accordingly, and theoretically you're less likely to end up with ten seconds of crap quality.... I assume.
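By way of illustration, feeding x264 the decimated frames plus the timecodes file might look something like this (a sketch only - the CRF value and file names are placeholders, it assumes the decimated frames were saved as Y4M, and --tcfile-in is x264's timecode-input option):
Code:
import subprocess

subprocess.run([
    "x264",
    "--demuxer", "y4m",
    "--tcfile-in", "timecodes.txt",  # v2 timecodes from the decimation step
    "--crf", "18",                   # same CRF as the CFR comparison encode
    "--output", "vfr.264",
    "decimated.y4m",
], check=True)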

There are other factors that could come into it. Does anyone know if the --keyint setting is based on a fixed timescale when the frame rate is variable, or does it apply to a number of frames? I suspect it's more likely to be the former, because otherwise, if you only had a couple of unique frames every ten seconds (the standard GOP duration) and that continued for a while, you could theoretically use --keyint 240 and end up with GOPs spanning minutes.
When you remove duplicates and encode at a constant frame rate though, isn't that the sort of thing that could happen? After the duplicates are removed, the encoder has no way of knowing that frames 345 to 361 will eventually display for a combined duration of 45 seconds, unless you give it a timecodes file.

Then there's x264 being all clever about putting keyframes at the beginning of each scene, which would help reduce the likelihood of 45-second GOPs... so many variables.

I'm not sure I'll get motivated to run any more test encodes, given I don't tend to use VFR encoding myself, but I'll confess I do hope that when the encoder knows the source is VFR it distributes the quality a little better than if it's fed the same frames at a constant frame rate. If it didn't, there'd be no point in it being VFR-aware.
    Last edited by hello_hello; 7th Aug 2016 at 09:21.
    Quote Quote  


