Yes, there were some issues with that -
1) The current VMAF models weren't trained on video games, so the applicability is questionable.
That author is looking to fix some of them and actually got help from Netflix and the VMAF developers to train proper models.
2) He used ffmpeg libvmaf, and the results are different than vmafossexec or vapoursynth-vmaf. The latter 2 keep values in float.
3) There are currently 8-bit/10-bit issues with ffmpeg libvmaf. Netflix assigned someone to it a few months ago.
4) Some streams are not frame accurate in ffmpeg. This leads to wrong values. Common culprits are open GOP, weird timecodes, and non-zero start times. If you don't use workarounds such as forcing the framerate or PTS, you get wrong values. My workaround is an avisynth or vapoursynth indexed script.
I can demonstrate those seek/accuracy issues with PSNR and a lossless encode, so "known" values (can't use VMAF to demonstrate this because it doesn't register "perfect"). avs/vpy feeding into ffmpeg works; ffmpeg native does not.
Nobody posted actual videos in those gameplay threads. If you look at the actual screenshots, the most common theme I noticed was texture loss.
Actual video samples are being posted now, and still no video game samples... Anyway, I posted some of my observations and early thoughts on Turing in the other thread in this forum. In the source folder of this one, sneaker posted a Turing encode and an x264 encode. They were lower bitrate, 2.2Mbps; the x264 encode is full of artifacts but still manages to retain some details, while the Turing encode is clean and exhibits that blurring and texture loss. (At that bitrate range, however, the goal probably isn't to retain detail, it's to minimize artifacts. And it's 10bit HEVC vs. 8bit AVC, not really apples to apples.)
There were some other Turing tests being done at Doom9 too; you can check those out. The more tests, the more you can get a sense of how an encoder "behaves" in different situations. Turing seems tuned more for PSNR, and you cannot disable or reduce SAO. Fine details seem difficult to retain, and that's what the gameplay screenshots suggested too. I asked sneaker to play with some settings, and maybe a later Nvidia update will help things. Also it's only a few samples, so it's too early to pass judgement - just some observations and thoughts.
-
This is very strange. These low minima will also unduly lower the average value, I guess.
How is the average calculated, btw? Is it a linear average or an average of the decibels?
Most of the encodes were 2-pass VBR, some (e.g. all HW encodes) 1-pass constant quality or constant quantizer. Just wondering how much 1-pass or 2-passes affect the result (for this short clip).
Also, were the pictures (.png) within a series (1, 2 or 3) all of the same type for the various encoders (means all B,P or I)? Comparing pictures of different types for rating would be doubtful or possibly misleading, I think, even though we compare at same file size.
Added: I just noticed that SONY Vegas does not support B-frames at all... so I guess the pictures are of different types? Anyway, a metric (PSNR etc.) which analyzes the full clip will deliver an 'average' of the I,B,P cadence though.
Last edited by Sharc; 17th Oct 2019 at 08:25.
-
For ffmpeg psnr/ssim I think it's the arithmetic mean (simple average). I'll double check, but I just copy/pasted into the spreadsheet what ffmpeg spit out as the final numbers. But that's why I use the per-frame logs - in case something is amiss you can go check which frames are "off".
vmaf is different in that you can use the harmonic mean or the arithmetic mean. HM "penalizes" bad frames more.
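As a toy sketch of the difference between the two pooling methods (the scores below are invented for illustration, not real libvmaf output):

```python
# Toy comparison of arithmetic vs. harmonic mean pooling of per-frame
# quality scores. A few bad frames cost more under the harmonic mean.
def arithmetic_mean(scores):
    return sum(scores) / len(scores)

def harmonic_mean(scores):
    # reciprocal-average pooling, as libvmaf's harmonic mean option does
    return len(scores) / sum(1.0 / s for s in scores)

# 99 good frames and 1 badly degraded frame
clip = [95.0] * 99 + [20.0]

print(round(arithmetic_mean(clip), 2))  # 94.25
print(round(harmonic_mean(clip), 2))    # 91.57 - the bad frame costs more
```

Same data, but HM ends up about 2.7 points lower, which is why a clip with a handful of broken frames scores noticeably worse under harmonic mean pooling.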
Quote:
Most of the encodes were 2-pass VBR, some (e.g. all HW encodes) 1-pass constant quality or constant quantizer. Just wondering how much 1-pass or 2-passes affect the result (for this short clip).
Also, were the pictures (.png) within a series (1, 2 or 3) all of the same type for the various encoders (means all B,P or I)? Comparing pictures of different types for rating would be doubtful or possibly misleading, I think, even though we compare at same file size.
Series 2 and 3 were I-frames for x264 and MC (mentioned in the code box: "series 2 & 3 - frame 1000 (I frame for MC and x264)").
The point was to include an I-frame and a non-I-frame. I use FFVideoSource() with FFInfo() so you can check frametypes as you go along and hot swap in AvsPmod or the vapoursynth multi-viewer. Single frames are limited; you need to look at a bunch of frames. The point of that framecompare website was to emulate how superimposing and your brain's split-second memory can help in seeing differences.
Quote:
Added: I just noticed that SONY Vegas does not support B frames at all..... so I guess the pictures are of different type? Anyway, a metric (PSNR etc.) which analyzes the full clip will deliver an 'average' of the I,B,P cadence though.
-
Sneaker's HEVC Turing encode displayed the same blip. Visually, it doesn't look like "12" dB. I checked Y separately with Greyscale() too.
n:1186 mse_avg:126.36 mse_y:166.59 mse_u:61.63 mse_v:30.15 psnr_avg:39.18 psnr_y:37.98 psnr_u:42.30 psnr_v:45.40
n:1187 mse_avg:43913.45 mse_y:65416.83 mse_u:1209.19 mse_v:604.20 psnr_avg:13.77 psnr_y:12.04 psnr_u:29.37 psnr_v:32.39
n:1188 mse_avg:237.71 mse_y:335.18 mse_u:52.42 mse_v:33.10 psnr_avg:36.44 psnr_y:34.94 psnr_u:43.00 psnr_v:45.00
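As a sanity check on those log lines (my inference from the numbers themselves, not from ffmpeg's source): mse_avg matches a plane-size-weighted mean, (4·Y + U + V)/6 for 4:2:0 subsampling, and the PSNR values only reconcile with a 10-bit peak of 1023 rather than 255, consistent with this being the 10-bit HEVC encode:

```python
import math

# Values copied from the frame 1187 log line above (the "blip" frame)
mse_y, mse_u, mse_v = 65416.83, 1209.19, 604.20

# 4:2:0 plane-size weighting: luma has 4x the samples of each chroma plane
mse_avg = (4 * mse_y + mse_u + mse_v) / 6
print(round(mse_avg, 2))   # 43913.45, matching mse_avg in the log

# PSNR computed with a 10-bit peak (1023); an 8-bit peak of 255 would not match
psnr_avg = 10 * math.log10(1023 ** 2 / mse_avg)
print(round(psnr_avg, 2))  # 13.77, matching psnr_avg in the log
```

That the numbers only work out with peak 1023 also confirms these logs came from a 10-bit measurement.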
Same spot at that scene change; not a forced keyframe when max keyint was reached.
I'm looking into it. If I can reproduce it on my Maxwell with NVEncC, then I can run some low-level tests, pattern tests emulating a scene change.
The overall average weighting differs between ssim/psnr implementations. Usually the Y channel is weighted more, because humans are not as sensitive to color. But people disagree on what the proportions should be. Should it be 2Y+1U+1V? Some other ratio?
But if I just take the arithmetic mean of psnr_y (Y only) in Sharc's 2nd encode using the per-frame data, I get 37.29194502. But ffmpeg spits out PSNR y:35.698693 for the totals. So I'm not sure what it's using. Maybe some windowed average.
There are slightly different ssim and psnr implementations too. Some downsample first, some don't apply the gaussian weights of the original SSIM paper. Either way, the trends are more important than the actual numbers.
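One plausible explanation for that 37.29 vs 35.70 gap (an assumption on my part, not verified against vf_psnr.c): ffmpeg may accumulate MSE over the whole clip and convert the total to dB once at the end, whereas averaging the per-frame psnr_y values averages logarithms. Because PSNR is logarithmic, a few high-MSE frames drag the MSE-pooled figure down much harder. A sketch with made-up per-frame MSEs:

```python
import math

def psnr(mse, peak=255):
    # standard PSNR in dB for a given mean squared error
    return 10 * math.log10(peak ** 2 / mse)

# Invented per-frame luma MSEs: mostly good frames plus one very bad one
frame_mse = [5.0] * 99 + [5000.0]

mean_of_psnrs = sum(psnr(m) for m in frame_mse) / len(frame_mse)  # average the dB values
psnr_of_mean_mse = psnr(sum(frame_mse) / len(frame_mse))          # pool MSE, then convert

print(round(mean_of_psnrs, 2))     # 40.84
print(round(psnr_of_mean_mse, 2))  # 30.73 - the one bad frame dominates
```

A mean of per-frame psnr_y sitting well above the reported total is exactly the pattern MSE pooling would produce when a few frames (like that scene-change blip) are far off.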
Maybe some code junkie can translate what ffmpeg psnr and ssim are using for the averaging:
https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_psnr.c
https://github.com/FFmpeg/FFmpeg/blob/master/libavfilter/vf_ssim.c
-
I couldn't reproduce it on Maxwell. I thought maybe it was FFmpeg NVEnc vs. Rigaya's NVEncC, but both are ok in terms of the Min value, no blip.
I double checked, re-ran ffmpeg, and the blip is still there on those - it's repeatable. But only on the whole file.
If you trim it down to that 1 frame, make a lossless encode, then run the metric on that 1 frame, it reads normal.
Also the blip was not present with other 3rd-party measuring tools, including avisynth's internal Compare() for psnr. Even when using the same source filter, same input script.
The problem turned out to be jitter in the timecodes. The problem files were MKV or MOV with non-exact timecodes. I used MP4 when testing NVEncC and FF NVEnc.
For example, Sharc's was 19001/317 or 59.94006309... instead of exactly 60000/1001. So even though the frames were aligned and matched exactly, it caused ffmpeg's measurement on that exact frame to glitch. Using AssumeFPS(60000,1001) fixed it. The other measuring methods were not susceptible to the jitter; they only required frame alignment.
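A quick check of those two rates (values taken from the post; Fraction avoids float-equality traps) shows they are close but not the same timebase, which is all the timestamp jitter needs:

```python
from fractions import Fraction

container_fps = Fraction(19001, 317)  # rate stored in the problem file
exact_fps = Fraction(60000, 1001)     # true NTSC-style 59.94 rate

print(round(float(container_fps), 8))  # 59.94006309
print(round(float(exact_fps), 8))      # 59.94005994
print(container_fps == exact_fps)      # False: nearly equal, not identical
```

AssumeFPS(60000,1001) just restamps every frame with the exact rate, which is why it fixes the measurement without touching the frames themselves.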
-
And this is why I am such a strong proponent of objective metrics and place less emphasis on subjective metrics. In this case, the PSNR calculations were signalling something was wrong when it was imperceptible to the naked eye.
Math does not lie, but your eyes do.
-
-
Is there a way to include the 'assumeFPS(60000,1001)' somehow in the ffmpeg commandline which is used to calculate the metrics?
I am presently using
Code:
ffmpeg -y -i "file1.mkv" -i "source.mp4" -lavfi "ssim=ssim.csv;[0:v][1:v]psnr=psnr.csv" -f null -
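One way (a sketch, assuming your ffmpeg build has AviSynth support enabled and FFMS2 is installed; filenames match the command above) is to wrap the file in a tiny .avs script and feed that to the same command line:

```
# compare.avs -- index-based loading plus an exact frame rate
FFVideoSource("file1.mkv")
AssumeFPS(60000, 1001)
```

```
ffmpeg -y -i "compare.avs" -i "source.mp4" -lavfi "ssim=ssim.csv;[0:v][1:v]psnr=psnr.csv" -f null -
```

If the source file's timestamps are also off, it may need the same .avs treatment.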
I found that ffmpeg accepts an .avs script as input. Easy.
Last edited by Sharc; 21st Oct 2019 at 07:27.
-
That's the point of tests: they literally prove which one is better than the other.
"oH wElL iF yOu SiT fAr eNoUgH", "ZoOm iN aNd pIxEl pEeP" - yeah guy, if we stand far enough away we can see the earth from orbit.
I don't know why this has gone on for as long as it has; the guy is a straight up troll. I have never seen anything this unnecessary before.
-
Either he is a troll or he is blind, idk which, and it doesn't really matter anymore. These tests will never be good enough; something will always come up to invalidate them, like it's the wrong codec version, or he'll say "oh well professionals like me only use high Blu-ray bitrates" (making it hard to pick a winner as everything looks good), or he'll say "oh check it out guys, Nvidia beats x264" when he used Nvidia to convert TV-range content to PC range and then decoded that as TV range, giving it extreme contrast and black clipping while leaving the x264 at normal and sane TV levels. I'm pretty sure I've been going back and forth with him on this dead horse for years now, and thankfully I have not taken the bait much this year.
-
Whatever the case, I think it is worth revisiting the topic from time to time, because encoders (especially HW encoders) as well as metrics are still under development and hopefully making progress.
-
FWIW I found this presentation quite interesting:
https://streaminglearningcenter.com/wp-content/uploads/2019/10/Choosing-an-x264-Preset_1.pdf
Slide 3 states that (untrained) people will notice a quality difference at around 5 ... 6 VMAF points under normal "3H viewing" condition.
According to poisondeathray's table 'results_upd.png' all encoders are within 96.0149/92.0338 = 1.0433 => 4.33% which would explain sophisticles' statement that he can't see a difference (for 'normal' viewing conditions).
Of course this has nothing to do with calling x264 'the most overrated encoder'.
-
It is overrated; there's a mythos surrounding it that has allowed it to attain a cult status that is undeserved.
According to PDR's own tests, MC beats it, despite his best attempts at cheating. x264 deserves credit for being legally free, for being cross-platform, allowing it to run on Windows, Linux, Unix and OSX, and for being able to come close to the commercial offerings, but as PDR has pointed out there are substantial problems with the quality of the encodes it produces.
This hasn't stopped users, egged on by the x264 developers, from taking a commercially encoded video, re-encoding it to a fraction of the bit-rate a given spec allows for, and claiming that it's just as good as the original as they trade the content on torrent networks.
Let's be honest with ourselves: DS was a ruthless self-promoter who spewed lie after lie about competing products, all the while claiming credit for algorithms that existed decades before he ever came on the scene; in fact, during the time when x264 was gaining in popularity and usage, he didn't even have his Comp Sci degree yet.
He talked crap about every other encoder out there, and he even talked crap about licensees of his software. Go find his comments on Doom9 when people complained about the x264 encodes coming out of TMPG VMW: instead of blaming his software he blamed TMPG, despite the fact that said software was only passing the video stream to x264 via the API.
I have no respect for him at all, and while I acknowledge the contributions he made to making AVC encoding available to a wide population for free, the harm he did to AVC encoding and its users via the brainwashing campaign he engaged in more than offsets any benefit he contributed.
-
-
1) VMAF measures at 3H distance only, and it only attempts to measure some of the data - it's missing part of the picture. It also does not measure color information. 2) VMAF results on this test do not seem to correlate very well with what the majority of responders are reporting. (Apparently there is 1 outlier that thinks differently.)
Do you agree that AQComplexity(-100) is the most similar? I thought that even some of the other MC encodes were more similar than that one. But x264 veryslow using default settings was the most similar.
What about PSNR and SSIM? Aren't you a strong proponent of objective metrics?
Since you put a lot of "weight" into objective metrics, x264 still "wins" in your mind, by your own criteria. VMAF is not a purely objective metric. It cannot even measure 100% accurately (source against itself). PSNR is the most classically "objective" metric, and not surprisingly x264 --tune psnr "wins."
But a PSNR-optimized encode looks visually dissimilar, most people would agree. It's not useful to tune an encoder just to score higher, when in most people's eyes the result looks dissimilar, so it can "win" some award. You could probably make a --tune vmaf preset to have it "win" that too. (Intel SVT-HEVC had a VMAF tune preset for a while, but it got removed recently.) I gave some hints on how you might do this if someone were so inclined.
Quote:
and the samples that you posted.
It's probably pointless, but can you point out or describe why you think the samples show this?
-
-
x264 was only using "veryslow" defaults and was ~2-3x faster than the MC encodes, which were set to "best", so in a sense x264 was being "handicapped." You could use slower settings, or optimize it for VMAF in this scenario if you wanted to.
Some options might be: slow first pass, more (or fewer) bframes, open GOP or adjusting the GOP up to infinite keyint, subme 11, me tesa, larger merange, larger rc-lookahead, adjusting IP/PB ratios or mbtree, etc... you could even use zones if there were specific problem sections. The other encoders are already maxed out, and/or do not allow some of those adjustments.
But it looks like VMAF is sensitive to x264 b-frames, at least on this scene. Changing to --bframes 3 (with --preset veryslow --tune psnr) yields 96.0815. We have a new "winner." --subme 11 gives a slight bump as well, to 96.0987. At those settings it's still about 1.5x faster, and the 1st pass about 10-15x faster than MC. There are many other things you can still optimize, but with diminishing returns. Almost anything with --tune psnr looks worse subjectively: more blurry, less detail. (Some exceptions might be some types of cartoons.)
You can also "cheat" by pumping bitrate into Y' from CbCr with --chroma-qp-offset for VMAF, but the PSNR and SSIM U,V should pick up on that. One thing they are good for is trends, if nothing else.
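For anyone wanting to try, a 2-pass sketch along those lines (the bitrate and filenames are placeholders, not the thread's actual test settings; NUL is the Windows null device):

```
x264 --preset veryslow --tune psnr --bframes 3 --subme 11 --slow-firstpass --pass 1 --bitrate 2200 --stats x264.stats -o NUL input.avs
x264 --preset veryslow --tune psnr --bframes 3 --subme 11 --pass 2 --bitrate 2200 --stats x264.stats -o out.mkv input.avs
```

The per-frame PSNR/SSIM/VMAF logs on the result are what tell you whether each individual tweak actually moved the number.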
[Attachment 50615 - Click to enlarge]
-
I redid my NVencC (Pascal) 1-sec GOP encode, reducing --aq-strength from 8 to 1. To my surprise VMAF(HM) improved from 93.6533 to 95.3628 (even at a slightly lower bitrate). It is also visually better (details, sharpness). For this clip a strong aq seems to do more harm than good.
Last edited by Sharc; 24th Oct 2019 at 03:23.
-
Despite PDR's claims, AQ should always be expected to do more harm than good. AQ was first patented back in 1995:
https://patents.google.com/patent/US5650860A/en
Despite DS's claims, either overt or covert, he did not invent AQ; it was around for at least a decade before he ever implemented a version in x264. If you read up on what it does, it works by robbing Peter to pay Paul: it moves bits around in a scene from places it thinks are less important to places it thinks are more important, so by definition it's doing more harm than good.
It, and the other Psy "optimizations", remind me of real estate stagers, people who specialize in beautifying a home in order to maximize the amount someone is likely to offer and increase the likelihood of getting offers. It works like this: it's known that the first thing people look at when they walk up to a house is the door, so what they will do is suggest the seller put in a new door frame and door, or at least either paint them with a high quality paint or resurface and stain them.
On the inside, they will suggest you paint the wall opposite the entrance with a high gloss paint, as well as the ceiling and the window panes; for the rest you will use a cheap flat paint. They will suggest you remove the carpeting and use a small throw rug in the middle of the room, because it has the effect of making the room look bigger, and other suggestions along these lines.
What they are doing is applying AQ principles to selling a home, but just as in AQ for video encoding it's all a scam, smoke and mirrors.
@PDR: Regarding PSNR (and SSIM), you, and people like you, get results that are inconsistent with observation because you insist on misunderstanding and misusing them.
PSNR (and SSIM) are engineering concepts that are valid; you just need to understand the proper way to use them. PSNR literally measures the peak signal-to-noise ratio between a transmission source and what comes out the other end. In the case of video, it measures the YUV channels on I, P and B frames.
Where you guys fail is you will take the PSNR of the source, encode 1 and encode 2, and only look at the PSNR of the Y component of the I frames, or in some cases the Y component of I/P/B, see that one is higher than the other, notice that it doesn't correlate to what your eyes are telling you, and conclude that somehow it's an invalid or inaccurate metric, while failing to look at the entirety of the calculation.
I will tell you right now: do 2 encodes, where 1 encode has higher PSNR across YUV for I, P, and B frames, and it will correlate to what your eyes are telling you 100% of the time.
If you're going to sit here and hold yourself up as an "expert" on objective metrics and encoding, if you're going to go on and on about "the scientific method", which you clearly do not understand, then at least do yourself a favor, open up a book, and read up on the damn thing you and others like you repeatedly dismiss.
Seriously, you guys are just too much.
-
Wrong, I did not say "always". I explicitly listed side effects and cases where it usually does more harm than good. Like all things it's a balance.
Quote:
Despite DS's claims, either overt or covert, he did not invent AQ, it was around for at least a decade before he ever implemented a version in x264. If you read up on what it does, it works by robbing Peter to pay Paul, it moves bits around in a scene from places it thinks are less important to places it thinks are more important, so by definition it's doing more harm than good.
The goal is perceptual improvement, and it can be beneficial, and usually is in most situations; that's why it's enabled by default. Of course you can disable it or modify the strength, or use different modes if you don't like it.
Quote:
Where you guys fail is you will take the PSNR of the source, encode 1 and encode 2 and only look at the PSNR of the Y component of the I frame or in some cases the Y component of I/P/B, see that one is higher than the other, notice that it doesn't correlate to what your eyes are telling you and conclude that somehow it's an invalid or inaccurate metric, while failing to look at the entirety of the calculation.
What you fail to understand is that the per-frame metrics look at every frame, I/P/B, and all planes Y, U, V. You can take those measurements and correlate them by visually looking at everything, including Y, U, V as separate greyscale images - that's what I did when the ffmpeg measurement was off.
Quote:
I will tell you right now, do 2 encodes, where 1 encode has higher PSNR across YUV for I, P, and B frames and it will correlate to what your eyes are telling you 100% of the time.
Do you need more explicit examples? I can post per-frame metrics and their corresponding frames where PSNR predicts high similarity but the actual similarity is clearly low - but you in particular probably can't "see" (or won't admit truthfully) whether it's similar or different.
This is why we have subjective testing and perceptually modelled metrics such as PSNR-HVS and PSNR-HVS-M. The correlation between plain PSNR and human perception is the lowest of all the metrics. If PSNR were adequate we wouldn't need any other measurements; Netflix would be wasting their time and money developing VMAF.
Each metric has pros/cons; only when you use them and correlate the findings will you understand where they are weak or strong, or what they are useful for. It sounds like you probably haven't actually used any of them, just copied and pasted something from a book.
-