VideoHelp Forum
  1. This is something I have been meaning to do for a long time but I tend to work long hours so it's taken a while. Be aware that I intend to keep adding to this thread, with more test samples and encoding settings as well as test sources.

    Speaking of test sources, for this test I went to:

    https://media.xiph.org/video/derf/

    And downloaded every single file they have that's 720p and above. For this test I used red_kayak_1080p, speed_bag_1080p, snow_mnt_1080p, touchdown_pass_1080p, west_wind_easy_1080p, aspen_1080p, controlled_burn_1080p and rush_field_cuts_1080p. I loaded them into ShotCut (my favorite free NLE), combined them in the timeline and exported a 7.9GB, 414Mb/s, 4:2:2 8-bit HuffYUV 1080p mkv that runs 2m32s. This was used as my source for this encoding test.
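As a quick sanity check on the source figures, the average bitrate follows from file size and duration. This is only a sketch: it assumes the 7.9 figure is decimal gigabytes and that 2m32s is the full duration, neither of which the post states explicitly. The function name is made up for illustration.

```python
def average_bitrate_mbps(size_bytes: float, duration_s: float) -> float:
    """Average bitrate in megabits per second (decimal units)."""
    return size_bytes * 8 / duration_s / 1_000_000


# Figures quoted in the post: 7.9 GB (assumed decimal) over 2m32s.
duration_s = 2 * 60 + 32
source_mbps = average_bitrate_mbps(7.9e9, duration_s)
print(f"{source_mbps:.1f} Mb/s")
```

Under those assumptions the result comes out at about 416 Mb/s, which is in line with the 414 Mb/s quoted for the HuffYUV master.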

    The test environment is Ubuntu MATE 17.04 with kernel 4.10, Xeon E3-1241 v3 @ 3.5GHz (quad core, HT), 16GB DDR3, GTX1050, ffmpeg built from the latest git, the latest cuda package, the latest nvidia sdk and the 381.22 drivers.

    I used the following command lines:

    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 1080p.mp4

    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -b:v 24.8M 1080p_nvenc_h264.mp4

    The x264 encode finished in 3m57.636s, the nvenc finished in 0m39.101s; I tested quality objectively with the following command line:

    ffmpeg -i output -i source -lavfi "ssim;[0:v][1:v]psnr" -f null -

    I also tested it visually by watching both clips as well as the original lossless. To my eyes, on this monitor, I couldn't see any appreciable difference between the 2 encodes. I also used ffplay to overlay the encodes over the original and subtract the similarities so that only the differences are left, which shows which encode is closer to the original: in parts x264 was closer, in other parts nvenc was closer.
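For the speed side of the comparison, the two wall-clock times quoted earlier in this post (3m57.636s for x264, 0m39.101s for nvenc) can be turned into a speedup factor. A minimal sketch; `parse_time` is a hypothetical helper that only understands the `XmY.YYYs` form that the shell `time` builtin prints.

```python
import re


def parse_time(text: str) -> float:
    """Convert a shell `time` value such as '3m57.636s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


x264_s = parse_time("3m57.636s")   # libx264, preset medium, crf 18
nvenc_s = parse_time("0m39.101s")  # h264_nvenc, preset slow, 24.8M
speedup = x264_s / nvenc_s
print(f"x264 {x264_s:.1f}s, nvenc {nvenc_s:.1f}s, speedup {speedup:.2f}x")
```

That works out to roughly a 6x difference in elapsed time on this machine for this clip.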

    Here are the PSNR and SSIM values:

    SSIM Y:0.894080 (9.750232) U:0.974536 (15.940795) V:0.981503 (17.328887) All:0.922060 (11.082398)
    PSNR y:26.192250 u:40.881033 v:42.814985 average:27.893067 min:11.842668 max:inf
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 1080p.mp4

    SSIM Y:0.894303 (9.759364) U:0.971734 (15.487291) V:0.979393 (16.859851) All:0.921390 (11.045202)
    PSNR y:26.198225 u:40.660773 v:42.575120 average:27.895740 min:11.850593 max:inf
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -b:v 24.8M 1080p_nvenc_h264.mp4

    As you guys can see, in this test nvenc scored the higher minimum and average PSNR but a slightly lower overall SSIM value. Attached are the two test encodes. The nvenc encode used a tiny bit more bitrate, but it's such a tiny amount that I don't think it makes any appreciable difference in PSNR/SSIM.
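For anyone reading the numbers above: the value in parentheses after each SSIM score is the same score on a dB scale, -10 * log10(1 - SSIM). The sketch below parses the two "All" lines from this post and recomputes that figure. The helper names are invented for illustration and the regex only targets the summary format shown above.

```python
import math
import re


def ssim_to_db(ssim: float) -> float:
    """dB figure printed in parentheses: -10 * log10(1 - SSIM)."""
    return -10 * math.log10(1 - ssim)


def parse_summary(line: str) -> dict:
    """Pull 'name:value' pairs out of an ffmpeg ssim/psnr summary line."""
    pairs = re.findall(r"(\w+):([0-9.]+|inf)", line)
    return {name: float(value) for name, value in pairs}


# Summary lines copied from the post.
x264 = parse_summary(
    "SSIM Y:0.894080 (9.750232) U:0.974536 (15.940795) "
    "V:0.981503 (17.328887) All:0.922060 (11.082398)"
)
nvenc = parse_summary(
    "SSIM Y:0.894303 (9.759364) U:0.971734 (15.487291) "
    "V:0.979393 (16.859851) All:0.921390 (11.045202)"
)

for name, scores in (("x264", x264), ("nvenc", nvenc)):
    print(name, scores["All"], round(ssim_to_db(scores["All"]), 3))
```

The dB form is handy because small differences near 1.0 are easier to compare than the raw 0 to 1 scores.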

    Bookmark this thread, I intend to do many more test encodes, with many more sources, including a monster 2160p test with the Netflix clips I downloaded as well as various encoder settings. I will also be adding a bunch of nvenc hevc encodes, just bear in mind that today and tomorrow I am working long hours and most likely I won't be able to post again until Sunday morning.

    One last thing, if you do any type of work with lossless or intermediate 4k files, basically any formats that don't have gpu decode support for playback, a quad core is not fast enough to play them back, attempting to do so swamps my Xeon and results in a slideshow.

    On Sunday I will also show you guys how to use NVDEC to speed up your x264/x265 transcodes so long as your source is something that NVIDIA has gpu hardware acceleration support for, such as VC-1, h264, hevc, and vp9 sources.
  2. You should use tune psnr / ssim for x264 / x265 if you're using psnr or ssim to measure respectively .

    x264 / x265 have psy opts enabled by default, so they are "penalized" both in terms of metrics and speed (not that they will even come close in encoding fps / time regardless of settings )
  3. Originally Posted by poisondeathray View Post
    You should use tune psnr / ssim for x264 / x265 if you're using psnr or ssim to measure respectively .

    x264 / x265 have psy opts enabled by default, so they are "penalized" both in terms of metrics and speed (not that they will even come close in encoding fps / time regardless of settings )
    I've been hearing things like this for what has to be the better part of a decade and it's such a crock. Just to prove it to you, I redid the x264 encodes with both tune ssim and tune psnr. I also did a few more nvenc encodes with the same source. Here are the results:

    3m26.116s
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 -tune ssim 1080p_ssim.mp4
    SSIM Y:0.895567 (9.811640) U:0.973680 (15.797100) V:0.980919 (17.193902) All:0.922811 (11.124467)
    PSNR y:26.197482 u:40.802744 v:42.748055 average:27.897201 min:11.842869 max:inf

    3m18.962s
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 -tune psnr 1080p_psnr.mp4
    SSIM Y:0.895767 (9.819961) U:0.975088 (16.035848) V:0.982074 (17.465111) All:0.923372 (11.156114)
    PSNR y:26.205163 u:40.924847 v:42.867546 average:27.906449 min:11.842520 max:inf

    real 0m46.291s
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -spatial-aq 1 -temporal-aq 1 -weighted_pred 1 -rc-lookahead 32 -b:v 24.8M 1080p_nvenc_h264_3.mp4
    SSIM Y:0.894355 (9.761524) U:0.970732 (15.336053) V:0.978903 (16.757808) All:0.921176 (11.033419)
    PSNR y:26.190763 u:40.570085 v:42.516248 average:27.887244 min:11.843003 max:inf

    real 0m40.048s
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -g 250 -bf 2 -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 -b:v 24.8M 1080p_nvenc_h264_2.mp4
    SSIM Y:0.894150 (9.753107) U:0.973499 (15.767327) V:0.980944 (17.199653) All:0.921841 (11.070194)
    PSNR y:26.193607 u:40.793545 v:42.732986 average:27.893221 min:11.843256 max:inf

    SSIM Y:0.896111 (9.834323) U:0.971417 (15.438858) V:0.979270 (16.833934) All:0.922522 (11.108217)
    PSNR y:26.217697 u:40.601060 v:42.524783 average:27.914109 min:11.855784 max:96.526841
    time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v hevc_nvenc -pix_fmt nv12 -preset slow -b:v 24.8M 1080p_nvenc_h265.mp4

    As you can see, there's very little variation in either the PSNR or SSIM scores with this particular test clip. The x264 encodes resulted in slightly different amounts of bit rate used but the actual scores barely budged.

    My next post will feature encodes done with different sources, including a rather interesting brutal test for any encoder, when you see the test source I have created you will understand.

    One last thing, for anyone thinking about buying a GTX1050: don't. As I mentioned in a previous post, there is a very high likelihood that Pascal's replacement will feature hardware vp9 encoding. But if you must buy a Pascal, I recommend going with a Quadro with lots of VRAM, for reasons I will go into in my next post when I explain how and why I created my torture source.

    In the meantime enjoy the attached test encodes.
  4. Originally Posted by sophisticles View Post
    I've been hearing things like this for what has to be the better part of a decade and it's such a crock. Just to prove it to you, I redid the x264 encodes with both tune ssim and tune psnr. I also did a few more nvenc encodes with the same source. Here are the results:

    As you can see, there's very little variation in either the PSNR or SSIM scores with this particular test clip. The x264 encodes resulted in slightly different amounts of bit rate used but the actual scores barely budged.
    You think it's a crock? I've literally got thousands of tests that show otherwise... yeah, there are a few outliers, but do a few dozen more tests at different bitrates on different sources and plot the curves, then we can talk. Your SSIM test actually supports the effect of "tune", but this PSNR test didn't for some reason. It's basically a fact that in 99% of cases the values will go up, but subjective visual quality will go down.

    And you don't think that's significant? A higher SSIM of 0.922811 at a smaller filesize of 401.69MB when using tune ssim vs. 0.922060 at 453MB? A bitrate delta of +/- 0.5% might be acceptable when doing tests, not +/- 13%.

    PSNR didn't move much given the larger filesize difference , and really that is unexpected, perhaps something wrong with the encode or this particular clip is an outlier, because they generally move in the same direction with each "tune" respectively, and significantly. It's a log scale.
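The size gap being argued about here can be checked directly from the two file sizes quoted above. A small sketch; which encode to treat as the reference is a choice, so both directions are shown, and the function name is only illustrative.

```python
def percent_difference(value: float, reference: float) -> float:
    """Signed difference of value relative to reference, in percent."""
    return (value - reference) / reference * 100


default_mb = 453.0     # x264 crf 18 without -tune, size quoted in the post
tune_ssim_mb = 401.69  # x264 crf 18 with -tune ssim, size quoted in the post

bigger = percent_difference(default_mb, tune_ssim_mb)
smaller = percent_difference(tune_ssim_mb, default_mb)
print(f"default vs tune ssim: {bigger:+.1f}%")
print(f"tune ssim vs default: {smaller:+.1f}%")
```

Depending on the reference, the gap is about +12.8% or about -11.3%, so "roughly 13%" is a fair summary and far outside a +/- 0.5% tolerance.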

    Thx for posting tests BTW



    Have you noticed any issues with vdpau yet ? It's linux only , but GPU decoding has been semi reliable on windows, unless it's indexed (then it's perfect). DXVA2 tends to be more reliable than CUVID. I don't have any native linux to test, only some VM's
  5. KarMa (Dinosaur Supervisor, joined Jul 2015, US)
    I would think that pushing the encoders to the low bitrate end, to force obvious artifacts, would help make it more obvious which is better. Let's say 5Mbit and below for 1080p, instead of forcing us to rely on SSIM and PSNR numbers. Nearly 25Mbit for 1080p pretty much forces you to simply look at the SSIM/PSNR numbers because these kinds of bitrates should be pretty transparent, which doesn't allow for the metric that matters (your eyes). And if you are planning on just using your eyes as a metric, then SSIM/PSNR encoding presets should be off.

    I regularly watch certain H.264 satellite channels at a very near constant 11.5Mbit video bitrate, with excellent transparency. If I were to compare this 11.5Mbit video to a ~25Mbit copy, I tend to believe it would be like splitting hairs. Making the idea of comparing two H.264 ~25Mbit videos even worse.
  6. If you had to choose one, yes, then the lower bitrate range is preferable

    But you should actually be doing multiple bitrates and plotting them, especially with metrics. The curves are non linear, and they typically level off at the higher ranges.

    If a difference is 0.1 or 0.2 db it might not seem like much by itself, but if you needed 1.5x or 2x the bitrate with "encode B" or "settings C" in that "plateau" region to achieve that same value, then that is a huge difference. More data points to see the relationship becomes crucial
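To make the plateau point concrete, here is a sketch that reads "bitrate needed for a given PSNR" off two rate/quality curves. The data points are entirely made up for illustration (they are not measurements from this thread), and interpolating linearly in log2(bitrate) is just one simple choice.

```python
import math


def bitrate_for_quality(points, target_db):
    """Interpolate the bitrate (Mb/s) at which a curve reaches target_db.

    points: (bitrate_mbps, psnr_db) pairs sorted by bitrate, with PSNR
    strictly increasing. Interpolation is linear in log2(bitrate).
    """
    for (r0, q0), (r1, q1) in zip(points, points[1:]):
        if q0 <= target_db <= q1:
            frac = (target_db - q0) / (q1 - q0)
            log_rate = math.log2(r0) + frac * (math.log2(r1) - math.log2(r0))
            return 2 ** log_rate
    raise ValueError("target quality is outside the measured range")


# Hypothetical curves: note how little PSNR is gained per doubling up high.
encoder_a = [(2, 34.0), (4, 37.0), (8, 39.0), (16, 40.0)]
encoder_b = [(2, 33.0), (4, 36.0), (8, 38.5), (16, 39.8)]

target = 39.0
rate_a = bitrate_for_quality(encoder_a, target)
rate_b = bitrate_for_quality(encoder_b, target)
ratio = rate_b / rate_a
print(f"A needs {rate_a:.2f} Mb/s, B needs {rate_b:.2f} Mb/s, ratio {ratio:.2f}x")
```

With these invented numbers the two encoders are only 0.5 dB apart at 8 Mb/s, yet B needs about 1.3x the bitrate to match A at 39 dB. That is why a single data point in the flat part of the curve says very little.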

    Of course it's an enormous amount of work, and to top it off metrics are of limited value anyways...especially PSNR and SSIM which have only moderate correlation with the perception of "quality"
  7. I didn't choose the bit rate, x264 did, as you will note I used the old default value for crf which used to be 18 (for some reason it seems they have changed that to 23); I then tried to match that bit rate with nvenc.

    I don't believe in artificially bit rate starving encodes to the point of artifacting, Blu-Ray is about 25 Mb/s for 1080p, UHD is about 100 Mb/s for 4k, you can now get 940 Mb/s FIOS for $80 in some areas, my ISP just more than doubled my download speeds to over 200 Mb/s for no extra charge, I see no reason to start trying to contrive a test where any encoder will fall apart just to show that the software encoders will hold up better.

    I don't have to test at ridiculously low bit rate to know that x265 will come out on top, followed by x264 but the results will still be unwatchable. I want to test in a range where I, and most people, will produce content, either for archival purposes, or for public consumption purposes. If you were preparing professional 1080p content for sale on a web site or streaming service you wouldn't use crf 30, would you? If not then what is the point of testing a bit rate range that no one in their right mind would ever use?
  8. KarMa
    Originally Posted by sophisticles View Post
    I don't believe in artificially bit rate starving encodes to the point of artifacting, Blu-Ray is about 25 Mb/s for 1080p, UHD is about 100 Mb/s for 4k, you can now get 940 Mb/s FIOS for $80 in some areas, my ISP just more than doubled my download speeds to over 200 Mb/s for no extra charge, I see no reason to start trying to contrive a test where any encoder will fall apart just to show that the software encoders will hold up better.
    Meanwhile I pay $60 for 1.5Mbit because it's the best on offer, in my part of the Central US. So your local situation is certainly not a given for everyone; I hang out on DSLReports.com enough to know. About 5 miles down the road there are houses with 1Gbit fiber but I don't expect that anytime soon. As far as UHD Bluray Discs go, the 4 known movies that have been decrypted had an average bitrate from 42Mbit to 72Mbit (with HEVC on all of them). With Netflix UHD they seem to call for a minimum of 15Mbit download and probably average a few Mbit below the minimum, and tend to average 3Mbit for 1080p.

    Originally Posted by sophisticles View Post
    I don't have to test at ridiculously low bit rate to know that x265 will come out on top, followed by x264 but the results will still be unwatchable. I want to test in a range where I, and most people, will produce content, either for archival purposes, or for public consumption purposes. If you were preparing professional 1080p content for sale on a web site or streaming service you wouldn't use crf 30, would you? If not then what is the point of testing a bit rate range that no one in their right mind would ever use?
    Well then maybe you should make that clearer in your first post, as your title makes it seem like a straight up codec comparison to see what's best. If this test comparison is for speed encoding of high quality footage that's fine, I just don't expect much quality difference as you keep throwing bitrate at it. Forcing us to mostly just use PSNR/SSIM as a metric.

    I myself like to make 4k and 1080p timelapse videos using JPG/TIFF images (which were made from RAW pictures that take a lot of time to be converted to JPG or TIFF). My bottlenecks tend to be simply decoding the images fast enough for x264, so a faster encoder in my case would not help at all. An SSD might help me but I'm not going that far yet. Add in Avisynth filters and the bottlenecks only worsen.
  9. Originally Posted by poisondeathray View Post
    Have you noticed any issues with vdpau yet ? It's linux only , but GPU decoding has been semi reliable on windows, unless it's indexed (then it's perfect). DXVA2 tends to be more reliable than CUVID. I don't have any native linux to test, only some VM's
    On Linux I have not had a single issue with either vdpau or cuvid, and this seems like the perfect segue to talk about how to speed up x264/x265 encodes with nvdec.

    As you know, when you're transcoding video from one compressed format to another, you first need to decode the video and then it gets fed to the encoder, and decoding can become a huge bottleneck depending on the resolution size and bit rate used. If you are starting from a supported lossy format, such as vc-1, h264, hevc, vp9, etc you can speed up x264 encoding by up to 30 fps and x265 encoding by up to 20 fps by using cuvid to decode, like this:

    time ffmpeg -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i source -c:v libx264/libx265 -preset whatever -crf whatever output

    The vsync 0 prevents ffmpeg from adding duplicate or extra frames under certain circumstances (there's a bug that rears its head every once in a while). The -hwaccel cuvid and h264_cuvid tell it to use the hardware decoder associated with whatever is specified. Here are the possible options:

    h264_cuvid, hevc_cuvid, mjpeg_cuvid, mpeg1_cuvid, mpeg2_cuvid, mpeg4_cuvid, vc1_cuvid, vp8_cuvid, vp9_cuvid

    I've tested it and it works. If the source is in any of the above formats and in a supported resolution / frame rate combination, you can free up your cpu from decoding duties so that it can be used just for the encoding part; in my tests, as I said, the speed up can be substantial. One caveat: if you wish to specify a specific pixel format the encode will fail, and 10 bit sources are not supported either. In those cases you will have to use the following:

    time ffmpeg -vsync 0 -hwaccel vdpau -i source -c:v libx264/libx265 -preset whatever -crf whatever output

    This will result in a somewhat lower speed up but there will still be a nice speed gain.
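The decision described in this post (use the matching cuvid decoder when there is one, otherwise fall back to vdpau) can be sketched as a small helper that only builds the argument list and does not run ffmpeg. The flags mirror the two command lines above; the function name, the short codec keys and the `ten_bit` switch are assumptions made for illustration, and nothing here checks what the GPU or the ffmpeg build actually supports.

```python
# Decoder names as listed in the post; the short keys are hypothetical labels.
CUVID_DECODERS = {
    "h264": "h264_cuvid", "hevc": "hevc_cuvid", "mjpeg": "mjpeg_cuvid",
    "mpeg1": "mpeg1_cuvid", "mpeg2": "mpeg2_cuvid", "mpeg4": "mpeg4_cuvid",
    "vc1": "vc1_cuvid", "vp8": "vp8_cuvid", "vp9": "vp9_cuvid",
}


def build_transcode_args(source, codec, output, encoder="libx264",
                         preset="medium", crf=18, ten_bit=False):
    """Build an ffmpeg argument list for a GPU-decode, CPU-encode transcode."""
    args = ["ffmpeg", "-vsync", "0"]
    decoder = CUVID_DECODERS.get(codec)
    if decoder is not None and not ten_bit:
        # Source codec has a cuvid decoder: use it, as in the first command.
        args += ["-hwaccel", "cuvid", "-c:v", decoder]
    else:
        # Per the post's caveat, fall back to vdpau (e.g. 10 bit sources).
        args += ["-hwaccel", "vdpau"]
    args += ["-i", source, "-c:v", encoder,
             "-preset", preset, "-crf", str(crf), output]
    return args


print(" ".join(build_transcode_args("source.mkv", "h264", "output.mp4")))
print(" ".join(build_transcode_args("source.mkv", "hevc", "output.mp4",
                                    encoder="libx265", ten_bit=True)))
```

The returned list can be handed to `subprocess.run` as-is. Keeping it as a list rather than a string avoids shell quoting problems with file names.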

    While I'm at it, allow me to share this: NVIDIA has done some interesting things with NVENC. In previous cards there was only one nvenc unit; with Pascal (maybe even Maxwell) there are 2 or 3 depending on the card. With consumer grade cards, i.e. GeForce, you are limited to 2 simultaneous sessions; with Quadros there is no session limit, the only limit is your hardware resources.

    As I mentioned, I decided to go cheap and buy a GTX1050, and while I am happy with the speed and encode quality, I have found that 2GB of VRAM is insufficient: if you want to truly get the most out of what nvenc is capable of, you really need a lot more VRAM. For instance I made a 4k mosaic with 4 1080p BD sources, using the following approach:

    ffmpeg -i ducks_take_off_444_720p50.y4m -i in_to_tree_444_720p50.y4m -i old_town_cross_444_720p50.y4m -i park_joy_444_720p50.y4m -filter_complex "nullsrc=size=2560x1440 [base]; [0:v] setpts=PTS-STARTPTS, scale=1280x720 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=1280x720 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=1280x720 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=1280x720 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=1280 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=720 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=1280:y=720" -c:v huffyuv source.avi <-- this is an example I used in creating my next 2.5k torture source

    It ran right up against the 2GB of VRAM this card has and MATE started becoming unresponsive. Similarly, nvenc is capable of creating multiple resolution outputs from a single source like this:

    ffmpeg -y -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -vf scale_npp=4096:2160 -c:a copy -c:v h264_nvenc -qp 18 output_2160p.mp4 -vf scale_npp=1920:1080 -c:a copy -c:v h264_nvenc -qp 18 output_1080p.mp4 -vf scale_npp=1280:720 -c:a copy -c:v h264_nvenc -qp 18 output_720p.mp4

    But it takes a lot of gpu horsepower if you plan on using high resolutions, frame rates and bit rates or running multiple instances of said encoder to create multiple clips at the same time.
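The multi-resolution command above follows a simple pattern: one input, then a block of per-output options in front of each output file. A sketch that generates that argument list for an arbitrary set of resolutions; the function name and the `rungs` tuple layout are hypothetical, and the flags are copied from the command in this post rather than checked against any particular ffmpeg build.

```python
def build_ladder_args(source, rungs, qp=18):
    """Build ffmpeg arguments for several scaled nvenc outputs from one input.

    rungs: iterable of (width, height, output_path) tuples.
    """
    args = ["ffmpeg", "-y", "-vsync", "0",
            "-hwaccel", "cuvid", "-c:v", "h264_cuvid", "-i", source]
    for width, height, output in rungs:
        # Options placed before an output file apply to that output only.
        args += ["-vf", f"scale_npp={width}:{height}",
                 "-c:a", "copy", "-c:v", "h264_nvenc",
                 "-qp", str(qp), output]
    return args


ladder = build_ladder_args("input.mp4", [
    (4096, 2160, "output_2160p.mp4"),
    (1920, 1080, "output_1080p.mp4"),
    (1280, 720, "output_720p.mp4"),
])
print(" ".join(ladder))
```

Generating the list this way also keeps the `-qp` value and its argument separate for every output, which is easy to get wrong when typing the command by hand.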

    Attached is a taste of the torture test I am working on:

    0m28.599s
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v libx264 -pix_fmt yuv444p -preset medium -crf 18 1440p.mp4
    SSIM Y:0.967999 (14.948350) U:0.966631 (14.766540) V:0.973843 (15.824090) All:0.969491 (15.155701)
    PSNR y:37.831583 u:42.743570 v:44.834699 average:40.778407 min:39.635829 max:43.164826

    0m2.480s
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v h264_nvenc -pix_fmt nv16 -preset slow -b:v 80.6M 1440p_nvenc_h264.mp4
    SSIM Y:0.953319 (13.308597) U:0.955781 (13.543890) V:0.966175 (14.707643) All:0.958425 (13.811679)
    PSNR y:36.918330 u:41.297134 v:43.075292 average:39.629041 min:38.784533 max:43.630083

    0m5.123s
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv16 -preset slow -b:v 80.6M 1440p_nvenc_hevc.mp4
    SSIM Y:0.961950 (14.196440) U:0.958788 (13.849741) V:0.968744 (15.050597) All:0.963160 (14.336853)
    PSNR y:38.094077 u:41.655226 v:43.743832 average:40.528416 min:39.823694 max:43.71221

    0m5.257s
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -preset slow -b:v 80.6M 1440p_nvenc_hevc_10-bit.mp4
    SSIM Y:0.960911 (14.079419) U:0.963528 (14.380446) V:0.971691 (15.480799) All:0.965377 (14.606328)
    PSNR y:37.967738 u:42.245058 v:44.210780 average:40.668000 min:39.684348 max:43.888382

    Some of you, after seeing the test clip, may wonder what the purpose of creating it is. It boils down to this: for a long time I have suspected that perhaps various codec developers, such as the x264 and x265 people, may have "tuned" their encoders to perform well with certain very common test clips, such as crowd run. x264 and x265 are almost untouchable at really low bit rate, nvenc can't come close, but the advantage these software encoders enjoy in those limited tests does not hold up in my experience with more realistic scenarios, such as bit rates that content creators may use or test clips that represent content like you may buy or rent. I happened to run across an Intel pdf for testing quick sync via ffmpeg and they suggested creating test clips like the one I used, as a way of really torturing an encoder's analysis and encoding algorithms.

    I also did a test especially for PDR, since I remember he seemed to think that Spatial AQ with NVENC was something special:

    SSIM Y:0.965739 (14.652055) U:0.968062 (14.956912) V:0.974488 (15.932555) All:0.969430 (15.147020)
    PSNR y:38.860241 u:42.904530 v:44.766188 average:41.454625 min:41.133913 max:46.049186
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 NVENC_QP22_No_SAQ.mp4

    SSIM Y:0.965529 (14.625456) U:0.960652 (14.050786) V:0.969770 (15.195576) All:0.965317 (14.598828)
    PSNR y:37.583990 u:41.790823 v:43.870674 average:40.274191 min:39.791486 max:45.284033
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 -spatial_aq 1 NVENC_QP22_SAQ.mp4

    SSIM Y:0.964409 (14.486618) U:0.952386 (13.222615) V:0.965495 (14.621216) All:0.960763 (14.063083)
    PSNR y:36.265087 u:40.748199 v:43.107499 average:39.096446 min:38.493043 max:44.369969
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 -spatial_aq 1 -aq-strength 15 NVENC_QP22_SAQ_MAX.mp4

    SSIM Y:0.965739 (14.651970) U:0.968040 (14.953925) V:0.974469 (15.929314) All:0.969416 (15.145046)
    PSNR y:38.853058 u:42.898605 v:44.762847 average:41.448340 min:41.126400 max:46.057169
    time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 -spatial_aq 1 -aq-strength 1 NVENC_QP22_SAQ_MIN.mp4

    You will note that PSNR went down, as NVIDIA in its documentation explicitly says it will. Also note that nvenc hevc always has SAO turned on, there is no way to turn it off; SAO only works with Pascal cards and SDK 8.0 and above.
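To put numbers on the PSNR drop, the sketch below takes the four "average" PSNR values from the constqp runs above and expresses each as a delta against the no-SAQ run. Since PSNR is 10 * log10(MAX^2 / MSE), a drop of d dB corresponds to the error growing by a factor of 10^(d/10). One caveat that is an assumption on my part: these are constant-QP encodes, so file sizes probably differ between runs and the deltas are not bitrate-normalised.

```python
# Average PSNR values copied from the four constqp results in this post.
results = {
    "no SAQ": 41.454625,
    "SAQ default strength": 40.274191,
    "SAQ strength 15": 39.096446,
    "SAQ strength 1": 41.448340,
}

baseline = results["no SAQ"]
deltas = {name: psnr - baseline for name, psnr in results.items()}


def mse_ratio(delta_db: float) -> float:
    """Factor by which MSE changes for a PSNR change of delta_db."""
    return 10 ** (-delta_db / 10)


for name, delta in deltas.items():
    print(f"{name}: {delta:+.3f} dB, MSE x{mse_ratio(delta):.2f}")
```

With default strength the drop is about 1.18 dB, which is roughly 1.31x the mean squared error; at strength 1 the result is nearly identical to having SAQ off.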

    I will shortly be finishing the monster test clip I am working on and posting encoding samples, as well as some SAQ tests with nvenc h264, but in the meantime enjoy comparing the clips I attached.
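  10. In the overlay filters above the forum turned part of the text into a smiley. It is supposed to be :x (as in overlay=shortest=1:x=1280), without a space between them.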
  11. FFmpeg just added hardware VP9 encoding via va-api on Kaby Lake processors:

    http://phoronix.com/scan.php?page=news_item&px=FFmpeg-VP9-Encode

    If there are any Kaby Lake Linux users here that could do some test encodes, that would be great.
  12. Yes many of those things were discussed in other threads,

    Originally Posted by sophisticles View Post
    The vsync 0 prevents ffmpeg from adding duplicate or extra frames under certain circumstances (there's a bug that rears its head every once in a while),
    But using it with vsync 0 is new to me...

    (also the windows variant is just -vsync 0 -c:v h264_cuvid for cuvid; adding the -hwaccel cuvid seems to mess things up when looking at cpu-z / gpu-z. With pure decoding to rawvideo/null as a speed test, along with cpu-z/gpu-z, you can verify that it's working)

    And looking at it quickly, the -vsync 0 seems to fix the dropped/duplicate frames on some quick tests with cuvid . However , CFR input becomes VFR output. Is there a switch or something that I'm missing? (you can force-cfr with libx264 x264opts if using that to encode, but it doesn't work here). VFR potentially causes a lot of problems in editors, even if it's "minimally" variable VFR . It seems to be a muxing/ timecodes issue, because the frames were correct on a few tests so far (without -vsync there were frame count and duplicate/dropped problems)

    The DXVA2 method keeps CFR if input is CFR, and seems more reliable so far in my testing (at least on windows)

    I'm still looking into it, but not quite ready to use it for semi-important things yet. The avs indexed method using dgnv , however is 100% stable.

    And what is the other "bug rearing the ugly head" that you're referring to?



    Originally Posted by sophisticles View Post
    FFmpeg just added hardware VP9 encoding via va-api on Kaby Lake processors:
    Fantastic. Any news for Nvidia VP9 Encoding?
  13. So a few more longer tests and I've come across videos where both dxva2 and cuvid fail in terms of accuracy with or without vsync. Either dropped frames, duplicate frames . Unusable for anything of importance IMO .

    Or maybe this is the other bug you were referring to ?
  14. Originally Posted by poisondeathray View Post
    So a few more longer tests and I've come across videos where both dxva2 and cuvid fail in terms of accuracy with or without vsync. Either dropped frames, duplicate frames . Unusable for anything of importance IMO .

    Or maybe this is the other bug you were referring to ?
    What about vdpau?
  15. Originally Posted by sophisticles View Post
    Originally Posted by poisondeathray View Post
    So a few more longer tests and I've come across videos where both dxva2 and cuvid fail in terms of accuracy with or without vsync. Either dropped frames, duplicate frames . Unusable for anything of importance IMO .

    Or maybe this is the other bug you were referring to ?
    What about vdpau?
    I don't have a native linux install, just a vm, will it work there ? I'm thinking driver issues and such

    The issues didn't show up on the quicky short tests today, but a few longer ones (but not all) demonstrate problems. The -vsync 0 with cuvid always has the VFR problem, but that "feels" like a timecodes issue, which is minor on the grand scheme of things compared to dropped or duplicated frames

    It's not necessarily a "GPU problem" per se, because the avisynth DGNV works on the ones that failed. But that's indexed and requires avisynth or vapoursynth; in many ways that's a lot more "clunky" to work with.

    I'm trying to narrow down the problem, or what's causing some to fail but not others. I've actually reported these issues with GPU decoding in the past, but thought maybe they got them fixed



    What was the "other bug" you were referring to ? I don't want to be doubling up or investigating on work already done
  16. Originally Posted by poisondeathray View Post
    What was the "other bug" you were referring to ? I don't want to be doubling up or investigating on work already done
    There was/is a bug at least up to ffmpeg 2.8 that would result in ffmpeg not draining all the frames from the hardware buffers, thus the output stream would be missing the last few frames. This seemed to happen at random and seemed to happen more often with Intel's QS than NVENC.

    With regard to the specific issue you're having, I have a theory. According to the ffmpeg docs:

    -vsync parameter

    Video sync method. For compatibility reasons old values can be specified as numbers. Newly added values will have to be specified as strings always.

    0, passthrough - Each frame is passed with its timestamp from the demuxer to the muxer.
    1, cfr - Frames will be duplicated and dropped to achieve exactly the requested constant frame rate.
    2, vfr - Frames are passed through with their timestamp or dropped so as to prevent 2 frames from having the same timestamp.
    drop - As passthrough but destroys all timestamps, making the muxer generate fresh timestamps based on frame-rate.
    -1, auto - Chooses between 1 and 2 depending on muxer capabilities. This is the default method.

    I think you should give -vsync drop a try and see what happens.
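To illustrate the difference between the modes quoted from the docs, here is a toy model of passthrough, cfr and vfr acting on a list of frame timestamps. It is deliberately simplified and is not how ffmpeg implements -vsync: timestamps are integer milliseconds, assumed sorted and starting at 0, and each function just returns the indices of the input frames that would be written.

```python
def passthrough(timestamps_ms):
    """Mode 0: every frame is passed on with its own timestamp."""
    return list(range(len(timestamps_ms)))


def vfr(timestamps_ms):
    """Mode 2: drop a frame if it repeats the previously kept timestamp."""
    kept = []
    for index, ts in enumerate(timestamps_ms):
        if not kept or ts != timestamps_ms[kept[-1]]:
            kept.append(index)
    return kept


def cfr(timestamps_ms, frame_ms):
    """Mode 1: one frame per fixed slot, duplicating or dropping as needed."""
    chosen = []
    slot = 0
    while slot <= timestamps_ms[-1]:
        # Latest input frame at or before this slot; gaps repeat a frame.
        candidates = [i for i, ts in enumerate(timestamps_ms) if ts <= slot]
        chosen.append(candidates[-1])
        slot += frame_ms
    return chosen


# 25 fps material with one repeated timestamp (40) and one gap (no 80).
stamps = [0, 40, 40, 120, 160]
print("passthrough:", passthrough(stamps))
print("vfr:        ", vfr(stamps))
print("cfr:        ", cfr(stamps, 40))
```

In this toy input, passthrough keeps all five frames, vfr drops the frame with the repeated timestamp, and cfr both drops one frame and shows another twice to fill the gap. That is the kind of duplicate/drop behaviour being discussed in this thread.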
  17. Are you talking about the vfr output for cuvid when using vsync, or the other issue with drops or duplicates? Either way, -vsync drop doesn't work

    dxva2 definitely seems less flaky, but it too has errors on some . I just need to run a bunch more tests and figure out why or what situations

    /sorry for the thread hijack, this sidetrack is not really "pascal NVENC"
  18. Here are 3 more test encodes. It's a 4k mosaic done using four 4k60 Netflix test files from the link I posted earlier. Here are the command lines:

    ffmpeg -vsync 1 -i Netflix_BarScene_4096x2160_60fps_10bit_420.y4m -i Netflix_Aerial_4096x2160_60fps_10bit_420.y4m -i Netflix_DinnerScene_4096x2160_60fps_10bit_420.y4m -i Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m -filter_complex "nullsrc=size=4096x2160 [base]; [0:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=2048 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=1080 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=2048:y=1080" -c:v hevc_nvenc -preset slow -r 60 -b:v 45.8M hevc_hevc.mp4

    time ffmpeg -vsync 1 -i Netflix_BarScene_4096x2160_60fps_10bit_420.y4m -i Netflix_Aerial_4096x2160_60fps_10bit_420.y4m -i Netflix_DinnerScene_4096x2160_60fps_10bit_420.y4m -i Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m -filter_complex "nullsrc=size=4096x2160 [base]; [0:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=2048 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=1080 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=2048:y=1080" -c:v h264_nvenc -preset slow -r 60 -b:v 45.8M h264_hevc.mp4

    time ffmpeg -vsync 1 -i Netflix_BarScene_4096x2160_60fps_10bit_420.y4m -i Netflix_Aerial_4096x2160_60fps_10bit_420.y4m -i Netflix_DinnerScene_4096x2160_60fps_10bit_420.y4m -i Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m -filter_complex "nullsrc=size=4096x2160 [base]; [0:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=2048 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=1080 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=2048:y=1080" -c:v libx264 -preset medium -r 60 -crf 18 x264.mp4

    Note there are no SSIM or PSNR measurements this time because I did not create an intermediate master. If there is demand I'll redo the tests by first creating a master against which to test the encodes.
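The 2x2 mosaic filtergraph used in the commands above is repetitive enough to generate. The sketch below builds the -filter_complex string for a given tile size, using the same labels as the thread and the :x / :y overlay options (per the correction in post 10). The function name is made up, and unlike the hand-written commands it always writes x and y explicitly, including the zeros.

```python
def mosaic_filtergraph(tile_w: int, tile_h: int) -> str:
    """Build a 2x2 mosaic filtergraph for four inputs scaled to tile size."""
    labels = ["upperleft", "upperright", "lowerleft", "lowerright"]
    parts = [f"nullsrc=size={tile_w * 2}x{tile_h * 2} [base]"]

    # Scale each input to the tile size and reset its timestamps.
    for index, label in enumerate(labels):
        parts.append(
            f"[{index}:v] setpts=PTS-STARTPTS, scale={tile_w}x{tile_h} [{label}]"
        )

    # Chain four overlays onto the blank canvas, one per quadrant.
    previous = "base"
    for index, label in enumerate(labels):
        x = (index % 2) * tile_w
        y = (index // 2) * tile_h
        overlay = f"[{previous}][{label}] overlay=shortest=1:x={x}:y={y}"
        if index < len(labels) - 1:
            previous = f"tmp{index + 1}"
            overlay += f" [{previous}]"
        parts.append(overlay)

    return "; ".join(parts)


graph_1440p = mosaic_filtergraph(1280, 720)    # 2560x1440 canvas
graph_2160p = mosaic_filtergraph(2048, 1080)   # 4096x2160 canvas
print(graph_2160p)
```

The resulting string is what would go inside the quotes after -filter_complex, with the four inputs given in the same order as the labels.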

    By the way, if anyone is interested you can easily encrypt any file on *nix from the command line like this:

    gpg -c file

    To decrypt, type gpg file. Thought someone may find this useful.
  19. Originally Posted by sophisticles View Post
    Some of you, after seeing the test clip, may wonder what the purpose of creating it is; it boils down to this: for a long time I have suspected that perhaps various codec developers, such as the x264 and x265 people, may have "tuned" their encoders to perform well with certain very common test clips, such as crowd run. x264 and x265 are almost untouchable at really low bit rate, nvenc can't come close....
    I'd be keen to learn how they'd do that. Would x264/x265 contain code designed to recognise test clips and switch the encoder to a special test clip encoding mode? It sounds very much like a conspiracy theory to me.
  20. [just a follow up to the gpu decoding issues]

    The dupe/drop that affects some files is a rare case; it's related to certain types of open-GOP. I wouldn't worry about it for the most part (but it still decodes correctly in avisynth/vapoursynth with dgnv (GPU) or software).

    The cuvid VFR issue with -vsync 0 is a container timebase issue: it doesn't occur with mkv output, only mp4. I tried some of the movflags, and forcing the timebase and timescale, but couldn't get it to work with the mp4 container. There are other ways to remux, rewrite timecodes, and make it CFR in other programs. VFR can cause problems with some editors and some HW players.

    cuvid with nvidia seems the fastest on windows, significantly faster than dxva2 for pure decoding speed, and it translates to slightly faster encoding speed. -vsync 0 fixes the dropped/dupe frames on most sources, but at the expense of some VFR timecode weirdness in the mp4 container. MKV is read as CFR, but MKV isn't supported in "professional" programs like editors. Plain remuxing to mp4 doesn't fix it; you need to rewrite the timecodes.

    dxva2 is windows only, slower than cuvid, but still seems more consistent. It doesn't have the container timebase issues with mp4.

    How much GPU decoding can benefit libx264 (cpu) encoding speed depends on many factors, including resolution, settings used, other bottlenecks etc. It can be maybe -2% to +10% faster, but you generally benefit more the higher the resolution, as CPU cycles "wasted" on decoding can be allocated to encoding. cuvid is consistently faster than dxva2 and that translates to encoding speed in general, e.g. if dxva2 was +3% faster in a given set of settings and scenarios, cuvid would be maybe +5% on average.


    ok, back to the regular programming, carry on with pascal nvenc
  21. Originally Posted by hello_hello View Post
    I'd be keen to learn how they'd do that. Would x264/x265 contain code designed to recognize test clips and switch the encoder to a special test clip encoding mode? It sounds very much like a conspiracy theory to me.
    I love the term "conspiracy theory"; it's one of those terms that people use to dismiss anything that makes them uncomfortable.

    It's actually very easy to do: simply tune your analysis and encoding algorithms for scenes that contain certain characteristics. Here's the thing: I have done tons of test encodes with both clips and full movies, and the one thing that strikes me as odd is that both x264 and x265 seem to excel in a handful of commonly used test clips, like that crowd run test clip, particularly in the trees, yet that advantage does not hold up when you use more realistic sources. If you try test encodes with Elephants Dream, there is one part, where the guy is running on a bridge, where every encode falls apart except for x265, yet in the rest of the movie all encoders are capable of producing great quality. In sources like Sintel, everything produces great encodes.

    Honestly, it doesn't make a difference, because no one who knows what they are doing is going to encode a movie with uniform settings; what they'll do is use segmented encoding and up the bit rate in parts that need it in order to ensure a great encode.

    Getting back to the topic at hand, I created one more master. I used ShotCut and imported NetFlix Tango, NetFlix Ritual Dance, NetFlix Food Market 2 and NetFlix Boxing Practice, and applied a glow and text to the first, a vignette to the second, a projector effect to the third and Technicolor to the fourth. I combined all 4 in the time line and exported a 4096x2160p60 lossless huffyuv:

    4m1.350s
    time ffmpeg -vsync 0 -hwaccel vdpau -i NetFlix_Master.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 NetFlix_x264.mp4

    1m8.199s
    time ffmpeg -vsync 0 -hwaccel vdpau -i NetFlix_Master.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -b:v 73.9M NetFlix_h264_nvenc.mp4

    1m14.433s
    time ffmpeg -vsync 0 -hwaccel vdpau -i NetFlix_Master.mkv -c:v hevc_nvenc -pix_fmt nv12 -preset slow -b:v 73.9M NetFlix_hevc_nvenc.mp4
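
    To put the three timings above side by side, here is a minimal sketch that converts them to seconds and expresses each as a speedup over the libx264 run. It assumes the figures are the wall-clock ("real") values printed by the shell's time command, which the post does not state explicitly; the function names are made up for illustration.

```python
import re


def parse_time(text):
    """Convert a value such as '4m1.350s' (as printed by `time`) to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Timings copied from the post above.
timings = {
    "libx264": "4m1.350s",
    "h264_nvenc": "1m8.199s",
    "hevc_nvenc": "1m14.433s",
}


def speedups(timings, baseline="libx264"):
    """Return how many times faster each encode finished than the baseline."""
    base = parse_time(timings[baseline])
    return {name: base / parse_time(value) for name, value in timings.items()}


if __name__ == "__main__":
    for name, ratio in speedups(timings).items():
        print(f"{name}: {parse_time(timings[name]):.3f}s, {ratio:.2f}x")
```

    With these numbers, h264_nvenc finishes roughly 3.5 times faster than libx264 at medium/crf 18, and hevc_nvenc roughly 3.2 times faster.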


    SSIM Y:0.892836 (9.699525) U:0.966144 (14.703686) V:0.963074 (14.326704) All:0.916761 (10.796715)
    PSNR y:25.963650 u:39.848219 v:38.448001 average:27.620164 min:12.843680 max:48.714651
    ffmpeg -i NetFlix_x264.mp4 -i NetFlix_Master.mkv -lavfi "ssim;[0:v][1:v]psnr" -f null -

    SSIM Y:0.886554 (9.452111) U:0.960028 (13.982463) V:0.956801 (13.645301) All:0.910508 (10.482142)
    PSNR y:25.948563 u:39.425053 v:38.133323 average:27.596543 min:12.852938 max:49.057867
    ffmpeg -i NetFlix_h264_nvenc.mp4 -i NetFlix_Master.mkv -lavfi "ssim;[0:v][1:v]psnr" -f null -

    SSIM Y:0.888461 (9.525718) U:0.961090 (14.099337) V:0.957759 (13.742686) All:0.912115 (10.560865)
    PSNR y:25.962801 u:39.439475 v:38.108577 average:27.610207 min:12.847422 max:49.697014
    ffmpeg -i NetFlix_hevc_nvenc.mp4 -i NetFlix_Master.mkv -lavfi "ssim;[0:v][1:v]psnr" -f null -
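
    For anyone wondering how to read these lines, the sketch below reproduces the derived figures from the per-plane ones. The number in parentheses after each SSIM value is the same score in decibels, -10*log10(1 - SSIM), and the "All" and "average" figures are consistent with weighting the planes by pixel count for 4:2:0 video (4 parts Y, 1 part U, 1 part V). That weighting is inferred from the numbers themselves rather than stated in the post, and the function names are made up for illustration.

```python
import math


def ssim_to_db(ssim):
    """Express an SSIM score in decibels, as shown in parentheses by the ssim filter."""
    return -10.0 * math.log10(1.0 - ssim)


def weighted_420(y, u, v):
    """Weight per-plane values by pixel count, assuming 4:2:0 subsampling."""
    return (4.0 * y + u + v) / 6.0


def psnr_to_mse(psnr_db, peak=255.0):
    """Recover the mean squared error behind an 8-bit PSNR value."""
    return peak ** 2 / 10 ** (psnr_db / 10.0)


def average_psnr_420(y_db, u_db, v_db, peak=255.0):
    """Combine per-plane PSNR by averaging the errors (not the dB values)."""
    mse = weighted_420(psnr_to_mse(y_db, peak), psnr_to_mse(u_db, peak), psnr_to_mse(v_db, peak))
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    # Per-plane values for the x264 encode, copied from the results above.
    print(f"SSIM Y in dB:  {ssim_to_db(0.892836):.4f}")
    print(f"SSIM All:      {weighted_420(0.892836, 0.966144, 0.963074):.6f}")
    print(f"PSNR average:  {average_psnr_420(25.963650, 39.848219, 38.448001):.4f}")
```

    Because luma carries four of the six parts, the overall figures track the Y values closely, which is why the three encodes end up within a fraction of a dB of each other overall.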

    I think it's pretty obvious that if x264+crf18+medium is good enough for you, then Pascal NVENC, with its speed advantage, is the better choice.
  22. Member
    Join Date
    Feb 2019
    Location
    Melbourne, Australia
    The new NVENC on the Turing cards is excellent. I tested it on Overwatch footage and was blown away. I'm thinking of doing it again on something like Apex, or IRL footage. Ideas?

    https://unrealaussies.com/tech/nvenc-x264-quicksync-qsv-vp9-av1/
  23. 2 things:

    That is an impressive amount of work, both in the testing and in putting together that website and the way you present your findings.

    Your test source was a poor choice. I would like to see the same type of testing done with the various NetFlix sources available for testing:

    https://media.xiph.org/video/derf/

    Especially the Meridian test file, which NetFlix designed specifically to test various codecs.

    I would also not use OBS, because of the limitations that you outline, such as certain presets not being available.

    I would use something that allows you to access more of the encoders' options. I believe Staxrip on Windows offers an abundance of encoder options, or just use FFMPEG directly from the command line.

    Also, Kaby Lake and later Quick Sync supports hardware VP9 encoding, but it's only available via vaapi, which itself is only available on Linux. I have tested it using an i3 7100 and the hardware VP9 encoder is pretty good, as is the hardware deinterlace filter Quick Sync has.

    The main problem with Turing, as I see it, is the pricing model NVIDIA has embraced. It used to be that NVENC offered a huge price advantage: you could buy a cheap GTX1050 2GB model for about $120 and get a video card that didn't need an auxiliary power connection and was capable of encoding 1080p h265 at close to 300fps.

    The cheapest RTX that is a true RTX, not a gimped RTX wannabe (the GTX1660), costs about $350 and needs an auxiliary power connection. When you add in AMD, with its aggressive pricing model and the soon to be released Ryzen replacements, not to mention that Intel is set to release its own graphics card in 2020 (and I suspect it may have a hardware AV1 encoder*), I see no reason to spend good money on a Turing based graphics card.

    *I base this on the work Intel has done on the open source SVT encoders, including its HEVC, VP9 and AV1 encoders:

    https://github.com/intel/SVT-HEVC
    https://github.com/OpenVisualCloud/SVT-VP9
    https://github.com/OpenVisualCloud/SVT-AV1
    https://www.phoronix.com/scan.php?page=news_item&px=SVT-VP9-Open-Source

    Keep an eye on these Intel open source encoders. They will overtake all the current popular open source encoders, such as libvpx, x264 and x265, in both quality and speed very shortly, if they haven't already.

    And I think they are laying the ground work for Intel GPU powered encoders in future Intel dGPU and iGPU products.
  24. Member
    Join Date
    Feb 2019
    Location
    Melbourne, Australia
    Originally Posted by sophisticles View Post
    That is an impressive amount of work in testing and putting together that website and the way to present your findings.
    I appreciate it, but it was another person who made most of the site, I just wanted to write stuff.

    Originally Posted by sophisticles View Post
    Your test source was a poor choice. I would like to see the same type of testing done with the various NetFlix sources available for testing:
    I would also not use OBS, because of the limitations that you outline, such as certain presets not being available.
    I would use something that allows you to access more of the encoders options, I believe Staxrip on Windows offers an abundance of encoder options, or just use FFMPEG directly from the command line.
    Yes I realise this, and I've said to some people in OBS forums that the results could be skewed because I do all my tests on gameplay footage. The colour space and range will also impact it since I use 709 partial and my webcam input is MJPEG which is then composited in the scene. Originally I wanted to just focus on gameplay streaming, but I'm intrigued by other stuff now. I'm trying to convince people to add default settings for QuickSync presets into OBS simple settings, because currently the only way to do it is to use the advanced recording tab and just insert what is basically custom FFMPEG options. Most people will simply not do that, it needs to be easy to access or nobody will even try.

    I was thinking of trying either something like that Meridian clip, or the Big Buck Bunny one, for my next test. Do you think people will be interested? It takes me quite a while to do them, especially if I include AV1, and most of the people I see asking for help are just interested in streaming games.

    Originally Posted by sophisticles View Post
    Also, Kaby Lake and later Quick Sync supports hardware VP9 encoding, it's only available via vaapi, which itself is only available on Linux. I have tested it using an i3 7100 and the hardware VP9 encoder is pretty good, as is the hardware deinterlace filter Quick Sync has.
    I'd love to do that, but I'm terrible at Linux. Apparently the recent version of OBS for Linux does include something for VAAPI, but I have yet to try it out. I only have command-line Linux available at the moment and I don't have a lot of motivation to set up a GUI. I got VAAPI working once through somebody else's build of FFMPEG and output some files, but it keeps scuffing for me and I ran out of patience.

    Originally Posted by sophisticles View Post
    The main problem with Turing, as I see it, is the pricing model NVIDIA has embraced, it used to be that NVENC offered a huge price advantage, you could buy a cheap GTX1050 2GB model for about $120 and get a video card that didn't need an auxiliary power connection and it was capable of encoding 1080p h265 at close to 300fps.
    100% agree. Especially when it just comes to gaming alone, the buff to game graphics that RTX gives is a decent buff, but it's just as much of a buff to the price. It used to be that by the time a new family came out, the brand new items cost approximately as much as the old ones did 18 months ago or whatever. But with RTX, it was like here's a $200 $400 $600 $800 card family, and now here's also a $1000 $1400 and $2000 card family. (Australian prices) It's as though the last 2 years didn't give us new tech for the same price as years ago; it's new tech that gives a little more, but just costs more. I wasn't going to get an RTX until I heard so much about NVENC and then I got swayed with that.

    Originally Posted by sophisticles View Post
    I would love to try these out and do exactly the same thing with them. ESPECIALLY the AV1 one since it takes so frigging long to make files with libaom-av1. Can't wait!

    I have heard a lot of promises about the iGPU coming with Ice Lake: an HEVC revamp, as well as discrete VP9 encoding instead of just hardware accelerated. I would greatly appreciate it if they made a media SDK for Windows like they did for Linux; they don't have all the same stuff.
  25. it takes so frigging long to make files with libaom-av1
    There's also rav1e, which is faster than aomenc, but has fewer screws to turn.
  26. Member
    Join Date
    Feb 2019
    Location
    Melbourne, Australia
    I did come across rav1e at some point in my searches, but at the time (only about 2 months ago) it indicated that it wasn't complete and the result wasn't guaranteed to conform to the bitstream spec. I got the impression that's what they're aiming to do in the end, but during development they're concentrating on other things instead. Which is totally fine, probably helps in fact. But for this reason I chose not to use it this time. Maybe next time?