This is something I have been meaning to do for a long time but I tend to work long hours so it's taken a while. Be aware that I intend to keep adding to this thread, with more test samples and encoding settings as well as test sources.
Speaking of test sources, for this test I went to:
And downloaded every file they have that's 720p and above. For this test I used red_kayak_1080p, speed_bag_1080p, snow_mnt_1080p, touchdown_pass_1080p, west_wind_easy_1080p, aspen_1080p, controlled_burn_1080p and rush_field_cuts_1080p. I loaded them into ShotCut (my favorite free NLE), combined them in the timeline, and exported a 7.9 GB, 414 Mb/s, 4:2:2 8-bit HuffYUV 1080p mkv running 2m32s; this was used as my source for this encoding test.
The test environment is Ubuntu MATE 17.04 with kernel 4.10, a Xeon E3-1241 v3 @ 3.5 GHz (quad core with HT), 16 GB DDR3, and a GTX 1050; ffmpeg was built from the latest git, with the latest CUDA package installed, the latest NVIDIA SDK, and the 381.22 drivers.
I used the following command lines:
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 1080p.mp4
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -b:v 24.8M 1080p_nvenc_h264.mp4
The x264 encode finished in 3m57.636s, the nvenc encode in 0m39.101s. I measured quality objectively with the following command line:
ffmpeg -i output -i source -lavfi "ssim;[0:v][1:v]psnr" -f null -
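If anyone wants to score a whole batch of encodes against the same source, a tiny wrapper can print out that measurement command for each file. Just a sketch; the function name and file names here are placeholders, not the actual test files:

```shell
#!/bin/sh
# Print the ssim/psnr measurement command for each encode against one source.
# The file names below are illustrative placeholders.
SOURCE="1080p.mkv"

build_metric_cmd() {
    # $1 = encoded file to score against $SOURCE
    printf 'ffmpeg -i %s -i %s -lavfi "ssim;[0:v][1:v]psnr" -f null -\n' "$1" "$SOURCE"
}

for enc in 1080p.mp4 1080p_nvenc_h264.mp4; do
    build_metric_cmd "$enc"
done
```

Pipe the output through `sh` (or just copy the lines) once you've checked the commands look right.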
I also compared them visually, watching both clips as well as the lossless original; to my eyes, on this monitor, I couldn't see any appreciable difference between the two encodes. I also used ffplay to overlay each encode on the original and subtract out the similarities, leaving only the differences, which shows which encode is closer to the original: in some parts x264 was closer, in others nvenc was.
Here are the PSNR and SSIM values:
SSIM Y:0.894080 (9.750232) U:0.974536 (15.940795) V:0.981503 (17.328887) All:0.922060 (11.082398)
PSNR y:26.192250 u:40.881033 v:42.814985 average:27.893067 min:11.842668 max:inf
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 1080p.mp4
SSIM Y:0.894303 (9.759364) U:0.971734 (15.487291) V:0.979393 (16.859851) All:0.921390 (11.045202)
PSNR y:26.198225 u:40.660773 v:42.575120 average:27.895740 min:11.850593 max:inf
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -b:v 24.8M 1080p_nvenc_h264.mp4
As you guys can see, in this test nvenc scored the higher minimum and average PSNR but a slightly lower overall SSIM value. Attached are the two test encodes; the nvenc encode used a tiny bit more bitrate, but it's such a tiny amount that I don't think it makes any appreciable difference in PSNR/SSIM.
Bookmark this thread; I intend to do many more test encodes, with many more sources, including a monster 2160p test with the Netflix clips I downloaded, as well as various encoder settings. I will also be adding a bunch of nvenc hevc encodes. Just bear in mind that today and tomorrow I am working long hours and most likely won't be able to post again until Sunday morning.
One last thing: if you do any kind of work with lossless or intermediate 4K files (basically any format without GPU decode support for playback), a quad core is not fast enough to play them back; attempting to do so swamps my Xeon and results in a slideshow.
On Sunday I will also show you guys how to use NVDEC to speed up your x264/x265 transcodes so long as your source is something that NVIDIA has gpu hardware acceleration support for, such as VC-1, h264, hevc, and vp9 sources.
You should use -tune psnr / -tune ssim for x264/x265 if you're using PSNR or SSIM to measure, respectively.
x264/x265 have psy optimizations enabled by default, so they are "penalized" both in terms of metrics and speed (not that they will even come close in encoding fps/time regardless of settings).
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 -tune ssim 1080p_ssim.mp4
SSIM Y:0.895567 (9.811640) U:0.973680 (15.797100) V:0.980919 (17.193902) All:0.922811 (11.124467)
PSNR y:26.197482 u:40.802744 v:42.748055 average:27.897201 min:11.842869 max:inf
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 -tune psnr 1080p_psnr.mp4
SSIM Y:0.895767 (9.819961) U:0.975088 (16.035848) V:0.982074 (17.465111) All:0.923372 (11.156114)
PSNR y:26.205163 u:40.924847 v:42.867546 average:27.906449 min:11.842520 max:inf
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -spatial-aq 1 -temporal-aq 1 -weighted_pred 1 -rc-lookahead 32 -b:v 24.8M 1080p_nvenc_h264_3.mp4
SSIM Y:0.894355 (9.761524) U:0.970732 (15.336053) V:0.978903 (16.757808) All:0.921176 (11.033419)
PSNR y:26.190763 u:40.570085 v:42.516248 average:27.887244 min:11.843003 max:inf
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -g 250 -bf 2 -rc-lookahead 20 -i_qfactor 0.75 -b_qfactor 1.1 -b:v 24.8M 1080p_nvenc_h264_2.mp4
SSIM Y:0.894150 (9.753107) U:0.973499 (15.767327) V:0.980944 (17.199653) All:0.921841 (11.070194)
PSNR y:26.193607 u:40.793545 v:42.732986 average:27.893221 min:11.843256 max:inf
SSIM Y:0.896111 (9.834323) U:0.971417 (15.438858) V:0.979270 (16.833934) All:0.922522 (11.108217)
PSNR y:26.217697 u:40.601060 v:42.524783 average:27.914109 min:11.855784 max:96.526841
time ffmpeg -vsync 0 -hwaccel vdpau -i 1080p.mkv -c:v hevc_nvenc -pix_fmt nv12 -preset slow -b:v 24.8M 1080p_nvenc_h265.mp4
As you can see, there's very little variation in either the PSNR or SSIM scores with this particular test clip; the x264 encodes used slightly different amounts of bitrate, but the actual scores barely budged.
My next post will feature encodes done with different sources, including a rather interesting, brutal test for any encoder; when you see the test source I have created, you will understand.
One last thing: for anyone thinking about buying a GTX 1050, don't. As I mentioned in a previous post, there is a very high likelihood that Pascal's replacement will feature hardware VP9 encoding. If you must buy a Pascal, I recommend going with a Quadro with lots of VRAM, for reasons I will go into in my next post when I explain how and why I created my torture source.
In the meantime, enjoy the attached test encodes.
And you don't think that's significant? A higher SSIM of 0.922811 at a smaller file size of 401.69 MB with tune ssim, vs. 0.922060 at 453 MB? ±0.5% might be an acceptable delta for bitrate, but not ±13% when doing tests.
PSNR didn't move much given the larger file size difference, and that is actually unexpected; perhaps something went wrong with the encode, or this particular clip is an outlier, because the metrics generally move in the same direction with each "tune" respectively, and significantly. It's a log scale.
Thx for posting tests BTW
Have you noticed any issues with vdpau yet? It's Linux only, but GPU decoding has been only semi-reliable on Windows, unless the source is indexed (then it's perfect). DXVA2 tends to be more reliable than CUVID. I don't have any native Linux to test with, only some VMs.
I would think that pushing the encoders to the low-bitrate end, to force obvious artifacts, would help make it more obvious which is better. Let's say 5 Mbit and below for 1080p, instead of forcing us to rely on SSIM and PSNR numbers. Nearly 25 Mbit for 1080p pretty much forces you to look only at the SSIM/PSNR numbers, because these kinds of bitrates should be pretty transparent, not allowing for the metric that matters (your eyes). And if you are planning on just using your eyes as a metric, then the SSIM/PSNR tuning presets should be off.
I regularly watch certain H.264 satellite channels at a very nearly constant 11.5 Mbit video bitrate, with excellent transparency. If I were to compare this 11.5 Mbit video to a ~25 Mbit copy, I tend to believe it would be like splitting hairs, which makes the idea of comparing two H.264 ~25 Mbit videos even worse.
If you had to choose one, then yes, the lower bitrate range is preferable.
But you should actually be doing multiple bitrates and plotting them, especially with metrics. The curves are non linear, and they typically level off at the higher ranges.
If a difference is 0.1 or 0.2 dB it might not seem like much by itself, but if you needed 1.5x or 2x the bitrate with "encode B" or "settings C" in that "plateau" region to achieve the same value, then that is a huge difference. More data points to see the relationship become crucial.
Of course it's an enormous amount of work, and to top it off metrics are of limited value anyway, especially PSNR and SSIM, which have only moderate correlation with the perception of "quality".
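For what it's worth, that kind of sweep is easy to script. Something like this (the bitrate points and file names are purely illustrative, not anything from the tests above) prints one nvenc encode command per rate point, ready to run and then plot:

```shell
#!/bin/sh
# Print one nvenc encode command per bitrate point, for plotting an RD curve.
# Bitrates and file names are illustrative, not the thread's actual settings.
sweep_cmds() {
    # $1 = source file
    for rate in 2M 5M 10M 15M 25M; do
        printf 'ffmpeg -vsync 0 -i %s -c:v h264_nvenc -preset slow -b:v %s out_%s.mp4\n' \
            "$1" "$rate" "$rate"
    done
}

sweep_cmds 1080p.mkv
```

Run each command, measure SSIM/PSNR against the source as before, and plot metric vs. bitrate to see where the curve flattens.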
I didn't choose the bitrate, x264 did; as you will note, I used the old default CRF value, which used to be 18 (for some reason it seems they have changed it to 23), and then tried to match that bitrate with nvenc.
I don't believe in artificially bitrate-starving encodes to the point of artifacting. Blu-ray is about 25 Mb/s for 1080p, UHD is about 100 Mb/s for 4K, you can now get 940 Mb/s FiOS for $80 in some areas, and my ISP just more than doubled my download speeds to over 200 Mb/s for no extra charge. I see no reason to contrive a test where any encoder will fall apart just to show that the software encoders hold up better.
I don't have to test at ridiculously low bit rate to know that x265 will come out on top, followed by x264 but the results will still be unwatchable. I want to test in a range where I, and most people, will produce content, either for archival purposes, or for public consumption purposes. If you were preparing professional 1080p content for sale on a web site or streaming service you wouldn't use crf 30, would you? If not then what is the point of testing a bit rate range that no one in their right mind would ever use?
I myself like to make 4k and 1080p timelapse videos using JPG/TIFF images (which were made from RAW pictures that take a lot of time to be converted to JPG or TIFF). My bottlenecks tend to be simply decoding the images fast enough for x264, so a faster encoder in my case would not help at all. An SSD might help me but I'm not going that far yet. Add in Avisynth filters and the bottlenecks only worsen.
As you know, when you're transcoding video from one compressed format to another, you first need to decode the video before it gets fed to the encoder, and decoding can become a huge bottleneck depending on the resolution and bitrate. If you are starting from a supported lossy format, such as vc-1, h264, hevc, vp9, etc., you can speed up x264 encoding by up to 30 fps and x265 encoding by up to 20 fps by using cuvid to decode, like this:
time ffmpeg -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i source -c:v libx264/libx265 -preset whatever -crf whatever output
The -vsync 0 prevents ffmpeg from adding duplicate or extra frames under certain circumstances (there's a bug that rears its head every once in a while); -hwaccel cuvid and h264_cuvid tell it to use the hardware decoder associated with the specified format. Here are the possible options:
h264_cuvid, hevc_cuvid, mjpeg_cuvid, mpeg1_cuvid, mpeg2_cuvid, mpeg4_cuvid, vc1_cuvid, vp8_cuvid, vp9_cuvid
I've tested it and it works: if the source is in any of the above formats, at a supported resolution/frame rate combination, you free up your CPU from decoding duties so it can be used just for the encoding part; as I said, in my tests the speedup can be substantial. One caveat: if you wish to specify a pixel format the encode will fail, and 10-bit sources are not supported; for those you will have to use the following:
time ffmpeg -vsync 0 -hwaccel vdpau -i source -c:v libx264/libx265 -preset whatever -crf whatever output
This will result in a somewhat lower speed up but there will still be a nice speed gain.
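If you want to pick the matching cuvid decoder automatically, one way is to branch on the codec name that ffprobe reports. A rough sketch (the function name is mine, and falling back to software decode for unsupported codecs is my assumption):

```shell
#!/bin/sh
# Map a codec name (as reported by ffprobe's codec_name) to its cuvid decoder.
# Prints nothing for codecs without a cuvid decoder, so the caller can fall
# back to plain software decoding.
cuvid_decoder() {
    case "$1" in
        h264)       echo h264_cuvid ;;
        hevc)       echo hevc_cuvid ;;
        vc1)        echo vc1_cuvid ;;
        vp8)        echo vp8_cuvid ;;
        vp9)        echo vp9_cuvid ;;
        mpeg1video) echo mpeg1_cuvid ;;
        mpeg2video) echo mpeg2_cuvid ;;
        mpeg4)      echo mpeg4_cuvid ;;
        mjpeg)      echo mjpeg_cuvid ;;
        *)          : ;;  # no cuvid decoder: let ffmpeg decode in software
    esac
}

# Typical use (assumes ffprobe is installed):
#   codec=$(ffprobe -v error -select_streams v:0 \
#       -show_entries stream=codec_name -of csv=p=0 input.mkv)
#   dec=$(cuvid_decoder "$codec")
cuvid_decoder h264
```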
While I'm at it, allow me to share this: NVIDIA has done some interesting things with NVENC. Previous cards had only one nvenc unit; with Pascal (maybe even Maxwell), there are 2 or 3, depending on the card. With consumer-grade cards, i.e. GeForce, you are limited to 2 simultaneous sessions; with Quadros there is no session limit, only your hardware resources.
As I mentioned, I decided to go cheap and buy a GTX 1050, and while I am happy with the speed and encode quality, I have found that 2 GB of VRAM is insufficient; to truly get the most out of what nvenc is capable of, you really need a lot more RAM. For instance, I made a 4K mosaic with 4 1080p BD sources, using the following approach:
ffmpeg -i ducks_take_off_444_720p50.y4m -i in_to_tree_444_720p50.y4m -i old_town_cross_444_720p50.y4m -i park_joy_444_720p50.y4m -filter_complex "nullsrc=size=2560x1440 [base]; [0:v] setpts=PTS-STARTPTS, scale=1280x720 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=1280x720 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=1280x720 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=1280x720 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=1280 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=720 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=1280:y=720" -c:v huffyuv source.avi <-- this is an example I used in creating my next 2.5k torture source
It ran right up against the 2 GB of VRAM on this card and MATE started becoming unresponsive. Similarly, nvenc is capable of creating multiple-resolution output from a single source, like this:
ffmpeg -y -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i input.mp4 -vf scale_npp=4096:2160 -c:a copy -c:v h264_nvenc -qp 18 output_2160p.mp4 -vf scale_npp=1920:1080 -c:a copy -c:v h264_nvenc -qp 18 output_1080p.mp4 -vf scale_npp=1280:720 -c:a copy -c:v h264_nvenc -qp 18 output_720p.mp4
But it takes a lot of gpu horsepower if you plan on using high resolutions, frame rates and bit rates or running multiple instances of said encoder to create multiple clips at the same time.
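To make that ladder easier to edit, the same one-decode, many-encodes command can be assembled in a loop. A sketch (the function name and resolutions are placeholders, not a recommendation):

```shell
#!/bin/sh
# Assemble the per-output arguments for a one-decode, many-encodes ladder.
# The resolution list is illustrative; edit it to change the ladder.
ladder_cmd() {
    # $1 = input file
    args=""
    for res in 3840:2160 1920:1080 1280:720; do
        h=${res#*:}   # height part, used to name the output file
        args="$args -vf scale_npp=$res -c:a copy -c:v h264_nvenc -qp 18 out_${h}p.mp4"
    done
    echo "ffmpeg -y -vsync 0 -hwaccel cuvid -c:v h264_cuvid -i $1$args"
}

ladder_cmd input.mp4
```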
Attached is a taste of the torture test I am working on:
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v libx264 -pix_fmt yuv444p -preset medium -crf 18 1440p.mp4
SSIM Y:0.967999 (14.948350) U:0.966631 (14.766540) V:0.973843 (15.824090) All:0.969491 (15.155701)
PSNR y:37.831583 u:42.743570 v:44.834699 average:40.778407 min:39.635829 max:43.164826
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v h264_nvenc -pix_fmt nv16 -preset slow -b:v 80.6M 1440p_nvenc_h264.mp4
SSIM Y:0.953319 (13.308597) U:0.955781 (13.543890) V:0.966175 (14.707643) All:0.958425 (13.811679)
PSNR y:36.918330 u:41.297134 v:43.075292 average:39.629041 min:38.784533 max:43.630083
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv16 -preset slow -b:v 80.6M 1440p_nvenc_hevc.mp4
SSIM Y:0.961950 (14.196440) U:0.958788 (13.849741) V:0.968744 (15.050597) All:0.963160 (14.336853)
PSNR y:38.094077 u:41.655226 v:43.743832 average:40.528416 min:39.823694 max:43.71221
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -preset slow -b:v 80.6M 1440p_nvenc_hevc_10-bit.mp4
SSIM Y:0.960911 (14.079419) U:0.963528 (14.380446) V:0.971691 (15.480799) All:0.965377 (14.606328)
PSNR y:37.967738 u:42.245058 v:44.210780 average:40.668000 min:39.684348 max:43.888382
Some of you, after seeing the test clip, may wonder what the purpose of creating it is; it boils down to this: for a long time I have suspected that various codec developers, such as the x264 and x265 people, may have "tuned" their encoders to perform well with certain very common test clips, such as crowd run. x264 and x265 are almost untouchable at really low bitrates and nvenc can't come close, but the advantage those software encoders enjoy in such limited tests does not hold up, in my experience, in more realistic scenarios, i.e. at bitrates content creators actually use, or with test clips representative of content you might buy or rent. I happened to run across an Intel PDF on testing Quick Sync via ffmpeg, and they suggested creating test clips like the one I used as a way of really torturing an encoder's analysis and encoding algorithms.
I also did a test especially for PDR, since I remember he seemed to think that Spatial AQ with NVENC was something special:
SSIM Y:0.965739 (14.652055) U:0.968062 (14.956912) V:0.974488 (15.932555) All:0.969430 (15.147020)
PSNR y:38.860241 u:42.904530 v:44.766188 average:41.454625 min:41.133913 max:46.049186
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 NVENC_QP22_No_SAQ.mp4
SSIM Y:0.965529 (14.625456) U:0.960652 (14.050786) V:0.969770 (15.195576) All:0.965317 (14.598828)
PSNR y:37.583990 u:41.790823 v:43.870674 average:40.274191 min:39.791486 max:45.284033
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 -spatial_aq 1 NVENC_QP22_SAQ.mp4
SSIM Y:0.964409 (14.486618) U:0.952386 (13.222615) V:0.965495 (14.621216) All:0.960763 (14.063083)
PSNR y:36.265087 u:40.748199 v:43.107499 average:39.096446 min:38.493043 max:44.369969
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 -spatial_aq 1 -aq-strength 15 NVENC_QP22_SAQ_MAX.mp4
SSIM Y:0.965739 (14.651970) U:0.968040 (14.953925) V:0.974469 (15.929314) All:0.969416 (15.145046)
PSNR y:38.853058 u:42.898605 v:44.762847 average:41.448340 min:41.126400 max:46.057169
time ffmpeg -vsync 0 -hwaccel vdpau -i source1.y4m -c:v hevc_nvenc -pix_fmt nv20 -rc constqp -qp 22 -spatial_aq 1 -aq-strength 1 NVENC_QP22_SAQ_MIN.mp4
You will note that PSNR went down, as NVIDIA's documentation explicitly says it will. Also note that nvenc hevc always has SAO turned on and there is no way to turn it off; SAO only works with Pascal cards and SDK 8.0 and above.
I will shortly be finishing the monster test clip I am working on and posting encoding samples, as well as some SAQ tests with nvenc h264, but in the meantime enjoy comparing the clips I attached.
The ": x" is supposed to be ":x", without a space between them.
FFmpeg just added hardware VP9 encoding via va-api on Kaby Lake processors:
If there are any Kaby Lake Linux users here that could do some test encodes, that would be great.
Yes many of those things were discussed in other threads,
(Also, the Windows variant is just -vsync 0 -c:v h264_cuvid for cuvid; adding -hwaccel cuvid seems to mess things up, judging by CPU-Z/GPU-Z. With pure decoding to rawvideo/null as a speed test, along with CPU-Z/GPU-Z, you can verify that it's working.)
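That pure-decode speed test can be written as a pair of commands to time against each other. A sketch (assumes an h264 source; the function name is mine):

```shell
#!/bin/sh
# Print the two decode-only benchmark commands: hardware (cuvid) vs software.
# Timing each with `time` shows the raw decode speed difference; -f null -
# discards the decoded frames so only decoding is measured.
decode_bench_cmds() {
    # $1 = h264 input file
    printf 'time ffmpeg -vsync 0 -c:v h264_cuvid -i %s -f null -\n' "$1"
    printf 'time ffmpeg -vsync 0 -i %s -f null -\n' "$1"
}

decode_bench_cmds input.mp4
```

Run both a couple of times each so disk caching doesn't skew the first result.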
And looking at it quickly, -vsync 0 seems to fix the dropped/duplicate frames in some quick tests with cuvid. However, CFR input becomes VFR output. Is there a switch or something I'm missing? (You can force CFR with libx264's x264opts if you use that to encode, but it doesn't work here.) VFR potentially causes a lot of problems in editors, even if it's only "minimally" variable. It seems to be a muxing/timecodes issue, because the frames were correct in a few tests so far (without -vsync there were frame-count and duplicate/dropped problems).
The DXVA2 method keeps CFR if input is CFR, and seems more reliable so far in my testing (at least on windows)
I'm still looking into it, but not quite ready to use it for semi-important things yet. The avs indexed method using DGNV, however, is 100% stable.
And what is the other "bug rearing its ugly head" that you're referring to?
So, a few more, longer tests, and I've come across videos where both dxva2 and cuvid fail in terms of accuracy, with or without vsync: either dropped frames or duplicate frames. Unusable for anything of importance IMO.
Or maybe this is the other bug you were referring to ?
The issues didn't show up in the quick short tests today, but a few longer ones (though not all) demonstrate problems. -vsync 0 with cuvid always has the VFR problem, but that "feels" like a timecodes issue, which is minor in the grand scheme of things compared to dropped or duplicated frames.
It's not necessarily a "GPU problem" per se, because avisynth DGNV works on the ones that failed. But that's indexed and requires avisynth or vapoursynth, which in many ways is a lot more "clunky" to work with.
I'm trying to narrow down the problem, or what's causing some to fail but not others. I've actually reported these issues with GPU decoding in the past, but thought maybe they got them fixed
What was the "other bug" you were referring to ? I don't want to be doubling up or investigating on work already done
There was a bug in ffmpeg 2.8 that would result in ffmpeg not draining all the frames from the hardware buffers, so the output stream would be missing the last few frames. This seemed to happen at random, and more often with Intel's QS than NVENC.
With regard to the specific issue you're having, I have a theory. According to the ffmpeg docs:
Video sync method. For compatibility reasons old values can be specified as numbers. Newly added values will have to be specified as strings always.
0, passthrough - Each frame is passed with its timestamp from the demuxer to the muxer.
1, cfr - Frames will be duplicated and dropped to achieve exactly the requested constant frame rate.
2, vfr - Frames are passed through with their timestamp or dropped so as to prevent 2 frames from having the same timestamp.
drop - As passthrough but destroys all timestamps, making the muxer generate fresh timestamps based on frame-rate.
-1, auto - Chooses between 1 and 2 depending on muxer capabilities. This is the default method.
I think you should give -vsync drop a try and see what happens.
Are you talking about the VFR output for cuvid when using vsync, or the other issue with drops/duplicates? Either way, -vsync drop doesn't work.
dxva2 definitely seems less flaky, but it too has errors on some sources. I just need to run a bunch more tests and figure out why, or in what situations.
/sorry for the thread hijack, this sidetrack is not really "pascal NVENC"
Here are 3 more test encodes; it's a 4K mosaic done using four 4K60 Netflix test files from the link I posted earlier. Here are the command lines:
ffmpeg -vsync 1 -i Netflix_BarScene_4096x2160_60fps_10bit_420.y4m -i Netflix_Aerial_4096x2160_60fps_10bit_420.y4m -i Netflix_DinnerScene_4096x2160_60fps_10bit_420.y4m -i Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m -filter_complex "nullsrc=size=4096x2160 [base]; [0:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=2048 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=1080 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=2048:y=1080" -c:v hevc_nvenc -preset slow -r 60 -b:v 45.8M hevc_hevc.mp4
time ffmpeg -vsync 1 -i Netflix_BarScene_4096x2160_60fps_10bit_420.y4m -i Netflix_Aerial_4096x2160_60fps_10bit_420.y4m -i Netflix_DinnerScene_4096x2160_60fps_10bit_420.y4m -i Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m -filter_complex "nullsrc=size=4096x2160 [base]; [0:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=2048 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=1080 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=2048:y=1080" -c:v h264_nvenc -preset slow -r 60 -b:v 45.8M h264_hevc.mp4
time ffmpeg -vsync 1 -i Netflix_BarScene_4096x2160_60fps_10bit_420.y4m -i Netflix_Aerial_4096x2160_60fps_10bit_420.y4m -i Netflix_DinnerScene_4096x2160_60fps_10bit_420.y4m -i Netflix_DrivingPOV_4096x2160_60fps_10bit_420.y4m -filter_complex "nullsrc=size=4096x2160 [base]; [0:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperleft]; [1:v] setpts=PTS-STARTPTS, scale=2048x1080 [upperright]; [2:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerleft]; [3:v] setpts=PTS-STARTPTS, scale=2048x1080 [lowerright]; [base][upperleft] overlay=shortest=1 [tmp1]; [tmp1][upperright] overlay=shortest=1:x=2048 [tmp2]; [tmp2][lowerleft] overlay=shortest=1:y=1080 [tmp3]; [tmp3][lowerright] overlay=shortest=1:x=2048:y=1080" -c:v libx264 -preset medium -r 60 -crf 18 x264.mp4
Note there are no SSIM or PSNR measurements this time because I did not create an intermediate master; if there is demand, I'll redo the tests by first creating a master against which to measure the encodes.
By the way, if anyone is interested, you can easily encrypt any file on *nix from the command line like this:
gpg -c file
To decrypt, run gpg on the encrypted file. Thought someone might find this useful.
[just a follow up to the gpu decoding issues]
The dupe/drop that affects some files is a rare case; it's related to certain types of open GOP. I wouldn't worry about it for the most part (those files still decode correctly in avisynth/vapoursynth with DGNV (GPU) or in software).
The cuvid VFR issue with -vsync 0 is a container timebase issue; it doesn't occur with mkv output, only mp4. I tried some of the movflags, and forcing the timebase and timescale, but couldn't get it to work with the mp4 container. There are other ways to remux, rewrite timecodes, and make it CFR in other programs. VFR can cause problems with some editors and some hardware players.
cuvid with nvidia seems the fastest on Windows, significantly faster than dxva2 for pure decoding speed, and it translates to slightly faster encoding speed. -vsync 0 fixes the dropped/dupe frames on most sources, but at the expense of some VFR timecode weirdness in the mp4 container. MKV is read as CFR, but MKV isn't supported in "professional" programs like editors. Plain remuxing to mp4 doesn't fix it; you need to rewrite the timecodes.
dxva2 is windows only, slower than cuvid, but still seems more consistent. It doesn't have the container timebase issues with mp4.
How much GPU decoding benefits libx264 (CPU) encoding speed depends on many factors, including resolution, settings used, other bottlenecks, etc.; it can be anywhere from -2% to +10%. But you generally benefit more at higher resolutions, as CPU cycles "wasted" on decoding can be allocated to encoding. cuvid is consistently faster than dxva2, and that translates to encoding speed in general; e.g. if dxva2 was +3% faster with a given set of settings and scenarios, cuvid would be maybe +5% on average.
ok , back to the regular programming, carry on with pascal nvenc
It's actually very easy to do: simply tune your analysis and encoding algorithms for scenes that contain certain characteristics. Here's the thing: I have done tons of test encodes, with both clips and full movies, and the one thing that strikes me as odd is that both x264 and x265 seem to excel on a handful of commonly used test clips, like that crowd run clip (particularly in the trees), yet that advantage does not hold up when you use more realistic sources. If you try test encodes with Elephants Dream, there is one part, where the guy is running on a bridge, where every encode falls apart except x265, yet in the rest of the movie all encoders are capable of producing great quality. With sources like Sintel, everything produces great encodes.
Honestly, it doesn't make a difference, because no one who knows what they are doing is going to encode a movie with uniform settings; what they'll do is use segmented encoding and up the bitrate in the parts that need it in order to ensure a great encode.
Getting back to the topic at hand, I created one more master. I used ShotCut and imported Netflix Tango, Ritual Dance, Food Market 2 and Boxing Practice, applied a glow and text to the first, a vignette to the second, a projector effect to the third and Technicolor to the fourth, combined all four in the timeline and exported a 4096x2160p60 lossless huffyuv:
time ffmpeg -vsync 0 -hwaccel vdpau -i NetFlix_Master.mkv -c:v libx264 -pix_fmt yuv420p -preset medium -crf 18 NetFlix_x264.mp4
time ffmpeg -vsync 0 -hwaccel vdpau -i NetFlix_Master.mkv -c:v h264_nvenc -pix_fmt nv12 -preset slow -b:v 73.9M NetFlix_h264_nvenc.mp4
time ffmpeg -vsync 0 -hwaccel vdpau -i NetFlix_Master.mkv -c:v hevc_nvenc -pix_fmt nv12 -preset slow -b:v 73.9M NetFlix_hevc_nvenc.mp4
SSIM Y:0.892836 (9.699525) U:0.966144 (14.703686) V:0.963074 (14.326704) All:0.916761 (10.796715)
PSNR y:25.963650 u:39.848219 v:38.448001 average:27.620164 min:12.843680 max:48.714651
ffmpeg -i NetFlix_x264.mp4 -i NetFlix_Master.mkv -lavfi "ssim;[0:v][1:v]psnr" -f null -
SSIM Y:0.886554 (9.452111) U:0.960028 (13.982463) V:0.956801 (13.645301) All:0.910508 (10.482142)
PSNR y:25.948563 u:39.425053 v:38.133323 average:27.596543 min:12.852938 max:49.057867
ffmpeg -i NetFlix_h264_nvenc.mp4 -i NetFlix_Master.mkv -lavfi "ssim;[0:v][1:v]psnr" -f null -
SSIM Y:0.888461 (9.525718) U:0.961090 (14.099337) V:0.957759 (13.742686) All:0.912115 (10.560865)
PSNR y:25.962801 u:39.439475 v:38.108577 average:27.610207 min:12.847422 max:49.697014
ffmpeg -i NetFlix_hevc_nvenc.mp4 -i NetFlix_Master.mkv -lavfi "ssim;[0:v][1:v]psnr" -f null -
I think it's pretty obvious that if x264+crf18+medium is good enough for you, then Pascal NVENC, with its speed advantage, is the better choice.
The new NVENC on the Turing cards is excellent. I tested it on Overwatch footage and was blown away. Thinking of doing it again on something like Apex, or IRL footage, ideas?
That is an impressive amount of work, both the testing and putting together that website, and a nice way to present your findings.
Your test source was a poor choice; I would like to see the same type of testing done with the various Netflix sources available for testing:
Especially the Meridian test file, which NetFlix designed specifically to test various codecs.
I would also not use OBS, because of the limitations you outline, such as certain presets not being available; I would use something that gives you access to more of the encoder's options. I believe StaxRip on Windows offers an abundance of encoder options, or you could just use ffmpeg directly from the command line.
Also, Kaby Lake and later Quick Sync supports hardware VP9 encoding; it's only available via vaapi, which itself is only available on Linux. I have tested it with an i3 7100 and the hardware VP9 encoder is pretty good, as is Quick Sync's hardware deinterlace filter.
The main problem with Turing, as I see it, is the pricing model NVIDIA has embraced. It used to be that NVENC offered a huge price advantage: you could buy a cheap GTX 1050 2GB model for about $120 and get a video card that didn't need an auxiliary power connector and was capable of encoding 1080p h265 at close to 300 fps.
The cheapest RTX that is a true RTX, not a gimped RTX wannabe (the GTX 1660), costs about $350, and it needs an auxiliary power connector. Add in AMD's aggressive pricing model, the soon-to-be-released Ryzen replacements, and the fact that Intel is set to release its own graphics card in 2020, which I suspect may have a hardware AV1 encoder*, and I see no reason to spend good money on a Turing-based graphics card.
*I base this on the work Intel has done on the open source SVT encoders, including its HEVC, VP9 and AV1 encoders:
Keep an eye on these Intel open source encoders; they will overtake the current popular open source encoders, such as libvpx, x264 and x265, in both quality and speed very shortly, if they haven't already.
And I think they are laying the groundwork for Intel GPU powered encoders in future Intel dGPU and iGPU products.
I was thinking of trying either something like that Meridian clip or the Big Buck Bunny one for my next test. Do you think people would be interested? These tests take me quite a while, especially if I include AV1, and most of the people I see asking for help are just interested in streaming games.
I have heard a lot of promises about the iGPU coming with Ice Lake: an HEVC revamp, as well as dedicated VP9 encoding instead of just hardware-assisted. I would greatly appreciate it if they made a media SDK for Windows like they did for Linux; they don't have all the same stuff.
It takes so frigging long to make files with libaom-av1.
I did come across rav1e at some point in my searches, but at the time (only about 2 months ago) it indicated it wasn't complete and the output wasn't guaranteed to conform to the bitstream spec. I got the impression that's what they're aiming for in the end, but during development they're concentrating on other things instead, which is totally fine and probably helps in fact. But for this reason I chose not to use it this time. Maybe next time?