Not sure which forum to post in, so here goes.
I am contemplating a new PC build and hence a new video card - a 2060 Super or a 2070 Super. Mainly interested in video decoding and transcoding, e.g. with NVEncC or ffmpeg.
Is there any difference in the NVENC encoding chips and performance between the 2060 Super and the 2070 Super?
Did a Google and didn't readily spot anything useful; results were mostly reviews (i.e. marketing ads dressed up as reviews) aimed at "gamers".
Advice and links welcomed.
Thanks.
-
The answer is a bit nuanced. NVENC refers to two things: NVIDIA's hardware encoding chip, and the software in the SDK used to access it. NVIDIA's documentation makes it clear that the encoding chip handles only part of the encoding process; the rest is handled by the GPU cores. The documentation further states, and I have confirmed through experimentation, that if the encoding is done using RGB instead of NV12, then the entire encoding process is done on the GPU cores.
Because of this, in theory, and especially with higher-resolution encodes, the faster card with more GPU cores and more VRAM should encode faster. This becomes even more pronounced if one is using GPU-powered filters.
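If the goal is to keep the encode on the dedicated NVENC block, one way is to pin the pixel format to NV12 ahead of the encoder. A sketch, assuming an ffmpeg build with nvenc support; file names and bitrate are placeholders:

```shell
# Convert to NV12 explicitly so NVENC is not handed RGB frames,
# which would push the whole encode onto the CUDA cores:
ffmpeg -i input.mkv -vf format=nv12 -c:v h264_nvenc -preset slow \
       -b:v 8M -c:a copy output.mp4
```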
Honestly, at the prices NVIDIA is charging for these new RTX cards, I find it hard to justify the purchase. It would be interesting to see what AMD's Navi offers in terms of encoding quality and speed.
-
Thank you. Ah, I see, nuanced it is.
I have some scripts which can use DG's lovely software, CUDA, and NVEncC in various ways, and which are NVIDIA-dependent.
If the encoder chips in the 2060S and the 2070S are functionally the same, and it is cores/VRAM and thus speed that makes the difference, then I may opt for the 2060 Super for budget reasons and live with a few percent slower encode times.
OK, good-oh, I'll look into the potential for RGB instead of NV12 in my workflows.
I should check whether these cards do HDR10-type (10-bit) encoding - I expect so, as the matrix here (although out of date) https://developer.nvidia.com/video-encode-decode-gpu-support-matrix suggests 10- and 12-bit decode, but doesn't clearly mention it for encode.
Yes to the note about the not-so-pleasing pricing; however, my current PC is old and crashing daily (h/w issue), so "best I can afford" currently applies.
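One quick way to check a card's 10-bit support is to ask ffmpeg about the encoder's capabilities. A sketch, assuming an ffmpeg build compiled with nvenc support:

```shell
# List the NVENC encoders this ffmpeg build knows about:
ffmpeg -hide_banner -encoders | grep nvenc

# Show hevc_nvenc's options and supported pixel formats;
# p010le in the list indicates 10-bit HEVC encoding is available:
ffmpeg -hide_banner -h encoder=hevc_nvenc
```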
-
Bought an RTX 2060 Super. 10-bit depth only with H.265.
Code:
NVEncC64.exe --check-features
NVEncC (x64) 4.55 (r1224) by rigaya, Nov 5 2019 15:40:51 (VC 1916/Win/avx2)
  [NVENC API v9.1, CUDA 10.1]
reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]

Environment Info
OS : Windows 10 x64 (18363)
CPU: AMD Ryzen 9 3900X 12-Core Processor (12C/24T)
RAM: Used 6833 MB, Total 32681 MB
GPU: #0: GeForce RTX 2060 SUPER (2176 cores, 1665 MHz)[PCIe3x16][441.20]

List of available features.
Codec: H.264/AVC
Max Bframes                4
B Ref Mode                 yes
RC Modes                   63
Field Encoding             no
MonoChrome                 no
FMO                        no
Quater-Pel MV              yes
B Direct Mode              yes
CABAC                      yes
Adaptive Transform         yes
Max Temporal Layers        0
Hierarchial P Frames       no
Hierarchial B Frames       no
Max Level                  51
Min Level                  1
4:4:4                      yes
Min Width                  145
Max Width                  4096
Min Height                 49
Max Height                 4096
Multiple Refs              yes
Max LTR Frames             8
Dynamic Resolution Change  yes
Dynamic Bitrate Change     yes
Forced constant QP         yes
Dynamic RC Mode Change     no
Subframe Readback          yes
Constrained Encoding       yes
Intra Refresh              yes
Custom VBV Bufsize         yes
Dynamic Slice Mode         yes
Ref Pic Invalidiation      yes
PreProcess                 no
Async Encoding             yes
Max MBs                    65536
Lossless                   yes
SAO                        no
Me Only Mode               yes
Lookahead                  yes
AQ (temporal)              yes
Weighted Prediction        yes
10bit depth                no

Codec: H.265/HEVC
Max Bframes                5
B Ref Mode                 yes
RC Modes                   63
Field Encoding             no
MonoChrome                 no
Quater-Pel MV              yes
B Direct Mode              no
Max Temporal Layers        0
Hierarchial P Frames       no
Hierarchial B Frames       no
Max Level                  62
Min Level                  1
4:4:4                      yes
Min Width                  129
Max Width                  8192
Min Height                 33
Max Height                 8192
Multiple Refs              yes
Max LTR Frames             7
Dynamic Resolution Change  yes
Dynamic Bitrate Change     yes
Forced constant QP         yes
Dynamic RC Mode Change     no
Subframe Readback          yes
Constrained Encoding       no
Intra Refresh              yes
Custom VBV Bufsize         yes
Dynamic Slice Mode         yes
Ref Pic Invalidiation      yes
PreProcess                 no
Async Encoding             yes
Max MBs                    262144
Lossless                   yes
SAO                        yes
Me Only Mode               yes
Lookahead                  yes
AQ (temporal)              yes
Weighted Prediction        yes
10bit depth                yes
-
Interestingly, on an RTX 2070, notice the hardware H.265 encoding rate in fps, and the resulting file size, for an 8K source:
https://github.com/rigaya/NVEnc/issues/127#issuecomment-499110640
Code:
encoded 995 frames, 0.50 fps, 46363.95 kbps, 229.37 MB
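Those figures are internally consistent, which makes it easy to see just how slow that is. Assuming the reported MB are MiB, the implied clip is about 41.5 seconds of roughly 24 fps video that took about 33 minutes to encode:

```shell
awk 'BEGIN {
  frames = 995; enc_fps = 0.50; kbps = 46363.95; mb = 229.37
  clip_sec = mb * 1024 * 1024 * 8 / 1000 / kbps  # duration implied by size and bitrate
  src_fps  = frames / clip_sec                   # source frame rate
  enc_sec  = frames / enc_fps                    # wall-clock encode time
  printf "clip %.1f s, source ~%.1f fps, encode took %.0f s (~%.0f min)\n",
         clip_sec, src_fps, enc_sec, enc_sec / 60
}'
```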
-
That is a crazy system, I am jealous! I will say this: with a 3900X I would probably stick with a software-based encoding solution, as it will give you much greater flexibility with all the available settings. I would use that RTX for GPU-accelerated filtering; in my experience the biggest bottlenecks tend to be various filters, for instance LUTs.
-
These processes use CUDA, as per the docs. I don't know what the speed "penalty" is, but they suggest it's "minimal":
Although the core video encoder hardware on GPU is completely independent of CUDA cores or the graphics engine on the GPU, the following encoder features internally use CUDA for hardware acceleration.
Note: The impact of enabling these features on overall CUDA or graphics performance is minimal, and this list is provided purely for information purposes.
- Two-pass rate control modes for high quality presets
- Look-ahead
- All adaptive quantization modes
- Weighted prediction
- Encoding of RGB contents
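For what it's worth, several of those CUDA-assisted features map onto ffmpeg's nvenc private options. A hedged sketch (option names as found in ffmpeg's h264_nvenc; values and file names are placeholders):

```shell
# Enable look-ahead, temporal AQ and weighted prediction on h264_nvenc.
# Note: ffmpeg disables B-frames when weighted prediction is enabled.
ffmpeg -i input.mkv -c:v h264_nvenc -preset slow \
       -rc-lookahead 32 -temporal-aq 1 -weighted_pred 1 \
       -b:v 8M -an output.mp4
```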
Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg's -vf lut3d). Using the VapourSynth or AviSynth version to apply the LUT will be much faster.
-
Unfortunately, neither of these works on Linux without WINE, and in my experience they do not work all that well in that configuration either.
On Windows, from what I remember, Vegas had a bunch of GPU-accelerated filters, and LUTs were also accelerated.
-
VapourSynth can run natively on Linux without WINE:
http://vapoursynth.com/doc/installation.html#linux-and-os-x-compilation-instructions
-
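As a sketch of that idea on Linux: apply a .cube LUT in a VapourSynth script and pipe the result into an NVENC encode. The plugin and tool names here are assumptions (ffms2 for the source, the third-party timecube plugin for the LUT, vspipe's -c y4m syntax); adjust to whatever is installed:

```shell
cat > lut.vpy <<'EOF'
import vapoursynth as vs
core = vs.core
clip = core.ffms2.Source('input.mkv')              # assumes the ffms2 source plugin
clip = core.timecube.Cube(clip, cube='look.cube')  # assumes the timecube LUT plugin
clip.set_output()
EOF

# Pipe the filtered frames straight into a hardware encode:
vspipe -c y4m lut.vpy - | ffmpeg -i - -c:v hevc_nvenc -b:v 8M output.mp4
```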
Donald Graft did a bunch of testing on GPU processing limitations in regard to developing "cudasynth", commentary is available over on his forum.
IIRC, the memory transfers to/from the GPU were a major performance-constraining factor, especially when multiple GPU filters were in play and data was traversing to/from the GPU between each one.
For my system, "OK today, too slow in 2 years" very likely applies. Only a 2060 because I don't game and it's all I could afford anyway.
I was amazed by rigaya's result of only half a frame per second for 8K encoding on a 2070, though. Unusable, really.
-
This is very outdated information. When CUDA first came out, it was indeed the case that data needed to be uploaded to VRAM from system RAM and then downloaded back to system RAM after it was worked on, and this did create a bottleneck that negated some of the performance benefit of GPU acceleration. This has not been the case for a while: starting with Maxwell, NVIDIA added the ability for their GPUs to access system RAM, much like AMD's GPUs have long been able to do via their UMA architecture.
-
OK. Well, I just re-looked and:
- that thread is from mid to late 2018; an example post is http://rationalqm.us/board/viewtopic.php?f=14&t=671&start=20#p8958 (a 250% speed boost).
- also, from Aug 2019: "here are the test results showing a very healthy FPS improvement of x3.6" http://rationalqm.us/board/viewtopic.php?f=14&t=671&start=110#p9819
Hence I was wondering, re "This has not been the case for a while, starting with Maxwell NVIDIA added the ability for their gpu's to access system RAM", where the difference is?
Ah, oops. On re-reading, the first post over there says: "Looking at available frameworks for CUDA/NVDec with Avisynth/Vapoursynth, we do not see proper pipelines running on the GPU that eliminate unnecessary PCIe frame transfers and copies into Avisynth/Vapoursynth buffers."
I assume that's different to what you were saying.
-
The point is that for CUDA Avisynth(+) filters in a chain, the frame would normally have to be returned to the CPU to deliver it to Avisynth to pass it to the next filter. CUDASynth eliminates these transfers while still returning control to Avisynth after each filter. Only the last filter in the chain returns the final frame to the CPU. The performance gains are substantial compared to non-CUDASynth operation. Note that it is not implemented with any changes to Avisynth(+) itself; instead, each filter must be CUDASynth aware.