Not sure which forum to post in, so here goes.
I am contemplating a new PC build and hence a new video card - a 2060 Super or a 2070 Super. I am mainly interested in video decoding and transcoding, e.g. with NVEncC or ffmpeg.
Is there any difference in the NVENC encoding chips and performance between the 2060 Super and the 2070 Super?
Did a Google search and didn't readily spot anything useful; results were mostly reviews (i.e. marketing ads called reviews) aimed at "gamers".
Advice and links welcomed.
Because of this, in theory, and especially with higher-resolution encodes, the faster card with more general-purpose (CUDA) cores and more VRAM should encode faster. This would become even more pronounced if one were using GPU-powered filters.
Honestly, at the prices NVIDIA is charging for these new RTX cards, I find it hard to justify their purchase. It would be interesting to see what AMD's Navi offers in terms of encoding quality and speed.
Thank you. Ah, I see, nuanced it is.
I have some scripts which can use DG's lovely software, CUDA, and NVEncC in various ways, and which are NVIDIA-dependent.
If the encoder chips in the 2060S and the 2070S are functionally the same, and it is cores/VRAM (and thus speed) which is the difference, then I may opt for the 2060 Super for budget reasons and live with a few % slower encode times.
I should check whether these cards do HDR10-type (10-bit) encoding - I expect so, as the matrix here (although out of date) https://developer.nvidia.com/video-encode-decode-gpu-support-matrix shows 10- and 12-bit decode but doesn't clearly mention 10-bit encode.
Yes, noted on the not-so-pleasing pricing; however, my current PC is old and crashing daily (hardware issue), so "best I can afford" currently applies.
Bought an RTX 2060 Super. 10-bit depth is available only with H.265.
NVEncC64.exe --check-features
NVEncC (x64) 4.55 (r1224) by rigaya, Nov 5 2019 15:40:51 (VC 1916/Win/avx2) [NVENC API v9.1, CUDA 10.1]
reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]

Environment Info
OS : Windows 10 x64 (18363)
CPU: AMD Ryzen 9 3900X 12-Core Processor (12C/24T)
RAM: Used 6833 MB, Total 32681 MB
GPU: #0: GeForce RTX 2060 SUPER (2176 cores, 1665 MHz)[PCIe3x16][441.20]

List of available features.
Codec: H.264/AVC
Max Bframes                4
B Ref Mode                 yes
RC Modes                   63
Field Encoding             no
MonoChrome                 no
FMO                        no
Quater-Pel MV              yes
B Direct Mode              yes
CABAC                      yes
Adaptive Transform         yes
Max Temporal Layers        0
Hierarchial P Frames       no
Hierarchial B Frames       no
Max Level                  51
Min Level                  1
4:4:4                      yes
Min Width                  145
Max Width                  4096
Min Height                 49
Max Height                 4096
Multiple Refs              yes
Max LTR Frames             8
Dynamic Resolution Change  yes
Dynamic Bitrate Change     yes
Forced constant QP         yes
Dynamic RC Mode Change     no
Subframe Readback          yes
Constrained Encoding       yes
Intra Refresh              yes
Custom VBV Bufsize         yes
Dynamic Slice Mode         yes
Ref Pic Invalidiation      yes
PreProcess                 no
Async Encoding             yes
Max MBs                    65536
Lossless                   yes
SAO                        no
Me Only Mode               yes
Lookahead                  yes
AQ (temporal)              yes
Weighted Prediction        yes
10bit depth                no

Codec: H.265/HEVC
Max Bframes                5
B Ref Mode                 yes
RC Modes                   63
Field Encoding             no
MonoChrome                 no
Quater-Pel MV              yes
B Direct Mode              no
Max Temporal Layers        0
Hierarchial P Frames       no
Hierarchial B Frames       no
Max Level                  62
Min Level                  1
4:4:4                      yes
Min Width                  129
Max Width                  8192
Min Height                 33
Max Height                 8192
Multiple Refs              yes
Max LTR Frames             7
Dynamic Resolution Change  yes
Dynamic Bitrate Change     yes
Forced constant QP         yes
Dynamic RC Mode Change     no
Subframe Readback          yes
Constrained Encoding       no
Intra Refresh              yes
Custom VBV Bufsize         yes
Dynamic Slice Mode         yes
Ref Pic Invalidiation      yes
PreProcess                 no
Async Encoding             yes
Max MBs                    262144
Lossless                   yes
SAO                        yes
Me Only Mode               yes
Lookahead                  yes
AQ (temporal)              yes
Weighted Prediction        yes
10bit depth                yes
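In case it helps anyone searching later, a 10-bit HEVC encode would be requested along these lines (a sketch only - the input/output names are placeholders, and exact flags may vary with NVEncC and ffmpeg versions):

NVEncC64.exe --avhw -i input.mkv --codec hevc --output-depth 10 --vbr 8000 -o output.hevc
ffmpeg -i input.mkv -c:v hevc_nvenc -profile:v main10 -pix_fmt p010le output.mkv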
Interestingly, on an RTX 2070, notice the hardware H.265 encoding rate in fps and the resulting file size for an 8K source:
encoded 995 frames, 0.50 fps, 46363.95 kbps, 229.37 MB
That is a crazy system, I am jealous!!! I will say this: with a 3900X I would probably stick with a software-based encoding solution, as it will give you much greater flexibility with all the available settings. I would use that RTX for GPU-accelerated filtering; in my experience the biggest bottlenecks tend to be various filters, for instance LUTs.
These processes use CUDA as per the docs
I don't know what the speed "penalty" is, but they suggest it's "minimal"
Although the core video encoder hardware on GPU is completely independent of CUDA cores or the graphics engine on the GPU, the following encoder features internally use CUDA for hardware acceleration.

Note: The impact of enabling these features on overall CUDA or graphics performance is minimal, and this list is provided purely for information purposes.

- Two-pass rate control modes for high quality presets
- All adaptive quantization modes
- Encoding of RGB contents
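So, if I read that right, you only pay the (small) CUDA cost when you switch those features on, e.g. something like this with ffmpeg's h264_nvenc (a sketch only; option names like -2pass, -spatial_aq and -temporal_aq are from ffmpeg's nvenc encoders and may differ between ffmpeg versions):

ffmpeg -i input.mkv -c:v h264_nvenc -rc vbr -2pass 1 -spatial_aq 1 -temporal_aq 1 -b:v 8M output.mp4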
(e.g. ffmpeg -vf lut3d). If you use the VapourSynth or AviSynth version to apply the LUT, it will be much faster.
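For reference, the ffmpeg form being compared is something like this (a sketch; look.cube is just a placeholder LUT file):

ffmpeg -i input.mp4 -vf lut3d=look.cube -c:v h264_nvenc -b:v 8M output.mp4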
On Windows, from what I remember, Vegas had a bunch of GPU-accelerated filters, and LUTs were also accelerated.
Donald Graft did a bunch of testing on GPU processing limitations in regard to developing "cudasynth", commentary is available over on his forum.
IIRC, the memory transfers to/from the GPU were a major performance-constraining factor, especially when multiple GPU filters were in play and data was traversing to/from the GPU between each.
For my system, "OK today, too slow in 2 years" very likely applies. Only a 2060 because I don't game and it's all I could afford anyway.
I was amazed by only half an fps for 8K encoding by rigaya using a 2070, though. Unusable, really.
OK. Well, I just re-looked, and:
- that thread is from mid to late 2018, an example post of which is http://rationalqm.us/board/viewtopic.php?f=14&t=671&start=20#p8958 (250% speed boost).
- also in Aug 2019, "here are the test results showing a very healthy FPS improvement of x3.6" http://rationalqm.us/board/viewtopic.php?f=14&t=671&start=110#p9819
Hence I was wondering, re "This has not been the case for a while, starting with Maxwell NVIDIA added the ability for their gpu's to access system RAM" - where is the difference?
Ah, oops - on re-reading, the first post over there says: "Looking at available frameworks for CUDA/NVDec with Avisynth/Vapoursynth, we do not see proper pipelines running on the GPU that eliminate unnecessary PCIe frame transfers and copies into Avisynth/Vapoursynth buffers."
I assume that's different to what you were saying.
The point is that for CUDA Avisynth(+) filters in a chain, the frame would normally have to be returned to the CPU to deliver it to Avisynth to pass it to the next filter. CUDASynth eliminates these transfers while still returning control to Avisynth after each filter. Only the last filter in the chain returns the final frame to the CPU. The performance gains are substantial compared to non-CUDASynth operation. Note that it is not implemented with any changes to Avisynth(+) itself; instead, each filter must be CUDASynth aware.
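To make that concrete, a CUDASynth-style chain in an Avisynth script might look like this (a sketch only - DGSource, DGDenoise and DGSharpen are DG's filters, but the parameter values here are purely illustrative, not recommendations):

LoadPlugin("DGDecodeNV.dll")
DGSource("movie.dgi")      # NVDEC decode; the frame lands in GPU memory
DGDenoise(strength=0.15)   # CUDA filter: with CUDASynth the frame stays on the GPU
DGSharpen(strength=0.5)    # last CUDA filter in the chain...
# ...only its output is copied back across PCIe into an Avisynth buffer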