VideoHelp Forum




  1. Member hydra3333
    Not sure which forum to post in, so here goes.

    I am contemplating a new PC build and hence a new video card - a 2060 Super or a 2070 Super. Mainly interested in video decoding and transcoding, e.g. NVEncC or ffmpeg.

    Is there any difference in the NVENC encoding chips and performance between the 2060 Super and the 2070 Super?

    Did a Google and didn't readily spot anything useful; results were mostly reviews (i.e. marketing ads called reviews) aimed at "gamers".

    Advice and links welcomed.

    Thanks.
  2. Originally Posted by hydra3333
    Is there any difference in the NVENC encoding chips and performance between the 2060 Super and the 2070 Super?
    The answer is a bit nuanced. NVENC refers to two things: NVIDIA's hardware encoding chip, and the software in the SDK used to access it. NVIDIA's documentation makes it clear that the encoding chip handles only part of the encoding process; part is handled by the GPU cores. The documentation further states, and I have confirmed through experimentation, that if the encoding is done using RGB instead of NV12, then the entire encoding process is done using the GPU cores.

    Because of this, in theory, and especially with higher-resolution encodes, the faster card with more GPU cores and more VRAM should encode faster. This would become even more pronounced if one were using GPU-powered filters.

    Honestly, at the prices NVIDIA is charging for these new RTX cards, I find it hard to justify their purchase. It would be interesting to see what AMD's Navi offers in terms of encoding quality and speed.
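
    An easy way to see the difference is to pin the format fed to the encoder. A rough sketch with ffmpeg's h264_nvenc (untested here; it assumes an ffmpeg build with NVENC enabled, the file names and bitrate are placeholders, and bgr0 is one of the RGB-family formats the nvenc encoders accept):

    Code:
    :: NV12 path - the encoder receives NV12 directly
    ffmpeg -i input.mkv -pix_fmt nv12 -c:v h264_nvenc -b:v 8M out_nv12.mp4
    :: RGB path - per NVIDIA's docs, RGB content is converted internally using CUDA
    ffmpeg -i input.mkv -pix_fmt bgr0 -c:v h264_nvenc -b:v 8M out_rgb.mp4

    Comparing GPU core utilisation (e.g. in Task Manager) between the two runs should show where the extra work lands.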
  3. Member hydra3333
    Thank you. Ah, I see, nuanced it is.
    I have some scripts which can use DG's lovely software and CUDA and NVEncC in various ways, which are NVIDIA-dependent.
    If the encoder chips in the 2060S and the 2070S are functionally the same and it is cores/VRAM, and thus speed, which makes the difference, then I may opt for the 2060 Super for budget reasons and live with a few per cent slower encode times.

    Originally Posted by sophisticles
    I have confirmed through experimentation, that if the encoding is done using RGB instead of NV12, then the entire encoding process is done using the GPU cores.
    OK, good oh, I'll look into the potential for RGB instead of NV12 in my workflows.

    I should check whether these cards do HDR10-type (10-bit) encoding - I expect so, as the support matrix here (although out of date) https://developer.nvidia.com/video-encode-decode-gpu-support-matrix lists 10- and 12-bit decode but doesn't clearly mention bit depth for encode.
    Yes, noted on the not-so-pleasing pricing; however, my current PC is old and crashing daily (h/w issue), so "best I can afford" currently applies.
  4. Member hydra3333
    Bought an RTX 2060 Super. 10-bit depth only with H.265.

    Code:
    NVEncC64.exe --check-features
    NVEncC (x64) 4.55 (r1224) by rigaya, Nov  5 2019 15:40:51 (VC 1916/Win/avx2)
      [NVENC API v9.1, CUDA 10.1]
     reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]
    
    Environment Info
    OS : Windows 10 x64 (18363)
    CPU: AMD Ryzen 9 3900X 12-Core Processor (12C/24T)
    RAM: Used 6833 MB, Total 32681 MB
    GPU: #0: GeForce RTX 2060 SUPER (2176 cores, 1665 MHz)[PCIe3x16][441.20]
    
    List of available features.
    Codec: H.264/AVC
    Max Bframes               4
    B Ref Mode                yes
    RC Modes                  63
    Field Encoding            no
    MonoChrome                no
    FMO                       no
    Quater-Pel MV             yes
    B Direct Mode             yes
    CABAC                     yes
    Adaptive Transform        yes
    Max Temporal Layers       0
    Hierarchial P Frames      no
    Hierarchial B Frames      no
    Max Level                 51
    Min Level                 1
    4:4:4                     yes
    Min Width                 145
    Max Width                 4096
    Min Height                49
    Max Height                4096
    Multiple Refs             yes
    Max LTR Frames            8
    Dynamic Resolution Change yes
    Dynamic Bitrate Change    yes
    Forced constant QP        yes
    Dynamic RC Mode Change    no
    Subframe Readback         yes
    Constrained Encoding      yes
    Intra Refresh             yes
    Custom VBV Bufsize        yes
    Dynamic Slice Mode        yes
    Ref Pic Invalidiation     yes
    PreProcess                no
    Async Encoding            yes
    Max MBs                   65536
    Lossless                  yes
    SAO                       no
    Me Only Mode              yes
    Lookahead                 yes
    AQ (temporal)             yes
    Weighted Prediction       yes
    10bit depth               no
    
    Codec: H.265/HEVC
    Max Bframes               5
    B Ref Mode                yes
    RC Modes                  63
    Field Encoding            no
    MonoChrome                no
    Quater-Pel MV             yes
    B Direct Mode             no
    Max Temporal Layers       0
    Hierarchial P Frames      no
    Hierarchial B Frames      no
    Max Level                 62
    Min Level                 1
    4:4:4                     yes
    Min Width                 129
    Max Width                 8192
    Min Height                33
    Max Height                8192
    Multiple Refs             yes
    Max LTR Frames            7
    Dynamic Resolution Change yes
    Dynamic Bitrate Change    yes
    Forced constant QP        yes
    Dynamic RC Mode Change    no
    Subframe Readback         yes
    Constrained Encoding      no
    Intra Refresh             yes
    Custom VBV Bufsize        yes
    Dynamic Slice Mode        yes
    Ref Pic Invalidiation     yes
    PreProcess                no
    Async Encoding            yes
    Max MBs                   262144
    Lossless                  yes
    SAO                       yes
    Me Only Mode              yes
    Lookahead                 yes
    AQ (temporal)             yes
    Weighted Prediction       yes
    10bit depth               yes
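
    So for HDR10-style output the HEVC encoder can be asked for 10-bit directly. A minimal sketch based on rigaya's documented options (the bitrate and the HDR10 mastering/light-level values below are placeholders, not measured values):

    Code:
    NVEncC64.exe --avhw -i "input.mkv" --codec hevc --profile main10 --output-depth 10 ^
        --vbr 8000 --max-cll "1000,400" ^
        --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)" ^
        -o "output_hdr10.mkv"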
  5. Member hydra3333
    Interestingly, on an RTX 2070, notice the hardware H.265 encoding rate in fps and the resulting file size for an 8K source:

    https://github.com/rigaya/NVEnc/issues/127#issuecomment-499110640

    Code:
    encoded 995 frames, 0.50 fps, 46363.95 kbps, 229.37 MB
  6. That is a crazy system, I am jealous! I will say this: with a 3900X I would probably stick with a software-based encoding solution, as it will give you much greater flexibility with all the available settings. I would use that RTX card for GPU-accelerated filtering; in my experience the biggest bottlenecks tend to be various filters, for instance LUTs.
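
    For what it's worth, a software starting point on that CPU might be something like this (a sketch only - an ffmpeg build with libx265 is assumed, and the preset/CRF are placeholder values to tune):

    Code:
    :: CPU-only HEVC encode; x265 spreads the work across the 3900X's 12 cores
    ffmpeg -i input.mkv -c:v libx265 -preset slow -crf 20 -c:a copy out_x265.mkv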
    These processes use CUDA, as per the docs.

    I don't know what the speed "penalty" is, but they suggest it's "minimal"

    Although the core video encoder hardware on GPU is completely independent of CUDA
    cores or the graphics engine on the GPU, the following encoder features internally use
    CUDA for hardware acceleration.

    Note: The impact of enabling these features on overall CUDA or graphics
    performance is minimal, and this list is provided purely for information purposes.

    - Two-pass rate control modes for high quality presets
    - Look-ahead
    - All adaptive quantization modes
    - Weighted prediction
    - Encoding of RGB contents
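
    In NVEncC terms, those are roughly the options below (a sketch - flag names are as per rigaya's --help for recent 4.x builds, so check your version):

    Code:
    :: lookahead, the AQ modes and weighted prediction are the CUDA-assisted features listed above
    NVEncC64.exe --avhw -i "input.mkv" --codec hevc --vbr 8000 ^
        --lookahead 32 --aq --aq-temporal --weightp ^
        -o "output.mkv"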


    Originally Posted by sophisticles
    in my experience the biggest bottlenecks tend to be various filters, for instance LUTs.
    Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg -vf lut3d). If you use the vapoursynth or avisynth version to apply the LUT, it will be much faster.
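
    Something along these lines, for example (an untested sketch: it assumes the L-SMASH Works source filter and the timecube LUT plugin are installed, lut.vpy and grade.cube are placeholder names, and older vspipe builds use --y4m rather than -c y4m):

    Code:
    :: lut.vpy would contain, roughly:
    ::   import vapoursynth as vs
    ::   core = vs.core
    ::   clip = core.lsmas.LWLibavSource("input.mkv")
    ::   clip = core.timecube.Cube(clip, cube="grade.cube")
    ::   clip.set_output()
    vspipe --y4m lut.vpy - | NVEncC64.exe --y4m -i - --codec hevc --vbr 8000 -o "output.mkv"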
  8. Originally Posted by poisondeathray
    Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg -vf lut3d). If you use the vapoursynth or avisynth version to apply the LUT, it will be much faster.
    Unfortunately neither of these works on Linux without WINE, and in my experience they do not work all that well in that configuration either.

    On Windows, from what I remember, Vegas had a bunch of GPU-accelerated filters, and LUTs were also accelerated.

    On Windows, from what I remember, Vegas had a bunch of GPU accelerated filters and LUTs were also accelerated.
  9. Originally Posted by sophisticles
    Originally Posted by poisondeathray
    Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg -vf lut3d). If you use the vapoursynth or avisynth version to apply the LUT, it will be much faster.
    Unfortunately neither of these works on Linux without WINE, and in my experience they do not work all that well in that configuration either.
    VapourSynth can run natively on Linux without WINE:

    http://vapoursynth.com/doc/installation.html#linux-and-os-x-compilation-instructions
  10. Member hydra3333
    Donald Graft did a bunch of testing on GPU processing limitations in regard to developing "cudasynth"; commentary is available over on his forum.

    IIRC, the memory transfers to/from the GPU were a major performance-constraining factor, especially when multiple GPU filters were in play and data was traversing to/from the GPU in between each.

    For my system, "OK today, too slow in 2 years" very likely applies. Only a 2060, because I don't game and it's all I could afford anyway.

    I was amazed by only half an fps for 8K encoding by rigaya using a 2070, though. Unusable, really.
  11. Originally Posted by hydra3333
    IIRC, the memory transfers to/from the GPU were a major performance-constraining factor, especially when multiple GPU filters were in play and data was traversing to/from the GPU in between each.
    This is very outdated information. When CUDA first came out, it was in fact the case that data needed to be uploaded to VRAM from system RAM and then downloaded back to system RAM after it was worked on, and this did create a bottleneck that negated some of the performance benefit of GPU acceleration. This has not been the case for a while: starting with Maxwell, NVIDIA added the ability for their GPUs to access system RAM, much like AMD's GPUs have been able to do for a while via their UMA architecture.
  12. Member hydra3333
    OK. Well, I just re-looked, and hence I was wondering, regarding "This has not been the case for a while, starting with Maxwell NVIDIA added the ability for their gpu's to access system RAM", where the difference is.

    Ah, oops; on re-reading, the first post over there says
    Looking at available frameworks for CUDA/NVDec with Avisynth/Vapoursynth, we do not see proper pipelines running on the GPU that eliminate unnecessary PCIe frame transfers and copies into Avisynth/Vapoursynth buffers.
    I did not characterise it correctly; it instead seems to be to do with the speed/efficiency of copying frames to/from the GPU and a range of individual software-product/filter buffers in RAM (which may be "historical" software filters).

    I assume that's different to what you were saying.
  13. The point is that for CUDA Avisynth(+) filters in a chain, the frame would normally have to be returned to the CPU to deliver it to Avisynth to pass it to the next filter. CUDASynth eliminates these transfers while still returning control to Avisynth after each filter. Only the last filter in the chain returns the final frame to the CPU. The performance gains are substantial compared to non-CUDASynth operation. Note that it is not implemented with any changes to Avisynth(+) itself; instead, each filter must be CUDASynth aware.


