VideoHelp Forum




  1. Member hydra3333
    Not sure which forum to post in, so here goes.

    I am contemplating a new PC build and hence a new video card - a 2060 Super or a 2070 Super. Mainly interested in video decoding and transcoding, e.g. NVEncC or ffmpeg.

    Is there any difference in the NVENC encoding chips and performance between the 2060 Super and the 2070 Super?

    Did a Google and didn't readily spot anything useful; results were mostly reviews (i.e. marketing ads called reviews) aimed at "gamers".

    Advice and links welcomed.

    Thanks.
  2. Originally Posted by hydra3333
    Is there any difference in the NVENC encoding chips and performance between the 2060 Super and the 2070 Super?
    The answer is a bit nuanced. NVENC refers to two things: NVIDIA's hardware encoding chip, and the software in the SDK used to access it. NVIDIA's documentation makes it clear that the encoding chip handles only part of the encoding process; part is handled by the GPU cores. The documentation further states, and I have confirmed through experimentation, that if the encoding is done using RGB instead of NV12, then the entire encoding process is done using the GPU cores.

    Because of this, in theory, and especially with higher-resolution encodes, the faster card with more GPU cores and more VRAM should encode faster. This would become even more pronounced if one were using GPU-powered filters.

    Honestly, at the prices NVIDIA is charging for these new RTX cards, I find it hard to justify their purchase. It would be interesting to see what AMD's Navi offers in terms of encoding quality and speed.
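
    An easy way to see the difference is to pin the format fed to the encoder. A rough sketch with ffmpeg's h264_nvenc (untested here; it assumes an ffmpeg build with NVENC enabled, the file names and bitrate are placeholders, and bgr0 is one of the RGB-family formats the nvenc encoders accept):

    Code:
    :: NV12 path - the encoder receives NV12 directly
    ffmpeg -i input.mkv -pix_fmt nv12 -c:v h264_nvenc -b:v 8M out_nv12.mp4
    :: RGB path - per NVIDIA's docs, RGB content is converted internally using CUDA
    ffmpeg -i input.mkv -pix_fmt bgr0 -c:v h264_nvenc -b:v 8M out_rgb.mp4

    Comparing GPU core utilisation (e.g. in Task Manager) between the two runs should show where the extra work lands.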
  3. Member hydra3333
    Thank you. Ah, I see, nuanced it is.
    I have some scripts which can use DG's lovely software and CUDA and NVEncC in various ways, which are NVIDIA-dependent.
    If the encoder chips in the 2060S and the 2070S are functionally the same and it is cores/VRAM, and thus speed, which makes the difference, then I may opt for the 2060 Super for budget reasons and live with a few per cent slower encode times.

    Originally Posted by sophisticles
    I have confirmed through experimentation, that if the encoding is done using RGB instead of NV12, then the entire encoding process is done using the GPU cores.
    OK, good oh, I'll look into the potential for RGB instead of NV12 in my workflows.

    I should check whether these cards do HDR10-type (10-bit) encoding - I expect so, as the support matrix here (although out of date) https://developer.nvidia.com/video-encode-decode-gpu-support-matrix lists 10- and 12-bit decode but doesn't clearly mention bit depth for encode.
    Yes, noted on the not-so-pleasing pricing; however, my current PC is old and crashing daily (h/w issue), so "best I can afford" currently applies.
  4. Member hydra3333
    Bought an RTX 2060 Super. 10-bit depth only with H.265.

    Code:
    NVEncC64.exe --check-features
    NVEncC (x64) 4.55 (r1224) by rigaya, Nov  5 2019 15:40:51 (VC 1916/Win/avx2)
      [NVENC API v9.1, CUDA 10.1]
     reader: raw, avi, avs, vpy, avhw [H.264/AVC, H.265/HEVC, MPEG2, VP8, VP9, VC-1, MPEG1, MPEG4]
    
    Environment Info
    OS : Windows 10 x64 (18363)
    CPU: AMD Ryzen 9 3900X 12-Core Processor (12C/24T)
    RAM: Used 6833 MB, Total 32681 MB
    GPU: #0: GeForce RTX 2060 SUPER (2176 cores, 1665 MHz)[PCIe3x16][441.20]
    
    List of available features.
    Codec: H.264/AVC
    Max Bframes               4
    B Ref Mode                yes
    RC Modes                  63
    Field Encoding            no
    MonoChrome                no
    FMO                       no
    Quater-Pel MV             yes
    B Direct Mode             yes
    CABAC                     yes
    Adaptive Transform        yes
    Max Temporal Layers       0
    Hierarchial P Frames      no
    Hierarchial B Frames      no
    Max Level                 51
    Min Level                 1
    4:4:4                     yes
    Min Width                 145
    Max Width                 4096
    Min Height                49
    Max Height                4096
    Multiple Refs             yes
    Max LTR Frames            8
    Dynamic Resolution Change yes
    Dynamic Bitrate Change    yes
    Forced constant QP        yes
    Dynamic RC Mode Change    no
    Subframe Readback         yes
    Constrained Encoding      yes
    Intra Refresh             yes
    Custom VBV Bufsize        yes
    Dynamic Slice Mode        yes
    Ref Pic Invalidiation     yes
    PreProcess                no
    Async Encoding            yes
    Max MBs                   65536
    Lossless                  yes
    SAO                       no
    Me Only Mode              yes
    Lookahead                 yes
    AQ (temporal)             yes
    Weighted Prediction       yes
    10bit depth               no
    
    Codec: H.265/HEVC
    Max Bframes               5
    B Ref Mode                yes
    RC Modes                  63
    Field Encoding            no
    MonoChrome                no
    Quater-Pel MV             yes
    B Direct Mode             no
    Max Temporal Layers       0
    Hierarchial P Frames      no
    Hierarchial B Frames      no
    Max Level                 62
    Min Level                 1
    4:4:4                     yes
    Min Width                 129
    Max Width                 8192
    Min Height                33
    Max Height                8192
    Multiple Refs             yes
    Max LTR Frames            7
    Dynamic Resolution Change yes
    Dynamic Bitrate Change    yes
    Forced constant QP        yes
    Dynamic RC Mode Change    no
    Subframe Readback         yes
    Constrained Encoding      no
    Intra Refresh             yes
    Custom VBV Bufsize        yes
    Dynamic Slice Mode        yes
    Ref Pic Invalidiation     yes
    PreProcess                no
    Async Encoding            yes
    Max MBs                   262144
    Lossless                  yes
    SAO                       yes
    Me Only Mode              yes
    Lookahead                 yes
    AQ (temporal)             yes
    Weighted Prediction       yes
    10bit depth               yes
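
    So for HDR10-style output the HEVC encoder can be asked for 10-bit directly. A minimal sketch based on rigaya's documented options (the bitrate and the HDR10 mastering/light-level values below are placeholders, not measured values):

    Code:
    NVEncC64.exe --avhw -i "input.mkv" --codec hevc --profile main10 --output-depth 10 ^
        --vbr 8000 --max-cll "1000,400" ^
        --master-display "G(13250,34500)B(7500,3000)R(34000,16000)WP(15635,16450)L(10000000,1)" ^
        -o "output_hdr10.mkv"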
  5. Member hydra3333
    Interestingly, on an RTX 2070, notice the hardware H.265 encoding rate in fps and the resulting file size for an 8K source:

    https://github.com/rigaya/NVEnc/issues/127#issuecomment-499110640

    Code:
    encoded 995 frames, 0.50 fps, 46363.95 kbps, 229.37 MB
  6. That is a crazy system, I am jealous! I will say this: with a 3900X I would probably stick with a software-based encoding solution, as it will give you much greater flexibility with all the available settings. I would use that RTX card for GPU-accelerated filtering; in my experience the biggest bottlenecks tend to be various filters, for instance LUTs.
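
    For what it's worth, a software starting point on that CPU might be something like this (a sketch only - an ffmpeg build with libx265 is assumed, and the preset/CRF are placeholder values to tune):

    Code:
    :: CPU-only HEVC encode; x265 spreads the work across the 3900X's 12 cores
    ffmpeg -i input.mkv -c:v libx265 -preset slow -crf 20 -c:a copy out_x265.mkv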
    These processes use CUDA, as per the docs.

    I don't know what the speed "penalty" is, but they suggest it's "minimal"

    Although the core video encoder hardware on GPU is completely independent of CUDA
    cores or the graphics engine on the GPU, the following encoder features internally use
    CUDA for hardware acceleration.

    Note: The impact of enabling these features on overall CUDA or graphics
    performance is minimal, and this list is provided purely for information purposes.

    - Two-pass rate control modes for high quality presets
    - Look-ahead
    - All adaptive quantization modes
    - Weighted prediction
    - Encoding of RGB contents
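
    In NVEncC terms, those are roughly the options below (a sketch - flag names are as per rigaya's --help for recent 4.x builds, so check your version):

    Code:
    :: lookahead, the AQ modes and weighted prediction are the CUDA-assisted features listed above
    NVEncC64.exe --avhw -i "input.mkv" --codec hevc --vbr 8000 ^
        --lookahead 32 --aq --aq-temporal --weightp ^
        -o "output.mkv"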


    Originally Posted by sophisticles
    in my experience the biggest bottlenecks tend to be various filters, for instance LUTs.
    Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg -vf lut3d). If you use the vapoursynth or avisynth version to apply the LUT, it will be much faster.
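
    Something along these lines, for example (an untested sketch: it assumes the L-SMASH Works source filter and the timecube LUT plugin are installed, lut.vpy and grade.cube are placeholder names, and older vspipe builds use --y4m rather than -c y4m):

    Code:
    :: lut.vpy would contain, roughly:
    ::   import vapoursynth as vs
    ::   core = vs.core
    ::   clip = core.lsmas.LWLibavSource("input.mkv")
    ::   clip = core.timecube.Cube(clip, cube="grade.cube")
    ::   clip.set_output()
    vspipe --y4m lut.vpy - | NVEncC64.exe --y4m -i - --codec hevc --vbr 8000 -o "output.mkv"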
  8. Originally Posted by poisondeathray
    Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg -vf lut3d). If you use the vapoursynth or avisynth version to apply the LUT, it will be much faster.
    Unfortunately neither of these works on Linux without WINE, and in my experience they do not work all that well in that configuration either.

    On Windows, from what I remember, Vegas had a bunch of GPU-accelerated filters, and LUTs were also accelerated.

    On Windows, from what I remember, Vegas had a bunch of GPU accelerated filters and LUTs were also accelerated.
  9. Originally Posted by sophisticles
    Originally Posted by poisondeathray
    Some LUT filters are single-threaded or poorly multithreaded (e.g. ffmpeg -vf lut3d). If you use the vapoursynth or avisynth version to apply the LUT, it will be much faster.
    Unfortunately neither of these works on Linux without WINE, and in my experience they do not work all that well in that configuration either.
    VapourSynth can run natively on Linux without WINE:

    http://vapoursynth.com/doc/installation.html#linux-and-os-x-compilation-instructions
  10. Member hydra3333
    Donald Graft did a bunch of testing on GPU processing limitations in regard to developing "cudasynth"; commentary is available over on his forum.

    IIRC, the memory transfers to/from the GPU were a major performance-constraining factor, especially when multiple GPU filters were in play and data was traversing to/from the GPU in between each.

    For my system, "OK today, too slow in 2 years" very likely applies. Only a 2060, because I don't game and it's all I could afford anyway.

    I was amazed by only half an fps for 8K encoding by rigaya using a 2070, though. Unusable, really.
  11. Originally Posted by hydra3333
    IIRC, the memory transfers to/from the GPU were a major performance-constraining factor, especially when multiple GPU filters were in play and data was traversing to/from the GPU in between each.
    This is very outdated information. When CUDA first came out, it was in fact the case that data needed to be uploaded to VRAM from system RAM and then downloaded back to system RAM after it was worked on, and this did create a bottleneck that negated some of the performance benefit of GPU acceleration. This has not been the case for a while: starting with Maxwell, NVIDIA added the ability for their GPUs to access system RAM, much like AMD's GPUs have been able to do for a while via their UMA architecture.
  12. Member hydra3333
    OK. Well, I just re-looked, and hence I was wondering, regarding "This has not been the case for a while, starting with Maxwell NVIDIA added the ability for their gpu's to access system RAM", where the difference is.

    Ah, oops; on re-reading, the first post over there says
    Looking at available frameworks for CUDA/NVDec with Avisynth/Vapoursynth, we do not see proper pipelines running on the GPU that eliminate unnecessary PCIe frame transfers and copies into Avisynth/Vapoursynth buffers.
    I did not characterise it correctly; it instead seems to be to do with the speed/efficiency of copying frames to/from the GPU and a range of individual software-product/filter buffers in RAM (which may be "historical" software filters).

    I assume that's different to what you were saying.
  13. The point is that for CUDA Avisynth(+) filters in a chain, the frame would normally have to be returned to the CPU to deliver it to Avisynth to pass it to the next filter. CUDASynth eliminates these transfers while still returning control to Avisynth after each filter. Only the last filter in the chain returns the final frame to the CPU. The performance gains are substantial compared to non-CUDASynth operation. Note that it is not implemented with any changes to Avisynth(+) itself; instead, each filter must be CUDASynth aware.


