VideoHelp Forum
Thread
  1. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Hi,
    I like GPU encoding for its speed. But when it comes to any filter (resizing, deinterlacing and so on), the speed advantage is gone. Is there any software that uses the GPU for this? I'm sure such GPU filters exist, but they aren't used in any program I know. Is there any GUI software that uses GPU-oriented filters?

    Bernix
  2. Commonly, resizers are implemented in HW (albeit they can be limited to a few variants: bilinear, bicubic, perhaps Lanczos; I've never heard of a spline resize in HW). Generally, a HQ resizer in HW is implemented as a so-called polyphase filter: http://www.ti.com/lit/an/spraai7b/spraai7b.pdf
    Other filtering can be implemented in HW too (deinterlacer, LUT etc.). The GPU can also be used through OpenCL, CUDA etc. (a less common approach, and not always efficient).
    It seems newer generations of GPUs will be better adapted to machine learning algorithms, so in future we may expect some neural-network-based solutions (deinterlacing seems to be a very good candidate).
    There is one very important limitation for most GPUs: the data flow seems to be very inflexible, and usually all processing must be done as a sequence of operations from start to end by HW blocks on the GPU only (it is difficult, for example, to mix CPU- and GPU-based filters). Video filtering follows more or less the same approach as video encoding (encoding can be considered very complex conditional filtering).
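    The polyphase idea can be sketched in a few lines of Python. This is a simplified, hypothetical illustration, not the design from the linked TI document: the kernel is evaluated once for a fixed number of sub-pixel "phases", and each output pixel just picks the coefficient set matching its fractional source position. It assumes a 1D row of float samples, edge clamping and a bilinear (tent) kernel; real HW uses fixed-point coefficients, more taps, and widens the kernel when downscaling to avoid aliasing, which this sketch does not do.

```python
import math
from typing import Callable, List

Kernel = Callable[[float], float]


def triangle(x: float) -> float:
    """Bilinear (tent) kernel with support [-1, 1]."""
    x = abs(x)
    return 1.0 - x if x < 1.0 else 0.0


def build_phase_table(kernel: Kernel, taps: int, phases: int) -> List[List[float]]:
    """Precompute one normalized set of tap weights per sub-pixel phase."""
    offset = taps // 2 - 1  # number of taps to the left of floor(pos)
    table = []
    for p in range(phases):
        frac = p / phases  # sub-pixel offset represented by this phase
        weights = [kernel(k - offset - frac) for k in range(taps)]
        total = sum(weights)
        table.append([w / total for w in weights])  # keep flat areas flat
    return table


def resample_1d(src: List[float], out_len: int, kernel: Kernel = triangle,
                taps: int = 2, phases: int = 64) -> List[float]:
    """Resize one row (or column) of samples with a polyphase filter."""
    table = build_phase_table(kernel, taps, phases)
    offset = taps // 2 - 1
    scale = len(src) / out_len
    out = []
    for j in range(out_len):
        # Map the centre of output pixel j into source coordinates.
        pos = (j + 0.5) * scale - 0.5
        i0 = math.floor(pos)
        # Quantize the fractional position to the nearest lower phase.
        phase = min(int((pos - i0) * phases), phases - 1)
        acc = 0.0
        for k, w in enumerate(table[phase]):
            idx = min(max(i0 + k - offset, 0), len(src) - 1)  # clamp at edges
            acc += w * src[idx]
        out.append(acc)
    return out


print(resample_1d([0.0, 10.0], 4))  # [0.0, 2.5, 7.5, 10.0]
```

    A 2D resize is normally done as two such passes (rows, then columns), which is one reason it maps well onto fixed HW blocks.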
  3. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Thank you both for your responses!
    Pandy, does what you are talking about have something to do with latency? Do 6 or more frames have to be loaded to the GPU and then processed? Sorry, I'm not a technical type of person. But I read an article by some expert from Fraunhofer about car cameras: nowadays the usual latency in H.265 is about 6 frames, and they have a way to get the latency down to only 1 frame. It was from the start of February, but I can't find the article now...
    But the good news is there is a way. Also, as you mentioned neural networks: the Waifu2x resize algorithm works with the GPU too, but probably not in AviSynth. It's probably only CPU-based there, I don't know.
    But I'm glad you understood my OP: doing anything with NVENC, QSV or the AMD equivalent is slowed down by any CPU operation, be it resizing, deinterlacing, denoising or more.

    Thank you both
    Bernix
  4. The important part is pandy's last sentence: to use GPU filters like resize, it has to be a full HW pipeline (decode, processing, encode).

    These features are not enabled in common distributed ffmpeg builds like Zeranoe's; you need to compile with special libraries. For example, for NVENC you would need to compile with libnpp (--enable-libnpp).
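    As an illustration of such a full-HW path, here is a sketch that builds the ffmpeg command line from Python. It assumes a recent ffmpeg built with CUDA/NVENC support and --enable-libnpp (which, as far as I know, also requires --enable-nonfree, so such builds can't be redistributed). The option names reflect current ffmpeg and may differ in the older builds discussed in this thread; the file names are placeholders.

```python
import shlex
from typing import List


def full_gpu_resize_cmd(src: str, dst: str, width: int, height: int) -> List[str]:
    """Build an ffmpeg argument list that keeps decode, resize and encode on the GPU."""
    return [
        "ffmpeg",
        # Decode with NVDEC and keep the decoded frames in GPU memory.
        "-hwaccel", "cuda",
        "-hwaccel_output_format", "cuda",
        "-i", src,
        # Resize on the GPU with the NPP-based scaler (needs --enable-libnpp).
        "-vf", f"scale_npp={width}:{height}",
        # Encode with NVENC directly from the GPU frames.
        "-c:v", "h264_nvenc",
        # Audio is stream-copied untouched.
        "-c:a", "copy",
        dst,
    ]


cmd = full_gpu_resize_cmd("input.mp4", "output.mp4", 1280, 720)
print(shlex.join(cmd))
```

    If any CPU filter is inserted into the -vf chain, the frames have to be downloaded to system memory first (e.g. with hwdownload), which is exactly the slowdown discussed above.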

    rigaya's NVEncC has some of these filters and libraries enabled through its vpp options. Some resizing kernels and deinterlacing are definitely GPU-enabled (and much faster; it definitely works), but beware that some filters are still CPU-bound.


    Waifu2x resize algorithm works with GPU too, but probably not in Avisynth.
    AviSynth's version doesn't even have CPU optimizations; it's very, very slow. VapourSynth has the GPU Caffe version available; it's actually faster than the standalone Caffe version, and there's no need for intermediate image sequences.
  5. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Found the same article on a different site: http://www.eenewsautomotive.com/news/camera-data-compression-low-latency
    But I'm not sure if it has anything to do with this thread. I don't fully understand Pandy's reply...

    Bernix
  6. Latency is something other than GPU processing (I mean it is completely unrelated to CPU vs. GPU). Classical encoding is based on history, as described in this article: changes (deltas) relative to earlier frames are calculated and coded, and such an approach introduces latency. Low-latency encoding means less efficient temporal compression. I'm not sure why they are pushing for efficient compression in automotive: to display the data they don't need any compression, and for storage they could use the classical approach. It is not clear to me, and the article is quite vague on this.
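    One concrete source of this history-related latency is frame reordering. The sketch below is a simplified model, assuming a GOP pattern written as a string of frame types (I, P, B) in display order: each B-frame can only be coded after the next I/P frame has arrived, so a run of n B-frames delays the first of them by n frames. Only reordering is modelled; lookahead, rate control and pipeline buffering (which may account for the rest of the "about 6 frames" in the article) are not.

```python
from typing import List


def coding_order(frame_types: str) -> List[int]:
    """Display indices in the order a simple encoder would code them.

    A B-frame needs its future reference (the next I/P frame) to be coded
    first, so each run of B-frames is held back until that anchor arrives.
    """
    order: List[int] = []
    pending_b: List[int] = []
    for idx, t in enumerate(frame_types):
        if t == "B":
            pending_b.append(idx)
        else:  # I or P frame: an anchor that releases the waiting B-frames
            order.append(idx)
            order.extend(pending_b)
            pending_b.clear()
    order.extend(pending_b)  # trailing B-frames (real GOPs end on an anchor)
    return order


def reorder_delay(frame_types: str) -> int:
    """Worst-case number of frames a frame must wait before it can be coded."""
    delay = run = 0
    for t in frame_types:
        run = run + 1 if t == "B" else 0
        delay = max(delay, run)
    return delay


print(coding_order("IBBPBBP"))   # [0, 3, 1, 2, 6, 4, 5]
print(reorder_delay("IBBPBBP"))  # 2
print(reorder_delay("IPPPPP"))   # 0: P-only coding adds no reordering delay
```

    An intra-only or P-only stream avoids this wait entirely, at the cost of compression efficiency.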
  7. I don't use it, but I think StaxRip should have some of these filters, because it can use NVEncC.

    When comparing NVEncC vs. ffmpeg NVENC for a 1920x1080 cubic resize to 1280x720, with stream-copied audio, the same video encoding settings, and muxing to MP4, NVEncC is about 2-2.5x faster when using GPU scaling, for any of the options (e.g. SW decode vs. CUVID full HW decode/filter).

    NVEncC can use GPU-GPU or CPU-GPU (via a host-to-device copy, aka "copyHtoD"), which in some cases might be faster or slower depending on the GPU, or even "better" (sometimes GPU decoding has errors if the source is not indexed, like dropped frames, corrupt frames, grey frames etc.). So it's not limited to GPU-only end to end, and that's a great option to enforce SW decoding. It accepts AviSynth and VapourSynth scripts as direct input, so you have access to many filters there, including some GPU ones like KNLMeansCL. But if you are using a script as input, the "GPU" filters in NVEncC have to be applied afterwards.

    For NVEncC, some resize algorithms run on CUDA, but some run on NPP (NVIDIA Performance Primitives). Apparently bilinear and spline36 run using CUDA, but nn, npp_linear, cubic, cubic_bspline, cubic_catmull, cubic_b05c03, super and lanczos use the NPP resizer.
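    The cubic variants in that list look like members of the Mitchell-Netravali (B, C) family of cubic kernels. The mapping below (bspline = B 1/C 0, catmull = B 0/C 0.5, b05c03 = B 0.5/C 0.3) is my assumption from the names, not something confirmed by NVEncC or NPP documentation; the kernel formula itself is the standard BC-spline.

```python
from typing import Dict, List, Tuple


def bc_cubic(x: float, b: float, c: float) -> float:
    """Mitchell-Netravali cubic kernel with parameters B and C (support [-2, 2])."""
    x = abs(x)
    if x < 1.0:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2.0:
        return ((-b - 6 * c) * x**3
                + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x
                + (8 * b + 24 * c)) / 6
    return 0.0


# Hypothetical mapping of the resizer names above to (B, C) values.
PRESETS: Dict[str, Tuple[float, float]] = {
    "cubic_bspline": (1.0, 0.0),  # very smooth, blurs slightly
    "cubic_catmull": (0.0, 0.5),  # sharp, passes through the input samples
    "cubic_b05c03": (0.5, 0.3),   # in between
}


def weights_at(frac: float, b: float, c: float) -> List[float]:
    """Four tap weights for a sample located frac (0..1) past the left-centre pixel."""
    return [bc_cubic(k - 1 - frac, b, c) for k in range(4)]


for name, (b, c) in PRESETS.items():
    print(name, [round(w, 4) for w in weights_at(0.5, b, c)])
```

    For every (B, C) pair the four weights sum to 1, so flat areas stay flat; B controls blur and C controls ringing/sharpening.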
    Last edited by poisondeathray; 12th Feb 2018 at 21:44.
  8. Member
    Join Date
    Apr 2017
    Location
    England
    Kdenlive has some GPU effects; they just need to be enabled in Settings. It depends on which effects you are looking for. I've tested some and like them.
  9. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Hi,
    I think they say the unit is able to process about 1 Gb/s of data. The car has 10 cameras, and 1 camera produces about 1 Gb/s, so they need to push the data down 10 times. Probably 10 Gb/s is too much for the device that analyzes these pictures, and a latency of 6 frames is too big for road traffic safety. I also don't understand why they don't send it directly to this unit and compress it later, but it seems that 10 Gb/s at once is 10 times more than this unit can handle.
    Thank you

    Bernix
  10. Originally Posted by Bernix
    Hi,
    I think they say the unit is able to process about 1 Gb/s of data. The car has 10 cameras, and 1 camera produces about 1 Gb/s, so they need to push the data down 10 times. Probably 10 Gb/s is too much for the device that analyzes these pictures, and a latency of 6 frames is too big for road traffic safety. I also don't understand why they don't send it directly to this unit and compress it later, but it seems that 10 Gb/s at once is 10 times more than this unit can handle.
    Thank you

    Bernix
    Intra coding should be able to reduce the data 10 times... IMHO 60 fps is too low to rely on a camera as the main/primary driving guidance.
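    The bandwidth reasoning above boils down to two small formulas. In this sketch the 10 cameras, 1 Gb/s per camera and 1 Gb/s for the central unit are the figures quoted in the thread; the 1080p60 8-bit 4:2:0 camera format is a hypothetical example, since the article doesn't state the actual sensor format.

```python
def raw_bitrate(width: int, height: int, bits_per_sample: int,
                samples_per_pixel: float, fps: float) -> float:
    """Uncompressed video bitrate in bits per second.

    samples_per_pixel is 3 for 4:4:4, 2 for 4:2:2 and 1.5 for 4:2:0.
    """
    return width * height * bits_per_sample * samples_per_pixel * fps


def required_ratio(cameras: int, per_camera_bps: float, link_bps: float) -> float:
    """Compression ratio needed so that all cameras fit into one link."""
    return cameras * per_camera_bps / link_bps


# Hypothetical camera format: 1920x1080, 8-bit 4:2:0, 60 fps.
print(raw_bitrate(1920, 1080, 8, 1.5, 60) / 1e9, "Gb/s per camera")
# Figures from the thread: 10 cameras x 1 Gb/s into a 1 Gb/s unit.
print(required_ratio(10, 1e9, 1e9), ": 1 compression needed")
```

    A 10:1 ratio is modest for a codec, which is why intra-only coding (no waiting for future frames) looks feasible here.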
  11. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Hi Pandy,
    If the car reacts in 0.01666 s, that is 1 frame at 60 fps, which is much, much less than any superman or other superhuman can do. So latency is important. Otherwise the car's reaction is 6*0.01666 s, which is unacceptable: 0.1 s. Now I see the importance of latency. And the images in that car are not for storing, but to let the car see.
    So at a speed of 100 m/s, the car reacts within about 1.6666 meters (unless I calculated it wrongly). With the usual latency, at the same speed, it is 10 meters.
    EDIT: Checked some drivers' reaction times, and they are around 0.2 s in a computer test. In a car it should be a bit slower.
    And 100 m/s is 360 km/h; more usual real-world speeds are around 36-50 m/s.

    Bernix
    Last edited by Bernix; 13th Feb 2018 at 09:01. Reason: EDIT
  12. Originally Posted by Bernix
    Hi Pandy,
    If the car reacts in 0.01666 s, that is 1 frame at 60 fps, which is much, much less than any superman or other superhuman can do. So latency is important. Otherwise the car's reaction is 6*0.01666 s, which is unacceptable: 0.1 s. Now I see the importance of latency. And the images in that car are not for storing, but to let the car see.
    So at a speed of 100 m/s, the car reacts within about 1.6666 meters (unless I calculated it wrongly). With the usual latency, at the same speed, it is 10 meters.
    EDIT: Checked some drivers' reaction times, and they are around 0.2 s in a computer test. In a car it should be a bit slower.
    And 100 m/s is 360 km/h; more usual real-world speeds are around 36-50 m/s.

    Bernix
    Nope. First, I said that 60 fps is too low to rely only on a camera; I would rather go toward 90-120 fps if the camera is to be used as the primary source of information about the road environment (I mean driving, not driving assistance). A higher frame rate automatically reduces latency (doubling the frame rate halves the latency). Secondly, intra coding (virtually latency-free if we consider 1 frame as no latency) should be able to deliver a 10x data reduction without problems.
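    The latency arithmetic from the last two posts, as a small worked example: latency in seconds is frames / fps, and the distance travelled meanwhile is speed × latency. The 100 m/s speed and the 1- and 6-frame latencies come from Bernix's post; 120 fps is one of pandy's suggested rates.

```python
def latency_seconds(frames: float, fps: float) -> float:
    """Time spent waiting for a given number of frames at a given frame rate."""
    return frames / fps


def distance_during_latency(speed_mps: float, frames: float, fps: float) -> float:
    """Metres travelled before the system has seen the relevant frame."""
    return speed_mps * latency_seconds(frames, fps)


for frames, fps in [(1, 60), (6, 60), (6, 120)]:
    ms = latency_seconds(frames, fps) * 1000
    metres = distance_during_latency(100, frames, fps)
    print(f"{frames} frame(s) @ {fps} fps: {ms:.1f} ms, {metres:.2f} m at 100 m/s")
```

    This prints 16.7 ms / 1.67 m, 100.0 ms / 10.00 m and 50.0 ms / 5.00 m, confirming both Bernix's figures and that doubling the frame rate halves the latency for the same number of buffered frames.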
  13. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Hi Pandy,
    I understand you and have nothing against "higher framerate is better". But apparently car companies chose 60 fps. I also have to say it is images/video in H.264 format, not H.265. And since the processor unit can handle max 1 Gb/s, a higher framerate means a lower resolution. So if the car's reaction is about 6-12 times faster than a human driver's, that's probably why they decided on this. I'm not a car engineer and don't know why they chose this, but they did.
    So the limits are 1 Gb/s for the central unit, and 10 cameras with 1 Gb/s each.
    Nothing wrong.

    Bernix
    Last edited by Bernix; 13th Feb 2018 at 09:31. Reason: bitrate - framerate mistake
  14. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    But back to the OP: is there any GUI that uses GPU filters? I mean for ordinary encoding tasks: resize, deinterlace and denoise... Hi DeJay, thank you for mentioning Kdenlive, but on Windows it doesn't seem to have these capabilities, since Zeranoe's builds of FFmpeg have to be installed, and as mentioned earlier they don't support GPU filters. I tested it and searched in the settings but found nothing.
    I will check whether Hybrid has something GPU-accelerated in VapourSynth, if that's possible.

    Bernix
  15. There is something to remember regarding GPU-based filters: while they generally run faster than CPU-based ones, the main limitation is how big and fast the onboard frame buffer is. With 1080p content a 2GB buffer is enough for 1 or 2 filters, but if you're planning on using a lot of filters, say 3 or 4, and/or 4K content, a 2GB frame buffer will not be enough, and even a 4GB buffer may not be enough.

    The reality is that with the current crypto-craze-fueled artificial inflation of graphics card prices, and no end in sight, most people are probably better off not even considering GPU acceleration and sticking with pure CPU-powered software.
  16. The whole premise of this thread is wrong, because it has an underlying assumption that the GPU is always going to be "better", in every way, than the CPU. Why should anyone care whether a filter uses the CPU or the GPU? The only things that matter are whether some method, hardware or software, produces better results, or produces exactly the same results in less time. GPU processing often produces results with glitches, and often provides very little, if any, speed advantage.

    There is certainly a class of computations that can be done faster on some GPUs. However, in my experience, you always have to do a lot of tests to make sure the quality is the same, because I have seen all sorts of glitches and weirdness with GPU-based software. Obviously this does not happen all the time, but it has screwed me enough times that I actually go out of my way to turn GPU acceleration off whenever the software can take advantage of it. I have simply had too many multi-hour (or multi-day) GPU renders (or filter runs) that I had to throw out, so I don't bother anymore.

    So, for some things, GPU acceleration is amazing. For other things, not so much. And, for still other things, it can be a nightmare.
  17. Plus, the transfer from main memory to GPU memory, and back, is a big bottleneck that cuts into potential performance gains.

    There is an AviSynth plugin called AviSynthShader that lets you do (you guessed it) shader processing on the GPU. I have been meaning to try it...
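    A rough way to size the transfer cost mentioned above: a frame has to be uploaded and the result downloaded, so the round trip is about 2 × frame size / bus bandwidth. The 12 GB/s effective bandwidth below is a hypothetical round number; the real figure depends on the PCIe generation and lane count, pinned vs. pageable memory and the driver, and per-transfer synchronisation overhead is not modelled.

```python
def frame_bytes(width: int, height: int, bits_per_sample: int,
                samples_per_pixel: float) -> float:
    """Size of one uncompressed frame in bytes."""
    return width * height * bits_per_sample * samples_per_pixel / 8


def round_trip_ms(nbytes: float, bytes_per_second: float) -> float:
    """Time to upload a frame to the GPU and download the result again."""
    return 2 * nbytes / bytes_per_second * 1000


ASSUMED_BUS = 12e9  # hypothetical effective host<->GPU bandwidth in bytes/s

for label, args in [("SD 8-bit 4:2:0", (720, 576, 8, 1.5)),
                    ("1080p 8-bit 4:2:0", (1920, 1080, 8, 1.5)),
                    ("4K RGBA float32", (3840, 2160, 32, 4))]:
    ms = round_trip_ms(frame_bytes(*args), ASSUMED_BUS)
    print(f"{label}: {ms:.2f} ms per round trip, max {1000 / ms:.0f} fps from transfers alone")
```

    Under these assumptions the copies are cheap for SD and 1080p 8-bit video but become a real ceiling for 4K float frames, and every extra CPU filter placed between GPU filters adds another round trip.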
  18. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Hi,
    you are probably right about glitches, and also about speed and quality. But not everybody has a strong CPU to do all operations at a reasonable speed. I know it sounds horrible, but if I save, say, 30% of the time, I will accept some quality loss. Of course not for family videos and similar things that need the best quality. And my OP was about GPU-accelerated filters that are amazing on the GPU. I enabled one in Hybrid (I know it matters which one, but honestly I don't remember; probably the best deinterlacing filter) and the encoding speed was about 1 fps on SD video, which is unacceptable. So I will stay with yadif on the CPU, but I'm curious whether there is, for example, a GPU-based yadif.
    Everybody nowadays is making video in higher than Full HD resolution; I do SD, so memory and data transfer shouldn't be the biggest problem.
    So basically I need some fast denoise filter to help compression with minimal quality loss. And if there is some superfast deinterlace filter (Bob would be a nice bonus), that could also be very useful. Just to test it on my own skin, to know whether it is worth it or not.
    The problem with CLI is that I'm probably lazy; I tried AviSynth and made some scripts, but it is not my cup of tea (scripts and CLI commands).
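    Since Bob was mentioned: a "bob" deinterlacer turns each field of an interlaced frame into a full frame, so 25i becomes 50p. Below is a minimal CPU sketch of the idea, assuming frames are lists of pixel rows and using plain line doubling; real Bob filters interpolate between field lines and compensate the half-line offset between the two fields, and this is not a GPU implementation.

```python
from typing import List

Row = List[int]
Frame = List[Row]


def bob(frame: Frame, top_field_first: bool = True) -> List[Frame]:
    """Split an interlaced frame into its two fields and line-double each one.

    Returns two progressive frames in temporal order (double frame rate).
    """
    height = len(frame)
    top, bottom = frame[0::2], frame[1::2]  # even rows / odd rows
    fields = [top, bottom] if top_field_first else [bottom, top]
    result = []
    for field in fields:
        doubled: Frame = []
        for row in field:
            doubled.append(list(row))  # original field line
            doubled.append(list(row))  # repeated line fills the missing one
        while len(doubled) < height:   # odd heights: pad with the last line
            doubled.append(list(doubled[-1]))
        result.append(doubled[:height])
    return result


print(bob([[1], [2], [3], [4]]))
# [[[1], [1], [3], [3]], [[2], [2], [4], [4]]]
```

    The simplicity of this step is why bob-style deinterlacing is cheap even on the CPU; quality-oriented deinterlacers like yadif or QTGMC spend their time on smarter interpolation.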

    Bernix
  19. Originally Posted by sophisticles
    There is something to remember regarding GPU-based filters: while they generally run faster than CPU-based ones, the main limitation is how big and fast the onboard frame buffer is. With 1080p content a 2GB buffer is enough for 1 or 2 filters, but if you're planning on using a lot of filters, say 3 or 4, and/or 4K content, a 2GB frame buffer will not be enough, and even a 4GB buffer may not be enough.
    Why? 4K at 4:4:4:4 10-bit with double buffering is around 80MB, so it seems even 2GB of video RAM is more than enough to do filtering on 10-20 frames.
  20. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Just tested; I don't know how relevant it is. The size of an uncompressed 3840x2160 TIFF picture at 16-bit float per channel (48-bit) is 63.3 MB, and at 32-bit float per channel it is 127 MB. At 8-bit it is 31.6 MB. It should be very similar to a raw picture, but I don't know for sure. And because it doesn't come from video, it should be 4:4:4.

    Edit: 96x96 dpi

    Bernix
    Last edited by Bernix; 14th Feb 2018 at 11:36. Reason: Edit
  21. Originally Posted by Bernix
    Edit: 96x96 dpi
    For what? DPI only means something if the display size is specified: 96x96 dpi on a 2-inch screen is a different resolution than 96x96 dpi on an 80-inch screen.
    BTW, dpi is the more accurate term for printing, whereas for video ppi is more accurate.

    Bitmap size is constant, and raw video has exactly the same size as a bitmap. This is simple math: X*Y*bitdepth*channels/components (RGB,A / YCbCr,A vs. Grayscale,A).

    Even 2GB of video RAM has plenty of space for many video frames (even assuming some memory is reserved for code, coefficients etc.).
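    The X*Y*bitdepth*channels formula as code, checked against the numbers quoted in this thread (sizes in MiB, i.e. 1024*1024 bytes). One observation from the arithmetic: Bernix's TIFF sizes (31.6 / 63.3 / 127 MB) match four channels rather than three, which suggests, but does not prove, that those TIFFs carried an alpha channel. pandy's "around 80MB" matches a packed 10-bit 4:4:4:4 frame, double-buffered.

```python
MIB = 1024 * 1024


def raw_frame_bytes(width: int, height: int, bits_per_sample: int,
                    channels: float) -> float:
    """X * Y * bitdepth * channels, converted from bits to bytes.

    channels: 3 for RGB/YCbCr 4:4:4, 4 with alpha, 1 for grayscale,
    1.5 or 2 for 4:2:0 / 4:2:2 video.
    """
    return width * height * bits_per_sample * channels / 8


cases = [
    ("8-bit RGBA", 8, 4),
    ("16-bit RGBA", 16, 4),
    ("32-bit float RGBA", 32, 4),
    ("8-bit RGB (RGB24)", 8, 3),
]
for label, bits, ch in cases:
    print(f"3840x2160 {label}: {raw_frame_bytes(3840, 2160, bits, ch) / MIB:.2f} MiB")

# pandy's example: 4:4:4:4 at 10 bits (packed), double-buffered.
print(f"double-buffered 10-bit 4:4:4:4: {2 * raw_frame_bytes(3840, 2160, 10, 4) / MIB:.1f} MiB")
```

    Note that GPUs usually store 10-bit samples in 16-bit containers, which would make the real footprint larger than the packed figure.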
  22. Member Bernix
    Join Date
    Apr 2016
    Location
    Europe
    Hi,
    then it is not relevant. It was pixels/inch, but the same applies: not relevant. Thanks for correcting me.

    Bernix
  23. there is for example yadif GPU based.
    In case the decoding is done through the VPU, deinterlacing could also be done that way. (DGDecNV offers this. FRIM always deinterlaces automatically.)
    When using VapourSynth, one could use NNEDI3CL for deinterlacing, or use it to speed up QTGMC.

    Avisynth GPU filters I know of:
    • BilateralFilter
    • Deathray
    • FFT3DGPU
    • ML3DexGPU
    • NLMeansCL
    • AviShader
    • DGDecNV (as source filter and deinterlacer)
    • there were a bunch of gpu based resizers and misc filters by thejam79 (haven't seen a working link for these for quite some time)
    • nnedi3ocl
    • SVP
    There are also quite a few GPU filters for VapourSynth.

    That said, I know of no GPU/VPU-based filter which supports 4:4:4(:4) sampling.

    Bitmap size is constant, and raw video has exactly the same size as a bitmap. This is simple math: X*Y*bitdepth*channels/components (RGB,A / YCbCr,A vs. Grayscale,A).

    Even 2GB of video RAM has plenty of space for many video frames (even assuming some memory is reserved for code, coefficients etc.).
    Using RGB24: 3840*2160*24 = 199,065,600 bit = 194,400 kbit = 24,300 kByte ≈ 23.7 MB
    2GB = 2048 MB, so ~86 frames
    I'm not sure I would call that plenty of space... especially when you handle content with rather large GOP sizes and multiple references, so that even simple decoding would not be done one frame at a time...
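    The frame budget above, as a small helper that also lets you subtract memory reserved for code, coefficients, decoder reference frames and so on. The 23.73 MiB RGB24 4K frame and the 2048 MiB card come from the posts; the 512 MiB reservation in the usage line is a hypothetical example.

```python
import math


def frames_that_fit(vram_mib: float, frame_mib: float, reserved_mib: float = 0.0) -> int:
    """How many whole frames fit into the VRAM left after the reserved part."""
    return math.floor((vram_mib - reserved_mib) / frame_mib)


RGB24_4K_MIB = 3840 * 2160 * 3 / (1024 * 1024)  # 23.73046875 MiB per frame

print(frames_that_fit(2048, RGB24_4K_MIB))                    # 86
print(frames_that_fit(2048, RGB24_4K_MIB, reserved_mib=512))  # 64
```

    Each filter in a chain typically needs its own input/output buffers and sometimes several temporal neighbours, so the usable number of "filter slots" is much lower than the raw frame count.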

    Cu Selur
    Last edited by Selur; 16th Feb 2018 at 16:30.
    users currently on my ignore list: deadrats, Stears555
  24. On the broader topic of "GPU filters" in general: in professional applications there is a very large number of GPU-accelerated filters and plugins that all use GUIs. Probably hundreds, or closer to thousands, from various 3rd-party companies: everything from general effects, blurs, scaling, DOF, particles, denoising, motion tracking, ray tracing, lighting, color manipulation, LUTs, deinterlacing, and a gazillion other things. Some are very stable and the real deal. Some cost hundreds or even thousands of USD. But there are quite a few "lemons" too. My point is that virtually everything has an optional GPU-accelerated mode these days, which is sometimes faster/better, but sometimes slower/worse too (especially depending on the HW setup). If you're an effects/plugin company, it's almost impossible to sell anything new that doesn't have some sort of GPU acceleration.

    But I have a feeling he's not looking at those; he's looking at free, open-source variants. In that area, it's truly underdeveloped. You can count the number of usable plugins and filters on your fingers (I'm not talking about research variants; I'm talking about real usage for end users).