VideoHelp Forum

  1. Member hydra3333 (Join Date: Oct 2009; Location: Australia)
    It seems ffmpeg is cross-compilable (non-free, using the new nvidia CUDA 10.1 toolkit) with the now-inbuilt GPU based YADIF_CUDA deinterlacer mentioned in the nvidia forum here: https://devtalk.nvidia.com/default/topic/1042459/video-codec-and-optical-flow-sdk/nvde...ing-deint-2-/2
    ffmpeg documentation here https://ffmpeg.org/ffmpeg-filters.html#yadif_005fcuda

    Has anyone done any comparative speed tests of the (hopefully extreme-speed!) GPU-based YADIF_CUDA deinterlacer, e.g. vs vanilla YADIF,
    with mode=0, parity=-1, deint=0?

    Suggestions, comments and command lines would be very welcome
    (although I do prefer not to use the nvdec H.264 hardware decoder in ffmpeg).
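
    A minimal sketch of how such a comparison could be set up, assuming an ffmpeg build with the CUDA filters enabled and a hypothetical input file `input.mpg`. It only builds the two command lines (software decoding in both cases, as preferred above) and does not run them. `-f null -` discards the output so no encoder is part of the measurement, and `-benchmark` prints timing at the end. The `hwupload_cuda` step is an assumption taken from the ffmpeg hardware-acceleration documentation and is untested here.

```python
import shlex

# Options from the question: mode=0 (one output frame per input frame),
# parity=-1 (auto-detect field order), deint=0 (deinterlace all frames).
YADIF_OPTS = "mode=0:parity=-1:deint=0"


def benchmark_cmd(src, use_cuda):
    """Build an ffmpeg command that deinterlaces `src` and discards the result."""
    if use_cuda:
        # Software-decoded frames live in system memory, so upload them first.
        vf = f"hwupload_cuda,yadif_cuda={YADIF_OPTS}"
    else:
        vf = f"yadif={YADIF_OPTS}"
    # -an drops audio, -f null - writes nowhere, -benchmark reports time used.
    return ["ffmpeg", "-hide_banner", "-benchmark", "-i", src,
            "-an", "-vf", vf, "-f", "null", "-"]


if __name__ == "__main__":
    # "input.mpg" is a placeholder name, not a file from this thread.
    for use_cuda in (False, True):
        print(shlex.join(benchmark_cmd("input.mpg", use_cuda)))
```

    Comparing the fps or elapsed time reported by the two runs on the same clip would give the kind of figure asked for here.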
  2. CPU YADIF is already super fast, so I see no real benefit in using a GPU implementation.

    AVC-HD 1920x1080@50i
    no deinterlacing
    Code:
    AVSMeter 2.2.6 (x64)
    AviSynth+ 0.1 (r2772, MT, x86_64) (0.1.0.0)
    Loading script...
    
    Number of frames:                  829
    Length (hh:mm:ss.ms):     00:00:33.160
    Frame width:                      1920
    Frame height:                     1080
    Framerate:                      25.000 (25/1)
    Colorspace:                       YV12
    
    Frames processed:               829 (0 - 828)
    FPS (min | max | average):      5.563 | 3184 | 65.20
    Memory usage (phys | virt):     199 | 122 MiB
    Thread count:                   33
    CPU usage (average):            10%
    
    Time (elapsed):                 00:00:12.714
    YADIF 50p
    Code:
    AVSMeter 2.2.6 (x64)
    AviSynth+ 0.1 (r2772, MT, x86_64) (0.1.0.0)
    Loading script...
    
    Number of frames:                 1658
    Length (hh:mm:ss.ms):     00:00:33.160
    Frame width:                      1920
    Frame height:                     1080
    Framerate:                      50.000 (50/1)
    Colorspace:                       YV12
    
    Frames processed:               1658 (0 - 1657)
    FPS (min | max | average):      5.810 | 199.0 | 123.2
    Memory usage (phys | virt):     221 | 143 MiB
    Thread count:                   33
    CPU usage (average):            14%
    
    Time (elapsed):                 00:00:13.454
    YADIF 25p
    Code:
    AVSMeter 2.2.6 (x64)
    AviSynth+ 0.1 (r2772, MT, x86_64) (0.1.0.0)
    Loading script...
    
    Number of frames:                  829
    Length (hh:mm:ss.ms):     00:00:33.160
    Frame width:                      1920
    Frame height:                     1080
    Framerate:                      25.000 (25/1)
    Colorspace:                       YV12
    
    Frames processed:               829 (0 - 828)
    FPS (min | max | average):      5.787 | 188.6 | 64.41
    Memory usage (phys | virt):     211 | 132 MiB
    Thread count:                   33
    CPU usage (average):            13%
    
    Time (elapsed):                 00:00:12.870
  3. Your 64 fps single-rate case is pitifully slow. Here is DGSource + DGBob (DGBob is a CUDA YADIF clone). It's running over 6 times faster.

    Code:
    AVSMeter 2.7.5 (x64) - Copyright (c) 2012-2017, Groucho2004
    AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)
    
    Number of frames:                 3645
    Length (hh:mm:ss.ms):     00:02:01.622
    Frame width:                      1920
    Frame height:                     1080
    Framerate:                      29.970 (30000/1001)
    Colorspace:                       YV12
    
    Frames processed:               3645 (0 - 3644)
    FPS (min | max | average):      99.05 | 408.2 | 393.4
    Memory usage (phys | virt):     268 | 755 MiB
    Thread count:                   21
    CPU usage (average):            14%
    
    Time (elapsed):                 00:00:09.266
    I'm baffled why people use CPU solutions for problems that CUDA/NVDec excels at.
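
    For reference, the averages in these logs can be recomputed from the frame counts and elapsed times, which also puts a number on "over 6 times faster". A small sketch: the figures are copied from the AVSMeter output in this thread, and the two sets of runs use different clips and AVSMeter versions, so the ratio is indicative only.

```python
# (frames processed, elapsed seconds) copied from the AVSMeter logs in this thread.
RUNS = {
    "no deinterlacing": (829, 12.714),
    "YADIF 50p": (1658, 13.454),
    "YADIF 25p": (829, 12.870),
    "DGSource + DGBob": (3645, 9.266),
}


def average_fps(frames, seconds):
    """Average throughput in output frames per second."""
    return frames / seconds


def speed_ratio(fast, slow):
    """How many times faster the `fast` run is than the `slow` run."""
    return average_fps(*RUNS[fast]) / average_fps(*RUNS[slow])


if __name__ == "__main__":
    for name, (frames, seconds) in RUNS.items():
        print(f"{name}: {average_fps(frames, seconds):.1f} fps")
    ratio = speed_ratio("DGSource + DGBob", "YADIF 25p")
    print(f"DGBob vs YADIF 25p: {ratio:.2f}x")
```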
  4. Originally Posted by veresov
    Your 64 fps single-rate case is pitifully slow. Here is DGSource + DGBob (DGBob is my own CUDA YADIF clone). It's running over 6 times faster.

    Code:
    AVSMeter 2.7.5 (x64) - Copyright (c) 2012-2017, Groucho2004
    AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)
    
    Number of frames:                 3645
    Length (hh:mm:ss.ms):     00:02:01.622
    Frame width:                      1920
    Frame height:                     1080
    Framerate:                      29.970 (30000/1001)
    Colorspace:                       YV12
    
    Frames processed:               3645 (0 - 3644)
    FPS (min | max | average):      99.05 | 408.2 | 393.4
    Memory usage (phys | virt):     268 | 755 MiB
    Thread count:                   21
    CPU usage (average):            14%
    
    Time (elapsed):                 00:00:09.266
    Try again with software decoder (ffms2)...
  5. Not interested in sacrificing performance. You can try with a CUDA decoder.

    Also, one of the points you apparently miss when you say you see no benefit is that offloading to the GPU frees up CPU.

    If we did this for 4K video your solution would go from poor to miserable, whereas mine would retain very high performance.

    Finally, further gains in the CUDA solution can be had by using the CUDASynth framework.
    Last edited by veresov; 2nd Mar 2019 at 10:38.
  6. If we did this for 4K video your solution would go from poor to miserable, whereas mine would retain very high performance.
    Yeah... because 3840x2160@60i is soooooooo common...

    Originally Posted by veresov
    Not interested in sacrificing performance. You can try with a CUDA decoder.

    Also, one of the points you apparently miss when you say you see no benefit is that offloading to the GPU frees up CPU.
    Add encoder to the mix and you will notice this



    I'm using an E5-2690 + x264 at the default medium preset, CRF 20 + FFMS2 + YADIF at 50 fps, and the decoding process uses only ~8% of my CPU. I doubt that you can reduce that to 0% with your CUDA decoder + deinterlacer. A hardware decoder in this task is basically a placebo.
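
    The argument here amounts to an upper bound: if decoding plus deinterlacing accounts for roughly 8% of the CPU work, moving it to the GPU can free at most that share. A rough sketch, assuming the encode is CPU-bound and that the ~8% figure (the poster's own measurement) translates directly into time; both are simplifications, and the bound says nothing about the 4K case or about using NVEnc instead of a software encoder.

```python
def max_speedup(offloaded_fraction):
    """Amdahl-style upper bound on speedup when a fraction of the CPU work is removed."""
    if not 0.0 <= offloaded_fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return 1.0 / (1.0 - offloaded_fraction)


if __name__ == "__main__":
    # ~8% is the decode + deinterlace share quoted in this post.
    print(f"at most {max_speedup(0.08):.3f}x faster")
```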
  7. Stay with your low performance if you like. I don't care.
  8. Originally Posted by veresov
    Stay with your low performance if you like. I don't care.
    Enjoy your placebo effect during encoding with x264/x265/AV1 and so on. Mr. Neuron2
  9. I use NVEnc. Why would I want to choose a low performance solution?
  10. Originally Posted by veresov
    I use NVEnc. Why would I want to choose a low performance solution?
    Because NVenc sucks in terms of fine detail retention at low bitrates?

    Just for lulz. The same but with default x265 profile


    Yeah. In this case I could save a whole 1% by using hardware decoding + deinterlacing. AMAZING! SHUT UP AND TAKE ALL MY PESOS!
  11. Ah, I get it, you're too poor to afford a decent GPU. You have my sympathy.
  12. Originally Posted by veresov
    Ah, I get it, you're too poor to afford a decent GPU. You have my sympathy.
    Lack of arguments, I sense here...
  13. Your ability to sense and understand things will surely improve as you mature.
    Last edited by veresov; 2nd Mar 2019 at 12:34.
  14. Member hydra3333
    Whilst open to information and suggestions/discussion, I am not sure I comprehend the line of reasoning ... it seems to be along the lines of "don't bother to use something faster" ?
    It may be a valid conclusion at this point in some circumstances, although it does seem counterintuitive.

    One would reasonably have thought that, as videos inevitably grow in size and complexity over time (e.g. recall old phone .3gp clips vs the 1080i that seems ubiquitous nowadays), faster tools and whatnot would be desirable.

    Some like NVDec. In vapoursynth and using DG's handy and fast tools, yes I do; with vanilla commandline ffmpeg some (a minority of?) streams seem more problematic possibly ending up with out of sync audio and whatnot which isn't great in automated workflows.

    Still, perhaps I wasn't very clear in the initial post: I was more interested in quality and performance information about single-commandline ("quick'n'dirty") ffmpeg de-interlacing with the recently updated yadif_cuda filter. If I wanted to use beaut stuff and fine control on video sources where it would be worth it, I'd use vapoursynth and DG's amazing gear and the large range of great filters available for vapoursynth.

    Thank you for your informative comments, though !
    Last edited by hydra3333; 2nd Mar 2019 at 17:38.
  15. Member hydra3333
    Hello, seeking help using the newly updated FFMPEG filter YADIF_CUDA.

    I'm obviously doing something wrong in (not?) converting between formats in the second commandline, but I don't know what.

    Example 1 works, using NVDEC as the decoder (-hwaccel nvdec).
    Example 2 fails, using ffmpeg's native (software) mpeg2 decoder.

    Can anyone please clarify ?

    Code:
    1. ---------------------------------------------------- 
    "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel verbose -stats -hwaccel nvdec -hwaccel_output_format cuda -i ".\1.7TWO.mpg" -t 05 -vf yadif_cuda=0:-1:0 -c:v h264_nvenc -preset lossless -f mp4 -y ".\1.7TWO.aac.yadif_cuda.works.mp4" 
    ffmpeg version N-93276-g3b23eb283a-test Copyright (c) 2000-2019 the FFmpeg developers
      built with gcc 8.3.0 (GCC)
      configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --disable-w32threads --enable-pthreads --enable-cross-compile --enable-pic --enable-libsoxr --enable-libass --enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug --enable-cuvid --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libgsm --enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack --enable-libwebp --enable-dxva2 --disable-avisynth --enable-vapoursynth --enable-gray --enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg --enable-libx264 --enable-libx265 --enable-libaom --enable-libdav1d --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libxvid --enable-libgme --enable-runtime-cpudetect --enable-libfribidi --enable-gnutls --enable-gmp --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-libbluray --enable-libcdio --enable-libmfx --disable-schannel --enable-ladspa --enable-libxml2 --enable-libdavs2 --enable-libopenmpt --enable-libxavs --enable-libxavs2 --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-opengl --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-gpl --extra-version='test _of_DeadSix27/python_cross_compile_script' --enable-avresample --pkg-config-flags=--static --extra-libs='-lintl -liconv' --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --enable-libbluray --prefix=/home/u/Desktop/workdir/x86_64_products/ffmpeg_static_non_free_opencl.installed --disable-shared --enable-static --enable-cuda-nvcc --enable-nonfree --enable-opencl --enable-nonfree --enable-libfdk-aac --enable-decklink --extra-cflags=-DLIBXML_STATIC --extra-cflags=-DGLIB_STATIC_COMPILATION
      libavutil      56. 26.100 / 56. 26.100
      libavcodec     58. 47.102 / 58. 47.102
      libavformat    58. 26.101 / 58. 26.101
      libavdevice    58.  6.101 / 58.  6.101
      libavfilter     7. 48.100 /  7. 48.100
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  4.100 /  5.  4.100
      libswresample   3.  4.100 /  3.  4.100
      libpostproc    55.  4.100 / 55.  4.100
    [mpeg @ 000002199435b3c0] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
    Input #0, mpeg, from '.\1.7TWO.mpg':
      Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
        Stream #0:0[0x1e0]: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
        Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
    Stream mapping:
      Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_nvenc))
      Stream #0:1 -> #0:1 (mp2 (native) -> aac (native))
    Press [q] to stop, [?] for help
    [mpeg2video @ 000002199436d380] NVDEC capabilities:
    [mpeg2video @ 000002199436d380] format supported: yes, max_mb_count: 65280
    [mpeg2video @ 000002199436d380] min_width: 48, max_width: 4080
    [mpeg2video @ 000002199436d380] min_height: 16, max_height: 4080
    [graph 0 input from stream 0:0 @ 000002199486ffc0] w:720 h:576 pixfmt:cuda tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
    [h264_nvenc @ 00000219943cf680] Loaded Nvenc version 9.0
    [h264_nvenc @ 00000219943cf680] Nvenc initialized successfully
    [graph_1_in_0_1 @ 00000219943b3f00] tb:1/48000 samplefmt:s16p samplerate:48000 chlayout:0x3
    [format_out_0_1 @ 00000219943b4800] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
    [auto_resampler_0 @ 00000219943b3800] ch:2 chl:stereo fmt:s16p r:48000Hz -> ch:2 chl:stereo fmt:fltp r:48000Hz
    Output #0, mp4, to '.\1.7TWO.aac.yadif_cuda.works.mp4':
      Metadata:
        encoder         : Lavf58.26.101
        Stream #0:0: Video: h264 (h264_nvenc), 1 reference frame (avc1 / 0x31637661), cuda(progressive, left), 720x576 [SAR 64:45 DAR 16:9], q=-1--1, 2000 kb/s, 25 fps, 12800 tbn, 25 tbc
        Metadata:
          encoder         : Lavc58.47.102 h264_nvenc
        Side data:
          cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: -1
        Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, delay 1024, 128 kb/s
        Metadata:
          encoder         : Lavc58.47.102 aac
    No more output streams to write to, finishing.
    frame=  125 fps=0.0 q=-1.0 Lsize=   13331kB time=00:00:05.01 bitrate=21783.7kbits/s speed=16.7x    
    video:13249kB audio:78kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.030828%
    Input file #0 (.\1.7TWO.mpg):
      Input stream #0:0 (video): 128 packets read (1379392 bytes); 128 frames decoded; 
      Input stream #0:1 (audio): 210 packets read (120960 bytes); 210 frames decoded (241920 samples); 
      Total: 338 packets (1500352 bytes) demuxed
    Output file #0 (.\1.7TWO.aac.yadif_cuda.works.mp4):
      Output stream #0:0 (video): 125 frames encoded; 125 packets muxed (13566744 bytes); 
      Output stream #0:1 (audio): 235 frames encoded (240000 samples); 236 packets muxed (80158 bytes); 
      Total: 361 packets (13646902 bytes) muxed
    [AVIOContext @ 000002199435d200] Statistics: 2 seeks, 56 writeouts
    [h264_nvenc @ 00000219943cf680] Nvenc unloaded
    [aac @ 000002199435c840] Qavg: 515.211
    [AVIOContext @ 0000021994363d40] Statistics: 3330192 bytes read, 2 seeks
    
    2. ---------------------------------------------------- 
    "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel verbose -stats -i ".\1.7TWO.mpg" -t 05 -map_metadata -1 -vf "format=cuda,yadif_cuda=0:-1:0" -r 25 -c:v h264_nvenc -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -an  -y ".\1.7TWO.aac.yadif_cuda.mp4" 
    ffmpeg version N-93276-g3b23eb283a-test Copyright (c) 2000-2019 the FFmpeg developers
      built with gcc 8.3.0 (GCC)
      configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --disable-w32threads --enable-pthreads --enable-cross-compile --enable-pic --enable-libsoxr --enable-libass --enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug --enable-cuvid --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libgsm --enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack --enable-libwebp --enable-dxva2 --disable-avisynth --enable-vapoursynth --enable-gray --enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg --enable-libx264 --enable-libx265 --enable-libaom --enable-libdav1d --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libxvid --enable-libgme --enable-runtime-cpudetect --enable-libfribidi --enable-gnutls --enable-gmp --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-libbluray --enable-libcdio --enable-libmfx --disable-schannel --enable-ladspa --enable-libxml2 --enable-libdavs2 --enable-libopenmpt --enable-libxavs --enable-libxavs2 --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-opengl --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-gpl --extra-version='test _of_DeadSix27/python_cross_compile_script' --enable-avresample --pkg-config-flags=--static --extra-libs='-lintl -liconv' --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --enable-libbluray --prefix=/home/u/Desktop/workdir/x86_64_products/ffmpeg_static_non_free_opencl.installed --disable-shared --enable-static --enable-cuda-nvcc --enable-nonfree --enable-opencl --enable-nonfree --enable-libfdk-aac --enable-decklink --extra-cflags=-DLIBXML_STATIC --extra-cflags=-DGLIB_STATIC_COMPILATION
      libavutil      56. 26.100 / 56. 26.100
      libavcodec     58. 47.102 / 58. 47.102
      libavformat    58. 26.101 / 58. 26.101
      libavdevice    58.  6.101 / 58.  6.101
      libavfilter     7. 48.100 /  7. 48.100
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  4.100 /  5.  4.100
      libswresample   3.  4.100 /  3.  4.100
      libpostproc    55.  4.100 / 55.  4.100
    Routing option strict to both codec and muxer layer
    [mpeg @ 000001caa72dbd80] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
    Input #0, mpeg, from '.\1.7TWO.mpg':
      Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
        Stream #0:0[0x1e0]: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
        Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
    Stream mapping:
      Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_nvenc))
    Press [q] to stop, [?] for help
    [mpeg @ 000001caa72dbd80] Correcting start time by 10144
    [graph 0 input from stream 0:0 @ 000001caa7346b40] w:720 h:576 pixfmt:yuv420p tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
    [auto_scaler_0 @ 000001caa7806a40] w:iw h:ih flags:'bicubic' interl:0
    [Parsed_format_0 @ 000001caa72c8780] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
    Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'
    Error reinitializing filters!
    Failed to inject frame into filter network: Function not implemented
    Error while processing the decoded data for stream #0:0
    [AVIOContext @ 000001caa73a9780] Statistics: 0 seeks, 0 writeouts
    [AVIOContext @ 000001caa72e4d40] Statistics: 1855632 bytes read, 2 seeks
    Conversion failed!
    ----------------------------------------------------
    Last edited by hydra3333; 2nd Mar 2019 at 20:06.
  16. @hydra3333: As I understand it:
    Hardware filters can be used in a filter graph like any other filter. Note, however, that they may not support any formats in common with software filters; in such cases it may be necessary to make use of hwupload and hwdownload filter instances to move frame data between hardware surfaces and normal memory.
    see: https://trac.ffmpeg.org/wiki/HWAccelIntro
    (haven't tested) hwupload_cuda would be needed before CUDA-based filters when the input isn't decoded through CUDA, and if you use hardware acceleration for decoding but not for filtering, hwdownload is needed.
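
    The rule described above can be sketched as a small helper that decides which transfer filters to wrap around a list of CUDA filters. This is an illustration only: the function and its parameter names are hypothetical, and `sw_format` is assumed to have to match the software pixel format of the GPU frames (typically nv12 for 8-bit NVDEC output, or whatever format was uploaded).

```python
def cuda_filter_chain(decoded_on_gpu, filters, encoder_takes_cuda, sw_format="nv12"):
    """Wrap CUDA filters with the transfers needed to reach and leave GPU memory.

    decoded_on_gpu:     True if frames arrive as CUDA surfaces
                        (e.g. -hwaccel nvdec -hwaccel_output_format cuda).
    filters:            CUDA filter descriptions, e.g. ["yadif_cuda=0:-1:0"].
    encoder_takes_cuda: True for encoders that accept CUDA frames (e.g. h264_nvenc).
    sw_format:          software pixel format to request after downloading (assumed).
    """
    chain = []
    if not decoded_on_gpu:
        chain.append("hwupload_cuda")  # system memory -> CUDA
    chain.extend(filters)
    if not encoder_takes_cuda:
        chain += ["hwdownload", f"format={sw_format}"]  # CUDA -> system memory
    return ",".join(chain)


if __name__ == "__main__":
    # Software decode, CUDA deinterlace, NVENC encode:
    print(cuda_filter_chain(False, ["yadif_cuda=0:-1:0"], True))
    # Hardware decode, no GPU filtering, software encoder:
    print(cuda_filter_chain(True, [], False))
```

    The resulting string would be passed to -vf; the first case corresponds to the failing Example 2 with the upload step added.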

    Cu Selur
  17. Member hydra3333
    Thanks Selur! This worked with:
    Code:
    -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -vf "hwupload_cuda,yadif_cuda=0:-1:0"
    The log:
    Code:
    "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel verbose -stats -i ".\1.7TWO.mpg" -t 05 -map_metadata -1 -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -vf "hwupload_cuda,yadif_cuda=0:-1:0" -r 25 -c:v h264_nvenc -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -an  -y ".\1.7TWO.aac.yadif_cuda.mp4" 
    ffmpeg version N-93276-g3b23eb283a Copyright (c) 2000-2019 the FFmpeg developers
      built with gcc 8.3.0 (GCC)
      configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --disable-w32threads --enable-pthreads --enable-cross-compile --enable-pic --enable-libsoxr --enable-libass --enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug --enable-cuvid --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libgsm --enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack --enable-libwebp --enable-dxva2 --disable-avisynth --enable-vapoursynth --enable-gray --enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg --enable-libx264 --enable-libx265 --enable-libaom --enable-libdav1d --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libxvid --enable-libgme --enable-runtime-cpudetect --enable-libfribidi --enable-gnutls --enable-gmp --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-libbluray --enable-libcdio --enable-libmfx --disable-schannel --enable-ladspa --enable-libxml2 --enable-libdavs2 --enable-libopenmpt --enable-libxavs --enable-libxavs2 --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-opengl --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-gpl --extra-version='DeadSix27/python_cross_compile_script' --enable-avresample --pkg-config-flags=--static --extra-libs='-lintl -liconv' --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --enable-libbluray --prefix=/home/u/Desktop/workdir/x86_64_products/ffmpeg_static_non_free_opencl.installed --disable-shared --enable-static --enable-cuda-nvcc --enable-nonfree --enable-opencl --enable-nonfree --enable-libfdk-aac --enable-decklink --extra-cflags=-DLIBXML_STATIC --extra-cflags=-DGLIB_STATIC_COMPILATION
      libavutil      56. 26.100 / 56. 26.100
      libavcodec     58. 47.102 / 58. 47.102
      libavformat    58. 26.101 / 58. 26.101
      libavdevice    58.  6.101 / 58.  6.101
      libavfilter     7. 48.100 /  7. 48.100
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  4.100 /  5.  4.100
      libswresample   3.  4.100 /  3.  4.100
      libpostproc    55.  4.100 / 55.  4.100
    Routing option strict to both codec and muxer layer
    [AVHWDeviceContext @ 000001306346b6c0] 0.0: NVIDIA CUDA / GeForce GTX 1050 Ti
    [AVHWDeviceContext @ 000001306346b6c0] DXVA2 to OpenCL mapping function found (clCreateFromDX9MediaSurfaceKHR).
    [AVHWDeviceContext @ 000001306346b6c0] DXVA2 in OpenCL acquire function found (clEnqueueAcquireDX9MediaSurfacesKHR).
    [AVHWDeviceContext @ 000001306346b6c0] DXVA2 in OpenCL release function found (clEnqueueReleaseDX9MediaSurfacesKHR).
    [AVHWDeviceContext @ 000001306346b6c0] The cl_khr_d3d11_sharing extension is required for D3D11 to OpenCL mapping.
    [AVHWDeviceContext @ 000001306346b6c0] D3D11 to OpenCL mapping not usable.
    [mpeg @ 000001306346d900] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
    Input #0, mpeg, from '.\1.7TWO.mpg':
      Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
        Stream #0:0[0x1e0]: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
        Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
    Stream mapping:
      Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_nvenc))
    Press [q] to stop, [?] for help
    [mpeg @ 000001306346d900] Correcting start time by 10144
    [graph 0 input from stream 0:0 @ 00000130634aa6c0] w:720 h:576 pixfmt:yuv420p tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
    [h264_nvenc @ 000001306346eac0] Loaded Nvenc version 9.0
    [h264_nvenc @ 000001306346eac0] Nvenc initialized successfully
    [h264_nvenc @ 000001306346eac0] Lookahead enabled: depth 32, scenecut enabled, B-adapt enabled.
    Output #0, mp4, to '.\1.7TWO.aac.yadif_cuda.mp4':
      Metadata:
        encoder         : Lavf58.26.101
        Stream #0:0: Video: h264 (h264_nvenc) (High), 1 reference frame (avc1 / 0x31637661), cuda(left), 720x576 [SAR 64:45 DAR 16:9], q=16-25, 2000 kb/s, 25 fps, 12800 tbn, 25 tbc
        Metadata:
          encoder         : Lavc58.47.102 h264_nvenc
        Side data:
          cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: -1
    frame=  109 fps=0.0 q=18.0 size=     256kB time=00:00:02.76 bitrate= 760.0kbits/s speed= 5.5x    
    No more output streams to write to, finishing.
    [mp4 @ 0000013073866980] Starting second pass: moving the moov atom to the beginning of the file
    color primaries unspecified, assuming bt470bg
    [AVIOContext @ 00000130634f8980] Statistics: 937177 bytes read, 0 seeks
    frame=  125 fps=0.0 q=18.0 Lsize=     917kB time=00:00:04.92 bitrate=1527.3kbits/s speed=8.73x    
    video:915kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.231772%
    Input file #0 (.\1.7TWO.mpg):
      Input stream #0:0 (video): 128 packets read (1379392 bytes); 128 frames decoded; 
      Input stream #0:1 (audio): 0 packets read (0 bytes); 
      Total: 128 packets (1379392 bytes) demuxed
    Output file #0 (.\1.7TWO.aac.yadif_cuda.mp4):
      Output stream #0:0 (video): 125 frames encoded; 125 packets muxed (937129 bytes); 
      Total: 125 packets (937129 bytes) muxed
    [AVIOContext @ 0000013063471400] Statistics: 4 seeks, 11 writeouts
    [h264_nvenc @ 000001306346eac0] Nvenc unloaded
    [AVIOContext @ 0000013063476b00] Statistics: 3330192 bytes read, 2 seeks
    ----------------------------------------------------
  18. Member hydra3333
    One more similar query, if I may.

    I tried various combinations of the yadif_cuda filter and the unsharp_opencl filter (each works OK by itself), but I can't seem to jag something that works with both.
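
    One possible explanation, offered as an untested guess: yadif_cuda outputs CUDA frames while unsharp_opencl expects OpenCL frames, and the filtergraph below hands one straight to the other. Going through system memory between the two (hwdownload, then hwupload onto the OpenCL device selected by -filter_hw_device) might work, at the cost of extra copies. The sketch only assembles such a filtergraph string; the helper name, the yuv420p pixel format and the assumption that this round trip is accepted by both filters are mine, not something confirmed in this thread.

```python
def via_system_memory(cuda_filters, opencl_filters, sw_format="yuv420p"):
    """Chain CUDA filters, then OpenCL filters, copying through system memory in between."""
    download = ["hwdownload", f"format={sw_format}"]
    chain = (
        ["hwupload_cuda"] + list(cuda_filters) + download   # CPU -> CUDA -> CPU
        + ["hwupload"] + list(opencl_filters) + download    # CPU -> OpenCL -> CPU
    )
    return ",".join(chain)


if __name__ == "__main__":
    graph = via_system_memory(
        ["yadif_cuda=0:-1:0"],
        ["unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5"],
    )
    print(graph)
```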



    Code:
    "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel debug -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 05 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000  -y ".\1.7TWO.aac.xxx.mp4" 
    
    ffmpeg version N-93276-g3b23eb283a Copyright (c) 2000-2019 the FFmpeg developers
      built with gcc 8.3.0 (GCC)
      configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --disable-w32threads --enable-pthreads --enable-cross-compile --enable-pic --enable-libsoxr --enable-libass --enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug --enable-cuvid --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libgsm --enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack --enable-libwebp --enable-dxva2 --disable-avisynth --enable-vapoursynth --enable-gray --enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg --enable-libx264 --enable-libx265 --enable-libaom --enable-libdav1d --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libxvid --enable-libgme --enable-runtime-cpudetect --enable-libfribidi --enable-gnutls --enable-gmp --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-libbluray --enable-libcdio --enable-libmfx --disable-schannel --enable-ladspa --enable-libxml2 --enable-libdavs2 --enable-libopenmpt --enable-libxavs --enable-libxavs2 --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-opengl --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-gpl --extra-version='DeadSix27/python_cross_compile_script' --enable-avresample --pkg-config-flags=--static --extra-libs='-lintl -liconv' --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --enable-libbluray --prefix=/home/u/Desktop/workdir/x86_64_products/ffmpeg_static_non_free_opencl.installed --disable-shared --enable-static --enable-cuda-nvcc --enable-nonfree --enable-opencl --enable-nonfree --enable-libfdk-aac --enable-decklink --extra-cflags=-DLIBXML_STATIC --extra-cflags=-DGLIB_STATIC_COMPILATION
      libavutil      56. 26.100 / 56. 26.100
      libavcodec     58. 47.102 / 58. 47.102
      libavformat    58. 26.101 / 58. 26.101
      libavdevice    58.  6.101 / 58.  6.101
      libavfilter     7. 48.100 /  7. 48.100
      libavresample   4.  0.  0 /  4.  0.  0
      libswscale      5.  4.100 /  5.  4.100
      libswresample   3.  4.100 /  3.  4.100
      libpostproc    55.  4.100 / 55.  4.100
    Splitting the commandline.
    Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
    Reading option '-stats' ... matched as option 'stats' (print progress report during encoding) with argument '1'.
    Reading option '-init_hw_device' ... matched as option 'init_hw_device' (initialise hardware device) with argument 'opencl=ocl:0.0'.
    Reading option '-filter_hw_device' ... matched as option 'filter_hw_device' (set hardware device used when filtering) with argument 'ocl'.
    Reading option '-i' ... matched as input url with argument '.\1.7TWO.mpg'.
    Reading option '-t' ... matched as option 't' (record or transcode "duration" seconds of audio/video) with argument '05'.
    Reading option '-map_metadata' ... matched as option 'map_metadata' (set metadata information of outfile from infile) with argument '-1'.
    Reading option '-sws_flags' ... matched as AVOption 'sws_flags' with argument 'lanczos+accurate_rnd+full_chroma_int+full_chroma_inp'.
    Reading option '-filter_complex' ... matched as option 'filter_complex' (create a complex filtergraph) with argument '[0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9'.
    Reading option '-r' ... matched as option 'r' (set frame rate (Hz value, fraction or abbreviation)) with argument '25'.
    Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'h264_nvenc'.
    Reading option '-pix_fmt' ... matched as option 'pix_fmt' (set pixel format) with argument 'nv12'.
    Reading option '-preset' ... matched as AVOption 'preset' with argument 'slow'.
    Reading option '-bf' ... matched as AVOption 'bf' with argument '2'.
    Reading option '-g' ... matched as AVOption 'g' with argument '50'.
    Reading option '-refs' ... matched as AVOption 'refs' with argument '3'.
    Reading option '-rc:v' ... matched as AVOption 'rc:v' with argument 'vbr_hq'.
    Reading option '-rc-lookahead:v' ... matched as AVOption 'rc-lookahead:v' with argument '32'.
    Reading option '-cq' ... matched as AVOption 'cq' with argument '22'.
    Reading option '-qmin' ... matched as AVOption 'qmin' with argument '16'.
    Reading option '-qmax' ... matched as AVOption 'qmax' with argument '25'.
    Reading option '-coder' ... matched as AVOption 'coder' with argument 'cabac'.
    Reading option '-strict' ...Routing option strict to both codec and muxer layer
     matched as AVOption 'strict' with argument 'experimental'.
    Reading option '-movflags' ... matched as AVOption 'movflags' with argument '+faststart+write_colr'.
    Reading option '-profile:v' ... matched as option 'profile' (set profile) with argument 'high'.
    Reading option '-level' ... matched as AVOption 'level' with argument '4.1'.
    Reading option '-af' ... matched as option 'af' (set audio filters) with argument 'loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary'.
    Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'libfdk_aac'.
    Reading option '-cutoff' ... matched as AVOption 'cutoff' with argument '18000'.
    Reading option '-ab' ... matched as option 'ab' (audio bitrate (please use -b:a)) with argument '384k'.
    Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '48000'.
    Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
    Reading option '.\1.7TWO.aac.xxx.mp4' ... matched as output url.
    Finished splitting the commandline.
    Parsing a group of options: global .
    Applying option loglevel (set logging level) with argument debug.
    Applying option stats (print progress report during encoding) with argument 1.
    Applying option init_hw_device (initialise hardware device) with argument opencl=ocl:0.0.
    [AVHWDeviceContext @ 0000021d0db3b400] 2 OpenCL platforms found.
    [AVHWDeviceContext @ 0000021d0db3b400] 1 OpenCL devices found on platform "NVIDIA CUDA".
    [AVHWDeviceContext @ 0000021d0db3b400] 0.0: NVIDIA CUDA / GeForce GTX 1050 Ti
    [AVHWDeviceContext @ 0000021d0db3b400] DXVA2 to OpenCL mapping function found (clCreateFromDX9MediaSurfaceKHR).
    [AVHWDeviceContext @ 0000021d0db3b400] DXVA2 in OpenCL acquire function found (clEnqueueAcquireDX9MediaSurfacesKHR).
    [AVHWDeviceContext @ 0000021d0db3b400] DXVA2 in OpenCL release function found (clEnqueueReleaseDX9MediaSurfacesKHR).
    [AVHWDeviceContext @ 0000021d0db3b400] The cl_khr_d3d11_sharing extension is required for D3D11 to OpenCL mapping.
    [AVHWDeviceContext @ 0000021d0db3b400] D3D11 to OpenCL mapping not usable.
    Applying option filter_hw_device (set hardware device used when filtering) with argument ocl.
    Applying option filter_complex (create a complex filtergraph) with argument [0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9.
    Applying option y (overwrite output files) with argument 1.
    Successfully parsed a group of options.
    Parsing a group of options: input url .\1.7TWO.mpg.
    Successfully parsed a group of options.
    Opening an input file: .\1.7TWO.mpg.
    [NULL @ 0000021d0db3f140] Opening '.\1.7TWO.mpg' for reading
    [file @ 0000021d0db3f840] Setting default whitelist 'file,crypto'
    [mpeg @ 0000021d0db3f140] Format mpeg probed with size=2048 and score=26
    [mpeg @ 0000021d0db3f140] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:0
    [mpeg @ 0000021d0db3f140] probing stream 0 pp:2500
    [mpeg @ 0000021d0db3f140] Probe with size=2012, packets=1 detected mpegvideo with score=25
    [mpeg @ 0000021d0db3f140] probed stream 0
    [mpeg2video @ 0000021d0db51c80] Format yuv420p chosen by get_format().
    [mpeg @ 0000021d0db3f140] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
    [mpeg @ 0000021d0db3f140] After avformat_find_stream_info() pos: 0 bytes read:1790096 seeks:2 frames:337
    Input #0, mpeg, from '.\1.7TWO.mpg':
      Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
        Stream #0:0[0x1e0], 127, 1/90000: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 0/1, 25 fps, 25 tbr, 90k tbn, 50 tbc
        Stream #0:1[0x1c0], 210, 1/90000: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
    Successfully opened the file.
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded lib: nvcuda.dll
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuInit
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetCount
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGet
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetAttribute
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetName
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceComputeCapability
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxCreate_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxSetLimit
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxPushCurrent_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxPopCurrent_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxDestroy_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemAlloc_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemAllocPitch_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemsetD8Async
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemFree_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemcpy2D_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemcpy2DAsync_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGetErrorName
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGetErrorString
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamCreate
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamQuery
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamSynchronize
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamDestroy_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamAddCallback
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventCreate
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventDestroy_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventSynchronize
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventQuery
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventRecord
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuLaunchKernel
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuModuleLoadData
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuModuleUnload
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuModuleGetFunction
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuTexObjectCreate
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuTexObjectDestroy
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGLGetDevices_v2
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsGLRegisterImage
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsUnregisterResource
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsMapResources
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsUnmapResources
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsSubResourceGetMappedArray
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetUuid
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuImportExternalMemory
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDestroyExternalMemory
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuExternalMemoryGetMappedBuffer
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMipmappedArrayGetLevel
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMipmappedArrayDestroy
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuImportExternalSemaphore
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDestroyExternalSemaphore
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuSignalExternalSemaphoresAsync
    [AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuWaitExternalSemaphoresAsync
    [Parsed_yadif_cuda_1 @ 0000021d0db88800] Setting 'mode' to value '0'
    [Parsed_yadif_cuda_1 @ 0000021d0db88800] Setting 'parity' to value '-1'
    [Parsed_yadif_cuda_1 @ 0000021d0db88800] Setting 'deint' to value '0'
    [Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'lx' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'ly' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'la' to value '0.5'
    [Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'cx' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'cy' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'ca' to value '0.5'
    [Parsed_format_4 @ 0000021d0db89580] Setting 'pix_fmts' to value 'yuv420p'
    [Parsed_setdar_5 @ 0000021d0db89840] Setting 'dar' to value '16/9'
    Parsing a group of options: output url .\1.7TWO.aac.xxx.mp4.
    Applying option t (record or transcode "duration" seconds of audio/video) with argument 05.
    Applying option map_metadata (set metadata information of outfile from infile) with argument -1.
    Applying option r (set frame rate (Hz value, fraction or abbreviation)) with argument 25.
    Applying option c:v (codec name) with argument h264_nvenc.
    Applying option pix_fmt (set pixel format) with argument nv12.
    Applying option profile:v (set profile) with argument high.
    Applying option af (set audio filters) with argument loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary.
    Applying option c:a (codec name) with argument libfdk_aac.
    Applying option ab (audio bitrate (please use -b:a)) with argument 384k.
    Applying option ar (set audio sampling rate (in Hz)) with argument 48000.
    Successfully parsed a group of options.
    Opening an output file: .\1.7TWO.aac.xxx.mp4.
    [file @ 0000021d0db46d00] Setting default whitelist 'file,crypto'
    Successfully opened the file.
    detected 8 logical cores
    Stream mapping:
      Stream #0:0 (mpeg2video) -> hwupload_cuda (graph 0)
      setdar (graph 0) -> Stream #0:0 (h264_nvenc)
      Stream #0:1 -> #0:1 (mp2 (native) -> aac (libfdk_aac))
    Press [q] to stop, [?] for help
    cur_dts is invalid (this is harmless if it occurs once at the start per stream)
    [mpeg2video @ 0000021d0db50800] Format yuv420p chosen by get_format().
    cur_dts is invalid (this is harmless if it occurs once at the start per stream)
    [AVHWDeviceContext @ 0000021d2443a800] Loaded lib: nvcuda.dll
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuInit
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetCount
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGet
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetAttribute
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetName
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceComputeCapability
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxCreate_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxSetLimit
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxPushCurrent_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxPopCurrent_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxDestroy_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemAlloc_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemAllocPitch_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemsetD8Async
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemFree_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemcpy2D_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemcpy2DAsync_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGetErrorName
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGetErrorString
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamCreate
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamQuery
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamSynchronize
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamDestroy_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamAddCallback
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventCreate
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventDestroy_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventSynchronize
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventQuery
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventRecord
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuLaunchKernel
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuModuleLoadData
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuModuleUnload
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuModuleGetFunction
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuTexObjectCreate
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuTexObjectDestroy
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGLGetDevices_v2
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsGLRegisterImage
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsUnregisterResource
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsMapResources
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsUnmapResources
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsSubResourceGetMappedArray
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetUuid
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuImportExternalMemory
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDestroyExternalMemory
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuExternalMemoryGetMappedBuffer
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMipmappedArrayGetLevel
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMipmappedArrayDestroy
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuImportExternalSemaphore
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDestroyExternalSemaphore
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuSignalExternalSemaphoresAsync
    [AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuWaitExternalSemaphoresAsync
    [Parsed_yadif_cuda_1 @ 0000021d0c36ee80] Setting 'mode' to value '0'
    [Parsed_yadif_cuda_1 @ 0000021d0c36ee80] Setting 'parity' to value '-1'
    [Parsed_yadif_cuda_1 @ 0000021d0c36ee80] Setting 'deint' to value '0'
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'lx' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'ly' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'la' to value '0.5'
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'cx' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'cy' to value '3'
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'ca' to value '0.5'
    [Parsed_format_4 @ 0000021d1e0acc80] Setting 'pix_fmts' to value 'yuv420p'
    [Parsed_setdar_5 @ 0000021d2443c040] Setting 'dar' to value '16/9'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'video_size' to value '720x576'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'pix_fmt' to value '0'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'time_base' to value '1/90000'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'pixel_aspect' to value '64/45'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'sws_param' to value 'flags=2'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'frame_rate' to value '25/1'
    [graph 0 input from stream 0:0 @ 0000021d2443d300] w:720 h:576 pixfmt:yuv420p tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
    [format @ 0000021d2443d580] Setting 'pix_fmts' to value 'nv12'
    [auto_scaler_0 @ 0000021d0db5a200] w:iw h:ih flags:'bilinear' interl:0
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_yadif_cuda_1' and the filter 'Parsed_unsharp_opencl_2'
    Impossible to convert between the formats supported by the filter 'Parsed_yadif_cuda_1' and the filter 'auto_scaler_0'
    Error reinitializing filters!
    Failed to inject frame into filter network: Function not implemented
    Error while processing the decoded data for stream #0:0
    [AVIOContext @ 0000021d1dfd8140] Statistics: 0 seeks, 0 writeouts
    [AVIOContext @ 0000021d0db47a80] Statistics: 1855632 bytes read, 2 seeks
    Conversion failed!
  19. @ hydra3333: for readability, please split the actual call and the output into two code-blocks
    looking at the filter_complex:
    Code:
    -filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9"
    and
    Code:
    [AVHWDeviceContext @ 0000021d0db3b400] The cl_khr_d3d11_sharing extension is required for D3D11 to OpenCL mapping.
    [AVHWDeviceContext @ 0000021d0db3b400] D3D11 to OpenCL mapping not usable
    ...
    [Parsed_unsharp_opencl_2 @ 0000021d0db67000] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_yadif_cuda_1' and the filter 'Parsed_unsharp_opencl_2'
    Impossible to convert between the formats supported by the filter 'Parsed_yadif_cuda_1' and the filter 'auto_scaler_0'
    .
    I wonder whether adding 'hwdownload,format=pix_fmts=yuv420p,hwupload' before unsharp_opencl, and thus explicitly setting the output format of the yadif_cuda filter, would work?
    I know that uploading the data for hw filtering, then downloading and uploading it again for the next filter, and then downloading it once more probably isn't the best solution, but it might help with the problem.
    (again untested and basically the first thing that jumped to my mind,..)
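    A minimal sketch of that idea, just assembling the -filter_complex string in Python. The helper name `build_chain` is hypothetical, the filter names are the ones from the failing command, and like the suggestion itself this has not been run against an actual ffmpeg build:

```python
# Filters taken from the failing -filter_complex above.
CUDA_PART = ["hwupload_cuda", "yadif_cuda=0:-1:0"]
OPENCL_PART = ["unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5"]
TAIL = ["hwdownload", "format=pix_fmts=yuv420p", "setdar=dar=16/9"]

# Proposed bridge: bring frames back to system memory as yuv420p,
# then upload them again before the OpenCL filter.
BRIDGE = ["hwdownload", "format=pix_fmts=yuv420p", "hwupload"]


def build_chain(input_label="[0:v]"):
    """Return the -filter_complex string with the bridge inserted (hypothetical helper)."""
    filters = CUDA_PART + BRIDGE + OPENCL_PART + TAIL
    return input_label + ",".join(filters)


print(build_chain())
```

    The printed string is what would go between the quotes after -filter_complex; the rest of the command line stays unchanged.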

    Cu Selur
    users currently on my ignore list: deadrats, Stears555
  20. Dinosaur Supervisor KarMa's Avatar
    Join Date
    Jul 2015
    Location
    US
    Originally Posted by veresov View Post
    Ah, I get it, you're too poor to afford a decent GPU. You have my sympathy.
    I've personally known people using DGSource over 5 years ago but the software has probably been out even longer. Back then you could just buy the cheapest NVidia GPU on the market which was more than enough to decode faster than you could encode. With this added Yadif CUDA on top of DGSource, I would still think that the cheapest Nvidia GPU on the market would be more than enough, or even a used GeForce 740 would be more than enough. So your money argument isn't really valid.

    Originally Posted by veresov View Post
    Your 64 fps single-rate case is pitifully slow. Here is DGSource + DGBob (DGBob is a CUDA YADIF clone). It's running over 6 times faster.

    I'm baffled why people use CPU solutions for problems that CUDA/NVDec excels at.
    Do you encode 6 times faster thanks to CUDA decoding? As for me with software decoding and filtering, it's usually only a 10% overhead for 1080p MPEG2 video decoding + yadif. With 1080p H.264 software decoding + yadif it's probably 20%. So unless I'm trying to do super fast - low quality encoding in either software or hw, it's not really that useful to me. Even though I have HW decoding options for avisynth.

    It's been well noted that GPU based video decoders are more prone to decoding errors than software decoders, so that's one reason why someone might not want to.
  21. Member hydra3333's Avatar
    Join Date
    Oct 2009
    Location
    Australia
    Originally Posted by Selur View Post
    @ hydra3333: for readability, please split the actual call and the output into two code-blocks
    OK, good point, thanks. Your suggestion worked.

    Well I guess a test answered the speed question.

    1. vanilla yadif followed by unsharp_opencl
    "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel warning -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 60 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]yadif=0:0:0,hwupload,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000 -y ".\1.7TWO.aac.standard.mp4"
    Code:
    frame= 1500 fps=142 q=18.0 Lsize=   15010kB time=00:01:00.01 bitrate=2049.0kbits/s speed=5.66x
    2. yadif_cuda followed by unsharp_opencl
    "C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel warning -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 60 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,hwdownload,format=pix_fmts=yuv420p,hwupload,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000 -y ".\1.7TWO.aac.yadif_cuda.opencl.mp4"
    Code:
    frame= 1500 fps=125 q=18.0 Lsize=   15000kB time=00:01:00.01 bitrate=2047.7kbits/s speed=5.01x
    I suppose it's the data copies to/from the GPU that do it in.
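    To put a rough number on the difference, here is a back-of-envelope sketch. Python is used only for the arithmetic, the function name is made up for illustration, and the fps values are simply copied from the two progress lines above (a single 60-second run each, so not a rigorous benchmark):

```python
# fps values copied from the two -stats lines above
FPS_YADIF_CPU = 142   # 1. vanilla yadif followed by unsharp_opencl
FPS_YADIF_CUDA = 125  # 2. yadif_cuda followed by unsharp_opencl


def slowdown_percent(baseline_fps, other_fps):
    """How much slower other_fps is than baseline_fps, in percent."""
    return (baseline_fps - other_fps) / baseline_fps * 100.0


pct = slowdown_percent(FPS_YADIF_CPU, FPS_YADIF_CUDA)
print(f"yadif_cuda chain is about {pct:.1f}% slower")
```

    On these figures the yadif_cuda chain comes out roughly 12% slower than the CPU yadif chain.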

    If only I were able to cross-compile an ffmpeg with vapoursynth inbuilt (there are no simple-to-follow step-by-step instructions), then using a single ffmpeg.exe would be "painless" and insanely fast with DG's latest gear.

    Oh well.
  22. I suppose it's the data copies to/from the GPU that do it in.
    Probably. The upload/download is time-consuming; I'd recommend asking on the ffmpeg bug tracker, IRC channel or mailing list whether there is a better/faster way to do this.
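    As a rough illustration of how much copying each chain implies, the sketch below counts the upload/download filters in the two filter chains from the test and multiplies by the frame size. It assumes every transfer moves one full 720x576 8-bit 4:2:0 frame with no padding, which is a simplification; the function names are hypothetical:

```python
# Frame geometry from the input stream in the log: 720x576, yuv420p (8-bit 4:2:0).
WIDTH, HEIGHT = 720, 576
BYTES_PER_FRAME = WIDTH * HEIGHT * 3 // 2  # luma plane + two quarter-size chroma planes

# The two filter chains from the speed test.
CHAIN_CPU_YADIF = ("yadif=0:0:0,hwupload,"
                   "unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,"
                   "hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9")
CHAIN_CUDA_YADIF = ("hwupload_cuda,yadif_cuda=0:-1:0,"
                    "hwdownload,format=pix_fmts=yuv420p,hwupload,"
                    "unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,"
                    "hwdownload,format=pix_fmts=yuv420p")


def count_transfers(chain):
    """Return (uploads, downloads) for a comma-separated filter chain."""
    names = [f.split("=", 1)[0] for f in chain.split(",")]
    uploads = sum(n in ("hwupload", "hwupload_cuda") for n in names)
    downloads = names.count("hwdownload")
    return uploads, downloads


def bytes_copied_per_frame(chain):
    """Estimated bytes moved between host and GPU per frame (no padding assumed)."""
    uploads, downloads = count_transfers(chain)
    return (uploads + downloads) * BYTES_PER_FRAME


for label, chain in (("yadif", CHAIN_CPU_YADIF), ("yadif_cuda", CHAIN_CUDA_YADIF)):
    print(label, count_transfers(chain), bytes_copied_per_frame(chain))
```

    Under these assumptions the yadif_cuda chain does twice as many copies (two uploads and two downloads instead of one of each), about 2.5 MB instead of 1.2 MB per frame. That is only around 0.3 GB/s at 125 fps, so the raw copy volume alone may not be the whole story; per-transfer overhead could matter as well, which is another reason to ask the ffmpeg people.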

    As a side note, if you want all that in one binary, why not use NVEncC?
  23. Member hydra3333's Avatar
    Join Date
    Oct 2009
    Location
    Australia
    NVEncC ? Good point.
    I do have it, but haven't checked it out properly eg in regard to deinterlacing. I must do so.
    Also whether it can take vapoursynth input directly nowadays, I guess.
    I seem to recall some audio passing/processing challenge (edit: ah, normalization per the ffmpeg commandlines above) and something to do with needing to pipe NUT format video, maybe that's all no longer relevant.
    At the time, I think I had formed the naive view that I somehow trusted a homebuilt ffmpeg a tad more.
  24. Member hydra3333's Avatar
    Join Date
    Oct 2009
    Location
    Australia
    Ah. NVEncC ... I don't know how to also read/pass audio from the .mpg source file through vapoursynth (DG's h/w reader and deinterlacer and sharpener) into nvencc for re-encoding.
    I suppose volume leveling (eg with loudnorm) would have to be done separately ?
    An issue is TV capture .mpg files with a large-ish internal audio/video offset, which some s/w handles but other s/w doesn't.

    Does anyone do this stuff ? I suppose everyone must, to get a usable final video, but what is it that people do ?
  25. Missed that you needed Vapoursynth, thought you wanted to open the source and use some cuda filters,..
    Vapoursynth can't handle audio. (since NVEncC is built against libav, most of the stuff ffmpeg supports should be possible with it when reading a file source,..)
  26. Member hydra3333's Avatar
    Join Date
    Oct 2009
    Location
    Australia
    Originally Posted by Selur View Post
    Missed that you needed Vapoursynth, thought you wanted to open the source and use some cuda filters,..
    Vapoursynth can't handle audio. (since NVEncC is built against libav, most of the stuff ffmpeg supports should be possible with it when reading a file source,..)
    I did. With your and others' good info and a test or two, it seemed prudent to consider changing tack.
    It seems I haven't looked closely enough at the nvencc doco; I'll do a range of testing then.
    Thanks !


