FFMPEG and new GPU-based YADIF_CUDA deinterlacer

2nd Mar 2019 01:07 #1

Member

It seems ffmpeg can now be cross-compiled (as a non-free build, using the new NVIDIA CUDA 10.1 toolkit) with the built-in GPU-based yadif_cuda deinterlacer mentioned in the NVIDIA forum here: https://devtalk.nvidia.com/default/topic/1042459/video-codec-and-optical-flow-sdk/nvde...ing-deint-2-/2
The ffmpeg documentation is here: https://ffmpeg.org/ffmpeg-filters.html#yadif_005fcuda

Has anyone done any comparative speed tests of the (hopefully extremely fast!) GPU-based yadif_cuda deinterlacer against vanilla yadif, e.g. with mode=0:parity=-1:deint=0?

Suggestions, comments and command lines would be very welcome (although I would prefer not to use the NVDEC H.264 hardware decoder in ffmpeg).
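
For anyone wanting to run such a comparison, here is a minimal sketch of two command lines that differ only in the deinterlacer. It assumes an ffmpeg build that includes yadif_cuda and a hypothetical input file named input.mpg; the helper name benchmark_cmd is made up for illustration. Decoding stays in software, and the output is discarded with -f null so that encoder speed does not mask the filter cost. The GPU variant downloads the frames again (assuming yuv420p surfaces) so both variants end with frames in system memory.

```python
import shlex

def benchmark_cmd(src, use_cuda, mode=0, parity=-1, deint=0):
    """Build an ffmpeg command that deinterlaces src and discards the result."""
    opts = f"{mode}:{parity}:{deint}"
    if use_cuda:
        # Upload to the GPU, deinterlace there, then copy back to system memory.
        # format=yuv420p is an assumption that matches 8-bit 4:2:0 input.
        vf = f"hwupload_cuda,yadif_cuda={opts},hwdownload,format=yuv420p"
    else:
        vf = f"yadif={opts}"
    return ["ffmpeg", "-hide_banner", "-benchmark", "-i", src,
            "-an", "-vf", vf, "-f", "null", "-"]

for use_cuda in (False, True):
    print(shlex.join(benchmark_cmd("input.mpg", use_cuda)))
```

Comparing the speed= value on the last status line of each run (or the timings printed by -benchmark) would give a rough idea of the difference on a given machine.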

2nd Mar 2019 09:28 #2

Atak_Snajpera

Member

CPU Yadif is already super fast, so I see no real benefit in using a GPU implementation.

AVC-HD 1920x1080@50i
no deinterlacing

Code:

AVSMeter 2.2.6 (x64)
AviSynth+ 0.1 (r2772, MT, x86_64) (0.1.0.0)
Loading script...

Number of frames:                  829
Length (hh:mm:ss.ms):     00:00:33.160
Frame width:                      1920
Frame height:                     1080
Framerate:                      25.000 (25/1)
Colorspace:                       YV12

Frames processed:               829 (0 - 828)
FPS (min | max | average):      5.563 | 3184 | 65.20
Memory usage (phys | virt):     199 | 122 MiB
Thread count:                   33
CPU usage (average):            10%

Time (elapsed):                 00:00:12.714

YADIF 50p

Code:

AVSMeter 2.2.6 (x64)
AviSynth+ 0.1 (r2772, MT, x86_64) (0.1.0.0)
Loading script...

Number of frames:                 1658
Length (hh:mm:ss.ms):     00:00:33.160
Frame width:                      1920
Frame height:                     1080
Framerate:                      50.000 (50/1)
Colorspace:                       YV12

Frames processed:               1658 (0 - 1657)
FPS (min | max | average):      5.810 | 199.0 | 123.2
Memory usage (phys | virt):     221 | 143 MiB
Thread count:                   33
CPU usage (average):            14%

Time (elapsed):                 00:00:13.454

YADIF 25p

Code:

AVSMeter 2.2.6 (x64)
AviSynth+ 0.1 (r2772, MT, x86_64) (0.1.0.0)
Loading script...

Number of frames:                  829
Length (hh:mm:ss.ms):     00:00:33.160
Frame width:                      1920
Frame height:                     1080
Framerate:                      25.000 (25/1)
Colorspace:                       YV12

Frames processed:               829 (0 - 828)
FPS (min | max | average):      5.787 | 188.6 | 64.41
Memory usage (phys | virt):     211 | 132 MiB
Thread count:                   33
CPU usage (average):            13%

Time (elapsed):                 00:00:12.870

2nd Mar 2019 10:23 #3

veresov

Banned

Your 64 fps single-rate case is pitifully slow. Here is DGSource + DGBob (DGBob is a CUDA YADIF clone). It's running over 6 times faster.

Code:

AVSMeter 2.7.5 (x64) - Copyright (c) 2012-2017, Groucho2004
AviSynth+ 0.1 (r2728, MT, x86_64) (0.1.0.0)

Number of frames:                 3645
Length (hh:mm:ss.ms):     00:02:01.622
Frame width:                      1920
Frame height:                     1080
Framerate:                      29.970 (30000/1001)
Colorspace:                       YV12

Frames processed:               3645 (0 - 3644)
FPS (min | max | average):      99.05 | 408.2 | 393.4
Memory usage (phys | virt):     268 | 755 MiB
Thread count:                   21
CPU usage (average):            14%

Time (elapsed):                 00:00:09.266

I'm baffled why people use CPU solutions for problems that CUDA/NVDec excels at.
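
As a sanity check on the figures quoted so far, the average FPS values in the AVSMeter logs are consistent with frames processed divided by elapsed time. The sketch below is only an illustration (it is not AVSMeter code); it recomputes the averages from the logs above and the ratio behind the "over 6 times" claim. The runs use different clips (829 frames at 25 fps versus 3645 frames at 29.97 fps) and different source filters, so the ratio is indicative rather than a controlled comparison.

```python
def avg_fps(frames, elapsed_s):
    """Average throughput: frames processed per second of wall-clock time."""
    return frames / elapsed_s

# (frames processed, elapsed seconds) copied from the AVSMeter logs in this thread
runs = {
    "no deinterlacing": (829, 12.714),
    "YADIF 50p": (1658, 13.454),
    "YADIF 25p": (829, 12.870),
    "DGSource + DGBob": (3645, 9.266),
}

for name, (frames, secs) in runs.items():
    print(f"{name}: {avg_fps(frames, secs):.1f} fps")

speedup = avg_fps(*runs["DGSource + DGBob"]) / avg_fps(*runs["YADIF 25p"])
print(f"DGBob vs YADIF 25p: {speedup:.2f}x")
```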

2nd Mar 2019 10:26 #4

Atak_Snajpera

Member

Originally Posted by veresov

Your 64 fps single-rate case is pitifully slow. Here is DGSource + DGBob (DGBob is my own CUDA YADIF clone). It's running over 6 times faster.

Try again with a software decoder (ffms2)...

2nd Mar 2019 10:29 #5

veresov

Banned

Not interested in sacrificing performance. You can try with a CUDA decoder.

Also, one of the points you apparently miss when you say you see no benefit is that offloading to the GPU frees up CPU.

If we did this for 4K video your solution would go from poor to miserable, whereas mine would retain very high performance.

Finally, further gains in the CUDA solution can be had by using the CUDASynth framework.

Last edited by veresov; 2nd Mar 2019 at 10:38.

2nd Mar 2019 10:41 #6

Atak_Snajpera

Member

Originally Posted by veresov

If we did this for 4K video your solution would go from poor to miserable, whereas mine would retain very high performance.

Yeah... because 3840x2160@60i is soooooooo common...

Originally Posted by veresov

Not interested in sacrificing performance. You can try with a CUDA decoder.

Also, one of the points you apparently miss when you say you see no benefit is that offloading to the GPU frees up CPU.

Add an encoder to the mix and you will notice this.

I'm using an E5-2690 + x264 (default medium preset, CRF 20) + FFMS2 + YADIF at 50 fps, and the decoding process uses only ~8% of my CPU. I doubt that you can reduce that to 0% with your CUDA decoder + deinterlacer. A hardware decoder in this task is basically a placebo.

2nd Mar 2019 10:45 #7

veresov

Banned

Stay with your low performance if you like. I don't care.

2nd Mar 2019 10:50 #8

Atak_Snajpera

Member

Originally Posted by veresov

Stay with your low performance if you like. I don't care.

Enjoy your placebo effect during encoding with x264/x265/AV1 and so on. Mr. Neuron2

2nd Mar 2019 10:52 #9

veresov

Banned

I use NVEnc. Why would I want to choose a low performance solution?

2nd Mar 2019 10:56 #10

Atak_Snajpera

Member

Originally Posted by veresov

I use NVEnc. Why would I want to choose a low performance solution?

Because NVenc sucks in terms of fine detail retention at low bitrates?

Just for lulz, the same but with the default x265 preset.

Yeah. In this case I could save a whole 1% by using hardware decoding + deinterlacing. AMAZING! SHUT UP AND TAKE ALL MY PESOS!

2nd Mar 2019 11:01 #11

veresov

Banned

Ah, I get it, you're too poor to afford a decent GPU. You have my sympathy.

2nd Mar 2019 11:05 #12

Atak_Snajpera

Member

Originally Posted by veresov

Ah, I get it, you're too poor to afford a decent GPU. You have my sympathy.

Lack of arguments, I sense here...

2nd Mar 2019 11:10 #13

veresov

Banned

Your ability to sense and understand things will surely improve as you mature.

Last edited by veresov; 2nd Mar 2019 at 12:34.

2nd Mar 2019 17:20 #14

hydra3333

Member

While I'm open to information, suggestions and discussion, I am not sure I follow the line of reasoning ... it seems to be along the lines of "don't bother to use something faster"?
That may be a valid conclusion at this point in some circumstances, although it does seem counter-intuitive.

One would reasonably have thought that, as videos inevitably grow in size and complexity over time (e.g. recall old phone .3gp clips vs the 1080i that seems ubiquitous nowadays), faster tools would be desirable.

Some like NVDec. In vapoursynth, using DG's handy and fast tools, I do too; with vanilla command-line ffmpeg, however, some (a minority of?) streams seem more problematic, possibly ending up with out-of-sync audio, which isn't great in automated workflows.

Still, perhaps I wasn't very clear in the initial post: I was more interested in quality and performance information about the recently updated yadif_cuda filter for single-command-line ("quick'n'dirty") ffmpeg deinterlacing. If I wanted fine control on video sources that are worth the effort, I'd use vapoursynth with DG's amazing gear and the large range of great filters available for it.

Thank you for your informative comments, though !

Last edited by hydra3333; 2nd Mar 2019 at 17:38.

2nd Mar 2019 19:38 #15

hydra3333

Member

Hello, I'm seeking help using the newly updated ffmpeg filter yadif_cuda.

I'm obviously doing something wrong in (not?) converting between formats in the second command line, but I don't know what.

Example 1 works, using NVDEC to decode the source.
Example 2 fails, using ffmpeg's native (software) mpeg2 decoder.

Can anyone please clarify?

Code:

1. ---------------------------------------------------- 
"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel verbose -stats -hwaccel nvdec -hwaccel_output_format cuda -i ".\1.7TWO.mpg" -t 05 -vf yadif_cuda=0:-1:0 -c:v h264_nvenc -preset lossless -f mp4 -y ".\1.7TWO.aac.yadif_cuda.works.mp4" 
ffmpeg version N-93276-g3b23eb283a-test Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.3.0 (GCC)
  configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --disable-w32threads --enable-pthreads --enable-cross-compile --enable-pic --enable-libsoxr --enable-libass --enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug --enable-cuvid --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libgsm --enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack --enable-libwebp --enable-dxva2 --disable-avisynth --enable-vapoursynth --enable-gray --enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg --enable-libx264 --enable-libx265 --enable-libaom --enable-libdav1d --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libxvid --enable-libgme --enable-runtime-cpudetect --enable-libfribidi --enable-gnutls --enable-gmp --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-libbluray --enable-libcdio --enable-libmfx --disable-schannel --enable-ladspa --enable-libxml2 --enable-libdavs2 --enable-libopenmpt --enable-libxavs --enable-libxavs2 --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-opengl --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-gpl --extra-version='test _of_DeadSix27/python_cross_compile_script' --enable-avresample --pkg-config-flags=--static --extra-libs='-lintl -liconv' --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --enable-libbluray --prefix=/home/u/Desktop/workdir/x86_64_products/ffmpeg_static_non_free_opencl.installed --disable-shared --enable-static --enable-cuda-nvcc --enable-nonfree --enable-opencl --enable-nonfree --enable-libfdk-aac --enable-decklink --extra-cflags=-DLIBXML_STATIC --extra-cflags=-DGLIB_STATIC_COMPILATION
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 47.102 / 58. 47.102
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 48.100 /  7. 48.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
[mpeg @ 000002199435b3c0] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
Input #0, mpeg, from '.\1.7TWO.mpg':
  Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_nvenc))
  Stream #0:1 -> #0:1 (mp2 (native) -> aac (native))
Press [q] to stop, [?] for help
[mpeg2video @ 000002199436d380] NVDEC capabilities:
[mpeg2video @ 000002199436d380] format supported: yes, max_mb_count: 65280
[mpeg2video @ 000002199436d380] min_width: 48, max_width: 4080
[mpeg2video @ 000002199436d380] min_height: 16, max_height: 4080
[graph 0 input from stream 0:0 @ 000002199486ffc0] w:720 h:576 pixfmt:cuda tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
[h264_nvenc @ 00000219943cf680] Loaded Nvenc version 9.0
[h264_nvenc @ 00000219943cf680] Nvenc initialized successfully
[graph_1_in_0_1 @ 00000219943b3f00] tb:1/48000 samplefmt:s16p samplerate:48000 chlayout:0x3
[format_out_0_1 @ 00000219943b4800] auto-inserting filter 'auto_resampler_0' between the filter 'Parsed_anull_0' and the filter 'format_out_0_1'
[auto_resampler_0 @ 00000219943b3800] ch:2 chl:stereo fmt:s16p r:48000Hz -> ch:2 chl:stereo fmt:fltp r:48000Hz
Output #0, mp4, to '.\1.7TWO.aac.yadif_cuda.works.mp4':
  Metadata:
    encoder         : Lavf58.26.101
    Stream #0:0: Video: h264 (h264_nvenc), 1 reference frame (avc1 / 0x31637661), cuda(progressive, left), 720x576 [SAR 64:45 DAR 16:9], q=-1--1, 2000 kb/s, 25 fps, 12800 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.47.102 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: -1
    Stream #0:1: Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, delay 1024, 128 kb/s
    Metadata:
      encoder         : Lavc58.47.102 aac
No more output streams to write to, finishing.
frame=  125 fps=0.0 q=-1.0 Lsize=   13331kB time=00:00:05.01 bitrate=21783.7kbits/s speed=16.7x    
video:13249kB audio:78kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.030828%
Input file #0 (.\1.7TWO.mpg):
  Input stream #0:0 (video): 128 packets read (1379392 bytes); 128 frames decoded; 
  Input stream #0:1 (audio): 210 packets read (120960 bytes); 210 frames decoded (241920 samples); 
  Total: 338 packets (1500352 bytes) demuxed
Output file #0 (.\1.7TWO.aac.yadif_cuda.works.mp4):
  Output stream #0:0 (video): 125 frames encoded; 125 packets muxed (13566744 bytes); 
  Output stream #0:1 (audio): 235 frames encoded (240000 samples); 236 packets muxed (80158 bytes); 
  Total: 361 packets (13646902 bytes) muxed
[AVIOContext @ 000002199435d200] Statistics: 2 seeks, 56 writeouts
[h264_nvenc @ 00000219943cf680] Nvenc unloaded
[aac @ 000002199435c840] Qavg: 515.211
[AVIOContext @ 0000021994363d40] Statistics: 3330192 bytes read, 2 seeks

2. ---------------------------------------------------- 
"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel verbose -stats -i ".\1.7TWO.mpg" -t 05 -map_metadata -1 -vf "format=cuda,yadif_cuda=0:-1:0" -r 25 -c:v h264_nvenc -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -an  -y ".\1.7TWO.aac.yadif_cuda.mp4" 
Routing option strict to both codec and muxer layer
[mpeg @ 000001caa72dbd80] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
Input #0, mpeg, from '.\1.7TWO.mpg':
  Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
[mpeg @ 000001caa72dbd80] Correcting start time by 10144
[graph 0 input from stream 0:0 @ 000001caa7346b40] w:720 h:576 pixfmt:yuv420p tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
[auto_scaler_0 @ 000001caa7806a40] w:iw h:ih flags:'bicubic' interl:0
[Parsed_format_0 @ 000001caa72c8780] auto-inserting filter 'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
[AVIOContext @ 000001caa73a9780] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 000001caa72e4d40] Statistics: 1855632 bytes read, 2 seeks
Conversion failed!
----------------------------------------------------

Last edited by hydra3333; 2nd Mar 2019 at 20:06.

3rd Mar 2019 02:43 #16

Selur

Member

@hydra3333: As I understand it:

Hardware filters can be used in a filter graph like any other filter. Note, however, that they may not support any formats in common with software filters – in such cases it may be necessary to make use of hwupload and hwdownload filter instances to move frame data between hardware surfaces and normal memory.

see: https://trac.ffmpeg.org/wiki/HWAccelIntro
(Untested) hwupload_cuda would be needed before CUDA-based filters when the input isn't decoded through CUDA, and if you use hardware acceleration for decoding but not for filtering, hwdownload is needed to bring the frames back into normal memory.
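
To make that rule concrete, here is a small illustrative sketch (equally untested against a real ffmpeg build) of how the -vf chain could be assembled. hwupload_cuda goes in front of yadif_cuda when the decoder delivers frames in system memory, and hwdownload plus a format filter go after it when the encoder expects system-memory frames. The function name and its parameters are hypothetical; yuv420p is an assumption that matches the sample file in this thread, and frames coming from NVDEC would typically need nv12 instead.

```python
def cuda_deint_chain(frames_on_gpu, encoder_on_gpu,
                     sw_format="yuv420p", opts="0:-1:0"):
    """Return a -vf string that wraps yadif_cuda with the transfers it needs."""
    chain = []
    if not frames_on_gpu:
        chain.append("hwupload_cuda")        # system memory -> CUDA surface
    chain.append(f"yadif_cuda={opts}")
    if not encoder_on_gpu:
        chain += ["hwdownload", f"format={sw_format}"]  # CUDA -> system memory
    return ",".join(chain)

# NVDEC with -hwaccel_output_format cuda, encoding with h264_nvenc
print(cuda_deint_chain(frames_on_gpu=True, encoder_on_gpu=True))
# software decoder, encoding with h264_nvenc
print(cuda_deint_chain(frames_on_gpu=False, encoder_on_gpu=True))
# software decoder, encoding with a CPU encoder such as libx264
print(cuda_deint_chain(frames_on_gpu=False, encoder_on_gpu=False))
```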

Cu Selur

3rd Mar 2019 04:04 #17

hydra3333

Member

Thanks Selur! This worked with:

Code:

-init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -vf "hwupload_cuda,yadif_cuda=0:-1:0"

The log:

Code:

"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel verbose -stats -i ".\1.7TWO.mpg" -t 05 -map_metadata -1 -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -vf "hwupload_cuda,yadif_cuda=0:-1:0" -r 25 -c:v h264_nvenc -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -an  -y ".\1.7TWO.aac.yadif_cuda.mp4" 
ffmpeg version N-93276-g3b23eb283a Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 8.3.0 (GCC)
  configuration: --arch=x86_64 --target-os=mingw32 --cross-prefix=x86_64-w64-mingw32- --pkg-config=pkg-config --disable-w32threads --enable-pthreads --enable-cross-compile --enable-pic --enable-libsoxr --enable-libass --enable-iconv --enable-libtwolame --enable-libzvbi --enable-libcaca --enable-libmodplug --enable-cuvid --enable-libmp3lame --enable-version3 --enable-zlib --enable-librtmp --enable-libvorbis --enable-libtheora --enable-libspeex --enable-libgsm --enable-libopus --enable-bzlib --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libvo-amrwbenc --enable-libvpx --enable-libilbc --enable-libwavpack --enable-libwebp --enable-dxva2 --disable-avisynth --enable-vapoursynth --enable-gray --enable-libmysofa --enable-libflite --enable-lzma --enable-libsnappy --enable-libzimg --enable-libx264 --enable-libx265 --enable-libaom --enable-libdav1d --enable-frei0r --enable-filter=frei0r --enable-librubberband --enable-libvidstab --enable-libxvid --enable-libgme --enable-runtime-cpudetect --enable-libfribidi --enable-gnutls --enable-gmp --enable-fontconfig --enable-libfontconfig --enable-libfreetype --enable-libbluray --enable-libcdio --enable-libmfx --disable-schannel --enable-ladspa --enable-libxml2 --enable-libdavs2 --enable-libopenmpt --enable-libxavs --enable-libxavs2 --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-opengl --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-gpl --extra-version='DeadSix27/python_cross_compile_script' --enable-avresample --pkg-config-flags=--static --extra-libs='-lintl -liconv' --extra-cflags=-DLIBTWOLAME_STATIC --extra-cflags=-DMODPLUG_STATIC --enable-libbluray --prefix=/home/u/Desktop/workdir/x86_64_products/ffmpeg_static_non_free_opencl.installed --disable-shared --enable-static --enable-cuda-nvcc --enable-nonfree --enable-opencl --enable-nonfree --enable-libfdk-aac --enable-decklink --extra-cflags=-DLIBXML_STATIC --extra-cflags=-DGLIB_STATIC_COMPILATION
  libavutil      56. 26.100 / 56. 26.100
  libavcodec     58. 47.102 / 58. 47.102
  libavformat    58. 26.101 / 58. 26.101
  libavdevice    58.  6.101 / 58.  6.101
  libavfilter     7. 48.100 /  7. 48.100
  libavresample   4.  0.  0 /  4.  0.  0
  libswscale      5.  4.100 /  5.  4.100
  libswresample   3.  4.100 /  3.  4.100
  libpostproc    55.  4.100 / 55.  4.100
Routing option strict to both codec and muxer layer
[AVHWDeviceContext @ 000001306346b6c0] 0.0: NVIDIA CUDA / GeForce GTX 1050 Ti
[AVHWDeviceContext @ 000001306346b6c0] DXVA2 to OpenCL mapping function found (clCreateFromDX9MediaSurfaceKHR).
[AVHWDeviceContext @ 000001306346b6c0] DXVA2 in OpenCL acquire function found (clEnqueueAcquireDX9MediaSurfacesKHR).
[AVHWDeviceContext @ 000001306346b6c0] DXVA2 in OpenCL release function found (clEnqueueReleaseDX9MediaSurfacesKHR).
[AVHWDeviceContext @ 000001306346b6c0] The cl_khr_d3d11_sharing extension is required for D3D11 to OpenCL mapping.
[AVHWDeviceContext @ 000001306346b6c0] D3D11 to OpenCL mapping not usable.
[mpeg @ 000001306346d900] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
Input #0, mpeg, from '.\1.7TWO.mpg':
  Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
    Stream #0:0[0x1e0]: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1c0]: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mpeg2video (native) -> h264 (h264_nvenc))
Press [q] to stop, [?] for help
[mpeg @ 000001306346d900] Correcting start time by 10144
[graph 0 input from stream 0:0 @ 00000130634aa6c0] w:720 h:576 pixfmt:yuv420p tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
[h264_nvenc @ 000001306346eac0] Loaded Nvenc version 9.0
[h264_nvenc @ 000001306346eac0] Nvenc initialized successfully
[h264_nvenc @ 000001306346eac0] Lookahead enabled: depth 32, scenecut enabled, B-adapt enabled.
Output #0, mp4, to '.\1.7TWO.aac.yadif_cuda.mp4':
  Metadata:
    encoder         : Lavf58.26.101
    Stream #0:0: Video: h264 (h264_nvenc) (High), 1 reference frame (avc1 / 0x31637661), cuda(left), 720x576 [SAR 64:45 DAR 16:9], q=16-25, 2000 kb/s, 25 fps, 12800 tbn, 25 tbc
    Metadata:
      encoder         : Lavc58.47.102 h264_nvenc
    Side data:
      cpb: bitrate max/min/avg: 0/0/2000000 buffer size: 4000000 vbv_delay: -1
frame=  109 fps=0.0 q=18.0 size=     256kB time=00:00:02.76 bitrate= 760.0kbits/s speed= 5.5x    
No more output streams to write to, finishing.
[mp4 @ 0000013073866980] Starting second pass: moving the moov atom to the beginning of the file
color primaries unspecified, assuming bt470bg
[AVIOContext @ 00000130634f8980] Statistics: 937177 bytes read, 0 seeks
frame=  125 fps=0.0 q=18.0 Lsize=     917kB time=00:00:04.92 bitrate=1527.3kbits/s speed=8.73x    
video:915kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.231772%
Input file #0 (.\1.7TWO.mpg):
  Input stream #0:0 (video): 128 packets read (1379392 bytes); 128 frames decoded; 
  Input stream #0:1 (audio): 0 packets read (0 bytes); 
  Total: 128 packets (1379392 bytes) demuxed
Output file #0 (.\1.7TWO.aac.yadif_cuda.mp4):
  Output stream #0:0 (video): 125 frames encoded; 125 packets muxed (937129 bytes); 
  Total: 125 packets (937129 bytes) muxed
[AVIOContext @ 0000013063471400] Statistics: 4 seeks, 11 writeouts
[h264_nvenc @ 000001306346eac0] Nvenc unloaded
[AVIOContext @ 0000013063476b00] Statistics: 3330192 bytes read, 2 seeks
----------------------------------------------------
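
ffmpeg's final status line carries the figures relevant to the original speed question. The sketch below uses a hypothetical helper (not part of ffmpeg) to pull the frame count and speed multiplier out of the final status lines of the two successful runs in this thread. Those runs used different decoders and encoder settings (lossless preset versus slow preset with lookahead), so the two speed values are not a like-for-like measure of the deinterlacer.

```python
import re

# Matches the frame counter and the trailing speed multiplier of a status line.
STATUS = re.compile(r"frame=\s*(\d+).*?speed=\s*([\d.]+)x")

def parse_status(line):
    """Return (frames, speed) from an ffmpeg progress/status line."""
    m = STATUS.search(line)
    if m is None:
        raise ValueError("not an ffmpeg status line")
    return int(m.group(1)), float(m.group(2))

nvdec_run = ("frame=  125 fps=0.0 q=-1.0 Lsize=   13331kB time=00:00:05.01 "
             "bitrate=21783.7kbits/s speed=16.7x")
upload_run = ("frame=  125 fps=0.0 q=18.0 Lsize=     917kB time=00:00:04.92 "
              "bitrate=1527.3kbits/s speed=8.73x")

for label, line in (("nvdec + yadif_cuda", nvdec_run),
                    ("hwupload_cuda + yadif_cuda", upload_run)):
    frames, speed = parse_status(line)
    print(f"{label}: {frames} frames at {speed}x realtime")
```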

3rd Mar 2019 04:21 #18

hydra3333

Member

One more similar query, if I may.

I tried various combinations of the yadif_cuda filter and the unsharp_opencl filter (each works OK by itself), but I can't seem to hit on anything that works with both.
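
A possible explanation (an assumption, not confirmed anywhere in this thread) is that yadif_cuda produces CUDA frames while unsharp_opencl expects OpenCL frames, and these are different hardware surface types, so one cannot feed the other directly. Following the hwupload/hwdownload advice earlier in the thread, one untested approach is to download after the CUDA stage and upload again to the OpenCL device selected with -filter_hw_device. The sketch below only assembles such a filter string; the helper name is hypothetical and yuv420p is assumed to match the sample file.

```python
def cuda_then_opencl(cuda_filter, opencl_filter, sw_format="yuv420p"):
    """Chain a CUDA filter and an OpenCL filter via system memory."""
    stages = [
        "hwupload_cuda",        # system memory -> CUDA
        cuda_filter,
        "hwdownload",           # CUDA -> system memory
        f"format={sw_format}",
        "hwupload",             # system memory -> device set by -filter_hw_device
        opencl_filter,
        "hwdownload",           # OpenCL -> system memory
        f"format={sw_format}",
    ]
    return ",".join(stages)

print(cuda_then_opencl(
    "yadif_cuda=0:-1:0",
    "unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5",
))
```

The extra round trip through system memory costs transfer time, which may offset some of the gain from running both filters on the GPU.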

Code:

"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel debug -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 05 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000  -y ".\1.7TWO.aac.xxx.mp4" 

Splitting the commandline.
Reading option '-loglevel' ... matched as option 'loglevel' (set logging level) with argument 'debug'.
Reading option '-stats' ... matched as option 'stats' (print progress report during encoding) with argument '1'.
Reading option '-init_hw_device' ... matched as option 'init_hw_device' (initialise hardware device) with argument 'opencl=ocl:0.0'.
Reading option '-filter_hw_device' ... matched as option 'filter_hw_device' (set hardware device used when filtering) with argument 'ocl'.
Reading option '-i' ... matched as input url with argument '.\1.7TWO.mpg'.
Reading option '-t' ... matched as option 't' (record or transcode "duration" seconds of audio/video) with argument '05'.
Reading option '-map_metadata' ... matched as option 'map_metadata' (set metadata information of outfile from infile) with argument '-1'.
Reading option '-sws_flags' ... matched as AVOption 'sws_flags' with argument 'lanczos+accurate_rnd+full_chroma_int+full_chroma_inp'.
Reading option '-filter_complex' ... matched as option 'filter_complex' (create a complex filtergraph) with argument '[0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9'.
Reading option '-r' ... matched as option 'r' (set frame rate (Hz value, fraction or abbreviation)) with argument '25'.
Reading option '-c:v' ... matched as option 'c' (codec name) with argument 'h264_nvenc'.
Reading option '-pix_fmt' ... matched as option 'pix_fmt' (set pixel format) with argument 'nv12'.
Reading option '-preset' ... matched as AVOption 'preset' with argument 'slow'.
Reading option '-bf' ... matched as AVOption 'bf' with argument '2'.
Reading option '-g' ... matched as AVOption 'g' with argument '50'.
Reading option '-refs' ... matched as AVOption 'refs' with argument '3'.
Reading option '-rc:v' ... matched as AVOption 'rc:v' with argument 'vbr_hq'.
Reading option '-rc-lookahead:v' ... matched as AVOption 'rc-lookahead:v' with argument '32'.
Reading option '-cq' ... matched as AVOption 'cq' with argument '22'.
Reading option '-qmin' ... matched as AVOption 'qmin' with argument '16'.
Reading option '-qmax' ... matched as AVOption 'qmax' with argument '25'.
Reading option '-coder' ... matched as AVOption 'coder' with argument 'cabac'.
Reading option '-strict' ...Routing option strict to both codec and muxer layer
 matched as AVOption 'strict' with argument 'experimental'.
Reading option '-movflags' ... matched as AVOption 'movflags' with argument '+faststart+write_colr'.
Reading option '-profile:v' ... matched as option 'profile' (set profile) with argument 'high'.
Reading option '-level' ... matched as AVOption 'level' with argument '4.1'.
Reading option '-af' ... matched as option 'af' (set audio filters) with argument 'loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary'.
Reading option '-c:a' ... matched as option 'c' (codec name) with argument 'libfdk_aac'.
Reading option '-cutoff' ... matched as AVOption 'cutoff' with argument '18000'.
Reading option '-ab' ... matched as option 'ab' (audio bitrate (please use -b:a)) with argument '384k'.
Reading option '-ar' ... matched as option 'ar' (set audio sampling rate (in Hz)) with argument '48000'.
Reading option '-y' ... matched as option 'y' (overwrite output files) with argument '1'.
Reading option '.\1.7TWO.aac.xxx.mp4' ... matched as output url.
Finished splitting the commandline.
Parsing a group of options: global .
Applying option loglevel (set logging level) with argument debug.
Applying option stats (print progress report during encoding) with argument 1.
Applying option init_hw_device (initialise hardware device) with argument opencl=ocl:0.0.
[AVHWDeviceContext @ 0000021d0db3b400] 2 OpenCL platforms found.
[AVHWDeviceContext @ 0000021d0db3b400] 1 OpenCL devices found on platform "NVIDIA CUDA".
[AVHWDeviceContext @ 0000021d0db3b400] 0.0: NVIDIA CUDA / GeForce GTX 1050 Ti
[AVHWDeviceContext @ 0000021d0db3b400] DXVA2 to OpenCL mapping function found (clCreateFromDX9MediaSurfaceKHR).
[AVHWDeviceContext @ 0000021d0db3b400] DXVA2 in OpenCL acquire function found (clEnqueueAcquireDX9MediaSurfacesKHR).
[AVHWDeviceContext @ 0000021d0db3b400] DXVA2 in OpenCL release function found (clEnqueueReleaseDX9MediaSurfacesKHR).
[AVHWDeviceContext @ 0000021d0db3b400] The cl_khr_d3d11_sharing extension is required for D3D11 to OpenCL mapping.
[AVHWDeviceContext @ 0000021d0db3b400] D3D11 to OpenCL mapping not usable.
Applying option filter_hw_device (set hardware device used when filtering) with argument ocl.
Applying option filter_complex (create a complex filtergraph) with argument [0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9.
Applying option y (overwrite output files) with argument 1.
Successfully parsed a group of options.
Parsing a group of options: input url .\1.7TWO.mpg.
Successfully parsed a group of options.
Opening an input file: .\1.7TWO.mpg.
[NULL @ 0000021d0db3f140] Opening '.\1.7TWO.mpg' for reading
[file @ 0000021d0db3f840] Setting default whitelist 'file,crypto'
[mpeg @ 0000021d0db3f140] Format mpeg probed with size=2048 and score=26
[mpeg @ 0000021d0db3f140] Before avformat_find_stream_info() pos: 0 bytes read:32768 seeks:0 nb_streams:0
[mpeg @ 0000021d0db3f140] probing stream 0 pp:2500
[mpeg @ 0000021d0db3f140] Probe with size=2012, packets=1 detected mpegvideo with score=25
[mpeg @ 0000021d0db3f140] probed stream 0
[mpeg2video @ 0000021d0db51c80] Format yuv420p chosen by get_format().
[mpeg @ 0000021d0db3f140] max_analyze_duration 5000000 reached at 5000000 microseconds st:0
[mpeg @ 0000021d0db3f140] After avformat_find_stream_info() pos: 0 bytes read:1790096 seeks:2 frames:337
Input #0, mpeg, from '.\1.7TWO.mpg':
  Duration: 01:24:11.57, start: 0.229856, bitrate: 3129 kb/s
    Stream #0:0[0x1e0], 127, 1/90000: Video: mpeg2video (Main), 1 reference frame, yuv420p(tv, top first, left), 720x576 [SAR 64:45 DAR 16:9], 0/1, 25 fps, 25 tbr, 90k tbn, 50 tbc
    Stream #0:1[0x1c0], 210, 1/90000: Audio: mp2, 48000 Hz, stereo, s16p, 192 kb/s
Successfully opened the file.
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded lib: nvcuda.dll
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuInit
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetCount
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGet
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetAttribute
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetName
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceComputeCapability
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxCreate_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxSetLimit
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxPushCurrent_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxPopCurrent_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuCtxDestroy_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemAlloc_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemAllocPitch_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemsetD8Async
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemFree_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemcpy2D_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMemcpy2DAsync_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGetErrorName
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGetErrorString
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamCreate
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamQuery
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamSynchronize
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamDestroy_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuStreamAddCallback
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventCreate
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventDestroy_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventSynchronize
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventQuery
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuEventRecord
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuLaunchKernel
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuModuleLoadData
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuModuleUnload
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuModuleGetFunction
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuTexObjectCreate
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuTexObjectDestroy
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGLGetDevices_v2
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsGLRegisterImage
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsUnregisterResource
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsMapResources
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsUnmapResources
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuGraphicsSubResourceGetMappedArray
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDeviceGetUuid
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuImportExternalMemory
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDestroyExternalMemory
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuExternalMemoryGetMappedBuffer
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMipmappedArrayGetLevel
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuMipmappedArrayDestroy
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuImportExternalSemaphore
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuDestroyExternalSemaphore
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuSignalExternalSemaphoresAsync
[AVHWDeviceContext @ 0000021d0db7cec0] Loaded sym: cuWaitExternalSemaphoresAsync
[Parsed_yadif_cuda_1 @ 0000021d0db88800] Setting 'mode' to value '0'
[Parsed_yadif_cuda_1 @ 0000021d0db88800] Setting 'parity' to value '-1'
[Parsed_yadif_cuda_1 @ 0000021d0db88800] Setting 'deint' to value '0'
[Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'lx' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'ly' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'la' to value '0.5'
[Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'cx' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'cy' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db88bc0] Setting 'ca' to value '0.5'
[Parsed_format_4 @ 0000021d0db89580] Setting 'pix_fmts' to value 'yuv420p'
[Parsed_setdar_5 @ 0000021d0db89840] Setting 'dar' to value '16/9'
Parsing a group of options: output url .\1.7TWO.aac.xxx.mp4.
Applying option t (record or transcode "duration" seconds of audio/video) with argument 05.
Applying option map_metadata (set metadata information of outfile from infile) with argument -1.
Applying option r (set frame rate (Hz value, fraction or abbreviation)) with argument 25.
Applying option c:v (codec name) with argument h264_nvenc.
Applying option pix_fmt (set pixel format) with argument nv12.
Applying option profile:v (set profile) with argument high.
Applying option af (set audio filters) with argument loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary.
Applying option c:a (codec name) with argument libfdk_aac.
Applying option ab (audio bitrate (please use -b:a)) with argument 384k.
Applying option ar (set audio sampling rate (in Hz)) with argument 48000.
Successfully parsed a group of options.
Opening an output file: .\1.7TWO.aac.xxx.mp4.
[file @ 0000021d0db46d00] Setting default whitelist 'file,crypto'
Successfully opened the file.
detected 8 logical cores
Stream mapping:
  Stream #0:0 (mpeg2video) -> hwupload_cuda (graph 0)
  setdar (graph 0) -> Stream #0:0 (h264_nvenc)
  Stream #0:1 -> #0:1 (mp2 (native) -> aac (libfdk_aac))
Press [q] to stop, [?] for help
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[mpeg2video @ 0000021d0db50800] Format yuv420p chosen by get_format().
cur_dts is invalid (this is harmless if it occurs once at the start per stream)
[AVHWDeviceContext @ 0000021d2443a800] Loaded lib: nvcuda.dll
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuInit
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetCount
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGet
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetAttribute
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetName
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceComputeCapability
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxCreate_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxSetLimit
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxPushCurrent_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxPopCurrent_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuCtxDestroy_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemAlloc_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemAllocPitch_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemsetD8Async
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemFree_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemcpy2D_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMemcpy2DAsync_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGetErrorName
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGetErrorString
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamCreate
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamQuery
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamSynchronize
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamDestroy_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuStreamAddCallback
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventCreate
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventDestroy_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventSynchronize
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventQuery
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuEventRecord
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuLaunchKernel
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuModuleLoadData
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuModuleUnload
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuModuleGetFunction
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuTexObjectCreate
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuTexObjectDestroy
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGLGetDevices_v2
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsGLRegisterImage
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsUnregisterResource
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsMapResources
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsUnmapResources
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuGraphicsSubResourceGetMappedArray
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDeviceGetUuid
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuImportExternalMemory
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDestroyExternalMemory
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuExternalMemoryGetMappedBuffer
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuExternalMemoryGetMappedMipmappedArray
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMipmappedArrayGetLevel
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuMipmappedArrayDestroy
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuImportExternalSemaphore
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuDestroyExternalSemaphore
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuSignalExternalSemaphoresAsync
[AVHWDeviceContext @ 0000021d2443a800] Loaded sym: cuWaitExternalSemaphoresAsync
[Parsed_yadif_cuda_1 @ 0000021d0c36ee80] Setting 'mode' to value '0'
[Parsed_yadif_cuda_1 @ 0000021d0c36ee80] Setting 'parity' to value '-1'
[Parsed_yadif_cuda_1 @ 0000021d0c36ee80] Setting 'deint' to value '0'
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'lx' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'ly' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'la' to value '0.5'
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'cx' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'cy' to value '3'
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] Setting 'ca' to value '0.5'
[Parsed_format_4 @ 0000021d1e0acc80] Setting 'pix_fmts' to value 'yuv420p'
[Parsed_setdar_5 @ 0000021d2443c040] Setting 'dar' to value '16/9'
[graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'video_size' to value '720x576'
[graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'pix_fmt' to value '0'
[graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'time_base' to value '1/90000'
[graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'pixel_aspect' to value '64/45'
[graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'sws_param' to value 'flags=2'
[graph 0 input from stream 0:0 @ 0000021d2443d300] Setting 'frame_rate' to value '25/1'
[graph 0 input from stream 0:0 @ 0000021d2443d300] w:720 h:576 pixfmt:yuv420p tb:1/90000 fr:25/1 sar:64/45 sws_param:flags=2
[format @ 0000021d2443d580] Setting 'pix_fmts' to value 'nv12'
[auto_scaler_0 @ 0000021d0db5a200] w:iw h:ih flags:'bilinear' interl:0
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_yadif_cuda_1' and the filter 'Parsed_unsharp_opencl_2'
Impossible to convert between the formats supported by the filter 'Parsed_yadif_cuda_1' and the filter 'auto_scaler_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
[AVIOContext @ 0000021d1dfd8140] Statistics: 0 seeks, 0 writeouts
[AVIOContext @ 0000021d0db47a80] Statistics: 1855632 bytes read, 2 seeks
Conversion failed!

Quote

3rd Mar 2019 08:33 #19

Selur

Member

@ hydra3333: for readability, please split the actual call and the output into two code-blocks
looking at the filter_complex:
Code:
-filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9"
and
Code:
[AVHWDeviceContext @ 0000021d0db3b400] The cl_khr_d3d11_sharing extension is required for D3D11 to OpenCL mapping.
[AVHWDeviceContext @ 0000021d0db3b400] D3D11 to OpenCL mapping not usable
...
[Parsed_unsharp_opencl_2 @ 0000021d0db67000] auto-inserting filter 'auto_scaler_0' between the filter 'Parsed_yadif_cuda_1' and the filter 'Parsed_unsharp_opencl_2'
Impossible to convert between the formats supported by the filter 'Parsed_yadif_cuda_1' and the filter 'auto_scaler_0'
.
I wonder whether adding 'hwdownload,format=pix_fmts=yuv420p,hwupload' before unsharp_opencl, and thus explicitly setting the output format of the yadif_cuda filter, would work?
I know that uploading the data for hardware filtering, downloading it, uploading it again for the next filter and then downloading it once more probably isn't the best solution, but it might help with the problem.
(Again untested, and basically the first thing that came to mind.)
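A minimal sketch of the bridging idea above, in Python. The helper names and the rule of classifying a filter by its `_cuda` / `_opencl` name suffix are assumptions made for illustration, not anything ffmpeg provides. The sketch only assembles the filtergraph string: wherever a CUDA filter is followed directly by an OpenCL filter (or the reverse), it inserts the download / format / upload hop.

```python
# Hypothetical helper: the names and the suffix-based device classification
# are illustrative assumptions, not part of ffmpeg itself.
BRIDGE = ["hwdownload", "format=pix_fmts=yuv420p", "hwupload"]


def device_of(filter_spec):
    """Return 'cuda' or 'opencl' based on the filter name suffix, else None."""
    name = filter_spec.split("=", 1)[0]
    for device in ("cuda", "opencl"):
        if name.endswith("_" + device):
            return device
    return None  # treated as a software filter (or a generic hw helper)


def bridge_devices(filters):
    """Join filters with commas, inserting BRIDGE between differing hw devices."""
    out = []
    prev = None
    for spec in filters:
        cur = device_of(spec)
        if prev and cur and prev != cur:
            out.extend(BRIDGE)  # round trip through system memory
        out.append(spec)
        prev = cur
    return ",".join(out)


graph = bridge_devices([
    "hwupload_cuda",
    "yadif_cuda=0:-1:0",
    "unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5",
    "hwdownload",
    "format=pix_fmts=yuv420p",
])
print(graph)
```

The plain `hwupload` in the bridge relies on `-filter_hw_device ocl` being set on the command line, as it is in the call being discussed, so that the frames land on the OpenCL device.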

Cu Selur

users currently on my ignore list: deadrats, Stears555

Quote

3rd Mar 2019 20:52 #20

KarMa

Dinosaur Supervisor

Originally Posted by veresov

Ah, I get it, you're too poor to afford a decent GPU. You have my sympathy.

I've personally known people using DGSource over 5 years ago but the software has probably been out even longer. Back then you could just buy the cheapest NVidia GPU on the market which was more than enough to decode faster than you could encode. With this added Yadif CUDA on top of DGSource, I would still think that the cheapest Nvidia GPU on the market would be more than enough, or even a used GeForce 740 would be more than enough. So your money argument isn't really valid.

Originally Posted by veresov

Your 64 fps single-rate case is pitifully slow. Here is DGSource + DGBob (DGBob is a CUDA YADIF clone). It's running over 6 times faster.

I'm baffled why people use CPU solutions for problems that CUDA/NVDec excels at.

Do you encode 6 times faster thanks to CUDA decoding? As for me with software decoding and filtering, it's usually only a 10% overhead for 1080p MPEG2 video decoding + yadif. With 1080p H.264 software decoding + yadif it's probably 20%. So unless I'm trying to do super fast - low quality encoding in either software or hw, it's not really that useful to me. Even though I have HW decoding options for avisynth.
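The overhead argument above can be put into numbers with a short Python sketch. It assumes, as one reading of the figures quoted, that the 10% and 20% mean the share of total wall time spent on decoding plus yadif, and it uses the standard Amdahl's-law formula; the function name is hypothetical.

```python
def overall_speedup(fraction, stage_speedup):
    """Amdahl's law: whole-job speedup when only `fraction` of the runtime
    is accelerated by a factor of `stage_speedup`."""
    return 1.0 / ((1.0 - fraction) + fraction / stage_speedup)


# Assumption: decode + deinterlace is 10% (MPEG2) or 20% (H.264) of total time,
# and the GPU path makes that stage 6 times faster.
for fraction in (0.10, 0.20):
    gain = overall_speedup(fraction, 6)
    print(f"{fraction:.0%} of runtime accelerated 6x -> {gain:.2f}x overall")
```

Under that assumption the whole encode gains far less than the 6x seen in a decode-only benchmark, because the encoder still dominates the runtime.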

It's been well noted that GPU based video decoders are more prone to decoding errors than software decoders, so that's one reason why someone might not want to.

Quote

4th Mar 2019 03:44 #21

hydra3333

Member

Originally Posted by Selur

@ hydra3333: for readability, please split the actual call and the output into two code-blocks

OK, good point, thanks. Your suggestion worked.

Well I guess a test answered the speed question.

1. vanilla yadif followed by unsharp_opencl

"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel warning -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 60 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]yadif=0:0:0,hwupload,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p,setdar=dar=16/9" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000 -y ".\1.7TWO.aac.standard.mp4"
Code:
frame= 1500 fps=142 q=18.0 Lsize=   15010kB time=00:01:00.01 bitrate=2049.0kbits/s speed=5.66x
2. yadif_cuda followed by unsharp_opencl

"C:\SOFTWARE\ffmpeg\0-homebuilt-x64\ffmpeg.exe" -loglevel warning -stats -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i ".\1.7TWO.mpg" -t 60 -map_metadata -1 -sws_flags lanczos+accurate_rnd+full_chroma_int+full_chroma_inp -filter_complex "[0:v]hwupload_cuda,yadif_cuda=0:-1:0,hwdownload,format=pix_fmts=yuv420p,hwupload,unsharp_opencl=lx=3:ly=3:la=0.5:cx=3:cy=3:ca=0.5,hwdownload,format=pix_fmts=yuv420p" -r 25 -c:v h264_nvenc -pix_fmt nv12 -preset slow -bf 2 -g 50 -refs 3 -rc:v vbr_hq -rc-lookahead:v 32 -cq 22 -qmin 16 -qmax 25 -coder cabac -strict experimental -movflags +faststart+write_colr -profile:v high -level 4.1 -af loudnorm=I=-16:TP=0.0:LRA=11:measured_I=-25.78:measured_LRA=4.50:measured_TP=-6.82:measured_thresh=-36.00:offset=0.17:linear=true:print_format=summary -c:a libfdk_aac -cutoff 18000 -ab 384k -ar 48000 -y ".\1.7TWO.aac.yadif_cuda.opencl.mp4"
Code:
frame= 1500 fps=125 q=18.0 Lsize=   15000kB time=00:01:00.01 bitrate=2047.7kbits/s speed=5.01x
I suppose it's the data copies to/from the GPU that do it in.
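To put the two runs side by side, here is a small Python sketch that pulls `fps` and `speed` out of the two stats lines above and works out the relative difference. The function and variable names are hypothetical; the two input strings are copied from the ffmpeg output quoted in this post.

```python
import re


def parse_stats(line):
    """Extract (fps, speed) from an ffmpeg -stats progress line."""
    fps = float(re.search(r"fps=\s*([\d.]+)", line).group(1))
    speed = float(re.search(r"speed=\s*([\d.]+)x", line).group(1))
    return fps, speed


cpu_line = "frame= 1500 fps=142 q=18.0 Lsize=   15010kB time=00:01:00.01 bitrate=2049.0kbits/s speed=5.66x"
gpu_line = "frame= 1500 fps=125 q=18.0 Lsize=   15000kB time=00:01:00.01 bitrate=2047.7kbits/s speed=5.01x"

cpu_fps, cpu_speed = parse_stats(cpu_line)
gpu_fps, gpu_speed = parse_stats(gpu_line)

# Relative throughput loss of the yadif_cuda chain versus vanilla yadif.
slowdown_pct = (1 - gpu_fps / cpu_fps) * 100
print(f"vanilla yadif: {cpu_fps:.0f} fps, yadif_cuda: {gpu_fps:.0f} fps")
print(f"yadif_cuda chain is about {slowdown_pct:.0f}% slower")
```

On these two runs the yadif_cuda chain comes out roughly 12% slower (125 fps against 142 fps). Note that it also contains one more download/upload round trip than the vanilla yadif chain.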

If only I were able to cross-compile an ffmpeg with vapoursynth inbuilt (I haven't found any simple step-by-step instructions to follow), then using a single ffmpeg.exe would be "painless" and insanely fast with DG's latest gear.

Oh well.

Quote

4th Mar 2019 13:09 #22

Selur

Member

I suppose it's the data copies to/from the GPU that do it in.

Probably. The up/download is time-consuming; I'd recommend asking on the ffmpeg bug tracker, IRC channel or mailing list whether there is a better/faster way to do this.

As a side note, if you want all that in one binary, why not use NVEncC?

users currently on my ignore list: deadrats, Stears555

Quote

4th Mar 2019 14:46 #23

hydra3333

Member

NVEncC ? Good point.
I do have it, but haven't checked it out properly eg in regard to deinterlacing. I must do so.
Also whether it can take vapoursynth input directly nowadays, I guess.
I seem to recall some audio passing/processing challenge (edit: ah, normalization per the ffmpeg commandlines above) and something to do with needing to pipe NUT format video, maybe that's all no longer relevant.
At the time, I think I had formed the naive view that I somehow trusted a homebuilt ffmpeg a tad more.

Quote

6th Mar 2019 01:19 #24

hydra3333

Member

Ah. NVEncC ... I don't know how to also read/pass audio from the .mpg source file through vapoursynth (DG's h/w reader and deinterlacer and sharpener) into nvencc for re-encoding.
I suppose volume leveling (eg with loudnorm) would have to be done separately ?
An issue is TV capture .mpg files with large-ish internal audio/video offset setting, which some s/w handles but not others.

Does anyone do this stuff? I suppose everyone must, to get a usable final video, but what is it that people do?

Quote

6th Mar 2019 09:11 #25

Selur

Member

Missed that you needed Vapoursynth, thought you wanted to open the source and use some cuda filters,..
Vapoursynth can't handle audio. (Since NVEncC is built against libav, most of the stuff ffmpeg supports should be possible with it when reading a file source.)

users currently on my ignore list: deadrats, Stears555

Quote

6th Mar 2019 12:33 #26

hydra3333

Member

Originally Posted by Selur

Missed that you needed Vapoursynth, thought you wanted to open the source and use some cuda filters,..
Vapoursynth can't handle audio. (Since NVEncC is built against libav, most of the stuff ffmpeg supports should be possible with it when reading a file source.)

I did. With your and other good info and a test or two, it seemed prudent to consider changing tack.
It seems I'm not looking well enough at the nvencc doco, I'll do a range of testing then.
Thanks !

Quote
