Hi there, I've recently tested encoding performance on a friend's MSI Stealth GS77 - 12U laptop: when the process was done with one of the GPUs, the other remained unused - although it still consumed some power (neither goes totally idle) and of course produced heat - and vice versa.
As for the title: how - if possible - do I write a command line that assigns the Intel iGPU to decoding the source video stream and the Nvidia dGPU to the encoding workload only (and, why not, vice versa)?
Thanks in advance to anyone that can/will help.
-
Decoding and encoding use different HW blocks, so I doubt you gain anything by trying to decode on one SoC and encode on another.
If you check the NVidia and Intel guidelines for ffmpeg encoding, both vendors point out that the fastest transcoding is performed when there are no transfers beyond the vendor's SoC (i.e. decoding/scaling/processing/encoding is performed entirely on that SoC).
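For reference, the single-vendor patterns both sets of vendor docs describe look roughly like this (illustrative file names, not tested here):
Code:
REM all-NVidia: decoded frames stay in GPU memory as CUDA frames and go straight to nvenc
ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -c:v h264_nvenc -preset fast output_nvidia.mp4
REM all-Intel: decoded frames stay on the QSV device and go straight to the QSV encoder
ffmpeg -y -hwaccel qsv -hwaccel_output_format qsv -c:v h264_qsv -i input.mp4 -c:v h264_qsv output_intel.mp4
-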
The only time I think it makes sense to use two GPUs, rather than the single most powerful one, is when decoding, say, 10 bit 422. AFAIK only Intel can do that in HW. So in that case an NLE such as Vegas Pro can be set to Intel for decoding and, say, Nvidia or AMD for encoding. Because the Intel device would typically be a lot less powerful than a high end Nvidia device, it makes sense to split the render across both, as using the Intel device alone would be slower.
It would certainly be nice to have ffmpeg syntax to do that, i.e. use both.
Actually some of the above re: VP is incorrect. The Intel device can be set to assist playback within the NLE for the 10 bit 422 clip, but the render only selects a single GPU for encoding output. The render times might be faster though (due to using Intel to decode frames); I must check that out.
-
Checked it out, and in VP, rendering out using the Nvidia GPU but with the Intel GPU as decoder really does go faster.
So getting that to work using ffmpeg would be something. Beyond my capabilities though.
Note that the Intel GPU is only in a 4x PCIE slot.
Render times:
Without Intel used as decoder: 91 seconds (Nvidia was used as decoder)
With Intel used as decoder: 23 seconds
A 4x difference.
Test clip was a 27 second UHD hevc 422 10 bit.
Playback in VP of course was vastly better using the Intel decoder. The last 25% of the clip played at less than 1 fps without the Intel decoder. Using the Intel decoder, the clip played at a full 25 fps for its whole length.
Output rendered files were to UHD 420 8 bit.
Output quality was the same ...
Start Date and Time .. 27/07/2024 .. 12:30:47.26 ------------------------------------------------------------------------------------
----------------------------------------------------- Video Help Test-with Using Intel as decoder.mp4 metrics below
SSIM All........ 0.697930 (5.198917)
PSNR Average ... 20.644672
VMAF ........... 47.087881
----------------------------------------------------- Video Help Test-without using Intel as decoder.mp4 metrics below
SSIM All........ 0.697954 (5.199270)
PSNR Average ... 20.644662
VMAF ........... 47.090000
End Date and Time .... 27/07/2024 .. 12:32:04.59 ... Duration [ 00:01:18.62 ]. Number of input files [ 2 ].
--------------------------------------------------------------------------------------------------------------------------------------
Above file(s) processed by ....... [ RQM.bat ] ..... Reference file ... [ Source.mov ]
__________________________________________________ __________________________________________________ __________________________________
ffmpeg version date .. 07-09-2023
ffmpeg version ....... ffmpeg version 2023-09-07-git-9c9f48e7f2-full_build-www.gyan.dev Copyright (c) 2000-2023 the FFmpeg developers
__________________________________________________ __________________________________________________ __________________________________
The overall low output RQM results are because I didn't aim for high values. The output h264 file size and data rates were about 3.4 times smaller than the source hevc file's.
Nvenc encoding in VP (latest version 21 b315) still suffers from the poor nvenc stop/start, heartbeat, hacksaw behaviour, or whatever you want to call it. Since it happens with both renders it can be ignored here. Non-Nvidia encoding in VP wouldn't suffer this issue.
-
You can use qsv for decoding and nvenc for encoding with ffmpeg. Just specify the decoder before the input, and the encoder after.
Code:
ffmpeg -y -hide_banner -benchmark ^
 -c:v h264_qsv -i "%~dpnx1" ^
 -c:v h264_nvenc -preset fast -cq:v 23 ^
 "%~dpn1.h264.nvenc.mkv"
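For an hevc source like the 10 bit 422 clips mentioned above, the same idea with the hevc_qsv decoder should apply - a sketch only, not tested on 4:2:2 10 bit here, and it needs a QSV generation that supports that format in hardware. The decoded frames are copied back to system memory by default, which is what lets nvenc on the other card pick them up:
Code:
ffmpeg -y -hide_banner -benchmark ^
 -c:v hevc_qsv -i "%~dpnx1" ^
 -c:v h264_nvenc -preset fast -cq:v 23 -pix_fmt yuv420p ^
 "%~dpn1.hevc-dec.nvenc.mkv"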
-
Thanks for the syntax. I've struggled with it for some time but cannot get it to work if input is not 8 bit 420.
With an 8 bit 420 input, using Intel decoding is a fraction slower.
I'd expect that will reverse with input of 10 bit 422, if I could get it to work.
Input file of 10 bit 422 causes errors.
[Attachment 81011: screenshot of the ffmpeg error output]
This is what I am using ...
@echo off
mode CON:cols=228 lines=80
color 0A
SETLOCAL EnableDelayedExpansion
TITLE A test
SET _INPUT_FILE="%~n1%~x1"
SET _INPUT_FILENAME="%~n1"
SET _INPUT_EXTN=%~x1
SET "_PIXELFORMAT=-pix_fmt yuv420p"
REM Without Intel decoding.
REM ffmpeg -y -hide_banner -benchmark -i "%~n1%~x1" -c:v libx264 -preset fast -crf 23 "%~n1-[A-TEST].mkv"
REM With Intel decoding.
REM ffmpeg -y -hide_banner -benchmark -c:v h264_qsv -i "%~n1%~x1" -c:v libx264 -preset fast -crf 23 "%~n1-[A-TEST-uses-Intel-decoder].mp4"
ffmpeg -y -hide_banner -benchmark -vcodec h264_qsv -i "%~n1%~x1" -c:v libx264 %_PIXELFORMAT% -preset fast -crf 23 "%~n1-[A-TEST-uses-Intel-decoder].mp4"
echo.
echo.
echo.
pause
-
Hi jagabo, I only saw your reply a little while ago.
Here is a dropbox link with two 10 bit 422 samples.
https://www.dropbox.com/scl/fi/topo8mks0kfl91ou5d5ok/Short-10-bit-422-pieces.zip?rlkey...10depuesc&dl=0 -
It turns out that my CPU (i9 9900K, Coffee Lake) doesn't support 10 bit 4:2:2 HEVC decoding.
https://en.wikipedia.org/wiki/Intel_Quick_Sync_Video
I don't think I can help you further, since I won't be able to tell whether a command line should work on a CPU that does support it. -
OK. Thanks for the suggestions. I think 10 bit 422 support was from 11th gen onwards.
I had a list of command lines for HW encoding, with many variations that I tested in the past, although not for this particular scenario, i.e. using two GPUs. It might have given me some idea of what I'm missing, but unfortunately after much searching I cannot find it. I'll have another look tomorrow. -
Please post the command line here if you find something that works. I keep a log of such things for reference, even if I can't use it right now.
-
It would always be my intention to do so.
I already burned the midnight oil on this last night, so for now I am out of ideas. There may be other users here who have suggestions. Even if they don't have QSV capability for 10 bit 422 HW decoding, I am happy to try any alternative ffmpeg syntax offered.
Last thing to do shortly is to try an older ffmpeg version, say from the end of last year. -
[Attachment 81019: screenshot of the ffmpeg error message]
"Shortly" just arrived: no luck, just a more concise error message. I used an ffmpeg.exe from November 2023.
Used ... ffmpeg -y -hide_banner -c:v h264_qsv -i "%~n1%~x1" -c:v libx264 -preset fast -crf 23 "%~n1-[A TEST, uses Intel decoder].mp4"
The following were h264 files.
When the input is 8 bit 420, the above works.
When the input is 8 bit 422, 10 bit 420 or 10 bit 422, the above doesn't work.
Tried one 10 bit 422 hevc file, still errors.
-
Query both the decoder and the encoder for supported pixel formats (use ffmpeg -h encoder=xxx_nvenc, and similarly for the Intel encoder; a decoder may need ffmpeg -h decoder=xxx - see the example queries after the links below). If I recall correctly, GPUs natively use different pixel formats than common video formats. Your goal is to avoid CPU interaction as much as possible - preferably copy only video data between cards (from Intel to NVidia). As you are using PCIe, that transfer is anyway the operation with the biggest performance impact. Check the vendors' papers - they explain some transcoding limitations:
https://duckduckgo.com/?q=intel+ffmpeg+quicksync+filetype%3Apdf
https://duckduckgo.com/?q=nvenc+ffmpeg+filetype%3Apdf
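For example, each of these help queries lists the pixel formats the codec accepts (exact codec names depend on your ffmpeg build):
Code:
ffmpeg -hide_banner -h decoder=hevc_qsv
ffmpeg -hide_banner -h encoder=h264_nvenc
ffmpeg -hide_banner -h encoder=h264_qsv
-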
Thanks for the suggestions pandy. My use case may be different from the OP's.
“your goal is to avoid CPU interactions as much as possible - preferably copy only video data between cards (from Intel to NVidia)”
My interest in this is to use ffmpeg to improve rendering/encoding of 10 bit 422 to typically 8 bit 420 output.
So far I have only used cpu to render and qsv to decode the 10 bit 422 input.
I'll try using nvidia instead of cpu to see if I have better luck. Thanks for links.
“if i recall correctly GPU's natively using different pixel formats than common video formats”
Correct.
I will insert the appropriate pixel formats for qsv and nvenc. -
Your goal is not necessarily so different - by pushing as much video processing as possible onto the GPU HW and reducing the overall CPU load, you may be able to use complex filters that are impractical from a CPU perspective - OpenCL or CUDA based processing may be beneficial to your goal.
You have some GPU accelerated processing options:
https://ffmpeg.org/ffmpeg-filters.html#libplacebo
https://ffmpeg.org/ffmpeg-filters.html#OpenCL-Video-Filters
https://ffmpeg.org/ffmpeg-filters.html#Vulkan-Video-Filters
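As an illustration of the OpenCL route (adapted from the ffmpeg filter documentation; the filter, device index and file names are placeholders - pick whatever suits your setup):
Code:
REM decode on CPU, upload frames to an OpenCL device, run an OpenCL filter, download and encode
ffmpeg -y -init_hw_device opencl=ocl:0.0 -filter_hw_device ocl -i input.mp4 ^
 -vf "format=nv12,hwupload,unsharp_opencl,hwdownload,format=nv12" ^
 -c:v h264_nvenc -preset fast output.mp4
-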
I tried using the Nvidia encoder instead of the CPU but I get similar errors.
ffmpeg -y -init_hw_device qsv=intel,child_device=2 -c:v h264_qsv -i "%~n1%~x1" -map 0:v -c:v h264_nvenc "%~n1-[ Uses Intel decoder-YES ].mp4"
I am using an AMD CPU with an iGPU, plus Nvidia and Intel cards, so the GPU number above ("2") is correct, i.e. 0=Nvidia, 1=AMD and 2=Intel.
As mentioned previously if input is 8 bit 420 it works ok. -
I didn't insert the pixel formats for QSV or Nvidia because neither supports 10 bit 422. I figure the complexities of this are above me.
However, maybe this is all a bit of a brain fart on my behalf, looking for something that's already mostly there.
When I found that I was getting a 4x render speed increase in Vegas Pro by using Intel for decoding and Nvidia to encode the render, I thought: why not aim for that with ffmpeg?
The thing is, you can feed 10 bit 422 to Nvidia only (no QSV used) using ffmpeg and it will output any of its available pixel formats, say 8 bit 420 or 8 bit 444. Typically I would only need 8 bit 420 as output.
I just assumed that if the Intel decoder were also used in the command line it would be faster still? Maybe not by much.
-
My assumption up to now has been that, when using ffmpeg, utilising Intel's QSV to decode 10 bit 422 input would improve encode time when encoding with, say, ffmpeg's CPU x264/hevc encoders or Nvenc. But maybe ffmpeg converts the frames from the 10 bit 422 input very fast anyway. Without getting the syntax working I'll never know.
-
OK. I used the following, with and without Intel decoding, using Nvenc and x264 (CPU) for rendering. It works on a UHD 10 bit 422 hevc input clip of 1m:56s.
SET "_PIXELFORMAT=-pix_fmt yuv420p"
Input is UHD 10 bit 422 hevc, output is to UHD 8 bit 420 h264.
REM ffmpeg -y -init_hw_device qsv=intel,child_device=2 -i "%~n1%~x1" -map 0:v -c:v h264_nvenc %_PIXELFORMAT% -map 0:a -c:a copy "%~n1-[ Uses Intel decoder-YES ]-Nvenc encoder.mp4"
REM ffmpeg -y -i "%~n1%~x1" -map 0:v -c:v h264_nvenc %_PIXELFORMAT% -map 0:a -c:a copy "%~n1-[ Uses Intel decoder-NO ]-Nvenc encoder.mp4"
REM ffmpeg -y -init_hw_device qsv=intel,child_device=2 -i "%~n1%~x1" -map 0:v -c:v libx264 %_PIXELFORMAT% -map 0:a -c:a copy "%~n1-[ Uses Intel decoder-YES ]-CPU encoder.mp4"
REM ffmpeg -y -i "%~n1%~x1" -map 0:v -c:v libx264 %_PIXELFORMAT% -map 0:a -c:a copy "%~n1-[ Uses Intel decoder-NO ]-CPU encoder.mp4"
The render times are near identical for both sets of tests. They are the opposite of what I would expect: using the Intel device is slower in both cases. So I'm not sure if it is really using the Intel device to decode. The ffmpeg on-screen display does clearly show the Intel device though, when running with the Intel syntax.
Duration = [00:00:52.43] ... "Grafton Street Music, 10 bit 422.mp4" Nvenc encoding, WITH Intel decoder
Duration = [00:00:52.18] ... "Grafton Street Music, 10 bit 422.mp4" Nvenc encoding, WITHOUT Intel decoder
Duration = [00:01:41.37] ... "Grafton Street Music, 10 bit 422.mp4" CPU x264 encoding, WITH Intel decoder
Duration = [00:01:41.09] ... "Grafton Street Music, 10 bit 422.mp4" CPU x264 encoding, WITHOUT Intel decoder
-
SET "_PIXELFORMAT=-pix_fmt yuv420p"
ffmpeg -y -init_hw_device qsv=intel,child_device=2 -i "%~n1%~x1" -c:v h264_nvenc %_PIXELFORMAT% -c:a copy "%~n1-[ Uses Intel decoder-YES ]-Nvenc encoder-.mp4"
I see from Task Manager that the Intel GPU is doing nothing; it is not decoding.
Although it gets initialised, that's it.
Oddly the GPU numbers in TM are 0=Nvidia, 1=Intel and 2=AMD.
When I use 2 for Intel I get no errors, but using 1 gives errors.
I don't know of an ffmpeg command to list these device numbers.
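Presumably, to actually get the Intel GPU decoding, the QSV decoder itself has to be selected and bound to that device rather than just initialising it - something like the following untested sketch (the decoded frames land in system memory, and the Nvidia card then uploads them for encoding):
Code:
ffmpeg -y -init_hw_device qsv=intel,child_device=2 -hwaccel qsv -hwaccel_device intel ^
 -c:v hevc_qsv -i "%~n1%~x1" ^
 -map 0:v -c:v h264_nvenc -pix_fmt yuv420p -map 0:a -c:a copy ^
 "%~n1-[ Uses Intel decoder-YES ]-Nvenc encoder.mp4"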
-
Well, another option is to pipe QSV/NV-Enc into FFMPEG (or vice-versa): https://github.com/rigaya/QSVEnc/issues/207
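A similar two-process approach can also be done with ffmpeg alone: one instance decoding on QSV and piping raw video (video only, audio dropped) to a second instance encoding with nvenc. A rough, untested sketch with placeholder file names:
Code:
ffmpeg -v error -c:v hevc_qsv -i input.mov -map 0:v -pix_fmt yuv420p -f yuv4mpegpipe - | ffmpeg -y -i - -c:v h264_nvenc -preset fast -cq 23 output.mp4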
-
Another option for me is to just use a single Intel GPU to do both decoding and encoding. I realise that this isn't what the OP queried, and it would still be nice to get the multiple-GPU approach working.
In the case of a 10 bit 422 input clip I would expect a big improvement from using Intel's QSV decoding capabilities, as this format currently isn't handled in hardware by AMD or Nvidia.
Why it took me so long to get here (no excuse) was because in VP I have to set I/O to Intel for decoding and say CPU/Nvidia for encoding. I just didn't see the obvious when using ffmpeg.
I tried comparing an ffmpeg encode of a 10 bit 422 clip using Nvenc vs QSV, expecting a difference in favour of Intel despite the Intel GPU running in a 4x slot.
However there was none. The reason is that no decoding (per Task Manager) was taking place on either GPU; it was CPU only.
So I have to find out how to enable ffmpeg HW decoding for Nvenc and QSV.
-
I used this ... -hwaccel_output_format qsv -c:v h264_qsv -i "%_INPUT_FILE%" -map 0:v -c:v h264_qsv etc etc
It was only a slight bit slower without the above. Maybe that's because the GPU is in a 4x slot and I use a 16 core CPU. So I won't be using it, especially as it doesn't handle 10 bit 422 input; leaving the above syntax off allows ffmpeg to accept 10 bit 422 input. Surprising, since the latest Intel should handle 10 bit 422. I guess there is something else that I should add to the above?
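If the missing pieces are the hevc decoder name and a format conversion done on the GPU, a full-QSV attempt for a 10 bit 422 hevc source might look something like this (untested sketch; vpp_qsv converts to nv12 on the Intel GPU before the h264_qsv encoder, and the preset/quality values are placeholders):
Code:
ffmpeg -y -hwaccel qsv -hwaccel_output_format qsv -c:v hevc_qsv -i "%~n1%~x1" ^
 -map 0:v -vf vpp_qsv=format=nv12 -c:v h264_qsv -preset fast -global_quality 23 ^
 -map 0:a -c:a copy "%~n1-[ Full QSV ].mp4"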
I did another test in VP and found that with an h264 input file enabling Intel decoding gave no benefit. However when the input file was hevc there was a nearly 2x render speed improvement. I got a 4x speed improvement previously with a different file.
So back to square one. I know that enabling the Intel decoder is worth doing if the input files are typically hevc and 10 bit 422.
As to the OP's query, I still don't know how to enable one GPU for decoding and another one for encoding using ffmpeg.