VideoHelp Forum
  1. Hello folks, I was thinking about using two-pass constant quality encoding, following the WebM Project guide.

    I found that I could do a 10-bit encode with VP9 using profile 2, but I didn't find anything about how to use it in the WebM Project guide or the FFmpeg wiki.

    I'm using today's FFmpeg static build (ffmpeg-git-20171004-64bit-static), which includes an ffmpeg-10bit binary. Should I just use that and not pass any additional parameter?

    About the audio: the FFmpeg wiki recommends 128k with VBR on, but I'm not really sure whether I should raise the bitrate a little.

    So I would be using this (ignoring the 10bit):

    Code:
    ffmpeg -i anime.m2ts -vf scale=1280x720 -c:v libvpx-vp9 -pass 1 -b:v 0 -crf 17 -threads 8 -speed 4 \
      -tile-columns 6 -frame-parallel 1 \
      -an -f webm /dev/null -y
    ffmpeg -i anime.m2ts -vf scale=1280x720 -c:v libvpx-vp9 -pass 2 -b:v 0 -crf 17 -threads 8 -speed 1 \
      -tile-columns 6 -frame-parallel 1 -auto-alt-ref 1 -lag-in-frames 25 \
      -c:a libopus -b:a 128k -vbr on -map_metadata -1 -f webm anime.webm -y
    Here's a sample.

    So, basically, how could I use VP9 profile 2 for a 10-bit encode, or would you encode this in another way? Should I raise the audio bitrate a little?

    I am also interested in VP8 for legacy support, but it looks like there is nothing like profile 2 for 10-bit encoding, nor VBR for Vorbis.

    I would appreciate some help from someone more experienced in the topic.
  2. Use coffee. Lots and lots of coffee... and get a girlfriend.
  3. Use
    Code:
    ffmpeg -h encoder=libvpx-vp9 >libvpx-vp9.txt
    to check the capabilities of the encoder in your build, and wait until 10-bit support arrives.

    In build 3.4 there is no sign of 10 bit support:
    Code:
    Encoder libvpx-vp9 [libvpx VP9]:
        General capabilities: delay threads 
        Threading capabilities: auto
        Supported pixel formats: yuv420p yuva420p yuv422p yuv440p yuv444p gbrp
    You must wait for these formats to show up:
    Code:
    IO... yuv420p10be            3            15
    IO... yuv420p10le            3            15
    IO... yuv422p10be            3            20
    IO... yuv422p10le            3            20
    IO... yuv444p10be            3            30
    IO... yuv444p10le            3            30
    I would take transporterfan's advice seriously... especially the girlfriend part (you will probably need a few of them)...
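    A minimal sketch of that check in bash, assuming the help text keeps the "Supported pixel formats:" line shown above. The function name encoder_supports_pix_fmt is made up for illustration; it only reads text from stdin, so it can be tested without ffmpeg.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an encoder's help text (read from stdin)
# lists the wanted pixel format on its "Supported pixel formats:" line.
encoder_supports_pix_fmt() {
  local wanted="$1" line fmt
  # Keep only the first "Supported pixel formats:" line; fail if it is missing.
  line="$(grep -m1 'Supported pixel formats:')" || return 1
  # Drop the label, then compare each space-separated format name exactly.
  for fmt in ${line#*:}; do
    [ "$fmt" = "$wanted" ] && return 0
  done
  return 1
}
```

    Usage, with ffmpeg in PATH: ffmpeg -hide_banner -h encoder=libvpx-vp9 | encoder_supports_pix_fmt yuv420p10le && echo "10-bit VP9 available"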
  4. Member
    Join Date
    Oct 2016
    Location
    Spain
    You need to build ffmpeg yourself. As I assume that you are using Windows, use this tool: https://github.com/jb-alvarado/media-autobuild_suite. If it fails to build on the first try, run it again a few times. You need to configure it during the first run, but if you don't want to do that, I attach my config, which compiles it only for 64-bit. Before running it for the first time, copy the 3 files ("ffmpeg_options.txt", "mpv_options.txt" and "media-autobuild_suite.ini") to the build directory.

    Then you only need to add "-pix_fmt yuv420p10le" and libvpx-vp9 will automatically select profile 2 (it's the same with x265 for 10-bit).
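    One way to confirm the result, sketched under the assumption that ffprobe from the same build is available: ffprobe can print the stream's codec, profile and pixel format as key=value lines, and the hypothetical bash function below checks them. "Profile 2" is, to the best of my knowledge, how FFmpeg names VP9 profile 2.

```shell
#!/usr/bin/env bash
# Hypothetical check: read ffprobe key=value lines from stdin and succeed only
# for a VP9 stream in profile 2 with a 10- or 12-bit 4:2:0 pixel format.
# Typical producer:
#   ffprobe -v error -select_streams v:0 \
#     -show_entries stream=codec_name,profile,pix_fmt \
#     -of default=noprint_wrappers=1 anime.webm
is_vp9_profile2() {
  local key value codec="" profile="" pix_fmt=""
  while IFS='=' read -r key value; do
    case "$key" in
      codec_name) codec="$value" ;;
      profile)    profile="$value" ;;
      pix_fmt)    pix_fmt="$value" ;;
    esac
  done
  [ "$codec" = vp9 ] && [ "$profile" = "Profile 2" ] &&
    [[ "$pix_fmt" == yuv420p1[02]le ]]
}
```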

    Example:

    CRF mode (there is no 2-pass in CRF mode, and the values don't match those of x264 or x265)
    Code:
    :LOOP
    "ffmpeg.exe" -i %1 -map 0 -c copy -c:v libvpx-vp9 -b:v 0 -crf 17 -threads 1 -deadline good -cpu-used 1 -tile-columns 0 -frame-parallel 0 -auto-alt-ref 1 -lag-in-frames 24 -g 9600 -aq-mode 0  -sws_dither none -pix_fmt yuv420p10le -filter:v scale=w=1280:h=720:force_original_aspect_ratio=decrease,crop=trunc(iw/2)*2:trunc(ih/2)*2:0:0 -c:a libopus -b:a 128k -ac 2 -f webm "%~dpn1.webm"
    @shift
    @if not (%1)==() goto LOOP
    2 Pass
    Code:
    :LOOP
    "ffmpeg.exe" -i %1 -map 0 -c copy -c:v libvpx-vp9 -pass 1 -passlogfile "%~dpn1" -b:v 1500K -threads 1 -deadline good -cpu-used 4 -tile-columns 0 -frame-parallel 0 -auto-alt-ref 1 -aq-mode 0  -sws_dither none -pix_fmt yuv420p10le -filter:v scale=w=1280:h=720:force_original_aspect_ratio=decrease,crop=trunc(iw/2)*2:trunc(ih/2)*2:0:0 -an -f null NUL
    "ffmpeg.exe" -i %1 -map 0 -c copy -c:v libvpx-vp9 -pass 2 -passlogfile "%~dpn1" -b:v 1500K -threads 1 -deadline good -cpu-used 1 -tile-columns 0 -frame-parallel 0 -auto-alt-ref 1 -lag-in-frames 24 -g 9600 -aq-mode 0  -sws_dither none -pix_fmt yuv420p10le -filter:v scale=w=1280:h=720:force_original_aspect_ratio=decrease,crop=trunc(iw/2)*2:trunc(ih/2)*2:0:0 -c:a libopus -b:a 128k -ac 2 -f webm "%~dpn1.webm"
    @shift
    @if not (%1)==() goto LOOP
    The examples are .bat files; you need to edit the ffmpeg path and put in the full path to the executable. There are 2 ways to call the batch file: "convert.bat video1 ... videoN", or drag and drop the files onto the batch file. The number of files is limited by the command-line length, so it isn't fixed; it depends on the file names plus their paths. The crop function after the scale forces the video to be mod2: it only crops one pixel if the width or height is odd. Also, the scale function doesn't resize the video to exactly 1280x720; it resizes it keeping the aspect ratio, so that width=1280 or height=720.
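    The size arithmetic described above can be sketched in bash. This assumes the scale filter rounds the "decrease" size to the nearest integer (as libavutil's av_rescale does, as far as I know); rescale and fit_even are made-up names, and in the batch files it is the filter chain itself that does the work.

```shell
#!/usr/bin/env bash
# a*b/c rounded to the nearest integer (positive integers only).
rescale() {
  echo $(( ($1 * $2 + $3 / 2) / $3 ))
}

# usage: fit_even SRC_W SRC_H BOX_W BOX_H
# Mirrors scale=w=BOX_W:h=BOX_H:force_original_aspect_ratio=decrease
# followed by crop=trunc(iw/2)*2:trunc(ih/2)*2:0:0.
fit_even() {
  local sw=$1 sh=$2 bw=$3 bh=$4 w h
  w=$(rescale "$bh" "$sw" "$sh")   # width that keeps the aspect at height BOX_H
  h=$(rescale "$bw" "$sh" "$sw")   # height that keeps the aspect at width BOX_W
  (( w > bw )) && w=$bw            # "decrease": never exceed the box
  (( h > bh )) && h=$bh
  # The crop rounds each dimension down to an even number (mod2).
  echo "$(( w / 2 * 2 ))x$(( h / 2 * 2 ))"
}
```

    For example, fit_even 1920 800 1280 720 prints 1280x532: the scale step yields 1280x533 and the crop removes the odd line. A 1440x1080 source gives 960x720.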
  5. Originally Posted by pandy
    [...] In build 3.4 there is no sign of 10 bit support: [...]
    I have two binaries, and the ffmpeg-10bit one has those formats enabled:

    Code:
    Supported pixel formats: yuv420p yuva420p yuv422p yuv440p yuv444p yuv420p10le yuv422p10le yuv440p10le yuv444p10le yuv420p12le yuv422p12le yuv440p12le yuv444p12le gbrp gbrp10le gbrp12le
    But I haven't found anything anywhere about how to select them for VP8/VP9, only something saying that I have to choose profile 2. So I'm very confused here.


    Originally Posted by gdx
    [...] as I assume that you are using windows [...]
    Then you only need to add "-pix_fmt yuv420p10le" for it to autoselect profile 2 in vp9 (in x265 is the same to use 10bit).
    [...]
    CRF mode (there is not 2 pass in crf mode, and the value dont mach those of x264 or x265)
    [...]
    As you can see in the code I was running, I'm using /dev/null, so I'm not on Windows. Also, VP8/VP9 do support 2-pass CRF mode (don't ask me why, lol). By the way, it's even recommended by the WebM Project wiki.

    I've been using these binaries, and there is a 10-bit version, so I only need to add -pix_fmt yuv420p10le to auto-select profile 2?

    I'll try it right now. Thanks!
  6. @gdx I just tried it and I was able to enable profile 2. I still have some doubts about the audio, but I can test that myself.
    Thank you very much <3
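    For reference, a bash sketch of the first post's two-pass command with that switch added. It assumes the static build's 10-bit binary is in PATH as ffmpeg-10bit (override with FFMPEG=...); the function name vp9_10bit_two_pass is hypothetical, and with DRY_RUN=1 it only prints the two command lines instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: the two passes from the first post, plus
# -pix_fmt yuv420p10le so that libvpx-vp9 selects profile 2.
# usage: vp9_10bit_two_pass INPUT OUTPUT
vp9_10bit_two_pass() {
  local in="$1" out="$2" ff="${FFMPEG:-ffmpeg-10bit}"
  # Options shared by both passes (as in the first post, plus -pix_fmt).
  local common=(-vf scale=1280x720 -c:v libvpx-vp9 -pix_fmt yuv420p10le
                -b:v 0 -crf 17 -threads 8 -tile-columns 6 -frame-parallel 1)
  local pass1=("$ff" -y -i "$in" "${common[@]}" -pass 1 -speed 4
               -an -f webm /dev/null)
  local pass2=("$ff" -y -i "$in" "${common[@]}" -pass 2 -speed 1
               -auto-alt-ref 1 -lag-in-frames 25
               -c:a libopus -b:a 128k -vbr on -map_metadata -1 -f webm "$out")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Print each command on one line instead of running it.
    printf '%s\n' "${pass1[*]}" "${pass2[*]}"
  else
    "${pass1[@]}" && "${pass2[@]}"
  fi
}
```

    Usage: vp9_10bit_two_pass anime.m2ts anime.webm (the second pass only runs if the first one succeeds).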
  7. Member
    Originally Posted by Planeptune
    As you can see in the code I was running, I'm using /dev/null so I'm not using Windows. Also, VP8-VP9 supports 2 pass CRF mode (Don't ask me why lol). Btw, it's even recommended by WebM Project Wiki.
    Sorry, I missed the /dev/null and assumed Windows, as it's the most common platform; and even on Linux I compile ffmpeg myself, because I include some libraries that are normally not included, like libfdk-aac. I looked into the CRF thing and found the reason: a mix-up between ffmpeg's naming and vpxenc's naming. ffmpeg renames functions for consistency, but sometimes this causes confusion, and the mapping is not always exact. VP9 lacks a true CRF mode, and -crf is mapped to vpxenc's "quality" and "constrained quality" modes (actually, even "quality" is a special case of "constrained quality").
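    A rough illustration of that mapping, based on my reading of FFmpeg's libvpx wrapper rather than official documentation: -crf together with -b:v 0 selects vpxenc's constant-quality mode (--end-usage=q), -crf with a non-zero -b:v selects constrained quality (--end-usage=cq), where the bitrate acts as a ceiling, and -b:v alone is ordinary VBR. The function name is hypothetical and CBR is left out. The number of passes is independent of the mode, which is why a two-pass -crf encode is still valid.

```shell
#!/usr/bin/env bash
# Hypothetical helper: name the libvpx rate-control mode that a combination
# of ffmpeg -crf / -b:v values ends up in (CBR not covered).
# usage: vp9_rc_mode CRF BITRATE   (CRF "" = -crf not given, BITRATE 0 = -b:v 0)
vp9_rc_mode() {
  local crf="$1" bitrate="$2"
  if [ -n "$crf" ] && [ "$bitrate" = 0 ]; then
    echo q      # constant quality, like vpxenc --end-usage=q --cq-level=CRF
  elif [ -n "$crf" ]; then
    echo cq     # constrained quality, -b:v acts as the bitrate ceiling
  else
    echo vbr    # plain target bitrate (one or two passes)
  fi
}
```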

    Also, Opus at 128 kbit/s for stereo is near-transparent, and at 192 kbit/s it is considered transparent (http://wiki.hydrogenaud.io/index.php?title=Opus).

    You're welcome.
  8. Originally Posted by Planeptune
    Hello folks I was thinking about using two-pass constant quality enconding following the WebM Project guide.
    First of all, the docs on webmproject.org seem to be unmaintained (and probably deprecated in some parts), and the 2017 docs on Google Developers contradict the old docs (and sometimes even themselves). Making 2-pass VBR-CQ work has been my dream for the past few weeks, but it always undershoots the bitrate, and tweaking it further only seems to worsen the output orz
    Originally Posted by Planeptune
    About the sound, on the FFMPEG wiki recommends 128k and VBR on but I'm not really sure if I should raise the bitrate a little.
    For stereo (no more than 2 channels), 128 kbit/s Opus should be enough, as the Opus wiki says.
    Originally Posted by Planeptune
    So I would be using this (ignoring the 10bit):
    [...]
    • As the 2017 docs explain, the number of threads is bound to the number of tile columns, and the number of tile columns in turn depends on the width of the video stream. In short, with the default tile column width of 256 px, a 1920 px wide video would need 8 columns. The -tile-columns option accepts the log2 of the actual number of tile columns; in our case that’s 3, because 2*2*2=8. It’s all there with examples.
    • -speed (aka -cpu-used) is a tuning option for -deadline, and I’m not sure whether it even works without it (does libvpx-vp9 assume some default value for -deadline if it’s not provided? We just don’t know.)
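    The tile limits can be checked with a little arithmetic. The sketch below follows my reading of libvpx's vp9_tile_common.c and should be treated as an approximation: a tile column must be at least 4 superblocks (4 x 64 = 256 px) wide, the column count is a power of two, and the requested -tile-columns value (a log2) is clamped into the allowed range. The function names are made up. Under these rules a 1920 px frame allows at most 4 columns (log2 2), because 8 columns would be only 240 px wide each, and as far as I can tell a request such as -tile-columns 6 is simply clamped down to that maximum.

```shell
#!/usr/bin/env bash
# Largest allowed log2(tile columns) for a frame width, assuming each tile
# column must span at least 4 superblocks of 64 px (256 px).
# usage: max_log2_tile_cols WIDTH
max_log2_tile_cols() {
  local sb64_cols=$(( ($1 + 63) / 64 )) n=1
  while (( (sb64_cols >> n) >= 4 )); do
    n=$(( n + 1 ))
  done
  echo $(( n - 1 ))
}

# What a requested -tile-columns value ends up as after clamping.
# usage: effective_log2_tile_cols REQUESTED WIDTH
effective_log2_tile_cols() {
  local req=$1 sb64_cols=$(( ($2 + 63) / 64 )) min=0 max
  max=$(max_log2_tile_cols "$2")
  # Lower bound: a tile column may not be wider than 64 superblocks (4096 px).
  while (( (64 << min) < sb64_cols )); do
    min=$(( min + 1 ))
  done
  (( req > max )) && req=$max
  (( req < min )) && req=$min
  echo "$req"
}
```

    For example, max_log2_tile_cols 3840 gives 3 (8 columns of 480 px), while 1280 and 1920 both give 2.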

    Originally Posted by Planeptune
    I am also interested about VP8 for legacy support but looks like there is nothing like that profile2 for 10bit enconding nor VBR for vorbis.
    VP8 was more of a testing ground for VP9, and since Google has managed to launch VP9 on YouTube and cut bandwidth by up to half, VP8 probably won’t get any more attention. I wouldn’t recommend VP8 anyway.

    I myself am looking for optimal VBR-CQ options that would let me cut video, especially anime, at a specified bitrate. I use the VOD guide’s recommended bitrate for 1080p at 25 FPS (1800 kbit/s), but I still see the quality drop below H264 from time to time.
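    The bitrate-to-size relationship behind such targets is plain arithmetic. A bash sketch, assuming 1 kbit = 1000 bits and 1 kB = 1000 bytes and ignoring container overhead; both function names are hypothetical, and results are truncated to whole numbers.

```shell
#!/usr/bin/env bash
# Estimated file size in kB: (video + audio kbit/s) * seconds / 8.
# usage: estimate_size_kb VIDEO_KBPS AUDIO_KBPS SECONDS
estimate_size_kb() {
  echo $(( ($1 + $2) * $3 / 8 ))
}

# Video bitrate (kbit/s) that fits a size budget once the audio is accounted for.
# usage: video_kbps_for_size SIZE_KB AUDIO_KBPS SECONDS
video_kbps_for_size() {
  echo $(( $1 * 8 / $3 - $2 ))
}
```

    For a 60-second 1080p cut at 1800 kbit/s with 128 kbit/s Opus, estimate_size_kb 1800 128 60 gives 14460 kB, and video_kbps_for_size 14460 128 60 gives back 1800.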
  9. Originally Posted by deterenkelt
    Making 2-pass VBR-CQ work is my dream for the past weeks. But it always undershoots bitrate and tweaking it more seems to only worsen the output orz
    After two days of testing I got it working; I’ll post it on the wiki soon.
    Originally Posted by deterenkelt
    As the 2017 docs explain, the number of threads is bound the the number of tile-columns, [...]
    I’ve found out why the 2016 docs (those on webmproject.org) use -tile-columns 6: this option and -threads set a cap on the maximum number of tile columns and threads that the codec can use. So calculating optimal values is only needed if you want to restrict them. I observe 23 ffmpeg threads, of which only 4 load CPU cores at 100%, 3 more use 20–30%, and the rest are “sleeping”. So I think limiting them on a desktop isn’t necessary; it was probably meant for 60-core servers, so that re-encoding a 15 MB file wouldn’t try to use all cores.
    Originally Posted by deterenkelt
    I myself look for optimal VBR-CQ options that would allow to cut video, especially from anime, with a specified bitrate. I use VOD recommended bitrate settings for 1080p @25 FPS – 1800 kbit/s, but I still observe quality dropping below H264 from time to time.
    This was solved by finding proper quantiser values.
  10. Originally Posted by deterenkelt
    After two days of testing got it working, soon will post on the wiki.
    ̶W̶h̶e̶r̶e̶?̶
    Oh.. sorry, I didn't realize you were the owner of that repository :P
    Originally Posted by deterenkelt
    [...]
    I’ve found, why the 2016 docs (those on webmproject.org) use -tile-columns 6 – this and -threads set the cap on the maximum tile-columns and threads, that the codec can use. [...]
    Looks like I still need a bit more theory... If I've understood correctly, if I want to use all the threads I must use -tile-columns 3, and yet Google's VOD guide recommends -tile-columns 2 with 8 threads for 720p to 1080p... I find that weird. Shouldn't I only be able to use four threads with that setting?
    So maybe more than 4 tile columns is not recommended for such resolutions?
    Also, the last -speed values in Google's VOD examples are all wrong.
    I'm going to look at those links more closely, thanks.
    Last edited by Planeptune; 9th Apr 2018 at 14:34.
  11. Originally Posted by Planeptune
    If I've understood correctly, if I want use all the threads I must use -tile-columns 3
    No, just -tile-columns > 0. It seems that there’s no limit on how many threads may work on a single tile column.
    Originally Posted by Planeptune
    but yet Google's VOD guide recommend me to use 2 and 8 threads for 720-1080p... I find that weird. Shouldn't I only be able to use four threads with that setting?
    I guess it’s clear now, heh.
    Originally Posted by Planeptune
    So maybe probably is not recommended more than 4 tile-columns for such resolution?
    If you mean “recommended for encoding speed”, then you can set -tile-columns 6 and forget about it. This option and -threads are a cap: if you have a 68-core machine and encode 9999 videos simultaneously, you’ll probably want to give, say, 8 threads to 1080p videos and 2 to 360p. Same with tile columns: if you don’t want to slice 1080p into four columns, but only into two instead (to avoid the quality loss caused by slicing), you cap -tile-columns at 1. libvpx seems to be aware of how many threads are available and how many columns it can slice into, so these options are there to restrict libvpx from taking/using too much.
    Originally Posted by Planeptune
    Also the last speeds in the Google's VOD examples are all wrong.
    Oh, yes. I’ve noticed these inconsistencies too. There’s still a bunch I haven’t published to the wiki, lol.
    Originally Posted by Planeptune
    I'm going to look at those links more closely, thanks.
    You’re welcome!
    Soon I’ll release Nadeshiko v1.2 and update the wiki with actual data. The new example.nadeshiko.rc.sh will have more libvpx parameters with detailed descriptions!
    I write Nadeshiko – a Linux tool to cut short videos with ffmpeg.
    In parallel I compile a wiki about encoding with H264 and VP9.
  12. Member
    The VP9 documentation is really bad. In my experience, enabling row-mt doesn't hurt quality in a noticeable way; in fact, in my tests I found that files encoded with row-mt sometimes have slightly higher SSIM and VMAF values than files encoded without it. Also, it's better to limit the maximum quantizer to the range of 38-48, or at least to the desired quality value + 8. It costs some bitrate, but used correctly the cost is next to nothing, while it prevents VP9 failures.
  13. Originally Posted by gdx
    is more in my test I found that the files using row-mt sometime have a slightly highter SSIM and VMAF values that the files that don't use it.
    I’d like to look at the test, is it published somewhere?
    Originally Posted by gdx
    Also is better to limit the maximun quantifier to the range of 38-48 or at minimun the value of desired quality+8 , it hurts the bitrate but if used correctly it hurts the bitrate next to nothing while preventing vp9 fails.
    I thought of writing a song about VP9: “Aaaand I like to 63, 63, 63…”. I actually use -b:v (vpxenc’s --target-bitrate) with -qmax (vpxenc’s --max-q) set to the desired q + 5, instead of -crf (--end-usage=cq, --cq-level). Using -crf with or without -qmin and -qmax leads to heavy bitrate undershooting.
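    The two quantizer-capping approaches from these posts can be written as a small bash sketch. The offsets (+8 with -crf from gdx's post, +5 with -b:v from this one) come from the thread; the function names are hypothetical, and qmax is clamped to the top of libvpx's 0-63 quantizer scale.

```shell
#!/usr/bin/env bash
# qmax = Q + OFFSET, clamped to the top of the 0..63 quantizer scale.
# usage: clamp_qmax Q OFFSET
clamp_qmax() {
  local qmax=$(( $1 + $2 ))
  (( qmax > 63 )) && qmax=63
  echo "$qmax"
}

# Constant quality with a quantizer ceiling: -b:v 0 -crf Q -qmax Q+8.
# usage: crf_args Q
crf_args() {
  printf '%s\n' "-b:v 0 -crf $1 -qmax $(clamp_qmax "$1" 8)"
}

# Target bitrate with a quantizer ceiling: -b:v KBPSk -qmax Q+5.
# usage: bitrate_args KBPS Q
bitrate_args() {
  printf '%s\n' "-b:v ${1}k -qmax $(clamp_qmax "$2" 5)"
}
```

    The output is meant to be word-split into an ffmpeg command line, e.g. ffmpeg -i in.mkv -c:v libvpx-vp9 $(bitrate_args 1800 31) ... out.webm.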
  14. Member
    That test hasn't been published, as it was a private test for personal use and is part of evaluating the configuration for another test that's in the works. I decided to quickly replicate it using a 5-minute 720p clip from the first episode of Soushin Shoujo Matoi.

    Code:
    ENCODER: FFMPEG [libvpx-vp9 v1.7.0-227-g5476ab095]
    COMMON OPTIONS: -b:v 0 -crf 27 -threads 8 -deadline good -cpu-used 1 -frame-parallel 0 -qmax 36 -auto-alt-ref 1 -lag-in-frames 24 -g 240 -aq-mode 0 
    
    TILES | ROW_MT | 1pass FPS | 2pass FPS |  SIZE   | FFMPEG PSNR | FFMPEG SSIM | FFMPEG VMAF
        0 |      0 |        42 |       3.8 | 53695kB |   46.579878 |    0.989096 |   96.103280
        0 |      1 |        42 |       9.5 | 53456kB |   46.603315 |    0.989150 |   96.127597
        1 |      0 |        42 |       6.5 | 53840kB |   46.578242 |    0.989094 |   96.104231
        1 |      1 |        42 |      11.0 | 53611kB |   46.602365 |    0.989149 |   96.120578
        2 |      0 |        42 |       8.9 | 54110kB |   46.575647 |    0.989090 |   96.095768
        2 |      1 |        42 |      11.0 | 53880kB |   46.599416 |    0.989143 |   96.115766
    As you can see, the metrics are so close that you can call them the same, and yes, in this case row-mt gives slightly higher values. The encoding threads are CPU-limited in my case. I also noticed that row-mt uses 4x as much CPU but at best gives only a 3x speedup, and this may be the reason behind its slightly higher values (cpu-used may apply different internal tweaks to compensate for possibly worse compression when row-mt is used, or something like that).
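    The speed and size differences in that table can be recomputed from the posted numbers with a short bash/awk sketch. rowmt_summary is a made-up name; it only reads the table (header line included) from stdin and does not re-run any encodes, and it assumes every tile setting has both a row-mt 0 and a row-mt 1 row.

```shell
#!/usr/bin/env bash
# Hypothetical helper: per tile setting, the 2-pass speed-up and size change
# of row-mt=1 relative to row-mt=0, from the "|"-separated table on stdin.
rowmt_summary() {
  awk -F'|' '
    NR > 1 && NF >= 5 {
      gsub(/[ kB]/, "", $5)                 # "53695kB" -> "53695"
      t = $1 + 0; r = $2 + 0
      fps[t, r] = $4 + 0; size[t, r] = $5 + 0; seen[t] = 1
    }
    END {
      for (t = 0; t <= 6; t++)
        if (t in seen)
          printf "tiles=%d speedup=%.2fx size=%+.2f%%\n", t,
                 fps[t, 1] / fps[t, 0], (size[t, 1] / size[t, 0] - 1) * 100
    }'
}
```

    With the table from the post it prints:
    tiles=0 speedup=2.50x size=-0.45%
    tiles=1 speedup=1.69x size=-0.43%
    tiles=2 speedup=1.24x size=-0.43%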

    P.S. I've attached the VMAF XML logs.
  15. Originally Posted by gdx
    [...]
    And as you can see the metrics are so close that you can tell that they are the same, and yes in this case row-mt give slightly higher values. The encoding threads are CPU limited in my case plus I noticed that row-mt it uses 4x as much CPU but at best only gives at best a 3x speedup
    Thank you for the test data. It’s very interesting. I was curious and made a small comparison myself. It’s true: the SSIM scores are even, but the quality is noticeably higher with -row-mt 1. I’d expect the difference in SSIM score to be around 0.010 or at least 0.005, but no, they’re identical up to the thousandths. I couldn’t see a “3x speedup” though, only 1/6.
    Originally Posted by gdx
    and in this can lie the reason behind its slightly higher values (cpu-used can have different internal tweaks to compensate for a possible worse compression if you use row-mt ore something like that).
    Not sure I understood you here. You mean that something in the methods employed by a particular cpu-used value is also affected by the row-mt setting? I only know that all the motion estimation settings are governed by the cpu-used value, and that cpu-used behaves differently depending on the deadline in effect.
  16. Member
    Originally Posted by deterenkelt
    Thank you for the test data. It’s very interesting. I was curious and made a small comparison myself. It’s truth – the SSIM scores are even, but the quality is noticeably higher with -row-mt=1. I’d expect the difference in SSIM score to be around 0.010 or at least 0.005, but no, they’re identical up to thousandths. Couldn’t see “3x speedup”, only 1/6 though.
    The speedup of row-mt depends heavily on the number of encoding threads. Basically, if you are already maxing out your CPU's threads with tiles, the speedup is going to be very small (10-20%), and the gains are not linear: in my data both "tile 1" and "tile 2" have the same speed with row-mt enabled, because in both cases it's maxing out the 8 threads. And I don't know why, but on my machine VP9 is capped at 80% CPU use. Row-mt, with its non-deterministic behavior, is what is off by default: two encodes of the same file aren't guaranteed to give identical video streams, even if they have the same quality, whereas with deterministic behavior both video streams would be identical.

    Originally Posted by deterenkelt
    Not sure if I understood you here. You mean, that something in the methods employed by a particular cpu-used is also affected by different row-mt? I know only that all motion estimation lies within cpu-used values, and cpu-used behaves differently based on the deadline in effect.
    Basically yes, and that isn't rare: unlike x264 or x265, most of the encoder's internal options are hidden and selected via a table, mainly based on the deadline and cpu-used settings (really bad names for the options).

    P.S. Hmm... I made a mistake: the maximum speedup I have seen is only around 200%; the "3x speedup" was meant to be "3x the encoding speed"... my only excuse is that I'm not a native English speaker.
    Last edited by gdx; 12th Apr 2018 at 05:23.
  17. Originally Posted by gdx
    The speed up or row-mt is very dependent of the number of encoding threads, basically if you are maxing the threads of you CPU with tiles the speed up is going to be very small (10-20%) plus the gains are not linear, if you see my data both "tile 1" and "tile 2" have the same speed with row-mt enabled because is maxing in both cases the 8 thread.
    Eh, I think the difference between tests №4 and №6 lies in how useful the threads are, not in their count. Since threads only work when both -threads and -tile-columns are set above zero, all four tests should be capped at the 8 threads set in the common options. But with --tile-columns=1 there are 4 threads per tile, and the 2017 docs recommend using <=2.

    While searching for how row-mt works, I found an article from the time multithreading was introduced in VP9. It says that row-mt refers to rows of macroblocks (and not -tile-rows, as one might think). I had a gut feeling it would be like that after I stumbled on “rows of macroblocks” in the description of threads in the docs from 2016. And I found confirmation, yay.

    On the page about live encoding, row-mt is explained poorly, but it also says that it “Allows use of up to 2x thread as tile columns.” Maybe that’s why row-mt is the key to improved encoding time? Then my small speed boost can be explained: I use 8 threads for 1080p, but my CPU is a 4-core without HT, so the boost is reasonably small.

    Originally Posted by gdx
    And I don't know why but in my machine VP9 is caped to 80% CPU use.
    Probably an OS policy for CPU-hogging processes.

    Originally Posted by gdx
    Basically yes, and is not that rare as unlike x264 or x265 most encoder internal opinion are hidden and are selected via a table mainly using the setting deadline and cpu-used (real bad names for the options).

    PD. Hmm... I have seen a fail by me as the max speedup that i have seen is around only 200%, the "3x speedup" meant to be the "3x the encoding speed"... the only excuse is that I'm not a native English speaker.
    Ah, don't worry, don't worry.
  18. Member
    Originally Posted by deterenkelt
    Eeh, I think the difference between tests №4 and №6 is in thread usefulness, and not in their count. Since threads only work when both it and -tile-columns are set to above zero, all the four tests should be capped at 8 threads set in the common options. But with --tile-columns=1 there are 4 threads per tile, and 2017 docs recommend to use <=2.

    [...]

    On the page about live encoding row-mt explained poorly, but there is also said that it “Allows use of up to 2x thread as tile columns.”. Maybe that’s why row-mt is a key to improved encoding time? Then my little speed boost can be explained – as I use 8 threads for 1080p, but my CPU is 4-core without HT, the boost is reasonably small.
    It's not exactly true that threads only work with a minimum of --tile-columns=1; that only holds if row-mt is not used, because with --tile-columns=0 and row-mt enabled it spawns 4 threads. The difference in test №6 is that it's already using the 8 threads with row-mt=0, so the gains we see with row-mt enabled are the inherent minimum boost of row-mt, because when it's enabled the encoder makes better use of the thread resources. If we go to the original announcement, it's not clear that row-mt only allows using up to 2x as many threads as tile columns; it can actually use more, just not as efficiently as with 2x the threads.

    Second, I admit that the encoding speed is a contaminated variable, as I use the computer for other things during the encoding, since those numbers are only a reference. Ironically, they are more realistic than they appear, as most people don't use machines dedicated only to encoding. My machine is a 4-core with HT, and this also alters the results, as HT only boosts performance by 30-50% at most compared with HT disabled, and it gets even more complex depending on how smart the scheduler is. Also, using more software threads than hardware threads can boost performance, but that performance is capped by the hardware threads, and the gain comes from better scheduling of stalled/inactive/waiting threads, since it can maximize resource use.

    Originally Posted by deterenkelt
    OS policy for CPU hogging processes, probably.
    That may be the reason, as I run FFmpeg at low priority.
  19. Originally Posted by gdx
    Not exactly that threads only works with a minimum of --tile-columns=1, this is only valid if row-mt is not used as with --tile-columns=0 and row-mt activated it spawn 4 threads.
    Oh, you’re right indeed. I wrote the post over the span of several hours in between more important things, so I forgot to correct the first third of the post after I learned the truth about row-mt on ittiam.com.

    Originally Posted by gdx
    If we go to the original anouncement is not clear that row-mt only llows use of up to 2x thread as tile columns, it actually can use more but not as efficient as using 2x the threads.
    > In tests[1] of encoding HD videos…
    > [1]. Tests were run on the 16-core desktop with Intel(R) Xeon(R) CPU E5-2690 0 @ 2.90GHz.
    Damn, I want a desktop with a 16-core Xeon for $2000, too

    Originally Posted by gdx
    Second I admit that the encoding speed is a contaminated variable as I use the computer at the same time for other things during the encoding because that are only a reference, but ironically this number are more realistic that it appears as most people don't used machines dedicated only to encoding,
    One can count CPU time instead (and for that kind of test it should be preferred), but measuring wall clock time is still important, at least for us plebs without $2000 processors. One just has to make sure that there’s no sudden Firefox compilation running in the background.
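    In bash both figures come from the built-in time keyword: "real" is wall-clock time, while "user" plus "sys" is the CPU time the process actually consumed, which is much less sensitive to other load on the machine. A minimal sketch; time_cmd is a hypothetical name, and the decimal separator follows the current locale.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command and report wall-clock and CPU time.
# The report goes to stderr as "real=S.mmm user=S.mmm sys=S.mmm".
time_cmd() {
  local TIMEFORMAT='real=%R user=%U sys=%S'
  time "$@"
}
```

    For example, time_cmd ffmpeg -i input.mkv ... prints the three figures when the encode finishes; the command's own exit status is preserved.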