VideoHelp Forum
Closed Thread
  1. Member
    Join Date
    Jan 2007
    Location
    Canada
    I've been reading up on the effects of multi-threading on x264. Some of the info was written a very long time ago (many new iterations of the x264 codec have been produced since then).

    I'm wondering if anyone would be interested to see some short clips encoded using an increasing number of threads (I don't know what the maximum number of threads would be).

    I, for one, would like to know at what point there is a significant drop-off in quality. Unless, of course, the quality drop-off is linear rather than logarithmic or exponential.

    Let me know if there's any interest and I'll upload some short clips of the following:

    S02-E01 - The North Remembers (Source).mkv

    I'll also post the avs scripts and the encoding settings used.
    Last edited by ziggy1971; 11th Jan 2019 at 23:01.

  2. I'm wondering if anyone would be interested to see some short clips encoded using an increasing number of threads
    Not really.
    Clips should be at least 10min long, to make sure 2pass rate control really has to do something. (Not sure atm. whether x264 uses multiple threads per frame or not, if it does, shorter clips are probably okay as long as they are high resolution enough,..)
    (file size would have to be kept constant, crf encodings don't make sense for such tests unless you really understand crf and account for how it changes depending on the settings and source characteristics, since crf does not deliver constant quality over different sources.)
    Clips should use different resolution samples. (SD, HD, UHD, 8k)
    What thread counts are you planning to test? (first numbers that pop to mind would be: 1, 2, 4, 6, 8, 12, 16, 24, 48, 64, 128)

    I, for one, would like to know at what point there is a significant drop-off in quality.
    What methods would you use for measuring quality? What would count as a 'significant drop-off' for what method?

    Cu Selur
    Last edited by Selur; 28th Dec 2018 at 23:36.
    users currently on my ignore list: deadrats, Stears555

  3. Member
    Join Date
    Jan 2007
    Location
    Canada
    Originally Posted by Selur View Post

    Clips should be at least 10min long, to make sure 2pass rate control really has to do something. (Not sure atm. whether x264 uses multiple threads per frame or not, if it does, shorter clips are probably okay as long as they are high resolution enough,..)
    Actually, I was thinking of 8-10 min clips. I don't know if shorter clips would show the same/similar results. No point in going shorter and later find out that the clips were too short to provide any usable data and then redo it all over again with longer clips.

    Originally Posted by Selur View Post

    (file size would have to be kept constant, crf encodings don't make sense for such tests unless you really understand crf and account for how it changes depending on the settings and source characteristics, since crf does not deliver constant quality over different sources.)
    I normally don't use 2pass encoding because it targets a specific file size/bitrate. Since I have my videos on HDD I don't see a point in targeting a file size. If I were to store them on optical disks I guess I would use 2pass.

    I can do 2pass and CRF with any settings suggested and upload the results.


    Thus far I've been using a CRF=18 and the file sizes range from 494,751,893 to 497,467,531 bytes, which in my opinion, is pretty close in file size given the large range of encoding speeds (77 mins for 1 thread and about 4 mins for 32 threads).

    I used the following x264 settings, changing only the --threads value (18 in this example) in the custom command line arguments.
    Code:
    cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x133 / me=umh / subme=7 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-3 / threads=18 / lookahead_threads=4 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=4 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=24 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=crf / mbtree=1 / crf=18.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00

    For measuring quality, I'd use some avs scripts to compare the encoded clips to the originals, and I was hoping for some input from more experienced users. I would like to upload the clips for further review, but given the lack of interest that doesn't seem worthwhile at the moment.

    These are some scripts provided by other members here at VideoHelp Forum which I've recently started using. Other suggestions are welcome.

    Code:
    source = WhateverSource("source.ext")
    enc1 = WhateverSource("enc1.ext")
    enc2 = WhateverSource("enc2.ext")
    Interleave(source, enc1, enc2, source)
    Code:
    source = WhateverSource("source.ext")
    enc1 = WhateverSource("enc1.ext")
    enc2 = WhateverSource("enc2.ext")
    StackHorizontal(source, enc1, enc2, source)
    I haven't got this one to work yet with 64-bit AvsPmod/VirtualDub2 (x64); I think I need a 64-bit version of hdragc if there is one.
    Code:
    q=WhateverSource("source.ext").hdragc().subtitle("source")
    x=WhateverSource("enc1.ext").hdragc().subtitle("x264")
    
    interleave(q, x)
    Originally Posted by Selur View Post

    Clips should use different resolution samples. (SD, HD, UHD, 8k)
    I agree with different resolution samples, but I don't know where I could get high quality samples at various resolutions. The only thing I can think of at the moment is using a high resolution 1080p or UHD and resizing it. Any suggestions?

    I chose the source clips listed above because I have the original Blu-ray copy on my system, it's 1080p, and the clips have darker areas that are more difficult to encode and (I think) expose flaws more easily.

    Originally Posted by Selur View Post

    What thread counts are you planning to test? (first numbers that pop to mind would be: 1, 2, 4, 6, 8, 12, 16, 24, 48, 64, 128)
    Thread counts? I have a 16-core/32-thread CPU so I could test whatever would be appropriate; 1, 2, 4, etc. or every single thread up to whatever makes sense. I've also set the CPU frequency to a constant frequency, hoping that doing so would give more accurate results without boosting lower core counts.

    For my own testing so far I've encoded the same 1080p clip (11508 frames @ 23.976fps or almost 8 mins long) using MeGUI x264 8-bit with a basic avs script without any filtering. I've recorded the time taken to encode at each of 1-32 threads and also recorded the final file size.

    The data creates an interesting curve for the number of threads used vs. framerate (not linear) and, given that there's little difference in file sizes, I don't expect much quality degradation. I still have to check the clips against the original.
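    As a rough cross-check of the figures in this post, the clip length and the average encoding framerate follow directly from the frame count and the encode times. The sketch below uses only numbers quoted above (11508 frames, 23.976 fps, 77 minutes for 1 thread, about 4 minutes for 32 threads); the function names are made up for illustration, and the 32-thread time is only approximate in the post, so the resulting framerate is approximate as well.

```python
def clip_duration_seconds(frames: int, fps: float) -> float:
    """Playback length of a clip in seconds."""
    return frames / fps


def encode_fps(frames: int, encode_minutes: float) -> float:
    """Average encoding speed in frames per second."""
    return frames / (encode_minutes * 60.0)


FRAMES = 11508        # frame count quoted in the post
SOURCE_FPS = 23.976   # framerate quoted in the post

duration = clip_duration_seconds(FRAMES, SOURCE_FPS)  # just under 8 minutes
fps_1_thread = encode_fps(FRAMES, 77)                 # 1 thread, 77 minutes
fps_32_threads = encode_fps(FRAMES, 4)                # 32 threads, "about 4 mins"

print(f"clip length: {duration:.1f} s")
print(f"1 thread:    {fps_1_thread:.2f} fps")
print(f"32 threads:  {fps_32_threads:.2f} fps")
```

    Plotting encode_fps against the thread count for every run would give the threads vs. framerate curve mentioned above.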

  4. I normally don't use 2pass encoding because it targets a specific file size/bitrate.
    exactly so you have a fixed size and can compare the quality
    Since I have my videos on HDD I don't see a point in targeting a file size. If I were to store them on optical disks I guess I would use 2pass.
    you could also compare with a fixed quantizer and compare the size (problem with fixed quantizer is that adaptive quantization can't be used)

    crf doesn't make sense since both quality and size will differ,....

    Thus far I've been using a CRF=18 and the file sizes range from 494,751,893 to 497,467,531 bytes, which in my opinion, is pretty close in file size given the large range of encoding speeds (77 mins for 1 thread and about 4 mins for 32 threads).
    encoding speed will always be the worst for single threaded encoding, it should only be taken into account after being sure size and quality don't differ.
    Problem is crf does lots of things internally and is different for each source and setting combination.

    I agree with different resolution samples, ... Any suggestions?
    Movie trailers, Blender Films (https://www.blender.org/about/projects/) usually there are the rendered images available, Xiph Test Media (https://media.xiph.org/video/derf/),...
    I didn't mean you should test the same clip in different resolution, but if you want, resizing a high resolution source should be fine.

    they have darker areas that are more difficult to encode and expose their flaws more easily (I think).
    for dark scenes additional tweaking is usually advised,... I would recommend using clips which are either dark or bright, not a mix, for testing since that makes it easier to get a general idea.

    The data creates an interesting curve for the number of threads used vs. framerate (not linear)
    higher resolution samples will probably maintain a linear curve for longer.

    given that there's little difference in file sizes, I don't expect much quality degradation.
    it would be surprising if there were much easily visible degradation

    I still think using crf is a flawed approach in this matter, but I wish you the best of luck.

    The problem isn't doing the encoding, it's setting up a meaningful test scenario which is reproducible and allows conclusions to be drawn.
    => the point is if you come up with a testing setup which is easy to reproduce and can produce meaningful results, others will be more inclined to help.

    The smaller the expected quality deviation is, the more important it is to have a meaningful test scenario.
    If you have a file size range between 494,751,893 (clip A) and 497,467,531 bytes (clip B) and the quality of clip A is 100%* and the quality of clip B is 99.45%*, the difference might only be due to the size difference (100/(497,467,531/494,751,893) = 99.4541074882) and due to the efficiency of the encoder.
    (-> so trying to keep the file size out of it might be a good idea)

    * based on whatever measure you chose
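    The arithmetic in this post can be reproduced in a few lines. This is only a sketch of the size-ratio calculation using the two byte counts quoted above; the variable and function names are illustrative and not taken from any tool.

```python
SIZE_A = 494_751_893  # smallest CRF 18 encode, in bytes
SIZE_B = 497_467_531  # largest CRF 18 encode, in bytes


def size_ratio_percent(smaller: int, larger: int) -> float:
    """Smaller file size expressed as a percentage of the larger one."""
    return 100.0 * smaller / larger


ratio = size_ratio_percent(SIZE_A, SIZE_B)         # same as 100/(SIZE_B/SIZE_A)
spread = 100.0 * (SIZE_B - SIZE_A) / SIZE_A        # size spread relative to the smaller file

print(f"ratio:  {ratio:.4f} %")
print(f"spread: {spread:.3f} %")
```

    So the CRF encodes differ by roughly half a percent in size, which is the same order of magnitude as the quality differences being looked for.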



    Cu Selur

  5. Member
    Join Date
    Jan 2007
    Location
    Canada
    Originally Posted by Selur View Post

    I still think using crf is a flawed approach in this matter, but I wish you the best of luck.
    I think I was misunderstood somewhere, but yes, the CRF approach would be wrong for comparing quality if the CRF CHANGES. However, I'm not changing the CRF; it remains the same, in this case CRF=18. If I changed the CRF for each encode, that would be like encoding the same clip at 2 different bitrates in 2 pass and expecting the same quality. Not happening.

    I'm only changing the number of threads being used for encoding, nothing else.

    Same sample clip, same CPU speed, same encoder settings (except for the threads) and everything else the same, or at least as close as possible. I'm just testing to see how using more/less threads affects the encoding speed and quality.

    For instance, if using 12 threads yields virtually the same result as using 16 threads, but 16 threads encodes 10% faster, then why not go with more threads. (I know, exaggerated) but you know what I mean.

    Main reason for testing threads is this:
    https://forum.videohelp.com/threads/378203-Captured-a-60gb-2hr-avi-with-Vdub-Huff-enco...e2#post2442659

    I find it hard to believe that the clips would be so drastically different using the same bitrate and only changing the number of threads used.

    I will be encoding using various bitrates as soon as I have time.

  6. I'm only changing the number of threads being used for encoding, nothing else.
    So all your test results will only be valid as long as that crf is used. (not sure whether the calculation of the crf is influenced by the thread count, I hope not )

    affects the encoding speed and quality.
    Assuming that the size is fixed. Like I wrote even a seemingly small variation might influence the absolute quality.

    For instance, if using 12 threads yields virtually the same result as using 16 threads, but 16 threads encodes 10% faster, then why not go with more threads. (I know, exaggerated) but you know what I mean.
    Assuming the file size stays the same (with a negligible error margin) with a fixed crf value, you are mainly looking at for how long increasing the thread count keeps decreasing the encoding time, right?

    I find it hard to believe that the clips would be so drastically different using the same bitrate and only changing the number of threads used.
    I would be surprised too. Assuming the output file size is the same and you only changed the thread count, the quality hopefully doesn't change much either. This is what was established years ago: the quality decrease was negligible and not reliably measurable while the thread count stayed <= 16. With higher thread counts some theoretical losses should apply, but even these normally shouldn't be visible in more than a handful of frames.

    I will be encoding using various bitrates as soon as I have time.
    using different resolutions is probably more interesting since most of the old tests were done with SD content as far as I remember,...

    where a single frame comparison was used without looking at the rest of the frames around that frame,..

    -> do the testing and try to come to a conclusion, it's a good way to learn

    Cu Selur


  7. Member
    Join Date
    Jan 2007
    Location
    Canada
    Here are some renders of the clips I posted.

    T1 = 1 thread
    T32 = 32 threads

    Compare these to each other or against the originals.

    I know there will be many nay-sayers, but I don't believe there's nearly as much loss as the photos in this link would lead you to believe; and that's just one example.

    Even the wiki page seems to have another view here.
    Last edited by ziggy1971; 3rd Jan 2019 at 13:58.

  8. Member
    Join Date
    Jan 2007
    Location
    Canada
    Same x264 settings, just resized to 720p

  9. Member
    Join Date
    Jan 2007
    Location
    Canada
    Resized to 360p

  10. Originally Posted by ziggy1971 View Post

    Main reason for testing threads is this:
    https://forum.videohelp.com/threads/378203-Captured-a-60gb-2hr-avi-with-Vdub-Huff-enco...e2#post2442659

    I find it hard to believe that the clips would be so drastically different using the same bitrate and only changing the number of threads used.


    The reason you see significant differences quite easily for that tree clip , is because it's low bitrate for that specific clip

    Differences are easier to see (with human eye) , when you are in the lower bitrate range. In that range, small differences in compression efficiency settings make a large difference. eg. If you used more references, more b-frames, those significantly affect compression in that bitrate range. Similarly, if you use very high bitrates, nothing matters ; everything "looks" the same - and you could use MPEG2 and it would look fine. If you were to plot the quality at given bitrates, or quality vs. threads using a measurement, you will see a clear relationship. In adequate bitrate scenarios, the differences with high vs. low threads are mostly negligible . Just like using "very slow" vs. "medium" or reference or b-frames is mostly negligible - you'd only get very minor differences. I posted those comparisons years ago too, nothing has changed

    If you encoded crf 18 , you'd get like 8-10x the bitrate . I'll see if I can dig it up if you want to check for yourself. But you can do it on any clip, it's reproducible 100% of the time. Just do a low bitrate encode scenario with threads=1 vs. threads= some high number

  11. btw. using:
    Code:
    t1 = LWLibavVideoSource("C:\Users\Selur\Desktop\S02-E01 - The North Remembers T1.mkv",cache=false,stacked=true,format="YUV420P8",repeat=true)
    t32 = LWLibavVideoSource("C:\Users\Selur\Desktop\S02-E01 - The North Remembers T32.mkv",cache=false,stacked=true,format="YUV420P8",repeat=true)
    #i = StackHorizontal(t1,t32)
    Subtract(t1,t32).Levels(127, 1, 129, 0, 255)
    I see tons of differences,... this much should be noticeable in PSNR.
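    How strongly a Levels call amplifies a Subtract result depends on the width of its input window. The sketch below models Levels as a plain linear mapping with gamma 1; it ignores coring, rounding and any other details of the real AviSynth filter, so it is an approximation for illustration only, and the function names are made up. It compares the window used in the script above (127 to 129) with the 112 to 144 window of the "amplified 8x" comparison script posted elsewhere in this thread.

```python
def levels_gain(input_low: float, input_high: float,
                output_low: float = 0, output_high: float = 255) -> float:
    """Slope of a linear Levels mapping (gamma = 1, no coring)."""
    return (output_high - output_low) / (input_high - input_low)


def levels_linear(value: float, input_low: float, input_high: float,
                  output_low: float = 0, output_high: float = 255) -> float:
    """Linear Levels mapping with the result clamped to the output range."""
    scaled = (value - input_low) / (input_high - input_low)
    scaled = min(max(scaled, 0.0), 1.0)
    return output_low + scaled * (output_high - output_low)


narrow_gain = levels_gain(127, 129)  # window from the script above
wide_gain = levels_gain(112, 144)    # window from the "amplified 8x" script

print(f"Levels(127, 1, 129, 0, 255): about {narrow_gain:.1f}x")
print(f"Levels(112, 1, 144, 0, 255): about {wide_gain:.1f}x")
# One code value above the middle of the narrow window already hits the output limit.
print(levels_linear(129, 127, 129))
```

    Under these assumptions the narrow window turns a difference of a single code value into full black or white, so "tons of differences" on screen does not by itself say how large the underlying differences are.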

    Looking at:
    File: S02-E01 - The North Remembers T1.mkv
    File: S02-E01 - The North Remembers T32.mkv
    btw. the source has 2875 and T1 has 2874 and T32 has 2874 frames,....
    If you are sure you changed only the thread value and nothing else, this is worrisome.
    users currently on my ignore list: deadrats, Stears555

  12. @selur - that would be an alignment issue, a problem with producing the encode; not a threads issue - threads shouldn't cause a framecount discrepancy. So something went wrong on his end. You would expect only negligible differences here for threads. I didn't download it here , but did you check with another source filter ? e.g. ffms2 threads =1 for the source filter ?

    One of the main reasons you reduce compression efficiency with increasing threads is truncated motion vectors , especially vertical ones. So a source that has little motion like his GoT clips - you would expect less loss.

  13. @poisondeathray:
    Okay, small correction it's:
    2874 for the source and
    2875 for the reencodes

    opening the source clip with:
    DGSource (DGDecNV): 2875 frames
    FFVideoSource: 2875 frames
    FFVideoSource (fms 2k): 2875 frames
    LWLibavVideoSource: 2875 frames

    So either all the source filters or mediainfo is wrong.
    users currently on my ignore list: deadrats, Stears555

  14. Member
    Join Date
    Jan 2007
    Location
    Canada
    I opened all 3 clips in VirtualDub2 (x64) using the following scripts

    S02-E01 - The North Remembers (Source).mkv
    Code:
    LoadPlugin("D:\Applications\_Music-Video\Video Converters\MeGUI-2896-64 - 2018-12-08\tools\ffms\ffms2.dll")
    FFVideoSource("J:\My Videos - Originals - TV\S02-E01 - The North Remembers (Source).mkv")
    S02-E01 - The North Remembers T1.mkv
    Code:
    LoadPlugin("D:\Applications\_Music-Video\Video Converters\MeGUI-2896-64 - 2018-12-08\tools\ffms\ffms2.dll")
    FFVideoSource("J:\My Videos - Originals - TV\S02-E01 - The North Remembers T1.mkv")
    S02-E01 - The North Remembers T32.mkv
    Code:
    LoadPlugin("D:\Applications\_Music-Video\Video Converters\MeGUI-2896-64 - 2018-12-08\tools\ffms\ffms2.dll")
    FFVideoSource("J:\My Videos - Originals - TV\S02-E01 - The North Remembers T32.mkv")
    I downloaded my clips off here to check them against what I already had, just to make sure I didn't mess up by uploading the wrong ones. Same number of frames each time: VirtualDub2 (x64) says 2875 on the timeline.

    However, when opening each clip via drag-n-drop, dropping each clip onto a separate instance of VirtualDub2 (x64) I do get something odd. The source clip shows 2873 frames while the S02-E01 - The North Remembers T1.mkv and S02-E01 - The North Remembers T32.mkv both show 2875 frames. Don't ask where the extra 2 frames came from, I don't know.

    I stepped through each clip to find scene cuts/splits and the frame numbers line up for all 3 clips.


    Meanwhile, I found my other script for comparison, temporarily misplaced, but here it is:

    Code:
    LoadPlugin("D:\Applications\_Music-Video\Video Converters\MeGUI-2896-64 - 2018-12-08\tools\ffms\ffms2.dll")
    # ========================================
    # If Videos start at different frames use frameadjust to align them
    frameadjust=0
    name1="Source Name"
    name2="Compare Name"
    
    # Videos to compare: (v1 is original, v2 is encoded or whatever)
    v1 = FFVideoSource("J:\My Videos - Originals - TV\S02-E01 - The North Remembers T1.mkv", cache=false).trim(frameadjust, 0)
    v2 = FFVideoSource("J:\My Videos - Originals - TV\S02-E01 - The North Remembers T32.mkv", cache=false)
    
    sub = v1.subtract(v2) 
    substrong = sub.levels(112, 1.000, 144, 0, 255) 
    
    StackVertical(StackHorizontal(v1.subtitle(name1), v2.subtitle(name2)), StackHorizontal(sub.subtitle("Difference"), substrong.subtitle("Difference amplified 8x")))
    I agree that there are some differences between the clips, but then again, you're looking at 1 thread vs. 32 threads. 1 thread took 1:22:51 (h:mm:ss) to transcode while the 32 thread transcode took only 0:04:56 (h:mm:ss).
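    To put the two encode times into perspective, the speedup and a simple "parallel efficiency" (speedup divided by thread count) can be worked out as below. This is a back-of-the-envelope sketch with illustrative names; since the CPU has 16 physical cores and 32 hardware threads, a perfectly linear 32x speedup would not be expected in any case.

```python
def hms_to_seconds(text: str) -> int:
    """Convert an 'h:mm:ss' string to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


t1 = hms_to_seconds("1:22:51")   # 1 thread
t32 = hms_to_seconds("0:04:56")  # 32 threads

speedup = t1 / t32
efficiency = speedup / 32        # fraction of an ideal linear speedup

print(f"speedup:    {speedup:.2f}x")
print(f"efficiency: {efficiency:.1%}")
```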

    I'll update the earlier post with clips of various other thread counts; I think 1, 2, 4, 6, 8, 12, 16, 24, 48, 64, 128 (as suggested earlier) should suffice. I don't know if I can do the 64/128 threads but I'll give it a try to see.

    I have noticed something else but I'm not sure what to do about it.

    The following are the MediaInfo x264 codec parameters:

    S02-E01 - The North Remembers T32.264
    Code:
    cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x133 / me=umh / subme=7 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-3 / threads=32 / lookahead_threads=8 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=4 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=24 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=crf / mbtree=1 / crf=18.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
    S02-E01 - The North Remembers T16.264
    Code:
    cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x133 / me=umh / subme=7 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-3 / threads=16 / lookahead_threads=4 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=4 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=24 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=crf / mbtree=1 / crf=18.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
    For each encoding setting I've only changed the --threads=xx value. Somehow the --lookahead_threads=xx value is changing internally.

    Could the --lookahead_threads=xx value change the outcome of the encode? If so, how much?
    Should I set it to a static value across all encodes like the --threads=xx value so it stays the same each time?


    For those who can't understand the threads vs. cores vs. logical cores concept, here's a good explanation that may help:
    http://forum.doom9.org/showthread.php?p=1576145#post1576145
    Last edited by ziggy1971; 30th Dec 2018 at 14:02.

  15. Originally Posted by ziggy1971 View Post
    For each encoding setting I've only changed the --threads=xx value. Somehow the --lookahead_threads=xx value is changing internally.

    Could the --lookahead_threads=xx value change the outcome of the encode? If so, how much?
    Should I set it to a static value across all encodes like the --threads=xx value so it stays the same each time?

    The formula for lookahead threads is threads/6 , unless you explicitly override it. As usual, the effect is mostly negligible at typical bitrate ranges. I would probably leave it at the default

    Nobody uses --threads 1 for actual use , unless it's for very low bitrate encodes , or testing scenarios for "best" quality (e.g. MSU) . But all the little things add up; 0.5% here and there becomes significant , especially at lower bitrate ranges

    You want slow? Insane people use placebo preset, 1 thread.

  16. The formula for lookahead threads is threads/6 ,
    @poisondeathray: How do you get that from this:
    Code:
        if( h->param.i_lookahead_threads == X264_THREADS_AUTO )
        {
            if( h->param.b_sliced_threads )
                h->param.i_lookahead_threads = h->param.i_threads;
            else
            {
                /* If we're using much slower lookahead settings than encoding settings, it helps a lot to use
                 * more lookahead threads.  This typically happens in the first pass of a two-pass encode, so
                 * try to guess at this sort of case.
                 *
                 * Tuned by a little bit of real encoding with the various presets. */
                int badapt = h->param.i_bframe_adaptive == X264_B_ADAPT_TRELLIS;
                int subme = X264_MIN( h->param.analyse.i_subpel_refine / 3, 3 ) + (h->param.analyse.i_subpel_refine > 1);
                int bframes = X264_MIN( (h->param.i_bframe - 1) / 3, 3 );
    
                /* [b-adapt 0/1 vs 2][quantized subme][quantized bframes] */
                static const uint8_t lookahead_thread_div[2][5][4] =
                {{{6,6,6,6}, {3,3,3,3}, {4,4,4,4}, {6,6,6,6}, {12,12,12,12}},
                 {{3,2,1,1}, {2,1,1,1}, {4,3,2,1}, {6,4,3,2}, {12, 9, 6, 4}}};
    
                h->param.i_lookahead_threads = h->param.i_threads / lookahead_thread_div[badapt][subme][bframes];
                /* Since too many lookahead threads significantly degrades lookahead accuracy, limit auto
                 * lookahead threads to about 8 macroblock rows high each at worst.  This number is chosen
                 * pretty much arbitrarily. */
                h->param.i_lookahead_threads = X264_MIN( h->param.i_lookahead_threads, h->param.i_height / 128 );
            }
        }
        h->param.i_lookahead_threads = x264_clip3( h->param.i_lookahead_threads, 1, X264_MIN( max_sliced_threads, X264_LOOKAHEAD_THREAD_MAX ) );
    source: x264/encoder/encoder.c line 1220
    Don't see how this resolved to 'threads/6',...
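    One way to read the snippet is that the divisor depends on b-adapt, subme and bframes, so "threads/6" is just one row of the table. The following is a rough Python transcription of the non-sliced branch only, offered as a sketch rather than as x264's actual behaviour. Assumptions not found in the quoted code: a frame height of 1080, an upper clip of 16 standing in for the X264_MIN( max_sliced_threads, X264_LOOKAHEAD_THREAD_MAX ) term, and b_adapt=2 being the value that corresponds to X264_B_ADAPT_TRELLIS.

```python
# Divisor table copied from the quoted snippet:
# [b-adapt 0/1 vs 2][quantized subme][quantized bframes]
LOOKAHEAD_THREAD_DIV = (
    ((6, 6, 6, 6), (3, 3, 3, 3), (4, 4, 4, 4), (6, 6, 6, 6), (12, 12, 12, 12)),
    ((3, 2, 1, 1), (2, 1, 1, 1), (4, 3, 2, 1), (6, 4, 3, 2), (12, 9, 6, 4)),
)


def auto_lookahead_threads(threads, b_adapt, subme, bframes,
                           height=1080, upper_limit=16):
    """Sketch of the automatic lookahead thread count (sliced_threads=0 case).

    height and upper_limit are assumptions made for this example.
    """
    badapt = 1 if b_adapt == 2 else 0               # assumed: trellis b-adapt is mode 2
    subme_q = min(subme // 3, 3) + (1 if subme > 1 else 0)
    bframes_q = min(int((bframes - 1) / 3), 3)      # int() truncates toward zero like C
    divisor = LOOKAHEAD_THREAD_DIV[badapt][subme_q][bframes_q]
    value = threads // divisor
    value = min(value, height // 128)               # cap from the quoted comment
    return max(1, min(value, upper_limit))          # stand-in for x264_clip3


# Settings posted in this thread: b_adapt=2, subme=7, bframes=4 -> divisor 4
print([auto_lookahead_threads(t, 2, 7, 4) for t in (16, 18, 32)])
# With b_adapt=1, subme=7, bframes=3 the same table gives divisor 6
print([auto_lookahead_threads(t, 1, 7, 3) for t in (9, 16, 18)])
```

    Under these assumptions the settings posted earlier (b_adapt=2, subme=7, bframes=4) select a divisor of 4, which agrees with the threads=16 / lookahead_threads=4 and threads=32 / lookahead_threads=8 values MediaInfo reported, while an encode with b_adapt=1 would land on the threads/6 row.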

    You want slow? Insane people use placebo preset, 1 thread.
    Insane people additionally disable asm.
    users currently on my ignore list: deadrats, Stears555

  17. Originally Posted by Selur View Post
    @poisondeathray: How do you get that from this:
    It was mentioned by one of developers , ds / fiona
    https://forum.doom9.org/showthread.php?t=165040

    That was a few years ago - it might have changed , but some tests in another recent "thread" ( that is , "forum post" ) suggested it's still threads/6 , rounded down (in the absence of sync-lookahead) . You can verify it yourself quickly

    18/6 => 3 , 12/6 => 2 , but 9/6 => 1 (1.5 rounded down)


    Insane people additionally disable asm
    That is really insane, because it's bit identical isn't it ?

    Whereas more threads produce measurably lower quality (even if you might not "see" it at normal bitrate ranges)
    Last edited by poisondeathray; 31st Dec 2018 at 09:37.

  18. Member
    Join Date
    Jan 2007
    Location
    Canada
    As reported by MediaInfo:

    threads=1 / lookahead_threads=1
    threads=2 / lookahead_threads=1
    threads=3 / lookahead_threads=1
    threads=4 / lookahead_threads=1
    threads=5 / lookahead_threads=1
    threads=6 / lookahead_threads=1
    threads=7 / lookahead_threads=1
    threads=8 / lookahead_threads=2
    threads=9 / lookahead_threads=2
    threads=10 / lookahead_threads=2
    threads=11 / lookahead_threads=2
    threads=12 / lookahead_threads=3
    threads=13 / lookahead_threads=3
    threads=14 / lookahead_threads=3
    threads=15 / lookahead_threads=3
    threads=16 / lookahead_threads=4
    threads=17 / lookahead_threads=4
    threads=18 / lookahead_threads=4
    threads=19 / lookahead_threads=4
    threads=20 / lookahead_threads=5
    threads=21 / lookahead_threads=5
    threads=22 / lookahead_threads=5
    threads=23 / lookahead_threads=5
    threads=24 / lookahead_threads=6
    threads=25 / lookahead_threads=6
    threads=26 / lookahead_threads=6
    threads=27 / lookahead_threads=6
    threads=28 / lookahead_threads=7
    threads=29 / lookahead_threads=7
    threads=30 / lookahead_threads=7
    threads=31 / lookahead_threads=7
    threads=32 / lookahead_threads=8

    Also:
    An Intel Core i7 8700K @ 3.7GHz (No Turbo) is less than 1% faster at encoding than an overclocked Intel Core i7 4930K @ 4.6GHz using the same encoding settings. I couldn't set the multiplier in the BIOS any lower than 37 for the Core i7 8700K. So, even at its lowest speed the newer i7 8700K is on par with or better than a high-end 6-core chip from 4 generations earlier.

    Another thing of note:
    Both the i7 4930K and the i7 8700K encoded with 18 threads each (reported by MediaInfo). Given that Threadripper has many more cores you'd think you could simply reduce the threads being used by x264 to 18, or thereabouts, and get approximately the same result, right? 18 threads is 18 threads in x264 regardless of CPU... WRONG!!!

  19. @poisondeathray: https://forum.doom9.org/showthread.php?t=163901 is the last time I really read about it over at doom9,...
    users currently on my ignore list: deadrats, Stears555

  20. For lookahead threads, other people get different values . I get the same as these on a Haswell 4C/8T

    Code:
    threads=9 / lookahead_threads=1 
    threads=16 / lookahead_threads=2
    threads=18 / lookahead_threads=3

  21. Dinosaur Supervisor KarMa's Avatar
    Join Date
    Jul 2015
    Location
    US
    Originally Posted by poisondeathray View Post
    Originally Posted by Selur View Post
    Insane people additionally disable asm
    That is really insane, because it's bit identical isn't it ?
    Would be an interesting test.

  22. Dinosaur Supervisor KarMa's Avatar
    Join Date
    Jul 2015
    Location
    US
    Originally Posted by poisondeathray View Post
    Originally Posted by Selur View Post
    Insane people additionally disable asm
    That is really insane, because it's bit identical isn't it ?
    Tested out turning ASM on and off, and it turns out they are not bit identical. I used the 3 minute, mp4, 30MB, 1280x720 source and encoded it twice with x264. Once with ASM enabled (using MMX2 SSE2Fast SSSE3 SSE4.2 AVX XOP FMA3 BMI1) and another with ASM completely disabled. All other settings were the same (x264 very slow). Then I took these two encodings and converted them to Uncompressed YUV 420, and compared the CRC32 hashes of them. They had different hashes. There was also a 100 byte difference between the two x264 encodings, which I originally thought could just be metadata differences but now I think it's actual video data differences.

    When actually doing Subtract in avisynth, the differences are more than minor, with massive differences in certain situations, mostly motion. I can only imagine the ASM-disabled encode has better quality, but I have not gone that far. So it's obviously more than a few random differences; the two are noticeably different, visually. Example of the subtraction:
    [Image: SampleVideo-ASM Subtract.png]

    Code:
    LoadPlugin("C:\...\avss.dll")
    A=dss2("C:\....\SampleVideo-ASM.mkv", fps=25.000).AssumeFPS(25,1)
    B=dss2("C:\....\SampleVideo-NO-ASM.mkv", fps=25.000).AssumeFPS(25,1)
    Subtract(A,B).tweak(cont=1.4,bright=-30)

  23. Originally Posted by KarMa View Post
    Tested out turning ASM on and off, and it turns out they are not bit identical.
    Interesting . I verified this on another video . There is a difference in favor of --no-asm , if you can believe psnr (they "look" the same, because it was a quick crf 18 test) . And they are very close to the same size .

    no asm
    [Parsed_psnr_2 @ 000000f65dfe8300] PSNR y:38.316090 u:43.099870 v:47.534419 average:39.611963 min:36.891034 max:45.488224

    default
    [Parsed_psnr_2 @ 0000005857703140] PSNR y:38.315517 u:43.104735 v:47.534387 average:39.611810 min:36.895396 max:45.495271
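
    For context, the two luma figures above differ by well under 0.001 dB. A PSNR value of this kind comes from the mean squared error between two planes. Here is a minimal sketch of that formula, assuming 8-bit samples (peak 255) passed as flat lists; it is not ffmpeg's implementation, and `mse` and `psnr` are illustrative names.

```python
import math


def mse(plane_a, plane_b):
    """Mean squared error between two equally sized planes (flat sequences)."""
    if len(plane_a) != len(plane_b) or not plane_a:
        raise ValueError("planes must be non-empty and the same size")
    return sum((a - b) ** 2 for a, b in zip(plane_a, plane_b)) / len(plane_a)


def psnr(plane_a, plane_b, peak=255):
    """PSNR in dB: 10 * log10(peak^2 / MSE); infinite for identical planes."""
    err = mse(plane_a, plane_b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)
```

    Because the scale is logarithmic, a gap this small corresponds to a tiny change in squared error, which fits the observation that the two encodes "look" the same.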


    For threads, the effect is well documented, reproducible, and measurable. The mechanism is understood, and it makes sense why it happens and in what situations you'd expect the effect to have more or less influence. But for --no-asm, I'm not sure of the mechanism, or why or how there would be a difference.

    The bitrate deltas seem to be larger with threads (say 1 vs. 36) in crf mode. --no-asm seems to produce much smaller differences if you only look at bitrates/filesizes in CRF mode. That suggests --no-asm produces even smaller differences than threads, but you'd have to investigate it more thoroughly. It was only about a 0.03% difference in size on this test, whereas threads 1 vs. 36 was about 5.5% on this clip (you couldn't use crf to compare here, or you'd have to use fractional crf test runs to get closer in file size).

  24. Member
    Originally Posted by poisondeathray View Post
    For lookahead threads, other people get different values. I get the same as these on a Haswell 4C/8T:

    Code:
    threads=9 / lookahead_threads=1 
    threads=16 / lookahead_threads=2
    threads=18 / lookahead_threads=3
    OK, that's interesting. Did you use the same source and settings I used?
    Perhaps the actual number of CPU cores in the system affects the threads used.

    I added some more clips above with different threads used; any thoughts, opinions?
    Last edited by ziggy1971; 3rd Jan 2019 at 19:11.

  25. Member
    Originally Posted by Selur View Post

    Clips should be at least 10min long, to make sure 2pass rate control really has to do something. (Not sure atm. whether x264 uses multiple threads per frame or not, if it does, shorter clips are probably okay as long as they are high resolution enough,..)
    Any specific settings you'd like me to test?

    Right now I'm just running the Automated 2pass with the following encode settings (as displayed in MediaInfo):

    Code:
    cabac=1 / ref=4 / deblock=1:-1:-1 / analyse=0x3:0x133 / me=umh / subme=7 / psy=1 / psy_rd=1.00:0.15 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-3 / threads=32 / lookahead_threads=8 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=4 / b_pyramid=2 / b_adapt=2 / b_bias=0 / direct=1 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=24 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=2pass / mbtree=1 / bitrate=10000 / ratetol=1.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=1:1.00

    (file size would have to be kept constant; crf encodings don't make sense for such tests unless you really understand crf and account for its changes depending on the settings and source characteristics, since crf does not deliver constant quality over different sources.)
    CRF doesn't deliver constant quality over different sources, but then again, using the same bitrate setting doesn't deliver constant quality over different sources either. 10Mbit/s may look good for one source with low motion, but may not be nearly enough for high action films.

  26. Originally Posted by ziggy1971 View Post

    OK, that's interesting. Did you use the same source and settings I used?
    Several different sources, just default --crf 18 --threads "x".

    Not sure why there's a difference.


    CRF doesn't deliver constant quality over different sources, but then again, using the same bitrate setting doesn't deliver constant quality over different sources either. 10Mbit/s may look good for one source with low motion, but may not be nearly enough for high action films.
    Yes, and that's the whole point of testing. You want to determine the "quality level" at a given bitrate. You can define "quality" however you like: objective metrics, subjective impressions, etc. That's how crf is tested: you encode at CRF 18.2, 18.4, 18.6, etc., and measure the bitrates and quality. You perform multiple encodes with a given set of settings and, if looking at metrics, you can plot the curves (quality on the y-axis, bitrate on the x-axis). You can repeat the same thing with 2pass rate control. It's a lot of work. CRF actually closely approximates a 2pass encode if they end up at the same bitrate with other settings the same.
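
    One way to read such curves at a common bitrate is to interpolate between the measured points. The sketch below assumes you already have (bitrate, quality) pairs from several test encodes per setting; `quality_at_bitrate` is a hypothetical helper, and straight-line interpolation is a simplification of a real rate/quality curve.

```python
def quality_at_bitrate(points, target_kbps):
    """Linearly interpolate a quality score at target_kbps.

    points: iterable of (bitrate_kbps, quality) pairs from separate encodes
    made with one set of settings (e.g. several fractional CRF values).
    """
    pts = sorted(points)
    if len(pts) < 2:
        raise ValueError("need at least two measured points")
    if not pts[0][0] <= target_kbps <= pts[-1][0]:
        raise ValueError("target bitrate is outside the measured range")
    for (b0, q0), (b1, q1) in zip(pts, pts[1:]):
        if b0 <= target_kbps <= b1:
            if b1 == b0:
                return q0  # duplicate bitrate; avoid dividing by zero
            return q0 + (q1 - q0) * (target_kbps - b0) / (b1 - b0)


# Hypothetical usage: compare two settings at the same bitrate, e.g.
#   quality_at_bitrate(curve_threads_1, 5000) vs.
#   quality_at_bitrate(curve_threads_36, 5000)
```

    The closer together the test encodes are in bitrate, the less the straight-line assumption matters.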

  27. Member
    Originally Posted by Selur View Post
    btw. using:
    Code:
    t1 = LWLibavVideoSource("C:\Users\Selur\Desktop\S02-E01 - The North Remembers T1.mkv",cache=false,stacked=true,format="YUV420P8",repeat=true)
    t32 = LWLibavVideoSource("C:\Users\Selur\Desktop\S02-E01 - The North Remembers T32.mkv",cache=false,stacked=true,format="YUV420P8",repeat=true)
    #i = StackHorizontal(t1,t32)
    Subtract(t1,t32).Levels(127, 1, 129, 0, 255)
    I see tons of differences,... this much should be noticeable in PSNR.

    Looking at:
    File: S02-E01 - The North Remembers T1.mkv
    File: S02-E01 - The North Remembers T32.mkv
    btw. the source has 2875 frames, while T1 and T32 each have 2874.
    If you are sure you changed only the thread value and nothing more, this is worrisome.
    @poisondeathray
    Did you have a chance to look at the clips I uploaded that were encoded using different numbers of threads? Selur said "I see tons of differences...", but didn't elaborate on his findings or follow up with reviews of the clips using fewer threads.

    I do see some differences when moving up in the number of threads being used, but I wouldn't exactly call it "tons" of difference. I'd say there's less than 1% difference between the clips, but that's my opinion with my experience.

    However, whether the loss of quality is worth the speed gains is another story. I did keep track of encoding times, frame rates, file sizes, etc. in a spreadsheet and created graphs to show the differences.

  28. Originally Posted by ziggy1971 View Post
    @poisondeathray
    Did you have a chance to look at the clips I uploaded that were encoded using different numbers of threads? Selur said "I see tons of differences...", but didn't elaborate on his findings or follow up with reviews of the clips using fewer threads.
    The reason is that the frames were not aligned (different frame count, so not comparing the same frames), and with the method he was using there will be large differences. It's the same with metrics like PSNR, SSIM, etc.: if you are off by one frame (n+/-1), they will report massive differences. Even if you shift the pixels, say 1 to the right, they will report massive differences when the picture is otherwise identical.

    If you lined them up, you would expect only very minor differences at that bitrate with those samples when changing --threads (this applies to both metrics and difference testing). Whereas if you picked a section with more action, you'd expect slightly more differences at that bitrate range.
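
    The pixel-shift point is easy to show on a synthetic frame. This is a toy sketch, not a real video test: the striped frame and the helpers `striped_frame`, `shift_right_one_pixel` and `psnr` are invented for illustration.

```python
import math


def psnr(frame_a, frame_b, peak=255):
    """PSNR in dB between two frames given as lists of rows of 8-bit samples."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(frame_a, frame_b)
             for a, b in zip(row_a, row_b)]
    err = sum(diffs) / len(diffs)
    return math.inf if err == 0 else 10.0 * math.log10(peak * peak / err)


def striped_frame(width=16, height=8, stripe=4):
    """Vertical black/white stripes, each `stripe` pixels wide."""
    return [[255 if (x // stripe) % 2 else 0 for x in range(width)]
            for _ in range(height)]


def shift_right_one_pixel(frame):
    """Shift every row one pixel to the right, repeating the left edge pixel."""
    return [[row[0]] + row[:-1] for row in frame]


frame = striped_frame()
print("identical:", psnr(frame, frame))
print("shifted by one pixel:", psnr(frame, shift_right_one_pixel(frame)))
```

    The shifted copy shows exactly the same picture, yet its PSNR against the original comes out below 10 dB, while the unshifted comparison is infinite. A one-frame offset between two encodes misleads the metric in the same way.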

  29. Member
    So, if I understand you correctly, it is the source that I'm working with. The fact that it is a section cut from the original, maybe offset from an I frame or something, that is the issue? Would cutting/splitting on an I frame help? Or is this just a common result of splitting any clip at any location and nothing can be done to rectify it?

    If it is an issue that can be corrected, can you let me know how to do it? For instance, cutting on an I frame?

  30. Originally Posted by ziggy1971 View Post
    So, if I understand you correctly, it is the source that I'm working with. The fact that it is a section cut from the original, maybe offset from an I frame or something, that is the issue? Would cutting/splitting on an I frame help? Or is this just a common result of splitting any clip at any location and nothing can be done to rectify it?

    If it is an issue that can be corrected, can you let me know how to do it? For instance, cutting on an I frame?

    For "S02-E01 - The North Remembers (Source).mkv", it's because it wasn't cut on an "IDR" frame, i.e. a true keyframe. BDs with open GOPs can have "I" frames that are NOT true keyframes.

    Different source filters handle leading b-frames differently; some drop them, some keep duplicate "placeholder" frames.

    E.g. if you open it with ffms2, there is a duplicate frame at the beginning. But whatever you use for encoding should keep that duplicate frame, provided you use the same decoding method for everything (e.g. if you used ffms2 in the script to encode, that's what you should use for checking later, or for any tests).

    You need to be consistent for everything. If a source isn't cut cleanly, you can still use it, provided you use the same frames. E.g. you might use the same script and the same decoding method for everything, or you might specify a range with Trim(start,end).
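
    A quick way to check whether two decoded clips are offset by a frame is to compare a cheap per-frame number (for example average luma) at a few candidate offsets. The sketch below assumes those per-frame values have already been extracted into lists; `mean_abs_diff` and `best_offset` are hypothetical helpers, not part of AviSynth or ffms2.

```python
def mean_abs_diff(ref, test, shift):
    """Average |ref[i] - test[i + shift]| over the frames that overlap."""
    pairs = [(ref[i], test[i + shift])
             for i in range(len(ref))
             if 0 <= i + shift < len(test)]
    if not pairs:
        return float("inf")
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


def best_offset(ref, test, max_shift=3):
    """Return the shift s for which test[i + s] best matches ref[i].

    Candidates are tried in order of increasing |s|, so a tie goes to the
    smallest offset.
    """
    candidates = sorted(range(-max_shift, max_shift + 1), key=abs)
    return min(candidates, key=lambda s: mean_abs_diff(ref, test, s))
```

    A result of 0 suggests the clips are frame-aligned. A result of 1 would fit a decoder that kept an extra leading frame, in which case trimming one clip (or decoding both the same way) should come before any Subtract or metric run.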



