VideoHelp Forum

  1. Banned
    Join Date
    Nov 2005
    Location
    United States
    well guys, i finally joined the sandy bridge club, i waited and waited for bulldozer to come out and after seeing what it had to offer i decided to buy the cheapest SB that supports quick sync, namely the i3 2100. i ran a bunch of benchmarks, tested out a few theories and got a slew of results that i think you'll find interesting.

    first things first, if some of you caught my post entitled "in defense of bulldozer" you'll remember that i hypothesized that bulldozer probably needed more threads than are launched by default by multithreaded apps in order for its true potential to be shown. while i don't have a bulldozer based cpu, i decided to see what effect cranking up the thread count would have on the i3 2100. before i go any further allow me to share some system specs with you:

    i3 2100, HT enabled
    gigabyte h61 based motherboard, b3 revision, bug free
    8 gigs ddr3 1333, microcenter has this at $40, couldn't resist
    used integrated gpu

    i tested with these two pieces of software: cinebench 11.529 32bit and media coder 2011 64bit build 5198, for encoding the source i used was an XDCAM mpeg-2 422 52mbps (50mbps video) 1080p TFF 2 channel pcm 1152 kbps per channel 48khz.

    the target was 720p 4mbps average bit rate, 2 channel ac3 48khz, mkv, I420 color space, motion compensating de-interlace and "natural bicubic spline" was the resize filter; source and target were on 2 different hard drives. for x264, the "ultra fast" preset was used. for x264 4 b frames and 4 reference frames were chosen, for the intel encoder 4 reference frames were used but there is no setting in media coder for setting the number of b frames (i know that reference frames are technically b frames, but there's no separate setting).

    the decoder used was mencoder and the baseline test was mencoder set to threads=auto and x264 threads=auto:

    decoder cpu <30%
    encoder cpu <40%
    710-735mb ram
    176 seconds
    1.53x real time

    mencoder auto x264 4 threads
    decoder cpu <30%
    encoder cpu <40%
    712mb ram
    193.7 seconds
    frames 8900
    speed 1.39x real time

    mencoder auto x264 8 threads
    decoder cpu <30%
    encoder cpu <40%
    725-750mb ram
    191.1 seconds
    frames 8900
    speed 1.41x real time

    mencoder auto x264 12 threads
    decoder cpu <30%
    encoder cpu <40%
    760-785mb ram
    170.4 seconds
    8900 frames
    speed 1.58x real time

    mencoder auto x264 16 threads
    decoder cpu <30%
    encoder cpu <40%
    790-820mb ram
    170 seconds
    8900 frames
    speed 1.59x real time

    i also ran a test with

    mencoder 8 threads x264 16 threads:
    decoder cpu <35%
    encoder cpu <40%
    790-825mb ram
    189.8 seconds
    1.42x real time

    but as you guys can see the sweet spot was mencoder set to auto and x264 set to 16 threads. now mind you this is a dual core with HT enabled but no turbo, extrapolating from these results one would expect a quad core HT enabled SB based processor to get max performance with mencoder set to auto and x264 set to 32 threads.
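
    To put the x264 timings above side by side, here is a rough sketch. The seconds are copied from the runs listed in this post; the helper name `time_saved_pct` is made up for illustration, and since each setting was timed once, small gaps may be run-to-run noise.

```python
# Wall-clock seconds for each x264 thread setting (mencoder threads=auto in all runs).
x264_runs = {
    "auto": 176.0,
    "4": 193.7,
    "8": 191.1,
    "12": 170.4,
    "16": 170.0,
}

baseline = x264_runs["auto"]


def time_saved_pct(seconds, reference=baseline):
    """Percent of wall-clock time saved relative to the reference run (negative = slower)."""
    return (reference - seconds) / reference * 100.0


# Setting with the shortest encode time.
fastest = min(x264_runs, key=x264_runs.get)

for threads, seconds in x264_runs.items():
    print(f"x264 threads={threads:>4}: {seconds:6.1f} s, {time_saved_pct(seconds):+5.1f}% vs auto")
print("fastest setting:", fastest)
```

    By this measure the 16 thread run saves a few percent over auto, while the 4 and 8 thread runs are slower than auto.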

    next up i tested intel's quick sync:

    mencoder auto intel encoder auto
    decoder cpu <35%
    encoder cpu <15%
    710-730mb
    147.7seconds
    1.83x real time

    mencoder 8 threads intel encoder auto
    decoder cpu <40%
    encoder cpu <15%
    720-735mb
    142.7 seconds
    1.89x real time

    unfortunately while the intel encoder does support manually setting the thread count this implementation would cause an error if i manually tried to set the number of threads; likewise if i tried to set mencoder to more than 8 threads when the intel encoder was chosen i would get the same error. as you guys can see using the intel encoder in conjunction with mencoder set to 8 threads resulted in the fastest encode times and in terms of quality i could see no difference between the x264 encodes and the intel encoder encodes.

    for cinebench i ran multiple tests:

    12 threads 2.53
    10 threads 2.61
    8 threads 2.60
    6 threads 2.52
    4 threads 2.33

    as you guys can see the highest scores were achieved by setting the app to launch 10 threads; the obvious implication is that a quad core HT enabled cpu should need 20 threads launched to achieve maximum performance.
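
    The extrapolation above can be spelled out as a small sketch. The scores are the cinebench results listed in this post; the linear "threads per logical cpu" scaling is the assumption being made here, not something that was measured on a quad core.

```python
# Cinebench score for each launched thread count (i3 2100, 2 cores + HT).
scores = {4: 2.33, 6: 2.52, 8: 2.60, 10: 2.61, 12: 2.53}

# Thread count with the highest score.
best_threads = max(scores, key=scores.get)

logical_cpus_i3 = 4        # 2 cores x 2 hardware threads
logical_cpus_quad_ht = 8   # 4 cores x 2 hardware threads

# Assumption: the best thread count scales linearly with logical cpus.
threads_per_logical = best_threads / logical_cpus_i3
extrapolated_threads = int(threads_per_logical * logical_cpus_quad_ht)

# Gain of the best setting over the 4 thread run, in percent.
gain_over_4_threads = (scores[best_threads] / scores[4] - 1) * 100

print("best thread count:", best_threads)
print("extrapolated for quad core + HT:", extrapolated_threads)
print(f"gain over 4 threads: {gain_over_4_threads:.1f}%")
```

    Note that the 8 and 10 thread scores differ by only 0.01, so the exact peak is not well pinned down by these runs.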

    with regards to quick sync, i tested every app currently available that has any type of support for the technology and the best implementation currently available is media coder. as a separate test i took a 3875kbps mpeg-2 vob 720x480 4:3 DAR with 2 channel ac3 audio and transcoded it to h264 3mbps, 2 channel ac3 audio 128 kbps, 720x540 square pixels mkv, motion compensation de-interlacing, I420 color space and natural bicubic spline resize filter.

    for the x264 encode i set the decoder (mencoder) to threads=auto and x264 to threads=16, for the intel encoder i set the decoder to threads=8 and the intel encoder to threads=auto:

    x264
    6.82x real time (204.4 fps)

    intel encoder
    8.78x real time (263.1 fps)

    as you guys can see it wasn't even close, quick sync smoked x264 on "ultra fast".

    i also tested the mpeg-2 encoding capabilities of the intel encoder, simply changing the desired output to mpeg-2 instead of h264, all other settings were as above, the speed was 6.94x real time (208 fps), i honestly don't think there's any software based encoder that can match those encode times.

    the scary part is that intel claims ivy bridge will offer better image quality for quick sync and be twice as fast; march 2012 should be very interesting indeed.

    hope you guys enjoyed.
  2. Originally Posted by deadrats View Post
    the decoder used was mencoder and the baseline test was mencoder set to threads=auto and x264 threads=auto:

    decoder cpu <30%
    encoder cpu <40%
    710-735mb ram
    176 seconds
    1.53x real time

    mencoder auto x264 16 threads
    decoder cpu <30%
    encoder cpu <40%
    790-820mb ram
    170 seconds
    8900 frames
    speed 1.59x real time

    but as you guys can see the sweet spot was mencoder set to auto and x264 set to 16 threads.
    The difference between 1.53 and 1.59 is negligible. That's not gonna turn bulldozer into a monster.


    Originally Posted by deadrats View Post
    for the x264 encode i set the decoder (mencoder) to threads=auto and x264 to threads=16, for the intel encoder i set the decoder to threads=8 and the intel encoder to threads=auto:

    x264
    6.82x real time (204.4 fps)

    intel encoder
    8.78x real time (263.1 fps)

    as you guys can see it wasn't even close, quick sync smoked x264 on "ultra fast".
    Too bad the QS encode looked like shit.

    Originally Posted by deadrats View Post
    i also tested the mpeg-2 encoding capabilities of the intel encoder, simply changing the desired output to mpeg-2 instead of h264, all other settings were as above, the speed was 6.94x real time (208 fps), i honestly don't think there's any software based encoder that can match those encode times.
    I get about 14x with CCE on my 2500K. DVD MPEG 2 to DVD MPEG 2 via DgIndex + AviSynth + Mpeg2Source(), CBR. Also with DV AVI to MPEG 2.

    Originally Posted by deadrats View Post
    the scary part is that intel claims ivy bridge will offer better image quality for quick sync and be twice as fast; march 2012 should be very interesting indeed.
    I'm sure Intel will find some artificial benchmark that's twice as fast but the real world will be different. Probably more like 50 percent. But until the quality improves a lot -- fast shit is still shit.
    Last edited by jagabo; 25th Oct 2011 at 19:44.
  3. Dual core right? Wouldn't you expect Quicksync to be faster than a dual core?

    Try no resize, no deinterlace or filters or audio encoding. Those are CPU bound transformations, which means the CPU encoder is at a disadvantage . If you set out to test a scenario, fine, but don't draw the wrong conclusions about a specific encoder by testing something else . Scientific method 101.

    i know that reference frames are technically b frames, but there's no separate setting).
    Actually, it's an I-frame (or IDR frame to distinguish from a non GOP delineating "i" frame)


    Thanks for sharing the results
    Last edited by poisondeathray; 25th Oct 2011 at 19:31.
  4. Banned
    Originally Posted by jagabo View Post
    The difference between 1.53 and 1.59 is negligible. That's not gonna turn bulldozer into a monster.
    1.53 x 29.97 = 45.9

    1.59 x 29.97 = 47.7

    2 fps with a dual core with HT no turbo; it may not turn bulldozer into a competition crushing monster but it does show that testing it at the default thread count may not be the fairest way to see what any processor is capable of.
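
    The conversion used above, as a small sketch. It assumes the real-time multiplier is relative to a 29.97 fps source; the function name `realtime_to_fps` is just for illustration.

```python
SOURCE_FPS = 29.97  # assumption: multipliers are relative to a 29.97 fps source


def realtime_to_fps(multiplier, source_fps=SOURCE_FPS):
    """Convert an 'N x real time' encode speed into frames per second."""
    return multiplier * source_fps


auto_fps = realtime_to_fps(1.53)    # x264 threads=auto
tuned_fps = realtime_to_fps(1.59)   # x264 threads=16

print(f"auto:  {auto_fps:.1f} fps")
print(f"tuned: {tuned_fps:.1f} fps")
print(f"gain:  {tuned_fps - auto_fps:.1f} fps")
```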

    Originally Posted by jagabo View Post
    I wouldn't call 30 percent faster "smoked". And the QS encode looked like shit.
    a 60 fps speed differential is getting smoked and since i didn't post any sample encodes (the encoded files are too big for me to find a host) you really can't comment on the quality of the QS encodes. with the high quality sources i used i couldn't tell the difference between x264+uf and the QS encodes; this held true if i used a target bit rate or a target size.

    I get about 14x with CCE on my 2500K. DVD MPEG 2 to DVD MPEG 2 via DgIndex + AviSynth + Mpeg2Source().
    that's 420 fps, that is pretty bad ass.

    I'm sure Intel will find some benchmark that's twice as fast but the real world will be different. Probably more like 50 percent. But until the quality improves a lot -- fast shit is still shit.
    i'm going to go out on a limb and say you are not enamored with quick sync's quality; honestly i have run tons of tests and to me it seems that x264+uf is equal in quality with quick sync; now perhaps with higher quality presets and filtering via avisynth x264 offers superior quality but as i have mentioned before i'm not the type to bit rate starve my encodes, so for me QS more than does the job and allows me to get away with using a cheaper processor since i can't really afford to pick up a sweet i7 2700k.
  5. aBigMeanie aedipuss's Avatar
    Join Date
    Oct 2005
    Location
    666th portal
    a sweet i7 2700k
    $50 more for a 3.5ghz than the 3.4ghz 2600k??? seems silly.
    --
    "a lot of people are better dead" - prisoner KSC2-303
  6. Banned
    Originally Posted by poisondeathray View Post
    Dual core right? Wouldn't you expect Quicksync to be faster than a dual core?
    seeing as how the media coder uses software decoding and filtering i would expect a dual core to hold QS back because it can't feed it data fast enough; it's the same scenario with cuda based encoders, the data needs to be fed to the video card so just as software based encoding gets faster with a faster cpu so does gpu based encoding, more often than not the gpu is waiting around for the cpu to pass decoded frames to it for processing.

    Try no resize, no deinterlace or filters or audio encoding. Those are CPU bound transformations, which means the CPU encoder is at a disadvantage . If you set out to test a scenario, fine, but don't draw the wrong conclusions about a specific encoder by testing something else . Scientific method 101.
    you got it, the source can be obtained here:

    http://www.sonycreativesoftware.com/vegaspro/gpuacceleration

    i used the 50mbps XDCAM version of the file, no resizing or de-interlacing, no audio, the max bit rate media coder allowed me to set was 16mbps, for x264 the threads were set to 16 with threads=auto used for mencoder (handles decode duties); for intel encoder threads was set to auto and mencoder was set to 8 threads, source and target drives are different. i had to convert to I420 color space because intel encoder doesn't seem to support 4:2:2, i recall reading something in the documentation about intel's encoder only supporting nv12 (i think that's what it was), regardless i did both encodes converting the color space to I420:

    x264 - 1.05x real time or 31.5 fps
    intel encoder - 1.66x real time or 49.8 fps

    obviously, not even close speed wise. also, since the source is interlaced TFF and i didn't de-interlace, both encoders produced files with tearing, oddly enough the name of the file says it's 1080p30 but every app sees it as 1080i 29.97 fps, also both encoders undershot the target bit rate using only 14mbps instead of the requested 16mbps.

    very odd, if i de-interlace the source using motion compensation, no issues appear and both encodes are nice and crisp and clear; if i de-interlace using yadif, i get similar tearing as with no de-interlacing.

    regardless, no matter what the set up, quick sync proves the faster solution, with the advantage increasing as bit rate increases.
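
    For a rough look at how the quick sync advantage compares across the three tests in this thread, here is a sketch. The multipliers are the best result for each encoder as posted above; the labels are shorthand, and since the tests use different sources and filter chains the ratios are not strictly comparable.

```python
# (label, x264 speed, quick sync speed), speeds in "x real time" as posted above.
tests = [
    ("1080 XDCAM -> 720p, 4 mbps", 1.59, 1.89),
    ("480 VOB -> 720x540, 3 mbps", 6.82, 8.78),
    ("1080 XDCAM, no filters, 16 mbps", 1.05, 1.66),
]


def qs_advantage(x264_speed, qs_speed):
    """How many times faster quick sync was than x264 in the same test."""
    return qs_speed / x264_speed


ratios = {label: qs_advantage(x264, qs) for label, x264, qs in tests}

for label, ratio in ratios.items():
    print(f"{label}: quick sync is {ratio:.2f}x the speed of x264")
```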
  7. Banned
    Originally Posted by aedipuss View Post
    $50 more for a 3.5ghz than the 3.4ghz 2600k??? seems silly.
    i was under the impression that intel introduced the i7 2700k at the same price point as the i7 2600k, i.e. in the $270 range.
  8. Originally Posted by deadrats View Post
    Originally Posted by poisondeathray View Post
    Dual core right? Wouldn't you expect Quicksync to be faster than a dual core?
    seeing as how the media coder uses software decoding and filtering i would expect a dual core to hold QS back because it can't feed it data fast enough; it's the same scenario with cuda based encoders, the data needs to be fed to the video card so just as software based encoding gets faster with a faster cpu so does gpu based encoding, more often than not the gpu is waiting around for the cpu to pass decoded frames to it for processing.
    You're saying a dual core decoding isn't fast enough to feed QS? That it's the bottleneck? Have there been QS i3 vs i7 benchmarks?

    We all know x264 scales better with more cores. i5 2500 is about 1.75x faster, 2600k is about 2.2x faster for the x264 encoding pass than the i3-2100. (Not just more cores, but higher stock clocks for those models)

    So for a dual core, and for someone that favors quality over speed QS might be a good option. All I was implying earlier was that your results are in line with normal expectations. A dual core isn't going to be very fast for software encoding or useful for video editing applications
  9. The i3-2100 is a dual core CPU with hyperthreading so it runs 4 threads. It's somewhere in between the speed of a dual core and a true quad core (with otherwise similar features). Closer to dual than quad from what I've seen.

    By the way, x264 at ultrafast preset by default uses no b-frames and only 1 reference frame.
    Last edited by jagabo; 26th Oct 2011 at 07:00.
  10. Banned
    Originally Posted by jagabo View Post
    By the way, x264 at ultrafast preset by default uses no b-frames and only 1 reference frame.
    i manually set it to use 4 b frames and 4 reference frames. i also did practically the exact same test encode with tmpg, which allows for manually entering the number of b frames and reference frames for their qs implementation; the speed results were identical.
  11. Banned
    Originally Posted by poisondeathray View Post
    You're saying a dual core decoding isn't fast enough to feed QS? That it's the bottleneck? Have there been QS i3 vs i7 benchmarks?
    media coder provides fine grained feedback for processor usage; no matter what the decoder hovered around the 25% cpu usage, manually increasing the number of threads allowed that number to approach the 35% mark, for software encoding the cpu usage by the encoder stayed under 40%, with qs the encoder usage of the cpu was under 15%, this tells me that qs is being held back by the decoder in my tests.

    i have seen qs i3 vs i7 tests, but the tests were done with cyberlink's espresso, which isn't exactly the finest qs implementation.
  12. Originally Posted by deadrats View Post
    Originally Posted by poisondeathray View Post
    You're saying a dual core decoding isn't fast enough to feed QS? That it's the bottleneck? Have there been QS i3 vs i7 benchmarks?
    media coder provides fine grained feedback for processor usage; no matter what the decoder hovered around the 25% cpu usage, manually increasing the number of threads allowed that number to approach the 35% mark, for software encoding the cpu usage by the encoder stayed under 40%, with qs the encoder usage of the cpu was under 15%, this tells me that qs is being held back by the decoder in my tests.

    i have seen qs i3 vs i7 tests, but the tests were done with cyberlink's espresso, which isn't exactly the finest qs implementation.
    Not sure I'm following your logic, can you explain where you're going with the decoder CPU usage ? If it was a bottleneck, wouldn't it be pegged at 100% ? (Maybe there is a problem with the implementation of the decoder? or scaling?)

    An old generation dual core laptop can decode 1080p24 AVC at ~50-60 FPS using ffmpeg-mt, which is a hell of a lot harder to decode than MPEG2. For SD AVC 480p it's ~300-350 FPS. Certainly a brand new i3 would be way, way faster. I really doubt decoding is a bottleneck.
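
    One way to frame the CPU usage question, as a sketch under a stated assumption: the thread doesn't say how media coder reports usage, so this assumes the percentage is averaged across all logical processors. Under that assumption a single fully busy thread on the i3-2100 (4 logical processors) would show up as 25%, not 100%. The function names are made up for illustration.

```python
def per_thread_ceiling(logical_cpus):
    """Overall CPU % reported when exactly one thread is fully busy.

    Assumes usage is averaged across all logical processors.
    """
    return 100.0 / logical_cpus


def busy_threads_equivalent(overall_pct, logical_cpus):
    """How many fully busy threads an overall CPU % corresponds to."""
    return overall_pct / per_thread_ceiling(logical_cpus)


LOGICAL_CPUS_I3_2100 = 4  # 2 cores x 2 hardware threads

for label, pct in [("decoder, default", 25.0), ("decoder, 8 threads", 35.0), ("x264 encoder", 40.0)]:
    threads = busy_threads_equivalent(pct, LOGICAL_CPUS_I3_2100)
    print(f"{label}: {pct:.0f}% overall is about {threads:.1f} busy threads")
```

    If the assumption holds, a decoder hovering around 25% could be one saturated thread; if media coder reports per-core usage instead, this reading doesn't apply.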