i know, i know, most of you reading this title will probably say that this topic has been beaten to death (or at least it's in a coma with no chance of regaining consciousness), but that may not be the case. before i share what i consider to be some pretty exciting news, allow me to say that i firmly believe that the x264 developers, and some of the people that support them, may just be the biggest bleapholes the world has ever seen.
for the past week or so i have been reading through every post i can find on "the diary of an x264 developer" site, as well as posts on the forums of various software that depends on x264, and it amazes me how much idiocy is spewed by some very well known and respected members of the video community. now i won't call anyone out by name/handle, but let's just say these are people that develop some well known, well respected, widely used software and should quite frankly know better.
the general talking points to be taken away from everything i have read are as follows:
1) x264 is the greatest encoder ever conceived, it's way better than everything else and there will never be anything better than it, ever.
2) it's absolutely not possible to gpu accelerate x264, not at all, not in part, not in whole, it's not possible.
of the two stances, i think the second one may be the dumber, particularly when the reasons given are considered. perhaps the silliest one i have seen is that x264 is "too complicated" and thus can't be ported to run on a gpu. this despite the fact that seti@home (the search for extra-terrestrial intelligence), folding@home (protein folding), dna sequencing, business and market analysis, cryptographic encryption and decryption, and 3d modelling are all done on gpus, as are other h264 encoders, but no, x264 is somehow special and can't be ported.
another truly stupid claim is that the apis are not documented (if i linked to who made this claim you would think that i had somehow hacked their site and posted said nonsense myself). this despite the fact that nvidia's cuda sdk is thoroughly documented with tons of code samples, that microsoft has thoroughly documented directcompute and included the tools within visual c for developers to write code, and that opencl is not only thoroughly documented, with opencl accelerated encoders already existing on both the mac and the pc, but amd has also released complete documentation on how to port cuda applications to opencl via 1 to 1 equivalent function calls. but no, that's not enough documentation or tools for them to port x264.
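just to show the kind of 1 to 1 mapping amd's porting guide describes, here's a minimal host-side sketch of my own (an illustration, not anyone's actual encoder code): every cuda runtime call used below is documented in the cuda sdk, and the comments name the documented opencl counterpart for each one.

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* trivial kernel: double every element in place
       (opencl side: a __kernel function built with clBuildProgram) */
    __global__ void double_all(float *buf, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) buf[i] *= 2.0f;
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *host = (float *)malloc(bytes);
        for (int i = 0; i < n; i++) host[i] = 1.0f;

        float *dev;
        cudaMalloc(&dev, bytes);                              /* opencl: clCreateBuffer */
        cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice); /* opencl: clEnqueueWriteBuffer */

        double_all<<<(n + 255) / 256, 256>>>(dev, n);         /* opencl: clEnqueueNDRangeKernel */

        cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost); /* opencl: clEnqueueReadBuffer */
        printf("host[0] = %f\n", host[0]);                    /* prints 2.000000 */

        cudaFree(dev);                                        /* opencl: clReleaseMemObject */
        free(host);
        return 0;
    }

if a trained monkey can line those calls up side by side, so can a team that wrote a world class encoder.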
well, some university decided to do the heavy lifting for the x264 developers and has ported x264 to run on a gpu:
http://li5.ziti.uni-heidelberg.de/x264gpu/
this is a complete working demo of gpu accelerated x264. they have included full source code (i downloaded the demo and looked through the code) as well as scripts to test it out for yourselves, and they have submitted it to the x264 developers for inclusion within the main development tree.
want to know what the sad thing is? someone, i won't say who, connected with the handbrake project, when informed of this development, responded with the following:
"not very useful" (in reference to including gpu acceleration in a future version of handbrake either via an open cl accelerated x264 or open cl accelerated ffmpeg).
tell me that doesn't just boggle the mind.
-
That looks promising!
#2 (for the ME portion) is already being worked on in the google summer of code project, but this project looks farther ahead and uses opencl as opposed to cuda -
Permit me to express that I strongly feel as though you are quite the enormous idiot.
Surely you've spewed more idiocy in writing this one post than can be found in all of the forum and blog posts you read. If that's what you've determined, you've either looked in all the wrong places and merely exaggerated the importance of those you are basing this off of, or, more likely, the person who let you pass 3rd grade reading comprehension should be fired.
This is basically true.
Please learn the difference between "too complicated" and "not meaningfully parallelized," before whining like this. Either you are once again quoting an idiot, or someone was trying to dumb it down for people like yourself.
Yeah, they're documented about as well as the VP8 spec is written. Hurr hurr.
Oh look, you found somebody working on one of x264's developer supported Summer of Code 2010 projects. Just like the guy who's attempting to get a Cuda port of some aspects written and committed.
I hope you don't think it's actually a fully OpenCL accelerated encoder. It specifically states on the site that only the parts that parallelize well are implemented in OpenCL code, because trying to do anything else would be stupid, just as the developers have stated.
First, let me remind you that full search is not very useful, and 55% faster full search is still really, really slow. Second, "While other GPU solutions claim up to 20x speedup, independent tests against unmodified x264 shows similar gains as our implementation for FullHD." isn't very impressive, seeing as x264 was already nearly as fast as GPU encoders like Badaboom at similar quality. Third, AMD has stated that they will be allowing API access to the hardware decoder on their cards, just like Cuda does. Why waste time rewriting code for GPU acceleration when there is a dedicated chip for it?
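For the peanut gallery: "full search" means exhaustively testing every candidate motion vector in a window. Here's a toy CUDA sketch of my own (an illustration, not x264's or Heidelberg's actual code) that runs full search for a single 16x16 block, one thread per candidate vector:

    #include <cuda_runtime.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define RANGE 16              /* +/- search range in pixels */
    #define SIDE (2 * RANGE + 1)  /* 33 offsets per axis, 1089 candidates total */

    /* one thread per candidate motion vector: sum of absolute differences
       between the current 16x16 block and the shifted reference block */
    __global__ void full_search(const unsigned char *cur, const unsigned char *ref,
                                int stride, int bx, int by, int *costs)
    {
        int dx = (int)threadIdx.x - RANGE;
        int dy = (int)blockIdx.x - RANGE;
        int sad = 0;
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                sad += abs((int)cur[(by + y) * stride + bx + x]
                         - (int)ref[(by + dy + y) * stride + bx + dx + x]);
        costs[(dy + RANGE) * SIDE + (dx + RANGE)] = sad;
    }

    int main(void)
    {
        const int w = 1920, h = 1080;
        unsigned char *cur, *ref;
        int *costs;
        cudaMallocManaged(&cur, w * h);
        cudaMallocManaged(&ref, w * h);
        cudaMallocManaged(&costs, SIDE * SIDE * sizeof(int));
        for (int i = 0; i < w * h; i++) { cur[i] = rand() & 0xff; ref[i] = rand() & 0xff; }

        /* all 1089 candidates for one block, evaluated in parallel */
        full_search<<<SIDE, SIDE>>>(cur, ref, w, 64, 64, costs);
        cudaDeviceSynchronize();

        int best = 0;
        for (int i = 1; i < SIDE * SIDE; i++)
            if (costs[i] < costs[best]) best = i;
        printf("best mv (%d,%d), sad %d\n", best % SIDE - RANGE, best / SIDE - RANGE, costs[best]);

        cudaFree(cur); cudaFree(ref); cudaFree(costs);
        return 0;
    }

That's 1089 candidates times 256 pixel differences, repeated for all ~8000 macroblocks of every frame. Yes, it parallelizes beautifully, which is exactly why the ME portion is what got ported; it's also why nobody sane uses full search outside of placebo settings, 55% speedup or not.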
Finally, seeing as the developers you're whining about wrote, you know, NEARLY THE REST OF THE ENCODER, you have to be quite the "bleaphole" to be posting this indignant drivel. Please think about working on your attitude and mannerisms.
P.S. Kudos to the guys who wrote this and the Cuda implementations. It should be somewhat useful for people with recent GPUs but old CPUs-- and those using placebo settings like full search, I suppose. -
is it safe to assume that you are someone that is madly in love with one, or more, of the x264 developers? allow me to wish you guys lots of gay sex and a very happy anal life together.
again i am sensing just a tad bit of hostility, could it be that your daily dose of self sodomy didn't go as well as you had hoped? my condolences.
for the record, all one has to do is read through the "diary of an x264 developer" to find ample proof backing up what i said. the completely stupid claims and excuses they make border on laughable, and the code they write looks like a retarded chimp threw it together; honestly i am amazed that it works at all.
it takes a serious night of drinking, followed by ample amounts of hallucinogens, to arrive at this conclusion. all you have to do is ask yourself: if x264 is so great, why do professional authoring and rendering houses spend up to 70 grand on pro level h264 encoders rather than save the dough and use the legally free product?
if i may be so bold as to suggest a third option: perhaps my quote is accurate and the asswipe developer actually did say "too complicated". furthermore, it's silly beyond belief to claim, let alone believe, that video encoding with any codec is "not meaningfully parallelized" when the same developers have succeeded in getting their precious little baby to scale across multiple x86 cores nearly linearly. and when you consider that a 1080p video has 2073600 pixels per frame, and that there are numerous calculations (both integer and floating point) that need to take place for each frame to be decoded and encoded, i would say it takes a jackass of unimaginable stature to believe that such a process can't be "meaningfully parallelized".
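to make the pixel math concrete, here's a toy cuda sketch of my own (an illustration only, nothing to do with x264's actual code) that hands each of those 2073600 pixels to its own gpu thread to compute the residual, the current-minus-predicted difference that an encoder goes on to transform and quantize:

    #include <cuda_runtime.h>
    #include <stdio.h>

    #define W 1920
    #define H 1080

    /* one thread per pixel: residual = current frame minus prediction */
    __global__ void residual(const unsigned char *cur, const unsigned char *pred,
                             short *res)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x < W && y < H)
            res[y * W + x] = (short)cur[y * W + x] - (short)pred[y * W + x];
    }

    int main(void)
    {
        unsigned char *cur, *pred;
        short *res;
        cudaMallocManaged(&cur, W * H);
        cudaMallocManaged(&pred, W * H);
        cudaMallocManaged(&res, W * H * sizeof(short));
        for (int i = 0; i < W * H; i++) { cur[i] = (unsigned char)i; pred[i] = (unsigned char)(i >> 1); }

        /* a 120x68 grid of 16x16 thread blocks covers all 2073600 pixels at once */
        dim3 block(16, 16);
        dim3 grid((W + 15) / 16, (H + 15) / 16);
        residual<<<grid, block>>>(cur, pred, res);
        cudaDeviceSynchronize();

        printf("res[0] = %d\n", (int)res[0]);
        cudaFree(cur); cudaFree(pred); cudaFree(res);
        return 0;
    }

one launch and every pixel in the frame is being worked on simultaneously. tell me again how that doesn't parallelize.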
seriously, what are you, and the developers, smoking?
you can't be serious, you must be a cartoon character. cuda, opencl and directcompute aren't fully documented? there are white papers, api hooks, compilers, source code, college level courses on coding for gpus, and working examples; what more do you want? download the cuda sdk, the directx 10/11 sdk or the opencl sdk and look for yourself.
only the biggest of bombastic imbeciles would claim that there isn't enough documentation.
i take it you are incapable of downloading the demo, reading through the code and understanding it.
first things first, the x264 developers have made numerous excuses as to why x264 can't be ported to run on a gpu.
second, as i pointed out above, the thought that video encoding, with its millions of pixels per frame and thousands of frames per video stream, is somehow a singularly linear task that doesn't lend itself to easy parallelism is absurd in the extreme.
third, even if we ignore reality and accept the false notion that only small parts of video encoding can benefit from parallel programming techniques, the fact remains that gpus are significantly faster at performing both integer and floating point math calculations, and if you look through the x264 code you will see that it uses both integer and floating point math.
i suggest you read through the modified code to see to what extent they were able to port the code to opencl.
you have no idea what the f**k you are talking about, do you? allow me to address each "point" separately:
a) "really, really slow". let's assume for the sake of argument that this in fact an accurate assessment, you don't feel that going from "really, really, really slow" to "really, really slow" is a worthwhile endeavor? a 55% improvement is meaningless to you? if you had the chance to increase your IQ by 55% you would pass it up simply because even with such an increase you would still only be at the level of a trained chimp?
b) "x264 is nearly as fast as gpu encoders". maybe if you drank a bottle of vodka before performing said comparison it might be. using espresso couple to an x4 620 and a 9600 gso, i can encode 1080p mpeg-2 and h264 at 10 mb/s at better than 50% faster than real time (an 18 minutes video encodes in 11 minutes). i defy you or anyone else, using any currently available cpu and any x264 settings to encode an 18 minute video in 11 minutes at 1080p @ 10mb/s.
c) your third point makes absolutely zero sense. yes, amd has a hardware UVD built into their cards, just like bobcat will, and yes, they have stated that they will be making api hooks available with their drivers so that developers may access it, but what does that have to do with porting x264 (an encoder) to run on the gpu?
x264 isn't a codec (COmpressor/DECompressor); it's just an encoder that is usually mated to the ffmpeg library, which handles the decoding duties. yes, someone could code a hardware decoder that runs on either the uvd or the gpu, and that would free up the cpu to handle the encoding duties, but why not use both? have the uvd handle exactly what it was designed to handle, namely the decoding, and let the gpu, which is designed for high performance math calculations, handle the video encoding.
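here's a compilable toy sketch of the split i'm describing. to be clear, uvd_decode_frame() and gpu_encode_frame() are made-up placeholder names i'm using for illustration (no such api calls exist); the point is only the shape of the pipeline:

    #include <stdio.h>
    #include <string.h>

    typedef struct { unsigned char pixels[16]; int number; } Frame;

    /* dummy stand-in for the dedicated uvd decode silicon (hypothetical name) */
    static int uvd_decode_frame(int *frames_left, Frame *out)
    {
        if (*frames_left == 0) return 0;
        out->number = (*frames_left)--;
        memset(out->pixels, 128, sizeof(out->pixels));
        return 1;
    }

    /* dummy stand-in for encode kernels on the shader cores (hypothetical name) */
    static void gpu_encode_frame(const Frame *in)
    {
        printf("encoded frame %d\n", in->number);
    }

    int main(void)
    {
        int frames_left = 3;
        Frame f;
        /* decode unit and shader cores are separate hardware, so in a real
           pipeline these two stages would overlap like an assembly line */
        while (uvd_decode_frame(&frames_left, &f))
            gpu_encode_frame(&f);
        return 0;
    }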
or does that make too much sense for you to understand?
talk about the pot calling the kettle black (or african-american so as to not be offensive).
i suppose everyone has the right to act like a bleephole (you misspelled it, by the way) once in a while, but seeing as how you used up a year's worth of allotment, i would suggest you err on the side of caution lest you be accused of abusing the privilege.
i hope your back door makes a full recovery.
thank you, scum again. -
I'm not going to touch your obsession with gay sex. If that's how you roll, that's fine, but please don't try to push your sexuality onto others. The same goes for drugs. Just because you need to smoke and use other hallucinogens to be happy does not mean everyone else has to stoop to that level. Please do not insult others by implying they do. This is common courtesy.
I see your encoder is much better. Seriously though dude, you have issues. I honestly suggest you go see a psychologist if you feel you need to put down other people's work this much, just to make yourself feel better.
Why do DVD authoring studios use 10 year old Mpeg2 encoders? Oh right, because they are stupid. The couple people on Doom9, for example, that author Blu-Rays with "pro level" encoders, are simply too inept to lower deadzones before complaining that x264 can't achieve the same level of transparency. I can ask myself that question all I want, but it doesn't give me any reason to believe those people are any less incompetent.
... Okay, if you think 1 stream core is equivalent to 1 i7 core, all hope is lost. Not to mention, you don't seem to understand in the slightest how and why x264 scales to multiple cores. Frame-based threading lowers quality more as threads increase. Slice-based threading is even worse. This is because, well, video encoding doesn't parallelize well. Gee.
Seriously, your ignorance is getting obnoxious. I'm getting rather tired of this.
"Words words words words words words words words words"-- Was any of that useful to you? Just because something exists doesn't make it useful.
And? The most computationally intensive parts still aren't ported, and cannot reasonably or usefully be.
Write the code for it, then come back and tell me how easy it was. Seriously, you're just making me sick at this point.
*Sigh* Cool analogy bro. About as useful as full search is.
Get i7 with hyperthreading -> Use preset ultrafast -> Bam.
I was talking about ffmpeg. Implementing a decoder on the GPU when the hardware decoder is there, would be really, really stupid.
No "s**t," Sherlock.
Once again, you assume you know more about the software than the people who wrote it. It's ridiculous. I'm done.
P.S.
Look at your first post herp derp. -
i have issues?!? did you bother to read your response to my first post on this topic? one would almost think that you were somehow involved in developing x264, in which case i need to try and simplify what i say so that you may follow the conversation.
the claim that video encoding doesn't parallelize well is so wrong it shocks me that anyone could actually believe that drivel. as i pointed out, a 1080p frame has over 2 million pixels that need to be encoded, a video stream is composed of thousands of individual frames, and x264 scales up to about 16 threads nearly linearly, so there is no reason why you couldn't run those 16 threads on a gpu. now it's true that the relationship between stream/cuda cores and x86 cores is not 1 to 1, but that is neither here nor there; a gpu is designed to execute each individual thread much faster than a cisc cpu can.
but let's assume for a second that you are absolutely correct, that video encoding is a singularly linear task that can't be parallelized at all. the fact remains that a gpu is many orders of magnitude faster than a cpu even at linear tasks, and this is easily proven by looking at some of the applications that benefit greatly from gpu acceleration:
web browsers - i'm currently running the gpu accelerated firefox beta and it smokes the sse2 optimized version i was running before. and in case you didn't know, web browsing is a very serialized task, by virtue of the fact that web sites are coded using interpreted languages such as html and javascript. the browser reads each line of code and executes it before moving on to the next line, and so on (that's how all interpreted languages work); by definition they cannot be threaded (that's an overly simplified statement, but considering you have obviously never written a line of code i'll let it stand at that). yet gpu acceleration provides substantial speed benefits.
similarly, dna sequencing, fast fourier transforms, and encryption and decryption of data (including brute force dictionary attacks) are carried out many orders of magnitude faster on a gpu than on a cpu; it's absurd to claim that somehow video encoding wouldn't show similar benefits.
but here's the real kicker: gpu accelerated encoding has already been done, it's not like anyone's asking them to break new ground. gpu accelerated h264, mpeg-2 and wmv encoders have been around for a while, so clearly their claims about the impossibility of doing it are lies.
now if they said something along the lines of "look, we don't have any experience coding for gpus and don't know how to go about porting our code" or "we programmed ourselves into a corner by using function pointers and we are waiting until dx11 class gpus that support function pointers become more widespread", i could respect that.
but the excuses they have made and that people like you propagate are truly stupid.