It's been a while since I had a chance to do any encoding tests, so I figured I would do one with a test source I have never used, namely the nearly 100 GB Meridian test file that can be found on Xiph's Derf collection.
There are a number of things I need to explain about this test:
I'm sure that many people will complain about and criticize this test, and in some ways I will actually agree with them. The source is 1058 Mb/s, 3840x2160, 59.940 fps, JPEG2000 (BCM@L6), 4:2:2 10-bit, and I used Fedora 33 running on an HP laptop with an i5-1035G1 CPU and integrated UHD Graphics (ICL GT1).
I have upgraded this laptop from 8 GB (1x8) @ 2400 MHz to 16 GB DDR4 (2x8) @ 2666 MHz, and I have replaced the cheap 128 GB SSD and 1 TB 5400 RPM HDD with a 1 TB NVMe drive and a 1 TB SSD.
This codec is brutal to decode and I have never seen a system that could decode it in real time. In fact, according to MediaInfo this file was created with Colorfront Transkoder running on Win 7, and from everything I can find Colorfront Transkoder is GPU accelerated.
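For anyone who wants to gauge just the decode cost on their own machine, a quick sketch (the filename is hypothetical, and this uses ffmpeg's built-in JPEG 2000 decoder, not Transkoder):

# Decode-only benchmark: no resize, no encode, just raw JPEG 2000 decode speed
ffmpeg -benchmark -i meridian_uhd_j2k.mxf -f null -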
For the test I used HandBrake with x264 very fast 10-bit and Intel's QSV HEVC 10-bit quality setting; both were set to 5 Mb/s in one-pass mode, 1280x720 @ 59.94 fps.
Encode speed was 2.98 fps for the x264 encode and 3.02 fps for the Intel HEVC encode. There is a severe decode and resize bottleneck, with both encodes using only about 50% of the CPU for the entire 5+ hours per encode.
This is why I didn't do a 2-pass encode or use a slower preset for x264; it took so long even with these settings that I consider it unusable for normal encodes.
Furthermore, I tried a CRF encode, but it only used about 2200 kb/s and the quality was atrocious. The interesting thing is that even though I chose one-pass 5 Mb/s, both encoders only used about 4200 kb/s.
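For reference, rough ffmpeg equivalents of the two encodes might look like this (a sketch only: HandBrake's internal pipeline differs, the filenames are hypothetical, and the x264 line assumes a 10-bit-capable libx264 build):

# x264 very fast, 10-bit, one-pass 5 Mb/s, 720p
ffmpeg -i meridian_uhd_j2k.mxf -vf scale=1280:720 -c:v libx264 -preset veryfast -pix_fmt yuv420p10le -b:v 5M -an x264_720p.mkv
# Intel QSV HEVC, 10-bit, same target bitrate
ffmpeg -i meridian_uhd_j2k.mxf -vf scale=1280:720 -c:v hevc_qsv -pix_fmt p010le -b:v 5M -an qsv_hevc_720p.mkv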
Unless you use software that has hardware decode of this codec, you will not be able to transcode it in real time. You would need a CPU with 20 times the cores and threads, and even then you would be bottlenecked by the resize filter, so you would need hardware resizing as well.
Thoughts are welcome.
-
Why not perform all the preprocessing separately, and encode from that separate intermediate? This way you could separate the preprocessing steps from the actual encoder(s) and determine the actual encoding speeds instead of contaminating the results with pre-processing bottlenecks. Also, you could do many tests quickly, instead of incurring the bottleneck penalty on each encoding run.
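Something like this two-step split, as a sketch with hypothetical filenames: pay the decode/resize cost once into a lossless intermediate, then run every encoder test from that.

# Step 1, run once: decode + resize into a cheap-to-decode lossless intermediate
ffmpeg -i meridian_uhd_j2k.mxf -vf scale=1280:720 -c:v ffv1 -pix_fmt yuv420p10le -an intermediate_720p.mkv
# Step 2, repeat per encoder/setting: encode speed now reflects the encoder alone
ffmpeg -i intermediate_720p.mkv -c:v libx264 -preset veryfast -b:v 5M -an x264_test.mkv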
Or is that what you're testing on purpose? i.e. are you testing a specific workflow, such as one where you are stuck with the bottlenecks?
Why did you use HandBrake? Do they have a true 10-bit pipe yet? It's not a common tool for someone starting from a 10-bit 4:2:2 source and planning a 10-bit encode, because of that internal 8-bit intermediate.
Why did you choose to look at x264? Wouldn't x265 make more sense? HEVC 10-bit decoding support in hardware is much more common; AVC has almost none. Usually the only time 10-bit AVC is used is for intermediates and acquisition (AVC-Intra/Ultra variants). -
You raised many of the objections I expected, and in fact grappled with myself.
In order:
I did create a number of intermediates, including x264 lossless, UT Video, and Huffyuv. The thing is, each took over 20 hours to create and resulted in such huge bit rates that there was still a bottleneck.
I had hoped to feed the x264 lossless file into Avidemux, because it supports hardware decode via both VAAPI and VDPAU and hardware resize via VAAPI, but I found it wouldn't work.
In frustration I ended up deleting all the intermediates I had made, only to realize that I was using the i965 driver instead of the iHD driver. Now that I've checked, not only does hardware decode work but so do the OpenGL and VAAPI resizes, so I may redo it with a lossless x264 intermediate.
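For anyone who hits the same thing, checking and forcing the driver takes seconds (a sketch; vainfo comes from the libva-utils package on Fedora):

# Show which VA-API driver is loaded and which codecs it can decode
vainfo
# Force the Intel iHD media driver instead of the legacy i965 one for a single run
LIBVA_DRIVER_NAME=iHD vainfo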
I can't stand HandBrake, and no, it still does not support a true 10-bit pipe, but it does make it easy to create a batch job and it does allow 10-bit Intel HEVC, two things Avidemux does not. Shotcut does not support 10-bit HEVC encoding either, at least not easily; I may be able to do it by manually adding the options, but I haven't tried.
I didn't use x265 because it's so CPU intensive that the encode tests would have taken twice as long.
As for 10-bit HEVC, maybe 4:2:0 hardware decode is becoming common, but the only thing that features hardware 10-bit 4:2:2 decode is Apple's new M1.
I'm going to redo this test using an x264 lossless intermediate and Avidemux hardware decode and resize, but with an 8-bit target for the HEVC and x265 encodes.
Check back in a day or two. -
What is your goal? I thought you were encoding 1280x720p59.94? How is that lossless codec bitrate going to be a bottleneck? It's going to decode a lot faster than that UHD JPEG 2000 source.
Are you encoding 10-bit 4:2:0 or 10-bit 4:2:2? Huffyuv classic does not support 10-bit either way.
UT Video supports 10-bit 4:2:0 and 10-bit 4:2:2, but not for encoding from libavcodec/ffmpeg; only the Windows VFW/ACM version supports encoding 10-bit UT Video.
FFVHuff (ffmpeg) and FFV1 support all of those pixel format configurations.
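So if a 10-bit 4:2:2 intermediate is the goal, something like this works from ffmpeg (a sketch, filenames hypothetical):

# FFV1 keeps the 10-bit 4:2:2 pixel format; -level 3 enables sliced multithreading
ffmpeg -i meridian_uhd_j2k.mxf -c:v ffv1 -level 3 -pix_fmt yuv422p10le -an ffv1_422p10.mkv
# FFVHuff alternative: faster to decode, but bigger files
ffmpeg -i meridian_uhd_j2k.mxf -c:v ffvhuff -pix_fmt yuv422p10le -an ffvhuff_422p10.mkv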
I would fold all the preprocessing steps into the intermediate if you set out to test encoders (not some specific workflow). That way you're testing the actual encoder and encoding speed, not some other bottleneck or other factors like resizing or pixel format conversions (which are not part of the encoder). -
My goal was twofold: I wanted to watch the Meridian movie, and at the same time I wanted to test the quality of Ice Lake's QSV encoder against x264. Initially I was just going to watch the 720p version and run encoding tests using the 4K version, so I created 4K intermediates, and those were the ones whose data rates caused bottlenecks just as big as the source.
Now that I have watched the movie, I am going to make an x264 lossless 4K version and feed that into Avidemux, which will hopefully allow me to use hardware decode, eliminate that bottleneck, and then do a bunch of test encodes in a reasonable amount of time.
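If Avidemux still won't cooperate, an all-VAAPI ffmpeg pipeline should be roughly equivalent (a sketch: the render node path and filenames are assumptions, and many VAAPI decoders reject x264's lossless High 4:4:4 streams, so the intermediate may need to be a hardware-decodable flavor instead):

# Hardware decode, hardware resize, and hardware HEVC encode, all on the iGPU
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format vaapi -i lossless_4k.mkv -vf scale_vaapi=w=1280:h=720 -c:v hevc_vaapi -b:v 5M -an hw_hevc_720p.mkv
-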
Would a slower preset be slower, though? It's generally only slower if a faster preset is already utilizing 100% of the CPU. If it's not, because there's a bottleneck earlier on, doesn't that mean there are free CPU cycles for a slower preset to use without it actually being slower? If you're running low on memory and a slower preset is looking further ahead, maybe that'd make a difference.
Does that mean the CRF value should have been lower?
Average bitrate encoding doesn't come close to CRF or 2-pass for quality anyway. Whenever I read "bitrate" without a 2-pass qualification for x264, I lose interest.
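For comparison, a 2-pass run from an intermediate is only one extra command (a sketch, filenames hypothetical):

# Pass 1: analysis only, writes the stats file, no video output
ffmpeg -i intermediate_720p.mkv -c:v libx264 -preset veryfast -b:v 5M -pass 1 -an -f null /dev/null
# Pass 2: the actual encode, using the first-pass stats
ffmpeg -i intermediate_720p.mkv -c:v libx264 -preset veryfast -b:v 5M -pass 2 -an x264_2pass.mkv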