Hi all,
I am a newbie at ffmpeg and have just managed to get my hands on a Mac and install the package.
I have some reference video sequences which are 1080p60 and 1080p50 in v210 encoded format, saved in an avi container.
I am attempting to extract individual frames while still preserving the bit depth. I am also using MATLAB for processing these individual frames. I mainly care about the luma component.
In essence I would like to do the following:
1. Convert v210 encoded video to image sequence / file sequence
2. Convert the image sequence/file sequence back to v210 encoded video.
I am attempting to do some manipulation on the individual frames and put them back together.
I would like to not induce any color space conversions in the process.
Any help is greatly appreciated. Thanks a lot!
-
v210 is Y'CbCr 10bit 422. Since you do not want any colorspace conversions, what image sequence format do you want that is Y'CbCr 10bit 422 and is compatible with matlab?
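(Side note: if you want to double-check exactly what the source is before converting anything, ffprobe should report the codec and pixel format. A quick sketch, assuming the file is named input.avi:
Code:ffprobe -v error -select_streams v:0 -show_entries stream=codec_name,pix_fmt,width,height,r_frame_rate input.avi
It should come back with something like v210 / yuv422p10le for your files.)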
-
Thanks for a response.
I understand it is 10-bit Y'CbCr 422. You are right, I have not really had much success reading files into MATLAB that are not in RGB space.
The alternative would be 'perhaps' to read in a binary file and parse the data. I am not sure if I can use any such file format/extension/container available in ffmpeg to generate the binary file(s). I recall not having much success with *.yuv files which were sent to me by someone I was collaborating with a few months back. They ended up packing the data into TXT files and marking Y,U,V channels separately, solely because we were handicapped by MATLAB and were short on time for some analysis.
Alternatively, if I know what color space conversion is used (say based on BT.601 or BT.709), I can handle the RGB data as long as it would be 10-bits. (I think the option would be to use a 16-bit container for the 10 bits, as 10-bit support is not offered in ffmpeg for good reasons). I could also use some help here (which commands to use) to get a handle on the color space conversion and also generate the RGB image sequence, and then (hopefully) use these images to combine back into a v210 encoded video sequence.
I really appreciate any help! Thanks again! -
ffmpeg doesn't have a native 10bit RGB implementation. Its 10bit RGB implementation (gbrp10le or gbrp10be) is actually stored as YUV, and uses a lossless RGB<=>YUV transform function. The problem is matlab probably isn't going to understand that, unless it has ffmpeg to translate or uses the same lossless transform function
ffmpeg does have a native 16bit RGB implementation, and more common image formats like TIFF, PNG, SGI support it. But then you introduce other variables in the 10bit 422 => 16bit RGB conversion. Not only the matrix used, but also the chroma upsampling (and downsampling, if going back to YUV 422) algorithm used. In theory, only "Nearest Neighbor" or point resizing is reversible & lossless. I guess if you're "mainly" interested in the Y' component, then the chroma issues are less of a concern
(dpx/cineon image sequences can hold 10bit YUV 422 as one of the variants, but ffmpeg doesn't offer it, only the RGB implementation) -
Yes, there are losses from round tripping it. A binary compare and amplified differences don't match. There is always some rounding loss from going Y'CbCr<=>RGB as they are non-overlapping color models
If you still want to pursue this, the syntax is -pix_fmt bgr48le for 16bit RGB. You can specify a scaling algorithm by using -sws_flags, e.g. nearest neighbor would use "neighbor". I think the default is bicubic
The "%04d" is the number of placeholder digits in the image sequence. So output%04d.tiff would be output0000.tiff, output0001.tiff etc.; whereas output%05d.tiff would give output00000.tiff, output00001.tiff, etc.
eg.
Code:ffmpeg -i "input.avi" -pix_fmt bgr48le -sws_flags neighbor -start_number 0 -an output%04d.tiff
-
Thanks a lot, that is very helpful. I really appreciate your help.
Chroma interpolation completely slipped my mind, that's a very important point.
I did try the command you mentioned, and it returned a 16-bit-per-channel image sequence (ffmpeg auto-selected rgb48le instead of bgr48le). I loaded it into MATLAB to plot histograms and check ranges, just to verify all 16 bits are being used. I was just curious to see how this can be true 16-bit data with all the bits in use.
I think I understand from your description that video to image sequence used Rec601 by default and image sequence to video (output v210) uses Rec709. Correct me if I am wrong.
Let me briefly describe what may be the small problem/missing piece in my workflow and the potential solution.
1. I will convert my video sequence to RGB (48bit), and manipulate in MATLAB (add a frame index as text for alignment later).
2. Convert these sequences (images) back to v210 encoded AVI sequence.
3. This AVI sequence is used to test the performance of an encoder/decoder. I can also capture the decoder output as a v210 encoded AVI on disk.
4. To compare my reference (in step 3) to the decoder output, I only need to extract the Y' channel from both these sequences. These two sets of Y' data, as long as they are readable in MATLAB, would suffice for my workflow.
How I arrive at the end of Step 2 is perhaps not super important. Chroma interpolation and color space conversion (back and forth) would be 'ok' theoretically.
As a sidebar, how can I convert the image sequences to a v210 encoded sequence while specifying frame rate etc.?
I would really appreciate any pointers. Thanks again for walking me through some of the commands and getting me started.
-
So what did matlab show?
I think I understand from your description that video to image sequence used Rec601 by default and image sequence to video (output v210) uses Rec709. Correct me if I am wrong.
The thing is ffmpeg is always changing, there are commits almost daily. So what I say now, might not be true tomorrow.
ffmpeg used to use Rec601 by default for all YUV<=>RGB conversion (I don't know if that's still the case), unless the output was v210, then it would automatically switch to 709. That would potentially screw up your workflow, because if it used 601 converting to RGB, then 709 back to v210.....
And it never used to read source flags, but it does now. You have to verify/test the results to be sure. I might run some quick tests later if I have time
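One way to check what flags, if any, your source files carry would be something like this (just a sketch; for many AVI sources the colour fields simply come back as unknown):
Code:ffprobe -v error -select_streams v:0 -show_entries stream=pix_fmt,color_space,color_primaries,color_transfer input.avi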
Some of the new filters might give you more control. For example -vf scale now has 'in_color_matrix' and 'out_color_matrix' switches. But you have to be careful, because some of the filters work in 8bit
https://www.ffmpeg.org/ffmpeg-filters.html#scale-1
1. I will convert my video sequence to RGB (48bit), and manipulate in MATLAB (add a frame index as text for alignment later).
2. Convert these sequences (images) back to v210 encoded AVI sequence.
As a side bar, how can I convert the image sequences to a v210 encoded sequence by specifying frame rate etc.
Otherwise, use the same sprintf-style %04d syntax for the image sequence as the input:
It's not necessary to specify the -pix_fmt, because ffmpeg will automatically use "-pix_fmt yuv422p10le" when "v210" is specified
eg.
Code:ffmpeg -i input%04d.tiff -r 30 -c:v v210 -an output.avi
4. To compare my reference (in step 3) to the decoder output, I only need to extract the Y' channel from both these sequences. These two sets of Y' data, as long as they are readable in MATLAB, would suffice for my workflow. -
Yes, confirmed, 601 is still used by default for Y'CbCr => RGB conversion, AND back RGB => Y'CbCr. Even from RGB to v210 (I was wrong above about v210 automatically using 709, or something has changed)
If you want 709 for the conversion to RGB, the options are
Code:1) -vf scale=in_color_matrix=bt709:out_color_matrix=bt709 or 2) -vf colormatrix=bt709:bt601
And RGB to YUV should be the same for -vf scale, but -vf colormatrix would probably be bt601:bt709. I didn't test these, because if you use the default 601 for both trips it should be OK, and it looks cleaner (less chroma aliasing)
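Putting that together with the earlier TIFF export, the full pair of commands would look roughly like this (just a sketch, assuming 50 fps; if you stick with the default 601 in both directions you can drop the matrix options entirely):
Code:ffmpeg -i input.avi -vf scale=in_color_matrix=bt709 -pix_fmt rgb48le -sws_flags neighbor -an output%04d.tiff
Code:ffmpeg -r 50 -i output%04d.tiff -vf scale=out_color_matrix=bt709 -c:v v210 -an output.avi
-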
Thanks a lot for running it and checking!
MATLAB histogram analysis revealed most codes in 16-bit [0, 65535] were used. Min and max were as expected and all codes were used for the one frame I read in.
I attempted to 'regenerate' the video file by using the defaults
Code:ffmpeg -i input%04d.tiff -r 50 -c:v v210 -an output.avi
EDIT: It does work with the QT Pro I have on a Windows machine.
I get a warning?
Code:[v210 @ 0x7f.......] bits per raw sample: 0 != 10 bit
-
Thank you! I am only using ffmpeg on the mac. AVI is still the desired format.
Observing something odd: the original sequence is 10 sec and 2.59 GB. The reconstructed sequence is 19 or 20 sec and also 2.59 GB. Somehow the reconstructed sequence also does not 'look' right (seems like judder) when played back in VLC or QT Pro. Windows Explorer shows the original sequence at 2211857 kbps while the reconstructed one is at 200 kbps (a little unbelievable, as the quality would look really bad at that rate).
To say the least I am confused :-S
EDIT: The original sequence is also at 50fps. I did not change the frame rate which could possibly introduce judder. -
What does MediaInfo (View => Text) say about the output file?
What does the ffmpeg log or console text show about the frame rate?
You might have to use -r 50 as an input option (ffmpeg is picky where you put the switches) . There is a difference between -r as an input vs. output option
Code:ffmpeg -r 50 -i input%04d.tiff -c:v v210 -an output.avi
-
The argument reordering has fixed the 'length' issue. The sequence is now 10 seconds. Thank you!
Windows Explorer still shows 200 kbps, but MediaInfo seems happy.
I also took a small snapshot from terminal while the video was being generated.
Thanks a lot! Just also trying to understand why ffmpeg shows 200 kbps as well in the parameters and what it means. Bit rate at the very bottom seems more believable.
I really appreciate your help! -
I wouldn't pay any attention to that 200kb/s. v210 is uncompressed, so it is always the same bitrate (there might be a few bytes difference between different headers, metadata or container differences), but the coded frame size will always be the same for each frame. It's analogous to how a BMP will always be the same size for a given frame dimension and bit depth
Filesize = Bitrate * Running Time
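As a rough sanity check with your numbers (assuming 1920x1080 at 50 fps): v210 packs 6 pixels into 16 bytes, so one frame is (1920/6)*16*1080 = 5,529,600 bytes. At 50 fps that is about 276 MB/s, i.e. roughly 2,211,840 kbps and about 2.6 GiB per 10 seconds, which lines up with the 2211857 kbps and ~2.59 GB you are seeing, so the 200 kbps figure is just a meaningless header value.
-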
Thank you! That helps.
I now need to figure out how to extract Y' (something I understand) from the files, without doing color space conversion and chroma interpolation. I believe we are still mixing in some chroma information when generating the green channel in the case of RGB48. The green channel could otherwise be treated as Y', but this conversion is a little more intricate. I wish there were a simpler way... -
Yes, as soon as you go into RGB, there is some loss, both due to rounding and to illegal (negative) values from the non-overlapping colorspaces. You would have to avoid RGB completely
ffmpeg doesn't have a 10bit greyscale pix_fmt, only 8bit and 16bit. If your source was 8 or 16bit I think you could use -pix_fmt gray for 8bit, or gray16be / gray16le for 16bit.
There is a filter -vf extractplanes that can manipulate channels, you might be able to do something with that, e.g. maybe extract Y' and express it as an R plane
https://www.ffmpeg.org/ffmpeg-filters.html#extractplanes
Or you can read up on the lossless GBR matrix, the lossless YUV<=>RGB transform equation, and maybe program it into matlab. RCT (reversible color transform) is used in FFV1, for example
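(For reference, the JPEG 2000 style reversible color transform, which I believe is what FFV1's RCT is based on, is integer-only and exactly invertible: Y = (R + 2G + B) >> 2, Cb = B - G, Cr = R - G, with inverse G = Y - ((Cb + Cr) >> 2), R = Cr + G, B = Cb + G. The chroma terms can go negative, so they need one extra bit of headroom.)
-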
Thanks a lot for the help! Sorry, I went on a break and am now back to tackling this and moving forward.
I am using 'filter_complex' and 'extractplanes' as follows, though y.avi ends up encoded again in some compressed format. The quality is quite bad and, for a 2.7 GB sequence, y.avi is only 6.8 MB.
I wanted to try out this filter and see what I get in the luma and chroma streams.
I am using the following for the v210 avi sequence:
Code:ffmpeg -i video.avi -filter_complex 'extractplanes=y+u+v[y][u][v]' -map '[y]' y.avi -map '[u]' u.avi -map '[v]' v.avi
-
You can force a codec with -c:v (used to be "-vcodec", you can still use it)
Try -c:v copy , or -c:v v210
Not sure what will happen with v210, to the "blank" chroma channels
Or try -c:v rawvideo for raw yuv
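Something along these lines might work for dumping just the Y' plane as a headerless raw file that matlab could fread() directly (a sketch only, not tested; the file name is just an example, and with gray16le the 10-bit values presumably end up shifted up into 16 bits):
Code:ffmpeg -i input.avi -vf extractplanes=y -pix_fmt gray16le -c:v rawvideo -an -f rawvideo y_1920x1080.raw
-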
Thank you, that worked! Sorry if this was quite basic, I am still learning how to use ffmpeg library.
EDIT: I used -c:v v210
Code:ffmpeg -i input.avi -c:v v210 -filter_complex 'extractplanes=y+u+v[y][u][v]' -map '[y]' y.avi -map '[u]' u.avi -map '[v]' v.avi
I am considering exporting luma channel video (y.avi) to image sequences again. Which -pix_fmt can I use for that? Is there something specific for 10/16 bit grayscale only?
For the -c:v rawvideo, do I need to change my arguments or can I still use the same extractplanes filter? If I use the same, I get y.avi with 8-bit depth per channel in RGB colorspace
Thanks again!
-
There is no 10bit format for greyscale in ffmpeg. The chroma values are greyed out with this
Code:ffmpeg -i input.mov -vf extractplanes=y -c:v v210 -an output.avi
If you export to an image sequence, you should be able to use -vf extractplanes with -pix_fmt gray16be or gray16le (the difference between be and le is big vs. little endian). Not sure what image format supports that directly, but it looks like TIFF can
Not sure what happens if you use extractplanes with -pix_fmt rgb48le. In theory, it should only be giving you the Y channel expressed as 16bit RGB
Code:ffmpeg -i input.avi -vf extractplanes=y -pix_fmt rgb48le -an output%04d.tiff
If you use -pix_fmt gray16be, the filesize is much smaller and there are no error messages. ffplay can parse it, but other typical image viewers might not understand it. Matlab might not be able to understand it either
Code:ffmpeg -i input.avi -vf extractplanes=y -pix_fmt gray16be -an output_gray16be_%04d.tiff
-
Update:
I played a bit with the two test sequences I have and verified a couple of things in MATLAB as far as signal ranges are concerned. Some findings:
Case 1
Code:ffmpeg -i input.avi -vf extractplanes=y -pix_fmt rgb48le -an output%04d.tiff
Results in a 3-channel 48bit TIFF which has exactly the same R, G, and B channels. If I do a pairwise diff of the matrices in the 3 channels, I get zeros.
Question: Is there any color space conversion going on here?
Case 2
Code:ffmpeg -i input.avi -vf extractplanes=y -pix_fmt gray16be -an output_gray16be_%04d.tiff
The resulting TIFF image is single channel, almost 1/3 the size of the previous case.
I imagine that here only the luma channel is extracted and no color space conversion is happening?
Also realized in both of the above cases that the signal range was [4096, 60032] for one of my sequences. If I divide this by 2^6, I get 64 for the lowest gray level, which seems very reasonable as the standard black level for 10-bit video. I therefore concluded that the data bits were packed into the 10 MSBs of the 16-bit word. Correct me if this is incorrect.
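(Indeed, 4096 >> 6 = 64 and 60032 >> 6 = 938, so the values behave exactly like 10-bit codes shifted into the top of the 16-bit word, with black at 64 as expected for legal-range video.)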
I think Case 2 fits my requirement quite well.
Questions:
- I am trying to understand whether I am doing any color space conversions when going from the v210 avi file to gray16le?
- Is this the native luma channel extraction I was hoping to do?
- Am I missing or assuming something I should be mindful of?
-
So matlab can read the gray16le tiff?
I don't know what is ACTUALLY going on "behind the scenes" in ffmpeg, there might be colorspace conversions and rounding losses going on with any of these. You would have to look at the code or ask someone that knows the actual code
In 10bit YUV, 64-940 is the "legal range" out of 0-1023 (analogous to how 16-235 is for 0-255 in 8bit)
I don't know how valid this is, because there is no native 10bit Y' greyscale format in ffmpeg. If you did some tests on known test graphics/images (known values, in Y', for v210), you might be able to figure out how valid these tests are -
Yeah, it appears to be happy to read the gray16le tiff. I am using the OS X version of MATLAB 2014a. I can also display the image within MATLAB without any data parsing or bit rearrangement.
The images are rendered also in Finder and in Preview app.
Out of curiosity I did the following:
I compared the channels from the RGB48 file (they are the same as each other) to the single-channel gray16le TIFF. The difference comes to zero, which means both approaches produce equivalent results and the color space conversions (whatever they are, or are not) are the same. Of course we are limited to 1 count of precision with integer numbers. Perhaps the luma channel is simply replicated, since we instructed ffmpeg to extract only the Y channel and it populates all 3 R, G, B channels with it
I am not sure I understand the differences between the two approaches (case 1 and case 2, as you had suggested) well enough to 'guess' whether there is some color space conversion or not.
Yes, thank you for confirming that. I was also alluding to the 64-940 valid range for the 10-bits similar to the 16-235 luma (16-240 for chroma) for 8-bit.
I really want to thank you for all the input and walking me through! -
I did a quick test round tripping it. The MD5's don't match (nor does any other method of comparison, such as amplified differences), so there are losses incurred somewhere. It might be from dithering during bit depth conversions
Neither the original, method 1, nor method 2 match each other
EDIT: it looks like a levels shift
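(For reference, the per-frame checksum comparison can be done with ffmpeg's framemd5 muxer, something like this for each file, and then diff the two text outputs:)
Code:ffmpeg -i original.avi -map 0:v -f framemd5 original.framemd5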
In theory, you should just be able to copy over the values if you had a 10bit RGB image format, e.g. 0-1023 in Y' is 0-1023 in 10bit R, G, B. That is "grey" when R=G=B. You should be able to copy it back over as well without any loss, e.g. a Y' value of 100 would be R=G=B=100. -
Yes, ffmpeg is scaling the video. I used a "full" range v210 test video with a 0-1023 gradient, and the result of the 1st trip to TIFF was scaled to "standard" range. I guess if your source v210 is all standard range it might be ok. Or you might be able to adjust a switch in -vf scale or swscale.
-
Level shift as an offset or scaling?
I am sorry, I don't understand the round trip that you are referring to. Is that (v210 <--> RGB48le)? Or do you mean that using the test sequence with all gray levels (0-1023) (which do end up in the 64-940 range post conversion), method 1 and method 2 give different results?
Thank you! -
Levels clamping. Y' 0-1023 black-to-white source video is now expressed as Y' 64-940 black to white
I tested v210 => TIFF => v210 with both methods. Ideally the 2nd v210 should be bit identical to 1st v210 if it were a lossless round trip
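(Roughly, the round trip for the greyscale TIFF method looks like this, assembled from the commands earlier in the thread, assuming 50 fps:)
Code:ffmpeg -i original.avi -vf extractplanes=y -pix_fmt gray16be -an y_%04d.tiff
Code:ffmpeg -r 50 -i y_%04d.tiff -c:v v210 -an roundtrip.avi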
Method 1 didn't equal Method 2 either, i.e. each gave different results (both were clamped, but still not equal to each other)
EDIT:
Yes, it's definitely clamping the levels on the roundtrip, even with "standard" range input v210, comparing greyscale only on both and ignoring chroma completely. It's visible to the naked eye, just do the test with a "normal" video and view the Y' channel
I couldn't find any switches that work properly to set in range / out range (vf scale is supposed to have in_range, out_range switches, but they don't work properly here, maybe because it's not 8bit YUV)
-
Thank you for checking all of that for me, it is very valuable information.
I am attempting to better understand the roundtrip in method 1 and method 2. From the extracted luma channel, are you only reconstructing the grayscale v210 video? Video 1 and Video 2 (how do we go to v210 with only the grayscale luma-channel TIFFs?) do not match, although they are clamped at the same video levels. Am I correct? I am wondering if there are some colorspace conversions which are different in the two cases. Thank you again! -
It's clamped during the first step, with either method, going to TIFF. The "roundtrip" is going from TIFF back to v210. The CbCr channels are a single value as expected, not absent. v210 by definition will still have CbCr, because it's not grey10
So I thought doing the round trip back to v210 (from the greyscale TIFF) might unclamp or reverse it, and hopefully give you the original Y' channel, but that's not the case.
Video 1 and Video 2 (how do we go to v210 with only the grayscale luma-channel TIFFs?) do not match, although they are clamped at the same video levels. Am I correct? I am wondering if there are some colorspace conversions which are different in the two cases. Thank you again!
The clamp makes those two methods useless for analysis or any manipulations. I'm not sure what's causing it, maybe a bug in the extractplanes, because a "normal" conversion to 16bit RGB TIFF doesn't exhibit those issues