VideoHelp Forum




  1. Member vhelp's Avatar
    Join Date
    Mar 2001
    Location
    New York
    RGB to YUV - Revisited.

    Please help me to review my understanding of the following
    steps of conversion:

    * step 1 - RGB 4 pixel layout - MAIN
    * step 2 - RGB to YUV
    * step 3 - Sub-Sampling to 420
    * step 4 - stream format of file structure (theory)
    * step 5 - block layout (theory, mpeg)

    Any advice would be greatly appreciated

    -vhelp 3425



    STEP 1
    RGB - using 4 pixels to represent image:
    Code:
      1    2    3    4
    [RGB][RGB][RGB][RGB]

    STEP 2
    YUV - after being converted from RGB, now uses 12 pixels
    to represent the image from the above 4 RGB pixels:
    Code:
        1         2         3         4
    --------  --------  --------  --------
    [Y][U][V] [Y][U][V] [Y][U][V] [Y][U][V]

    STEP 3
    After conversion to YUV, a sub-sampling of 420 format is
    applied, and the new layout reconstruction becomes a total
    of 6 pixels, with four 'Y' pixels, and two chroma pixels 'Cb' and 'Cr'
    to represent the original 4 RGB pixel layout from STEP 1:
    Code:
      1   2   3   4
    [ Y][ Y][ Y][ Y]
      5   6
    [Cb][Cr]

    STEP 4
    Storage to a stream file could theoretically look like:
    Code:
      1   2   3   4   5   6   7   8   9  10  11  12
    [ Y][ Y][ Y][ Y][Cb][Cr][ Y][ Y][ Y][ Y][Cb][Cr] ...
    
    or,
    
      1   2   3   4   5   6   7   8   9  10  11  12
    [ Y][ Y][ Y][ Y][ u][ v][ Y][ Y][ Y][ Y][ u][ v] ...

    STEP 5
    Using the original 4 pixel RGB above in STEP 1,
    and then converted to YUV in STEP 2, and then sub-sampled
    to 420 in STEP 3, could be theoretically viewed in block
    form as:
    Code:
    [Y00]      [Y01]
       [Cb4][Cr5]   
    [Y02]      [Y03]
  2. Member SaSi's Avatar
    Join Date
    Jan 2003
    Location
    Hellas
    I think your step 2 is going in the wrong direction.

    In your example, the original 4 pixels that are defined by three bytes each, when converted to YUV, will still be 4 pixels, each represented by 3 bytes. I think the best possible explanation comes from a relevant book, so I quote:

    The human visual system (HVS) is less sensitive to colour than to luminance (brightness). In the RGB colour space the three colours are equally important and so are usually all stored at the same resolution but it is possible to represent a colour image more efficiently by separating the luminance from the colour information and representing luma with a higher resolution than colour.
    The YCbCr colour space and its variations (sometimes referred to as YUV) is a popular way of efficiently representing colour images. Y is the luminance (luma) component and can be calculated as a weighted average of R, G and B:
    Y = kr R + kgG + kbB (2.1)
    where k are weighting factors.
    The colour information can be represented as colour difference (chrominance or chroma) components, where each chrominance component is the difference between R, G or B and the luminance Y :
    Cb = B − Y
    Cr = R − Y (2.2)
    Cg = G − Y
    The complete description of a colour image is given by Y (the luminance component) and three colour differences Cb, Cr and Cg that represent the difference between the colour intensity and the mean luminance of each image sample. Figure 2.10 shows the chroma components (red, green and blue) corresponding to the RGB components of Figure 2.9. Here,
    mid-grey is zero difference, light grey is a positive difference and dark grey is a negative difference. The chroma components only have significant values where there is a large difference between the colour component and the luma image (Figure 2.1). Note the strong
    blue and red difference components.

    So far, this representation has little obvious merit since we now have four components instead of the three in RGB. However, Cb + Cr + Cg is a constant and so only two of the three chroma components need to be stored or transmitted, since the third component can always be
    calculated from the other two. In the YCbCr colour space, only the luma (Y) and blue and red chroma (Cb, Cr) are transmitted. YCbCr has an important advantage over RGB: the Cr and Cb components may be represented with a lower resolution than Y because the HVS is less sensitive to colour than luminance. This reduces the amount of data required to represent the chrominance components without having an obvious effect on visual quality.
    To the casual observer, there is no obvious difference between an RGB image and a YCbCr image with reduced chrominance resolution. Representing chroma with a lower resolution than luma in this way is a simple but effective form of image compression.
    An RGB image may be converted to YCbCr after capture in order to reduce storage and/or transmission requirements. Before displaying the image, it is usually necessary to convert back to RGB. The equations for converting an RGB image to and from YCbCr colour space and vice versa are given in Equation 2.3 and Equation 2.41. Note that there is no need
    to specify a separate factor kg (because kb + kr + kg = 1) and that G can be extracted from the YCbCr representation by subtracting Cr and Cb from Y , demonstrating that it is not necessary to store or transmit a Cg component.
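
    The book's Equations 2.1-2.2 can be sketched directly in code. This is a sketch, not the book's full conversion: the kr/kb values are assumed to be the common ITU-R BT.601 weights, and the scaling/offsets of the full Equation 2.3 are omitted, so chroma here is the raw colour difference.

    ```python
    # Eq 2.1-2.2 from the quote: luma as a weighted sum, chroma as differences.
    # KR and KB are assumed BT.601 weights (not stated in the quoted passage).
    KR, KB = 0.299, 0.114
    KG = 1.0 - KR - KB          # kr + kg + kb = 1, so kg need not be stored

    def rgb_to_ycbcr(r, g, b):
        y = KR * r + KG * g + KB * b   # Eq 2.1
        cb = b - y                     # Eq 2.2
        cr = r - y
        return y, cb, cr

    def ycbcr_to_rgb(y, cb, cr):
        # G is recovered from Y, Cb, Cr alone -- no Cg component is needed,
        # which is the point made in the text above
        b = cb + y
        r = cr + y
        g = (y - KR * r - KB * b) / KG
        return r, g, b

    # round-trip check on one pixel
    y, cb, cr = rgb_to_ycbcr(200, 100, 50)
    r, g, b = ycbcr_to_rgb(y, cb, cr)
    ```

    The round trip is exact (up to floating point), which demonstrates the book's claim that storing Y, Cb, Cr loses nothing by itself; the savings come later, when Cb and Cr are subsampled.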
    The more I learn, the more I come to realize how little it is I know.
  3. Member SaSi's Avatar
    Join Date
    Jan 2003
    Location
    Hellas
    I forgot to mention the source of the text quoted. It is a book by Iain Richardson titled:
    H.264 and MPEG4 video compression.

    An excellent book, highly recommended.
    The more I learn, the more I come to realize how little it is I know.
  4. Member edDV's Avatar
    Join Date
    Mar 2004
    Location
    Northern California, USA
    @vhelp
    Your "4 pixel layout" needs further definition. For CCIR-601 (D1) sampling, full bandwidth 4:4:4 RGB is represented by a 720x480 raster in 3 layers R, G and B. Conversion to YUV gets you 3 720x480 layers in YUV or in the digital domain Y, Cb, Cr.

    SaSi has the math right.

    To take it further, you need to think of YUV as a potentially spatially compressed version of RGB where the "spatial compression" takes advantage of the psychological perception of the human eye and brain. As such, compression of U and V (in terms of XY spatial sampling) can be done to 4x or more in both X and Y (for a progressive frame) without the human eye noticing the difference. Detail is perceived in Y (monochrome), so Y must be kept full bandwidth or the human will detect a difference. (Ref. rods and cones perception in the eye.)

    If this technique is confined to display, nobody is the wiser and humans never notice that the spatial compression was done.

    However ....

    Machines (and math algorithms) don't have the same limited perception. Therefore image processing of bandwidth-reduced UV has its limitations and must be factored into the algorithm (effects, filter or data compression) being contemplated with subsampled U and V.

    Almost all video formats are in the form of YUV for good reason. U and V have been bandwidth-limited to varying degrees.

    I'll pause now while you describe the context of the question. What are you trying to do?
  5. Member edDV's Avatar
    Join Date
    Mar 2004
    Location
    Northern California, USA
    Originally Posted by vhelp
    STEP 1
    RGB - using 4 pixels to represent image:
    Code:
      1    2    3    4
    [RGB][RGB][RGB][RGB]

    STEP 2
    YUV - after being converted from RGB, now uses 12 pixels
    to represent the image from the above 4 RGB pixels:
    Code:
        1         2         3         4
    --------  --------  --------  --------
    [Y][U][V] [Y][U][V] [Y][U][V] [Y][U][V]
    You are describing 4 horizontal pixels in 3 planes (components)

    1 2 3 4
    --------
    R R R R
    G G G G
    B B B B

    converted to YUV (same number of samples)

    1 2 3 4
    --------
    Y Y Y Y
    U U U U
    V V V V

    4:2:2 sampling is like this (2/3 the number of samples for the same 4 pixels)

    1 2 3 4
    --------
    Y Y Y Y
    U_ U_
    V_ V_

    4:1:1 sampling is like this (half the number of samples for the same 4 pixels)

    1 2 3 4
    --------
    Y Y Y Y
    U
    V

    In the cases above, U and V samples are spatially coincident with the first (and third for 4:2:2) Y sample.

    4:2:0 adds another dimension
    U and V are interpolated to fall in between a quad of 4 Y pixels (2x2)

    See this representation:
    http://www.answers.com/topic/ycbcr-sampling

    To further complicate matters, 4:2:0 comes in two flavors. One lines up U and V with existing Y pixels (called co-sited), but PAL DV and PAL and NTSC DVD interpolate U and V positions to fall between Y samples in X and Y. This spatial interpolation results in very poor multigeneration performance for the PAL DV format compared to the 4:1:1 NTSC DV format.
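
    The sample-count arithmetic laid out above can be sketched as follows. Plain averaging is assumed for the chroma reduction, which is a simplification: real converters use proper filter kernels and the defined siting discussed in this post.

    ```python
    # Counts the samples needed for 4 horizontal pixels under each scheme.
    # Chroma reduction here is naive averaging -- an assumption for clarity.
    def subsample_line(y, u, v, scheme):
        """y, u, v: lists of 4 full-resolution samples (one line, 4 pixels)."""
        if scheme == "4:4:4":
            return y, u, v
        if scheme == "4:2:2":      # one U and one V per 2 pixels
            return (y,
                    [sum(u[0:2]) / 2, sum(u[2:4]) / 2],
                    [sum(v[0:2]) / 2, sum(v[2:4]) / 2])
        if scheme == "4:1:1":      # one U and one V per 4 pixels
            return y, [sum(u) / 4], [sum(v) / 4]
        raise ValueError(scheme)

    y, u, v = [16, 32, 48, 64], [100, 110, 120, 130], [200, 210, 220, 230]
    counts = {s: sum(len(plane) for plane in subsample_line(y, u, v, s))
              for s in ("4:4:4", "4:2:2", "4:1:1")}
    # 4:4:4 -> 12 samples, 4:2:2 -> 8 (2/3 of 12), 4:1:1 -> 6 (half),
    # matching the ratios in the layouts above
    ```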
  6. Member vhelp's Avatar
    Join Date
    Mar 2001
    Location
    New York
    I was busy (after my time limit expired and I was disconnected
    from the web) with my new toy. A USB cam. (USB 2.0 compatible)
    and found a dramatic difference between my digital camera, and
    my latest usb cam. I can capture 640x480 at 29.970 fps. But,
    w/ my digital, no can do. jerks; spits; and dropped frames all
    over the place.
    .
    So, why am I OT 'ing you may ask ? Well, because I've been
    busy having fun, and I wanted to share my reason for delayed
    response
    .
    Also, because it has some parts to this RGB / YUV business, and
    I was just getting an early taste of it w/ the two usb cams.

    and now back to our TOPIC.. Anyways..

    Thanks SaSi.. edDV..

    But, wait. I was in the middle of composing my response back
    to SaSi, when my time limit online ran out (10 hrs) since
    this morning. I'm all tuckered out - pfew. Top that off with
    trying to get my last words in response above, and what-not.., pfew.
    I spend most of my time researching. Believe me. But, nothing
    is easy in the things of video.. is it. hehe.

    Anyways..

    @ edDV

    Did you sneak a peek at my response, while I was trying to
    respond back ?? Because what you said, I already knew. I just
    did not update this page (lost in time, due to lack of response)
    and did not get to it as of yet.

    Ok. I have this Excel Spreadsheet that I made a few months ago, I use
    it to keep track of my understanding of these things in RGB and YUV.
    And at every chance I get, I update it with new information.
    .
    What you laid out was basically what was on my sheet already.
    I still do not get it all. There is something missing in my
    illustration of how things are "put together". My problem is putting
    things into words that you all can understand.
    .
    Perhaps I should revise my Steps above, and then continue. Thanks again
    for both your shared input. I appreciate it - I do

    I will be back shortly w/ more info, and responses.

    -vhelp 3460
  7. @vhelp,

    The above is right (except that in most editing programs RGB is interleaved instead of planar; just like YUV 4:2:2). You might find the following page also worthwhile to read:

    http://www.avisynth.org/Sampling
  8. Member vhelp's Avatar
    Join Date
    Mar 2001
    Location
    New York
    @ Wilbert

    Thank you for your comment/correction

    I had about 3 different (plus) comeback responses, but they all, in the
    end, would not add up, because each one would lead to several more
    questions, and things were just getting out of hand (I'm too impatient
    I guess). For now, let me start up fresh. Forgive me if I come
    off too redundant in my quest for knowledge..


    GOAL:
    * to gain knowledge and understanding of the mechanics/layout of
    the pixel placements when writing out to a RAW bitmap file.
    For example, once RGB is converted to YUV (plus a sampling or sub-sampling,
    if need be), write this out in RAW form to a file. Later,
    restore through a re-conversion back to RGB, and display inside
    a bitmap on screen. (For the purpose of this topic, (LAB) a tool
    could come in handy.)


    History:
    Part of this discussion has to do with some work that I am involved
    with, on a DCT application, which deals with YUV. Also, on another
    project (a screen capture) where everything seems to be working out
    great, speed-wise, but I'm stuck with the saving out to a stream
    file, because I want to save as RAW YUV data but do NOT want to use the
    API calls for VFW, which I am well aware of. I've had some ups and downs
    during this ongoing DCT and Screen Capture project. Then there is yet
    one more project that I'm slowly working on, ..an MPEG encoder.., if it
    ever takes off.


    ToDo:
    * rewrite this topic into a LAB, so that others can make use of it. +/-
    * provide a "learning tool" to aid in the understanding of the process
    ..(RGB > YUV conversions; and the "sampling" that takes place)
    * demo at least a sectional part of my DCT project (may bring things
    ..into perspective here) +/-


    I will try and conjure up a tool to aid in the research/education
    of this topic/project. I was meaning to add it into this discussion
    when things got clearer to me. And, as I learn more, the ongoing
    building of such a tool will reflect this knowledge/growth.


    A personal note..

    This topic/project is driving me crazy. I have spent months on this,
    and still, no progress. I cannot take an RGB bitmap; convert it to
    YUV color space; incorporate a sampling; and then write this YUV format
    out to a RAW file. Then later, open it and bring it back into
    an RGB bitmap for viewing on screen.
    .
    What eludes me (after the RGB > YUV conversion, which I understand)
    is the "sampling" and "pixel arrangement" to format into a RAW file.
    No header is required for the file in this learning exercise.
    But I suspect that there is a standard layout for such YUV files.
    I just don't know what they are at this time. So, I'm here, trying
    to figure them out from the bottom up.


    So, what I would like to do for this learning exercise is the
    following:

    * open an rgb bitmap (say a small grid of 8x8 pixels - my DCT project)
    * convert this 8x8 rgb grid bitmap to yuv
    * apply a sampling to this yuv (ie, 422 / 411 / 420 sampling)
    * write it out to a RAW file

    then..

    * read in this RAW YUV file (which should still be an 8x8 yuv grid)
    * convert it to RGB 8x8 grid
    * display this rgb 8x8 grid
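
    The steps listed above can be sketched end to end. This is a sketch under stated assumptions: full-range BT.601-style scaling (the /1.772 and /1.402 factors keep Cb/Cr within a byte after a +128 offset) and plain 2x2 chroma averaging for the 420 step, ignoring the siting and kernel questions discussed in this thread, so the round trip is close but not bit-exact.

    ```python
    W = H = 8  # the 8x8 test grid from the post

    # Full-range BT.601-style conversion is assumed here
    def rgb_to_yuv(r, g, b):
        y = 0.299 * r + 0.587 * g + 0.114 * b
        return y, (b - y) / 1.772, (r - y) / 1.402     # Y, U(=Cb), V(=Cr)

    def yuv_to_rgb(y, u, v):
        b = y + 1.772 * u
        r = y + 1.402 * v
        g = (y - 0.299 * r - 0.114 * b) / 0.587
        return r, g, b

    # 1. an 8x8 RGB test gradient
    rgb = [[(x * 30, j * 30, (x + j) * 15) for x in range(W)] for j in range(H)]

    # 2. convert every pixel: full 4:4:4 YUV
    yuv = [[rgb_to_yuv(*p) for p in row] for row in rgb]

    # 3. 420 sampling: keep every Y, average U and V over each 2x2 block
    Y = [[yuv[j][x][0] for x in range(W)] for j in range(H)]
    U = [[sum(yuv[2*j+dj][2*x+dx][1] for dj in (0, 1) for dx in (0, 1)) / 4
          for x in range(W // 2)] for j in range(H // 2)]
    V = [[sum(yuv[2*j+dj][2*x+dx][2] for dj in (0, 1) for dx in (0, 1)) / 4
          for x in range(W // 2)] for j in range(H // 2)]

    # 4. raw planar file: Y plane, then U, then V (U/V offset by +128 to bytes)
    clamp = lambda s: max(0, min(255, int(round(s))))
    raw = bytes(clamp(s) for row in Y for s in row) \
        + bytes(clamp(s + 128) for row in U for s in row) \
        + bytes(clamp(s + 128) for row in V for s in row)

    # ...then read the raw stream back and rebuild RGB
    # (nearest-neighbour chroma upsampling, again an assumption)
    Y2 = [list(raw[j * W:(j + 1) * W]) for j in range(H)]
    cw, off_u = W // 2, W * H
    off_v = off_u + cw * (H // 2)
    U2 = [[raw[off_u + j*cw + x] - 128 for x in range(cw)] for j in range(H // 2)]
    V2 = [[raw[off_v + j*cw + x] - 128 for x in range(cw)] for j in range(H // 2)]
    rgb2 = [[yuv_to_rgb(Y2[j][x], U2[j // 2][x // 2], V2[j // 2][x // 2])
             for x in range(W)] for j in range(H)]
    ```

    The raw frame comes out at 64 + 16 + 16 = 96 bytes (Y plane plus the two quarter-size chroma planes), and the reconstructed grid matches the original up to the chroma averaging error.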


    Gosh. I hope I got everything down here. Anyways..

    In the mean time, I'll post whatever notes and things I can (I do have
    some on hold) or find interesting to this topic/project. But for now,
    I am off to start work on the learning tool

    -vhelp 3461
  9. I'm also very interested in your project.

    But I'm not a programmer, so I don't know how to open (avs ImageSource contains the code to open RGB bmp files) and write files.

    The sampling is easy. Start with RGB 8x8 pixels, and convert to full YUV 8x8 pixels.

    Consider 4:4:4 YUV -> 4:2:2 YUV (YUY2) for example (section 2.2. of the link I gave you):

    line 1: Y1C1 Y2C2 Y3C3 Y4C4
    line 2: Y1C1 Y2C2 Y3C3 Y4C4
    etc ...

    Sampling:

    line 1: Y1C1x Y2 Y3C3x Y4
    line 2: Y1C1x Y2 Y3C3x Y4

    with

    C1x = (C1+C1+C2)/3 // edge
    C3x = (C2+C3+C3+C4)/4
    C5x = (C4+C5+C5+C6)/4
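
    As a sketch, the 1-2-1 kernel above can be written out like this (0-based indexing, where the post's C1..Cn are 1-based; the edge rule is taken straight from the formulas above):

    ```python
    def chroma_121(c):
        """Halve a line of chroma samples with the (1,2,1)/4 kernel above;
        kept samples are c[0], c[2], c[4], ... and the left edge uses the
        (2,1)/3 rule, exactly as in the post's C1x/C3x/C5x formulas."""
        out = [(c[0] + c[0] + c[1]) / 3]                # C1x, edge case
        for i in range(2, len(c), 2):                   # C3x, C5x, ...
            out.append((c[i - 1] + c[i] + c[i] + c[i + 1]) / 4)
        return out

    # a linear ramp: interior kept samples stay on the ramp (120, 140, 160),
    # only the edge sample is pulled toward its neighbour by the (2,1)/3 rule
    halved = chroma_121([100, 110, 120, 130, 140, 150, 160, 170])
    ```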

    4:4:4 YUV -> 4:2:0 YUV (YV12) is more difficult. Convert to YUY2 first, and then to YV12. The YUY2 -> 'YV12 progressive conversion' is given in the same section of the link.

    The layout of raw YUV depends on which YUV flavour you want to write.

    For YUY2 it's simply (Y1* first line, Y2* second line, etc ...):

    Y11U11 Y12V12 Y13U13 Y14V14 ... Y21U21 Y22V22 Y23U23 Y24V24 ...

    The second frame comes directly after it.

    You need to write the YUV samples in the order as given above. So when writing 8 pixels per line you get 16 bytes per line (8 for the luma values and 8 for the chroma).

    YV12 is even simpler (Y1* first line, Y2* second line, etc ...): It's planar, so first Y plane, followed by V plane, followed by U plane.

    Y11 Y12 Y13 Y14 ... Y21 Y22 Y23 Y24 ... Y31 Y32 Y33 Y34 ... Y41 Y42 Y43 Y44 ...

    V11 V13 ... V31 V33 ...

    U11 U13 ... U31 U33 ...

    The second frame comes directly after it.

    So when writing the first frame you get: 8*8 = 64 luma samples, followed by 4*4=16 V samples, followed by 4*4=16 U samples.

    ps, I didn't say it explicitly but you have multiple forms of 4:2:2 and 4:2:0.

    ps2, the position of the chroma samples in the YUY2 and YV12 (and all others) layouts gives you the weight you should apply when sampling.
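
    A byte-ordering sketch of the two layouts just described, for one 8x8 frame. The sample values are placeholders, and the "keep even lines" vertical reduction is a crude stand-in for the proper YUY2 -> YV12 conversion mentioned above; only the ordering and sizes are the point here.

    ```python
    W = H = 8

    # after 4:2:2 sampling: per line, 8 Y samples and 4 (U, V) pairs
    Y422 = [[j * 16 + x for x in range(W)] for j in range(H)]
    U422 = [[100 + x for x in range(W // 2)] for j in range(H)]
    V422 = [[200 + x for x in range(W // 2)] for j in range(H)]

    # YUY2 is interleaved: Y U Y V Y U Y V ... line by line
    yuy2 = bytearray()
    for j in range(H):
        for x in range(W // 2):
            yuy2 += bytes([Y422[j][2*x], U422[j][x],
                           Y422[j][2*x + 1], V422[j][x]])
    # -> 16 bytes per 8-pixel line, as the post says

    # YV12 is planar 4:2:0: whole Y plane, then V plane, then U plane
    # (keeping even chroma lines here is an assumption; see the YUY2 -> YV12
    # conversion referenced above for the proper filtering)
    U420 = [row for j, row in enumerate(U422) if j % 2 == 0]
    V420 = [row for j, row in enumerate(V422) if j % 2 == 0]
    yv12 = bytes(s for row in Y422 for s in row) \
         + bytes(s for row in V420 for s in row) \
         + bytes(s for row in U420 for s in row)
    ```

    For the 8x8 frame this gives 128 bytes of YUY2 (16 per line) and 96 bytes of YV12 (64 luma + 16 V + 16 U), matching the counts in the post.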
  10. Member vhelp's Avatar
    Join Date
    Mar 2001
    Location
    New York
    Hi Wilbert

    Thanks again for your follow-up. Appreciated
    And it pleases me that you have interest/curiosities in my project(s)

    However, your great explanation has given me a headache, and
    I must step back and take a break and regroup.

    .
    .

    EDIT:
    After rereading my last post, I had decided that I did not like the
    way it all came out, and felt that it was too much, and I resorted
    to editing it out of this latest response here.

    In addition to what I have already from my internet searches, and those
    notes / responses I received here, I will take the time to study them all.
    Perhaps in 2 or 3 weeks I'll have found the answers to all these puzzle
    pieces.

    Thank you all for your patience with me and for helping me out as much
    as you all could afford in your spare time with my lack of understanding of
    this YUV Sampling and Layout/Pixel Placement for RAW storage.

    -vhelp 3464
  11. vhelp, you might want to start with the "easiest" part. Just open RGB24 bitmaps, do R<->G (or something else which is trivial) and write it back to

    1) raw RGB24.

    2) a bitmap.

    If you can do the above, the YUV conversion/sampling should be pretty trivial.
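
    A minimal sketch of that warm-up exercise follows. The in-memory BMP builder is an assumption made for self-containment (the real exercise would open an existing file); the header fields follow the standard BITMAPFILEHEADER/BITMAPINFOHEADER layout, with pixels stored as B,G,R in bottom-up rows. Width 8 needs no row padding (8 * 3 = 24 bytes, already a multiple of 4).

    ```python
    import struct

    W = H = 8  # small test grid

    def make_bmp(pixels):
        """pixels: H rows (top-down) of W (r, g, b) tuples -> 24-bit BMP bytes."""
        data = b"".join(bytes(v for (r, g, b) in row for v in (b, g, r))
                        for row in reversed(pixels))   # BGR, bottom-up
        header = struct.pack("<2sIHHI", b"BM", 54 + len(data), 0, 0, 54)
        dib = struct.pack("<IiiHHIIiiII", 40, W, H, 1, 24, 0,
                          len(data), 0, 0, 0, 0)
        return header + dib + data

    def bmp_pixels(bmp):
        """Decode the pixel array back to top-down rows of (r, g, b) tuples."""
        off = struct.unpack_from("<I", bmp, 10)[0]     # pixel-data offset field
        rows = [bmp[off + j * W * 3: off + (j + 1) * W * 3] for j in range(H)]
        return [[(row[3*x + 2], row[3*x + 1], row[3*x]) for x in range(W)]
                for row in reversed(rows)]             # bottom-up -> top-down

    pixels = [[(x * 10, j * 10, 50) for x in range(W)] for j in range(H)]
    bmp = make_bmp(pixels)

    # the suggested "trivial" transform: exchange R and G, then write both forms
    swapped = [[(g, r, b) for (r, g, b) in row] for row in bmp_pixels(bmp)]
    raw_rgb24 = bytes(v for row in swapped for p in row for v in p)  # 1) raw
    bmp2 = make_bmp(swapped)                                         # 2) a bitmap
    ```

    Once this decode/transform/encode loop works, swapping `swap R<->G` for an RGB -> YUV conversion plus sampling is, as the post says, the comparatively small remaining step.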
  12. Member vhelp's Avatar
    Join Date
    Mar 2001
    Location
    New York
    @ Wilbert,

    I'm finally back for more, hehe

    Question.. regarding the fragment below, after I perform the
    RGB -> YUV conversion on an image.


    ( from the avisynth sampling page.. sec 2.2 )

    Code:
    Recall the layout of a 4:4:4 encoded image

    frame:
    Y1C1 Y2C2 Y3C3 Y4C4    line 1
    Y1C1 Y2C2 Y3C3 Y4C4    line 2
    Y1C1 Y2C2 Y3C3 Y4C4    line 3
    Y1C1 Y2C2 Y3C3 Y4C4    line 4
    and then the following

    Code:
    In AviSynth, the default mode is using a 1-2-1 kernel to interpolate
    chroma, that is
    
    C1x = (C1+C1+C2)/4 ???
    C3x = (C2+C3+C3+C4)/4
    C5x = (C4+C5+C5+C6)/4
    You can imagine my next question. What happened to my U and V ??

    From sec 2.1 The color formats: RGB, YUY2 and YV12

    .
    .

    The term 4:4:4 denotes that for every four samples of the
    luminance (Y), there are four samples each of U and V. Thus
    each pixel has a luminance value (Y), a U value (blue difference
    sample or Cb) and a V value (red difference sample or Cr).
    Note, "C" is just a chroma sample (UV-sample).


    .
    .

    They seem to have been replaced with Cxx, as in C1 C2 C3 and C4
    (I'm assuming per line of 4 'Y' samples, where every four luma samples
    'YYYY' are considered a line, as you seem to be referring to).

    Am I Correct ??

    So, after looking this over, and over and over ..
    I have concluded that the Cxx is supposed to refer to
    the U and V sampled together as one Chroma value, hence..

    if the following (simplified) luma sample per line were:

    Code:
    Y            Y            Y            Y
    U=100 V=255  U=110 V=250  U=120 V=240  U=130 V=230
    C1 = (U+V)/2 = n -- (or 177.5)
    C2 = (U+V)/2 = n -- (or 180 )
    C3 = (U+V)/2 = n -- (or 180 )
    C4 = (U+V)/2 = n -- (or 180 )

    per luma line sample ??

    So, the UV-sample seems to indicate to me that these two
    (now referred to [after rgb -> yuv conversion] as chroma values)
    are merged or averaged by using the formula that I just
    noted above.

    If this is true, then this is a sampling all on its own, before
    the other sampling ( ie, 422 or sub-sampling, 420 ) would take
    place. Interesting

    Am I correct ??

    So now, (if correct) the C1 C2 C3 and C4 now become,
    C C C C, or YC YC YC YC, or from:

    YC1 YC2 YC3 YC4 - line 1
    YC1 YC2 YC3 YC4 - line 2
    YC1 YC2 YC3 YC4 - line 3
    YC1 YC2 YC3 YC4 - line 4

    to this:

    YC YC YC YC - line 1
    YC YC YC YC - line 2
    YC YC YC YC - line 3
    YC YC YC YC - line 4

    [ Y(u+v)/2 ] for every pixel per 4 pixels in an RGB -> YUV conversion
    for luma pre-sampling, or the first stage of formatting/sampling.

    Now, we continue with the sampling you demonstrated in the above post
    you made earlier with the following:

    The sampling is easy. Start with RGB 8x8 pixels, and convert to full YUV 8x8 pixels.

    Consider 4:4:4 YUV -> 4:2:2 YUV (YUY2) for example (section 2.2. of the link I gave you):

    line 1: Y1C1 Y2C2 Y3C3 Y4C4
    line 2: Y1C1 Y2C2 Y3C3 Y4C4
    etc ...

    Sampling:

    line 1: Y1C1x Y2 Y3C3x Y4
    line 2: Y1C1x Y2 Y3C3x Y4

    with

    C1x = (C1+C1+C2)/3 // edge
    C3x = (C2+C3+C3+C4)/4
    C5x = (C4+C5+C5+C6)/4

    .
    .

    Am I correct ??

    Thanks,
    -vhelp 3479


