OK, after reading the specifications of the MPEG-1 Audio standard (ISO 11172-3) and the Dolby Surround standards very carefully, I'd like to comment on the questions that came up in this long discussion.
If you are interested in the MPEG-1 Audio spec (ISO 11172-3: Coding of moving pictures and associated audio for digital storage media at up to about 1.5 Mbit/s - Part 3: Audio), you can download the PDF here: http://www.hnpat.com/way-board/way-board.php?db=a5&j=dn&number=85 and for the Dolby Surround specs, just have a look at http://www.dolby.com/tech/
1) What do the specs tell us about the difference between dual channel and stereo?
To make it short: nothing! They define a two-bit field called "mode" in the header of every MPEG audio frame, which can be:
'00' stereo
'01' joint_stereo (intensity_stereo and/or ms_stereo)
'10' dual_channel
'11' single_channel
where for Layer I/II only "intensity_stereo" is possible (you can find this in ISO 11172-3, 2.4.2 Semantics for the Audio Bitstream Syntax, Header). In the specs for MPEG-2 (ISO 13818-3: Information Technology - Generic Coding of Moving Pictures and Associated Audio: Audio), which introduce multichannel audio, they at least make a statement in chapter "2.1 Definition":
2.1.56 dual channel mode [audio]: A mode, where two audio channels with independent programme contents (e.g. bilingual) are encoded within one bit stream. The coding process is the same as for the stereo mode.
2.1.151 stereo mode [audio]: Mode, where two audio channels which form a stereo pair (left and right) are encoded within one bit stream. The coding process is the same as for the dual channel mode.
But this only clarifies the names. So let's look at the MPEG-1 spec again; in "3-ANNEX G (informative) - JOINT STEREO CODING" we find the following statement:
3-G.1 Intensity Stereo Coding Layer I, II
The basic idea for intensity stereo coding is that for some subbands, instead of transmitting separate left and right subband samples, only the sum signal is transmitted, but with scalefactors for both the left and right channels, thus preserving the stereophonic image.
Flow diagrams of a stereo encoder and decoder, including intensity stereo mode, are shown in Figure 3-G.1 "GENERAL STEREO ENCODER FLOW-CHART" and Figure 3-G.2 "GENERAL STEREO DECODER FLOW-CHART". First, an estimation is made of the required bitrate for both left and right channel. If the required bitrate exceeds the available bitrate, the required bitrate can be decreased by setting a number of subbands to intensity stereo mode. Depending on the bitrate needed, subbands 16 to 31, 12 to 31, 8 to 31, or 4 to 31 can be set to intensity stereo mode. For the quantization of such combined subbands, the higher of the bit allocations for left and right channel is used.
[FIGURE 3-G.1: General Stereo Encoder Flow Chart]
So they give a general flow chart of a possible encoder. I want to make clear that the MPEG specs always define the syntax and semantics of the bitstream and how a decoder can/should decode the signal. They never say what an encoder must look like; that is left to the programmer, and the specs only give sample implementations. But if we look at the flow chart, it becomes clear that at the beginning the left and right channels are handled separately (except when joint stereo is used). At the point where the encoder has to satisfy the bitrate condition and the quantization is done, the channels can interact, but they don't have to. This explains the earlier statement about the Lame encoder: Lame gives each channel a fixed half of the bitrate (CBR) in "dual channel" mode, but shares the bits variably between the channels (VBR) in "stereo" mode. This is not defined in the standard, but it is allowed; the programmer of the encoder is free to do so...
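The intensity stereo step quoted from Annex G can be sketched in a few lines of Python. This is a simplified illustration, not the reference encoder: real Layer I/II encoders transmit quantized scalefactors per block of samples, while here each channel's "scalefactor" is just its peak level. All function names are made up for the example.

```python
def intensity_bound(mode_extension: int) -> int:
    """Layer I/II joint stereo: first subband coded as intensity stereo.

    mode_extension '00', '01', '10', '11' -> subbands 4, 8, 12, 16 up to 31.
    """
    if not 0 <= mode_extension <= 3:
        raise ValueError("mode_extension is a 2-bit field")
    return 4 * (mode_extension + 1)


def joint_allocation(alloc_left, alloc_right, bound):
    """Below the bound each channel keeps its own bit allocation; from the
    bound up, the shared subband uses the higher of the two (Annex G)."""
    shared = [max(l, r) for l, r in zip(alloc_left[bound:], alloc_right[bound:])]
    return list(alloc_left[:bound]), list(alloc_right[:bound]), shared


def intensity_encode(left, right):
    """Simplified: one normalized sum signal plus a peak level per channel
    (stand-in for the real, table-quantized scalefactors)."""
    total = [l + r for l, r in zip(left, right)]
    peak = max(abs(s) for s in total) or 1.0  # avoid dividing by zero
    shape = [s / peak for s in total]
    return shape, max(abs(s) for s in left), max(abs(s) for s in right)


def intensity_decode(shape, level_left, level_right):
    """Both channels get the same waveform, scaled to their own level."""
    return [s * level_left for s in shape], [s * level_right for s in shape]


# Same waveform, right channel about 6 dB lower: round-trips exactly.
print(intensity_decode(*intensity_encode([0.5, -0.25, 0.125], [0.25, -0.125, 0.0625])))
# -> ([0.5, -0.25, 0.125], [0.25, -0.125, 0.0625])

# Antiphase content (pure surround in a matrixed signal): the sum is zero.
print(intensity_decode(*intensity_encode([0.5, -0.25], [-0.5, 0.25])))
# -> ([0.0, 0.0], [0.0, 0.0])
```

The second call shows why intensity stereo matters for matrixed surround: a signal that exists only as an antiphase difference between Lt and Rt sums to zero, so in the intensity-coded subbands it simply disappears.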
I hope this clarifies the first problem, i.e. why we get different results with different encoders. Again, I don't want to say which method is better (CBR/VBR); this depends on the source and the encoder. As I understand it, the psychoacoustic model works before the "adjustment to fixed bitrate" step. (Everyone interested should have a look at this figure in the PDF of ISO 11172-3.)
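To illustrate the CBR-per-channel versus shared-pool distinction, here is a hypothetical per-frame bit split in Python. The "demand" numbers stand in for what a psychoacoustic model would report and are invented; neither function is taken from LAME or mp2enc, and header and side-info bits are ignored.

```python
def split_dual_channel(budget: int):
    """Fixed split: each channel gets half the frame's bits, whatever it needs."""
    half = budget // 2
    return half, budget - half


def split_shared(budget: int, demand_left: float, demand_right: float):
    """Shared pool: bits follow the (hypothetical) per-channel demand."""
    total = demand_left + demand_right
    if total <= 0:
        return split_dual_channel(budget)
    left = round(budget * demand_left / total)
    return left, budget - left


# 192 kbps at 48 kHz: 1152 samples per Layer II frame = 24 ms -> 4608 bits.
budget = 192_000 * 1152 // 48_000
print(split_dual_channel(budget))      # (2304, 2304)
print(split_shared(budget, 3.0, 1.0))  # (3456, 1152)
```

With a 3:1 demand ratio the shared pool gives the busier channel three times the bits, while dual channel keeps them equal regardless of content. If the ratio swings from frame to frame, so does the split.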
2) Does the library "mp2enc" distinguish between "dual channel" and "stereo" mode?
We know Lame does, but what about mp2enc? So here is what I did: I looked through the source code of mp2enc (since it is under the GNU Lesser General Public License, this was possible), and after 1 1/2 hours I couldn't find a point where the encoder really makes a difference. To be sure, I mailed DSPGuru, who works on this encoder (he made the floating-point processing extensions), and this was his reply:
As for mp2enc's dual-channel vs. stereo, I believe the only difference can be found in the mp2 headers, indicating the channel mode. Who knows, maybe there are MPEG audio decoders that give different quality after reading those two bits...
So this is an interesting comment, and it's strange that you, kwag, can definitely hear a difference... maybe it is true that the decoder really looks at the two bits... sorry that I couldn't give an answer on that.
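If the only thing mp2enc changes is the channel mode in the header, it's easy to check what a file actually declares. A minimal Python sketch that reads the mode and mode_extension bits from one 4-byte MPEG-1 audio frame header; it assumes you already point it at the start of a frame (it does not search for frames), and the example header bytes (Layer II, 192 kbps, 48 kHz, no CRC) are hypothetical.

```python
MODE_NAMES = {0b00: "stereo", 0b01: "joint_stereo",
              0b10: "dual_channel", 0b11: "single_channel"}


def frame_mode(header: bytes):
    """Return (mode name, mode_extension) from a 4-byte MPEG-1 audio frame header.

    Layout of the 4th header byte: mode(2) mode_extension(2)
    copyright(1) original/home(1) emphasis(2).
    """
    if len(header) < 4:
        raise ValueError("need the 4 header bytes")
    if header[0] != 0xFF or (header[1] & 0xF0) != 0xF0:
        raise ValueError("no syncword (12 one-bits) at this position")
    mode = header[3] >> 6
    mode_extension = (header[3] >> 4) & 0b11  # only meaningful for joint_stereo
    return MODE_NAMES[mode], mode_extension


# Hypothetical headers that differ only in the mode bits.
stereo = bytes([0xFF, 0xFD, 0xA4, 0x04])
dual = bytes([0xFF, 0xFD, 0xA4, 0x84])
print(frame_mode(stereo), frame_mode(dual))  # ('stereo', 0) ('dual_channel', 0)
print([a ^ b for a, b in zip(stereo, dual)])  # [0, 0, 0, 128]
```

The XOR line shows that a stereo header and a dual channel header with otherwise identical settings differ in a single bit of the fourth byte.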
3) What is the difference between Dolby Pro Logic / Pro Logic II etc.?
OK, again, this can be answered by a specification, which can be found on the net at http://www.dolby.com/tech/ and I invite everyone to read the document "Dolby Surround Pro Logic II Decoder - Principles of Operation". The title alone says something about Pro Logic: it's a definition of the decoder, not the encoder! OK, this is not 100% correct; to be more precise, there is a small difference between Pro Logic and Pro Logic II (as far as I understand):
Pro Logic: Four channels are mixed: Left, Right, Center and Surround. This is done as I showed in my first posting, but before the 90-degree phase shift, the surround signal is bandpass filtered (100 Hz-7000 Hz) and a "half" Dolby B noise reduction is applied to it. The decoder is not passive (as in my calculations) but active, using VCAs (voltage-controlled amplifiers) to better identify the surround channel. This system is a purely feed-forward design (see the documents).
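The matrix described above can be sketched in Python together with a simple passive decoder. Assumptions, all labeled in the code: -3 dB gains for C and S, the surround lagging 90 degrees in Lt and leading 90 degrees in Rt (published descriptions differ in the sign convention), and no 100 Hz-7 kHz band limit or noise reduction. The 90-degree shift uses a naive DFT-based Hilbert transform on one short block, which is only exact for signals periodic in the block; this is an illustration, not Dolby's encoder or its active decoder.

```python
import cmath
import math

G = 1 / math.sqrt(2)  # -3 dB gain for C and S (assumed)


def hilbert_90(x):
    """Delay every frequency component of one block by 90 degrees.

    Circular Hilbert transform via a naive DFT: O(n^2), only exact for
    signals that are periodic in the block. Fine for a demo, not for audio.
    """
    n = len(x)
    spec = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]
    for k in range(n):
        if k == 0 or 2 * k == n:
            spec[k] = 0        # DC and Nyquist: no 90-degree shift defined
        elif 2 * k < n:
            spec[k] *= -1j     # positive frequencies: -90 degrees
        else:
            spec[k] *= 1j      # negative frequencies: the mirror image
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def matrix_encode(l, r, c, s):
    """4:2 encode: C at -3 dB into both outputs; S at -3 dB, lagging 90 degrees
    in Lt and leading 90 degrees in Rt (sign convention assumed).
    The 100 Hz-7 kHz band limit and noise reduction on S are left out."""
    s90 = hilbert_90(s)
    lt = [a + G * b + G * d for a, b, d in zip(l, c, s90)]
    rt = [a + G * b - G * d for a, b, d in zip(r, c, s90)]
    return lt, rt


def passive_decode(lt, rt):
    """Passive matrix (no steering logic): L and R pass through,
    C = sum and S = difference, each at -3 dB."""
    c = [G * (a + b) for a, b in zip(lt, rt)]
    s = [G * (a - b) for a, b in zip(lt, rt)]
    return lt, rt, c, s


# Surround-only test signal: a sine with exactly 2 cycles in a 16-sample block.
n = 16
sine = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
silence = [0.0] * n
lt, rt = matrix_encode(silence, silence, silence, sine)
_, _, c_out, s_out = passive_decode(lt, rt)
print("center leak:", max(abs(v) for v in c_out))
# The surround returns as -cos, i.e. the original sine still delayed by 90 degrees.
print("surround error:", max(abs(v + math.cos(2 * math.pi * 2 * t / n))
                             for t, v in enumerate(s_out)))
```

For a surround-only input, Lt and Rt come out in antiphase, the passive center output is zero up to rounding, and the surround output is the original sine, still shifted by 90 degrees.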
Pro Logic II: Five channels can be used: Left, Right, Center, Left Surround and Right Surround. For the surround channels there are no bandpass filters. The decoder is again active, but now designed for a "new sense of spatiality, directionality, and soundfield stability". That means it now has a feedback logic design, which tries to find the correct directions of the sounds. For more detail please read the specs from Dolby. Additionally, the decoder generates a kind of LFE (Low-Frequency Effects) signal.
I hope this clarifies the Pro Logic stuff...
There is still a question, which I addressed to DSPGuru and the author of HeadAC3he: what do they mean by "surround" and "surround 2" downmix? The author of HeadAC3he told me that for "surround 2" he uses a different downmix matrix than for "surround". After some experiments he reached a point where the Pro Logic II decoder could identify the left and right channels. Sorry, Dolby gives no comments on that... but see the A/52A docs (below).
4) Are the 0/180 degree downmix methods (as used in BeSweet and HeadAC3he) correct, or should one use a true 90-degree phase shift?
Well, this is a bit tricky: the Dolby specs say that one has to apply a phase shift of ±90 degrees when generating a Dolby Surround signal from 4 mono channels (L, R, C, S)! If you ignore the 90-degree phase shift, you get the artifact that a signal moving from rear to front (center) will actually fade from rear to left to front...
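The artifact can be checked with a small phasor calculation. This is an illustration under simple assumptions, not taken from the Dolby or A/52 documents: one source at a single frequency panned with constant power from center to surround, -3 dB matrix gains, and the sign conventions written in the docstrings.

```python
import math

G = 1 / math.sqrt(2)  # -3 dB matrix gain (assumed)


def encode_0_180(c, s):
    """Center in phase into Lt and Rt; surround in antiphase (-S to Lt, +S to Rt)."""
    return G * c - G * s, G * c + G * s


def encode_90(c, s):
    """Center in phase; surround shifted +90 degrees into Lt and -90 into Rt."""
    return G * c + 1j * G * s, G * c - 1j * G * s


# One source panned with constant power from center (0 degrees)
# to surround (90 degrees); amplitudes as phasors.
for deg in (0, 22.5, 45, 67.5, 90):
    theta = math.radians(deg)
    c, s = math.cos(theta), math.sin(theta)
    a_lt, a_rt = encode_0_180(c, s)
    b_lt, b_rt = encode_90(c, s)
    print(f"{deg:5}  0/180: |Lt|={abs(a_lt):.3f} |Rt|={abs(a_rt):.3f}"
          f"   +-90: |Lt|={abs(b_lt):.3f} |Rt|={abs(b_rt):.3f}")

# Surround already shifted by 90 degrees (1j * s) fed into the 0/180 matrix:
# |Lt| and |Rt| stay equal, just as with encode_90.
lt, rt = encode_0_180(math.cos(math.pi / 4), 1j * math.sin(math.pi / 4))
print(abs(lt), abs(rt))
```

With 0/180, Lt drops to zero halfway through the pan, so a steering decoder sees a hard-panned signal between front and rear (on the right with these signs; which side depends on the sign convention). With ±90, |Lt| = |Rt| along the whole path, so the left/right balance never moves. The last lines show the HeadAC3he argument: a surround that is already shifted by 90 degrees keeps the balance even through a 0/180 downmix.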
The author of HeadAC3he replied to this comment that most AC3 surround signals are already phase shifted, and that's why a 0/180 degree downmix is OK. Looking into the specs of the ATSC (Advanced Television Systems Committee, atsc.org), more precisely A/52A, they also use the 0/180 method, but they never mention that the surround channel in the AC3 stream is already phase shifted by 90 degrees...
Actually, this would be very interesting: if the people producing true 5.1 sound already had the downmix to Dolby Surround with 0/180 degrees in mind and therefore phase shifted the surround signals by 90 degrees beforehand, we would have one problem less... But since I couldn't find any references, and no comment from Dolby (see again the website) recommending that the surround signals be phase shifted when producing true 5.1 surround sound for Dolby Digital, I'm still wondering.
I should probably say that I'm a theoretical physicist, and I only believe something if I have a proof or a reference to a standard or paper. This is what I learned during my studies and my PhD...
BTW, I have a Pro Logic receiver, but I'm still missing a pair of loudspeakers to enjoy the sound, so I'm really happy that people are actually testing the "theoretical" discussions and giving their comments. But please always give the details of your tests.
P.S. You can find all links to the documents here:
I guess it all boils down to the VBR bit allocation in stereo mode, which is not defined in the MPEG audio standards, and which could be messing up the surround signals by introducing variations in signals that should be constant. It's the only explanation I have: if everything else is identical except for the two bits that define the channel mode, I can't think of anything else. What would be REALLY good would be to use an oscilloscope or some sort of frequency analyzer with logging, and monitor the surround signal amplitude on one channel of an audio track with heavy surround encoding! That way we could actually see whether there are amplitude changes, etc., in the surround signals. Anyway, thanks for your research; it's very informative and clears up many doubts.
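The measurement suggested here can also be done in software. Decode the MP2 to a 16-bit stereo WAV with any decoder, then log, block by block, the RMS level of the sum (L+R)/2 and the difference (L-R)/2; the difference is proportional to what a passive matrix decoder sends to the surround channel. A minimal sketch using only Python's standard library; the function name and the 50 ms block length are arbitrary choices, and the demo uses a synthetic in-memory WAV instead of a real decoded track.

```python
import array
import io
import math
import sys
import wave


def surround_rms_log(path_or_file, block_ms: float = 50.0):
    """Return (time_s, mid_rms, side_rms) per block of a 16-bit stereo WAV."""
    with wave.open(path_or_file, "rb") as wav:
        if wav.getnchannels() != 2 or wav.getsampwidth() != 2:
            raise ValueError("expects 16-bit stereo PCM")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    left, right = samples[0::2], samples[1::2]
    block = max(1, int(rate * block_ms / 1000))
    log = []
    for start in range(0, len(left), block):
        l, r = left[start:start + block], right[start:start + block]
        mid = math.sqrt(sum(((a + b) / 2) ** 2 for a, b in zip(l, r)) / len(l))
        side = math.sqrt(sum(((a - b) / 2) ** 2 for a, b in zip(l, r)) / len(l))
        log.append((start / rate, mid, side))
    return log


# Self-contained demo: 100 frames of pure antiphase (surround-only) signal.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(1000)
    out.writeframes(array.array("h", [1000, -1000] * 100).tobytes())
buf.seek(0)
for t, mid, side in surround_rms_log(buf):
    print(f"{t:6.3f} s  mid {mid:8.1f}  side {side:8.1f}")
```

For a real track, pass the WAV filename instead of the buffer. Running the same source encoded once as stereo and once as dual channel would show whether the side level jumps around from block to block in one of them.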
It just hit me... I don't know why I forgot about it, but it makes sense that the Stereo/Dual-Channel flag is just an instruction to the decoder...
I mean, I just remembered that with some MPEG-2 plugin I had (I think it was the Ligos one), whenever I played something in dual channel, it would automatically play only the left channel, where the primary language would be, to prevent two different things from playing at the same time (which would happen, since dual channel is meant for bilingual audio).
So actually that's the only difference!
OK. Some things.
First of all, how badly does MP3 destroy Pro Logic info and/or generate false positives?
(I've had downloaded files that were supposed to be 2.0 stereo generate Pro Logic surround - wiiierd, esp. as they were made with joint stereo... and mono ones that go to the centre speaker and no others!)
Can Pro Logic information be completely stripped, even? (On purpose?)
Right now I rarely make VCDs of above 160k audio, and all as joint stereo, but if there's a chance it would make some kind of difference I may change to splurge an extra 32k on the sound rather than the picture!
Second... how did you get TMPGEnc audio encoder to work so well? (20khz-ish frequency response at 192k? how?!)
(btw - 112k is for very special cases, too; only use it after doing a test with your source material first to see if it looks ok... it's a very borderline bitrate with mp3, even more so with mp2. shouldn't use it as a matter of course!)
Third... what encoders are there that work with any psychoacoustic model other than the 'default' one without going really bad? So far, any attempt to change from AT&T to, e.g., MUSICAM just yields a 6 kHz-max-response, telephone-quality signal, similar to dropping the bitrate to 96k or less.
Fourth... would 'pre-emphasis' do anything?
Lastly - any 'real world' quantitative tests done? White noise is all well and good, but what about actual sound files being run through to gauge the differences?
I was discussing these topics with Dark Avenger (author of HeadAC3he) and he wanted to give some comments on it:
I guess dual channel may work better for DPL2 than stereo.
The algorithm in the encoder doesn't know anything about DS(2), so basically it doesn't care much about the rear channel(s). Thus in stereo mode the allocation of bits is probably not judged on the rear channel(s) but only on info regarding L or R, as elements of SL and SR are more or less evenly distributed in Lt and Rt. Then if the encoder decides to take away too many bits from one channel, it may be harmful for a full-frequency reconstruction of the rears (aka DPL2), whereas it may not be much of a problem for DPL. This is due to the same argument as in A/52 regarding mid-side stereo being better for DS encoding:
When decoding (in terms of mp2 to PCM) Lt and Rt, you have an error on each channel, thus the reconstructed surround will contain the sum of both. But this is not even the problem we are talking about. The problem is that the errors may be so different that the common but phase-shifted information in Lt/Rt becomes so different that the DPL2 decoder won't properly recognize it as a common signal, i.e. as "the" rear channel. So by using mid/side stereo we will have better center-rear separation but worse L-R separation, which again is a problem in the DPL2 case (but in the case of DPL, it should make things better). DPL2 not only needs the 180° phase difference of the common signal, but also the difference between L and R, as SL and SR are steered that way. So in the end both methods offer advantages and disadvantages for DPL2.
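The center/rear part of this argument can be checked with a few lines of arithmetic. The sketch below (illustrative, with a made-up error size) puts one coding error on the first transmitted channel and runs the result through a passive decoder. Layer II (mp2) itself has no mid/side mode, so code_ms models what an mp3 encoder's M/S mode would do; the function names are hypothetical.

```python
def passive_decode(lt, rt):
    """Passive matrix: center = (Lt+Rt)/2, surround = (Lt-Rt)/2."""
    return (lt + rt) / 2, (lt - rt) / 2


def code_lr(lt, rt, err_first, err_second):
    """L/R coding (stereo or dual channel): Lt and Rt each get their own error."""
    return lt + err_first, rt + err_second


def code_ms(lt, rt, err_first, err_second):
    """Mid/side coding: the errors land on M and S, then Lt/Rt are rebuilt."""
    mid = (lt + rt) / 2 + err_first
    side = (lt - rt) / 2 + err_second
    return mid + side, mid - side


# A pure surround sample (Lt = -Rt) and one coding error on the first channel.
lt, rt, err = 0.5, -0.5, 0.125
print("ideal:", passive_decode(lt, rt))                      # (0.0, 0.5)
print("L/R:  ", passive_decode(*code_lr(lt, rt, err, 0.0)))  # (0.0625, 0.5625)
print("M/S:  ", passive_decode(*code_ms(lt, rt, err, 0.0)))  # (0.125, 0.5)
```

With L/R coding the error on Lt is split evenly between center and surround; with M/S coding the error on the mid channel stays in the center and the surround comes out untouched, which is the better center-rear separation described above.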
So what would be possible solutions? Three came to my mind:
1) Dumb and easy: fix (make constant) the bit allocation, or in other words use dual channel. This way bits are evenly distributed to all channels (components). The disadvantage is that you need a rather high overall bitrate to achieve decent sound. (128kbps mp2 is even for normal stereo a
2) Fix the mp2 encoder, where we have two methods (perhaps best combined):
2a) Introduce a minimum bitrate for a channel in stereo mode. To find this value, do a dual channel encoding and find the lowest bitrate at which the rear channel in DPL2 still sounds OK. This should then be used as a lower limit.
2b) "Fix" the joint stereo of mp2, i.e. throw away intensity stereo. Though for DPL it may not be sooo harmful (according to the above tests), I guess it will be disastrous in the case of DPL2, so joint stereo should only switch between mid/side and normal stereo mode but with above
I hope my thoughts make some sense. I am no pro in these things...
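Proposal 2a can be sketched as a clamp on the per-frame bit split. Everything here is hypothetical: the proportional rule, the 4608-bit budget (192 kbps at 48 kHz, ignoring header and side info) and the 1536-bit floor; in practice the floor would come from the listening test described in 2a. It is not code from mp2enc or any other encoder.

```python
def split_with_floor(budget: int, demand_left: float, demand_right: float, floor: int):
    """Hypothetical version of proposal 2a: split the frame's bits in
    proportion to demand, but never give a channel fewer than `floor` bits.
    floor = 0 is a free stereo split; floor = budget // 2 is dual channel."""
    if not 0 <= floor <= budget // 2:
        raise ValueError("floor must be between 0 and half the budget")
    total = demand_left + demand_right
    left = budget // 2 if total <= 0 else round(budget * demand_left / total)
    left = min(max(left, floor), budget - floor)  # clamp into [floor, budget - floor]
    return left, budget - left


budget = 4608  # bits per frame at 192 kbps / 48 kHz, header and side info ignored
for floor in (0, 1536, 2304):
    print(floor, split_with_floor(budget, 3.0, 1.0, floor))
# prints: 0 (3456, 1152), then 1536 (3072, 1536), then 2304 (2304, 2304)
```

The floor turns the two existing modes into the end points of one knob: a floor of 0 behaves like a free stereo split, and half the budget behaves like dual channel.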
Still, I would like to comment that mp2 has only intensity stereo as a joint stereo coding method and doesn't know anything about mid/side stereo. That is a feature of mp3... (see the postings before)
Originally Posted by ajung:
Disadvantage is you need rather high overall
bitrate to achive decent sound. (128kbps mp2 is even for normal stereo a
I hope my thoughts make some sense.