I'm using BeSweet to convert audio from .ac3 to .m4a, and the result is too loud compared to the original. The default normalize setting is 0.97. When I set normalize to 0.20, the sound is lowered so it roughly matches the original volume. The question is: is this the correct way to lower the audio volume, or do other values determine the output volume?
Thread: Too loud audio
I've never used BeSweet, but you made me curious, so I tried to find out what the numbers mean.... I didn't find any positive confirmation.... but I have formed a theory.
"Normalise" would probably mean adjusting the volume until the peaks are at 0dB. The default of 0.97 is probably a fraction of full scale (97%), which means it's aiming for just below 0dB (100%), probably to provide a little safety margin. 97% of maximum would give you peaks around -0.3dB rather than 0dB, which is nothing you'd hear as a volume difference. Using the normalise function at 0.97 would most likely increase the volume compared to the original, though.
So, assuming the 0.20 value you're entering is not 20% of the original volume but 20% of the maximum volume (ie you'd effectively be increasing the volume to maximum and then reducing it to 20%), that'd give you peaks of around -14dB (I'm using the calculator here to do the math, so if it's wrong I'll blame it... converting dB to percentage does my head in). That leaves the "how long is a piece of string" question, ie how loud was the audio to begin with? You can't know whether you've corrected it by the right amount without looking at the peaks, because the peaks will be different every time.
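The percentage-to-dB arithmetic above can be checked with the standard amplitude formula, dBFS = 20 * log10(fraction of full scale). This is a minimal sketch that assumes, as the post does, that BeSweet's -norm value is a fraction of full scale; that interpretation isn't confirmed anywhere in the thread.

```python
import math

def fraction_to_dbfs(fraction: float) -> float:
    """Peak level as a fraction of full scale (1.0 = 0 dBFS) -> level in dBFS."""
    if fraction <= 0:
        raise ValueError("fraction must be greater than zero")
    return 20 * math.log10(fraction)

def dbfs_to_fraction(db: float) -> float:
    """Level in dBFS -> fraction of full scale (the inverse of fraction_to_dbfs)."""
    return 10 ** (db / 20)

# Assumption: -norm values are fractions of full scale.
for norm in (0.97, 0.20):
    print(f"-norm {norm:.2f} -> peaks at about {fraction_to_dbfs(norm):.2f} dBFS")
# -norm 0.97 -> peaks at about -0.26 dBFS
# -norm 0.20 -> peaks at about -13.98 dBFS
```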
Anyway, I had a look at the BeSweet GUI, and if that's what you're using, the upshot of my ramblings would be to disable the normalise function for any audio conversion which doesn't involve down-mixing; that should leave the volume as it is. For multi-channel to stereo conversions, try enabling the gain adjustment instead and setting it to somewhere between -6dB and -7.5dB. Once you've found a value that gives you the volume you want relative to the original, you should be able to use that same reduction every time. Because down-mixing multi-channel audio increases the volume of the stereo channels (relative to the original left and right channels), -6dB would probably be the minimum to apply to prevent clipping. -7.5dB should be completely safe. I'm pretty sure that's what Dolby specify for down-mixing AC3. I just use -6dB myself.
That's my 2 cents worth anyway......
Last edited by hello_hello; 9th Mar 2013 at 07:27.
I'm using the CLI with the following switches:
-core( -input ".ac3" -output ".m4a" ) -azid( -c normal -L -3db ) -ota( -norm 0.20 ) -bsn( -vbr 0.6 )
And I'm downmixing from multichannel to stereo (AAC). BeSweet uses neroAacEnc for this.
If -norm 0.20 is 20% of max volume, then the program sure boosts the sound a lot with -norm 0.97!
I think if I disable the normalize function, the sound will be a bit lower than the original.
I think "ota" has a -g option which controls gain, so I'll try setting it to max instead of using -norm, to see what happens. OK, "-g max" equals "-norm 0.97". The log file said max gain found: 112.6 dB and overall track gain: 20dB for both (-g max was slightly over at 20.1dB and -norm 0.97 slightly under at 19.9dB).
So what's left is to find which -g value in dB is equivalent to max, and which one matches the source audio file (AC3).
BeSweet Reference 2012-01-11.pdf
Last edited by brusno; 9th Mar 2013 at 10:37.
Not being a BeSweet user, it's largely Chinese to me.
I'm not sure why you need to know what gain is required for max, because it'll be different every time. It depends on the peak levels of each audio stream, which will rarely be the same.
I'll admit I'm not sure I understand the gain values reported in the log file.
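To illustrate why the gain needed to reach the maximum changes from file to file, here's a small sketch. It assumes normalising simply applies whatever gain moves the loudest peak to the target (0.97 here); the two peak values are made up for illustration.

```python
import math

def normalise_gain_db(peak: float, target: float = 0.97) -> float:
    """Gain in dB that moves a stream's loudest peak (fraction of full scale) to `target`."""
    return 20 * math.log10(target / peak)

# Hypothetical peak levels for two different source files.
for name, peak in (("file A", 0.5), ("file B", 0.125)):
    print(f"{name}: peak {peak} needs {normalise_gain_db(peak):+.1f} dB to reach 0.97")
```

The quieter the source's peaks, the bigger the boost, so a single "max" gain value can't carry over from one file to the next.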
It might also depend on the player you're using to compare the AC3 audio to the encoded version. If the player obeys the dialogue normalisation in the AC3 stream, that'll tend to reduce the volume of the AC3 audio, sometimes by quite a bit. By default, BeSweet ignores/removes the dialogue normalisation.
Last edited by hello_hello; 9th Mar 2013 at 10:46.
In my case it seems to be 8dB lower than the source, but other settings, such as -L, may have an influence.
For example, I got 20dB for norm:0.97, 12dB for norm:0.40, 6dB for norm:0.20, and 0dB for norm:0.0, so I can see the gain decrease as I lower the normalise value.
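Those figures fit a simple model: if -norm sets the target peak as a linear fraction of full scale, changing it from 0.97 to another value should change the gain by 20 * log10(norm / 0.97). The sketch below checks the reported numbers against that model. The model is an assumption, not documented BeSweet behaviour, and norm:0.0 is left out because it appears to mean "disabled" (log10(0) is undefined).

```python
import math

MEASURED_GAIN_097 = 20.0  # overall track gain reported in the thread for -norm 0.97

def predicted_gain_db(norm: float) -> float:
    """Predicted normalisation gain for another -norm value, assuming norm is a
    linear fraction of full scale (an assumption, not confirmed by the BeSweet docs)."""
    return MEASURED_GAIN_097 + 20 * math.log10(norm / 0.97)

reported = {0.40: 12.0, 0.20: 6.0}  # values from the post
for norm, got in reported.items():
    print(f"-norm {norm:.2f}: predicted {predicted_gain_db(norm):.1f} dB, reported {got:.0f} dB")
# -norm 0.40: predicted 12.3 dB, reported 12 dB
# -norm 0.20: predicted 6.3 dB, reported 6 dB
```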
Last edited by brusno; 9th Mar 2013 at 13:54.
http://img.afreecodec.com/screenshot...or-589298.jpeg So 0dB would be 100%.
The dB scale needs to be relative to something; without a reference point it has no meaning. There is a reference used to measure sound pressure, where 0dB = 20 micropascals (if that means anything to you). On that scale, 0dB is roughly the threshold of human hearing, and something like a rock concert is over 100dB.
To be honest, if the program is using dB values in terms of volume, I'm not sure how that works, as the sound has no volume until it's amplified. It's probably using a different dB standard to the one I'm referring to. http://en.wikipedia.org/wiki/Decibel#Audio_electronics
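Both scales use the same 20 * log10(ratio) formula for amplitude-like quantities; only the reference differs. This sketch contrasts dB SPL (reference 20 micropascals, as described above) with dBFS (reference = the largest value a digital format can hold), which is covered in the linked Wikipedia section. Which scale BeSweet's log actually uses isn't stated in the thread.

```python
import math

P_REF = 20e-6  # 20 micropascals: the 0 dB SPL reference

def db_spl(pressure_pa: float) -> float:
    """Acoustic sound pressure level, relative to 20 micropascals."""
    return 20 * math.log10(pressure_pa / P_REF)

def dbfs(peak: float, full_scale: float = 1.0) -> float:
    """Digital level relative to full scale; 0 dBFS is the loudest value the format can hold."""
    return 20 * math.log10(peak / full_scale)

print(f"1 Pa = {db_spl(1.0):.1f} dB SPL")            # acoustic scale
print(f"half of full scale = {dbfs(0.5):.2f} dBFS")  # digital scale: negative below full scale
```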
I'm tired at the moment and fairly rusty on all this, but I'll research it a little more later to refresh my memory.
I really should know all this crap, given I mix live bands for a living, but it's the sort of thing you can kind of know without really knowing it all that well. Or forget as you go.....
Anyway if I have time when I'm awake later, I'll do some research to refresh my memory regarding the dB values offered by the log file.
When I referred to reducing the volume by 6dB, I meant reducing it relative to what it currently is. It has nothing to do with whatever adjustment may be needed to reach the maximum, so you don't need to know what the volume is. If you apply a -6dB gain adjustment, you've lowered its volume by 6dB (roughly half the amplitude).
So when it comes to down-mixing multi-channel audio, that's where the gain option comes in. When you take the original front left and right channels and add the centre and rear left and right channels to produce a stereo track, you're increasing the volume of those original left and right channels. After down-mixing, if you reduce the gain by around 6dB, that should take their volume back to roughly what it was originally. By doing it that way, you don't need to think about the maximum volume as such, as long as you know that when you reduce the gain, you're reducing it enough to stop the peaks from exceeding 0dB. -6dB should be enough, but -7.5dB should be 100% safe.
It just so happens that the gain option has a setting which does the same thing as the normalise function set to 97% or 100%, ie it'll raise the peaks to (or just below) 0dB. But if you apply a gain of 0dB you won't be turning the volume up or down; it should remain exactly as it is. The gain value is the amount of change relative to the original volume and has nothing to do with silence or peak levels. +6dB will increase the existing volume by 6dB, and -6dB will reduce it by the same amount.
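A gain setting boils down to multiplying every sample by 10^(gain_dB/20). The sketch below uses plain Python lists as stand-ins for audio samples to show why 0dB leaves the audio untouched and why +6dB/-6dB roughly double/halve the amplitude. It illustrates the arithmetic only; it isn't BeSweet's or foobar2000's actual code.

```python
def apply_gain_db(samples: list[float], gain_db: float) -> list[float]:
    """Multiply each sample by the linear factor for gain_db (relative to the current level)."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

original = [0.2, -0.4, 0.8]
print(apply_gain_db(original, 0.0))   # factor 1.0: identical to the original
print(apply_gain_db(original, -6.0))  # factor ~0.501: roughly half the amplitude
print(apply_gain_db(original, 6.0))   # factor ~1.995: roughly double (0.8 would exceed full scale)
```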
I mainly use foobar2000 for converting so I've not thought about any of this in quite a while...... and obviously I didn't understand it all that well to begin with.
I mainly use foobar2000 as it's my audio player but it also does batch converting and you can save conversion presets to use via the right click menu. For stereo to stereo or multi-channel to multi-channel conversions I just leave the audio untouched and the output volume is the same as the input volume. Foobar2000 also has a DSP for automatically down-mixing multi-channel audio to stereo. When I've set up presets for downmixing and encoding, I've simply included a gain reduction of 6dB while encoding, which takes the volume pretty close to back where it started. It mightn't be exact, but as it's the same each time, the output audio should have the same volume relative to the original each time. The principle would be the same regardless of the encoder GUI or CLI used.
Because the normalisation function works relative to the maximum volume, it can't also work relative to the original volume. For example, if audio A and audio B have the same average volume, but audio B has peaks which are 6dB louder than audio A's, then when they're both raised to maximum level (peaks at 100%/0dB) the average volume for audio A will be 6dB louder than audio B. You can use the normalisation function to set whatever percentage you like, but it won't make them the same again; audio A will always remain louder. So instead, you use the gain function, which works relative to the original volume, not the maximum.
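The audio A / audio B example can be sketched numerically. The levels below are hypothetical, and "avg" is a crude stand-in for perceived loudness, but it shows that peak normalisation applies a different gain to each track (opening a 6dB gap), while a fixed gain keeps their relationship intact.

```python
import math

def ratio_db(a: float, b: float) -> float:
    """How many dB louder level a is than level b."""
    return 20 * math.log10(a / b)

# Hypothetical levels (fractions of full scale): same average, B's peaks ~6 dB higher.
A = {"avg": 0.1, "peak": 0.25}
B = {"avg": 0.1, "peak": 0.5}

def normalise(track: dict, target: float = 1.0) -> dict:
    """Scale so the peak hits `target`; the gain depends on the track's own peak."""
    g = target / track["peak"]
    return {k: v * g for k, v in track.items()}

def gain(track: dict, gain_db: float) -> dict:
    """Fixed gain relative to the current level; the same factor for every track."""
    g = 10 ** (gain_db / 20)
    return {k: v * g for k, v in track.items()}

nA, nB = normalise(A), normalise(B)
gA, gB = gain(A, -6.0), gain(B, -6.0)
print(f"normalised: A's average is {ratio_db(nA['avg'], nB['avg']):.2f} dB above B's")
print(f"fixed -6 dB gain: A's average is {ratio_db(gA['avg'], gB['avg']):.2f} dB above B's")
```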
And don't forget, AC3 audio has a dialogue normalisation parameter, which tells a hardware player how much to change the volume so dialogue is always at the same level. The idea being, you can go from one movie to another without touching the volume and the dialogue will always be the same. The problem is though, this invariably gets the player to turn the volume down, so if you're using a hardware player to compare volume, the dialogue normalisation might throw a spanner into the works. As I said, I'm sure BeSweet ignores the dialogue normalisation and encodes the audio "as-is" (manual normalisation and down-mixing aside). You can tell it not to ignore it, but generally it's not used when re-encoding audio.
PS: also don't forget AC3 audio can contain dynamic range compression information to level out the volume so the peaks aren't so loud compared to the average volume. Whether that would effectively make the audio sound louder or softer I'm not sure, but if you're comparing the audio using a hardware player, make sure any compression is disabled in its settings.
Probably the best way to compare the two is to open the original audio and the encoded version in a wave editor such as Audacity and compare the peak levels. That might give you a better idea of how similar they are than listening to the audio, unless you're sure there's no dialogue normalisation or compression being used.
Hopefully that all makes sense.....
Last edited by hello_hello; 10th Mar 2013 at 19:44.
Thanks for your informative answer.
I've learned a lot.
I opened the audio file in foobar2000 and opened the peak-value window. While playing a specific sound in the file, I got approximately -5 dB for norm:0.97, -15 dB for norm:0.40, -20 dB for norm:0.20 and -25 dB with normalise disabled. Since the difference between disabled and norm:0.97 is 20 dB, it makes sense that I got a 20dB gain for norm:0.97, as I wrote earlier. And as far as I know now, 20dB is 10 times the amplitude.
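For reference, a change in dB maps to an amplitude ratio of 10^(dB/20) and a power ratio of 10^(dB/10), so 20dB is 10 times the amplitude but 100 times the power (and perceived loudness follows neither ratio exactly). This sketch applies that to the approximate peak readings above, measured against the normalise-disabled reading.

```python
def amplitude_ratio(db: float) -> float:
    """Amplitude (sample value / voltage) ratio for a change of `db` decibels."""
    return 10 ** (db / 20)

def power_ratio(db: float) -> float:
    """Power (energy) ratio for the same change."""
    return 10 ** (db / 10)

# Approximate peak readings from the post, in dB.
readings = {"0.97": -5.0, "0.40": -15.0, "0.20": -20.0, "disabled": -25.0}
base = readings["disabled"]
for setting, level in readings.items():
    change = level - base
    print(f"norm {setting}: {change:+.0f} dB -> amplitude x{amplitude_ratio(change):.2f}, "
          f"power x{power_ratio(change):.1f}")
```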
By the way, -L:
Specifies the level used for mixing the LFE channel (bass channel) into left and right front channels. The value is given in decibels.
So in my command line the LFE is mixed in at -3dB. Is this -3dB in addition to the -6dB you mentioned in your last answer?
Last edited by brusno; 10th Mar 2013 at 13:51.
Lo = 1.0 * L + clev * C + slev * Ls
Ro = 1.0 * R + clev * C + slev * Rs
The upshot of the above (if I'm reading it correctly) is that "clev" is the gain reduction for the centre channel, while "slev" is the gain reduction for the surround channels. The amount of reduction can be written in the AC3 stream for the player to use. For the centre channel there are three possible values: -3dB, -4.5dB and -6dB. The spec says if there's no value written to the stream, -4.5dB should be used. For surround there are only two possible non-zero values, -3dB and -6dB, with the default being -6dB.
I couldn't tell you how likely this information is to be present in an AC3 stream or how often encoders use it. I vaguely remember it being mentioned in the BeSweet manual, so I'd guess BeSweet uses it if it's there. If a decoder just down-mixes without using the information, or just uses default values, I'm fairly sure it generally uses the default -6dB for the surround channels, but I suspect it's -3dB for the centre channel. The difference between -3dB and -4.5dB would be fairly minor, and the biggest complaint regarding dialogue is generally that it's too low anyway. Plus..... if you take a sound being produced by a single speaker and split it between 2 speakers, all else being equal, I'm pretty sure doubling the speakers increases the level by 3dB, so reducing the centre channel by 3dB when down-mixing to stereo compensates for that 3dB increase. Don't quote me on that though..... my memory regarding amplifier power, adding speakers, speaker coupling and the perceived volume increases which result is very patchy. I really should give myself a refresher course.
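The Lo/Ro equations above translate directly into code. This sketch uses the default mix levels described above (-4.5dB centre, -6dB surround), converted from dB rather than copied from the spec's tables, and leaves out the LFE channel. Feeding it full-scale samples on every channel shows why further gain reduction is needed after down-mixing.

```python
import math

CLEV = 10 ** (-4.5 / 20)  # centre mix level, about 0.596
SLEV = 10 ** (-6.0 / 20)  # surround mix level, about 0.501

def downmix_loro(L, R, C, Ls, Rs, clev=CLEV, slev=SLEV):
    """Lo/Ro stereo down-mix of one 5-channel sample (LFE not included)."""
    lo = 1.0 * L + clev * C + slev * Ls
    ro = 1.0 * R + clev * C + slev * Rs
    return lo, ro

lo, ro = downmix_loro(1.0, 1.0, 1.0, 1.0, 1.0)  # every channel at full scale
print(f"Lo = {lo:.3f}, i.e. {20 * math.log10(lo):.2f} dB over full scale")
```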
When it comes to the LFE channel it appears to be a different story, in that it's not included in the official down-mixing equation at all when down-mixing to normal stereo. Thinking about it..... the LFE channel isn't like the centre channel, it contains no additional information which isn't already in the front (or rear) stereo channels. It's there simply as a "boost" for the low frequencies. So officially, it's not included when down-mixing.
When down-mixing to Dolby Pro Logic (ie surround sound from stereo audio) the LFE channel can be included. If it is, information should be written to the AC3 stream simply to tell the decoder whether it's there or not, but as far as I can see no level is specified.
So when down-mixing to normal stereo (not Prologic), my theory would be that whether or not you include the LFE channel for a bass boost is a personal choice. Thinking about the various encoders and encoder GUIs I've used over the years, I suspect it's generally not included by default and you'd have to tell the decoder specifically to include it in the down-mix if you want it, but I'm going more on a hunch there. So if you do include it, a gain reduction of 3dB makes perfect sense, given the speaker doubling I mentioned earlier. There's probably no reason you couldn't use a little more gain reduction if you wanted to, so as not to boost the bass so much, which is no doubt why the BeSweet GUI lets you choose whether to include the LFE channel, and why it defaults to -3dB but gives you the option of changing it.
Logically, when down-mixing to stereo it'd make sense for the encoder to automatically apply the further gain reduction required so the user doesn't need to think about it, but I guess as "normalising" to 0dB has traditionally been part of the encoding process, and it's pointless doing both, chances are no encoder automatically applies the further gain reduction. That'd be my theory anyway....
So after going through all that, the above is how I'm fairly sure it all works. What I don't know for certain is whether foobar2000's down-mix DSP includes the LFE channel. A year or two ago I asked about the matrix it uses for down-mixing in the foobar2000 forum, but didn't get a definitive answer. The consensus seemed to be that it applies -3dB to the centre channel, -6dB to the rear channels, and includes the LFE channel at -3dB, but I don't know if that's exactly correct. I just use it.
When I get a chance later today or tomorrow, I'll try some down-mixing to stereo using MeGUI and AVISynth and look at the matrix MeGUI adds to the AVISynth script to see if I can work out how it does it. It'll down-mix to stereo or Prologic, so I'm kind of interested to see if it does it the same way each time.
The original foobar2000 down-mix DSP only worked with 5.1ch audio, but apparently the current version also works with other types of multi-channel audio, although I've not yet updated foobar2000 to the latest version myself. Once I do, I'll probably ask the question again in the forum to see if I can get a definitive answer this time.
As far as BeSweet goes, I haven't looked at the manual too closely, but I had a quick play with the GUI, and it appears that when down-mixing to stereo it'll automatically apply the appropriate down-mix matrix. The GUI adds "-L -3db" to the command line by default, but lets you disable the LFE channel or change the amount of gain reduction, which leads me to guess you'd have to add the same to the command line when using the CLI version if you want the LFE channel to be included. It also appears there are two LFE command line options. "-L -3db" seems to be the one to use for normal stereo down-mixing, while "-l 0db" is the one to use when down-mixing to Prologic (if you want the LFE channel to be included). It also appears you can specify a gain.
The way I read the GUI's tooltips, you can manually adjust the mix level for the centre and rear channels, but if you do, it overrides the levels which would normally be used (ie "-C 0db" and "-S 0db").
As I said, I'll edit my previous post to correct what I wrote regarding the down-mix levels. If you're interested in looking at the spec for AC3, there's a link to the pdf on this page: http://www.digitalpreservation.gov/f...09.shtml#specs
Or this is the direct link: http://www.atsc.org/cms/standards/a_52-2010.pdf
To save you the time I spent looking through it trying to find the relevant bits, the down-mixing formula is section 7.8.2 on page 101. The tables used for the "clev" and "slev" values are numbers 5.9 and 5.10 on page 40. Oh..... I almost forgot......
According to the AC3 standard, I was pretty close when I said the further gain reduction required when down-mixing was 7.5dB. It's actually a little different depending on whether you down-mix to stereo or to Prologic, as each uses a slightly different down-mix matrix. As best I can tell, -3dB is always used for the rear, centre and LFE channels when down-mixing to Prologic, and I'd assume the AC3 stream contains information telling the decoder whether it needs to adjust their volumes further on playback, but to be honest I didn't delve into the Prologic side of things as I'd never use it. The Prologic down-mix formula is this:
Lt = 1.0 * L + 0.707 * C - 0.707 * Ls - 0.707 * Rs
Rt = 1.0 * R + 0.707 * C + 0.707 * Ls + 0.707 * Rs
Where "0.707" would be 70.7% or -3dB.
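As a quick check of the Lt/Rt formula, this sketch applies it to single samples. It follows the formula exactly as quoted, with no phase shifting. A centre-only signal comes out identical in both channels, while a surround-only signal comes out at equal size but opposite polarity.

```python
import math

K = 0.707  # the coefficient in the formula above

def downmix_ltrt(L, R, C, Ls, Rs, k=K):
    """Lt/Rt (Pro Logic compatible) down-mix of one sample, per the formula above."""
    lt = 1.0 * L + k * C - k * Ls - k * Rs
    rt = 1.0 * R + k * C + k * Ls + k * Rs
    return lt, rt

print(f"0.707 = {20 * math.log10(K):.2f} dB")
print(downmix_ltrt(0, 0, 1.0, 0, 0))    # centre only: identical in Lt and Rt
print(downmix_ltrt(0, 0, 0, 0.5, 0.5))  # surround only: equal size, opposite polarity
```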
This is what the spec says regarding the further gain reduction required.
"The actual coefficients used must be scaled downwards so that arithmetic overflow does not occur if all channels contributing to a downmix signal happen to be at full scale. For each audio coding mode, a different number of channels contribute to the downmix, and a different scaling could be used to prevent overflow. For simplicity, the scaling for the worst case may be used in all cases. This minimizes the number of coefficients required. The worst case scaling occurs when clev and slev are both 0.707. In the case of the LoRo downmix, the sum of the unscaled coefficients is 1 + 0.707 + 0.707 = 2.414, so all coefficients must be multiplied by 1/2.414 = 0.4143 (downwards scaling by 7.65 dB). In the case of the LtRt downmix, the sum of the unscaled coefficients is 1 + 0.707 + 0.707 + 0.707 = 3.121, so all coefficients must be multiplied by 1/3.121, or 0.3204 (downwards scaling by 9.89 dB)."
My translation of the above is, if all channels are already maximised to 0dB (which they probably almost never are) then when down-mixing to stereo, after applying the appropriate down-mix matrix, a further 7.65dB gain reduction is required to prevent clipping, and when down-mixing to Prologic it's 9.89dB. In practice though (I've checked lots of files) when converting untouched AC3 directly from the disc, -6dB seems to be plenty for normal stereo down-mixing.
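The spec's worst-case arithmetic is easy to reproduce: add up the unscaled coefficients that can land in one output channel, and the required reduction is 20 * log10(1 / sum). This sketch simply recomputes the numbers quoted above.

```python
import math

def worst_case_scaling(coeffs: list[float]) -> tuple[float, float, float]:
    """Return (sum of coefficients, linear scale factor, scale in dB) for the worst case,
    where every contributing channel peaks at full scale in the direction that adds up."""
    total = sum(coeffs)
    scale = 1.0 / total
    return total, scale, 20 * math.log10(scale)

for name, coeffs in (("LoRo", [1.0, 0.707, 0.707]), ("LtRt", [1.0, 0.707, 0.707, 0.707])):
    total, scale, db = worst_case_scaling(coeffs)
    print(f"{name}: sum {total:.3f}, scale {scale:.4f} ({db:.2f} dB)")
# LoRo: sum 2.414, scale 0.4143 (-7.65 dB)
# LtRt: sum 3.121, scale 0.3204 (-9.89 dB)
```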
Something else to keep in mind:
Lossy codecs can store values above 0dB, so when converting directly to a lossy format, values above 0dB aren't "clipped" as such; they're stored correctly. They may of course be clipped on playback, but in practice most equipment would probably have at least a few dB of headroom, and even if not, if there's only the odd peak above 0dB and it's only 1 or 2 dB over, it's not something you'd be likely to hear anyway. Wave files (at least the usual integer PCM kind), on the other hand, can't store values above 0dB, so if you convert to a wave file, or if the converter uses an intermediate wave file when converting (which I think BeSweet might), then values above 0dB will be "clipped".
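To show what "clipped" means when an overshooting down-mix is written to an integer wave file, this sketch converts floating-point samples (1.0 = 0dB full scale) to 16-bit values. Anything beyond full scale is pinned at the limit, so the shape of those peaks is lost. It's a generic illustration; whether BeSweet actually uses an intermediate wave file is only a guess in the post above.

```python
def float_to_pcm16(samples: list[float]) -> list[int]:
    """Convert float samples (1.0 = full scale) to 16-bit integers, clipping out-of-range values."""
    out = []
    for s in samples:
        v = round(s * 32767)
        out.append(max(-32768, min(32767, v)))  # 16-bit PCM can't represent anything beyond this range
    return out

# 1.12 is roughly +1 dB over full scale; -1.3 is roughly +2.3 dB over (negative side).
print(float_to_pcm16([0.5, 1.0, 1.12, -1.3]))
```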
Well you've refreshed my memory on much of this and I think I've learned a little along the way, and I will play with MeGUI later to see what it does. Probably BeSweet too. If I discover anything interesting I'll report back.
Last edited by hello_hello; 10th Mar 2013 at 20:20.
BeSweet is down-mixing to Dolby Pro Logic 1 (2 channel, stereo) by default.
When you say that "-l 0db" is the one to use when down-mixing to Pro Logic, you probably don't mean plain 2-channel stereo, because in stereo the LFE channel is not included. Or do you mean "up-mixing" from stereo to surround? (Or do you mean that the LFE channel is preserved in the front left and right channels, and can be turned back into the LFE channel when up-mixing again?)
As far as I understand, there are 3 choices for down-mixing to stereo here:
- Stereo (ordinary):
Center * c-lev and the rear channels * s-lev are added to the front channels: rear left to front left and rear right to front right, with no mixing of the right and left channels (c-lev=0.707, s-lev=0.707). 0.707 * LFE is also distributed to the front channels.
- Dolby Pro Logic 1:
Center * c-lev is added to the front channels, (Rear-left + Rear-right) * s-lev is subtracted from Front left, and (Rear-left + Rear-right) * s-lev is added to Front right. Here the right and left channels are mixed, and there is some phase shifting (c-lev=0.707, s-lev=0.707). 0.707 * LFE is also distributed to the front channels.
- Dolby Pro Logic 2:
Center * c-lev is added to the front channels, (0.87 * Rear left) and (0.49 * Rear right) are subtracted from Front left, and (0.49 * Rear left) and (0.87 * Rear right) are added to Front right. Here too the right and left channels are mixed, with some phase shifting (c-lev=0.707, s-lev as listed). 0.707 * LFE is also distributed to the front channels.
0.707 ≈ -3dB, 0.49 ≈ -6.2dB, 0.87 ≈ -1.2dB
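The coefficient-to-dB conversions above can be checked with 20 * log10(coefficient). Rounded to two decimals, 0.49 works out nearer -6.2dB than -6dB.

```python
import math

def coeff_to_db(coeff: float) -> float:
    """Linear mix coefficient -> gain in dB."""
    return 20 * math.log10(coeff)

for coeff in (0.707, 0.49, 0.87):
    print(f"{coeff} -> {coeff_to_db(coeff):.2f} dB")
# 0.707 -> -3.01 dB
# 0.49 -> -6.20 dB
# 0.87 -> -1.21 dB
```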
Could it be that with DPL1 and DPL2 the stereo can be decoded back to surround again (up-mixing), while ordinary stereo can never be turned back into surround?
If so, it must be wiser to use DPL1 or DPL2, because then I can convert the audio back to surround when I invest in surround gear.
Then the question is: which is wiser to use, DPL1 or DPL2?
DPL2 is an improved version of DPL1, isn't it?
MeGUI's audio encoder does (at -3dB), and if memory serves me correctly the BeSweet GUI does the same thing, but I'd guess the command line version doesn't as the GUI adds "-L -3db" to the command line in order to include it (it might default to "-L -0db", I can't remember).
When downmixing to Prologic, "-l -0db" is used. I'm working from my unreliable memory again, but I think "-l -0db" tells BeSweet to include whatever info is necessary to let the decoder know there's "separate" LFE channel info in the DPL audio which it needs to decode. Normally you'd just include it without changing its volume, if I remember the AC3 spec correctly, but I guess BeSweet also gives you the option to adjust its volume if you like.
I'd go out on a limb and guess that if you down-mix to Prologic you could use both "-L -0db" and "-l -0db", which would include it in the left and right channels for normal stereo playback but also include it as DPL info which could be decoded separately again and sent to the LFE speaker by itself. That's how the surround channels would work, wouldn't it? When the audio is played as plain stereo, the surround audio is heard in the front left and right speakers, but with DPL decoding it's sent to the rear speakers instead??
I've never been interested in DPL myself so I can't say I've ever played around with it to confirm how it all works.
I couldn't even tell you whether DPL is used much these days, given discrete 5.1ch audio seems to be more the norm, but I'd guess you'd need a surround system capable of decoding it.
As far as I know, DPL works, although not as well as discrete channels; however, I'd be wondering what happens after you downmix to DPL and then re-encode it. It's not something I've ever tested myself (I used AutoGK for years, which downmixes to DPL, but I've never decoded it using DPL), but when you encode using a lossy encoder, I'd wonder how much of the surround sound information gets thrown away, given one of the things lossy encoders do is remove the stuff you can't hear to make what remains easier to compress. If you want to keep as much of it as possible, you'd probably want to use a fairly high bitrate when encoding.
I kind of remember a thread a long time ago where someone said they tested it while encoding to MP3 and DPL is a waste of time for bitrates below 190kbps, as lossy encoders tend to mess with the phase cancelling/shifting stuff DPL does to work its magic, but that's kind of a vague memory. Maybe you could encode it while downmixing to DPL, then decode it to a multi-channel wave file and compare it to the original to see how similar it is when encoded at different bitrates? I'd assume there's a way to do it.... I've just never thought about it before.
Personally, if you're looking to "future proof" your encodes a little, I think it'd be better to forget about re-encoding the audio and just keep the original AC3 (or DTS) multi-channel stream.
Myself.... if I'm converting DTS I convert it to multi-channel AAC as that can reduce the audio stream size by around 1GB for a movie (I use the default NeroAAC quality setting of q.50) but I never bother re-encoding AC3 as unless you do down-mix it to stereo, it doesn't reduce the size enough to warrant it. For extra future-proofness you'd probably want to convert DTS to multi-channel AC3 anyway, as any surround sound system should decode it, but multi-channel AAC.... probably not so much.
I can't say I worry about it though as I'm not sure I'll ever own a surround sound system as personally I much prefer stereo (even though I keep the original multi-channel audio or convert it to multi-channel). All surround sound audio does for me is offer a constant reminder the audio surrounds me while the picture doesn't, and I find that more of a distraction than anything else. Maybe for a 3D movie it'd be less distracting, but even then, I doubt it. At least for me.
I thought maybe a DPL expert might have popped into the thread by now, which would be nice, because I'm working mainly on theory where DPL is concerned. My theory, that is.
Last edited by hello_hello; 17th Mar 2013 at 22:10.
If making DPL stereo means you can play stereo on stereo speakers and surround on surround speakers, that is just marvellous, and I think it's a good choice to downmix to that kind of stereo. I chose AAC because my iPad won't accept AC3 in MP4 files.
There is one detail about DPL which seems very strange: rear left and right are both subtracted from front left and added to front right. It seems like the rear channels are mixed together and put into the front channels, but phase shifted 90 degrees (along the y-axis, as imaginary numbers), so they won't interfere with the sound on the x-axis, which is the sound from the front left and right, center and LFE channels. I think the sound on the y-axis is for storage only, while the sound on the x-axis is played in stereo. And when decoding to surround, the sound on the y-axis is played in the rear speakers.
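One way to test the idea that the surround information can be pulled back out of the two channels is a passive sum/difference decode: Lt + Rt brings back the centre, and Rt - Lt brings back the surrounds (mixed with any left/right difference from the front channels). This sketch uses the Lt/Rt formula quoted earlier in the thread, ignores any phase shifting, and is far simpler than a real Pro Logic decoder, which adds steering logic. Note that both rear channels come back as a single difference signal, so left rear and right rear can't be told apart.

```python
K = 0.707

def downmix_ltrt(L, R, C, Ls, Rs, k=K):
    """Lt/Rt down-mix as quoted in the thread (no phase shift applied)."""
    lt = 1.0 * L + k * C - k * Ls - k * Rs
    rt = 1.0 * R + k * C + k * Ls + k * Rs
    return lt, rt

def passive_decode(lt, rt):
    """Crude passive matrix decode: the sum carries the centre, the difference carries
    a single (mono) surround signal plus any left/right difference from the fronts."""
    centre = (lt + rt) / 2
    surround = (rt - lt) / 2
    return centre, surround

print(passive_decode(*downmix_ltrt(0, 0, 1.0, 0, 0)))    # centre only: surround output is 0
print(passive_decode(*downmix_ltrt(0, 0, 0, 0.5, 0.5)))  # surround only: centre output is 0
print(passive_decode(*downmix_ltrt(0, 0, 0, 0.5, 0.0)))  # left rear only: same mono surround signal
```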
I don't know really, but it's my theory about it.
I'd be tempted to convert the AC3 to AAC while downmixing to stereo and keep the original AC3 as well. That way you should be able to play the encodes on the iPad while still having the original multichannel audio to use in the future.