I have several videos in which the movie music is loud but the dialogue is too quiet. I normally try normalizing the audio to help with this, but I found out that there is a better way: using an audio compressor to boost only the soft dialogue parts.
What I would like to know is what the best settings, or a good starting point, are for doing this in Audacity. Also, is there a better plugin for Audacity for doing this?
I'm open to suggestions as well as trying other audio editors if this will do what is needed.
I know this is fairly easy to accomplish with software media players, but I'm using my television's media player, which doesn't have any setting for this.
The best way is to break down the 5.1 audio into 6 waves and create your own 2.0 audio (I hate 5.1 audio for just this reason)... but that's gonna require several steps, obviously.
You can give the dynaudnorm filter a chance. I would use ffmpeg and apply this filter (https://ffmpeg.org/ffmpeg-filters.html#dynaudnorm) only to the Center channel (or create a 2-channel downmixed version as suggested by hech54). You can keep the original track and add the new track to the file (if the file size allows, of course).
I have downmixed the audio to 2-channel stereo, but the dialogue is really quiet while the music and explosions are really loud. I want to increase the gain of the soft passages while keeping the loud ones the same.
If you could feed the audio into Multitrack view (Adobe Audition) you could process just the centre channel that contains the dialog. Compressing the dialog only would leave the music and effects at a more comfortable level. With a stereo downmix the dialog is spread across both the left and right channels, so any compression is going to affect the music and effects equally. The result is that the overall level of the sound might be louder, but the ratio of dialog to M&E (music and effects) will remain much the same.
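The point about the ratio can be checked with a little dB arithmetic. The sketch below is only an illustration: the levels (-30 dB dialog, -10 dB music and effects) are made-up numbers, the function name is hypothetical, and a compressor's gain at one instant is treated as a single gain applied to whatever passes through it.

```python
import math


def dialog_to_me_ratio_db(dialog_db, me_db, centre_gain_db=0.0, mix_gain_db=0.0):
    """Return the dialog-to-M&E ratio in dB after applying gains.

    centre_gain_db is applied to the dialog only (processing the centre
    channel before downmixing); mix_gain_db is applied to both signals
    (processing the finished stereo downmix).
    """
    dialog = 10 ** ((dialog_db + centre_gain_db + mix_gain_db) / 20)
    me = 10 ** ((me_db + mix_gain_db) / 20)
    return 20 * math.log10(dialog / me)


# Hypothetical levels: dialog at -30 dB, music and effects at -10 dB.
untouched = dialog_to_me_ratio_db(-30, -10)
gain_on_mix = dialog_to_me_ratio_db(-30, -10, mix_gain_db=6)
gain_on_centre = dialog_to_me_ratio_db(-30, -10, centre_gain_db=6)

print(round(untouched, 2), round(gain_on_mix, 2), round(gain_on_centre, 2))
```

In this model, gain applied to the whole mix leaves the dialog 20 dB below the music and effects, while the same gain applied to the centre channel alone narrows the gap to 14 dB.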
Some upward expansion on the dialog would lift the low-level dialog but leave the normal and loud parts the same. I don't know if Audacity has this ability. You might have to produce 6 wave files from the AC3?
BeyonWiz T3 PVR ~ Popcorn A-500 ~ Samsung ES8000 65" LED TV ~ Windows 7 64bit ~ Yamaha RX-A1070 ~ QnapTS851-4G
Try the example below; it works for me:
ffmpeg -y -i "%1" -vn -c:a ac3 -b:a 192k -af "pan=stereo|FL < FL+1.414FC+0.5BL+0.5SL+0.25LFE+0.125BR|FR < FR+1.414FC+0.5BR+0.5SR+0.25LFE+0.125BL,firequalizer=gain='if(gte(f,16),0,-INF)+if(lte(f,16000),0,-INF)',dynaudnorm=p=1/sqrt(2):m=100:s=12:g=15,firequalizer=gain='if(gte(f,16),0,-INF)+if(lte(f,16000),0,-INF)',aresample=resampler=soxr:osr=48000:cutoff=0.990:dither_method=none" -f matroska "%~n1_dn.mkv"
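A note on the `<` in the pan expressions above: as I understand the ffmpeg pan filter documentation, replacing `=` with `<` asks ffmpeg to renormalise the gains of that output channel so they total 1, which avoids clipping. The sketch below just reproduces that arithmetic for the left-channel expression. The function name is hypothetical, and it assumes for illustration that every channel named in the expression is present in the input, which will not be true for every 5.1 layout.

```python
def normalise_gains(gains):
    """Scale a dict of channel gains so their absolute values sum to 1."""
    total = sum(abs(g) for g in gains.values())
    return {channel: g / total for channel, g in gains.items()}


# Gains taken from the left-channel part of the pan expression above.
left = {"FL": 1.0, "FC": 1.414, "BL": 0.5, "SL": 0.5, "LFE": 0.25, "BR": 0.125}

for channel, gain in normalise_gains(left).items():
    print(f"{channel}: {gain:.3f}")
```

Under that assumption the centre channel ends up at roughly 0.37 and the front left at roughly 0.26, so the centre is still weighted about 3 dB above the front channels, just at a lower overall level.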
Thank you all for responding.
I will give the above suggestions a try. I especially like the idea of feeding the audio into Multitrack view, as netmask56 has mentioned above. I think this will give me a good idea of what the audio looks like, and I will be able to adjust the audio better that way. I like the idea of boosting the center channel, and perhaps the left and right channels, while leaving the other channels alone, before downmixing to stereo.
Put the ffmpeg executable in the same place where your script is located (to avoid hassle with folders).
And yes, ffmpeg is quite powerful, so I always highly recommend spending some time learning how to use it.
Thank you for that suggestion. I didn't know how to work with ffmpeg at the command-line level, but your suggestion will make it much easier.
Just to be certain that I got the procedure correct: the .bat file and the video file should be dropped into the same folder where ffmpeg is installed. Is this correct?
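For anyone who would rather not use a batch file, here is a minimal Python sketch of the same idea. The filter chain is copied from the command line posted earlier. The function names `build_command` and `run` are hypothetical, and the sketch assumes an executable called `ffmpeg` can be found on the PATH or in the working directory.

```python
from pathlib import Path
import subprocess

# Filter chain copied from the ffmpeg command posted earlier in the thread.
FILTERS = [
    "pan=stereo|FL < FL+1.414FC+0.5BL+0.5SL+0.25LFE+0.125BR"
    "|FR < FR+1.414FC+0.5BR+0.5SR+0.25LFE+0.125BL",
    "firequalizer=gain='if(gte(f,16),0,-INF)+if(lte(f,16000),0,-INF)'",
    "dynaudnorm=p=1/sqrt(2):m=100:s=12:g=15",
    "firequalizer=gain='if(gte(f,16),0,-INF)+if(lte(f,16000),0,-INF)'",
    "aresample=resampler=soxr:osr=48000:cutoff=0.990:dither_method=none",
]


def build_command(input_path, ffmpeg="ffmpeg"):
    """Build the argument list; output is '<input name>_dn.mkv' in the current folder."""
    output_name = Path(input_path).stem + "_dn.mkv"
    return [
        ffmpeg, "-y", "-i", str(input_path),
        "-vn", "-c:a", "ac3", "-b:a", "192k",
        "-af", ",".join(FILTERS),
        "-f", "matroska", output_name,
    ]


def run(input_path, ffmpeg="ffmpeg"):
    """Run ffmpeg on one file and raise an error if it fails."""
    subprocess.run(build_command(input_path, ffmpeg), check=True)
```

Calling `run("movie.mkv")` would write `movie_dn.mkv` to the current folder, the same naming the batch version uses.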
Do or do not. There is no "try." - Yoda
If you want to try it out first, here's the method I use for applying compression on playback. You can load the same WinAMP DSPs directly with PotPlayer.
If I was going to compress when downmixing, which I do now and then when converting audio specifically for a video that'll be watched once using the TV's media player and then deleted (I only keep the uncompressed original), I'd do it the same way I do it on playback on the PC. I've created a foobar2000 conversion preset to downmix to stereo, compress and then encode with QAAC, because QAAC has an option to normalise the overall volume so the peaks are at maximum.
It'll take a bit of setting up initially (although I can upload my foobar2000 configuration files if it helps), but when it's done, you can simply load a multichannel file into a playlist, right click and select the conversion preset, and out comes a downmixed, compressed stereo file a few minutes later. Easy... once it's set up.
Here are some old sample files I uploaded previously: a stereo downmixed version and a few compressed versions, all normalised to the same volume. The idea is to listen to the difference in volume between the speech and the action (gunshots) that follows, or ideally, the lack of difference in volume.
In the zip file:
1. Downmixed to stereo, no compression.
2. Compressed with RockSteady (WinAMP plugin loaded into foobar2000).
3. Compressed with LoudMax (WinAMP plugin loaded into foobar2000). It's not as good as I expected, but I just threw that one in. It's probably too compressed. I haven't played with LoudMax much and I'm sure it'll do better.
4. Compressed with foobar2000's EBU R128 Normalizer DSP.
I'll try to add another sample using the Dynamic Audio Normaliser pandy mentioned later, once I get it working properly in foobar2000, and maybe a better LoudMax example.
I guess the advantage of pandy's method is you can do everything in one go. For my foobar2000 method you need to extract the audio yourself, convert it and remux. The latter doesn't bother me though as much of the time I'm re-encoding the video via Avisynth and/or remuxing anyway. AnotherGUI is another GUI I'd recommend trying with ffmpeg too.
Last edited by hello_hello; 3rd Feb 2017 at 21:48.
If you compress the dynamic range of the total sound track, i.e. Lt+Ct+Rt+LRt+RRt, then you just make matters worse. You end up with a louder sound track, but the ratio of desired sound to undesired sound is much closer. You really need to compress only the dialog and then lift it a tad.
It's such a pity there isn't a control on the average A/V amp that gives the user some control over the balance between the centre channel and the rest. My Yamaha has a dialog lift and height control which, though subtle, does help.
But the ability to control the speaker levels on an amp is entirely irrelevant to the OP's problems with mixing 5.1 channels down to stereo using computer software.
What you are describing is the global setting for the balance between speakers in a typical setup, which, once set up, should ideally be left alone. That is not the way to control the listening level of an individual Blu-ray or DVD title. To mix down to stereo, you really need access to a multi-track mixer, either software-emulated or hardware, to be able to accurately control the mixdown.
In layperson's terms, even on a stereo amp, having a centre channel "volume knob" would be a boon for rebalancing 5- or 7-channel material. Not all that hard to implement at the manufacturing stage, but unlikely to happen.
I have been tweaking the audio with various plugins, and the result is actually pretty good at emphasizing speech while keeping the music and explosions as close to the original as possible.
It's not even a case of undesired vs desired sound, it's a case of undesired dynamic range vs desired dynamic range.
I do Lt+Ct+Rt+LRt+RRt then compress all the time and it definitely doesn't make things worse. Try the samples in the zip file I posted. It's less than 10MB. In the uncompressed (1st) sample there's normal volume speech followed by loud gunshots and sirens etc. In the RockSteady (2nd) sample, which is the compression I use on playback, the gunshots and sirens aren't any louder but the dialogue at the beginning is. That's the object of the exercise. You still want to hear everything. You just don't want to be straining to hear something one minute and have your ears bleed the next.
A related question is: Why do the video creators do this in the first place?
It's called "art." For some reason, they think you want to pay complete attention to their movie and feel like you are there, in the scene, with the helicopters and the tanks and the gunshots and the John Williams orchestra blaring its little heart out.
A great deal of the problem is due to the different environment in which the final mix is done. Movies are mixed in purpose-designed studios with acoustics that hopefully come close to the cinema experience. This is very different from the home environment, where the listener/viewer is contending with many extra external noises, like passing traffic, air traffic, kids screaming, meals being prepared etc.
Ideally a different mix down for domestic conditions ought to be done but that all adds to production cost. When I worked as a TV sound mixer, time of broadcast was a factor as to how you mixed. If it was scheduled for play between 1700 and 2000 dialog was king over everything else. Doing film mixes was a totally different ball game - monitoring at much higher levels, artistic considerations and realism took over. Really A/V manufacturers should cater for this by allowing the user easy control over the balance of the audio over and above the normal global settings for speaker balance for the system and different speaker types.
But unfortunately, the reality of it is much different. Dialogue ends up in all 5 channels (maybe even the LFE channel, too, if you're watching a movie starring Michael Clarke Duncan). The end user doesn't get a separate 5.1 soundtrack with just the dialogue that they can make louder or softer to suit their taste.
No matter how good the A/V hardware is, for the end user to do it at home means a compromise. Even your Yamaha system, with its "Dialogue Level" adjustment, is based on a typical set of frequencies and characteristics normally found in dialogue. But if the actual dialogue ventures outside the range of what is typical (such as someone with an especially deep or high-pitched voice), then you begin to lose the desired effect, since some of the dialogue will end up not being boosted. And any of the non-dialogue sounds that fall within that range will be boosted, even though you don't want them to be.
But regardless of whether it's done in the studio or on the hardware end, as you said, it all adds to the cost. And adding to the cost is prohibitive to sales, so it's not going to happen.
I was bored so I created some new samples using different compression methods. They all compress reasonably well. First I downmixed to a stereo 32 bit float wave file like this, with Matrix Mixer's "Normalise Matrix" option enabled to also reduce the overall volume enough to prevent clipping when downmixing. I never include the LFE channel when downmixing, but doubly-so when compressing as it can interfere with the compression too much.
From there I used the following steps:
- Scanned the output file and converted it to flac while adjusting the volume to 83dB in ReplayGain speak (EBU R128 Scanning).
- Used the flac file to convert to 32 bit wave files while applying the various compression methods.
- Scanned the wave files and converted to AAC while adjusting the volume to 83dB in ReplayGain speak (EBU R128 Scanning).
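For readers who think in LUFS rather than "ReplayGain speak", the 83dB target used in the steps above can be converted with simple arithmetic. This sketch rests on an assumption that is not stated in the post: that an EBU R128 based scan treats a ReplayGain level of 89dB as equivalent to -18 LUFS, which is the reference I understand ReplayGain 2.0 to use. The function names and the measured loudness in the example are hypothetical.

```python
# Assumed reference point: 89 dB in ReplayGain terms corresponds to -18 LUFS.
REPLAYGAIN_REFERENCE_DB = 89.0
REFERENCE_LUFS = -18.0


def replaygain_db_to_lufs(replaygain_db):
    """Convert a ReplayGain-style level in dB to a LUFS target."""
    return REFERENCE_LUFS - (REPLAYGAIN_REFERENCE_DB - replaygain_db)


def gain_to_reach_target(measured_lufs, target_replaygain_db):
    """Gain in dB needed to move a measured loudness to the target level."""
    return replaygain_db_to_lufs(target_replaygain_db) - measured_lufs


print(replaygain_db_to_lufs(83.0))        # the target used in the steps above
print(gain_to_reach_target(-20.5, 83.0))  # hypothetical file measured at -20.5 LUFS
```

Under that assumption, 83dB corresponds to a target of -24 LUFS, so a file that measures -20.5 LUFS would be turned down by 3.5 dB.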
That all seemed to work fine except for using the LoudMax compressor. It seems to skew the ReplayGain result for some reason. I left it as it is, and the ReplayGain volume is 83dB like all the rest, but it sounds 3 or 4 dB quieter than the others to me. I'll have to investigate that further.
1 - Source FLAC file at 83dB in ReplayGain speak (EBU R128 Scanning).
2 - Compressed with the Dynamic Audio Normalizer with -f 150 in the command line, then adjusted to 83dB (ReplayGain).
3 - Compressed with the foobar2000 EBU R128 Compressor DSP (R128Norm), then adjusted to 83dB (ReplayGain). It has no settings to configure.
4 - Compressed with the VST Version of the LoudMax plugin, threshold set to -18dB, then adjusted to 83dB (ReplayGain).
5 - Compressed with the WinAmp RockSteady plugin, settings in the screenshot here, then adjusted to 83dB (ReplayGain).
Nothing exciting to report in the end, aside from what seems like a LoudMax/ReplayGain/R128 scanning anomaly. They all compress. Which you prefer might be personal preference and the settings used. I still think they improve the dialogue volume and none of them make the problem worse.
Last edited by hello_hello; 7th Feb 2017 at 17:02.
- Threshold: -30.0 dB
- Noise Floor: -50.0 dB
- Ratio: 4:1
- Attack Time: 0.3 sec.
- Release Time: 3.0 sec.
- "Make-up gain for 0dB after compressing": enabled
The result is OK, but the loud parts are still a bit too loud imo.
Any suggestions for improvement?
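One way to reason about the settings listed above is to work out what a textbook compressor would do with them. The sketch below is a simplified static model, not Audacity's actual algorithm: it ignores the attack and release times and the noise floor setting, and it assumes the make-up gain is whatever brings a 0 dB input back to 0 dB. The function names are hypothetical.

```python
def compressed_level_db(in_db, threshold_db=-30.0, ratio=4.0):
    """Static compressor curve: levels above the threshold are reduced by the ratio."""
    if in_db <= threshold_db:
        return in_db
    return threshold_db + (in_db - threshold_db) / ratio


def makeup_gain_db(threshold_db=-30.0, ratio=4.0):
    """Gain that restores a 0 dB input to 0 dB after compression (assumed behaviour)."""
    return -compressed_level_db(0.0, threshold_db, ratio)


def output_level_db(in_db, threshold_db=-30.0, ratio=4.0):
    """Level after compression plus make-up gain."""
    compressed = compressed_level_db(in_db, threshold_db, ratio)
    return compressed + makeup_gain_db(threshold_db, ratio)


for level in (-40.0, -30.0, -20.0, -10.0, 0.0):
    print(level, "->", output_level_db(level))
```

In this model the make-up gain comes out at 22.5 dB, so a passage at -30 dB is lifted to -7.5 dB while a 0 dB peak stays at 0 dB, leaving about 7.5 dB between them instead of 30 dB. Raising the ratio or lowering the threshold narrows that gap further in the model; whether it sounds better is something to judge by ear.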