I have an MP4 with a musical soundtrack and voices muxed together in a single track. Is there any way to identify the frequencies for the voices and filter out everything else? Ideally I would like to play different music in an additional layer.
Again, similar to the other recent thread: it DEPENDS. But don't get your hopes up.
What you need to understand is that the frequencies that comprise voice are the very same frequencies that comprise music (though music extends somewhat beyond the range of the voice). You can brute-force it with EQ to filter out the non-voice frequencies, but that won't remove the bulk of the music, because most of the music sits in the same band as the voice. You can filter by level, dropping out the low-level stuff, but both voice & music vary their dynamics both in microcosm and macrocosm, so you wouldn't get much separation unless the mix balance is already heavily in favor of the voice.
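To make the EQ point concrete, here is a minimal sketch in plain Python with no audio libraries. It assumes a single second-order band-pass (the common "cookbook" biquad form) centred at 1 kHz with Q = 0.5 as a rough stand-in for a "voice band" EQ. Those settings, the test tones, and all the function names are illustrative assumptions of mine, not a recommended chain.

```python
import math

def bandpass_coeffs(f0, q, fs):
    """Normalised coefficients for a 2nd-order band-pass with 0 dB peak gain."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                    # feed-forward terms
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)    # feedback terms
    return b, a

def biquad(samples, b, a):
    """Run the samples through the filter (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def rms(samples):
    return math.sqrt(sum(v * v for v in samples) / len(samples))

def tone(freq, fs, seconds):
    """A unit-amplitude sine wave, standing in for one component of the mix."""
    n = int(fs * seconds)
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]

def passband_gain(freq, f0=1000.0, q=0.5, fs=44100, seconds=0.5):
    """Ratio of output level to input level for a pure tone at `freq`."""
    x = tone(freq, fs, seconds)
    b, a = bandpass_coeffs(f0, q, fs)
    return rms(biquad(x, b, a)) / rms(x)

if __name__ == "__main__":
    # 60 Hz ~ bass, 1000 Hz ~ mid-range (voice OR instrument), 12 kHz ~ cymbal sheen
    for f in (60.0, 1000.0, 12000.0):
        print(f"{f:8.0f} Hz -> gain {passband_gain(f):.3f}")
```

The bass and the very top get knocked down, but the 1 kHz tone comes through essentially untouched, and the filter has no way of knowing whether that tone came from a singer or a guitar. That is the whole problem with the EQ approach.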
You may be able to use neural-network spectral/harmonic filters, since most (?) music has richer spectral content than voice, and that might be just the hook you need as leverage to separate them. But those tools are neither free nor easy to use, and it's still a crapshoot. Again, it works in a few special cases, maybe.
BTW, please note: very rarely is voice & music "muxed" together. That would connote that they are still separate parallel channels that are coexisting in time with each other but encapsulated within the overall container file. If they truly were, it would be EASY to demux them to their elementary streams. No, they are "mixed" together into a single set of channel(s). Burned-in, as it were. Or like a cake recipe: it's no longer flour & sugar & water..., but now is dough. Even if it hasn't been baked yet. Ever try to separate the sugar from dough?
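For contrast, here is roughly what the easy (truly muxed) case would look like: a sketch that only builds the ffprobe/ffmpeg command lines for listing the audio streams and stream-copying one of them out. The helper names and file names are hypothetical, and the `.m4a` output assumes the audio is AAC; none of this helps with a track that is already mixed.

```python
import shlex

def list_audio_streams_cmd(src):
    """ffprobe command that prints index and codec of every audio stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "a",
        "-show_entries", "stream=index,codec_name",
        "-of", "csv=p=0",
        src,
    ]

def demux_audio_cmd(src, audio_index, dst):
    """ffmpeg command that copies one audio stream out without re-encoding.

    audio_index counts audio streams only: 0 is the first, 1 the second, etc.
    """
    return [
        "ffmpeg", "-i", src,
        "-map", f"0:a:{audio_index}",   # pick the Nth audio stream of input 0
        "-c", "copy",                   # stream copy, no quality loss
        dst,
    ]

if __name__ == "__main__":
    # Hypothetical file names; paste the printed commands into a shell to run them.
    print(shlex.join(list_audio_streams_cmd("input.mp4")))
    print(shlex.join(demux_audio_cmd("input.mp4", 0, "dialogue.m4a")))
    print(shlex.join(demux_audio_cmd("input.mp4", 1, "music.m4a")))
```

If ffprobe reports only one audio stream, there is nothing to demux and you are back to the cake-dough situation.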
Chuck Norris can separate sugar from dough with sheer force of will.
Izotope RX7 has 2 modules that could be useful in doing this. The first is "Dialogue Isolate", which is more geared toward extracting dialogue from noisy backgrounds. (Info: https://www.izotope.com/en/products/repair-and-edit/rx/features-and-comparison/dialogue-isolate.html)
If "Dialogue Isolate" doesn't produce great results, I've found that the new "Music Rebalance" module (machine-learning based) can sometimes work for spoken-word material if you use the "Isolate voice" preset. (Info: https://www.izotope.com/en/products/repair-and-edit/rx/features-and-comparison/music-rebalance.html)
I believe Izotope offers a trial for RX, but I'm not sure if the trial is crippled or not.
That's the state-of-the-art way, though.
The videos in this post might be relevant :
Also mentioned here :