I know this isn't really a forum for capturing A/V, but rather for post-processing. However, that's where my problem lies:
Can anyone detail (or point me in the direction of) the attack profile of a standard snare drum?
I'm trying to properly sync video of live music with its original audio, and I am doing so by syncing snare hits. But in almost all the video I've spot checked, snare hits (particularly rim shots) occur in the audio a frame or two after the frame upon which visible motion of the stick has stopped. So I'm guessing one of two things:
- The snare drum is delayed at mix to compensate for the distance of the mics on most of the rest of the drums. However, –66ms seems quite extreme.
- A snare hit/rim shot has an inherent delay between physical strike and audible attack (as the pressure moves towards and eventually excites the snare itself?).
I've searched far and wide, but all searches come back with tips on how/why to delay a snare, nothing about what actually happens upon striking the drumhead.
The only other thing I can come up with is that these videos—while obtained from an official archive—were originally live-streamed, and what I'm seeing might simply be the inherent latency in the original stream (or codec latency) that was preserved upon archival capture.
Of course, there's an amount of time between the hit and the time the sound reaches the microphone. But that's on the order of a foot per millisecond. So for a 60 millisecond delay the mic would have to be about 60 feet away. In a case like that you would advance the audio by 60 ms to make up for that.
Note that many compression codecs add 10 to 20 milliseconds of delay while encoding, so several re-encodings can result in significant delay. Say someone records a concert encoded as mp3 -- that mp3 audio may have a 15 ms delay. They edit the video and re-encode it to upload to YouTube -- that's another 15 ms. Then YouTube re-encodes it again, and now you're up to 45 ms of delay...
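To make the arithmetic above concrete, here is a rough sketch that simply sums a per-stage delay over each encode generation. The 15 ms figure comes from the example above; the function name and the stage labels are made up for illustration, and real encoders differ (many compensate for their own delay).

```python
def total_encode_delay_ms(stage_delays_ms):
    """Sum the uncompensated delay added by each encode generation."""
    return sum(stage_delays_ms)

# Hypothetical chain: original mp3 encode, upload re-encode, site re-encode.
# 15 ms per stage is the assumed figure from the example, not a measurement.
stages = {"original mp3": 15, "edit and upload": 15, "site re-encode": 15}

total_ms = total_encode_delay_ms(stages.values())
print(f"Accumulated delay: {total_ms} ms")
```

If the file only repackages the original elementary streams, the list would have a single entry and the total stays at one generation's worth.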
Last edited by jagabo; 9th Sep 2020 at 21:15.
On a rim shot, the steep rise in the audio waveform should coincide exactly with the stick hitting the rim.
The delay you mention is consistent with the microphone being mounted somewhere other than up on the stage. I ran into this with a fashion show where I couldn't plug into the sound board and had to use ambient audio. My camera was about 3/4 of the way down a basketball court from the stage (it was staged in a high school gym). That short distance was enough to put the audio two frames behind the video. The video back then was 29.97 fps, not 59.94. If I had been filming with a more modern 60p camera, it would have been four frames behind.
A good example of higher quality audio A/D latency would be like the Focusrite Scarlett, which has a round-trip latency of 2.47msec (useful for monitoring passthrough while overdubbing). Record buffer to LPCM might add a few dozen samples on top of that, but 1 sample at 48kHz is 0.02msec, so not much.
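For a sense of scale, a small sketch converting a sample count to milliseconds at 48 kHz. The 36-sample buffer is only a stand-in for "a few dozen samples", not a figure for any particular interface.

```python
def samples_to_ms(n_samples, sample_rate_hz=48_000):
    """Duration of n_samples at the given sample rate, in milliseconds."""
    return n_samples * 1000.0 / sample_rate_hz

one_sample_ms = samples_to_ms(1)    # roughly 0.02 ms
buffer_ms = samples_to_ms(36)       # hypothetical "few dozen samples" buffer

print(f"1 sample:   {one_sample_ms:.4f} ms")
print(f"36 samples: {buffer_ms:.2f} ms")
```

Either way it is well under a millisecond, so it cannot account for a one or two frame offset.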
I would agree with the others, this seems very likely to be some combination of distant miking and/or compression latency. And drumkit hits (snare, hat, etc) are commonly the very items used to gauge sync by, for the very reason of their clear, fast attack, easily recognizable transients.
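Since the fast attack is what makes a snare hit useful for sync, here is a minimal sketch of locating that steep rise in a list of samples with a simple peak-relative threshold. The signal is synthetic (silence followed by a decaying 200 Hz burst placed at 100 ms), and the threshold ratio and function name are assumptions; a real recording would need decoding first and probably a smarter detector.

```python
import math

def first_onset_index(samples, threshold_ratio=0.5):
    """Index of the first sample whose magnitude reaches
    threshold_ratio of the peak magnitude, or None for silence."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return None
    limit = peak * threshold_ratio
    for i, s in enumerate(samples):
        if abs(s) >= limit:
            return i
    return None

SAMPLE_RATE = 48_000
HIT_AT = 4_800  # made-up hit position: 100 ms into the clip

# Synthetic "hit": silence, then a decaying 200 Hz burst.
burst = [
    math.exp(-n / 2000.0) * math.cos(2 * math.pi * 200 * n / SAMPLE_RATE)
    for n in range(9_600)
]
signal = [0.0] * HIT_AT + burst

onset_index = first_onset_index(signal)
onset_ms = onset_index * 1000.0 / SAMPLE_RATE
print(f"Onset at sample {onset_index} ({onset_ms:.1f} ms)")
```

Comparing that onset time against the timestamp of the frame where the stick stops gives the offset in milliseconds rather than in whole frames.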
And remember, as far as distance goes, the speed of light is much faster than the speed of sound, hence the need to utilize close miking techniques in an A + V production.
Last edited by Cornucopia; 9th Sep 2020 at 23:57.
Speed of sound at sea level: 1,125 feet/second.
Distance sound travels in 1/30 second (the duration of one frame of video): 1,125/30 = 37.5 feet.
The width (not the length) of a basketball court is 50 feet, so even at that very modest distance, if your mic is that far away from the drum, the sound will lag by more than a full frame. In my fashion show I was nearly 100 feet away from the speakers, which were mounted behind the fashion runway, so I was close to three frames off. That was more than enough to be noticeable when I zoomed in on the emcee while he was speaking.
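The same arithmetic as a quick sketch, using the 1,125 feet/second figure above and a nominal 30 fps. The function names are made up for illustration, and the speed of sound varies a little with temperature.

```python
SPEED_OF_SOUND_FT_PER_S = 1125.0  # sea-level figure quoted above

def acoustic_delay_frames(distance_ft, fps=30.0):
    """Frames of audio lag caused by a mic distance_ft from the source."""
    delay_s = distance_ft / SPEED_OF_SOUND_FT_PER_S
    return delay_s * fps

def distance_for_delay_ft(delay_ms):
    """Mic distance that would produce the given delay in milliseconds."""
    return delay_ms / 1000.0 * SPEED_OF_SOUND_FT_PER_S

for feet in (37.5, 50, 100):
    print(f"{feet:>5} ft -> {acoustic_delay_frames(feet):.2f} frames")

print(f"66 ms -> {distance_for_delay_ft(66):.1f} ft")
```

Run the other way, a 66 ms lag would need the mic roughly 74 feet from the drum, which is hard to square with a close-miked kit.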
It seems to me you're talking about live human drummers. That's certainly my preference, but their accuracy may not be enough to really go on. The rise time may not matter.
If the drummer's hitting the rim with the stick that should produce a fairly audible "attack", even if the head of the stick hits the skin a little later and it takes a fraction longer to excite the snare strainer.
I had a music video sitting on my hard drive, so I had a look. I encoded it myself and I'm pretty sure I wouldn't have, or at least shouldn't have, introduced any additional (encoding) delay when I re-encoded the Bluray. I'd never have noticed while watching the video, but it does seem the audio is at least one frame behind.
I tried a movie for fun, and the first place I found where I could be fairly sure, the audio was at least one frame ahead.
I'm not sure I'd fuss over a single frame difference. A frame lasts for around 40ms, and often it's hard to tell when the noise was made, looking at individual frames. That is, when the frame is first displayed, when it finishes displaying, or somewhere in between.
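As a rough sketch of how coarse a frame is as a measuring stick, the snippet below converts between frame rate, frame duration, and an audio offset expressed in frames. The frame rates listed are just common examples, and the function names are hypothetical.

```python
def frame_duration_ms(fps):
    """How long a single frame stays on screen, in milliseconds."""
    return 1000.0 / fps

def offset_in_frames(offset_ms, fps):
    """Express an audio offset in frames at the given frame rate."""
    return offset_ms * fps / 1000.0

for fps in (25, 29.97, 59.94):
    print(f"{fps:>6} fps: {frame_duration_ms(fps):.2f} ms per frame, "
          f"66 ms = {offset_in_frames(66, fps):.2f} frames")
```

Because the sound could have been made at any point within a frame's display time, a visual check can only pin the offset down to within about one frame duration.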
Wow, thank you for the extensive responses!! Much appreciated. In trying to parse all this, I'll say the following:
First, just to clue everyone in: these are pro-shot videos of Phish downloaded directly from their official service. Production is top-notch, but as explained, they are initially livestreamed and then packaged for purchase/re-viewing in an mp4 container (AVC+AAC).
@netmask56: Good tip! But though I did find a paper that looks like what I need, I'm not willing to drop $33 for info I'm nearly certain I can find free.
@jagabo: I hadn't considered compression delay, but now seems obvious, esp. for a snare. However, re. codec delay: video is from a band archive. I'm nearly certain the mp4 just packages direct copies of the original elementary streams, so no additional re-encoding is likely to have taken place.
@johnmeyer: Duh, yes. Upon reflection, obviously a rim shot should have immediate attack from wood/metal contact.
@Cornucopia, et al.: These are all soundboard recordings of a pro band. No chance these are audience recordings or any long-distance micing was used. Thinking various latencies are indeed the real culprit.
@Hoser Rob: I'm usually using a simple 2-4 backbeat to sync. And unless I'm misunderstanding your comment: even if a drummer has the worst sense of time in the world, the attack profile of a solid rim shot would still be identical.
@hello_hello: Wow! Thank you for your in-depth analysis. I'm glad someone else was able to confirm the same using another live music performance; I also needed to make sure my computer wasn't the culprit. Your results make me feel better about my observations. I will say though: I know it seems a fuss over very little, but it was enough for me to notice, check, and observe, so… Might I ask what software you used to generate those samples?
I will say, adding a delay to the audio of –33 or –66ms makes the synchrony of everything seem more crisp. I'm just hoping it's not a placebo effect and that I'm not actually damaging a good sync by being "fussy" and nitpicky.
There's a link for my AudioWave function in my signature, and you should find a link for waveform.dll in the zip file if you can't find one elsewhere.
Although for the video I was playing with yesterday, the AudioWave function mostly resulted in an access violation error, so I gave up and used waveform.dll directly. I'll have to check the function in case I stuffed something up and it wasn't just the computer tormenting me, but if you have problems with the function, just load waveform.dll and add it to a script yourself.
If you're not an Avisynth user, SubtitleEdit has the ability to generate waveforms. It generates them when you tell it to, but you might want to make a coffee while it does, and it's set up for subtitles. I didn't spend much time with it, but I couldn't find the "advance one frame" button. I thought VLC might have a useful visualisation but couldn't see one.
Maybe foobar2000 is a possibility, now there's a plugin giving it the ability to play video.
I haven't tried it, as it doesn't play on XP, but assuming it works, fb2k has an oscilloscope visualisation that'd probably do the job.
fb2k is fairly awesome, but it does take some time to set it up and find your way around it. You'd also need to add and configure the ffmpeg decoder component, to open video files.
Maybe someone else will be able to recommend an easy to use video editing GUI, that's free and can also display audio waveforms.
I remember watching a documentary on the human brain a while back. Apparently it takes our brains less time to process vision than it takes to process sound. Or maybe it's the other way around, but one of the things the brain learns to do when we're infants is apply a delay to either the audio or the visuals to sync them, and it's flexible. The documentary showed an experiment where subjects looked at something flashing on a screen with accompanying audio. Then they slowly changed the A/V sync.... and the subjects' brains adjusted their internal delay to keep the sound and flashing synced. Something like that.... I'll have a look tomorrow to see if I kept that documentary, but it surprised me how far the audio could be ahead of the visuals without the subjects noticing a sync problem, as long as their brains had been trained to expect them to be synced initially. If I remember correctly....
But yeah.... have you ever tried to fix audio sync issues by simply watching and listening (no waveform)? Sometimes it's okay, then it's not, then it's worse.... then a day later you smack yourself on the head and start again. Maybe that's because there's no way to disable the brain's auto-sync.
And yeah.... placebo.
I know what you mean though. I used to mix bands before the apocalypse, and I still swear the audio sounds better if the lights look good and the cues are exactly right. Shitty lights, and the audio doesn't sound quite as good. I don't know why, but that's what my brain tells me.