Audio was recorded directly from a digital piano using a Steinberg UR22 audio interface.
It was recorded and saved with Audacity at 96 kHz / 32-bit float, then normalized and exported to 44,100 Hz / 24-bit PCM WAV.
I also made sure 96 kHz was set in the actual Steinberg settings before starting to record.
So according to you, is the "in24" a good or a bad thing? Or totally irrelevant?
Everything looks good. I wouldn't worry about "in24".
For your peace of mind, just do a short test: cut a small section, upload it to YT, download the re-encoded version, and check.
There are ways to change the audio (swap endianness, etc.) in ffmpeg, but you shouldn't need to do that. It should be fine as-is -
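For the round-trip test, a short stream-copied cut is enough. The sketch below only builds and prints the command; the filenames and the 15-second length are placeholders, and with `-c copy` the cut snaps to the nearest keyframes, which doesn't matter for a QC clip:

```shell
# Build the test-clip command; "original.mp4" and 15 s are placeholders.
# -c copy avoids re-encoding, so this is fast and lossless.
cmd='ffmpeg -ss 0 -t 15 -i original.mp4 -c copy yt_test_clip.mp4'
echo "$cmd"
```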
This is interesting: When I extract the WAV from the FFMPEG-created video using VirtualDub, I get a 16-bit audio file.
Perhaps that's what the "Sample format: s32 (-> s16)" in VirtualDub was saying? -
Yes, but that's what you want to check. The idea is if there is a decoding problem, it will manifest in the re-encoded file. If you hear "beeps" or noise that shouldn't be there - you know there is a decoding problem. Then you might have to make some changes to the file you uploaded (I think it's very unlikely). YT never streams the original file, only a re-encoded file, so you're doing a QC check for your audience
If you downloaded the original file, you get the original file - there is nothing to check.... it's the original file
youtube-dl (another command-line app) lists and can download every version available. But some versions take a while to "appear" while YT is processing them -
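A sketch of the youtube-dl usage being described (the video URL is a placeholder, and format IDs vary per video; the commands are printed here rather than run):

```shell
# -F lists every re-encoded version YouTube is serving for a video;
# -f <id> downloads one of them by its format ID.
list='youtube-dl -F https://www.youtube.com/watch?v=VIDEO_ID'
grab='youtube-dl -f 22 https://www.youtube.com/watch?v=VIDEO_ID'  # 22 = one common 720p mp4 ID
echo "$list"
echo "$grab"
```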
One more thing... what would be the best way in FFMPEG to delay the audio track? Let's say I want it to start 1250 ms later or earlier, is this possible to do with FFMPEG?
And is it possible without re-encoding the video?
If not, I'd have to edit the original audio in Audacity and then export again. It's not a big deal, but if there's a way to simply fix it with FFMPEG, that'd be awesome. -
If you have a positive audio delay - my advice is it's better to edit the video stream if you can (ie. cut off that amount from the video). An A/V offset presents many potential problems, especially when that large. Lots of things can go wrong when youtube processes it, or when people watch/listen to it in a browser or device. Not all target devices will handle it properly. A flush A/V start reduces that risk to practically zero.
You can cut without re-encoding video, but the accuracy and where you can cut specifically is dependent on where the keyframe placement was, and the framerate
If not, then add silence or other audio to fill the start of the audio -
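Two hedged ffmpeg sketches for the 1250 ms case discussed above (filenames are placeholders, and the commands are printed rather than run). Per the advice above, padding real silence is the safer route, since container-level offsets aren't honored everywhere:

```shell
# 1) Container-level delay: read the same file twice, offset the copy
#    used for audio by 1.25 s, stream-copy both streams. Some players
#    ignore container offsets - exactly the risk described above.
offset='ffmpeg -i in.mp4 -itsoffset 1.25 -i in.mp4 -map 0:v -map 1:a -c copy out.mp4'
# 2) Pad 1250 ms of real silence at the head of the audio instead.
#    Video is still stream-copied; audio must be re-encoded (24-bit PCM
#    here, hence the .mov container). adelay's all=1 needs a recent ffmpeg.
pad='ffmpeg -i in.mp4 -af adelay=1250:all=1 -c:v copy -c:a pcm_s24le out.mov'
echo "$offset"
echo "$pad"
```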
No, you can only access the re-encoded, re-compressed versions with youtube-dl - and that's what you want to test.
It's useless to test the original file, because you already have it. The reason for the test is to see whether YT can decode and process it correctly - to see if your audience is getting a correct file, not one full of errors. YT decodes your original (actually it's already decoded, since it's uncompressed), then re-encodes it to various different formats. Those various formats are what you get as an audience.
Only the owner can download the original file, and it's difficult to do currently - you have to download it through an archive, and you have to download ALL your content in a category. You can't just download specific files. YT makes it very difficult and very non user friendly to do, probably because they don't want you to use them as "storage" -
With VirtualDub it works flawlessly though. I tested it using the original MP4, and it delays the audio perfectly with no sync issues whatsoever. I even tried different players to test it. No problems.
Now, when you say it's better to edit the video stream, wouldn't editing the actual audio be sufficient? Just cutting off how much is needed, or adding silence, for example. That way, the video file wouldn't need to be touched. What do you think?
I'd also gladly try it in FFMPEG if I knew the command. Maybe there's a chance of it working good like in VirtualDub. -
I know, I know. My bad. This was a misunderstanding on my part. You just wanted me to check the uploaded version.
I got confused (and still am, a little bit) when you said I should download the re-encoded version. Wouldn't listening to audio in the uploaded video be enough? I mean, what advantage would I have in downloading the re-encoded version? The audio would probably not be even close to 24-bit anymore. -
Trust me. Huge mistake. If you are a professional content producer you don't want to take unnecessary risks. This is an easy fix. Do it.
Now, when you say it's better to edit the video stream, wouldn't editing the actual audio be sufficient? Just cutting off how much is needed, or adding silence, for example. That way, the video file wouldn't need to be touched. What do you think? -
Yes, you can play it and listen to it in the browser. The re-encoded version will be 16-bit because that's all YouTube currently supports for its re-encoded versions.
The reason I said download is when you have a local version you can put it through many tests without wasting bandwidth. You can open it up in different browsers, different players for example. Sometimes there are bugs in one browser but not another. You can do tests and manipulations that are not possible when streaming and viewing on youtube
Not all browsers and devices get the same version by default but if there is a decoding error, it will be present in all versions -
Thanks for the advice. Yes, editing the audio in Audacity to fit the video is pretty easy indeed, and seems to be the ideal solution.
And VirtualDub seems to be the perfect method to test exactly how much positive/negative delay is needed, because it does that pretty fast.
Yes, that's what I suggested (maybe I made an edit before your post)
BTW, your help in this thread has been invaluable. -
Audio is better to edit if you can, because of the accuracy: if the fps is, say, 24 fps, you can only get 1/24 s ≈ 41.7 ms accuracy. i.e. you can only get frame accuracy, and that's if you're re-encoding. When using compressed video with typical temporal compression, you can only cut on specific frames when not re-encoding, so the accuracy is much worse.
I'm really REALLY glad now that I didn't upload anything yet, because I learned so much from you.
Say, should I aim at making the audio file exactly the same length as the video file (down to 1 ms)?
I mean, if I were to make some sort of mistake in determining the actual video length, and therefore mistakenly make the audio let's say 5 ms longer (just an example) than the video... how would FFMPEG react when fusing the two together?
And if the audio is longer and I use the "-shortest" option in FFMPEG when fusing the two together, wouldn't that fix things automatically?
I also hope I'm not irritating you with my questions. :/ -
Take a look at this...
One official YouTube page says that 44.1 kHz / 24-bit is the way to go: https://support.google.com/youtube/answer/6039860?hl=en
Another official YouTube page says that either 96 kHz or 48 kHz is the way to go: https://support.google.com/youtube/answer/1722171?hl=en
This seems to be conflicting information, no? And everywhere on Google you can read that YouTube resamples everything down to 44.1 kHz, including in this forum post from 2015: https://productforums.google.com/forum/#!topic/youtube/GXGn6Cac7_c
What are your thoughts on this?
Also, regarding video bitrate, my video is 720p, and there is so much conflicting advice everywhere. For example, one site says that 5,000 kbps (5 Mbps) is a good setting, while another says you should use 30,000 kbps (30 Mbps). The difference seems huge. -
No problems for ffmpeg when audio is longer
But the problems I'm referring to are what happen afterwards, when it gets re-encoded or when other software or devices handle it. Sometimes weird buggy things happen when the duration doesn't quite match
And if the audio is longer and I use the "-shortest" option in FFMPEG when fusing the two together, wouldn't that fix things automatically?
It will cut the longer file. But sometimes there will be a glitch where it cuts (it doesn't always cut audio cleanly), or if it's the video, you might not get the accuracy you want, especially when using -c:v copy (or just -c copy).
It's better to do it manually, properly in an audio or video editor while you are certain you have control -
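A sketch of the mux with -shortest (note the spelling; filenames are placeholders and the command is printed rather than run):

```shell
# Mux the edited WAV with the untouched video stream. -shortest trims
# the output to the shorter input; pcm_s24le keeps the audio 24-bit
# (hence the .mov container - MP4 handles PCM poorly).
cmd='ffmpeg -i video.mp4 -i audio.wav -map 0:v -map 1:a -c:v copy -c:a pcm_s24le -shortest out.mov'
echo "$cmd"
```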
According to VirtualDub, my video's length is 0:01:43.870.
According to Audacity, my audio's length is 0:01:41.490.
I need to add exactly 1600 ms of silence (tested in VirtualDub) at the beginning, which will make the audio 0:01:43.090.
Then I'd need to add the remaining silence at the end to match the video length, and the difference is less than a second. It's 780 ms.
Would simply adding 780 ms of silence at the end of the audio be fully sufficient?
My worry is about the precise length, how precise it needs to be, because there is obviously more after the "870" ms, and that is not visible in VirtualDub. There is also more after the "490" in Audacity. That's where my worry comes in. How close/accurate does it need to be?
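The padding arithmetic above can be checked directly (durations in milliseconds, taken from the figures in this post):

```shell
video_ms=103870   # 0:01:43.870 per VirtualDub
audio_ms=101490   # 0:01:41.490 per Audacity
lead_ms=1600      # silence added at the start
tail_ms=$(( video_ms - audio_ms - lead_ms ))
echo "$tail_ms"   # 780 -> 780 ms of silence needed at the end
```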
Let's say if the audio is a bit longer, I'll exaggerate and say 5 seconds longer, and I don't use the "-shortest" command in FFMPEG. Would the remaining audio continue to play after the video is finished, or would the audio end together with the video?
I just really wanna get this right. -
YouTube currently converts everything down to 16-bit 44.1 kHz for its re-encoded streaming formats. That doesn't mean it won't change in the future, like many of the changes it has made over time. Vimeo used to be 44.1 kHz too, but added 48 kHz a while back.
For video bitrate, it depends on the content. Some types of content require more bitrate to achieve a certain level of quality - for example, lots of motion or noise. But a completely static scene (for example a still photo or title) would require very little to achieve a certain "quality level". As a general rule, you want to upload something in high quality (but don't go overboard - there are diminishing returns). So instead of sticking to a set bitrate, most people will use quality-based encoding. The bitrate (and thus filesize) will fluctuate according to the content, and you will end up with the quality level that you set. Either way, the final results will get severely downgraded by YT - you have very little control over that.
It used to be that the higher the resolution, the higher the audio bitrate allocated too. So if you uploaded 1080p and played back the 1080p version, it would sound better than, say, the 720p or SD versions. I don't know the current status right now.
It doesn't need to be perfect; a few ms off is no problem. "Off" at the end is much better than "off" at the beginning - the latter causes more problems in later scenarios.
But the duration is almost always accurate for uncompressed audio in an audio editor.
The interpreted video duration might change depending on the decoder, splitter, or "interpretation" set in a video editor or something like vdub. For example, "23.976" "film rate" is actually supposed to be 24000/1001, or 23.976024... i.e. not quite 23.9760000. There can also be slight "jitter" in the timebase of various container formats.
Doing this in a dedicated professional type NLE (non linear editor, eg. sony vegas, premiere pro, fcpx, edius, avid mc, etc...) is often the "best" way, because you have access to both audio and video, support for higher bitdepth audio & sampling rates, can make edits and changes right on the fly. I would consider that route if this is something you're going to be doing more often
If audio is longer than video, it depends on the software, player, decoder setup, drivers, etc. There are hundreds of different combos, so it's very difficult to say. Even if you limit it just to browsers, some clients have GPU acceleration enabled, some disabled. Some are running Flash, some HTML5; some people are watching a webm version, others an mp4 version, etc. -
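A sketch of the quality-based encoding mentioned above, using x264's CRF mode (values are illustrative, not a recommendation for any specific clip; the command is printed rather than run):

```shell
# CRF mode: you pick a quality level and the bitrate floats with the
# content. Lower CRF = higher quality / bigger file; ~18 is often
# described as visually high quality for x264.
cmd='ffmpeg -i in.mp4 -c:v libx264 -crf 18 -preset slow -c:a copy out.mp4'
echo "$cmd"
```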
So would uploading 48 kHz / 24-bit be wiser than 44.1 kHz / 24-bit?
Some people on Google speculate that uploading higher than 44.1 kHz could cause problems, but then again, as you saw, one official link even suggests 48 kHz. This is very confusing, to say the least.
BTW, I use "Shotcut", which is a free non-linear editor. I find it great, as it lets me crossfade between clips etc., also do fade-in/out, and it also helped me join certain files together in a perfect way, making sure the later imported audio matches exactly. You can see the audio waves below the video clips as you're editing, which greatly helps in putting things together.
Unfortunately, it doesn't seem to allow import of lossless audio regardless of which video codec or container is used. It always wants to compress it in some way, and I want to upload completely lossless to YouTube.
Side question: Since I do intend to do more of these videos in the future, do you recommend buying the software "Adobe Premiere Elements"? It is not that expensive, and it seems to have great features. -
I've never had problems with 48 kHz - that's what I would use in your shoes. In theory you could have similar problems with the 24-bit => 16-bit depth conversion done by YouTube, right?
It helps to visualize the audio and video as "bars" to see if they match at the end - doesn't Shotcut do that? (Not too familiar with it.) It's hard to say, because I wouldn't entirely trust what vdub is telling you. ffmpeg -i input.ext will also tell you what it "thinks" the A/V durations are; I'd actually trust that more, because that is what YouTube is going to be using.
EDIT: oops didn't see the last post about shotcut. Elements is junk IMO, it's a very downgraded version of Premiere -
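A sketch of the duration check suggested above, using ffprobe (ships with ffmpeg; input.mp4 is a placeholder, and the command is printed rather than run) to show per-stream durations:

```shell
# Prints codec_type and duration for each stream, so you can compare
# the audio and video lengths the ffmpeg tools "think" the file has.
cmd='ffprobe -v error -show_entries stream=codec_type,duration -of default=noprint_wrappers=1 input.mp4'
echo "$cmd"
```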
Good point. I'll go with 48 kHz / 24-bit then.
I just did an experiment. I added the 1600 ms silence in the audio at the beginning, but intentionally left out the 780 ms silence at the end, and then fused them together with FFMPEG.
The result is flawless when played back on VLC Media Player.
Perhaps I could go more accurate and leave only 100 ms at the end instead of 780 ms, but if the current result would remain the same after uploading to YouTube, then that'd be perfect.
I guess I'll have to play around and update the thread with the final results eventually. -
I don't understand - The original calculations in the earlier post #52 make sense to me, that's what I would use. That should match perfectly for length. I wouldn't make the audio shorter, certainly not 100ms shorter. Why are you saying there should be "more" after the end ?
Can Shotcut import generic 16-bit 48 kHz PCM WAV? Or even 44.1 kHz? It's pretty standard and should be supported. You're only doing this to test A/V length. You don't want to use some compressed audio format to test, because there will be some delay padding added to the beginning - all lossy compressed formats do this to an extent -
My reasoning is this: Let's say VirtualDub and FFMPEG are giving me a longer video duration than it actually is (for whatever reason). In that case, by making my audio have the same duration as the supposed video duration, I'd be literally making the audio track longer. And seeing how longer audio than video could lead to problems, I assumed it would be "safer" to leave a little room at the end. If not 100 ms, then maybe 50 ms.
Perhaps my reasoning is faulty.
In VLC it played back perfectly though.
I'll play around with it tomorrow and see what I can do.
If you really think that one should have the audio track exactly match the video duration down to 1 ms, then I trust you, and that's what I'll be aiming for.
Thank you, once again, for literally saving my project with all your explanations and suggestions, starting with FFMPEG and so on.