I'm a bit confused about audio extraction with tools like ffmpeg or mkvextract. I did some research but couldn't find much info about this specific topic.
Basically, what I am trying to do is extract Opus audio from a WebM container (losslessly of course, without re-encoding). I've tried multiple methods using the tools mentioned above and performed spectrum analysis on all files afterwards.
It seems like the output files have some data lost/changed during the process, which I don't think should be the case. I'm not sure if the results are 100% accurate, though multiple programs confirm it.
How can I be sure that the extraction is successful and that the data exactly matches the original?
Here is some info and images for side-by-side comparison:
Source file: audio.webm
Size: 2.94 MB
[Attachment 49874 - Click to enlarge]
[Attachment 49875 - Click to enlarge]
Extracted file using ffmpeg: audio_extracted_ffmpeg.opus
Size: 2.90 MB
[Attachment 49876 - Click to enlarge]
[Attachment 49877 - Click to enlarge]
Extracted file using mkvextract: audio_extracted_mkvextract.opus
Size: 2.91 MB
[Attachment 49879 - Click to enlarge]
[Attachment 49880 - Click to enlarge]
Comparison between the source webm and the ffmpeg opus in Audacity:
[Attachment 49881 - Click to enlarge]
[Attachment 49882 - Click to enlarge]
This is the output of the ffmpeg extraction:
Code:
ffmpeg version 4.2 Copyright (c) 2000-2019 the FFmpeg developers
  built with gcc 9.1.1 (GCC) 20190807
  configuration: --enable-gpl --enable-version3 --enable-sdl2 --enable-fontconfig --enable-gnutls --enable-iconv --enable-libass --enable-libdav1d --enable-libbluray --enable-libfreetype --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libtheora --enable-libtwolame --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libzimg --enable-lzma --enable-zlib --enable-gmp --enable-libvidstab --enable-libvorbis --enable-libvo-amrwbenc --enable-libmysofa --enable-libspeex --enable-libxvid --enable-libaom --enable-libmfx --enable-amf --enable-ffnvcodec --enable-cuvid --enable-d3d11va --enable-nvenc --enable-nvdec --enable-dxva2 --enable-avisynth --enable-libopenmpt
  libavutil 56. 31.100 / 56. 31.100
  libavcodec 58. 54.100 / 58. 54.100
  libavformat 58. 29.100 / 58. 29.100
  libavdevice 58. 8.100 / 58. 8.100
  libavfilter 7. 57.100 / 7. 57.100
  libswscale 5. 5.100 / 5. 5.100
  libswresample 3. 5.100 / 3. 5.100
  libpostproc 55. 5.100 / 55. 5.100
Input #0, matroska,webm, from 'files/audio.webm':
  Metadata:
    encoder : google/video-file
  Duration: 00:03:08.30, start: -0.007000, bitrate: 131 kb/s
    Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
Output #0, opus, to 'files/ffmpeg/audio_extracted_ffmpeg.opus':
  Metadata:
    encoder : Lavf58.29.100
    Stream #0:0(eng): Audio: opus, 48000 Hz, stereo, fltp (default)
    Metadata:
      encoder : Lavf58.29.100
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size= 2975kB time=00:03:08.28 bitrate= 129.5kbits/s speed=6.57e+03x
video:0kB audio:2952kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.803061%
And the mkvextract:
Code:
Extracting track 0 with the CodecID 'A_OPUS' to the file 'files/mkvextract/audio_extracted_mkvextract.opus'. Container format: Ogg (Opus in Ogg)
Progress: 100%
I converted the files into 16-bit WAVs; this is the output from Spek:
[Attachment 49883 - Click to enlarge]
[Attachment 49884 - Click to enlarge]
[Attachment 49885 - Click to enlarge]
There is a smaller difference between the source file and the one extracted via mkvextract, but a very noticeable difference between the ffmpeg output and the rest.
Not sure what is going on.
How were they converted, exactly? Different dithering algorithms can be applied when converting from fltp to 16-bit.
Different containers can have slightly different offsets as well. Compressed audio can have different delays, and there can be differences between, say, mkv (webm) and ogg, or something like mp4.
For example, you have a -0.007 start time in the webm container according to ffmpeg. What does mkvmerge think the start time is? Or mediainfo?
If you extracted it without the offset (zero start time), the audio would be shifted slightly compared to the audio inside the container (the webm converted to PCM WAV directly).
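To put a number on that shift, here is a minimal sketch that converts the start time ffmpeg reports (-0.007 s) into samples at the stream's 48000 Hz rate. The function name `offset_in_samples` is only illustrative, and the reported value may itself be rounded by the container's timestamp resolution, so treat the result as approximate.

```python
def offset_in_samples(start_seconds: float, sample_rate: int = 48000) -> int:
    """Convert a container start offset in seconds to a whole number of samples."""
    return round(start_seconds * sample_rate)


# Start time reported by ffmpeg for audio.webm in this thread.
print(offset_in_samples(-0.007))  # -336
```

A shift of a few hundred samples is far too small to hear, but it is enough to make a sample-by-sample comparison of two decoded files look completely different.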
Well, in that case, I guess it might be due to misalignment? I've never had trouble with ffmpeg concatenation or transcoding. Since mkvextract gives closer output compared to the original, the start offset of ffmpeg would be the main issue. I don't know if that applies to all containers with opus codec, but is there any convenient way to fix the padding/offset issue if transcoding a batch of files becomes necessary?
This is the MediaInfo output for the source webm:
General
CompleteName : C:\Users\Alexander\Desktop\Scripts\files\audio.webm
Format/String : WebM
Format_Version : Version 4
FileSize/String : 2.95 MiB
Duration/String : 3 min 8 s
OverallBitRate/String : 131 kb/s
Encoded_Application/String : google/video-file
Encoded_Library/String : google/video-file

Audio
ID/String : 1
Format/String : Opus
CodecID : A_OPUS
Duration/String : 3 min 8 s
Channel(s)/String : 2 channels
ChannelLayout : L R
SamplingRate/String : 48.0 kHz
BitDepth/String : 16 bits
Compression_Mode/String : Lossy
Language/String : English
Default/String : Yes
Forced/String : No
And those are the parameters used:
ffmpeg.exe -i "source.webm" -vn -acodec copy "output.opus"
mkvextract.exe "source.webm" tracks 0:"output.opus"
If a start offset is partially contributing to the difference, you can zoom way in to the start in Audacity and you should be able to see it when comparing the webm version loaded directly against the extracted versions.
But there seem to be more differences than just a shift.
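One way to check this without zooming in by eye is to search for the shift that best lines up the two decoded signals and then look at what error remains. The sketch below is a brute-force illustration on plain lists of samples; `best_lag` is a hypothetical helper, and in practice the samples would come from the decoded WAV files.

```python
def best_lag(ref, test, max_lag):
    """Return (lag, mean_abs_error) for the shift of `test` that best matches `ref`.

    A positive lag means `test` starts `lag` samples later than `ref`.
    """
    best = None
    for lag in range(-max_lag, max_lag + 1):
        # Pair ref[i] with test[i + lag] wherever both indices are valid.
        start = max(0, -lag)
        stop = min(len(ref), len(test) - lag)
        if stop <= start:
            continue
        total = sum(abs(ref[i] - test[i + lag]) for i in range(start, stop))
        err = total / (stop - start)
        if best is None or err < best[1]:
            best = (lag, err)
    return best


# Synthetic example: the same signal with three samples of silence prepended.
ref = [0, 0, 1, 5, -3, 2, 7, -4, 0, 0]
shifted = [0, 0, 0] + ref
print(best_lag(ref, shifted, max_lag=4))  # (3, 0.0)
```

If the remaining error is zero at the best lag, the files differ only by a shift. A clearly non-zero error after alignment points to something else, such as a different decoder or a different sample format conversion.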
Maybe try the opus decoder directly from libopus or opus-tools. It could be an ffmpeg implementation issue (Audacity is using ffmpeg to decode, isn't it?).
Not sure what to do or how to handle it. Maybe wait until all the tickets sneaker referred to get resolved
Just checked the verbose info of the source file. It is affected by the ffmpeg 1ms frame delay issue. https://pastebin.com/cCh2HmV2
It's a lossy re-encoded version with ffmpeg in the first place, because it's downloaded from YouTube... I'll try libopus and hope that it works well for batch muxing. I'm open to other suggestions as well.
Last edited by Alexander24; 21st Aug 2019 at 20:06.
So, after extensive testing I came up with somewhat more accurate results.
Firstly I converted the opus audio stream from the matroska container to multiple WAVs. Then using ffmpeg's MD5 hashing function I validated the audio streams of each WAV (track #0 is the only one in this case), including the original source file.
Looks like the source, 16-bit signed WAV, 32-bit float WAV and 64-bit float WAV have the same hash (same audio stream data) which is excellent.
[Attachment 49896 - Click to enlarge]
The 24-bit signed WAV and 32-bit signed WAV share the same hash themselves, but don't match the rest.
[Attachment 49897 - Click to enlarge]
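The hash comparison described above can be scripted roughly as follows. This is a sketch that assumes `ffmpeg` is on the PATH; the helper names (`md5_command`, `parse_md5`, `audio_md5`) are made up for illustration. ffmpeg's `md5` muxer hashes decoded audio as signed 16-bit PCM by default, which may be why the 24-bit and 32-bit signed WAVs hash differently from the rest. With `-c copy` it hashes the compressed packets instead, which is the more direct test of whether a remux left the Opus data untouched.

```python
import subprocess


def md5_command(path, copy=False, ffmpeg="ffmpeg"):
    """Build the ffmpeg command line that prints an MD5 of the first audio stream."""
    cmd = [ffmpeg, "-v", "error", "-i", path, "-map", "0:a:0"]
    if copy:
        # Hash the compressed packets as stored, without decoding them.
        cmd += ["-c", "copy"]
    cmd += ["-f", "md5", "-"]
    return cmd


def parse_md5(output):
    """Extract the hex digest from ffmpeg's 'MD5=...' output line."""
    for line in output.splitlines():
        if line.startswith("MD5="):
            return line[4:].strip().lower()
    raise ValueError("no MD5= line in ffmpeg output")


def audio_md5(path, copy=False):
    """Run ffmpeg and return the MD5 of the first audio stream of `path`."""
    result = subprocess.run(
        md5_command(path, copy), capture_output=True, text=True, check=True
    )
    return parse_md5(result.stdout)
```

For example, comparing `audio_md5("audio.webm", copy=True)` with `audio_md5("audio_extracted_ffmpeg.opus", copy=True)` should give equal digests if the packets were copied unchanged, since container timestamps and headers are not part of the hash.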
So the only issue currently present could be the start offset of -0.007 which ffmpeg detects and uses for the remux process.
Is there any way to set this value to a flat 0.000000 on the original (source) file without re-encoding, or to ignore the -0.007 and simply use 0.000000 when remuxing?
[Attachment 49898 - Click to enlarge]
I'm not quite sure how to deal with alignment, seeking, DTS/PTS timestamps and similar advanced stuff if necessary.