Merging video frames back into a video with subtitle, messes up audio

22nd Jun 2023 08:24 #1

I broke an H.265 video down into frames and am trying to merge them back together with the original audio, subtitles and fonts. The resulting video has a problem: when I seek to certain points, the audio stops playing. Even when I don't seek, at some point the video hangs while the audio keeps playing. Seconds later the video resumes, but now video and audio are out of sync. This doesn't happen with the original video.

The reason I'm breaking the video into frames and merging them is because I want to upscale each frame. But I'm going to leave that part out because this issue occurs with the original unscaled frames.

Here are the details of the original video. Notice it has video, audio and subtitle streams plus two font attachments.

Code:

.\ffmpeg.exe -i "input.mkv"

Input #0, matroska,webm, from 'input.mkv':
  Metadata:
    encoder         : libebml v1.3.10 + libmatroska v1.5.2
    creation_time   : 2021-01-07T00:20:19.000000Z
  Duration: 00:23:02.05, start: 0.000000, bitrate: 320 kb/s
    Stream #0:0: Video: hevc (Main), yuv420p(tv), 1280x720, SAR 1:1 DAR 16:9, 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
    Metadata:
      BPS-eng         : 278671
      DURATION-eng    : 00:23:02.006000000
      NUMBER_OF_FRAMES-eng: 33135
      NUMBER_OF_BYTES-eng: 48140731
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:1(jpn): Audio: aac (HE-AAC), 48000 Hz, stereo, fltp
    Metadata:
      BPS-eng         : 36166
      DURATION-eng    : 00:23:02.016000000
      NUMBER_OF_FRAMES-eng: 32391
      NUMBER_OF_BYTES-eng: 6247833
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:2(eng): Subtitle: ass (default)
    Metadata:
      BPS-eng         : 76
      DURATION-eng    : 00:21:20.790000000
      NUMBER_OF_FRAMES-eng: 246
      NUMBER_OF_BYTES-eng: 12264
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
    Stream #0:3: Attachment: ttf
    Metadata:
      filename        : Roboto-Medium.ttf
      mimetype        : application/x-truetype-font
    Stream #0:4: Attachment: ttf
    Metadata:
      filename        : Roboto-MediumItalic.ttf
      mimetype        : application/x-truetype-font

Here's how I break it into frames:

Code:

.\ffmpeg.exe -i "input.mkv" -qscale:v 1 -qmin 1 -qmax 1 -vsync 0 "InputFolder/frame%08d.png"

Here's how I merge the frames back into a video with all the original streams except the video:

Code:

.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1 -map -1:v -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p "output.mkv"

Here are the details of the resulting video:

Code:

.\ffmpeg.exe -i "output.mkv"

Input #0, matroska,webm, from 'output.mkv':
  Metadata:
    ENCODER         : Lavf58.45.100
  Duration: 00:23:02.05, start: 0.000000, bitrate: 245 kb/s
    Stream #0:0: Video: hevc (Main), yuv420p(tv), 1280x720 [SAR 1:1 DAR 16:9], 23.98 fps, 23.98 tbr, 1k tbn, 23.98 tbc (default)
    Metadata:
      ENCODER         : Lavc58.91.100 libx265
      DURATION        : 00:23:01.777000000
    Stream #0:1(jpn): Audio: aac (HE-AAC), 48000 Hz, stereo, fltp (default)
    Metadata:
      BPS-eng         : 36166
      DURATION-eng    : 00:23:02.016000000
      NUMBER_OF_FRAMES-eng: 32391
      NUMBER_OF_BYTES-eng: 6247833
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      DURATION        : 00:23:02.046000000
    Stream #0:2(eng): Subtitle: ass (default)
    Metadata:
      BPS-eng         : 76
      DURATION-eng    : 00:21:20.790000000
      NUMBER_OF_FRAMES-eng: 246
      NUMBER_OF_BYTES-eng: 12264
      _STATISTICS_WRITING_APP-eng: mkvmerge v46.0.0 ('No Deeper Escape') 64-bit
      _STATISTICS_WRITING_DATE_UTC-eng: 2021-01-07 00:20:19
      _STATISTICS_TAGS-eng: BPS DURATION NUMBER_OF_FRAMES NUMBER_OF_BYTES
      ENCODER         : Lavc58.91.100 ssa
      DURATION        : 00:21:21.580000000
    Stream #0:3: Attachment: ttf
    Metadata:
      filename        : Roboto-Medium.ttf
      mimetype        : application/x-truetype-font
    Stream #0:4: Attachment: ttf
    Metadata:
      filename        : Roboto-MediumItalic.ttf
      mimetype        : application/x-truetype-font

One thing to note is that I've done this successfully numerous times with H.264 videos, with no audio issues. Another thing to note, which might be more relevant, is that when I merge the frames with only the original audio stream (as opposed to all original streams except video), the audio issue does not occur. Also, when I merge the frames with the original audio AND subtitle streams, i.e. without the fonts, the issue remains.

Code:

.\ffmpeg.exe -r 23.98 -i "InputFolder\frame%08d.png" -i "input.mkv" -map 0:v:0 -map 1:a:0 -c:a copy -c:v libx265 -r 23.98 -pix_fmt yuv420p "output.mkv"

This produces no audio issues. But this isn't good for me because I want the subtitles and fonts from the original video.

If anyone needs me to upload the original video somewhere so they can reproduce it, let me know.

22nd Jun 2023 11:56 #2
poisondeathray

Does the issue occur using mkvmerge instead?

22nd Jun 2023 12:03 #3
ProWo

Instead of -r 23.98 you should use -r 24000/1001, the exact frame rate; 23.98 is only the rounded value that ffmpeg displays.
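
To put a number on the difference, here is a small sketch of the arithmetic. It assumes the source is constant frame rate and that all 33135 frames (the NUMBER_OF_FRAMES value reported in the first post) were extracted; the helper name is made up for illustration.

```python
from fractions import Fraction

FRAMES = 33135  # NUMBER_OF_FRAMES reported for the original video stream


def duration_seconds(frames, fps):
    """Duration of a constant-frame-rate stream with the given frame count."""
    return Fraction(frames) / Fraction(fps)


exact = duration_seconds(FRAMES, Fraction(24000, 1001))
rounded = duration_seconds(FRAMES, Fraction("23.98"))

print(f"24000/1001 fps: {float(exact):.2f} s")
print(f"23.98 fps:      {float(rounded):.2f} s")
print(f"difference:     {float(exact - rounded):.2f} s")
```

The two durations come out roughly 0.23 s apart, which lines up with the video DURATION of 00:23:02.006 in the original file versus 00:23:01.777 in output.mkv. That is a small drift over the whole file, not something that would explain audio dropping out on seek.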

22nd Jun 2023 13:15 #4
PeteJobi

Originally Posted by poisondeathray

Does the issue occur using mkvmerge instead?

I've installed it. How do I feed it the frames?

22nd Jun 2023 13:26 #5
PeteJobi

Originally Posted by ProWo

Instead of -r 23.98 you should use -r 24000/1001

Tried that just now. Same issue.

22nd Jun 2023 18:02 #6
poisondeathray

Originally Posted by PeteJobi

Originally Posted by poisondeathray

Does the issue occur using mkvmerge instead?

I've installed it. How do I feed it the frames?

It's probably easier to use mkvtoolnix-gui (the GUI) at first. There is an option to show the command line, and you can also look at the documentation and examples included there.

Add the original file, uncheck the original video stream, add the replacement stream, and push "Start multiplexing". The replacement stream can be the same one you made with ffmpeg earlier; you're just using mkvmerge for the multiplexing step. It's generally more reliable than ffmpeg.

For encoding the replacement stream, I would pay more attention to your encoding settings and parameters. You are using default libx265 settings, and the quality might not be ideal. Also, -pix_fmt yuv420p will use swscale and Rec601 to convert RGB to YUV. Assuming your PNG images were done correctly upstream, this means the output YUV video will have "SD" colors. Maybe you were upscaling to "SD", but I think that's unlikely. Normally Rec709 would be used for HD.
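
To illustrate why the matrix matters, here is a rough sketch of what happens when RGB is converted to YUV with the Rec601 coefficients and a player then decodes it as Rec709. It works on normalized values and ignores limited range and 8-bit rounding, so treat the numbers as indicative only; the function names are made up for this example.

```python
# (Kr, Kb) luma coefficients for each standard; Kg follows as 1 - Kr - Kb.
LUMA_COEFFS = {
    "bt601": (0.299, 0.114),
    "bt709": (0.2126, 0.0722),
}


def rgb_to_ycbcr(rgb, matrix):
    """Normalized R'G'B' (0..1) -> Y' (0..1) and Cb/Cr (-0.5..0.5)."""
    r, g, b = rgb
    kr, kb = LUMA_COEFFS[matrix]
    y = kr * r + (1 - kr - kb) * g + kb * b
    return y, (b - y) / (2 * (1 - kb)), (r - y) / (2 * (1 - kr))


def ycbcr_to_rgb(ycbcr, matrix):
    """Inverse of rgb_to_ycbcr for the given matrix."""
    y, cb, cr = ycbcr
    kr, kb = LUMA_COEFFS[matrix]
    r = y + 2 * (1 - kr) * cr
    b = y + 2 * (1 - kb) * cb
    g = (y - kr * r - kb * b) / (1 - kr - kb)
    return r, g, b


green = (0.0, 1.0, 0.0)
encoded = rgb_to_ycbcr(green, "bt601")   # conversion done with the SD matrix
matched = ycbcr_to_rgb(encoded, "bt601")  # player that assumes the same matrix
shown = ycbcr_to_rgb(encoded, "bt709")    # player that assumes HD material

print("decoded with bt601:", [round(c, 3) for c in matched])
print("decoded with bt709:", [round(c, 3) for c in shown])
```

With matching matrices the green comes back unchanged; with the mismatch the green channel drops noticeably and the other channels go slightly out of range, which is the kind of color shift being described.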

22nd Jun 2023 19:19 #7
PeteJobi

Originally Posted by poisondeathray

It's probably easier to use mkvtoolnix-gui (the GUI) at first. There is an option to show the command line, and you can also look at the documentation and examples included there.

Add the original file, uncheck the original video stream, add the replacement stream, and push "Start multiplexing". The replacement stream can be the same one you made with ffmpeg earlier; you're just using mkvmerge for the multiplexing step. It's generally more reliable than ffmpeg.

Truth is I'm not using ffmpeg manually. I wrote myself a little program that runs the commands for me. All I need ffmpeg for is to split video into frames, and merge frames back into video along with audio and subtitles and whatever the original video contained. If there's something else that can do that for me, and work for MP4 as well as MKV, and has a CLI I can use in my code, I'll gladly replace ffmpeg with it (if it's not larger than ffmpeg).

Originally Posted by poisondeathray

For encoding the replacement stream, I would pay more attention to your encoding settings and parameters. You are using default libx265 settings, and the quality might not be ideal. Also, -pix_fmt yuv420p will use swscale and Rec601 to convert RGB to YUV. Assuming your PNG images were done correctly upstream, this means the output YUV video will have "SD" colors. Maybe you were upscaling to "SD", but I think that's unlikely. Normally Rec709 would be used for HD.

I also think it's something with the parameters. I've tried libx264 as well and gotten the same result. I'm not an ffmpeg expert (I only know enough to serve my purpose), so I'll take any advice and pointers you have. Though it sounds like the parameters you mentioned affect image quality? What do you suggest I use for -pix_fmt?

23rd Jun 2023 08:40 #8
poisondeathray

Your problem is likely from ffmpeg muxing. Notice the original file was made with mkvmerge.

Try it on one video first, then use the GUI's option to show the command line so you can adapt it to your program. mkvmerge cannot output MP4, but MP4 cannot hold all types of streams anyway, such as your ASS subtitle stream. mkvmerge can accept MP4 input or elementary streams. For proper MP4 muxing output, use mp4box. These are all command-line programs. I'm just recommending that you use the GUI first, for one quick test, to see if it works, so you don't waste your time. There's no use learning the 50 pages of documentation if it doesn't work and it's not the issue. If it works, you have your answer.

It's not an issue with encoding, because when you mux the original audio with the new video only (and no other streams), it works OK. You can improve seeking granularity by reducing the maximum keyframe interval; the default is 250, and for hundreds of millions of normal videos this works fine. To reduce the keyframe interval, use -g, for example -g 24 for a 1-second interval. An original "24p" BD would have this value. But that is not your problem. The problem is ffmpeg muxing.

There is a separate point about quality. If you're going to upscale, IMO you might as well do it correctly. Use a lower -crf value for a higher bitrate (less quality loss).

For -pix_fmt, swscale is OK, but you need to specify the 709 matrix for the RGB to YUV conversion. If you don't, the colors will be slightly shifted in most players: there will be a Rec601 vs. Rec709 mismatch. By convention, HD material uses 709. Instead of -pix_fmt, use

Code:

-vf scale=out_color_matrix=bt709,format=yuv420p
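
Since the commands are being run from a program, the suggestions above could be collected into one argument list along these lines. This is only a sketch: the function name, the file names, the executable name "ffmpeg" and the CRF value of 18 are assumptions chosen for illustration (the thread only says "lower"; the libx265 default is 28), and the output is video-only on purpose.

```python
def build_encode_command(frames_pattern, output, fps="24000/1001", crf=18, keyint=None):
    """Return an ffmpeg argument list that encodes a PNG sequence to video-only HEVC."""
    cmd = [
        "ffmpeg",
        "-r", fps,                      # input frame rate for the image sequence
        "-i", frames_pattern,
        # Rec709 matrix for the RGB -> YUV conversion, replacing -pix_fmt yuv420p
        "-vf", "scale=out_color_matrix=bt709,format=yuv420p",
        "-c:v", "libx265",
        "-crf", str(crf),               # illustrative value, tune to taste
    ]
    if keyint is not None:
        cmd += ["-g", str(keyint)]      # optional shorter keyframe interval
    cmd.append(output)
    return cmd


command = build_encode_command("InputFolder/frame%08d.png", "video_only.mkv", keyint=24)
print(" ".join(command))
```

Passing the list to something like subprocess.run avoids quoting problems with paths that contain spaces.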
23rd Jun 2023 09:04 #9
Selur

Are you sure the frame count of the source and the re-encode are the same?
Extract the timecodes to see whether the source is VFR (or the audio is stretched).
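
One way to run that check from a script, as a sketch: recent MKVToolNix versions can dump per-frame timestamps with something like `mkvextract input.mkv timestamps_v2 0:timestamps.txt`, which writes a header line followed by one millisecond value per line. The two helpers below are hypothetical and just parse such a listing and look for uneven frame gaps. Because this file uses a 1 ms timebase, gaps of 41 and 42 ms both count as constant.

```python
def parse_timestamps_v2(text):
    """Parse a 'timestamp format v2' listing: one millisecond value per line."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the header comment
        values.append(float(line))
    return values


def looks_vfr(timestamps_ms, tolerance_ms=1.0):
    """True if the gaps between consecutive timestamps differ by more than the tolerance."""
    ordered = sorted(timestamps_ms)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    if not gaps:
        return False
    return max(gaps) - min(gaps) > tolerance_ms


sample = "# timestamp format v2\n0\n42\n83\n125\n167\n"
timestamps = parse_timestamps_v2(sample)
print(len(timestamps), "frames, VFR suspected:", looks_vfr(timestamps))
```

The length of the parsed list also gives the frame count, which can be compared against the number of PNG files and the NUMBER_OF_FRAMES tag (33135 for this source).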

23rd Jun 2023 09:36 #10
poisondeathray

Originally Posted by Selur

Are you sure the frame count of the source and the re-encode are the same?
Extract the timecodes to see whether the source is VFR (or the audio is stretched).

That shouldn't be an issue, since replacement video + original audio (but no subs or other streams) works OK.

Originally Posted by PeteJobi

. Another thing to note which might be more relevant is that when I merge the frames with only the original audio stream (as opposed to all original streams except video), the audio issue does not occur. Also, when I merge the frames with the original audio AND subtitle stream, i.e without the fonts, the issue remains.

So it suggests an issue with muxing subs with ffmpeg, or with ffmpeg rewriting timestamps.

The thing about mkvmerge is that it preserves the original timestamps (not just video timestamps, but those of other streams as well), so if you replace just the video stream, everything should be the same except for the replaced video. I'd be very surprised if it didn't work. Usually the culprit is ffmpeg.

24th Jun 2023 13:54 #11
PeteJobi

Thank you for your suggestions on the ffmpeg parameters. I will tinker with those. I'll also look into muxing.
I want to try out mkvmerge (which is part of MKVToolNix, right?), but I still don't see a way to break a video into frames. I need the actual frames so I can upscale them with a different program. I see some options under "splitting", but that doesn't seem to be what I need.
24th Jun 2023 14:09 #12
PeteJobi

I was able to input the original video and the upscaled video (the one with issues) into the program. I selected the video stream of the upscaled one and everything else from the original, and the result is without issues, i.e. an upscaled video without the audio issues. Is this what you wanted me to try?

I also noticed something that might be relevant.

[Attachment 71981]

The upscaled video (produced by merging frames with ffmpeg) has more files than the original.

Last edited by PeteJobi; 24th Jun 2023 at 14:17. Reason: More info

25th Jun 2023 09:33 #13
PeteJobi

Update: I tried using mkvmerge to remove the extra files I mentioned earlier from the upscaled video. And it worked wonderfully without audio issues.

[Attachment 72052]

So if I can get ffmpeg to not generate those "tags", my problem should be solved.

25th Jun 2023 10:04 #14
poisondeathray

Originally Posted by PeteJobi

So if I can get ffmpeg to not generate those "tags" thing, my problem should be solved.

In theory, but mkvmerge is doing more than removing tags: it's remuxing the streams.

You can try mkvpropedit to remove the tags to test your theory. If it works, that validates it, and you just have to figure out how to do it in ffmpeg next. mkvpropedit does in-place editing with no remuxing (so it's very fast).

Code:

"mkvpropedit" "input.mkv" --tags all:
25th Jun 2023 12:34 #15
PeteJobi

Originally Posted by poisondeathray

In theory - but mkvmerge is doing more than removing tags - it's remuxing the streams

Yeah, I see that now. I just tried what I did before, but without unchecking the tags or anything else. The resulting video had no issues. So I guess the tags are not the problem.

If I don't find an ffmpeg solution, I guess what I can do is use ffmpeg to break the video into frames and to merge the frames back into a video without any other streams, then use mkvmerge to merge that video with the streams from the original video.

This adds an extra step to the process, but if there's no other way.....
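
For what it's worth, the extra mkvmerge step can be quite small. A minimal sketch, assuming the ffmpeg step has already produced a video-only file (called video_only.mkv here, a made-up name) and that mkvmerge is on the PATH: `--no-video` applies to the input file that follows it, so everything except the video track (audio, subtitles, attachments) is taken from the original.

```python
import subprocess


def build_mux_command(new_video, original, output):
    """mkvmerge arguments: all of new_video, plus everything except video from original."""
    return [
        "mkvmerge",
        "-o", output,
        new_video,
        "--no-video", original,  # option applies to the file that follows it
    ]


def run(cmd, dry_run=True):
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


mux_command = build_mux_command("video_only.mkv", "input.mkv", "output.mkv")
run(mux_command)  # dry run: only prints the command
```

This mirrors what was done by hand in the GUI: the replacement video from one file, all other streams from the original.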

How do I get mkvmerge cli?

25th Jun 2023 12:39 #16
poisondeathray

Originally Posted by PeteJobi

This adds an extra step to the process, but if there's no other way.....

There are various issues with the ffmpeg MKV muxer and MP4 muxer. Most GUIs use the actual command-line tools, mkvmerge and mp4box, for the multiplexing stage.

Originally Posted by PeteJobi

How do I get mkvmerge cli?

Show the command line that you used in the GUI via Multiplexer => Show command line.

Learn from the example, have a look at the nice documentation, and adapt it to your program.

25th Jun 2023 13:41 #17
PeteJobi

Thanks. Will do.

I figured I could get rid of ffmpeg completely if I could find another CLI program that could split videos into frames and back. I'm trying to minimize the file size of dependencies, and ffmpeg is too large for what I use it for. Do you know of smaller software that does this? (Or do you think it's a better idea to stick with ffmpeg for that?)

25th Jun 2023 14:07 #18
poisondeathray

I don't have any good ideas for a replacement that makes it simpler, with smaller binaries and simpler dependencies. I would stick with ffmpeg and mkvmerge.

When looking for replacement options, you would still need to convert YUV video to RGB and then to PNG images, run them through the GAN or whatever processing to upscale, and convert the PNGs back to YUV video. You still need an encoder and a muxer too.

There are implementations that don't need PNG images, e.g. you can run RealESRGAN (or other machine learning algorithms) in vapoursynth, or some through avisynth, but they take up file space too, whether installed or "portable".

If you compile it yourself, you can disable many of the features to make the ffmpeg binary smaller. The precompiled ones that you download usually include many libraries that are never used but balloon the file size. You can strip out all the things you don't need. A commonly used tool for this is MABS (media-autobuild_suite), an ffmpeg autobuild suite.

ffmpeg is nice in that it bundles many encoders (libx265, libx264, etc.), demuxers and muxers. It's like a "Swiss army knife", but it still has issues.
