I've collected a lot of great software and advice from this website. Although I found some useful information in this thread (https://forum.videohelp.com/threads/332670-Add-Soft-Subtitles-and-Chapters-to-m4v-mp4-%...-in-Windows%29), I couldn't find an answer to the issue I have now, which is why I registered and am writing my first post here. Hopefully someone has a lead on how to solve it.
So here it is:
I want to soft-code several subtitles (.srt files) in a .mp4 container. To do so, I first used Yamb/MP4Box, then My MP4Box GUI. Both of them fail to mux one specific .srt file, while the 13 others get muxed in fine.
As error notification, I get from YAMB:
[23:34:11] : Importing SRT File...
[23:34:12] : Invalid UTF Data
[23:34:12] : Error importing c:\Users\...\Uploads\Subs\Film_01\Film_01.Czech.srt:lang=cs: Corrupted Data in file/stream
[23:34:12] : Creation failed.
...and from My MP4Box GUI:
Timed Text (SRT) import - text track 720 x 576, font Serif (size 18)
Invalid UTF data (line 7)
Error importing c:\Users\...\Uploads\Subs\Film_01\Film_01.Czech.srt:lang=cs: Corrupted Data in file/stream
The latter gives a hint as to where the problem comes from, i.e. line 7.
I checked that line in the .srt file and didn't find anything unusual, apart from maybe a specific Czech letter (). The .srt file is saved as ANSI, like all the other .srt files I mux. Anyway, when I leave the 7th line blank and retry the muxing, I get the same type of error message, only this time on another line (30-something). That line contains this "" letter too, but after a quick check I found it on other lines in between as well, so that specific letter is not likely the source of the problem (well, I'm pretty sure the Czech alphabet is not the issue here, but I'm trying to understand ^^).
Anyway, would anyone have an idea on how I could fix that?
Strangely, this .srt gets muxed in fine when making a .mkv, but not here...
Plus, even if I convert the .srt into .ttxt, neither Yamb nor My MP4Box GUI will accept this specific file. And yes, the file does bear the same name as the source file.
Thanks in advance for any help!
Could you please upload a copy of your Czech SRT file to this forum?
Apparently it's not a proper UTF-8 file (it's missing the three-byte BOM at the start of the file),
or it's not UTF-encoded at all.
P.S.: Another possibility: there may be a bug in YAMB. If so, try using MP4Box directly instead.
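For anyone who wants to check a subtitle file for the two conditions mentioned above, here is a minimal Python sketch. It only tests whether the bytes start with the UTF-8 BOM and which lines fail to decode as UTF-8; it does not claim to reproduce MP4Box's own check. The sample cues and the assumption that "ANSI" means Windows-1250 (cp1250) are illustrative only.

```python
import codecs

def has_utf8_bom(data):
    """Return True if the data starts with the 3-byte UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)

def invalid_utf8_lines(data):
    """Return the 1-based numbers of the lines that do not decode as UTF-8."""
    bad = []
    for number, line in enumerate(data.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(number)
    return bad

# Hypothetical sample: the same two cues encoded once as cp1250
# (assumed here to be the "ANSI" code page) and once as UTF-8 with a BOM.
sample_text = (
    "1\n00:00:01,000 --> 00:00:02,000\nDobrý den\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nDěkuji\n"
)
ansi_bytes = sample_text.encode("cp1250")
utf8_bytes = sample_text.encode("utf-8-sig")

print(has_utf8_bom(ansi_bytes), invalid_utf8_lines(ansi_bytes))  # False [3, 7]
print(has_utf8_bom(utf8_bytes), invalid_utf8_lines(utf8_bytes))  # True []
```

To run it on a real file, read the bytes with `open(path, "rb").read()` and pass them to the two functions.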
Hi and thanks for your fast reply,
Here's the .srt attached. I played a bit with it, trying different encodings with Notepad++, but without any results so far.
I don't think Yamb or MP4Box GUI crash in any way, since they mux the other subtitle files nicely. I'd be ready to bet that something is wrong with that particular Czech .srt.
file is saved as ANSI
I'm not a regular user of Notepad++, but when I go to Encoding > Convert to UTF-8 > Save, the file is still not accepted by my GUIs. Same outcome for UTF-8 without BOM.
I just read somewhere that "a three byte BOM will be added upon save", which is what El Heggunte was pointing out earlier. How does one do that?
I feel I should know more about text encoding right now...
Last edited by Fańch; 2nd Nov 2012 at 14:39.
Just open it in Notepad (no ++) and select UTF-8 at the bottom of the "Save as" window.
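The same "save as UTF-8" step can be scripted when there are many files. The sketch below is only an approximation of what Notepad does: it assumes the source file is in Windows-1250 (cp1250, the usual "ANSI" code page for Czech on Windows), which may not hold for a given file, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def convert_to_utf8_bom(src, dst, source_encoding="cp1250"):
    """Re-encode a text file as UTF-8 with a BOM.

    source_encoding is an assumption; pick the code page the file
    was really saved in.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    # "utf-8-sig" writes the 3-byte BOM (EF BB BF) before the text.
    Path(dst).write_bytes(text.encode("utf-8-sig"))

# Demonstration on a throwaway file (hypothetical names and content).
cue = "1\n00:00:01,000 --> 00:00:02,000\nDěkuji\n"
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "example.cs.srt"
    dst = Path(tmp) / "example.cs.utf8.srt"
    src.write_bytes(cue.encode("cp1250"))
    convert_to_utf8_bom(src, dst)
    converted = dst.read_bytes()

print(converted[:3])                        # the BOM bytes
print(converted.decode("utf-8-sig") == cue)
```

Reading and writing bytes rather than text keeps the original line endings untouched.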
Had no problem muxing your SRT into an MP4 container using MyMP4BoxGUI:
General
Complete name : F:\Autum2009.mp4-muxed.mp4
Format : MPEG-4
Format profile : Base Media
Codec ID : isom
File size : 751 MiB
Duration : 2h 8mn
Overall bit rate mode : Variable
Overall bit rate : 818 Kbps
Encoded date : UTC 2012-11-01 19:47:27
Tagged date : UTC 2012-11-01 19:47:27
Writing application : My MP4Box GUI 0.5.6.0 <http://my-mp4box-gui.zymichost.com>

Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High@L4.0
Format settings, CABAC : Yes
Format settings, ReFrames : 11 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 1h 54mn
Bit rate : 814 Kbps
Maximum bit rate : 7 017 Kbps
Width : 1 280 pixels
Height : 544 pixels
Display aspect ratio : 2.35:1
Frame rate mode : Constant
Frame rate : 24.000 fps
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.049
Stream size : 669 MiB (89%)
Writing library : x264 core 125 r2200 999b753
Encoding settings : cabac=1 / ref=12 / deblock=1:0:0 / analyse=0x3:0x133 / me=umh / subme=10 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=24 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=12 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=5 / b_pyramid=1 / b_adapt=2 / b_bias=2 / direct=3 / weightb=1 / open_gop=0 / weightp=2 / keyint=240 / keyint_min=24 / scenecut=40 / intra_refresh=0 / rc_lookahead=60 / rc=2pass / mbtree=1 / bitrate=814 / ratetol=1.0 / qcomp=0.60 / qpmin=10 / qpmax=51 / qpstep=4 / cplxblur=20.0 / qblur=0.5 / ip_ratio=1.40 / aq=1:1.00
Encoded date : UTC 2012-10-08 01:20:16
Tagged date : UTC 2012-11-01 19:48:15

Audio
ID : 2
Format : AAC
Format/Info : Advanced Audio Codec
Format profile : LC
Codec ID : 40
Duration : 1h 54mn
Bit rate mode : Variable
Bit rate : 96.0 Kbps
Maximum bit rate : 155 Kbps
Channel(s) : 2 channels
Channel positions : Front: L R
Sampling rate : 48.0 KHz
Compression mode : Lossy
Stream size : 79.0 MiB (11%)
Encoded date : UTC 2012-11-01 19:48:03
Tagged date : UTC 2012-11-01 19:48:15

Text
ID : 3
Format : Timed text
Codec ID : tx3g
Duration : 2h 8mn
Bit rate mode : Variable
Bit rate : 70 bps
Stream size : 66.0 KiB (0%)
Title : Imported with GPAC 0.4.6-DEV (internal rev. 5)
Encoded date : UTC 2012-11-01 19:48:15
Tagged date : UTC 2012-11-01 19:48:15
Hi there again.
I converted your SRT file to UTF-8 with Notepad, and MP4Box.exe muxed it to a testfile.MP4 without a complaint.
Also, I opened the subbed MP4 in Graphstudio, and the graph picture confirms the subs actually are inside the file.
So I still think there is a bug in the GUIs you're using, OR in the MP4Box build you have.
For what it's worth, my MP4Box.exe is:
=>mp4box -version
MP4Box - GPAC version 0.4.6-DEV-git-5ca3a9a
Compilation Date: Mar 10 2012 - Built by X5-452
GPAC Copyright: (c) Jean Le Feuvre 2000-2005
(c) ENST 2005-200X
GPAC Configuration: --prefix=/local/libgpac_0.4.6-DEV-wipple-git-/i686-pc-mingw32 --extra-cflags='-march=i686 -mtune=generic -mfpmath=sse -msse2 -fomit-frame-pointer -fexcess-precision=fast -fno-tree-vectorize' --enable-static-bin --strip --static-mp4box --enable-all --extra-ldflags=-Wl,--large-address-aware --use-ft=local --use-faad=local --use-mad=local --use-xvid=local --use-ffmpeg=local --use-ogg=system --use-vorbis=system --use-theora=system --use-openjpeg=system --use-a52=local
Features: GPAC_DISABLE_3D
VideoBruger / El Heggunte : thanks a bunch - cause it bloody hell works
Good old Notepad, when all I had to do was to save it from there... Damn, the solution was just so obvious, I feel stupid (yet smarter^^).
For the sake of knowledge, do you guys think I should now encode all 13 other subs as UTF-8 rather than ANSI? Would there be any good reason to do so (supportability/playability)? Even though it already works like a charm (both with YAMB and My MP4Box GUI).
Anyway, thanks again - I'd never have thought I'd come out with a solution in less than 4 hours' time.
Definitely save all your Czech subtitles as UTF-8 or Unicode. Very few Western languages can be safely stored in ANSI encoding: English, Dutch, French, German, Italian, Spanish, Portuguese.
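A quick way to see which text survives which encoding is to try encoding it. The sketch below assumes "ANSI" means Windows-1252 (cp1252) on a Western-European Windows and Windows-1250 (cp1250) on a Central-European one; the sample phrases are hypothetical and were picked only for their accented letters.

```python
def fits(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical sample phrases.
samples = {
    "French": "déjà vu, garçon",
    "Danish": "rødgrød med fløde på",
    "Czech": "Příliš žluťoučký kůň",
}

for language, text in samples.items():
    print(language,
          "cp1252:", fits(text, "cp1252"),
          "cp1250:", fits(text, "cp1250"),
          "utf-8:", fits(text, "utf-8"))

# The same byte stands for different letters in different code pages:
# Czech "ř" saved as cp1250 is read back as "ø" under cp1252.
misread = "ř".encode("cp1250").decode("cp1252")
print(misread)
```

The Czech sample cannot be encoded as cp1252 at all, while UTF-8 covers all three samples, which is the practical argument for saving everything as UTF-8.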
As for the freeware "Notepad replacements", I still haven't seen one that can actually rival the payware alternatives, such as EditPlus, EmEditor or UltraEdit.
Mmmh... I just tried saving a Danish-language file as UTF-8 and some letters come out weird, so I think I'll stick to ANSI for the others for now, as they seem to be displayed correctly. Thanks for the advice anyway.
Well, the case is solved, yet I have one last question! This time it's about the way subs appear in media players (e.g. MPC / VLC). After I muxed the subs in, they appear fine in VLC (see attachment), while in MPC each one is listed twice, with or without "plain text" (see attachment). As I'm using both players and it's the first time I've ever seen that kind of doubled list in the subtitles tab, I was wondering why it appears that way.
Has any of you ever seen that before, or do you know where it could come from?
(New thread, or...?)
Last edited by Fańch; 2nd Nov 2012 at 17:10.
I'm in a similar situation.
This weekend, I tried to convert some of my mkv files to mp4.
It worked fine until now...
I demux my mkv with mkvcleaver (a GUI for Mkvtoolnix).
I convert the audio to AAC with MeGUI (using the Nero AAC codec).
I mux all my tracks (*.h264 + *.m4a + *.srt) with Yamb (with the latest version of mp4box I could find (0.5.0)).
It worked for a dozen movies... but with the last one I tried, Yamb told me the srt files (2 tracks) are corrupted (same as Fańch).
They are UTF-8.
And if I try to convert the srt file to ttxt with Yamb, it fails...
I don't know what's wrong (a problem with the srt or with mp4box?)... but maybe someone here can find out...
I enclose the 2 srt files.
Edit: I searched a lot today and finally found a solution...
It seems my srt files had errors. I don't know whether mkvtoolnix messes up the srt extraction from my mkv, or whether the srt in my mkv had been corrupted since its creation (though it worked fine in the mkv). Moreover, some programs detect the errors, crash or freeze, while others don't... weird...
I tried various programs to correct that and finally found SubtitleEdit, which works great: it doesn't choke on errors and it doesn't mangle the text (especially French with diacritical marks)... very nice... It opens the "corrupted" srt, and the Tools section offers an option to correct the errors.
After that, Yamb accepts the files \o/
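The post does not say which errors the files contained, so the following is only a rough sketch of the kind of structural check one could run before muxing. It is not what SubtitleEdit does; it simply assumes the common SRT layout of a numeric index, a timing line of the form `HH:MM:SS,mmm --> HH:MM:SS,mmm`, and at least one line of text per cue. The function name and sample cues are hypothetical.

```python
import re

TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")

def check_srt(text):
    """Return a list of (cue_position, problem) tuples for malformed cues."""
    problems = []
    # Drop a leading BOM, normalise line endings, split on blank lines.
    cleaned = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = [b for b in re.split(r"\n\s*\n", cleaned) if b.strip()]
    for position, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if not lines[0].strip().isdigit():
            problems.append((position, "missing numeric index"))
        elif len(lines) < 2 or not TIMING.match(lines[1].strip()):
            problems.append((position, "bad timing line"))
        elif len(lines) < 3:
            problems.append((position, "no subtitle text"))
    return problems

good = "1\n00:00:01,000 --> 00:00:02,000\nBonjour\n\n2\n00:00:03,000 --> 00:00:04,000\nSalut\n"
bad = (
    "1\n00:00:01,000 --> 00:00:02,000\nBonjour\n\n"
    "00:00:03.000 -> 00:00:04,000\nSalut\n\n"   # index missing
    "3\n00:00:05,000 --> 00:00:06,000\n"          # no text
)

print(check_srt(good))  # []
print(check_srt(bad))   # [(2, 'missing numeric index'), (3, 'no subtitle text')]
```

A file that passes this check can still be rejected for other reasons (encoding, overlapping cues), so it is a first filter rather than a guarantee.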
I hope this helps.
Last edited by larsirion; 5th Nov 2012 at 12:42.