Subtitle Edit 4.0.3 and 3.6.13

29th Oct 2025 06:19 #61
robena

Member

Join Date
May 2008

Location
France
Thanks for the merge python script, I'll try that.

You can also merge with SE: tools -> join subtitles

Just add the 2 subs and click "join".

Then: tools > Sort > By start time

Looking at your subs, I see what happens: the missing subs come from the merge, but that has a cost. In some places you now have both the subs generated by whisper and the merged ones for the same dialog.

Example starting at 47:42:507 for E02 (timestamp with my version):

One sub with 2 lines:

Code:

I always believed that I
had a future in this country.

And then 2 subs with 2 lines and one line:

Code:

I always believed
we had a future

Code:

in this country.

That's a bit of a mess!
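
One way to spot this kind of doubled dialog after a join is to look for cues whose time ranges overlap. This is only a minimal sketch: it assumes the cues are already parsed into (start, end, text) tuples in seconds, `find_overlaps` is a hypothetical helper (not an SE feature), and all timings except the 47:42.507 start are made up for illustration.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)

def find_overlaps(cues: List[Cue]) -> List[Tuple[Cue, Cue]]:
    """Return every pair of cues whose time ranges overlap."""
    ordered = sorted(cues, key=lambda c: c[0])
    overlaps = []
    for i, cur in enumerate(ordered):
        for nxt in ordered[i + 1:]:
            if nxt[0] >= cur[1]:
                break  # sorted by start: no later cue can overlap `cur`
            overlaps.append((cur, nxt))
    return overlaps

# Illustrative timings only (47:42.507 = 2862.507 s).
cues = [
    (2862.507, 2865.900, "I always believed that I\nhad a future in this country."),
    (2862.600, 2864.300, "I always believed\nwe had a future"),
    (2864.400, 2865.800, "in this country."),
    (2870.000, 2872.000, "Next line of dialog."),
]

for first, second in find_overlaps(cues):
    print(f"{first[0]:.3f} overlaps {second[0]:.3f}: {second[2]!r}")
```

Each reported pair is a candidate for deleting one of the two versions by hand.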

About syncing subs.

Most US shows are designed to have ads. These ads are generally inserted at scene changes, when the video goes dark briefly.

Two different versions of the same show (for example a WEB-DL and a FOX broadcast edited to remove the ads) will show a timing discrepancy at the scene changes.

My process is:

1) Use comskip, a tool to find ads, to detect scene changes to produce a VideoRedo .vprj file.

I use:

comskip82_010_donators\comskip.exe --threads=20 --videoredo --detectmethod=95 --verbose=0 "Bones S01-E01.mkv"

Even if the ads have already been removed, comskip usually finds where they were.

2) Open the VideoRedo project. VideoRedo will clearly show where the ads were. It will show many false positives, but using F6 to navigate from one to the next shows the image, and when it's dark, it's likely where the ads were.

Example with an old FOX show where the ads were already removed:

[Attachment 89434]

Use SE to adjust at the beginning using "Set start and offset the rest". Then navigate to the next scene change that you see in VideoRedo. Check whether the subs are in sync just before that timestamp (usually they are) and just after it. If they are not, sync them at this location with "Set start and offset the rest".

That does not work with all shows (not with Kabul, for example, which never had embedded ads anyway), but in my experience it works with many of them.
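
The effect of re-syncing at each ad break can be sketched as follows. This is my reading of what "Set start and offset the rest" does, not SE's actual code: the chosen cue gets a new start time, and it and every later cue are shifted by the same delta. The function name and the sample timings are hypothetical.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)

def set_start_and_offset_rest(cues: List[Cue], index: int, new_start: float) -> List[Cue]:
    """Move cue `index` to `new_start`; shift it and all later cues by the same delta."""
    delta = new_start - cues[index][0]
    shifted = [(start + delta, end + delta, text) for start, end, text in cues[index:]]
    return cues[:index] + shifted

# Made-up example: the second cue sits just after an ad break that is
# 30 seconds shorter in the other version of the episode.
cues = [
    (10.0, 12.0, "Before the break."),
    (600.0, 602.0, "First line after the break."),
    (610.0, 612.0, "Second line after the break."),
]
for start, end, text in set_start_and_offset_rest(cues, 1, 570.0):
    print(f"{start:8.3f} {end:8.3f} {text}")
```

Cues before the chosen index are left alone, which matches the observation that the subs are usually still in sync just before the scene change.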

Not sure if the donator version of comskip is still available. Or if it's actually needed.

VideoRedo is no longer sold and is hard to activate now, but I read somewhere that somebody now holds the rights to it and can provide a legal way to activate it. You would need to search for that.

If I get more information, I'll send you a PM.

29th Oct 2025 07:12 #62
sam12345
Member

Join Date
Oct 2025
Originally Posted by robena

Example starting at 47:42:507 for E02 (timestamp with my version):

One sub with 2 lines:

Code:
I always believed that I
had a future in this country.
And then 2 subs with 2 lines and one line:

Code:
I always believed
we had a future
Code:
in this country.

SE engine is the culprit. No problems with CMD>CLI

[Attachment 89439]

Doing the transcription again. Will update you with new subs soon.

Meanwhile, read my PM

29th Oct 2025 12:18 #63
sam12345
Member

Join Date
Oct 2025
Uploaded complete clean subs of Kabul (Mini TV Series) 2025

https://www.opensubtitles.org

Uploader: SamGer

https://www.opensubtitles.org/en/ssearch/sublanguageid-eng/idmovie-2350514

https://sub-scene.com/

https://sub-scene.com/subtitle/3363962

29th Oct 2025 12:41 #64
robena
Member

Join Date
May 2008

Location
France
Originally Posted by sam12345

Uploaded complete clean subs of Kabul (Mini TV Series) 2025

https://www.opensubtitles.org

Great, that will save me the trouble of doing all the subs myself.

I actually tried to upload the Assembly ones, they were rejected because it's too obvious that they are AI generated.

Edit:

But there are still subs that overlap:

[Attachment 89446]

Last edited by robena; 29th Oct 2025 at 12:53.

29th Oct 2025 12:44 #65
sam12345
Member

Join Date
Oct 2025
Will update the Assembly code tomorrow.


29th Oct 2025 20:34 #66

Member

Hi,

So, here is a perfect way for this Kabul series.

This uses my way to number episodes: "Kabul S01-E01.mkv"

This will likely fail with another convention such as "Kabul S01E01.mkv".

1) Use Faster-Whisper-XXL_r245.1_windows with these options (taken from a REXX script, easy to port to something else):

Code:

                                                                                                               
/* Just in case you want something else than English */
if lang = 'en' then do
    task = ' --task translate'
    prompt = ' --initial_prompt "Translate everything to English."'
end
else do
    task = ' --task transcribe'
    prompt = ' --initial_prompt "Transcribe in 'lang'."'
end
                                                                                                               
'--model large-v3' ,
task ,
' --language 'lang ,
prompt ,
' --device cuda' ,
' --compute_type float16' ,
' --batch_size 8' ,
' --vad_method pyannote_onnx_v3' ,
' --vad_device cuda' ,
' --beep_off',
' --vad_threshold 0.1' ,                 /* ULTRA LOW */
' --vad_min_speech_duration_ms 50' ,     /* 50ms = catch whispers */
' --vad_min_silence_duration_ms 100' ,   /* tighter gaps */
' --hallucination_silence_threshold 0.6' ,
' --no_speech_threshold 0.1' ,           /* catch ANY speech */
' --logprob_threshold -2.0' ,            /* keep low-conf */
' --compression_ratio_threshold 2.4' ,
' --beam_size 5' ,
' --best_of 5' ,
' --temperature 0' ,
' --repetition_penalty 1.1' ,
' --no_repeat_ngram_size 3' ,
' --condition_on_previous_text False' ,
' --word_timestamps True' ,
' --output_format all' ,                 /* JSON + SRT */
' --output_dir "'fdd(file)'"'
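
For anyone not using REXX, the same option block could be ported roughly like this. It is a sketch only: the option names and values are copied from the fragment above, while `build_whisper_options`, the executable name and the argument order in the commented call are assumptions.

```python
from typing import List

def build_whisper_options(lang: str, output_dir: str) -> List[str]:
    """Return the options from the REXX fragment as separate argv items."""
    if lang == "en":
        task, prompt = "translate", "Translate everything to English."
    else:
        task, prompt = "transcribe", f"Transcribe in {lang}."
    return [
        "--model", "large-v3",
        "--task", task,
        "--language", lang,
        "--initial_prompt", prompt,
        "--device", "cuda",
        "--compute_type", "float16",
        "--batch_size", "8",
        "--vad_method", "pyannote_onnx_v3",
        "--vad_device", "cuda",
        "--beep_off",
        "--vad_threshold", "0.1",
        "--vad_min_speech_duration_ms", "50",
        "--vad_min_silence_duration_ms", "100",
        "--hallucination_silence_threshold", "0.6",
        "--no_speech_threshold", "0.1",
        "--logprob_threshold", "-2.0",
        "--compression_ratio_threshold", "2.4",
        "--beam_size", "5",
        "--best_of", "5",
        "--temperature", "0",
        "--repetition_penalty", "1.1",
        "--no_repeat_ngram_size", "3",
        "--condition_on_previous_text", "False",
        "--word_timestamps", "True",
        "--output_format", "all",
        "--output_dir", output_dir,
    ]

options = build_whisper_options("en", "D:/subs")  # output folder is a placeholder
print(" ".join(options[:8]))

# To run it, prepend the executable and the media file, for example
# (executable name and argument order are assumptions):
# subprocess.run(["faster-whisper-xxl.exe", "Kabul S01-E01.mkv", *options], check=True)
```

Passing the options as a list avoids the quoting problems that come with building one long command string.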

That outputs a file such as "Kabul S01-E01.srt" that my REXX script renames to "Kabul S01-E01-en.srt"

*** Having a filename ending with "-something" is necessary.

Then, store in the same directory: "Kabul S01-E01-F.srt"

These are the forced subs for the foreign dialogs.

*** Having '-F' is necessary.
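
To make the naming convention explicit, here is a small sketch that derives the file names expected next to an episode. `expected_names` is a hypothetical helper written for this post; the "-en" and "-F" suffixes and the plain merged name come from the steps described in this thread.

```python
from pathlib import Path
from typing import Dict

def expected_names(mkv: str, lang: str = "en") -> Dict[str, str]:
    """Map each role to the SRT file name expected for the given MKV."""
    stem = Path(mkv).stem  # e.g. "Kabul S01-E01"
    return {
        "whisper": f"{stem}-{lang}.srt",  # renamed Whisper output
        "forced": f"{stem}-F.srt",        # forced subs for the foreign dialogs
        "merged": f"{stem}.srt",          # final merged result
    }

for role, name in expected_names("Kabul S01-E01.mkv").items():
    print(f"{role:8} {name}")
```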

Whisper has translated these foreign dialogs, but we want those that are already in "Kabul S01-E01-F.srt".

To get that, merge with this python script:

Code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
merge_subtitles.py -
- -F.srt loaded FIRST
- Every forced sub PRESERVED exactly
- Whisper fills gaps OR is replaced on overlap
- No duplicates, no loss
"""
 
import sys
from pathlib import Path
from typing import List, Tuple
 
def time_to_seconds(t: str) -> float:
    # Accept "HH:MM:SS,mmm" (SRT) or "HH:MM:SS.mmm"; the fraction may have any length.
    h, m, s = t.replace(",", ".").split(":")
    return int(h) * 3600 + int(m) * 60 + float(s)
 
def seconds_to_srt(sec: float) -> str:
    # Work in whole milliseconds so rounding can never produce "60,000" seconds.
    total_ms = int(round(sec * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"
 
def parse_srt_robust(content: str, filename: str) -> List[Tuple[float, float, List[str], str]]:
    entries = []
    lines = content.splitlines()
    i = 0
    while i < len(lines):
        if lines[i].strip().isdigit():
            i += 1
            if i >= len(lines): break
            time_line = lines[i].strip()
            if "-->" not in time_line:
                i += 1
                continue
            try:
                start_str, end_str = time_line.split("-->", 1)
                start = time_to_seconds(start_str.strip())
                end = time_to_seconds(end_str.strip())
            except ValueError:  # malformed timestamp line: skip this block
                i += 1
                continue
            i += 1
            text_lines = []
            while i < len(lines) and lines[i].strip() and not lines[i].strip().isdigit():
                text_lines.append(lines[i].strip())
                i += 1
            if text_lines:
                entries.append((start, end, text_lines, filename))
        else:
            i += 1
    return entries
 
def is_forced(filename: str) -> bool:
    return any(k in filename.lower() for k in ("-f.", "-forced", ".f.", "forced"))
 
def main(mkv_path: str) -> None:
    mkv = Path(mkv_path)
    if not mkv.exists():
        print(f"[ERROR] File not found: {mkv}")
        sys.exit(1)
 
    folder = mkv.parent
    base_name = mkv.stem
    output_srt = folder / f"{base_name}.srt"
 
    if output_srt.exists():
        print("Skipping merge, file exists")
        return
 
    srt_files = list(folder.glob(f"{base_name}*.srt"))
    if not srt_files:
        print(f"[INFO] No SRT files")
        return
 
    forced_file = next((f for f in srt_files if is_forced(f.name)), None)
    whisper_file = next((f for f in srt_files if not is_forced(f.name)), None)
 
    if not forced_file:
        print("[ERROR] No -F.srt found!")
        return
 
    print(f"[INFO] Forced: {forced_file.name}")
    print(f"[INFO] Whisper: {whisper_file.name if whisper_file else 'None'}")
 
    # Parse forced
    try:
        forced_text = forced_file.read_text(encoding="utf-8", errors="replace")
        forced_subs = parse_srt_robust(forced_text, forced_file.name)
        print(f"    -> {len(forced_subs)} forced lines")
    except Exception as e:
        print(f"[ERROR] Failed to read forced: {e}")
        return
 
    # Parse whisper
    whisper_subs = []
    if whisper_file:
        try:
            whisper_text = whisper_file.read_text(encoding="utf-8", errors="replace")
            whisper_subs = parse_srt_robust(whisper_text, whisper_file.name)
            print(f"    -> {len(whisper_subs)} whisper lines")
        except Exception as e:
            print(f"[WARNING] Whisper failed: {e}")
 
    forced_subs.sort(key=lambda x: x[0])
    whisper_subs.sort(key=lambda x: x[0])
 
    final_subs = []
    w_idx = 0
    W = len(whisper_subs)
 
    for f_start, f_end, f_lines, _ in forced_subs:
        # Add all Whisper subs that END before this forced sub starts
        while w_idx < W:
            w_start, w_end, _, _ = whisper_subs[w_idx]
            if w_end <= f_start:  # No overlap
                final_subs.append(whisper_subs[w_idx][:3])
                w_idx += 1
            else:
                break
 
        # Now: skip all Whisper subs that overlap this forced sub
        while w_idx < W:
            w_start, w_end, _, _ = whisper_subs[w_idx]
            if w_start < f_end:  # Overlaps
                w_idx += 1
            else:
                break
 
        # Add forced sub
        final_subs.append((f_start, f_end, f_lines))
 
    # Add remaining non-overlapping whisper subs
    while w_idx < W:
        final_subs.append(whisper_subs[w_idx][:3])
        w_idx += 1
 
    # Write output
    with open(output_srt, "w", encoding="utf-8") as f:
        for idx, (start, end, lines) in enumerate(final_subs, 1):
            f.write(f"{idx}\n")
            f.write(f"{seconds_to_srt(start)} --> {seconds_to_srt(end)}\n")
            for line in lines:
                f.write(f"{line}\n")
            f.write("\n")
 
    print(f"\n[OK] Merged {len(final_subs)} blocks -> {output_srt.name}")
    print(f"    -> {len(forced_subs)} forced subs preserved (100%)")
 
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("Usage: merge_subtitles.py <mkv_path>")
        sys.exit(1)
    main(sys.argv[1])

That will produce "Kabul S01-E01.srt".

It will contain:

- English dialogs transcribed in English
- Foreign dialogs that are already in the '-F' forced subs "as is".
- Foreign dialogs missing in the '-F' forced subs translated by whisper.
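
The merge rule itself fits in a few lines. The sketch below is a simpler, quadratic restatement of the same overlap test (keep every forced cue, drop any Whisper cue that overlaps one), not the two-pointer loop from the script; the function name and sample cues are made up for illustration.

```python
from typing import List, Tuple

Cue = Tuple[float, float, str]  # (start seconds, end seconds, text)

def merge_forced_first(forced: List[Cue], whisper: List[Cue]) -> List[Cue]:
    """Keep all forced cues; keep a Whisper cue only if it overlaps no forced cue."""
    kept = [
        w for w in whisper
        if not any(w[0] < f[1] and f[0] < w[1] for f in forced)
    ]
    return sorted(forced + kept, key=lambda c: c[0])

forced = [(10.0, 12.0, "Where is he? (forced sub)")]
whisper = [
    (5.0, 8.0, "Hello."),               # ends before the forced cue: kept
    (9.5, 11.0, "Where's he?"),         # overlaps the forced cue: dropped
    (13.0, 15.0, "Let's go."),          # starts after the forced cue: kept
]
for start, end, text in merge_forced_first(forced, whisper):
    print(f"{start:6.1f} {end:6.1f} {text}")
```

A Whisper cue that merely touches a forced cue (its end equals the forced start) does not count as an overlap, which is the same boundary behaviour as the `w_end <= f_start` test in the script.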

I did all that with a mix of ChatGPT, Grok and Deepseek.

Here are all the episodes, batch processed:

https://limewire.com/d/9JGEX#qH5IhI1Y1x

Keep in mind that my SE settings are different from yours, so the formatting might not be 100% to your liking.

Also, it seems that you don't have exactly the same version, so you will likely need to sync the start of the subs.

Last edited by robena; 29th Oct 2025 at 20:47.

29th Oct 2025 21:07 #67
sam12345
Member

Join Date
Oct 2025
No. The lines are too long to read. Update the code with:

--max_line_count 2 ^
--max_line_width 36 ^

29th Oct 2025 21:49 #68
robena
Member

Join Date
May 2008

Location
France
Not too many lines like that, but I'll try and compare the results, that's easy!

The way I did it with a REXX script, it's just a right click to do everything.

29th Oct 2025 22:25 #69
sam12345
Member

Join Date
Oct 2025
Updated Assembly Code [Assembly.c]
Perfect two-line breaks. No need for an SE batch touch-up.

Attached Files

Assembly.c.txt (20.7 KB, 4 views)

30th Oct 2025 02:01 #70
robena
Member

Join Date
May 2008

Location
France
I have not tested Assembly yet, but for whisper I am surprised by how much difference there is between --max_line_width 36 and --max_line_width 40: the change is much more than 4 characters per line.

I asked why, and got:

--max_line_width does not mean “no line may be longer than N characters”.
It tells Faster-Whisper the target width that the line-breaker tries to stay under while it is splitting a segment into subtitle lines.
Because the breaker also respects sentence boundaries, words that are already > N, and minimum-line-length rules, you will still see lines that are much longer than the value you passed – especially when you raise it from 36 to 40.
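
Whether a given setting really keeps the lines short is easy to check on the resulting file. A minimal sketch, assuming a plain SRT layout (cue number, timestamp line, text lines, blank line); `long_lines` is a hypothetical helper, and the sample cues and timings are made up.

```python
from typing import List, Tuple

def long_lines(srt_text: str, max_width: int = 36) -> List[Tuple[int, str]]:
    """Return (file line number, text) for subtitle lines longer than max_width."""
    found = []
    for num, line in enumerate(srt_text.splitlines(), 1):
        text = line.strip()
        if not text or text.isdigit() or "-->" in text:
            continue  # skip blank lines, cue numbers and timestamp lines
        if len(text) > max_width:
            found.append((num, text))
    return found

sample = "\n".join([
    "1",
    "00:47:42,507 --> 00:47:45,900",
    "I always believed that I",
    "had a future in this country.",
    "",
    "2",
    "00:47:46,000 --> 00:47:48,000",
    "This single line is clearly longer than thirty-six characters.",
])

for num, text in long_lines(sample, max_width=36):
    print(f"line {num}: {len(text)} chars: {text}")
```

Running it on the output of two different --max_line_width values gives a quick count of how many lines still exceed the target. Note that a subtitle line consisting only of digits would be skipped by this simple check.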

Thanks for pointing it out, I would never have thought by myself that it would make the subs that much better.

Here they are:

https://limewire.com/d/ZpjSg#tNvqHhY3Gl

Whisper gives much more natural looking subs than Assembly. opensubtitles.org rejects Assembly ones.

Edit: I'll reserve judgment until I test your version!

Edit edit: I get "Failed to upload file." with your version. Firewall was open. Don't waste time on it for me, I'll use whisper from now on.

Edit edit edit: stupid, I forgot to update the API key!!!

Last edited by robena; 30th Oct 2025 at 03:50.

30th Oct 2025 02:45 #71
robena
Member

Join Date
May 2008

Location
France
Sam,

I remux my subs using a routine that detects the aspect ratio and creates PGS files that are located inside the active video region, just a few pixels above the black bar:

[Attachment 89467]

Interested?
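
The routine itself is not shown in this thread. Purely as an illustration of the geometry involved, the vertical position can be worked out from the frame size and the aspect ratio of the picture. Everything here is an assumption: the function name, the 8-pixel margin, and the idea that the black bars are symmetric.

```python
def subtitle_bottom_y(frame_w: int, frame_h: int, content_aspect: float,
                      margin_px: int = 8) -> int:
    """Y coordinate (from the top) of the subtitle's bottom edge,
    placed margin_px above the lower black bar."""
    # Height of the active picture inside the encoded frame.
    active_h = min(frame_h, round(frame_w / content_aspect))
    # Assume the letterbox bars are split evenly between top and bottom.
    bar_h = (frame_h - active_h) // 2
    return frame_h - bar_h - margin_px

# 2.39:1 picture letterboxed in a 1920x1080 frame, and a full-frame 16:9 picture.
print(subtitle_bottom_y(1920, 1080, 2.39))
print(subtitle_bottom_y(1920, 1080, 16 / 9))
```

For a full-frame picture the bar height is zero, so the subtitle simply sits the margin above the bottom of the frame.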

30th Oct 2025 03:09 #72
sam12345
Member

Join Date
Oct 2025
Of course, YES.

Originally Posted by robena

The way I did it with a REXX script, it's just a right click to do everything.

Kindly PM the code that you used for Faster-Whisper-XXL.

30th Oct 2025 05:49 #73
sam12345
Member

Join Date
Oct 2025
Tehran [Season 03] KAN (Color-Yellow) ENG-NON Hi

subs uploaded > opensubtitles.org [uploader-SamGer]

Last edited by sam12345; 30th Oct 2025 at 06:48.

30th Oct 2025 10:45 #74
robena
Member

Join Date
May 2008

Location
France
Originally Posted by sam12345

Tehran [Season 03] KAN (Color-Yellow) ENG-NON Hi

subs uploaded > opensubtitles.org [uploader-SamGer]

Waiting for the UHD version...
