VideoHelp Forum
  1. Hi,

    I extracted the CC from a video file, but the beginning of every subtitle repeats the text of the previous one:

    Code:
    12
    00:00:18,085 --> 00:00:20,187
    un jeu terriblement intrigant,
    délicieusement tordu
    et dangereusement addictif,
    
    13
    00:00:20,187 --> 00:00:21,922
    délicieusement tordu
    et dangereusement addictif,
    un jeu à grand déploiement
    
    14
    00:00:21,922 --> 00:00:23,724
    et dangereusement addictif,
    un jeu à grand déploiement
    qui a pris le monde entier
    
    15
    00:00:23,724 --> 00:00:25,526
    un jeu à grand déploiement
    qui a pris le monde entier
    par surprise, et qui débarque
    
    16
    00:00:25,526 --> 00:00:27,327
    qui a pris le monde entier
    par surprise, et qui débarque
    maintenant ici, chez nous.
    
    17
    00:00:27,327 --> 00:00:29,129
    \hpar surprise, et qui débarque
    \hmaintenant ici, chez nous.
    Bienvenue dans mon manoir.
    
    18
    00:00:29,129 --> 00:00:30,163
    \hmaintenant ici, chez nous.
    Bienvenue dans mon manoir.
    - Le jeu, il est
    So, is it possible to "clean" the subtitles? "Merge lines with same text" in Subtitle Edit doesn't remove all of it.
  2. 2nHxWW6GkN1l916N3ayz8HQoi (Feels Good Man)
    Join Date: Jan 2024
    Location: Pepe Island
    One way to do it is with a custom script. Based on the sample you posted, could you show what the desired output would look like?
    --[----->+<]>.++++++++++++.---.--------.
    [*drm mass downloader: widefrog*]~~~[*how to make your own mass downloader: guide*]
    I think deleting the first two lines of every subtitle except the first would do it. For example:
    Code:
    12
    00:00:18,085 --> 00:00:20,187
    un jeu terriblement intrigant,
    délicieusement tordu
    et dangereusement addictif,
    
    13
    00:00:20,187 --> 00:00:21,922
    un jeu à grand déploiement
    
    14
    00:00:21,922 --> 00:00:23,724
    qui a pris le monde entier
    
    15
    00:00:23,724 --> 00:00:25,526
    par surprise, et qui débarque
    
    16
    00:00:25,526 --> 00:00:27,327
    maintenant ici, chez nous.
    
    17
    00:00:27,327 --> 00:00:29,129
    Bienvenue dans mon manoir.
    
    18
    00:00:29,129 --> 00:00:30,163
    - Le jeu, il est
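    The rule described above can be sketched in a few lines of Python. This is a minimal sketch, assuming each cue has already been split into a list of its text lines and that every cue after the first repeats exactly two lines of the previous one; `drop_leading_lines` is a hypothetical helper name. A cue that repeats fewer lines (for example after a pause) would lose real text this way, so checking the actual overlap between consecutive cues is safer.

```python
def drop_leading_lines(cues, n=2):
    """Keep the first cue whole and drop the first n lines of every later cue.

    Assumes (hypothetically) that every cue after the first repeats exactly
    n lines of the previous cue, as in the sample above.
    """
    if not cues:
        return []
    return [list(cues[0])] + [list(lines[n:]) for lines in cues[1:]]


# Cues 12-14 from the sample, one list of lines per cue
cues = [
    ["un jeu terriblement intrigant,", "délicieusement tordu", "et dangereusement addictif,"],
    ["délicieusement tordu", "et dangereusement addictif,", "un jeu à grand déploiement"],
    ["et dangereusement addictif,", "un jeu à grand déploiement", "qui a pris le monde entier"],
]
cleaned = drop_leading_lines(cues)
# cleaned[1] == ["un jeu à grand déploiement"]
# cleaned[2] == ["qui a pris le monde entier"]
```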
  4. Member
    Join Date: Mar 2021
    Location: Israel
    Use Subtitle Edit to replace the repeated sentence (assuming it is not needed):
    Edit > Replace > Find what (paste: un jeu terriblement intrigant,) > Replace with: (leave blank)
    Then click either Replace or Replace All.

    Repeat if you have other repeated irrelevant sentences
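    For reference, the same one-phrase-at-a-time Replace All can be sketched in Python on a plain list of subtitle lines. This is a minimal sketch, not Subtitle Edit's code: `remove_phrase` is a hypothetical helper, it matches the phrase literally anywhere in a line, and it drops lines left empty (Subtitle Edit may handle empty lines differently). It also makes the limitation plain: it has to be run again for every repeated phrase.

```python
def remove_phrase(lines, phrase):
    """Delete every occurrence of a literal phrase from each line (like
    Replace All with an empty replacement), then drop lines left empty."""
    result = []
    for line in lines:
        remaining = line.replace(phrase, "").strip()
        if remaining:
            result.append(remaining)
    return result


lines = ["un jeu terriblement intrigant,", "délicieusement tordu", "et dangereusement addictif,"]
result = remove_phrase(lines, "un jeu terriblement intrigant,")
# result == ["délicieusement tordu", "et dangereusement addictif,"]
```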
  5. The issue is that ALL sentences are repeated in the next line.
  6. 2nHxWW6GkN1l916N3ayz8HQoi (Feels Good Man)
    Join Date: Jan 2024
    Location: Pepe Island
    I almost wrote a solution, but I don't understand what this is:
    Code:
    16
    00:00:25,526 --> 00:00:27,327
    qui a pris le monde entier
    par surprise, et qui débarque
    maintenant ici, chez nous.
    
    17
    00:00:27,327 --> 00:00:29,129
    \hpar surprise, et qui débarque
    \hmaintenant ici, chez nous.
    Bienvenue dans mon manoir.
    \h ???

    Edit: No idea how that got there, but I just removed it. The script is:

    Code:
    import re
    
    import pysubs2
    
    
    def remove_intersection(list1, list2):
        """Drop from list2 its longest prefix that is also a suffix of list1."""
        if len(list1) == 0 or len(list2) == 0:
            return list1, list2
    
        l1 = list(reversed(list1))
        # Every non-empty prefix of list2
        l2 = [list2[0:i2] for i2 in range(1, len(list2) + 1)]
    
        max_len = 0
        for i1 in range(1, len(l1) + 1):
            # The last i1 lines of list1, back in their original order
            l0 = list(reversed(l1[0:i1]))
    
            if l0 in l2:
                max_len = len(l0)
    
        if max_len == 0:
            return list1, list2
        return list1, list2[max_len:]
    
    
    if __name__ == '__main__':
        subs = pysubs2.load("input.srt", encoding="utf-8")
        # The first subtitle has nothing before it, so it is kept as-is
        new_subs = [subs[0]]
    
        for index in range(0, len(subs) - 1):
            # Split both subtitles into lines (pysubs2 stores line breaks as \N)
            # and strip backslash codes such as \h so repeated lines compare equal
            s1 = subs[index]
            d1 = re.split("\\\\N", s1.text, flags=re.IGNORECASE)
            d1 = [re.sub("\\\\[A-Z]", "", dialogue, flags=re.IGNORECASE) for dialogue in d1]
            s2 = subs[index + 1]
            d2 = re.split("\\\\N", s2.text, flags=re.IGNORECASE)
            d2 = [re.sub("\\\\[A-Z]", "", dialogue, flags=re.IGNORECASE) for dialogue in d2]
    
            # Keep only the lines of the next subtitle that are new
            lst1, lst2 = remove_intersection(d1, d2)
            new_sub = subs[index + 1].copy()
            new_sub.text = "\\N".join(lst2)
            new_subs.append(new_sub)
    
        for index in range(0, len(subs)):
            subs[index] = new_subs[index]
        subs.save("output.srt")
    Content of input.srt
    Code:
    12
    00:00:18,085 --> 00:00:20,187
    un jeu terriblement intrigant,
    délicieusement tordu
    et dangereusement addictif,
    
    13
    00:00:20,187 --> 00:00:21,922
    délicieusement tordu
    et dangereusement addictif,
    un jeu à grand déploiement
    
    14
    00:00:21,922 --> 00:00:23,724
    et dangereusement addictif,
    un jeu à grand déploiement
    qui a pris le monde entier
    
    15
    00:00:23,724 --> 00:00:25,526
    un jeu à grand déploiement
    qui a pris le monde entier
    par surprise, et qui débarque
    
    16
    00:00:25,526 --> 00:00:27,327
    qui a pris le monde entier
    par surprise, et qui débarque
    maintenant ici, chez nous.
    
    17
    00:00:27,327 --> 00:00:29,129
    \hpar surprise, et qui débarque
    \hmaintenant ici, chez nous.
    Bienvenue dans mon manoir.
    
    18
    00:00:29,129 --> 00:00:30,163
    \hmaintenant ici, chez nous.
    Bienvenue dans mon manoir.
    - Le jeu, il est
    Content of output.srt
    Code:
    1
    00:00:18,085 --> 00:00:20,187
    un jeu terriblement intrigant,
    délicieusement tordu
    et dangereusement addictif,
    
    2
    00:00:20,187 --> 00:00:21,922
    un jeu à grand déploiement
    
    3
    00:00:21,922 --> 00:00:23,724
    qui a pris le monde entier
    
    4
    00:00:23,724 --> 00:00:25,526
    par surprise, et qui débarque
    
    5
    00:00:25,526 --> 00:00:27,327
    maintenant ici, chez nous.
    
    6
    00:00:27,327 --> 00:00:29,129
    Bienvenue dans mon manoir.
    
    7
    00:00:29,129 --> 00:00:30,163
    - Le jeu, il est
    Install Python
    https://www.python.org/downloads/
    and pysubs2:
    Code:
    pip install pysubs2
    Put your subtitles in a file named input.srt, save the script as script.py in the same folder, and run it from there with:
    Code:
    python script.py
    Last edited by 2nHxWW6GkN1l916N3ayz8HQoi; 23rd Aug 2024 at 07:24.
  7. Member
    Join Date: Mar 2021
    Location: Israel
    Originally Posted by Hakunamatata67
    The issue is that ALL sentences are repeated in the next line.
    Yeah, I realized that.
    Can you tell us which application you used to extract the subtitles?
    Perhaps a different application would have given you a better result.
    Try loading the video file in Subtitle Edit. If it finds subtitles, it will either OCR them or display them all immediately.
  8. Originally Posted by 2nHxWW6GkN1l916N3ayz8HQoi
    I almost wrote a solution, but I don't understand what this is

    Wow, thank you so much! I'll try it.
  9. Originally Posted by Subtitles
    Originally Posted by Hakunamatata67
    The issue is that ALL sentences are repeated in the next line.
    Yeah, I realized that.
    Can you tell us which application you used to extract the subtitles?
    Perhaps a different application would have given you a better result.
    Try loading the video file in Subtitle Edit. If it finds subtitles, it will either OCR them or display them all immediately.
    The CC subtitles were extracted from an MKV file, as in this topic. I had to use clever FFmpeg-GUI because gMKVExtractGui didn't work.
  10. Member
    Join Date: Apr 2007
    Location: Australia
    CCExtractor hasn't been updated for a few years, but I find it still works for me most of the time.
    https://www.videohelp.com/software/ccextractor
    This is the command line I use.
    ccext.cmd
    Code:
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" -o "my video.srt"
    Cheers.
  11. Member
    Join Date: Aug 2006
    Location: United States
    I think the video could have "roll-up" style closed captions. Although I have no experience with the command-line version of CCExtractor, it has some settings that deal with roll-up captions.
    Ignore list: hello_hello, tried, TechLord, Snoopy329
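    Roll-up captions would explain the pattern in the first post. In a simplified model, assuming a three-row caption window (each cue in the sample holds three lines), every new line is added at the bottom of the window and the oldest row scrolls off, so writing each screen out as its own subtitle repeats the previous rows. `roll_up` is a hypothetical name, and this is only an illustration, not how CCExtractor decodes captions.

```python
def roll_up(lines, depth=3):
    """What a roll-up window shows after each new line arrives.

    The newest line is added at the bottom; once the window holds
    `depth` rows, the oldest one scrolls off the top.
    """
    window = []
    screens = []
    for line in lines:
        # Build a new list each time so earlier screens are not modified
        window = (window + [line])[-depth:]
        screens.append(window)
    return screens


spoken = [
    "un jeu terriblement intrigant,",
    "délicieusement tordu",
    "et dangereusement addictif,",
    "un jeu à grand déploiement",
]
screens = roll_up(spoken)
# screens[3] == ["délicieusement tordu", "et dangereusement addictif,", "un jeu à grand déploiement"]
```

    In this model only the last row of each screen is new: `[s[-1] for s in roll_up(spoken)]` gives back `spoken`, which is what turns the repeated screens back into the original sequence of lines.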
  12. Member
    Join Date: Apr 2007
    Location: Australia
    Hakunamatata67,
    You can try this set of commands and see what you get. The first one makes for a much LARGER file.

    ccext2.cmd
    Code:
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --dru -o "my video-direct_rollup.srt"
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --norollup -o "my video-norollup.srt"
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --ru1 -o "my video-ru1.srt"
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --ru2 -o "my video-ru2.srt"
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --ru3 -o "my video-ru3.srt"
    Cheers.
    Last edited by pcspeak; 23rd Aug 2024 at 21:53. Reason: remove spaces; address the post to the right person. :-)
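    The five commands above differ only in the roll-up option and the output file name, so they can also be generated from a list instead of a .cmd file. A minimal sketch in Python (the language of the earlier script), assuming the same CCExtractor install path as the posts above and using only the options shown in them; `build_commands` and `run_all` are hypothetical helper names, and the exit codes are only printed, not interpreted.

```python
import subprocess
from pathlib import Path

# Install path used in the commands above; change it to match your own install
CCEXTRACTOR = r"C:\CCExtractor 0.94\ccextractorwinfull.exe"

# The roll-up options tried above, each with the suffix used in its output name
VARIANTS = {
    "--dru": "direct_rollup",
    "--norollup": "norollup",
    "--ru1": "ru1",
    "--ru2": "ru2",
    "--ru3": "ru3",
}


def build_commands(video, exe=CCEXTRACTOR):
    """One argument list per option, e.g. my video.mkv -> my video-norollup.srt."""
    video = Path(video)
    return [
        [exe, str(video), option, "-o", str(video.with_name(f"{video.stem}-{suffix}.srt"))]
        for option, suffix in VARIANTS.items()
    ]


def run_all(video, exe=CCEXTRACTOR):
    """Run every variant in turn and report each exit code."""
    for cmd in build_commands(video, exe):
        # A list of arguments needs no extra quoting, even with spaces in paths
        result = subprocess.run(cmd)
        print(cmd[2], "->", cmd[-1], "exit code", result.returncode)
```

    For example, `run_all("my video.mkv")` should write the same five output names as ccext2.cmd next to the video.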
  13. Member
    Join Date: Mar 2021
    Location: Israel
    Have you tried loading the video file in Subtitle Edit? If it finds subtitles, it will either OCR them or display them all immediately.
    File > Open > select the file
    You can also use Subtitle Edit to create the subtitles from scratch:
    Video > Audio to text (Whisper)
    If the audio quality is good, you can expect good results. Any errors can then be corrected using the file you already have.
  14. Originally Posted by pcspeak
    Hakunamatata67,
    You can try this set of commands and see what you get. The first one makes for a much LARGER file.

    ccext2.cmd
    Code:
    "C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --norollup -o "my video-norollup.srt"
    Cheers.
    Thanks, this one worked for me.


