Hi,
I extracted the CC from a video file, but every subtitle begins with the same text as the previous one:
Code:
12
00:00:18,085 --> 00:00:20,187
un jeu terriblement intrigant,
délicieusement tordu
et dangereusement addictif,

13
00:00:20,187 --> 00:00:21,922
délicieusement tordu
et dangereusement addictif,
un jeu à grand déploiement

14
00:00:21,922 --> 00:00:23,724
et dangereusement addictif,
un jeu à grand déploiement
qui a pris le monde entier

15
00:00:23,724 --> 00:00:25,526
un jeu à grand déploiement
qui a pris le monde entier
par surprise, et qui débarque

16
00:00:25,526 --> 00:00:27,327
qui a pris le monde entier
par surprise, et qui débarque
maintenant ici, chez nous.

17
00:00:27,327 --> 00:00:29,129
\hpar surprise, et qui débarque
\hmaintenant ici, chez nous.
Bienvenue dans mon manoir.

18
00:00:29,129 --> 00:00:30,163
\hmaintenant ici, chez nous.
Bienvenue dans mon manoir.
- Le jeu, il est

So is it possible to "clean" the subtitles? "Merge lines with same text" in Subtitle Edit doesn't remove all of it.
-
One way to do it is using a custom script. From the sample you posted, could you write what a desired outcome would look like?
--[----->+<]>.++++++++++++.---.--------.
[*drm mass downloader: widefrog*]~~~[*how to make your own mass downloader: guide*] -
I think deleting the repeated lines at the start of each subtitle (the first two) would do. For example:
Code:
12
00:00:18,085 --> 00:00:20,187
un jeu terriblement intrigant,
délicieusement tordu
et dangereusement addictif,

13
00:00:20,187 --> 00:00:21,922
un jeu à grand déploiement

14
00:00:21,922 --> 00:00:23,724
qui a pris le monde entier

15
00:00:23,724 --> 00:00:25,526
par surprise, et qui débarque

16
00:00:25,526 --> 00:00:27,327
maintenant ici, chez nous.

17
00:00:27,327 --> 00:00:29,129
Bienvenue dans mon manoir.

18
00:00:29,129 --> 00:00:30,163
- Le jeu, il est
-
Use Subtitle Edit to replace the repeated sentence (assuming it is not relevant):
Edit > Replace > Find what (paste: un jeu terriblement intrigant,) > Replace with: (leave blank)
Then click either Replace or Replace All.
Repeat for any other repeated, irrelevant sentences. -
I almost wrote a solution, but I don't understand what this is:
Code:
16
00:00:25,526 --> 00:00:27,327
qui a pris le monde entier
par surprise, et qui débarque
maintenant ici, chez nous.

17
00:00:27,327 --> 00:00:29,129
\hpar surprise, et qui débarque
\hmaintenant ici, chez nous.
Bienvenue dans mon manoir.
Edit: No idea how that got there, but I just removed it. The script is:
Code:
import re

import pysubs2


def remove_intersection(list1, list2):
    """Drop from the start of list2 the lines that repeat the end of list1."""
    if len(list1) == 0 or len(list2) == 0:
        return list1, list2

    l1 = list(reversed(list1))
    # Every prefix of list2, shortest first.
    l2 = [list2[0:i2] for i2 in range(1, len(list2) + 1)]
    max_len = 0
    for i1 in range(1, len(l1) + 1):
        # The last i1 lines of list1, in their original order.
        l0 = list(reversed(l1[0:i1]))
        if l0 in l2:
            max_len = len(l0)

    if max_len == 0:
        return list1, list2
    return list1, list2[max_len:]


if __name__ == '__main__':
    subs = pysubs2.load("input.srt", encoding="utf-8")
    new_subs = [subs[0]]
    for index in range(0, len(subs) - 1):
        # Split each subtitle into its lines and strip backslash tags such as \h.
        s1 = subs[index]
        d1 = re.split("\\\\N", s1.text, flags=re.IGNORECASE)
        d1 = [re.sub("\\\\[A-Z]", "", dialogue, flags=re.IGNORECASE) for dialogue in d1]

        s2 = subs[index + 1]
        d2 = re.split("\\\\N", s2.text, flags=re.IGNORECASE)
        d2 = [re.sub("\\\\[A-Z]", "", dialogue, flags=re.IGNORECASE) for dialogue in d2]

        # Keep only the lines of the next subtitle that are not carried over.
        lst1, lst2 = remove_intersection(d1, d2)
        new_sub = subs[index + 1].copy()
        new_sub.text = "\\N".join(lst2)
        new_subs.append(new_sub)

    for index in range(0, len(subs)):
        subs[index] = new_subs[index]
    subs.save("output.srt")
Input:
Code:
12
00:00:18,085 --> 00:00:20,187
un jeu terriblement intrigant,
délicieusement tordu
et dangereusement addictif,

13
00:00:20,187 --> 00:00:21,922
délicieusement tordu
et dangereusement addictif,
un jeu à grand déploiement

14
00:00:21,922 --> 00:00:23,724
et dangereusement addictif,
un jeu à grand déploiement
qui a pris le monde entier

15
00:00:23,724 --> 00:00:25,526
un jeu à grand déploiement
qui a pris le monde entier
par surprise, et qui débarque

16
00:00:25,526 --> 00:00:27,327
qui a pris le monde entier
par surprise, et qui débarque
maintenant ici, chez nous.

17
00:00:27,327 --> 00:00:29,129
\hpar surprise, et qui débarque
\hmaintenant ici, chez nous.
Bienvenue dans mon manoir.

18
00:00:29,129 --> 00:00:30,163
\hmaintenant ici, chez nous.
Bienvenue dans mon manoir.
- Le jeu, il est

Output:
Code:
1
00:00:18,085 --> 00:00:20,187
un jeu terriblement intrigant,
délicieusement tordu
et dangereusement addictif,

2
00:00:20,187 --> 00:00:21,922
un jeu à grand déploiement

3
00:00:21,922 --> 00:00:23,724
qui a pris le monde entier

4
00:00:23,724 --> 00:00:25,526
par surprise, et qui débarque

5
00:00:25,526 --> 00:00:27,327
maintenant ici, chez nous.

6
00:00:27,327 --> 00:00:29,129
Bienvenue dans mon manoir.

7
00:00:29,129 --> 00:00:30,163
- Le jeu, il est
To run it, install Python:
https://www.python.org/downloads/
and pysubs2:
Code:pip install pysubs2
Then run the script:
Code:python script.py
Last edited by 2nHxWW6GkN1l916N3ayz8HQoi; 23rd Aug 2024 at 07:24.
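For anyone who would rather not install pysubs2, here is a minimal standard-library sketch of the same idea: drop the lines at the start of each subtitle that repeat the end of the previous one. It assumes the SRT file keeps each caption row on its own line and separates subtitles with a blank line. The names `parse_srt`, `overlap` and `dedupe_rollup` are made up for this example. The `\h` in the sample is likely the ASS/SSA hard-space escape left behind by the extraction tool, so the sketch simply strips it before comparing.

```python
import re

TIMING = re.compile(r"\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def parse_srt(text):
    """Return a list of (timing, lines) tuples from SRT text."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        for i, row in enumerate(rows):
            if TIMING.search(row):
                # Everything after the timing row is caption text;
                # literal "\h" markers are removed so lines compare equal.
                lines = [r.replace("\\h", "").strip() for r in rows[i + 1:]]
                cues.append((row.strip(), lines))
                break
    return cues


def overlap(prev, cur):
    """Length of the longest suffix of prev that is also a prefix of cur."""
    for n in range(min(len(prev), len(cur)), 0, -1):
        if prev[-n:] == cur[:n]:
            return n
    return 0


def dedupe_rollup(text):
    """Return SRT text with carried-over lines removed and cues renumbered."""
    out = []
    prev = []
    for timing, lines in parse_srt(text):
        kept = lines[overlap(prev, lines):]
        prev = lines  # always compare against the ORIGINAL previous cue
        if kept:      # a cue made only of repeats is dropped entirely
            out.append((timing, kept))
    return "\n\n".join(
        f"{i}\n{timing}\n" + "\n".join(kept)
        for i, (timing, kept) in enumerate(out, start=1)
    ) + "\n"
```

Usage would be something like reading `input.srt` as UTF-8, passing the text to `dedupe_rollup`, and writing the result to `output.srt`. Unlike the pysubs2 script, this sketch drops a subtitle completely if nothing new is left in it.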
-
Yeah, I realized that.
Can you tell us which application you used to extract the subtitles?
Perhaps a different application would have given you a better result.
Try loading the video file in Subtitle Edit. If it finds subtitles, it will either OCR them or display them all immediately. -
-
The CC subtitles were extracted from an MKV file, as in this topic. I had to use clever FFmpeg-GUI because gMKVExtractGui didn't work.
-
The software hasn't been updated for a few years.
I find it still works for me most of the time.
https://www.videohelp.com/software/ccextractor
This is the command line I use.
ccext.cmd
Code:"C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" -o "my video.srt"
-
I think the video could have "rollup style" closed captions. I have no experience with the command-line version of CCExtractor, but it has some settings that deal with rollup-style captions.
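To illustrate why roll-up captions would produce the overlap in the sample, here is a small sketch that simulates a roll-up window. It assumes a window of three rows, which matches the sample posted in this thread; `rollup_cues` is a made-up name, not part of any tool.

```python
from collections import deque


def rollup_cues(rows, window=3):
    """Yield what is on screen after each new row arrives in a roll-up window."""
    visible = deque(maxlen=window)
    for row in rows:
        visible.append(row)  # the oldest row scrolls off automatically
        yield list(visible)


if __name__ == "__main__":
    for cue in rollup_cues(["line 1", "line 2", "line 3", "line 4"]):
        print(cue)
```

Once the window is full, every snapshot starts with the last two rows of the previous one, which is the pattern in the extracted SRT.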
-
Hakunamatata67
You can try this collection and see what you get. The first attempt makes for a much LARGER file.
ccext2.cmd
Code:
"C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --dru -o "my video-direct_rollup.srt"
"C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --norollup -o "my video-norollup.srt"
"C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --ru1 -o "my video-ru1.srt"
"C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --ru2 -o "my video-ru2.srt"
"C:\CCExtractor 0.94\ccextractorwinfull.exe" "my video.mkv" --ru3 -o "my video-ru3.srt"
Last edited by pcspeak; 23rd Aug 2024 at 21:53. Reason: remove spaces; address the post to the right person. :-)
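If a batch file is not convenient, the same five attempts could be driven from Python. This is only a sketch: the flags and output names are copied from ccext2.cmd above, the executable path is whatever applies on your machine, and `build_commands` and `run_all` are made-up helper names.

```python
import subprocess

# Flags and output suffixes copied from ccext2.cmd in this thread.
VARIANTS = [
    ("--dru", "direct_rollup"),
    ("--norollup", "norollup"),
    ("--ru1", "ru1"),
    ("--ru2", "ru2"),
    ("--ru3", "ru3"),
]


def build_commands(exe, video):
    """Return one argument list per variant, without running anything."""
    stem = video.rsplit(".", 1)[0]  # "my video.mkv" -> "my video"
    return [
        [exe, video, flag, "-o", f"{stem}-{suffix}.srt"]
        for flag, suffix in VARIANTS
    ]


def run_all(exe, video):
    """Run every variant in turn; a failure does not stop the remaining ones."""
    for cmd in build_commands(exe, video):
        subprocess.run(cmd, check=False)
```

Passing the arguments as a list avoids the quoting problems that file names with spaces cause on the command line.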
-
Have you tried loading the video file in Subtitle Edit? If it finds subtitles, it will either OCR them or display them all immediately.
File > Open > select the file
You can also use Subtitle Edit to create the subtitles from scratch:
Video > Audio to text (Whisper)
If the audio is of good quality, you can expect good results. Any errors can be corrected using the file that you already have.