VideoHelp Forum




  1. There are batch files to download video and audio streams, but I haven't seen any for subtitle streams.

    In order to download subtitles, it can sometimes take a long time to write the yt-dlp command lines by hand.

    To make things simpler and faster, I need a batch file with two functions, just like the video/audio downloading batch files:

    1- to list all the subtitles from a specific MPD link given
    2- to choose subtitles for downloading and saving to a specific folder location
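    For what it's worth, those two functions can be sketched in Python rather than a .bat (a sketch only, assuming yt-dlp is on PATH; --list-subs, --skip-download, --write-subs, --sub-langs and -P are yt-dlp's documented options; the folder path is just an example):

    ```python
    import subprocess

    def list_subs_cmd(mpd_url):
        # function 1: command that lists every subtitle track in the manifest
        return ["yt-dlp", "--list-subs", mpd_url]

    def get_subs_cmd(mpd_url, langs, folder):
        # function 2: command that saves only the chosen languages to a folder
        return ["yt-dlp", "--skip-download", "--write-subs",
                "--sub-langs", langs, "-P", folder, mpd_url]

    def run(cmd):
        # shell out to yt-dlp and report its exit status
        return subprocess.run(cmd).returncode
    ```

    Run `run(list_subs_cmd(url))` first, pick a language code from the printout, then `run(get_subs_cmd(url, "en,fr", r"C:\subs"))`.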
  2. Glad to see another stupid troll has been banned; its name was @Gen99, and it tried to hijack this thread with silly links.

    I am sure it will get another nickname and continue to attack other people who need help, because that's its way of life. It loves to give people headaches. I say "it", not "he" or "she" - not a human, just a meaningless creature.
    Last edited by ridibunda; 6th Apr 2022 at 08:44.
  3. Using a batch file will be an extra step (unless you want all of the subtitles).

    You will have to use yt-dlp to list the available subtitle formats.

    Then you will have to select the ones you want.

    Then you will have to write a bat.

    https://github.com/yt-dlp/yt-dlp#subtitle-options

    Why don't you just incorporate the subtitle download along with the video/audio you want to download?

    It's easy to do, and anyone will help you with the syntax.
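    That combined call can be sketched as a single yt-dlp invocation from Python (a sketch, assuming yt-dlp is on PATH; the flags come from the subtitle-options page linked above, and the default language is just an example):

    ```python
    import subprocess

    def combined_cmd(url, langs="en"):
        # one call: video + audio + the chosen subtitle tracks, subs muxed in
        return ["yt-dlp", "--write-subs", "--sub-langs", langs,
                "--embed-subs", url]

    def download(url, langs="en"):
        return subprocess.run(combined_cmd(url, langs)).returncode
    ```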
    Last edited by codehound; 3rd Apr 2022 at 14:32.
  4. Originally Posted by codehound View Post
    Using a batch file will be an extra step (unless you want all of the subtitles).

    You will have to use yt-dlp to list the available subtitle formats.

    Then you will have to select the ones you want.

    Then you will have to write a bat.

    https://github.com/yt-dlp/yt-dlp#subtitle-options

    Why don't you just incorporate the subtitle download along with the video/audio you want to download?

    It's easy to do, and anyone will help you with the syntax.
    "Easy" is relative.

    The well-known batch file "Encrypted Video Downloader II" does not have a subtitle-downloading option.

    I know I will have to use yt-dlp to list and download the subs. The question was how to make that faster with a batch file - "faster" meaning writing as few command lines as possible.

    Just double-click a batch file, paste the MPD link, let it list all the subtitles automatically, pick the one you need by typing "en" or "fr", and the download begins. That's what I want.
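    That paste-and-pick workflow can be sketched in Python instead of a .bat (a sketch only: the prompts and the save folder are assumptions, and yt-dlp must be on PATH):

    ```python
    import subprocess

    SAVE_DIR = r"C:\subs"  # example save location, change to taste

    def fetch_cmd(mpd_url, langs):
        # subtitles only, for the chosen language codes
        return ["yt-dlp", "--skip-download", "--write-subs",
                "--sub-langs", langs, "-P", SAVE_DIR, mpd_url]

    def main():
        url = input("Paste MPD link: ").strip()
        subprocess.run(["yt-dlp", "--list-subs", url])   # list all subtitles
        langs = input('Language codes, e.g. "en" or "en,fr": ').strip()
        subprocess.run(fetch_cmd(url, langs))            # download begins
    ```

    Calling main() walks through exactly the steps described above: paste the link, read the listing, type the language codes.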
  5.
    You don't have to make a batch file. You can have Python create a list of URLs for yt-dlp to download one at a time.

    I have a program, getUK.py, which will greedily download from some of the main UK TV providers. Feel free to adapt it to your needs.

    Code:
    from bs4 import BeautifulSoup
    import requests
    import os
    import re
    import json
    
    ################
    # need yt-dlp and aria2c as external programs in PATH
    # pip package updates have been known to break this program!!
    # Some programmes on STV are now encrypted - this will not work with those
    #
    ################
    my_set = set() #  sets do not allow duplicates
    
    # thanks;-
    # https://hackersandslackers.com/extract-data-from-complex-json-python/
    #
    # must use json.loads to provide this fn with a dict object
    def json_extract(obj, key):
        """Recursively fetch values from nested JSON."""
        arr = []
    
        def extract(obj, arr, key):
            """Recursively search for values of key in JSON tree."""
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr
    
        values = extract(obj, arr, key)
        # return list object
        return values
    
    
    def getBBCMedia(url):
        # creates a set of urls for yt-dlp to use
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        domain = "https://" + url.split('/')[2]  # scheme + host, any path depth

        for link in soup.find_all('a'):
            if 'href' in link.attrs and 'episode' in link.attrs['href']:
                if re.search("/ad/", link.attrs['href']):
                    continue  # skip audio-described copies
                my_set.add(domain + link.attrs['href'])
        for x in my_set:
            print(x)
        getMedia(my_set)
    
    
    def getSTVSummaryMedia(url):
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        # stv links are relative, thus capture domain
        domain = ("https://" + url.split('/')[2])
        # specific json code on stv pages in __NEXT_DATA__ -- May change!!!!!
        my_json = json.loads(soup.find(id="__NEXT_DATA__").get_text())
        # print(f"json data collected = {my_json}")  # uncomment to check the json returned
        my_values = json_extract(my_json, 'link')
        # my_values is a list
        # remove summary urls and duplicate items in list by
        # adding to a set,  my_set
        for x in my_values:
            if re.search('summary', x):
                pass
            else:
                my_set.add(f'{domain}{x}')
        getMedia(my_set)
    
    
    def getITVMedia(url):
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        series = (url.split('/')[-2])
        for link in soup.find_all('a'):
            if 'href' in link.attrs and series in link.attrs['href']:
                if re.search("^.*facebook.*$", str(link.attrs)):
                    continue
                if re.search("^.*twitter.*$", str(link.attrs)):
                    continue
                # optionally skip links whose programme ref is too short:
                # if not re.search("[a-zA-Z0-9]{8,11}$", str(link.attrs)):
                #     continue
                my_set.add(link.attrs['href'])
    
        getMedia(my_set)
    
    
    def getMedia(my_set):
        for x in my_set:
            os.system(f'yt-dlp --downloader aria2c --convert-subs srt --embed-subs "{x}"')


    def getSingleMedia(url):
        os.system(f'yt-dlp --downloader aria2c --convert-subs srt --embed-subs "{url}"')
    
    
    def delineate(url):
        # route the url to the right site handler
        if re.search("bbc.co.uk", url):
            print("BBC media")
            return getBBCMedia(url)
        if re.search("itv.com", url):
            print('ITV media')
            return getITVMedia(url)
        if re.search("stv.tv", url) and re.search('summary', url):
            return getSTVSummaryMedia(url)
        print('getting single media')
        return getSingleMedia(url)
    
    
    if __name__ == '__main__':
        from sys import argv
        if argv and len(argv) > 1:
            # print(f"{argv[1]}")
            delineate(argv[1])
        else:
            print("Usage: getUK requires a URL to be passed as an argument \n \
            URLs currently processed are BBC, ITV  and STV ")
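    As a quick sanity check of the json_extract helper above, here it is standalone against a tiny nested structure:

    ```python
    def json_extract(obj, key):
        """Recursively fetch all values of `key` from nested JSON-like data."""
        arr = []

        def extract(obj, arr, key):
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr

        return extract(obj, arr, key)

    data = {"page": {"link": "/a"}, "items": [{"link": "/b"}, {"title": "x"}]}
    print(json_extract(data, "link"))  # → ['/a', '/b']
    ```

    This is the same walk getSTVSummaryMedia relies on to pull every 'link' value out of the page's __NEXT_DATA__ blob.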