VideoHelp Forum




  1. There are batch files to download video and audio streams, but I haven't seen any for subtitle streams.

    In order to download subtitles, it can sometimes take a long time to write the yt-dlp command lines by hand.

    To make things simpler and faster, I need a batch file with two functions, just like the video/audio downloading batch files:

    1- to list all the subtitles from a specific MPD link given
    2- to choose subtitles for downloading and saving to a specific folder location
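    For what it's worth, those two functions can be sketched in Python rather than a .bat (a sketch only, assuming yt-dlp is on PATH; --list-subs, --skip-download, --write-subs, --sub-langs and -P are yt-dlp's documented options; the folder path is just an example):

    ```python
    import subprocess

    def list_subs_cmd(mpd_url):
        # function 1: command that lists every subtitle track in the manifest
        return ["yt-dlp", "--list-subs", mpd_url]

    def get_subs_cmd(mpd_url, langs, folder):
        # function 2: command that saves only the chosen languages to a folder
        return ["yt-dlp", "--skip-download", "--write-subs",
                "--sub-langs", langs, "-P", folder, mpd_url]

    def run(cmd):
        # shell out to yt-dlp and report its exit status
        return subprocess.run(cmd).returncode
    ```

    Run `run(list_subs_cmd(url))` first, pick a language code from the printout, then `run(get_subs_cmd(url, "en,fr", r"C:\subs"))`.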
  2. Glad to see another stupid troll has been banned; its name was @Gen99, and it tried to hijack this thread with silly links.

    I am sure it will get another nickname and continue to attack other people who need help, because that's its way of life. It loves to give people headaches. I say "it", not "he" or "she" - not a human, just a meaningless creature.
    Last edited by ridibunda; 6th Apr 2022 at 08:44.
  3. Using a batch file will be an extra step (unless you want all of the subtitles).

    You will have to use yt-dlp to list the available subtitle formats.

    Then you will have to select the ones you want.

    Then you will have to write a bat.

    https://github.com/yt-dlp/yt-dlp#subtitle-options

    Why don't you just incorporate the subtitle download along with the video/audio you want to download?

    It's easy to do, and anyone will help you with the syntax.
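    That combined call can be sketched as a single yt-dlp invocation from Python (a sketch, assuming yt-dlp is on PATH; the flags come from the subtitle-options page linked above, and the default language is just an example):

    ```python
    import subprocess

    def combined_cmd(url, langs="en"):
        # one call: video + audio + the chosen subtitle tracks, subs muxed in
        return ["yt-dlp", "--write-subs", "--sub-langs", langs,
                "--embed-subs", url]

    def download(url, langs="en"):
        return subprocess.run(combined_cmd(url, langs)).returncode
    ```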
    Last edited by codehound; 3rd Apr 2022 at 14:32.
  4. Originally Posted by codehound View Post
    Using a batch file will be an extra step (unless you want all of the subtitles).

    You will have to use yt-dlp to list the available subtitle formats.

    Then you will have to select the ones you want.

    Then you will have to write a bat.

    https://github.com/yt-dlp/yt-dlp#subtitle-options

    Why don't you just incorporate the subtitle download along with the video/audio you want to download?

    It's easy to do, and anyone will help you with the syntax.
    "Easy" is relative.

    The well-known batch file "Encrypted Video Downloader II" does not have a subtitle-downloading option.

    I know I will have to use yt-dlp to list and download the subs. The question was how to make that faster with a batch file - "faster" meaning writing as few command lines as possible.

    Just double-click a batch file, paste the MPD link, let it list all the subtitles automatically, pick the one you need by typing "en" or "fr", and the download begins. That's what I want.
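    That paste-and-pick workflow can be sketched in Python instead of a .bat (a sketch only: the prompts and the save folder are assumptions, and yt-dlp must be on PATH):

    ```python
    import subprocess

    SAVE_DIR = r"C:\subs"  # example save location, change to taste

    def fetch_cmd(mpd_url, langs):
        # subtitles only, for the chosen language codes
        return ["yt-dlp", "--skip-download", "--write-subs",
                "--sub-langs", langs, "-P", SAVE_DIR, mpd_url]

    def main():
        url = input("Paste MPD link: ").strip()
        subprocess.run(["yt-dlp", "--list-subs", url])   # list all subtitles
        langs = input('Language codes, e.g. "en" or "en,fr": ').strip()
        subprocess.run(fetch_cmd(url, langs))            # download begins
    ```

    Calling main() walks through exactly the steps described above: paste the link, read the listing, type the language codes.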
  5.
    You don't have to make a batch file. You can have Python create a list of URLs for yt-dlp to download one at a time.

    I have a program, getUK.py, which will greedily download from some of the main UK TV providers. Feel free to adapt it to your needs.

    Code:
    from bs4 import BeautifulSoup
    import requests
    import os
    import re
    import json
    
    ################
    # need yt-dlp and aria2c as external programs in PATH
    # pip package updates have been known to break this program!!
    # Some programmes on STV are now encrypted - this will not work with those
    #
    ################
    my_set = set() #  sets do not allow duplicates
    
    # thanks;-
    # https://hackersandslackers.com/extract-data-from-complex-json-python/
    #
    # must use json.loads to provide this fn with a dict object
    def json_extract(obj, key):
        """Recursively fetch values from nested JSON."""
        arr = []
    
        def extract(obj, arr, key):
            """Recursively search for values of key in JSON tree."""
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr
    
        values = extract(obj, arr, key)
        # return list object
        return values
    
    
    def getBBCMedia(url):
        # creates a set of urls for yt-dlp to use
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        domain = "https://" + url.split('/')[2]  # scheme + host, any path depth

        for link in soup.find_all('a'):
            if 'href' in link.attrs and 'episode' in link.attrs['href']:
                if re.search("/ad/", link.attrs['href']):
                    continue  # skip audio-described copies
                my_set.add(domain + link.attrs['href'])
        for x in my_set:
            print(x)
        getMedia(my_set)
    
    
    def getSTVSummaryMedia(url):
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        # stv links are relative, thus capture domain
        domain = ("https://" + url.split('/')[2])
        # specific json code on stv pages in __NEXT_DATA__ -- May change!!!!!
        my_json = json.loads(soup.find(id="__NEXT_DATA__").get_text())
        # print(f"json data collected = {my_json}")  # uncomment to check the json returned
        my_values = json_extract(my_json, 'link')
        # my_values is a list
        # remove summary urls and duplicate items in list by
        # adding to a set,  my_set
        for x in my_values:
            if re.search('summary', x):
                pass
            else:
                my_set.add(f'{domain}{x}')
        getMedia(my_set)
    
    
    def getITVMedia(url):
        soup = BeautifulSoup(requests.get(url).text, 'lxml')
        series = (url.split('/')[-2])
        for link in soup.find_all('a'):
            if 'href' in link.attrs and series in link.attrs['href']:
                if re.search("^.*facebook.*$", str(link.attrs)):
                    continue
                if re.search("^.*twitter.*$", str(link.attrs)):
                    continue
                # optionally skip links whose programme ref is too short:
                # if not re.search("[a-zA-Z0-9]{8,11}$", str(link.attrs)):
                #     continue
                my_set.add(link.attrs['href'])
    
        getMedia(my_set)
    
    
    def getMedia(my_set):
        for x in my_set:
            os.system(f'yt-dlp --downloader aria2c --convert-subs srt --embed-subs "{x}"')


    def getSingleMedia(url):
        os.system(f'yt-dlp --downloader aria2c --convert-subs srt --embed-subs "{url}"')
    
    
    def delineate(url):
        # route the url to the right site handler
        if re.search("bbc.co.uk", url):
            print("BBC media")
            return getBBCMedia(url)
        if re.search("itv.com", url):
            print('ITV media')
            return getITVMedia(url)
        if re.search("stv.tv", url) and re.search('summary', url):
            return getSTVSummaryMedia(url)
        print('getting single media')
        return getSingleMedia(url)
    
    
    if __name__ == '__main__':
        from sys import argv
        if argv and len(argv) > 1:
            # print(f"{argv[1]}")
            delineate(argv[1])
        else:
            print("Usage: getUK requires a URL to be passed as an argument \n \
            URLs currently processed are BBC, ITV  and STV ")
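    As a quick sanity check of the json_extract helper above, here it is standalone against a tiny nested structure:

    ```python
    def json_extract(obj, key):
        """Recursively fetch all values of `key` from nested JSON-like data."""
        arr = []

        def extract(obj, arr, key):
            if isinstance(obj, dict):
                for k, v in obj.items():
                    if isinstance(v, (dict, list)):
                        extract(v, arr, key)
                    elif k == key:
                        arr.append(v)
            elif isinstance(obj, list):
                for item in obj:
                    extract(item, arr, key)
            return arr

        return extract(obj, arr, key)

    data = {"page": {"link": "/a"}, "items": [{"link": "/b"}, {"title": "x"}]}
    print(json_extract(data, "link"))  # → ['/a', '/b']
    ```

    This is the same walk getSTVSummaryMedia relies on to pull every 'link' value out of the page's __NEXT_DATA__ blob.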