VideoHelp Forum






Results 1 to 22 of 22
  1. Can anyone think of a way to get the 78 x 7-min videos from france.tv other than downloading them individually?

    It's the new "Once Upon a Time...the Objects" that is available ....

    I can manually download via the MPD/N_m3u8DL-RE:
    FF filter: regexp:hdnea
    CH filter: /hdnea/

    The video available is FHD, audio is only AAC LC and there are subs available ....

    (Noticed in the first few videos downloaded that, for some strange reason, the subs display
    <c.yellow> and <c.white> at the front of each sentence/line
    when manually downloading with N_m3u8DL-RE ....)

    Any ideas would be appreciated .....


    The MPD link has a limited life of 10 minutes and it also has an HMAC included ....

    It feels like Brian's task of writing 100x "Romani ite domum".
    Last edited by pssh; 7th May 2024 at 11:08.
    If I was in politics I make sure you drink plenty of beer
    and watch plenty of TV to keep you busy. | Data is the new oil.
  2. Feels Good Man 2nHxWW6GkN1l916N3ayz8HQoi's Avatar
    Join Date
    Jan 2024
    Location
    Pepe Island
    Search Comp PM
    You could first automate the downloading of an individual video from that site. Then you scrape all the links from that page of yours and feed it to your script.
    --[----->+<]>.++++++++++++.---.--------.
    [*drm mass downloader: widefrog*]~~~~~~~~~~~[*how to make your own mass downloader: guide*]
  3. The first part is beyond my skill set, and so is the second .....

    Just tried adding "--sub-format SRT" to solve the subs,
    but that did not work ..... looking inside the manifest.mpd,
    the subs are "wvtt" ....


    Code:
    N_m3u8DL-RE -M mp4 -mt -sv best -sa all -ss all --sub-format SRT `
    --save-name Il.etait.une.fois...ces.droles.dobjets.S01E02.Le.ballon.de.football.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264 `
    "https://cloud55~acl=%2f*~hmac=123456.....................abcdef"
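Until a cleaner download option turns up, the `<c.yellow>`/`<c.white>` markers, which are WebVTT cue-styling tags that survived the SRT conversion, can be stripped after the fact. A minimal Python sketch (the tag pattern and the file handling are my assumptions, not part of N_m3u8DL-RE):

```python
import re

# WebVTT cue-styling tags such as <c.yellow>, <c.white> and the closing </c>
# sometimes survive conversion to SRT; this pattern matches both forms.
CUE_TAG = re.compile(r'</?c(?:\.[\w.-]+)?>')

def strip_cue_tags(text: str) -> str:
    """Remove leftover WebVTT <c.*> styling tags from subtitle text."""
    return CUE_TAG.sub('', text)

# hypothetical usage on a downloaded subtitle file:
# with open('episode.srt', encoding='utf-8') as f:
#     print(strip_cue_tags(f.read()))
```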
  4. Feels Good Man 2nHxWW6GkN1l916N3ayz8HQoi's Avatar
    Do any of your videos even use DRM? Clicked randomly on 2-3 videos and none use it.
  5. f.tv, like the beeb, does not use DRM
    (but it's a pain in the backside to find a working VPN that is not blocked...)
  6. Feels Good Man 2nHxWW6GkN1l916N3ayz8HQoi's Avatar
    Here's a small script to get you started.

    Code:
    import json
    import re
    
    import requests
    from bs4 import BeautifulSoup
    from unidecode import unidecode
    
    TA_URL = 'https://hdfauth.ftven.fr/esi/TA'
    VIDEOS_URL = 'https://k7.ftven.fr/videos/{video_id}'
    BASE_URL = 'https://www.france.tv'
    
    
    def get_valid_filename(name):
        if name is None or len(name) == 0:
            return None
        try:
            temp_name = unidecode(name)
            if temp_name is not None and len(temp_name) > 0:
                name = temp_name
        except Exception:
            pass
    
        s = str(name).strip()
        s = re.sub(r'\s+', ' ', s)
        s = s.replace(" ", "_")
        s = re.sub(r"(?u)[^-\w.]", "", s)
        s = re.sub(r'_+', '_', s)
        s = re.sub(r'\.+', '', s)
    
        if s in {"", ".", ".."}:
            return None
        return s
    
    
    def get_all_json_contents(source_text):
        json_contents = []
        for start_char, end_char in [("{", "}"), ("[", "]")]:
            start_str = f"window.FTVPlayerVideos = {start_char}"
            start_index = source_text.find(start_str)
            if start_index == -1:
                continue
    
            opening_brackets = 0
            start_index += len(start_str) - 1
    
            text_len = len(source_text)
            for char_index in range(start_index, text_len):
                if source_text[char_index] == start_char:
                    opening_brackets += 1
                elif source_text[char_index] == end_char:
                    opening_brackets -= 1
                    if opening_brackets == 0:
                        json_object = json.loads(source_text[start_index:char_index + 1])
                        if type(json_object) is not list:
                            json_object = [json_object]
                        json_contents += [json_object]
                        break
        return json_contents
    
    
    def get_video_json_content(source_url):
        for json_content in get_all_json_contents(requests.get(source_url).content.decode()):
            for content in json_content:
                video_id = content.get("videoId", None)
                origin_url = content.get("originUrl", None)
                if video_id is None or origin_url is None:
                    continue
    
                if source_url.endswith(origin_url):
                    return content
        return None
    
    
    def get_video_data(source_url):
        video_content = get_video_json_content(source_url)
        manifest = json.loads(requests.get(
            VIDEOS_URL.format(video_id=video_content["videoId"]),
            params={
                'domain': 'domain',
                'browser': 'browser'
            }
        ).content.decode())["video"]["url"]
        manifest = json.loads(requests.get(
            TA_URL,
            params={
                'format': 'json',
                'url': manifest
            }
        ).content.decode())["url"]
    
        name = (
            f"france.tv {video_content['programName']} "
            f"Season {video_content['seasonNumber']} "
            f"Episode {video_content['episodeNumber']} "
            f"{video_content['videoTitle']}"
        )
        return manifest, get_valid_filename(name)
    
    
    def get_download_command(source_url):
        manifest, name = get_video_data(source_url)
        return f'N_m3u8DL-RE.exe "{manifest}" -M format=mkv -mt -sv best -sa best -ss all --save-name "{name}"'
    
    
    def get_videos_from_url(page_url):
        soup = BeautifulSoup(requests.get(page_url).content.decode(), 'html.parser')
        a_elements = soup.find_all('a', attrs={'data-video-id': True})
        return [f'{BASE_URL}{a["href"]}' for a in a_elements]
    
    
    def get_all_videos_from_url(series_url):
        page = -1
        source_urls = []
    
        while True:
            page += 1
            page_urls = get_videos_from_url(f'{series_url}?page={page}')
            if len(page_urls) == 0:
                break
            source_urls += page_urls
    
        return source_urls
    
    
    SERIES_URL = "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/toutes-les-videos/"
    for source_url in get_all_videos_from_url(SERIES_URL):
        print(get_download_command(source_url))
    Output:
    Code:
    N_m3u8DL-RE.exe "https://cloudingest.ftven.fr/.../20b2ed236e066/ec7305f8-f877-4d83-b12a-6cbaa1e9ed76-1712219276_france-domtom_TA.ism/manifest.mpd?hdnea=exp=..." -M format=mkv -mt -sv best -sa best -ss all --save-name "francetv_Il_etait_une_fois_ces_droles_dobjets_Season_1_Episode_57_Il_etait_une_fois_Le_puzzle"
    N_m3u8DL-RE.exe "https://cloudingest.ftven.fr/.../ab3b33b76e166/fcf02d20-6001-40e0-a550-c2b84df48f0a-1713269048_france-domtom_TA.ism/manifest.mpd?hdnea=exp=..." -M format=mkv -mt -sv best -sa best -ss all --save-name "francetv_Il_etait_une_fois_ces_droles_dobjets_Season_1_Episode_66_Il_etait_une_fois_le_telescope"
    N_m3u8DL-RE.exe "https://cloudingest.ftven.fr/.../b75f47011d166/ddb04959-940e-4bf2-b612-1306960aa7e7-1713181151_france-domtom_TA.ism/manifest.mpd?hdnea=exp=..." -M format=mkv -mt -sv best -sa best -ss all --save-name "francetv_Il_etait_une_fois_ces_droles_dobjets_Season_1_Episode_71_Il_etait_une_fois_le_drone"
    N_m3u8DL-RE.exe "https://cloudingest.ftven.fr/.../07767f5aa7166/a3ceb755-6593-463b-a920-2203e3a6627b-1712827217_france-domtom_TA.ism/manifest.mpd?hdnea=exp=..." -M format=mkv -mt -sv best -sa best -ss all --save-name "francetv_Il_etait_une_fois_ces_droles_dobjets_Season_1_Episode_78_Il_etait_une_fois_le_train"
    N_m3u8DL-RE.exe "https://cloudingest.ftven.fr/.../f1a57e2aa7166/99f2f82c-ff17-4e26-b860-6e2a9f598075-1712827141_france-domtom_TA.ism/manifest.mpd?hdnea=exp=..." -M format=mkv -mt -sv best -sa best -ss all --save-name "francetv_Il_etait_une_fois_ces_droles_dobjets_Season_1_Episode_77_Il_etait_une_fois_la_carte_geographique"
    
    ... etc ... 78 commands
    You'll have to figure it out with the subs, the vpn, and the download command launching (you can also edit the template of the command according to what you need). You can save all these commands to a txt file and write a small bat script that launches them one by one (that's one solution). I just gave you something to work with.
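The "save the commands to a txt file and launch them one by one" idea can also be done in Python instead of a bat script. A minimal runner, assuming the 78 printed commands were saved to a hypothetical commands.txt, one per line:

```python
import subprocess

def run_commands(path='commands.txt'):
    """Run each saved N_m3u8DL-RE command sequentially, skipping blank lines."""
    with open(path, encoding='utf-8') as f:
        for line in f:
            command = line.strip()
            if not command:
                continue
            # shell=True so each line is interpreted exactly as it was saved
            subprocess.run(command, shell=True, check=False)

# hypothetical usage:
# run_commands('commands.txt')
```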
  7. @2nHxWW6GkN1l916N3ayz8HQoi
    Thank you very much for taking the time to make this script.
    I appreciate it.

    (I just stopped, having built my semi-automatic solution:
    copy/paste, edit, copy and paste again, with 27 more to go,
    pasting manually into PowerShell.)


    I decided to install an editor for Python scripts (I think stubby recommended VSCode):

    https://code.visualstudio.com/
    https://microsoft.github.io/vscode-essentials/en/01-getting-started.html
    Code:
    winget install Microsoft.VisualStudioCode
    And your script makes a little bit of sense to me;
    I will investigate how to improve it
    and try it later.

    Thank you again.
  8. Mr. 2nHxWW6GkN1l916N3ayz8HQoi’s script can also be used like this:

    Code: france.tv.py

    Code:
    import json
    import re
    
    import requests
    from bs4 import BeautifulSoup
    from unidecode import unidecode
    
    TA_URL = 'https://hdfauth.ftven.fr/esi/TA'
    VIDEOS_URL = 'https://k7.ftven.fr/videos/{video_id}'
    BASE_URL = 'https://www.france.tv'
    
    
    
    
    def get_all_json_contents(source_text):
        json_contents = []
        for start_char, end_char in [("{", "}"), ("[", "]")]:
            start_str = f"window.FTVPlayerVideos = {start_char}"
            start_index = source_text.find(start_str)
            if start_index == -1:
                continue
    
            opening_brackets = 0
            start_index += len(start_str) - 1
    
            text_len = len(source_text)
            for char_index in range(start_index, text_len):
                if source_text[char_index] == start_char:
                    opening_brackets += 1
                elif source_text[char_index] == end_char:
                    opening_brackets -= 1
                    if opening_brackets == 0:
                        json_object = json.loads(source_text[start_index:char_index + 1])
                        if type(json_object) is not list:
                            json_object = [json_object]
                        json_contents += [json_object]
                        break
        return json_contents
    
    
    def get_video_json_content(source_url):
        for json_content in get_all_json_contents(requests.get(source_url).content.decode()):
            for content in json_content:
                video_id = content.get("videoId", None)
                origin_url = content.get("originUrl", None)
                if video_id is None or origin_url is None:
                    continue
    
                if source_url.endswith(origin_url):
                    return content
        return None
    
    
    def get_videos_from_url(page_url):
        soup = BeautifulSoup(requests.get(page_url).content.decode(), 'html.parser')
        a_elements = soup.find_all('a', attrs={'data-video-id': True})
        return [f'{BASE_URL}{a["href"]}' for a in a_elements]
    
    
    def get_all_videos_from_url(series_url):
        page = -1
        source_urls = []
    
        while True:
            page += 1
            page_urls = get_videos_from_url(f'{series_url}?page={page}')
            if len(page_urls) == 0:
                break
            source_urls += page_urls
    
        return source_urls
    
    
    SERIES_URL = "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/toutes-les-videos/"
    for source_url in get_all_videos_from_url(SERIES_URL):
        print(source_url)
    cmd:

    python france.tv.py > france.tv.txt
    yt-dlp -a france.tv.txt --write-subs

    Note.
    To work via VPN, the script must be edited.
    Last edited by LZAA; 7th May 2024 at 15:57.
  9. Feels Good Man 2nHxWW6GkN1l916N3ayz8HQoi's Avatar
    Originally Posted by LZAA View Post
    cmd:

    python france.tv.py > france.tv.txt
    yt-dlp -a france.tv.txt --write-subs
    I completely forgot about yt-dlp. It makes sense for them to implement a downloader for france.tv since apparently they have no DRM content. Nice find @LZAA.

    Originally Posted by pssh View Post
    I decided to install an editor for Python scripts (I think stubby recommended VSCode)
    PyCharm is another good IDE.

    https://www.jetbrains.com/pycharm/download/?section=windows
  10. Originally Posted by LZAA View Post
    To work via VPN, the script must be edited.
    Please do it if it's not difficult.
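For the requested VPN/proxy edit: the simplest change to the script above is to route its requests through a proxy via a shared session. A minimal sketch (the proxy address is a placeholder, not a real endpoint):

```python
import requests

# Placeholder proxy address; point this at your own HTTP/SOCKS proxy.
PROXIES = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}

# A shared session applies the proxies to every request it makes, so the
# script's requests.get(...) calls can be swapped for session.get(...).
session = requests.Session()
session.proxies.update(PROXIES)
```

yt-dlp and N_m3u8DL-RE have proxy options of their own on the command line, so only the page/manifest requests in the script itself need this change.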
  11. Originally Posted by 2nHxWW6GkN1l916N3ayz8HQoi View Post
    Here's a small script to get you started.

    Code:
    # 2nHxWW6GkN1l916N3ayz8HQoi
    # https://forum.videohelp.com/threads/414452-f-TV-78-x-7min#post2734714
    #
    #########################################################################
    
    import json
    import re
    
    import requests
    from bs4 import BeautifulSoup
    from unidecode import unidecode
    
    TA_URL = 'https://hdfauth.ftven.fr/esi/TA'
    VIDEOS_URL = 'https://k7.ftven.fr/videos/{video_id}'
    BASE_URL = 'https://www.france.tv'
    
    
    def get_valid_filename(name):
        if name is None or len(name) == 0:
            return None
        try:
            temp_name = unidecode(name)
            if temp_name is not None and len(temp_name) > 0:
                name = temp_name
        except Exception:
            pass
    
        s = str(name).strip()
        s = re.sub(r'\s+', ' ', s)
        s = s.replace(" ", "_")
        s = re.sub(r"(?u)[^-\w.]", "", s)
        s = re.sub(r'_+', '_', s)
        s = re.sub(r'\.+', '', s)
    
        if s in {"", ".", ".."}:
            return None
        return s
    
    
    def get_all_json_contents(source_text):
        json_contents = []
        for start_char, end_char in [("{", "}"), ("[", "]")]:
            start_str = f"window.FTVPlayerVideos = {start_char}"
            start_index = source_text.find(start_str)
            if start_index == -1:
                continue
    
            opening_brackets = 0
            start_index += len(start_str) - 1
    
            text_len = len(source_text)
            for char_index in range(start_index, text_len):
                if source_text[char_index] == start_char:
                    opening_brackets += 1
                elif source_text[char_index] == end_char:
                    opening_brackets -= 1
                    if opening_brackets == 0:
                        json_object = json.loads(source_text[start_index:char_index + 1])
                        if type(json_object) is not list:
                            json_object = [json_object]
                        json_contents += [json_object]
                        break
        return json_contents
    
    
    def get_video_json_content(source_url):
        for json_content in get_all_json_contents(requests.get(source_url).content.decode()):
            for content in json_content:
                video_id = content.get("videoId", None)
                origin_url = content.get("originUrl", None)
                if video_id is None or origin_url is None:
                    continue
    
                if source_url.endswith(origin_url):
                    return content
        return None
    
    
    def get_video_data(source_url):
        video_content = get_video_json_content(source_url)
        manifest = json.loads(requests.get(
            VIDEOS_URL.format(video_id=video_content["videoId"]),
            params={
                'domain': 'domain',
                'browser': 'browser'
            }
        ).content.decode())["video"]["url"]
        manifest = json.loads(requests.get(
            TA_URL,
            params={
                'format': 'json',
                'url': manifest
            }
        ).content.decode())["url"]
    
        name = (
            f"france.tv {video_content['programName']} "
            f"Season {video_content['seasonNumber']} "
            f"Episode {video_content['episodeNumber']} "
            f"{video_content['videoTitle']}"
        )
        return manifest, get_valid_filename(name)
    
    
    def get_download_command(source_url):
        manifest, name = get_video_data(source_url)
        return f'N_m3u8DL-RE "{manifest}" -M format=mp4 -mt -sv best -sa best -ss all --save-name "{name}"' #Removed .exe from N_m3u8DL-RE to allow Linux use | format set to MP4 instead of MKV
    
    
    def get_videos_from_url(page_url):
        soup = BeautifulSoup(requests.get(page_url).content.decode(), 'html.parser')
        a_elements = soup.find_all('a', attrs={'data-video-id': True})
        return [f'{BASE_URL}{a["href"]}' for a in a_elements]
    
    
    def get_all_videos_from_url(series_url):
        page = -1
        source_urls = []
    
        while True:
            page += 1
            page_urls = get_videos_from_url(f'{series_url}?page={page}')
            if len(page_urls) == 0:
                break
            source_urls += page_urls
    
        return source_urls
    
    #############
    #SERIES URL:
    #############
    SERIES_URL = "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/toutes-les-videos/"
    for source_url in get_all_videos_from_url(SERIES_URL):
        print(get_download_command(source_url))
    Got the SSD connected and tried it, and it worked.

    Copy/pasted all 78 commands into PowerShell, and 1h 30m later it was finished.
    Thank you, it would have taken all day copy/pasting the URLs manually ......

    Thank you.
  12. Originally Posted by LZAA View Post
    Mr. 2nHxWW6GkN1l916N3ayz8HQoi’s script can also be used like this:

    Code: france.tv.py

    Code: [...]
    cmd:

    python france.tv.py > france.tv.txt
    yt-dlp -a france.tv.txt --write-subs

    Note.
    To work via VPN, the script must be edited.
    Thank you.

    I should have checked the yt-dlp Supported Sites list; it's bookmarked now...

    Your solution also looks great. I saved it in my notes.
  13. Thanks Pepe for the script...

    And without a script for example ....

    Addon for Chrome Link Klipper

    https://chromewebstore.google.com/detail/link-klipper-extract-all/fahollcgofmpnehocdgo...kchiekoo?pli=1

    Extract all links and put them in a file "list.txt"

    [Attachment 78938 - Click to enlarge]


    Use yt-dlp to download

    Code:
    yt-dlp -a list.txt
    [Attachment 78939 - Click to enlarge]
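Link Klipper grabs every link on the page, so list.txt may contain more than the episode URLs. A small Python sketch to filter it before handing it to yt-dlp (the file name and the URL pattern are assumptions about what the addon exports):

```python
def filter_links(lines, prefix='https://www.france.tv/', suffix='.html'):
    """Keep only unique france.tv episode links, preserving their order."""
    seen = []
    for line in lines:
        url = line.strip()
        if url.startswith(prefix) and url.endswith(suffix) and url not in seen:
            seen.append(url)
    return seen

# hypothetical usage:
# with open('list.txt', encoding='utf-8') as f:
#     for url in filter_links(f):
#         print(url)
```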
  14. Originally Posted by 2nHxWW6GkN1l916N3ayz8HQoi View Post
    Thank you.
    It's also available to install via WinGet

    Code:
    winget search PyCharm
    winget install JetBrains.PyCharm.Community
  15. I'm trying this:
    Code:
    yt-dlp `
    --embed-subs `
    --sub-langs all `
    --embed-chapters `
    --embed-metadata `
    --embed-info-json `
    --embed-thumbnail `
    --restrict-filenames `
    "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saison-1/5692527-il-etait-une-fois-la-tablette-de-chocolat.html"


    and yt-dlp solves the subs problem: it does not display the "<c.yellow>", "<c.white>", etc.


    Exploring the rules of the custom name "-o" so it is consistent with the way FreeVine used to name the files...
    Series.Name.S01E01.Episode.Name.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264




    I have tried this one with wrong result:
    Code:
    -o '%(title)s.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s'
    Il_etait_une_fois...ces_droles_d_objets_-_Il_etait_une_fois_la_tablette_de_chocolat.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264


    So I looked at the manual and tried this:
    Code:
    -o '%(title)s.S%(season_number)02dE%(episode_number)02d:%(title)s.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s'

    Il_etait_une_fois...ces_droles_d_objets_-_Il_etait_une_fois_la_tablette_de_chocolat.S01E01#Il_etait_une_fois...ces_droles_d_objets_-_Il_etait_une_fois_la_tablette_de_chocolat.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264

    And yt-dlp failed to mux it, as the filenames are too long ...


    But that is not right either .....

    The title "%(title)s" contains both Series.Name.-.Episode.Name ....




    yt-dlp also supports
    Code:
    --replace-in-metadata FIELDS REGEX REPLACE
    but that is beyond my skill set.


    Has anyone tried/managed to set the output of yt-dlp for series in the following format:
    Series.Name.S01E01.Episode.Name.....
    or does it not work because f.tv is set up differently to other websites?

    Also, I did not find how to replace the "underscores" in the name with dots .....




    #EDIT
    Searching around I found this post:
    https://github.com/yt-dlp/yt-dlp/issues/5925
    and that results in "NEARLY" what I would like to achieve ..... just the "Episode.Name" is missing and the file still has underscores instead of dots..
    Il_etait_une_fois...ces_droles_d_objets.S01E01.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264


    Code:
    yt-dlp `
    --parse-metadata "title:(?P<series>.+?), Series \d+" `
    --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    -o "%(series)s.S%(season_number)02dE%(episode_number)02d.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s" `
    --embed-subs `
    --sub-langs all `
    --embed-chapters `
    --embed-metadata `
    --embed-info-json `
    --embed-thumbnail `
    --restrict-filenames `
    "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saison-1/5692527-il-etait-une-fois-la-tablette-de-chocolat.html"


    #EDIT2

    I have also found a solution to replace the "underscore" with "dot" here and it works:
    https://manpages.ubuntu.com/manpages/jammy/en/man1/yt-dlp.1.html

    # Replace all spaces and "_" in title and uploader with a `-`
    Code:
    $ yt-dlp --replace-in-metadata "title,uploader" "[ _]" "-"
    --replace-in-metadata [WHEN:]FIELDS REGEX REPLACE
    Replace text in a metadata field using the
    given regex. This option can be used
    multiple times. Supported values of "WHEN"
    are the same as that of --use-postprocessor
    (default: pre_process)


    Code:
    yt-dlp `
    --parse-metadata "title:(?P<series>.+?), Series \d+" `
    --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    -o "%(series)s.S%(season_number)02dE%(episode_number)02d.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s" `
    --embed-subs `
    --sub-langs all `
    --embed-chapters `
    --embed-metadata `
    --embed-info-json `
    --embed-thumbnail `
    --restrict-filenames `
    --replace-in-metadata "title,uploader,series" "[ _]" "." `
    "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saison-1/5692527-il-etait-une-fois-la-tablette-de-chocolat.html"
    and the result is:
    Il.etait.une.fois...ces.droles.d_objets.S01E01.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264

    Code:
    With the following Metadata:
    Collection                               : Il.était.une.fois...ces.drôles.d'objets
    Season                                   : 1
    Part                                     : 1
    Track name                               : Il.était.une.fois...ces.drôles.d'objets.-.Il.était.une.fois.la.tablette.de.chocolat
    Recorded date                            : 20240131
    Writing application                      : Lavf61.1.100
    Cover                                    : Yes
    Comment                                  : francetv:2xxxx2-5xx6-4xx2-8xx3-0xxxxxxxxe#__youtubedl_smuggle=%7B%22hostname%22%3A+%22www.france.tv%22%7D
    Part_ID                                  : Il était une fois la tablette de chocolat


    ...just the "Episode.Name" = "La tablette de chocolat" is missing after S01E01 .....

    Code:
    Adding "chapter" does not work
    --parse-metadata "title:(?P<series>.+?), Series \d+" `
    --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    --parse-metadata "title:(?P<chapter>.+?), Chapter \d+" `
    -o "%(series)s.S%(season_number)02dE%(episode_number)02d.%(chapter)02d.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s" `

    Looking into the HTML shows it as:
    Code:
    "videoTitle":"Il \u00e9tait une fois...La tablette de chocolat"



    #EDIT3


    Trying to add the episode field as well results in an error:


    Code:
    > yt-dlp `
    >> --parse-metadata "title:(?P<series>.+?), Series \d+, (?P<episode>.+?), Episode \d+" `
    >> --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    >> -o "%(series)s.S%(season_number)02dE%(episode_number)02d.%(episode).FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s" `
    >> --embed-subs `
    >> --sub-langs all `
    >> --embed-chapters `
    >> --embed-metadata `
    >> --embed-info-json `
    >> --embed-thumbnail `
    >> --restrict-filenames `
    >> --replace-in-metadata "title,uploader,series" "[ _]" "." `
    >> "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saison-1/5692527-il-etait-une-fois-la-tablette-de-chocolat.html"
    [FranceTVSite] Extracting URL: https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saison-1/5692527...tte-de-chocolat.html
    [FranceTVSite] 5692527-il-etait-une-fois-la-tablette-de-chocolat: Downloading webpage
    [FranceTV] Extracting URL: francetv:28c76702-5a06-4cf2-8d63-01a73cb0533e#__youtubedl_smuggle=%7B%22hostname%22%3A+%22www.france.tv%22%7D
    [FranceTV] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading desktop chrome video JSON
    [FranceTV] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading mobile safari video JSON
    [FranceTV] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading signed dash manifest URL
    [FranceTV] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading MPD manifest
    [FranceTV] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading signed hls manifest URL
    [FranceTV] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading m3u8 information
    [info] 28c76702-5a06-4cf2-8d63-01a73cb0533e: Downloading subtitles: qsm
    ERROR: 'episode'

    Also tried this, with the same error:
    Code:
    --parse-metadata "title:(?P<series>.+?), Series \d+" `
    --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    --parse-metadata "title:(?P<episode>.+?), Episode \d+" `
    -o "%(series)s.S%(season_number)02dE%(episode_number)02d.%(episode).FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264.%(ext)s" `
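One likely culprit in both failing attempts is the output template rather than the regex: `%(episode)` has no conversion type, and yt-dlp templates follow Python %-style formatting, where every field needs a trailing `s` (or `02d`, etc.). Plain Python shows the difference (the field values here are made up for illustration):

```python
# yt-dlp output templates use Python %-style formatting, so every
# field reference needs a conversion type: %(episode)s, not %(episode).
fields = {
    'series': 'Il.etait.une.fois',
    'season_number': 1,
    'episode_number': 1,
    'episode': 'La tablette de chocolat',
}

ok = '%(series)s.S%(season_number)02dE%(episode_number)02d.%(episode)s' % fields
# -> 'Il.etait.une.fois.S01E01.La tablette de chocolat'

# The bare %(episode) form is an incomplete format and raises ValueError:
try:
    '%(episode)' % fields
    broken = False
except ValueError:
    broken = True
```

If the template is fixed and `episode` is still empty, then the field is simply not being set, either by the extractor or by the --parse-metadata regex, for these titles.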
    Last edited by pssh; 8th May 2024 at 08:27.
  16. Feels Good Man 2nHxWW6GkN1l916N3ayz8HQoi's Avatar
    Originally Posted by LZAA View Post
    To work via VPN, the script must be edited.
    Please do it if it's not difficult.
    I never deal with the technical side when it comes to VPNs/proxies; I prefer leaving that to the user. Both yt-dlp and N_m3u8DL-RE allow proxy usage, and you only need a good one when you download the content with the download command.

    Originally Posted by cedric8528 View Post
    And without a script for example ....

    Addon for Chrome Link Klipper
    Nice find @cedric. With this addon and yt-dlp a script isn't needed anymore and pretty much anyone can batch download now.
    --[----->+<]>.++++++++++++.---.--------.
    [*drm mass downloader: widefrog*]~~~~~~~~~~~[*how to make your own mass downloader: guide*]
  17. Originally Posted by cedric8528 View Post
    Thanks Pepe for the script...

    And without a script for example ....

    Addon for Chrome Link Klipper

    https://chromewebstore.google.com/detail/link-klipper-extract-all/fahollcgofmpnehocdgo...kchiekoo?pli=1

    Extract all links and put them in a file "list.txt"

    [Attachment 78938]


    Use yt-dlp to download

    Code:
    yt-dlp -a list.txt
    [Attachment 78939]



    Thanks cedric.. Learned something new today
  18. Originally Posted by LZAA View Post
    Mr. 2nHxWW6GkN1l916N3ayz8HQoi’s script can also be used like this:

    Code: france.tv.py

    Code:
    import json
    
    import requests
    from bs4 import BeautifulSoup
    
    TA_URL = 'https://hdfauth.ftven.fr/esi/TA'
    VIDEOS_URL = 'https://k7.ftven.fr/videos/{video_id}'
    BASE_URL = 'https://www.france.tv'
    
    
    def get_all_json_contents(source_text):
        """Find each JSON object/array assigned to window.FTVPlayerVideos by counting brackets."""
        json_contents = []
        for start_char, end_char in [("{", "}"), ("[", "]")]:
            start_str = f"window.FTVPlayerVideos = {start_char}"
            start_index = source_text.find(start_str)
            if start_index == -1:
                continue
    
            opening_brackets = 0
            start_index += len(start_str) - 1
    
            text_len = len(source_text)
            for char_index in range(start_index, text_len):
                if source_text[char_index] == start_char:
                    opening_brackets += 1
                elif source_text[char_index] == end_char:
                    opening_brackets -= 1
                    if opening_brackets == 0:
                        json_object = json.loads(source_text[start_index:char_index + 1])
                        if type(json_object) is not list:
                            json_object = [json_object]
                        json_contents += [json_object]
                        break
        return json_contents
    
    
    def get_video_json_content(source_url):
        """Return the FTVPlayerVideos entry whose originUrl matches the page URL."""
        for json_content in get_all_json_contents(requests.get(source_url).content.decode()):
            for content in json_content:
                video_id = content.get("videoId", None)
                origin_url = content.get("originUrl", None)
                if video_id is None or origin_url is None:
                    continue
    
                if source_url.endswith(origin_url):
                    return content
        return None
    
    
    def get_videos_from_url(page_url):
        """Collect video page links (anchors with a data-video-id attribute) from one listing page."""
        soup = BeautifulSoup(requests.get(page_url).content.decode(), 'html.parser')
        a_elements = soup.find_all('a', attrs={'data-video-id': True})
        return [f'{BASE_URL}{a["href"]}' for a in a_elements]
    
    
    def get_all_videos_from_url(series_url):
        """Walk ?page=N listing pages until an empty one, gathering every video URL."""
        page = -1
        source_urls = []
    
        while True:
            page += 1
            page_urls = get_videos_from_url(f'{series_url}?page={page}')
            if len(page_urls) == 0:
                break
            source_urls += page_urls
    
        return source_urls
    
    
    SERIES_URL = "https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/toutes-les-videos/"
    for source_url in get_all_videos_from_url(SERIES_URL):
        print(source_url)
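    An aside on the bracket counting in the script above: Python's json.JSONDecoder.raw_decode can do the same job, since it parses exactly one JSON value starting at a given offset and ignores whatever JavaScript follows. A minimal sketch (the sample page text below is made up):

```python
import json

def extract_ftv_json(source_text):
    """Parse the value assigned to window.FTVPlayerVideos, ignoring trailing JS."""
    marker = "window.FTVPlayerVideos = "
    start = source_text.find(marker)
    if start == -1:
        return None
    # raw_decode returns (parsed_value, index_past_the_value)
    obj, _end = json.JSONDecoder().raw_decode(source_text, start + len(marker))
    return obj if isinstance(obj, list) else [obj]

sample = 'var a = 1; window.FTVPlayerVideos = [{"videoId": "abc"}]; init();'
print(extract_ftv_json(sample))  # -> [{'videoId': 'abc'}]
```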
    cmd:

    france.tv.py > france.tv.txt | yt-dlp -a france.tv.txt --write-subs

    Note.
    To work via VPN, the script must be edited.

    Thank you very much, also a great solution.
    (Slightly better than using N_m3u8DL-RE.)

    yt-dlp fixes the subs;
    the only thing missing now is
    fixing the file names to follow:

    Series.Name.S01E01.Episode.Name.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.264
    (Yes, I know I can use multi-rename tools... but)


    This is the best one so far:
    Code:
    franceDOTtv.py > france.tv.txt | yt-dlp -a france.tv.txt `
    --parse-metadata "title:(?P<series>.+?), Series \d+" `
    --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    -o "%(series)s.S%(season_number)02dE%(episode_number)02d.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.H.264.%(ext)s" `
    --embed-subs `
    --sub-langs all `
    --embed-chapters `
    --embed-metadata `
    --embed-thumbnail `
    --restrict-filenames `
    --replace-in-metadata "title,uploader,series" "[ _]" "."
  19. Originally Posted by LZAA View Post
    Mr. 2nHxWW6GkN1l916N3ayz8HQoi’s script can also be used like this:

    Code: france.tv.py
    [... same script as quoted above ...]

    cmd:

    france.tv.py > france.tv.txt | yt-dlp -a france.tv.txt --write-subs

    Note.
    To work via VPN, the script must be edited.

    On Windows, running:
    Code:
    france.tv.py > france.tv.txt
    creates a "france.tv.txt" of 21,232 bytes with only 78 lines, and yt-dlp fails with:

    Code:
    [generic] Extracting URL: https://www.france.tv/enfants/six-huit-ans/il-eta...uzzle.html
    ERROR: [generic] 'h\x00t\x00t\x00p\x00s\x00:\x00/\x00/\x00w\x00w\x00w\x00.\x00f\x00r\x00a\x00n\x00c\x00e\x00.\x00t\x00v\x00/\x00e\x00n\x00f\x00a\x00n\x00t\x00s\x00/\x00s\x00i\x00x\x00-\x00h\x00u\x00i\x00t\x00-\x00a\x00n\x00s\x00/\x00i\x00l\x00-\x00e\x00t\x00a\x00i\x00t\x00-\x00u\x00n\x00e\x00-\x00f\x00o\x00i\x00s\x00-\x00c\x00e\x00s\x00-\x00d\x00r\x00o\x00l\x00e\x00s\x00-\x00d\x00-\x00o\x00b\x00j\x00e\x00t\x00s\x00/\x00s\x00a\x00i\x00s\x00o\x00n\x00-\x001\x00/\x005\x008\x004\x003\x005\x000\x002\x00-\x00i\x00l\x00-\x00e\x00t\x00a\x00i\x00t\x00-\x00u\x00n\x00e\x00-\x00f\x00o\x00i\x00s\x00-\x00l\x00e\x00-\x00p\x00u\x00z\x00z\x00l\x00e\x00.\x00h\x00t\x00m\x00l\x00' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saison-1/5843502-il-etait-une-fois-le-puzzle.html" ) to search YouTube
    [generic] Extracting URL:
    ERROR: [generic] '\x00' is not a valid URL. Set --default-search "ytsearch" (or run  yt-dlp "ytsearch:" ) to search YouTube
    If I copy the 78 lines and paste them into Notepad as "franceX.tv.txt", its size is 10,613 bytes for the same 78 lines and yt-dlp works. (The \x00 bytes and roughly doubled file size suggest PowerShell's ">" redirect wrote the file as UTF-16 rather than UTF-8.)
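    Rather than pasting by hand, a UTF-16 list file can be re-encoded to UTF-8 so yt-dlp reads it cleanly. A sketch, assuming the file names from the post above (the demo file is made up):

```python
from pathlib import Path

def reencode_list(src: str, dst: str) -> None:
    """Re-read a UTF-16 URL list (PowerShell's '>' default) and write it back as UTF-8."""
    text = Path(src).read_text(encoding="utf-16")  # the BOM selects the byte order
    Path(dst).write_text(text, encoding="utf-8")

# Tiny round-trip demo with a made-up one-line URL list:
Path("demo16.txt").write_text("https://www.france.tv/a.html\n", encoding="utf-16")
reencode_list("demo16.txt", "demo8.txt")
print(Path("demo8.txt").read_bytes())  # plain UTF-8, no \x00 padding left
```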
    Last edited by pssh; 30th May 2024 at 12:34.
  20. france.tv.py | yt-dlp -a - --write-subs
    Reading URLs from STDIN - EOF (Ctrl+Z) to end:
    [FranceTVSite] Extracting URL: https://www.france.tv/enfants/six-huit-ans/il-etait-une-fois-ces-droles-d-objets/saiso...le-puzzle.html
    [FranceTVSite] 5843502-il-etait-une-fois-le-puzzle: Downloading webpage
    [FranceTV] Extracting URL: francetv:ec7305f8-f877-4d83-b12a-6cbaa1e9ed76#__youtubedl_smuggle=%7B%22hostname%22%3A+%22www.france.tv%22%7D
    [FranceTV] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading desktop chrome video JSON
    [FranceTV] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading mobile safari video JSON
    [FranceTV] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading signed dash manifest URL
    [FranceTV] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading MPD manifest
    [FranceTV] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading signed hls manifest URL
    [FranceTV] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading m3u8 information
    [info] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading subtitles: qsm
    [info] ec7305f8-f877-4d83-b12a-6cbaa1e9ed76: Downloading 1 format(s): hls-5398+hls-audio-aacl-96-Audio_Français
    [info] Writing video subtitles to: Il était une fois...ces drôles d'objets - Il était une fois… Le puzzle [ec7305f8-f877-4d83-b12a-6cbaa1e9ed76].qsm.vtt
    [hlsnative] Downloading m3u8 manifest
    [hlsnative] Total fragments: 53
    [download] Destination: Il était une fois...ces drôles d'objets - Il était une fois… Le puzzle [ec7305f8-f877-4d83-b12a-6cbaa1e9ed76].qsm.vtt
    [download] 100% of 12.65KiB in 00:00:06 at 1.83KiB/s
    [hlsnative] Downloading m3u8 manifest
    [hlsnative] Total fragments: 53
    [download] Destination: Il était une fois...ces drôles d'objets - Il était une fois… Le puzzle [ec7305f8-f877-4d83-b12a-6cbaa1e9ed76].fhls-5398.mp4
    [download] 9.7% of ~ 258.86MiB at 846.34KiB/s ETA 04:44:05 (frag 5/53)
    ...
  21. Yes, thank you.

    That works great.


    I have amended it a little bit.
    The series/episode comes first, then the title, so it can easily be renamed afterwards with a multi-rename tool.

    Code:
    py $home\py\franceDOTtv.py | yt-dlp -a - `
    --parse-metadata "title:(?P<series>.+?), Series \d+" `
    --parse-metadata "title:, Series (?P<season_number>\d+), Episode (?P<episode_number>\d+)" `
    -o "S%(season_number)02dE%(episode_number)02d.%(title)s.FR.2024.1080p.FRTV.WEB-DL.AAC.LC.2.0.H.264.%(ext)s" `
    --replace-in-metadata "title,uploader,series" "[ _]" "." `
    --embed-subs `
    --sub-langs all `
    --embed-chapters `
    --embed-metadata `
    --embed-info-json `
    --embed-thumbnail
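    The two --parse-metadata patterns in the command above assume titles shaped like "Name, Series N, Episode M"; a quick sanity check of those regexes (the title string below is made up for illustration):

```python
import re

# Hypothetical title in the shape the --parse-metadata patterns expect.
title = "Il etait une fois... ces droles d'objets, Series 1, Episode 2"

# Same patterns as in the yt-dlp command, applied directly.
series = re.search(r"(.+?), Series \d+", title).group(1)
m = re.search(r", Series (\d+), Episode (\d+)", title)
season_number, episode_number = (int(g) for g in m.groups())

print(series)
print(f"S{season_number:02d}E{episode_number:02d}")  # -> S01E02
```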