Downloading .m3u8 with token playlists from URL/webpage for IPTV

Member

Originally Posted by torahslut353

Hey, I know this is months later and you've probably figured this out already, but I thought I'd post what I did to get this automated.

It grabs the updated m3u8 URL with its token and writes it to a text file.

Below is a Python script I wrote for the MTV channel on that site.

To edit this to work on different streams you'll need 2 URLs:

1) The normal site stream link, which it'll search through: "https://thetvapp.to/tv/mtv-live-stream/"

2) The start of the actual stream URL you want: "thetvapp.to/live/streams/MTVEast.m3u8?token="

Since this is formatted differently for each channel, we need to tell the program what it looks like. You can find this for your respective channel by:
Opening Dev tools
Selecting the Network tab
Refreshing the page
Browsing through the results on the site; there should be one with an m3u8 URL. Copy that URL up to the point where it specifies the token, as the token will change every time.
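
The "copy the URL up to the token" step can also be done in code. This is only a minimal sketch, assuming the token travels in a `token` query parameter as in the example above; `stable_prefix` is a hypothetical helper name and the sample token is made up.

```python
def stable_prefix(stream_url: str, param: str = "token") -> str:
    """Return the part of the URL up to and including '<param>='."""
    marker = f"{param}="
    idx = stream_url.find(marker)
    if idx == -1:
        raise ValueError(f"no '{param}' parameter in {stream_url!r}")
    return stream_url[: idx + len(marker)]


# Made-up token value, only for illustration.
sample = "https://thetvapp.to/live/streams/MTVEast.m3u8?token=abc123"
print(stable_prefix(sample))
```

The returned prefix is the string the script below searches for in the captured requests.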

import re
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from webdriver_manager.chrome import ChromeDriverManager

def extract_desired_url(requests):
    # Search for the desired URL in the requests
    for request in requests:
        if "thetvapp.to/live/streams/MTVEast.m3u8?token=" in request:
            return request
    return None

url = "https://thetvapp.to/tv/mtv-live-stream/"

# Set Chrome options
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # To run Chrome in headless mode

# Initialize the ChromeDriver service with the executable path
service = webdriver.ChromeService(ChromeDriverManager().install())

# Initialize Selenium WebDriver with the service and Chrome options
driver = webdriver.Chrome(service=service, options=options)

# Navigate to the URL
driver.get(url)

# Wait for the video player element to be present
try:
    video_player = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "video-player")))
    print("Video player loaded successfully.")
except:
    pass  # Do nothing if the video player is not found, the message will not be printed

def get_get_requests():
    try:
        # Execute JavaScript to capture network requests
        requests = driver.execute_script("""
            var performance = window.performance || window.webkitPerformance || window.msPerformance || window.mozPerformance;
            if (!performance) {
                return [];
            }
            var entries = performance.getEntriesByType("resource");
            var urls = [];
            for (var i = 0; i < entries.length; i++) {
                urls.push(entries[i].name);
            }
            return urls;
        """)
        return requests
    except Exception as e:
        print("An error occurred:", e)
        return None
    finally:
        driver.quit()

# Call the function to get GET requests
get_requests = get_get_requests()

# Extract the desired URL
if get_requests:
    desired_url = extract_desired_url(get_requests)
    if desired_url:
        print("Desired URL found:", desired_url)
        # Write the desired URL to the file
        with open("mtvurl.txt", "w") as file:
            file.write(desired_url)
            print("Desired URL written to mtvurl.txt")
    else:
        print("No desired URL found in the requests.")
else:
    print("No GET requests found.")

I am not much of a programmer, but does this output to a file? I see in your code where you have the MTV channel located, so if I wanted to add, say, HBO to it, what lines would I need to copy and change for HBO or any other channel? I would love to be able to take your script and extract all of the channels from thetvapp.to so that I can then import them into VLC.
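
For reference, the quoted script does write its result to a file (mtvurl.txt), and only two things in it are channel-specific: the page URL in `url` and the substring matched in `extract_desired_url`. A rough sketch of pulling those into a table follows, assuming other channels follow the same pattern as MTV. The `hbo` entry and the names `CHANNELS` and `find_stream_url` are hypothetical placeholders, so the real values would have to be read from the Network tab.

```python
CHANNELS = {
    # name: (page URL, substring that identifies the stream request)
    "mtv": ("https://thetvapp.to/tv/mtv-live-stream/", "MTVEast.m3u8?token="),
    # Hypothetical entry: replace both values with what Dev tools shows.
    "hbo": ("<HBO page URL>", "<HBO stream name>.m3u8?token="),
}


def find_stream_url(resource_urls, marker):
    """Return the first captured request URL containing marker, else None."""
    return next((u for u in resource_urls if marker in u), None)


# Made-up capture list standing in for the output of get_get_requests().
captured = [
    "https://example.com/style.css",
    "https://thetvapp.to/live/streams/MTVEast.m3u8?token=abc123",
]
page_url, marker = CHANNELS["mtv"]
print(find_stream_url(captured, marker))
```

Looping over `CHANNELS` and loading each `page_url` in turn would then give one URL per channel, at the cost of one page load each.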

21st Mar 2024 06:20 #4

Member

I am attempting to run your code but I am getting the following error:

DevTools listening on ws://127.0.0.1:54862/devtools/browser/c3ad0a70-bb0f-491e-9a65-7256718ce539
An error occurred while loading the video player: Message:
Stacktrace:
GetHandleVerifier [0x00E48D03+51395]
(No symbol) [0x00DB5F61]
(No symbol) [0x00C6E13A]
(No symbol) [0x00CA62BB]
(No symbol) [0x00CA63EB]
(No symbol) [0x00CDC162]
(No symbol) [0x00CC3ED4]
(No symbol) [0x00CDA570]
(No symbol) [0x00CC3C26]
(No symbol) [0x00C9C629]
(No symbol) [0x00C9D40D]
GetHandleVerifier [0x011C68D3+3712147]
GetHandleVerifier [0x01205CBA+3971194]
GetHandleVerifier [0x01200FA8+3951464]
GetHandleVerifier [0x00EF9D09+776393]
(No symbol) [0x00DC1734]
(No symbol) [0x00DBC618]
(No symbol) [0x00DBC7C9]
(No symbol) [0x00DADDF0]
BaseThreadInitThunk [0x765CFCC9+25]
RtlGetAppContainerNamedObjectPath [0x777A7C5E+286]
RtlGetAppContainerNamedObjectPath [0x777A7C2E+238]

An error occurred: Message: javascript error: Invalid or unexpected token
(Session info: chrome-headless-shell=122.0.6261.113)
Stacktrace:
GetHandleVerifier [0x00E48D03+51395]
(No symbol) [0x00DB5F61]
(No symbol) [0x00C6E13A]
(No symbol) [0x00C72480]
(No symbol) [0x00C7408D]
(No symbol) [0x00CDAEAC]
(No symbol) [0x00CC3E8C]
(No symbol) [0x00CDA570]
(No symbol) [0x00CC3C26]
(No symbol) [0x00C9C629]
(No symbol) [0x00C9D40D]
GetHandleVerifier [0x011C68D3+3712147]
GetHandleVerifier [0x01205CBA+3971194]
GetHandleVerifier [0x01200FA8+3951464]
GetHandleVerifier [0x00EF9D09+776393]
(No symbol) [0x00DC1734]
(No symbol) [0x00DBC618]
(No symbol) [0x00DBC7C9]
(No symbol) [0x00DADDF0]
BaseThreadInitThunk [0x765CFCC9+25]
RtlGetAppContainerNamedObjectPath [0x777A7C5E+286]
RtlGetAppContainerNamedObjectPath [0x777A7C2E+238]

No GET requests found.
Press any key to continue . . .

21st Mar 2024 07:27 #5

white_snake

Member

Originally Posted by Dimension02000

I am attempting to run your code but I am getting the following error:

If you're on Windows, go to your task manager's Details tab, look for any chromedriver.exe instance that might be running, right-click on it and choose "End process tree".
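
The same cleanup can be scripted so it runs before each attempt. This is a sketch only, assuming Windows and its built-in `taskkill` command; `build_taskkill_command` is a hypothetical helper name.

```python
import subprocess
import sys


def build_taskkill_command(image_name: str = "chromedriver.exe") -> list:
    # /F forces termination, /T also ends child processes,
    # /IM selects processes by image (executable) name.
    return ["taskkill", "/F", "/T", "/IM", image_name]


if __name__ == "__main__" and sys.platform == "win32":
    # taskkill exits non-zero when nothing matches, so no check=True here.
    subprocess.run(build_taskkill_command(), capture_output=True, text=True)
```

On other platforms the guard skips the call, so the helper only builds the command list.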

21st Mar 2024 07:41 #6

Member

Originally Posted by white_snake

If you're on Windows, go to your task manager's Details tab, look for any chromedriver.exe instance that might be running, right-click on it and choose "End process tree".

I did as you instructed but I am still not getting the expected results. Here is the latest output:

DevTools listening on ws://127.0.0.1:38304/devtools/browser/d63eddb5-0b27-4fcb-a79f-80f6a124f774
An error occurred while loading the video player: Message:
Stacktrace:
GetHandleVerifier [0x00854CE3+225091]
(No symbol) [0x00784E31]
(No symbol) [0x00629A7A]
(No symbol) [0x0066175B]
(No symbol) [0x0066188B]
(No symbol) [0x00697882]
(No symbol) [0x0067F5A4]
(No symbol) [0x00695CB0]
(No symbol) [0x0067F2F6]
(No symbol) [0x006579B9]
(No symbol) [0x0065879D]
sqlite3_dbdata_init [0x00CC9A83+4064547]
sqlite3_dbdata_init [0x00CD108A+4094762]
sqlite3_dbdata_init [0x00CCB988+4072488]
sqlite3_dbdata_init [0x009CC9E9+930953]
(No symbol) [0x00790804]
(No symbol) [0x0078AD28]
(No symbol) [0x0078AE51]
(No symbol) [0x0077CAC0]
BaseThreadInitThunk [0x765CFCC9+25]
RtlGetAppContainerNamedObjectPath [0x777A7C5E+286]
RtlGetAppContainerNamedObjectPath [0x777A7C2E+238]

An error occurred: Message: javascript error: Invalid or unexpected token
(Session info: chrome-headless-shell=123.0.6312.58)
Stacktrace:
GetHandleVerifier [0x00854CE3+225091]
(No symbol) [0x00784E31]
(No symbol) [0x00629A7A]
(No symbol) [0x0062DEB0]
(No symbol) [0x0062FA76]
(No symbol) [0x006965E2]
(No symbol) [0x0067F55C]
(No symbol) [0x00695CB0]
(No symbol) [0x0067F2F6]
(No symbol) [0x006579B9]
(No symbol) [0x0065879D]
sqlite3_dbdata_init [0x00CC9A83+4064547]
sqlite3_dbdata_init [0x00CD108A+4094762]
sqlite3_dbdata_init [0x00CCB988+4072488]
sqlite3_dbdata_init [0x009CC9E9+930953]
(No symbol) [0x00790804]
(No symbol) [0x0078AD28]
(No symbol) [0x0078AE51]
(No symbol) [0x0077CAC0]
BaseThreadInitThunk [0x765CFCC9+25]
RtlGetAppContainerNamedObjectPath [0x777A7C5E+286]
RtlGetAppContainerNamedObjectPath [0x777A7C2E+238]

No GET requests found.
Press any key to continue . . .

21st Mar 2024 07:44 #7

white_snake

Member

Try to make sure any instance of chrome.exe using that same User Data folder is also closed (or just close any chrome.exe process) before running the script.

21st Mar 2024 07:58 #8

Member

Originally Posted by white_snake

Try to make sure any instance of chrome.exe using that same User Data folder is also closed (or just close any chrome.exe process) before running the script.

Thanks for getting back to me so quickly.

I reviewed my task manager ensuring that there were no Chrome processes running and again checked the details which did not show any Chrome running. To be sure that I did not miss anything I even rebooted my system but I am still not getting the expected results.

Here is the latest output:

DevTools listening on ws://127.0.0.1:11191/devtools/browser/ce5d1f30-f194-4856-9feb-8bbd1c71eb0a
An error occurred while loading the video player: Message:
Stacktrace:
GetHandleVerifier [0x00614CE3+225091]
(No symbol) [0x00544E31]
(No symbol) [0x003E9A7A]
(No symbol) [0x0042175B]
(No symbol) [0x0042188B]
(No symbol) [0x00457882]
(No symbol) [0x0043F5A4]
(No symbol) [0x00455CB0]
(No symbol) [0x0043F2F6]
(No symbol) [0x004179B9]
(No symbol) [0x0041879D]
sqlite3_dbdata_init [0x00A89A83+4064547]
sqlite3_dbdata_init [0x00A9108A+4094762]
sqlite3_dbdata_init [0x00A8B988+4072488]
sqlite3_dbdata_init [0x0078C9E9+930953]
(No symbol) [0x00550804]
(No symbol) [0x0054AD28]
(No symbol) [0x0054AE51]
(No symbol) [0x0053CAC0]
BaseThreadInitThunk [0x75FEFCC9+25]
RtlGetAppContainerNamedObjectPath [0x77237C5E+286]
RtlGetAppContainerNamedObjectPath [0x77237C2E+238]

An error occurred: Message: javascript error: Invalid or unexpected token
(Session info: chrome-headless-shell=123.0.6312.58)
Stacktrace:
GetHandleVerifier [0x00614CE3+225091]
(No symbol) [0x00544E31]
(No symbol) [0x003E9A7A]
(No symbol) [0x003EDEB0]
(No symbol) [0x003EFA76]
(No symbol) [0x004565E2]
(No symbol) [0x0043F55C]
(No symbol) [0x00455CB0]
(No symbol) [0x0043F2F6]
(No symbol) [0x004179B9]
(No symbol) [0x0041879D]
sqlite3_dbdata_init [0x00A89A83+4064547]
sqlite3_dbdata_init [0x00A9108A+4094762]
sqlite3_dbdata_init [0x00A8B988+4072488]
sqlite3_dbdata_init [0x0078C9E9+930953]
(No symbol) [0x00550804]
(No symbol) [0x0054AD28]
(No symbol) [0x0054AE51]
(No symbol) [0x0053CAC0]
BaseThreadInitThunk [0x75FEFCC9+25]
RtlGetAppContainerNamedObjectPath [0x77237C5E+286]
RtlGetAppContainerNamedObjectPath [0x77237C2E+238]

No GET requests found.
Press any key to continue . . .

21st Mar 2024 09:29 #9

white_snake

Member

Originally Posted by Dimension02000

Thanks for getting back to me so quickly.

I reviewed my task manager ensuring that there were no Chrome processes running and again checked the details which did not show any Chrome running. To be sure that I did not miss anything I even rebooted my system but I am still not getting the expected results.

It looks like the script was written for an older Selenium version. I updated it a little; try this:

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://thetvapp.to/tv/mtv-live-stream/"


def extract_desired_url(requests):
    # Search for the desired URL in the requests
    for request in requests:
        if "MTVEast.m3u8?token=" in request:
            return request
    return None


def get_get_requests():
    try:
        # Execute JavaScript to capture network requests
        requests = driver.execute_script("""
            var performance = window.performance || window.webkitPerformance || window.msPerformance || window.mozPerformance;
            if (!performance) {
                return [];
            }
            var entries = performance.getEntriesByType("resource");
            var urls = [];
            for (var i = 0; i < entries.length; i++) {
                urls.push(entries[i].name);
            }
            return urls;
        """)
        return requests
    except Exception as e:
        print("An error occurred:", e)
        return None
    finally:
        driver.quit()


# Set Chrome options
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # To run Chrome in headless mode

# Initialize Selenium WebDriver with the Chrome options (no explicit service needed)
driver = webdriver.Chrome(options=options)

# Navigate to the URL
driver.get(url)

# Wait for the video player element to be present
try:
    video_player = WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.CLASS_NAME, "video-player")))
    print("Video player loaded successfully.")
except:
    pass  # Do nothing if the video player is not found, the message will not be printed


# Call the function to get GET requests
get_requests = get_get_requests()

# Extract the desired URL
if get_requests:
    desired_url = extract_desired_url(get_requests)
    if desired_url:
        print("Desired URL found:", desired_url)
        # Write the desired URL to the file
        with open("mtvurl.txt", "w") as file:
            file.write(desired_url)
            print("Desired URL written to mtvurl.txt")
    else:
        print("No desired URL found in the requests.")
else:
    print("No GET requests found.")

21st Mar 2024 10:17 #10

Member

It works!

Thank you so much for your assistance in getting this to work. Now I just need to figure out how to get it to read all of the URLs from a file, or from the site itself, to acquire the needed URL with the token rather than having it hard-coded.

13th May 2024 22:14 #11

mbiboy

Member

I keep getting "No desired URL found in the requests." Is this python script still working for everyone? No success so far.

21st May 2024 16:43 #12

Member

Originally Posted by Dimension02000

It works!

Thank you so much for your assistance in getting this to work. Now I just need to figure out how to get it to read all of the URLs from a file, or from the site itself, to acquire the needed URL with the token rather than having it hard-coded.

That's the easy part:
Code:
import requests
import re

url = "https://thetvapp.to/tv"
headers = {
    'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36'
}
response = requests.get(url, headers=headers )

channels = re.findall(r'a href=\"/tv/(.*?)\"', response.text)

for channel in channels:
    print('https://thetvapp.to/tv/' +str(channel))
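
Once a stream URL has been collected for each channel slug, the results can be written as an M3U playlist that VLC opens. A minimal sketch, assuming the slugs look like `mtv-live-stream` as in the URLs above; `build_m3u` and `slug_to_name` are hypothetical helper names and the stream URL is made up. Because the tokens expire, a playlist like this would need regenerating.

```python
def slug_to_name(slug):
    """Turn a slug such as 'mtv-live-stream/' into a display name ('mtv')."""
    return slug.strip("/").replace("-live-stream", "").replace("-", " ")


def build_m3u(entries):
    """Build playlist text from (display_name, stream_url) pairs."""
    lines = ["#EXTM3U"]
    for name, stream_url in entries:
        lines.append(f"#EXTINF:-1,{name}")  # -1 marks unknown (live) duration
        lines.append(stream_url)
    return "\n".join(lines) + "\n"


# Made-up stream URL, only for illustration.
playlist = build_m3u([
    (slug_to_name("mtv-live-stream/"), "https://example.com/mtv.m3u8?token=abc"),
])
print(playlist)
```

Writing `playlist` to a file ending in `.m3u` is enough for VLC to list the entries.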

22nd May 2024 02:53 #13

Member

I updated it so it works again. I also made it ask what channel and then spit out the url.

Code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import re

# Setup main options
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # To run Chrome in headless mode
options.add_argument("--no-sandbox")
options.add_argument("--disable-dev-shm-usage")
user_agent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.85 Safari/537.36"
options.add_argument(f"user-agent={user_agent}")
driver = webdriver.Chrome(options=options)

# First get all the live channels into a list
homepage = "https://thetvapp.to/tv/"
driver.get(homepage)
channels = re.findall('a href=\"/tv/(.*?)/\"', driver.page_source)

# Enumerate and print the list then wait for user to input a number
for i, channel in enumerate(channels):
    print(i, channel.replace('-', ' '))
while True:
    try:
        selection = int(input("Please select a channel number: "))
        if selection < 0 or selection >= len(channels):
            print(f"Please select a number between 0 and {len(channels) - 1}.")
            continue
    except ValueError:
        print("Sorry, numbers only.")
        continue
    else:
        break

url = homepage + str(channels[selection])
print(f'Scraping page for playlist at {url}')



def extract_desired_url(requests):
    # Search for the desired URL in the requests
    for request in requests:
        if "m3u8?token=" in request:
            return request
    return None


def get_get_requests():
    global driver
    try:
        # Execute JavaScript to capture network requests
        requests = driver.execute_script("""
            var performance = window.performance || window.webkitPerformance || window.msPerformance || window.mozPerformance;
            if (!performance) {
                return [];
            }
            var entries = performance.getEntriesByType("resource");
            var urls = [];
            for (var i = 0; i < entries.length; i++) {
                urls.push(entries[i].name);
            }
            return urls;
        """)
        return requests
    except Exception as e:
        print("An error occurred:", e)
        return None



driver.get(url)
time.sleep(1)  # Adjust this if needed - this is the wait for the player to receive the decoded url
get_requests = get_get_requests()

# Extract the desired URL
if get_requests:
    desired_url = extract_desired_url(get_requests)
    if desired_url:
        print("Playlist URL found:", desired_url)
    else:
        print("No Playlist URL found in the requests.")
else:
    print("No GET requests found.")

driver.quit()
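
The fixed `time.sleep(1)` in the script above is a guess at how long the player needs. One alternative is to poll until the URL shows up or a timeout passes. This is a sketch only; `poll_until` is a hypothetical helper, and the demo uses a fake fetch function instead of a real browser.

```python
import time


def poll_until(fetch, timeout=10.0, interval=0.5):
    """Call fetch() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)


# Fake fetch: returns nothing twice, then a made-up URL.
attempts = iter([None, None, "https://example.com/x.m3u8?token=abc"])
print(poll_until(lambda: next(attempts), timeout=5, interval=0.01))
```

With the script above, the fetch function could be something like `lambda: extract_desired_url(get_get_requests() or [])`, which works there because that version of `get_get_requests` does not quit the driver.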

2nd Aug 2024 16:26 #14

empiretc

Member

Originally Posted by SpaceBallz

I updated it so it works again. I also made it ask what channel and then spit out the url.

So were you able to successfully generate m3u files for thetvapp.to?

2nd Aug 2024 19:52 #15

Member

It did work, but I think they might have changed the way the site works. I think the m3u only gets delivered after pressing play now. I'll look at the script again over the weekend.

2nd Aug 2024 20:39 #16

Member

Had a quick look: they now wait for the play button to be pressed, which calls a URL like token/channelname. It needs a nicely crafted header to receive the m3u URL. I'll look again when I have more time; the data required is in the first load of the page.
You can get the token URL from the individual channel page with:
Code:
driver.get(url)
time.sleep(1) 
chanpage = re.findall('data=\"/token/(.*?)\"', driver.page_source)
newdata = "https://thetvapp.to/token/" + str(chanpage[0])
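
The snippet above raises an IndexError when the attribute is missing, since `chanpage[0]` assumes at least one match. A slightly more defensive sketch follows, assuming the page carries a `data="/token/..."` attribute as described. `token_endpoint` is a hypothetical helper name, and the sample markup is an assumption pieced together from this thread.

```python
import re
from urllib.parse import urljoin


def token_endpoint(page_source, base="https://thetvapp.to"):
    """Return the absolute token URL found in the page, or None."""
    match = re.search(r'data="(/token/[^"]+)"', page_source)
    if match is None:
        return None
    return urljoin(base, match.group(1))


# Assumed markup, only for illustration.
sample_html = '<div id="get-m3u8-link" data="/token/CNN"></div>'
print(token_endpoint(sample_html))
```

Returning None lets the caller report a layout change instead of crashing.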

3rd Aug 2024 00:36 #17

imr_saleh

Member

Start your journey from this code.

CNN

Code:

import re
import requests

start_seasson = requests.session()


# Fetch the channel page and read its CSRF token
headers1 = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7',
    'cache-control': 'max-age=0',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Mobile Safari/537.36',
}


webpage = start_seasson.get('https://thetvapp.to/tv/cnn-live-stream/', headers=headers1).text

csrf_token = re.search(r'<meta name="csrf-token" content="(.*?)">', webpage).group(1)

headers2 = {
    'content-type': 'application/json',
    'dnt': '1',
    'origin': 'https://thetvapp.to',
    'referer': 'https://thetvapp.to/',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Mobile Safari/537.36',
    'x-csrf-token': csrf_token,
}

json_data = {
    'KSCahyafDAfniqhjUdDlvpUB': 'GKeVZHYqyAKAjyWUapyLKKEctt',
}

hls = start_seasson.post('https://thetvapp.to/token/CNN', headers=headers2, json=json_data).text
url_corrected = hls.replace("\\", "")
print(url_corrected)

start_seasson.close()
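
The `hls.replace("\\", "")` step strips every backslash, which would also damage any other escape in the body. If the response body is JSON (an assumption here, though a later script in this thread calls `response.json()` on it), parsing it unescapes the slashes properly. A small sketch with a made-up body:

```python
import json


def decode_body(body_text):
    """Parse a JSON response body; the parser unescapes the slashes."""
    return json.loads(body_text)


# Made-up body standing in for the text returned by the POST request.
raw = r'"https:\/\/example.com\/stream.m3u8?token=abc123"'
print(decode_body(raw))
```

With requests, calling `.json()` on the response instead of reading `.text` does the same parsing.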

Last edited by imr_saleh; 3rd Aug 2024 at 00:41.

3rd Aug 2024 05:02 #18

Member

Originally Posted by imr_saleh
Using this code directly gives a forbidden error.

The json_data you have hardcoded in the script is dynamic. It's only generated after the play button is clicked; however, I was unable to find where. If I load the page with my debugger open and click play, I can then see the payload data for my connection. If I put this key/value into your script, it works fine.

I was using Selenium.
Code:
wait = WebDriverWait(driver, 10)
play_button = wait.until(EC.element_to_be_clickable((By.ID, 'loadVideoBtnOne')))
play_button.click()
I was able to load the page, get all the cookies and click the button, but like I said, where is the payload key:value created/stored?

I'm still learning python and webscraping.

3rd Aug 2024 05:04 #19

Feels Good Man

Originally Posted by SpaceBallz

I'm still learning python and webscraping.

If only there was a guide for webscraping somewhere...

--[----->+<]>.++++++++++++.---.--------.
[*drm mass downloader: widefrog*]~~~~~~~~~~~[*how to make your own mass downloader: guide*]

3rd Aug 2024 06:12 #20

empiretc

Member

Originally Posted by SpaceBallz
Had a quick look: they now wait for the play button to be pressed, which calls a URL like token/channelname. It needs a nicely crafted header to receive the m3u URL. I'll look again when I have more time; the data required is in the first load of the page.
You can get the token URL from the individual channel page with:
Code:
driver.get(url)
time.sleep(1) 
chanpage = re.findall('data=\"/token/(.*?)\"', driver.page_source)
newdata = "https://thetvapp.to/token/" + str(chanpage[0])
Hey, thanks. Luckily, found another source with Fortv that works really well.

3rd Aug 2024 15:46 #21

imr_saleh

Member

Originally Posted by SpaceBallz
Using this code directly gives a forbidden error.

The json_data you have hardcoded in the script is dynamic. It's only generated after the play button is clicked; however, I was unable to find where. If I load the page with my debugger open and click play, I can then see the payload data for my connection. If I put this key/value into your script, it works fine.

I was using selenium.
Code:
wait = WebDriverWait(driver, 10)
play_button = wait.until(EC.element_to_be_clickable((By.ID, 'loadVideoBtnOne')))
play_button.click()
I was able to load the page, get all the cookies and click the button, but like I said, where is the payload key:value created/stored?

I'm still learning python and webscraping.
Ah, it seems that not only has the x-csrf-token header expired, but the request also needs a new payload.

I had a look and found that the payload is generated via JavaScript.
The parameter name (KSCahyafDAfniqhjUdDlvpUB) can be read directly from the code.
The biggest challenge is how to generate its value (GKeVZHYqyAKAjyWUapyLKKEctt), because the JavaScript file is heavily obfuscated, making it difficult to analyze the logic directly. However, certain patterns, such as function calls and variable manipulations, can guide us in locating the correct payload:
Code:
async function L5() {
    const n = U
      , t = {
        vSpJA: n(239),
        Wgwlp: n(288),
        vSNPf: "Network response was not ok "
    };
    try {
        const x = await fetch(as, {
            method: t[n(240)],
            headers: {
                "Content-Type": t[n(259)],
                "X-CSRF-TOKEN": cs()
            },
            body: JSON[n(295)]({
                KSCahyafDAfniqhjUdDlvpUB: R5  
            })
        });
        if (!x.ok)
            throw new Error(t[n(263)] + x[n(219)]);
        return await x[n(228)]()
    } catch (x) {
        console.error(n(226), x)
    }
}
I'll continue to check how the key is generated.
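
For readers unfamiliar with this style of obfuscation: calls such as `n(239)` are typically lookups into a shared string table, with a fixed offset subtracted from the index and the table rotated when the script loads. Below is a toy sketch of that idea. The table, offset and rotation count are made up, nothing here is taken from the site's script, and `make_lookup` is a hypothetical name.

```python
def make_lookup(table, offset, rotation):
    """Build a lookup function over a rotated string table."""
    rotation %= len(table)
    rotated = table[rotation:] + table[:rotation]

    def lookup(index):
        # The obfuscated code passes large indices; subtract the offset.
        return rotated[index - offset]

    return lookup


# Made-up table and parameters, only for illustration.
table = ["stringify", "POST", "application/json", "status"]
n = make_lookup(table, offset=200, rotation=1)
print(n(200), n(201), n(203))
```

Resolving each call this way turns expressions like `t[n(240)]` back into readable property names.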

I prefer to make the requests directly in code instead of using the Selenium webdriver.

4th Aug 2024 10:22 #22

Feels Good Man

Not completely happy with the result, but eh, could be better

Code:

import ast
import re
from html import unescape
from http.cookies import SimpleCookie
from time import sleep

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://thetvapp.to"
PAYLOAD = None


def evaluate(a, o, b):
    if o == "-":
        return int(a) - int(b)
    if o == "+":
        return int(a) + int(b)
    if o == "*":
        return int(a) * int(b)
    return int(a) // int(b)


def get_key_values(page_soup):
    app_js = page_soup.find_all('script', src=True)
    app_js = [
        source['src'] for source in app_js
        if source['src'].endswith('.js') and 'app' in source['src'].split("/")[-1]
    ][0]

    app_js = requests.get(app_js).content.decode()
    fixed_js_key = re.findall(
        r'headers:{[^{}]*"X-CSRF-TOKEN"[^{}]*},body:[^{}]*{([^{}]*)}',
        app_js
    )[0].split(":")[0]

    all_operations = re.findall(
        r'const\s[^=\s]+=\[];([^;]+);',
        app_js
    )
    possible_key_values = []

    for list_operations in all_operations:
        if "{" in list_operations or "}" in list_operations:
            continue
        if '](' not in list_operations or '),' not in list_operations:
            continue

        function_name = list_operations.split("[")[1].split("(")[0]
        function_name = re.findall(
            fr'const\s{function_name}=([^;]+);',
            app_js
        )[0]

        offset_operation = re.findall(
            r"function {function_name}\(.*?\){(.*?)}".replace("{function_name}", function_name),
            app_js,
            re.DOTALL
        )[0]

        function_name = re.findall(
            r'const\s[^=]+=([^()]+)\(',
            offset_operation
        )[0]

        offset_operation = re.findall(
            r"(\w+)=\1([-+*/])(\d+),",
            offset_operation
        )[0]

        list_operations = list_operations.split(",")
        list_operations = [
            re.findall(r'\(([^()]+)\)', op)[-1].replace('"', "").replace("'", "")
            for op in list_operations
        ]

        fixed_js_words = re.findall(
            r"function {function_name}\(.*?\){.*?(\[.*?\]);return\s.*?}".replace(
                "{function_name}",
                function_name
            ),
            app_js,
            re.DOTALL
        )[0]
        try:
            fixed_js_words = ast.literal_eval(fixed_js_words)
        except:
            continue

        try:
            max_op_len = len(max(list(filter(lambda o: not o.isdigit(), list_operations)), key=len))
        except:
            max_op_len = None

        for _ in range(0, len(fixed_js_words) - 1):
            current_key_value = []
            fail_key_value = False

            for operation in list_operations:
                if operation.isdigit():
                    operation = evaluate(operation, offset_operation[1], offset_operation[2])
                    operation = fixed_js_words[operation]

                if not bool(re.match(r'^[a-zA-Z]+$', operation)):
                    fail_key_value = True
                    break
                elif max_op_len is not None and len(operation) > 2 * max_op_len:
                    fail_key_value = True
                    break
                elif len("".join(current_key_value)) > 2 * len(fixed_js_key):
                    fail_key_value = True
                    break
                current_key_value.append(operation)

            if len(current_key_value) == 0:
                fail_key_value = True
            elif len("".join(current_key_value)) < len(fixed_js_key):
                fail_key_value = True
            elif len(min(current_key_value, key=len)) * 2 < len(max(current_key_value, key=len)):
                fail_key_value = True

            if not fail_key_value:
                current_key_value = "".join(current_key_value)
                possible_key_values.append(current_key_value)

            fixed_js_words.append(fixed_js_words.pop(0))

    return {
        "key": fixed_js_key,
        "value": possible_key_values
    }


def get_m3u8(source_url):
    global PAYLOAD
    response = requests.get(source_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    csrf_token = soup.find_all('meta', attrs={'name': 'csrf-token'})[0]["content"]
    get_m3u8_endpoint = soup.find_all("div", attrs={"id": "get-m3u8-link"})[0]["data"]
    if not get_m3u8_endpoint.startswith(BASE_URL):
        get_m3u8_endpoint = f'{BASE_URL}{get_m3u8_endpoint}'

    response = dict(response.headers)
    cookies = SimpleCookie()
    cookies.load(response["set-cookie"])
    app_session = {k: v.value for k, v in cookies.items()}["thetvapp_session"]

    payload = PAYLOAD
    if payload is None:
        payload = get_key_values(soup)

    for key_value in payload["value"]:
        js_key = payload["key"]
        response = requests.post(
            get_m3u8_endpoint,
            cookies={'thetvapp_session': app_session},
            headers={'X-CSRF-TOKEN': csrf_token},
            json={js_key: key_value}
        )

        if response.status_code == 200:
            PAYLOAD = {
                "key": js_key,
                "value": [key_value]
            }
            return response.json()
        sleep(0.5)

    print("Failed to obtain the m3u8 with any payload... Debug the script")
    exit(0)


if __name__ == '__main__':
    r = requests.get(BASE_URL)
    s = BeautifulSoup(r.text, 'html.parser')

    links = s.find_all('a', class_='list-group-item')
    index = 0
    for link in links:
        href = link.get('href')
        if not href or not href.startswith('/tv/'):
            continue

        href = f"{BASE_URL}{href}"
        text = unescape(link.text)
        index += 1

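        try:
            print(index, text, get_m3u8(href))
        except (Exception, SystemExit):
            # get_m3u8 calls exit() when no payload works, so SystemExit is caught too;
            # unlike a bare except, this still lets Ctrl+C stop the script
            print(index, "possible vpn issues: ", href)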

Code:

1 A&E https://v1.thetvapp.to/hls/AEEast/index.m3u8?token=YnRka1dnbkx2Uko1eUw5bzU0MUlBbHJEdjBRQTJNNmxCWnBZWENIeA==
2 ACC Network https://v1.thetvapp.to/hls/ACCNetwork/index.m3u8?token=RTRHNUNiQ0VxZWtMSmIyQXlPcG1MbkRpb1RsUHJ2b3c1WTNhakkxMQ==
3 AMC https://v1.thetvapp.to/hls/AMCEast/index.m3u8?token=VWFiSGNjMkFLUlM5a085ekMwU1pBMzFmMU1qSDBZRVRINllURHBkTw==
.
.
.
30 Disney XD https://v3.thetvapp.to/hls/DisneyXDEast/index.m3u8?token=UTRabEd6QUx5bmFCUmNCTU5VOVNyam1LYjhvbEZVRXJuQTMwY2hMWg==
31 E! https://v3.thetvapp.to/hls/EEast/index.m3u8?token=ckViSWxqYnk5cnVYNGd0S3g3TmdVQUI3Vk5DdGtheFdsdk82S3A0Rw==
32 possible vpn issues:  https://thetvapp.to/tv/espn-live-stream/
33 possible vpn issues:  https://thetvapp.to/tv/espn2-live-stream/
34 ESPNews https://v3.thetvapp.to/hls/ESPNews/index.m3u8?token=ZHJIMGVHeVRoYk0yenZReDBQUnVlRjZoOTFwTGZEekZPaUNnNERDMQ==
35 ESPNU https://v3.thetvapp.to/hls/ESPNU/index.m3u8?token=SzYyTlhhV0l0RWw4OTdTeUlKR0xZcGJRUkVWT0hZVHFUM0hxem9MRg==
.
.
.
113 WE tv https://v2.thetvapp.to/hls/WeTVEast/index.m3u8?token=Q2FrNFpQOW5JODlNdlg0ODBIZ2h5TmhQRVlrUk9LR3NwN2lZMDhmcA==
114 WNBC (New York) NBC East https://v2.thetvapp.to/hls/WNBCDT1/index.m3u8?token=d2JyMEJjTXBpZjlyRUwya3Uxa0ZQN2NFUlNvNnF3ZE5DMVI0SzF5eQ==
115 WNYW (New York) FOX East https://v3.thetvapp.to/hls/WNYWDT1/index.m3u8?token=a1pveTNzTEZ2YzZUTmNQcTdvWDJuUE5TVW1HMEpPMGthWUxiVE9Hcw==

Last edited by 2nHxWW6GkN1l916N3ayz8HQoi; 8th Aug 2024 at 09:33.

--[----->+<]>.++++++++++++.---.--------.
[*drm mass downloader: widefrog*]~~~~~~~~~~~[*how to make your own mass downloader: guide*]


4th Aug 2024 18:03 #23

imr_saleh

Member

Originally Posted by 2nHxWW6GkN1l916N3ayz8HQoi

Not completely happy with the result, but eh, could be better

Code:

import ast
import re
from html import unescape
from http.cookies import SimpleCookie

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://thetvapp.to"
CHECK_KEYWORDS = ["Network response was not ok ", "my-jwplayer"]


def get_m3u8(source_url):
    response = requests.get(source_url)
    soup = BeautifulSoup(response.text, 'html.parser')

    app_js = soup.find_all('script', src=True)
    app_js = [
        s['src'] for s in app_js
        if s['src'].endswith('.js') and 'app' in s['src'].split("/")[-1]
    ][0]

    app_js = requests.get(app_js).content.decode()
    key = re.findall(
        r'headers:{[^{}]*"X-CSRF-TOKEN"[^{}]*},body:[^{}]*{([^{}]*)}',
        app_js
    )[0].split(":")[0]

    operations = re.findall(
        r'const\s[^=\s]+=\[];([^;]+);',
        app_js
    )

    operations = sorted(operations, key=lambda o: o.count(","), reverse=True)[0].split(",")
    operations = [
        re.findall(r'\(([^()]+)\)', o)[-1].replace('"', "").replace("'", "")
        for o in operations
    ]

    num = [elem for elem in operations if elem.isdigit()]
    sort_num = sorted(num, key=int, reverse=True)
    index_map = {elem: sort_num.index(elem) for elem in num}
    operations = [(elem, index_map[elem]) if elem.isdigit() else (elem, -1) for elem in operations]

    matches = re.findall(
        r"\[\s*(?:[^][]*|\[(?:[^][]*|\[[^]]*])*])*\s*]",
        app_js
    )
    matches = sorted(matches, key=lambda m: len(m), reverse=True)

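    list_words = None
    for match in matches:
        if match[0] != "[" or match[-1] != "]":
            continue
        try:
            match = ast.literal_eval(match)
        except Exception:
            # Not a valid Python literal; try the next candidate
            continue
        # Keep only a list made up entirely of strings...
        if not all(isinstance(m, str) for m in match):
            continue
        # ...that also contains every known keyword
        if not all(word in match for word in CHECK_KEYWORDS):
            continue
        list_words = match
        break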

    list_words = list(reversed(list_words))
    list_words = [l for l in list_words if len(l) <= 3]

    key_value = []
    for v, i in operations:
        if i >= 0:
            i = list_words[i]
        else:
            i = v
        key_value.append(i)

    key_value = "".join(key_value)

    csrf_token = soup.find_all('meta', attrs={'name': 'csrf-token'})[0]["content"]
    get_m3u8_endpoint = soup.find_all("div", attrs={"id": "get-m3u8-link"})[0]["data"]
    if not get_m3u8_endpoint.startswith(BASE_URL):
        get_m3u8_endpoint = f'{BASE_URL}{get_m3u8_endpoint}'

    response = dict(response.headers)
    cookies = SimpleCookie()
    cookies.load(response["set-cookie"])
    app_session = {k: v.value for k, v in cookies.items()}["thetvapp_session"]

    response = requests.post(
        get_m3u8_endpoint,
        cookies={'thetvapp_session': app_session},
        headers={'X-CSRF-TOKEN': csrf_token},
        json={key: key_value}
    )
    m3u8_url = response.json()
    return m3u8_url


if __name__ == '__main__':
    response = requests.get(BASE_URL)
    soup = BeautifulSoup(response.text, 'html.parser')

    links = soup.find_all('a', class_='list-group-item')
    index = 0
    for link in links:
        href = link.get('href')
        if not href or not href.startswith('/tv/'):
            continue

        href = f"{BASE_URL}{href}"
        text = unescape(link.text)
        index += 1

        try:
            print(index, text, get_m3u8(href))
        except Exception:
            print("possible vpn issues: ", href)

Code:

1 A&E https://v1.thetvapp.to/hls/AEEast/index.m3u8?token=YnRka1dnbkx2Uko1eUw5bzU0MUlBbHJEdjBRQTJNNmxCWnBZWENIeA==
2 ACC Network https://v1.thetvapp.to/hls/ACCNetwork/index.m3u8?token=RTRHNUNiQ0VxZWtMSmIyQXlPcG1MbkRpb1RsUHJ2b3c1WTNhakkxMQ==
3 AMC https://v1.thetvapp.to/hls/AMCEast/index.m3u8?token=VWFiSGNjMkFLUlM5a085ekMwU1pBMzFmMU1qSDBZRVRINllURHBkTw==
.
.
.
30 Disney XD https://v3.thetvapp.to/hls/DisneyXDEast/index.m3u8?token=UTRabEd6QUx5bmFCUmNCTU5VOVNyam1LYjhvbEZVRXJuQTMwY2hMWg==
31 E! https://v3.thetvapp.to/hls/EEast/index.m3u8?token=ckViSWxqYnk5cnVYNGd0S3g3TmdVQUI3Vk5DdGtheFdsdk82S3A0Rw==
possible vpn issues:  https://thetvapp.to/tv/espn-live-stream/
possible vpn issues:  https://thetvapp.to/tv/espn2-live-stream/
34 ESPNews https://v3.thetvapp.to/hls/ESPNews/index.m3u8?token=ZHJIMGVHeVRoYk0yenZReDBQUnVlRjZoOTFwTGZEekZPaUNnNERDMQ==
35 ESPNU https://v3.thetvapp.to/hls/ESPNU/index.m3u8?token=SzYyTlhhV0l0RWw4OTdTeUlKR0xZcGJRUkVWT0hZVHFUM0hxem9MRg==
.
.
.
113 WE tv https://v2.thetvapp.to/hls/WeTVEast/index.m3u8?token=Q2FrNFpQOW5JODlNdlg0ODBIZ2h5TmhQRVlrUk9LR3NwN2lZMDhmcA==
114 WNBC (New York) NBC East https://v2.thetvapp.to/hls/WNBCDT1/index.m3u8?token=d2JyMEJjTXBpZjlyRUwya3Uxa0ZQN2NFUlNvNnF3ZE5DMVI0SzF5eQ==
115 WNYW (New York) FOX East https://v3.thetvapp.to/hls/WNYWDT1/index.m3u8?token=a1pveTNzTEZ2YzZUTmNQcTdvWDJuUE5TVW1HMEpPMGthWUxiVE9Hcw==

I was struggling for 6 hrs trying to figure out how the JavaScript works.

I didn't know about the reversed words.

Code:

list_words = list(reversed(list_words))
list_words = [l for l in list_words if len(l) <= 3]

what a genius widefrog
great work


4th Aug 2024 23:47 #24

notaghost

Member

It was all in the JavaScript code, and I was going to share it if he hadn’t. I’m not sure why you couldn’t figure it out—maybe you’re still getting familiar with JavaScript? Either way, he did an amazing job.

discord=notaghost9997


5th Aug 2024 03:08 #25

Feels Good Man

Originally Posted by imr_saleh

I was struggling for 6 hrs trying to figure out how the JavaScript works.

I didn't know about the reversed words.

Since you seem like a nice fella who genuinely likes learning and scripting, I'm gonna explain my line of thought that led me to that mediocre solution. After all, what's the point of all these fancy scripts if people can't write new ones when they're gonna inevitably fail after a few days/weeks, especially if the site dev is lurking like a rat somewhere.

I'm gonna skip over the data scraping basics since you probably know them, and they have also been explained in the videohelp forum guides. Since the hardest challenge is obtaining that magic payload pair, key/value, I'm gonna focus on it. The problem is gonna be split into 2 smaller issues: the key and the value.

By using the HAR trick on that key (in my case it is "amOJQwpfeNEMtHDipfKCfmshvqSZ"), you can instantly find it in a JS file. I'm gonna use the formatted JS source code in Chrome to showcase the code snippets.
Code:
await t[n(333)](fetch, i1, {
        method: t[n(369)],
        headers: {
            "Content-Type": n(339),
            "X-CSRF-TOKEN": c1()
        },
        body: JSON[n(340)]({
            amOJQwpfeNEMtHDipfKCfmshvqSZ: S5
        })
    })

.... or ....

const x = await fetch(i1, {
            method: t[n(326)],
            headers: {
                "Content-Type": n(339),
                "X-CSRF-TOKEN": c1()
            },
            body: JSON.stringify({
                amOJQwpfeNEMtHDipfKCfmshvqSZ: C5
            })
        });
Since that key "amOJQwpfeNEMtHDipfKCfmshvqSZ" is magic, you're gonna have to extract it as well. You could make a regex that picks any 28-character string, but that's horrible since it could also pick up false matches. Instead, try to find a fixed anchor point where you can start building a regex pattern. Searching (Ctrl+F) for "X-CSRF-TOKEN" in that JS file brings you back to the same code snippets, so now you have a fixed point from which to develop your regex. It's up to you how you do it. Just keep in mind: don't build your regex based on the formatted code. Once you know what you wanna do, look at the raw code (which is one long continuous string).
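
As a rough illustration of the anchor idea, here is a minimal Python sketch. The `raw_js` string is a hand-written stand-in for a slice of the raw one-line JS (not the real file), and `KEY_PATTERN` and `extract_payload_key` are made-up names for this example. The pattern itself is the same idea as the one used in the script: anchor on the fixed "X-CSRF-TOKEN" header, then capture whatever object is passed as the body.

```python
import re

# Hand-written stand-in for a slice of the raw (unformatted) JS; not the real file.
raw_js = (
    'const x=await fetch(i1,{method:t[n(326)],headers:{"Content-Type":n(339),'
    '"X-CSRF-TOKEN":c1()},body:JSON.stringify({amOJQwpfeNEMtHDipfKCfmshvqSZ:C5})});'
)

# Anchor on the fixed "X-CSRF-TOKEN" header, then capture the object passed as the body.
KEY_PATTERN = re.compile(
    r'headers:\{[^{}]*"X-CSRF-TOKEN"[^{}]*\},body:[^{}]*\{([^{}]*)\}'
)


def extract_payload_key(js_source):
    """Return (key, variable_name) from the first matching fetch body, or None."""
    match = KEY_PATTERN.search(js_source)
    if match is None:
        return None
    # The captured body looks like "someKey:someVariable"
    key, _, variable = match.group(1).partition(":")
    return key, variable


print(extract_payload_key(raw_js))
```

The pattern never mentions the key's length or content, so it should keep working if the key changes, as long as the code around the header keeps the same shape. If that shape changes, the function returns None instead of a false match.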

Now the hard part is the value. A search in the HAR file brings no results, so maybe it's hidden in some encoding??? Before doing drastic things and checking all requests manually in the network tab, let's just take a look at the snippets. It seems to be building a payload, and the value of that fixed key should be the one we're looking for. I'm gonna focus on the 2nd code snippet (const x = blabla) and place a debugger breakpoint there.

[Attachment 81234]

Jackpot. So the value is also found there (its content might be different for you). The question now becomes: where is it taken from, or how is it generated? That "C5" is not a function call but a variable. Going a little backward to the biggest function that encapsulates everything, we get this:
Code:
async function O5() {
    const n = W
      , t = {
        fcVHD: n(337),
        HtOev: function(x, r) {
            return x + r
        }
    };
    try {
        const x = await fetch(i1, {
            method: t[n(326)],
            headers: {
                "Content-Type": n(339),
                "X-CSRF-TOKEN": c1()
            },
            body: JSON.stringify({
                amOJQwpfeNEMtHDipfKCfmshvqSZ: C5
            })
        });
So "C5" is neither a variable created in the function nor a received parameter. That means it's created somewhere outside the function in the JS file. That's good. Progress. Since it is a variable, it has to be assigned a value somewhere. Click on the first line of the JS file and search for the first appearance of "C5 = " (this works because the JS file is formatted; enable case-sensitive search). You can also place a debugger breakpoint on the line you find and refresh the page.

[Attachment 81235]

Seems like the value of m0 is a list and its contents can be joined into 1 big string that represents the key value we want. I'm gonna completely ignore what that function "Co" is doing since it's useless. We already know what it's supposed to do. So, from "C5" we go to "m0", which is another variable, only now it is a list, not a string, but equivalent nonetheless. By searching from the start of the JS file for "m0 = " we get:
Code:
const m0 = [];
m0.push(W(396)),
m0[W(343)](W(376)),
m0[W(343)](W(362)),
m0[W(343)]("tkm"),
m0[W(343)]("gvv"),
m0[W(343)](W(325)),
m0[W(343)]("tO"),
m0[W(343)]("yS"),
m0[W(343)]("my"),
m0[W(343)]("ZO");
const Rx = [];
...
We can already see some fixed parts of the known string there: "tkm", "gvv", etc. But what is W() supposed to do? Also, what is m0[W(343)] even doing? In JavaScript, m0 is an array, and arrays are objects. As with any object, you can access its methods the same way you would access a dictionary value, by key. So W(343) is a string that names a method of the array. Since m0 is a list and the JS code is building that list, then obviously W(343) should represent the "push" method. But we're gonna confirm that later. Let's place a debugger breakpoint at the first push and see where the call of W() leads us:
Code:
const W = dt;
function dt(n, t) {
    const x = gi();
    return dt = function(r, e) {
        return r = r - 319,
        x[r]
    }
    ,
    dt(n, t)
}
We enter into a "dt" function. Weirdly, it's not called "W" but you can already see a little higher that the function is assigned to a variable. So it makes sense now. The "dt" function seems to be doing mostly things on its own without external variables, except gi(). Let's continue the debugging and see where gi() leads us.
Code:
function gi() {
    const n = ["my-jwplayer", ... blabla ...,"4xIvEzO", "TGL", "OFlaR"];
    return gi = function() {
        return n
    }
    ,
    gi()
}
Jackpot. The value of n is just a list of constant strings and the gi function is not using anything else external. This stops being a data scraping issue and becomes a problem of understanding the logic flow, since you now have all the necessary code snippets and none of them depend on anything else external. This is no longer a "you don't know what you don't know" scenario, which is good.

Understanding the flow of the found code is now a matter of experience. You could try your luck with ChatGPT but for most other sites you're gonna have to know Javascript and have some coding knowledge. Now we're lucky that the site dev didn't bother obfuscating it that much. In short, the function W() returns elements from the fixed "n" list by receiving an index. That index also has an offset since you have the operation "r - 319".
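
To make that lookup concrete, here is a small Python sketch of the same logic. The `WORDS` list is a made-up, heavily shortened table (the real one is much longer), so the indices below are chosen to fit the toy table and are not the real ones. `JS_TO_PY_METHOD` is only there because Python lists call the method `append` where JS arrays call it `push`.

```python
OFFSET = 319  # taken from "r = r - 319" in the dt function

# Hypothetical, heavily shortened word table; the real one is much longer.
WORDS = ["my-jwplayer", "aQ", "push", "Zx"]


def W(index):
    """Mimic the JS lookup: subtract the offset, then index into the word table."""
    return WORDS[index - OFFSET]


# JS: m0[W(321)]("tkm") resolves the method name at runtime.
# The closest Python analogue is getattr; JS "push" corresponds to list.append.
JS_TO_PY_METHOD = {"push": "append"}

m0 = []
method_name = JS_TO_PY_METHOD[W(321)]  # W(321) is "push" in this toy table
getattr(m0, method_name)("tkm")        # same as m0.append("tkm")
getattr(m0, method_name)(W(320))       # a value that comes from the word table

print(W(321), m0)
```

Nothing here is decrypted: the function only translates a number into a position in a constant list, which is why having the list and the offset is enough.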

343 - 319 = 24 => W(343) is actually "push" which is what we deduced previously. If you had access to the fixed string list, the offset operation, and the push operation, you could build the list yourself. The offset operation can be ignored if you realize that the js code is not gonna use ALL of the possible strings from the word list. At first glance, it just looks like the only ones used have a length <= 3. So the problem becomes about data scraping again: how to get the fixed list and the list of operations. Both can be done by doing the same thing. Find an anchor and start building your regex.
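
Assuming you do have the word list, the offset, and the raw push operations, rebuilding the value could look roughly like the sketch below. The `WORDS` table and the `raw_ops` string are made up for illustration, and `rebuild_value` is a hypothetical helper name. The inner regex is the same one the script uses to pull the argument out of each push call.

```python
import re

OFFSET = 319  # from the dt function

# Hypothetical shortened word table and operation string, for illustration only.
WORDS = ["my-jwplayer", "aQ", "push", "Zx"]
raw_ops = 'm0.push(W(320)),m0[W(321)]("tkm"),m0[W(321)](W(322))'


def rebuild_value(operations, words, offset):
    """Replay each push call and join the pushed pieces into one string."""
    parts = []
    for op in operations.split(","):
        # The last innermost (...) group of each call is the pushed argument
        arg = re.findall(r'\(([^()]+)\)', op)[-1].strip('"\'')
        if arg.isdigit():
            # Numeric arguments are W() indices into the word table
            arg = words[int(arg) - offset]
        parts.append(arg)
    return "".join(parts)


print(rebuild_value(raw_ops, WORDS, OFFSET))
```

Two caveats with this sketch: splitting on "," breaks if a pushed literal ever contains a comma, and a literal made only of digits would be mistaken for an index. Both would need handling if the JS changes.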

Now, what exactly don't I like about this approach? It's kinda hardcoded, and if the JS code gets obfuscated even more, you're gonna have to change it. I think in this case, a selenium approach might be better.

Originally Posted by notaghost

I was going to share it if he hadn’t

You can always post it if you want. I doubt people are gonna mind. I still think it could be further generalized.



5th Aug 2024 05:41 #26

notaghost

Member

Originally Posted by 2nHxWW6GkN1l916N3ayz8HQoi

Originally Posted by notaghost

I was going to share it if he hadn’t

You can always post it if you want. I doubt people are gonna mind. I still think it could be further generalized.

I’m not sure what’s left to share.



5th Aug 2024 22:43 #27

imr_saleh

Member

Originally Posted by notaghost

It was all in the JavaScript code, and I was going to share it if he hadn’t. I’m not sure why you couldn’t figure it out—maybe you’re still getting familiar with JavaScript? Either way, he did an amazing job.

I'm not familiar with JS, at least not as much as Python, which I focus on, but at least I try.

Originally Posted by 2nHxWW6GkN1l916N3ayz8HQoi

Since you seem like a nice fella who genuinely likes learning and scripting, I'm gonna explain my line of thought that led me to that mediocre solution. After all, what's the point of all these fancy scripts if people can't write new ones when they're gonna inevitably fail after a few days/weeks, especially if the site dev is lurking like a rat somewhere.

Thank you for the detailed explanation. The script has stopped working, though. Maybe the site developers use new tricks every day. Anyway, as you said, a selenium approach might be better in this case.

Last edited by imr_saleh; 5th Aug 2024 at 22:53.


6th Aug 2024 01:55 #28