How to extract and use an m3u8 URL to download from a 24/7 stream?

28th Jan 2022 10:16 #1
BosseB

Member

Join Date
Feb 2021

Location
Sweden
I am using an Ubuntu 20.04.3 Server machine to download streaming videos from news sites as 1-hour segments in mp4 format for daytime viewing. There is a 6-9 hour time difference between me and the sources, which is why I am doing this.

I am able to script the extraction of the m3u8 stream URL from several sites, but unfortunately not from all of them.

For the sites where it works, I use this script code:

Code:

CMD="curl -s \"${STREAMURL}\" | grep -o -e \"https://.\+m3u8\" | head -n 1"
M3U8=$(eval $CMD)

Here STREAMURL is the URL of the web page where the player that plays the news shows resides.
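For comparison, here is a minimal Python sketch of the same "first https URL ending in m3u8" search, assuming the page HTML has already been fetched into a string. The character class (stop at whitespace, quotes and angle brackets) is my own assumption about typical player markup; it differs from the .\+ in the grep pattern, which is greedy and can run past the first URL when several appear on one line.

```python
import re
from typing import Optional

# Stop at characters that usually end a URL inside HTML/JavaScript
# (an assumption about the page markup). Like the grep above, anything
# after ".m3u8" (e.g. a query string) is not included.
M3U8_RE = re.compile(r"""https://[^\s"'<>]+?\.m3u8""")


def first_m3u8(page_text: str) -> Optional[str]:
    """Return the first https://...m3u8 URL found in the page text, or None."""
    match = M3U8_RE.search(page_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = '<source src="https://example.com/live/a.m3u8"><a href="https://example.com/b.m3u8">'
    print(first_m3u8(sample))  # https://example.com/live/a.m3u8
```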

For other sites I have to use Firefox: while the video is playing, I press F12, watch the Network/All tab for an m3u8 line to appear, click it, then right-click and select Copy/URL, which gives something like this:

Code:

http://1128480543.rsc.cdn77.org/wF0Xk_UoBZzEHzrCGmG7AA==,1643380502/1128480543/tracks-v1a1/mono.m3u8

With this m3u8 URL I can then download the video like this:

Code:

CMD="ffmpeg -hide_banner -user_agent \"Mozilla\" -i \"${M3U8}\" -vf scale=w=-4:h=360 -c:v libx264 -preset fast -crf 26 -c:a copy -t ${CAPTURETIME} ${TARGETFILE}"
# or
CMD="ffmpeg -hide_banner -referer \"${VIDEOURL}\" -i \"${M3U8}\" -vf scale=w=-4:h=480 -c:v libx264 -preset fast -crf 26 -c:a copy -t ${CAPTURETIME} ${TARGETFILE}"

Here the variables are:
VIDEOURL - The URL to the page holding the player
M3U8 - The m3u8 stream URL retrieved as described above
CAPTURETIME - The output video duration in seconds
TARGETFILE - The output mp4 file obviously...

Note that some sites require -user_agent and others -referer; it depends on the site.
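Because the command is built as a string and run through eval, quoting is easy to get wrong: eval re-parses the string, so characters such as $ or an unbalanced quote in a URL would be interpreted by the shell. Below is a sketch in Python of the same two variants built as an argument list and run with subprocess, so no shell quoting is involved. The function names and parameters are illustrative; the ffmpeg options are the ones from the commands above.

```python
import subprocess
from typing import List, Optional


def build_ffmpeg_cmd(m3u8_url: str, target_file: str, capture_time: int,
                     height: int = 480, referer: Optional[str] = None,
                     user_agent: Optional[str] = "Mozilla") -> List[str]:
    """Return the ffmpeg argument list; pass referer for sites that need -referer."""
    cmd = ["ffmpeg", "-hide_banner"]
    # HTTP input options must come before -i to apply to the stream
    if referer is not None:
        cmd += ["-referer", referer]
    elif user_agent is not None:
        cmd += ["-user_agent", user_agent]
    cmd += [
        "-i", m3u8_url,
        "-vf", f"scale=w=-4:h={height}",  # keep aspect ratio, width a multiple of 4
        "-c:v", "libx264", "-preset", "fast", "-crf", "26",
        "-c:a", "copy",
        "-t", str(capture_time),          # output duration in seconds
        target_file,
    ]
    return cmd


def record(cmd: List[str]) -> int:
    """Run ffmpeg directly (no shell, no eval) and return its exit status."""
    return subprocess.run(cmd).returncode
```

For a -user_agent site this would be called as record(build_ffmpeg_cmd(m3u8, "show.mp4", 3600, height=360)), and for a -referer site as record(build_ffmpeg_cmd(m3u8, "show.mp4", 3600, referer=video_url)).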

I run the download script as an "at" job that starts shortly before the show begins and ends a little after it finishes.

This works OK for the few sites I have gotten it to work on, but I have the following problem:

The manually extracted m3u8 URL may change; on some sites it changes daily or even more often (part of it, such as the big number 1643380502, changes).
So I really need an automatic extraction procedure that works and can be used as part of the download script.

Any ideas on how to do this?
That is, how can I extract the m3u8 URL from the websites where the command shown above does not work?

Like these:

Code:

http://www.freeintertv.com/view/id-2565
https://livenewschat.eu/politics
https://livenewsof.com/msnbc-live-stream
28th Jan 2022 14:55 #2
jack_666

Member

Join Date
May 2021
Hi BosseB

I started with the second of the three URLs, i.e. https://livenewschat.eu/politics

This feed always uses the same m3u8 URL: https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8. The master location is constant.

Use this curl command (written for Windows; you can easily adapt it for your Ubuntu machine) to download the m3u8. Then use ffmpeg to capture the stream as usual.

curl -k "https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8" ^
-H "Connection: keep-alive" ^
-H "Pragma: no-cache" ^
-H "Cache-Control: no-cache" ^
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36" ^
-H "Accept: */*" ^
-H "Origin: https://livenewschat.eu" ^
-H "Sec-Fetch-Site: same-site" ^
-H "Sec-Fetch-Mode: cors" ^
-H "Sec-Fetch-Dest: empty" ^
-H "Referer: https://livenewschat.eu/" ^
-H "Accept-Language: en-US,en;q=0.9" ^
-H "dnt: 1" ^
-H "sec-gpc: 1" ^
--compressed -o livenewschat.m3u8

I'll look at the other two as time permits.

28th Jan 2022 15:13 #3
jack_666

Hi BosseB

Looking at #3, https://livenewsof.com/msnbc-live-stream:

https://rtmp.livenewsof.com/hls/fx2.m3u8 <== Are you saying that this URL changes often?

28th Jan 2022 16:50 #4
BosseB

Originally Posted by jack_666

Hi BosseB

looking at #3 https://livenewsof.com/msnbc-live-stream

https://rtmp.livenewsof.com/hls/fx2.m3u8 <== Are you saying that this URL changes often?

No, that is one of the sites I cannot extract the m3u8 URL from in a script, so I got it using Firefox. But I believe it rarely changes, if at all.
The other URLs are longer and contain number strings of 6-10 digits, and I have seen those change more often than the others. Sometimes character strings like C90Zrw8DEqphyq8lGfWOYg also change, which is why I need a way to update the URL just before the download starts.

28th Jan 2022 16:53 #5
BosseB

Originally Posted by jack_666

Hi BosseB

I started with the second of the three urls i.e. https://livenewschat.eu/politics

This feed uses the same m3u8 url .... https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8 . Master location is constant.

Use this curl command ( for windows ... you can easily modify for your Ubuntu) to download the m3u8. Then use ffmpeg to capture the stream as per usual.

Code:

curl -k "https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8" ^
  -H "Connection: keep-alive" ^
  -H "Pragma: no-cache" ^
  -H "Cache-Control: no-cache" ^
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36" ^
  -H "Accept: */*" ^
  -H "Origin: https://livenewschat.eu" ^
  -H "Sec-Fetch-Site: same-site" ^
  -H "Sec-Fetch-Mode: cors" ^
  -H "Sec-Fetch-Dest: empty" ^
  -H "Referer: https://livenewschat.eu/" ^
  -H "Accept-Language: en-US,en;q=0.9" ^
  -H "dnt: 1" ^
  -H "sec-gpc: 1" ^
  --compressed -o livenewschat.m3u8

I'll look at the other two as time permits.

Is the above a command that goes on the command line as a single line?
If so it looks pretty long...
What do the ^ characters do? Are they some kind of Windows special char?

EDIT:
Do you mean that the curl command above should be used to stream the video into ffmpeg like this:

Code:

curl <massive set of arguments as seen above> | ffmpeg <formatting arguments to get the output in the correct geometry> -t 3600 output.mp4

If so, does ffmpeg's -t 3600 (the output duration) also stop curl when the download is complete?

I don't really understand your suggestion...

Or does the curl command above result in an m3u8 URL printed on the command line ready to be used inside my ffmpeg command?
Last edited by BosseB; 28th Jan 2022 at 18:16.

28th Jan 2022 18:08 #6

Member

hi BosseB

What do the ^ characters do? <== It breaks a long line of code into smaller and more readable parts.

Bash uses a backslash (\) at the end of the line instead:
Code:
  -H 'Accept: */*' \
  -H 'Origin: https://livenewschat.eu' \
  -H 'Sec-Fetch-Site: same-site' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: https://livenewschat.eu/' \
  -H 'Accept-Language: en-US,en;q=0.9' \
  -H 'dnt: 1' \
  -H 'sec-gpc: 1' \


28th Jan 2022 18:17 #7
jack_666

You wrote:

The others are bigger and contain number strings with 6-10 digits and here I have seen that they change some more often that the others. Sometimes the character strings like C90Zrw8DEqphyq8lGfWOYg also change, so this is why I would need a way to update just before the download starts.

C90Zrw8DEqphyq8lGfWOYg <== Time codes. They set an end of life for the URL.
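If the 10-digit number after the comma (e.g. 1643380502 in the URL from the first post) is a Unix timestamp, it can be decoded to check how long a saved URL is likely to stay usable. That it is a timestamp is an assumption: it fits the "end of life" reading above, but the CDN's format is not confirmed anywhere in this thread. A small Python sketch:

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Matches ",<10 digits>/" as in .../wF0Xk_UoBZzEHzrCGmG7AA==,1643380502/...
TIME_CODE_RE = re.compile(r",(\d{10})/")


def url_time_code(m3u8_url: str) -> Optional[datetime]:
    """Return the embedded number as a UTC datetime, assuming it is a Unix timestamp."""
    match = TIME_CODE_RE.search(m3u8_url)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


def still_valid(m3u8_url: str, margin_s: int = 300, now: Optional[datetime] = None) -> bool:
    """True if the assumed expiry time is more than margin_s seconds in the future."""
    expiry = url_time_code(m3u8_url)
    if expiry is None:
        return True  # no time code in the URL: nothing to check
    now = now or datetime.now(timezone.utc)
    return (expiry - now).total_seconds() > margin_s
```

For the URL in the first post this gives 2022-01-28 14:35:02 UTC, a few hours after that post was written.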

I would need a way to update just before the download starts. <== That is the Holy Grail of automation. Many are seeking this out but alas no Sir Galahad (to my knowledge).

28th Jan 2022 18:24 #8
jack_666

Do you mean that the curl command above should be used to stream the video into ffmpeg

No, the command downloads the latest m3u8 file... look at the code.

-o livenewschat.m3u8 <== curl saves the m3u8 automatically to the file livenewschat.m3u8 in your pwd (present working directory).

Then you use ffmpeg together with this m3u8 to retrieve the video.

the curl command above result in an m3u8 URL printed on the command line ready to be used inside my ffmpeg command

Correct

29th Jan 2022 06:50 #9
BosseB

Originally Posted by jack_666

Do you mean that the curl command above should be used to stream the video into ffmpeg

No the command download the latest m3u8 file .... look at the code.

Well, I was thinking of piping the data so ffmpeg could work while the download was ongoing.
Early on I downloaded to ts files and converted them after the download, but the conversion took a long time, so I basically had a conversion process running for half an hour. That is when I realized that if ffmpeg read the stream directly, I could put the processing in the same ffmpeg command and the file would be ready when the stream stopped.

So that is my approach and it works well for most streams that ffmpeg can be set to download...

-o livenewschat.m3u8 <== the curl will download the m3u8 automatically to the file livenewschat.m3u8 saved on your pwd (present work directory)

Then you use ffmpeg together with this m3u8 to retrieve the video.

the curl command above result in an m3u8 URL printed on the command line ready to be used inside my ffmpeg command

Correct

Is there no way to *pipe* the stream from curl into ffmpeg while it is happening?
That would be the solution if it could be done...

The stream I have the most trouble with regarding changing m3u8 URLs is

Code:

http://www.freeintertv.com/view/id-2565

This site offers many different streams; 2565 is the MSNBC stream I am looking for.
That might complicate extraction from it, though.

30th Jan 2022 05:47 #10

BosseB

Originally Posted by jack_666

Hi BosseB

I started with the second of the three urls i.e. https://livenewschat.eu/politics

This feed uses the same m3u8 url .... https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8 . Master location is constant.

Use this curl command ( for windows ... you can easily modify for your Ubuntu) to download the m3u8. Then use ffmpeg to capture the stream as per usual.

Code:

curl -k "https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8" ^
  -H "Connection: keep-alive" ^
  -H "Pragma: no-cache" ^
  -H "Cache-Control: no-cache" ^
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36" ^
  -H "Accept: */*" ^
  -H "Origin: https://livenewschat.eu" ^
  -H "Sec-Fetch-Site: same-site" ^
  -H "Sec-Fetch-Mode: cors" ^
  -H "Sec-Fetch-Dest: empty" ^
  -H "Referer: https://livenewschat.eu/" ^
  -H "Accept-Language: en-US,en;q=0.9" ^
  -H "dnt: 1" ^
  -H "sec-gpc: 1" ^
  --compressed -o livenewschat.m3u8

I'll look at the other two as time permits.

So I modified your command for Linux as follows:

Code:

curl -k "https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8" \
  -H "Connection: keep-alive" \
  -H "Pragma: no-cache" \
  -H "Cache-Control: no-cache" \
  -H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36" \
  -H "Accept: */*" \
  -H "Origin: https://livenewschat.eu" \
  -H "Sec-Fetch-Site: same-site" \
  -H "Sec-Fetch-Mode: cors" \
  -H "Sec-Fetch-Dest: empty" \
  -H "Referer: https://livenewschat.eu/" \
  -H "Accept-Language: en-US,en;q=0.9" \
  -H "dnt: 1" \
  -H "sec-gpc: 1" \
  --compressed -o livenewschat.m3u8

But it ran for only a second or two and left an m3u8 file containing this:

Code:

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:360075
#EXT-X-TARGETDURATION:6
#EXT-X-KEY:METHOD=AES-128,URI="1643538964500.key",IV=0x00000000000000000000017EAA8E6014
#EXTINF:6.006,
1643538976500.ts
#EXTINF:6.006,
1643538982500.ts
#EXT-X-KEY:METHOD=AES-128,URI="1643538988500.key",IV=0x00000000000000000000017EAA8EBDD4
#EXTINF:6.006,
1643538988500.ts
#EXTINF:6.006,
1643538994500.ts
#EXTINF:6.006,
1643539001000.ts
#EXTINF:6.006,
1643539007000.ts
#EXT-X-KEY:METHOD=AES-128,URI="1643539013000.key",IV=0x00000000000000000000017EAA8F1D88
#EXTINF:6.006,
1643539013000.ts
#EXTINF:6.006,
1643539019000.ts
#EXTINF:6.006,
1643539025000.ts
#EXTINF:6.006,
1643539031000.ts

It does not seem like this worked as it should on Linux...
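The file above looks like what curl is expected to produce: a live HLS media playlist, i.e. a short sliding window of roughly 6-second .ts segments (here with AES-128 key references) that the server keeps replacing. When ffmpeg is given the playlist URL with -i, it re-reads the playlist by itself, so this one-shot download is mainly useful for inspection. A minimal Python sketch that lists what such a snapshot contains, resolving the relative segment and key names against the playlist URL (assumed to be the URL passed to curl):

```python
from typing import List, Optional, Tuple
from urllib.parse import urljoin


def parse_media_playlist(text: str, playlist_url: str) -> Tuple[List[Tuple[float, str]], List[str]]:
    """Return ([(duration, segment_url), ...], [key_url, ...]) for an HLS media playlist."""
    segments: List[Tuple[float, str]] = []
    keys: List[str] = []
    duration: Optional[float] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXTINF:"):
            # "#EXTINF:6.006," -> 6.006 (anything after the comma is an optional title)
            duration = float(line[len("#EXTINF:"):].split(",", 1)[0])
        elif line.startswith("#EXT-X-KEY:") and 'URI="' in line:
            uri = line.split('URI="', 1)[1].split('"', 1)[0]
            keys.append(urljoin(playlist_url, uri))
        elif line and not line.startswith("#") and duration is not None:
            # The first non-tag line after #EXTINF is the segment URI (relative here)
            segments.append((duration, urljoin(playlist_url, line)))
            duration = None
    return segments, keys
```

Applied to the file above, this would list 10 segments (about 60 seconds in total) and 3 key URLs.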

But when tested on Windows 10 the result was this:

Code:

curl: option --compressed: the installed libcurl version doesn't support this
curl: try 'curl --help' for more information

So do you need a special version of curl?
Mine is as follows:

Code:

curl --version
curl 7.79.1 (Windows) libcurl/7.79.1 Schannel
Release-Date: 2021-09-22
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI UnixSockets


30th Jan 2022 09:20 #11
LZAA

Member

Join Date
Dec 2017
http://www.freeintertv.com/view/id-2565

https://mega.nz/file/gbp2WRrS#ua_jE1yAlfeU6_LNrKyYCKd21Gc1uBR2fmzwpXjXPMQ

30th Jan 2022 10:17 #12
BosseB

Originally Posted by LZAA

http://www.freeintertv.com/view/id-2565

https://mega.nz/file/gbp2WRrS#ua_jE1yAlfeU6_LNrKyYCKd21Gc1uBR2fmzwpXjXPMQ

Here is the content behind the second URL above, to make it clearer:

Code:

#!/bin/sh
set -euC
URL=$(curl -qSs -d 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11' 'http://www.freeintertv.com/myAjax/get_item_m3u8/' | grep -Eo '(http|https)://[[:alnum:].,/=*]*index\.m3u8')
ffmpeg -i "$URL" -c copy video.mp4

What does line "set -euC" do in the script?

When I run the script on my Linux box it returns exactly nothing at all....
30th Jan 2022 12:37 #13
LZAA

On my PC it works.

30th Jan 2022 14:13 #14
jack_666

What does line "set -euC" do in the script?

https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
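One likely reason the script printed nothing on Linux: the grep character class [[:alnum:].,/=*] does not include _ or -, but the token in the returned URL can contain them (for example RxrBwWHJG3JXeR_UqGbxUA== in the curl output posted later in this thread). grep then matches nothing and exits with status 1, and with set -e (see the manual linked above) the URL=$(...) assignment ends the script silently before ffmpeg runs. The Python sketch below checks this with a translation of the pattern (Python's re has no [[:alnum:]], so it is spelled A-Za-z0-9) and a widened class that also accepts _ and -; the widened class is my suggestion, not part of the original script.

```python
import re

# The original grep -Eo pattern, with [[:alnum:]] spelled out for Python
ORIGINAL = re.compile(r"(http|https)://[A-Za-z0-9.,/=*]*index\.m3u8")
# Same pattern with '_' and '-' added to the character class (suggested fix)
WIDENED = re.compile(r"(http|https)://[A-Za-z0-9_.,/=*-]*index\.m3u8")

reply = ("playlist[0]['file']='http://1128480543.rsc.cdn77.org/"
         "RxrBwWHJG3JXeR_UqGbxUA==,1643582113/1128480543/index.m3u8'; "
         "get_item.showPlayer();")

for name, pattern in (("original", ORIGINAL), ("widened", WIDENED)):
    match = pattern.search(reply)
    print(name, "->", match.group(0) if match else "no match")
```

In the shell script the corresponding change would be grep -Eo '(http|https)://[[:alnum:]_.,/=*-]*index\.m3u8' (not tested here).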

30th Jan 2022 14:56 #15
jack_666

Convert this Windows script to Linux in order to capture the m3u8 URL:

curl -qSs -d "chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11" "http://www.freeintertv.com/myAjax/get_item_m3u8/" | sed -e "s#^.*http\(.*\)m3u8.*$#http\1m3u8#"

The code above captures this URL:

http://1128480543.rsc.cdn77.org/r8_pCnveABrEGFhLE7ntLA==,1643574905/1128480543/index.m3u8

30th Jan 2022 18:11 #16
BosseB

So I ran the text above directly in a Linux terminal, after finding and removing the extra space in the part that read:

Code:

2Fwww.f reeintertv.com%2 ^

But the result was:

Code:

sed: -e expression #1, char 33: unterminated `s' command
(23) Failed writing body

So I decided to skip the sed part to see what was actually coming out of the curl call:

Code:

$ curl -qSs -d "chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11" "http://www.freeintertv.com/myAjax/get_item_m3u8/"
playlist[0]['file']='http://1128480543.rsc.cdn77.org/RxrBwWHJG3JXeR_UqGbxUA==,1643582113/1128480543/index.m3u8'; get_item.showPlayer(); bosse@ubuntuserv:~/www/MSNBC/download$

So something is wrong with the pipe into sed; the sed command is this:

Code:

sed -e "s#^.*http\(.*\)m3u8.*$#http\1m3u8#"

Since I have no idea what is going on in the sed part I cannot interpret its error message...
What does the "unterminated `s' command" mean?

And where does the string "bXNuYmNfbGl2ZQ" in the curl command come from?
If it is changing from time to time then it won't work for long...
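What the sed part is meant to do, assuming the expression is s#^.*http\(.*\)m3u8.*$#http\1m3u8# (the backslashed parentheses form the capture group that \1 refers to): replace the whole line with "http", then whatever lies between the last "http" and the last "m3u8" on the line, then "m3u8". Below is a Python translation of that substitution, applied to the curl output above; it is an illustration of the regex, not a replacement for the pipeline.

```python
import re

# sed: s#^.*http\(.*\)m3u8.*$#http\1m3u8#
#   ^.*http    everything up to (and including) "http" is dropped
#   \(.*\)     capture group 1: the rest of the URL up to "m3u8"
#   m3u8.*$    "m3u8" and the remainder of the line are dropped
SED_EQUIVALENT = re.compile(r"^.*http(.*)m3u8.*$")


def sed_extract(line: str) -> str:
    """Apply the same substitution; like sed, return the line unchanged if it does not match."""
    return SED_EQUIVALENT.sub(r"http\1m3u8", line)


reply = ("playlist[0]['file']='http://1128480543.rsc.cdn77.org/"
         "RxrBwWHJG3JXeR_UqGbxUA==,1643582113/1128480543/index.m3u8'; "
         "get_item.showPlayer();")
print(sed_extract(reply))
```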
30th Jan 2022 18:29 #17
LZAA

Try it:

echo ffmpeg


30th Jan 2022 22:06 #18

dark125

Member

You could use Python, something like this:

Code:

#! /usr/bin/python
import os
import ffmpy
import requests
import re

# Browser-like headers for the AJAX request
hds = {
    'Connection': 'keep-alive',
    'Accept': 'text/plain, */*; q=0.01',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'http://www.freeintertv.com',
    'Accept-Language': 'en-US,en;q=0.9,es;q=0.8,pt;q=0.7,cs;q=0.6,fr;q=0.5,zh-TW;q=0.4,zh;q=0.3'
    }

# AJAX endpoint that returns the current playlist URL, and the form data for the MSNBC channel
url_canal = 'http://www.freeintertv.com/myAjax/get_item_m3u8/'
data_raw = 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'

# The reply is JavaScript such as: playlist[0]['file']='<m3u8 url>'; get_item.showPlayer();
rs = requests.post(url_canal, headers=hds, data=data_raw)
cnt = rs.text
pattern = r"='(.*)'"
x = re.search(pattern, cnt)
if x is None:
    raise SystemExit('No m3u8 URL found in the reply: ' + cnt)
url_ch = x.group(1)

# Copy 35 seconds (-t 35) of audio and video into MSNBC.mp4 without re-encoding
ff = ffmpy.FFmpeg(inputs={url_ch: '-headers "User-Agent:  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"'}, outputs={'MSNBC.mp4': '-acodec copy -vcodec copy -t 35'}, global_options="-y -hide_banner -loglevel warning")
ff.run()


31st Jan 2022 01:56 #19
BosseB

Tested the python script:

Code:

$ ./pytest
Traceback (most recent call last):
  File "./pytest", line 3, in <module>
    import ffmpy
ModuleNotFoundError: No module named 'ffmpy'

Same if I changed
#! /usr/bin/python
to
#! /usr/bin/python3
Same if I test on a different Linux server...

I have never used Python, so I don't know how to handle it.
31st Jan 2022 01:59 #20
BosseB

Originally Posted by LZAA

Try it:

echo ffmpeg

Why are you posting something like this?
Trolling?

31st Jan 2022 02:16 #21
ElCap

Member

Join Date
Jan 2022
It just means that you don't have ffmpy installed.

Code:

pip3 install ffmpy

Then run it again:

Code:

python3 pytest.py
31st Jan 2022 02:29 #22
dark125

Member

Join Date
Aug 2020

Location
Lima
Originally Posted by BosseB

Tested the python script:

Code:

$ ./pytest
Traceback (most recent call last):
  File "./pytest", line 3, in <module>
    import ffmpy
ModuleNotFoundError: No module named 'ffmpy'

Same if I changed
#! /usr/bin/python
to
#! /usr/bin/python3
Same if I test on a different Linux server...

I have never used python so I don't know how to handle it.

You must first install the package before you can use it in your code. Run the following command to install the package and its dependencies.

Code:

pip install ffmpy
31st Jan 2022 02:34 #23
BosseB

So now I get this instead:

Code:

$ python3 pytest.py
[http @ 0x5632785d9200] No trailing CRLF found in HTTP header. Adding it.

After the message above is printed nothing happens for a while and then the cursor is returned with no further output.
Note:
I am doing this on LinuxMint 20.03
31st Jan 2022 02:49 #24
dark125

Originally Posted by BosseB

So now I get this instead:

Code:

$ python3 pytest.py
[http @ 0x5632785d9200] No trailing CRLF found in HTTP header. Adding it.

After the message above is printed nothing happens for a while and then the cursor is returned with no further output.

The script downloads 35 seconds of the MSNBC channel; you will find MSNBC.mp4 in the current directory (the folder you run the script from).
You have to modify the ffmpeg command to suit what you want to do.

Code:

ff = ffmpy.FFmpeg(inputs={url_ch: '-headers "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"'}, outputs={'MSNBC.mp4': '-acodec copy -vcodec copy -t 35'}, global_options="-y -hide_banner -loglevel warning")
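About the "No trailing CRLF found in HTTP header. Adding it." message: ffmpeg expects the string given to -headers to end in CRLF (\r\n) and appends one itself when it is missing, so the warning is harmless. If you build the ffmpeg command yourself, a small Python helper can add the CRLFs. The helper names are illustrative, and the argument list is meant for subprocess.run without a shell; the other options are the ones from the script above.

```python
from typing import Dict, List


def ffmpeg_headers(headers: Dict[str, str]) -> str:
    """Format headers for ffmpeg's -headers option: one 'Name: value' line per header, each ending in CRLF."""
    return "".join(f"{name}: {value}\r\n" for name, value in headers.items())


def ffmpeg_argv(url: str, out: str, seconds: int, headers: Dict[str, str]) -> List[str]:
    """Build the same copy-only recording command as above (-headers must precede -i)."""
    return ["ffmpeg", "-y", "-hide_banner", "-loglevel", "warning",
            "-headers", ffmpeg_headers(headers),
            "-i", url,
            "-acodec", "copy", "-vcodec", "copy",
            "-t", str(seconds), out]
```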
31st Jan 2022 04:28 #25
BosseB

Originally Posted by dark125

Originally Posted by BosseB

So now I get this instead:

Code:

$ python3 pytest.py
[http @ 0x5632785d9200] No trailing CRLF found in HTTP header. Adding it.

After the message above is printed nothing happens for a while and then the cursor is returned with no further output.

The script will download 35 seconds of the MSNBC channel, you will find the mp4 file inside the same folder as the script.
You have to modify the ffmpeg command to suit what you want to do.

Code:

ff = ffmpy.FFmpeg(inputs={url_ch: '-headers "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"'}, outputs={'MSNBC.mp4': '-acodec copy -vcodec copy -t 35'}, global_options="-y -hide_banner -loglevel warning")

OK, I did not realize that it would download the stream itself...
I found the mp4 now and it is OK (except the wrong size, but that is easily fixed).

But I already have a suite of scripts that handle the downloads; they just need the m3u8 stream URL to work.

In some cases I have been able to script the extraction of the current m3u8 URL from a number of sites so that it can be extracted a few minutes before the actual download starts and saved to a file read by the download script.

But for some sites I have not been able to automate this extraction, so I have used Firefox's F12 developer tools to read the m3u8 URL and write it to the file manually. This has worked as long as the m3u8 URL does not change over time, but unfortunately it does on some sites, where strings like EJZgovLg2izA2gQ or 1643299593 seem to be embedded in the URL. These items are often short-lived, so a scripted extraction is needed.
I have noted that the python code above contains this:

Code:

data_raw = 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'

This is the type of string that in my experience changes with time and so needs automated extraction, but that is not shown in your example. From where did you get these values?

So this thread is mainly about automatically finding the m3u8 stream url to be used with the ffmpeg command, which looks like this:

Code:

CMD="ffmpeg -hide_banner ${MODE} -i \"${M3U8URL}\" -vf scale=w=-4:h=480 -c:v libx264 -preset fast -crf 26 -c:a copy -t ${CAPTURETIME} ${TARGETFILE}"

Here the variables are:
MODE: "-user_agent \"Mozilla\"" or "-referer \"${VIDEOURL}\"" depending on site
VIDEOURL: The page URL used as referer
M3U8URL: The m3u8 stream url we are discussing here
CAPTURETIME: The download time in seconds
TARGETFILE: The output mp4 file

If I manually find the m3u8 url via FireFox then the ffmpeg works OK.
But at irregular times the m3u8 changes (in some sites only) so it has to be extracted again...
This extraction is what I am looking for...
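A sketch of an extraction-only step for the freeintertv case that fits existing download scripts: it sends the same POST as the scripts above using only the Python standard library (urllib instead of requests), pulls the quoted URL out of the playlist[0]['file']='...' reply, and writes it to a file for the download script to read. The script name, the output-file argument and the reduced header set are assumptions; the plain curl -d call earlier in the thread got a reply without browser headers, but the site may start requiring more.

```python
#!/usr/bin/env python3
import re
import sys
import urllib.request
from typing import Optional

AJAX_URL = "http://www.freeintertv.com/myAjax/get_item_m3u8/"
FORM_BODY = ("chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com"
             "%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11")
# The reply is JavaScript like: playlist[0]['file']='<url>'; get_item.showPlayer();
FILE_RE = re.compile(r"\['file'\]\s*=\s*'([^']+)'")


def extract_url(reply: str) -> Optional[str]:
    """Return the URL assigned to playlist[0]['file'], or None if it is missing."""
    match = FILE_RE.search(reply)
    return match.group(1) if match else None


def fetch_reply(timeout: float = 20.0) -> str:
    """POST the form body to the AJAX endpoint and return the reply text."""
    req = urllib.request.Request(
        AJAX_URL,
        data=FORM_BODY.encode("ascii"),  # passing data makes this a POST
        headers={
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
            "X-Requested-With": "XMLHttpRequest",
            "User-Agent": "Mozilla/5.0",
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def main(out_path: str) -> None:
    url = extract_url(fetch_reply())
    if url is None:
        sys.exit("no m3u8 URL found in the reply")
    with open(out_path, "w") as f:  # file read later by the download script
        f.write(url + "\n")
    print(url)


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: get_m3u8.py OUTPUT_FILE", file=sys.stderr)
    else:
        main(sys.argv[1])
```

Run from the at job a few minutes before recording, e.g. python3 get_m3u8.py /path/to/m3u8url.txt; the bash side can then use M3U8URL=$(cat /path/to/m3u8url.txt).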
31st Jan 2022 05:18 #26
dark125

Originally Posted by BosseB

Code:

data_raw = 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'

This is the type of string that in my experience changes with time and so needs automated extraction, but that is not shown in your example. From where did you get these values?

That parameter doesn't seem to change. URL-decoded it is this:

Code:

chname=bXNuYmNfbGl2ZQ==&ch=http://www.freeintertv.com/externals/tv-russia/smotret-tv3-online&html5=11

and if we decode bXNuYmNfbGl2ZQ== further, it is

Code:

msnbc_live

It is simply the channel name, so to download another channel you change data_raw; for CNN it would be this:

Code:

data_raw = 'chname=Y25uX2xpdmU%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'
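The data_raw strings can also be generated from the channel name instead of being copied by hand. A Python sketch using only the standard library; it assumes, as described above, that only chname differs between channels and that the ch and html5 fields stay the same.

```python
import base64
from urllib.parse import urlencode

CH_PAGE = "http://www.freeintertv.com/externals/tv-russia/smotret-tv3-online"


def form_body(channel: str) -> str:
    """Build the POST body: base64-encode the channel name, then URL-encode all fields."""
    chname = base64.b64encode(channel.encode("utf-8")).decode("ascii")
    # urlencode percent-encodes '=', ':' and '/' (so '==' padding becomes %3D%3D)
    return urlencode({"chname": chname, "ch": CH_PAGE, "html5": "11"})


print(form_body("msnbc_live"))
print(form_body("cnn_live"))
```

The two calls reproduce the msnbc_live and cnn_live data_raw strings shown above.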

If you want to automate it, you can create a bash script that runs the Python file and schedule it with crontab:

Code:

#!/bin/bash
cd /path/to/script
./download_msnbc.py

31st Jan 2022 06:34 #27

dark125

Originally Posted by BosseB
So there is something the matter with the pipe into sed, the sed command is this:
Code:
sed -e "s#^.*http\(.*\)m3u8.*$#http\1m3u8#"
Since I have no idea what is going on in the sed part I cannot interpret its error message...
What does the "unterminated `s' command" mean?
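You can also try this. (The "unterminated `s' command" error comes from bash: inside double quotes it expands the $# in the expression to the number of positional parameters, normally 0, which removes one of the # delimiters. With / as the delimiter there is no $# to expand.)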
Code:
#!/bin/sh
set -euC
URL=$(curl -qSs -d 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11' 'http://www.freeintertv.com/myAjax/get_item_m3u8/' | sed -e "s/^.*http\(.*\)m3u8.*$/http\1m3u8/g")
ffmpeg -y -hide_banner -loglevel warning -headers "User-Agent:  Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" -i "$URL" -acodec copy -vcodec copy -t 35 MSNBC.mp4


31st Jan 2022 07:53 #28
BosseB

Originally Posted by dark125

Originally Posted by BosseB

Code:

data_raw = 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'

This is the type of string that in my experience changes with time and so needs automated extraction, but that is not shown in your example. From where did you get these values?

That parameter doesn't seem to change, the decode is this

Code:

chname=bXNuYmNfbGl2ZQ==&ch=http://www.freeintertv.com/externals/tv-russia/smotret-tv3-online&html5=11

if we keep decoding bXNuYmNfbGl2ZQ== it is

Code:

msnbc_live

It is simply the name of the channel, so if you want to download another channel, we must change the data_raw, for cnn it would be this

Code:

data_raw = 'chname=Y25uX2xpdmU%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'

Interesting!
So how is the strange looking string decoded into msnbc_live?
I know that %2F is encoding of / but how is bXNuYmNfbGl2ZQ decoded/encoded?

EDIT:
I tried base64 like this:

Code:

$ echo 'bXNuYmNfbGl2ZQ' | base64 -d
msnbc_livebase64: invalid input

Then I tried this:

Code:

$ echo 'bXNuYmNfbGl2ZQo=' | base64 -d
msnbc_live

So somehow it is base64 but the input string has to be padded in some way...
/EDIT
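The padding is already in the original data: %3D is the URL encoding of =, so the full base64 value is bXNuYmNfbGl2ZQ== (in the shell, echo 'bXNuYmNfbGl2ZQ==' | base64 -d prints msnbc_live). The second attempt appeared to work because bXNuYmNfbGl2ZQo= is the base64 encoding of msnbc_live followed by a newline. Below is a Python sketch that undoes both layers and re-adds = padding if it has been stripped; the function name is illustrative.

```python
import base64
from urllib.parse import unquote


def decode_chname(value: str) -> str:
    """Decode a chname value: undo %XX encoding, restore '=' padding, then base64-decode."""
    b64 = unquote(value)            # 'bXNuYmNfbGl2ZQ%3D%3D' -> 'bXNuYmNfbGl2ZQ=='
    b64 += "=" * (-len(b64) % 4)    # base64 text length must be a multiple of 4
    return base64.b64decode(b64).decode("utf-8")


print(decode_chname("bXNuYmNfbGl2ZQ%3D%3D"))  # msnbc_live
print(decode_chname("bXNuYmNfbGl2ZQ"))        # msnbc_live (padding re-added)
```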

I tried putting this into my existing URL-extracting script:

Code:

M3U8URL=$(curl -qSs -d 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11' 'http://www.freeintertv.com/myAjax/get_item_m3u8/' | sed -e "s/^.*http\(.*\)m3u8.*$/http\1m3u8/g")

Then this is saved to the file read by the main download script and it does work!

Thank you so much for providing this and explaining the source of the strange looking strings!
Last edited by BosseB; 31st Jan 2022 at 12:18. Reason: Found solution by myself
1st Feb 2022 08:57 #29
BosseB

Just one question about something that mystifies me:
Given what the URL looks like, how does the web server, the browser or whatever it is deduce that a string like this:

Code:

chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11

is partially base64 encoded?
I do not understand how anyone can automatically treat this as a URL by decoding certain parts of it...
I understand the % notation used for control characters and the like, but not the use of base64.
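Nothing has to deduce it automatically. The % encoding is the standard form encoding (application/x-www-form-urlencoded), which any server framework undoes generically. The base64 inside chname is a convention of this particular site, and presumably only the site's own server code knows that this one field must be base64-decoded. A hypothetical sketch of what such a handler might do with the body, in Python (the site's real implementation is unknown):

```python
import base64
from urllib.parse import parse_qs


def handle_body(body: str) -> dict:
    """Hypothetical server-side view of the POST body."""
    # Generic step, done by any form parser: split fields and undo %XX encoding
    fields = {k: v[0] for k, v in parse_qs(body).items()}
    # Site-specific step: this application knows chname is base64
    fields["chname"] = base64.b64decode(fields["chname"]).decode("utf-8")
    return fields


body = ("chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com"
        "%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11")
print(handle_body(body))
```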