I am using an Ubuntu 20.04.3 Server machine to download streaming videos from news sites into 1-hour segments in mp4 format for daytime viewing. The sources are 6-9 hours away from my time zone, which is why I am doing this.
I am able to script the extraction of the m3u8 stream URL from several sites, but unfortunately it does not work on all of them.
For those that work I use this script code:
Code:
CMD="curl -s \"${STREAMURL}\" | grep -o -e \"https://.\+m3u8\" | head -n 1"
M3U8=$(eval $CMD)
Here the variable STREAMURL is the URL to the webpage on which the player resides and is playing the news shows.
In other cases I have to use FireFox and while playing the video hit F12 and then watch the Network/All tab for an m3u8 line appearing, click it and then right click and select Copy/URL, which results in something like this:
Code:
http://1128480543.rsc.cdn77.org/wF0Xk_UoBZzEHzrCGmG7AA==,1643380502/1128480543/tracks-v1a1/mono.m3u8
With this m3u8 URL I can then download the video like this:
Code:
CMD="ffmpeg -hide_banner -user_agent \"Mozilla\" -i ${M3U8} -vf scale=w=-4:h=360 -c:v libx264 -preset fast -crf 26 -c:a copy -t $CAPTURETIME $TARGETFILE"
or
CMD="ffmpeg -hide_banner -referer \"${VIDEOURL}\" -i \"${M3U8}\" -vf scale=w=-4:h=480 -c:v libx264 -preset fast -crf 26 -c:a copy -t ${CAPTURETIME} ${TARGETFILE}"
Here the variables are:
VIDEOURL - The URL to the page holding the player
M3U8 - The m3u8 stream URL retrieved as described above
CAPTURETIME - The output video duration in seconds
TARGETFILE - The output mp4 file obviously...
Notice that on some sites I have to use -user_agent and on other sites -referer, it depends on the site...
I run the download script as an at job starting a short time before the show starts and ending a bit after it ends.
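The curl | grep extraction shown at the top of this post can also be written as a small shell function, which avoids building a command string for eval. This is only a sketch: the function name first_m3u8 is made up for illustration, and the pattern assumes the URL sits in the page source delimited by quotes, spaces or angle brackets.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): read HTML on stdin
# and print the first http(s) URL that ends in .m3u8.
first_m3u8() {
    # The bracket expression stops at quotes, spaces and angle brackets,
    # so the match cannot run on past the URL the way a greedy ".\+" can.
    grep -oE "https?://[^\"'<> ]+\.m3u8" | head -n 1
}

# Usage, with STREAMURL being the page that holds the player:
#   M3U8=$(curl -s "$STREAMURL" | first_m3u8)
```

This only helps on pages where the URL is present in the HTML itself; pages that fetch it with a separate request still need another approach.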
This works OK for the few sites I have gotten it to work on, but I have the following problem:
The manually extracted m3u8 URL may change; in some cases it changes daily or more often (a part of it, like the big number 1643380502, changes).
Here I really need to get hold of an automatic extraction procedure which works and can be used as part of the download script.
Any ideas on how to do this?
I.e. how to extract the m3u8 URL from the websites that do not respond to the command I showed above?
Like these:
Code:
http://www.freeintertv.com/view/id-2565
https://livenewschat.eu/politics
https://livenewsof.com/msnbc-live-stream
-
Hi BosseB
I started with the second of the three urls i.e. https://livenewschat.eu/politics
This feed uses the same m3u8 url .... https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8 . Master location is constant.
Use this curl command ( for windows ... you can easily modify for your Ubuntu) to download the m3u8. Then use ffmpeg to capture the stream as per usual.
curl -k "https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8" ^
-H "Connection: keep-alive" ^
-H "Pragma: no-cache" ^
-H "Cache-Control: no-cache" ^
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36" ^
-H "Accept: */*" ^
-H "Origin: https://livenewschat.eu" ^
-H "Sec-Fetch-Site: same-site" ^
-H "Sec-Fetch-Mode: cors" ^
-H "Sec-Fetch-Dest: empty" ^
-H "Referer: https://livenewschat.eu/" ^
-H "Accept-Language: en-US,en;q=0.9" ^
-H "dnt: 1" ^
-H "sec-gpc: 1" ^
--compressed -o livenewschat.m3u8
I'll look at the other two as time permits. -
Hi BosseB
looking at #3 https://livenewsof.com/msnbc-live-stream
https://rtmp.livenewsof.com/hls/fx2.m3u8 <== Are you saying that this URL changes often? -
No, that is one of the sites I cannot extract m3u8 URL from in a script, so I did it using the Firefox browser. But I believe it will not change often if at all.
The others are longer and contain number strings with 6-10 digits, and I have seen that they change somewhat more often than the others. Sometimes the character strings like C90Zrw8DEqphyq8lGfWOYg also change, which is why I need a way to update the URL just before the download starts. -
Is the above a command that goes on the command line as a single line?
If so it looks pretty long...
What do the ^ characters do? Are they some kind of Windows special char?
EDIT:
Do you mean that the curl command above should be used to stream the video into ffmpeg like this:
Code:curl <massive set of arguments as seen above> | ffmpeg <formatting arguments to get the output in the correct geometry> -t 3600 output.mp4
I don't really understand your suggestion...
Or does the curl command above result in an m3u8 URL printed on the command line ready to be used inside my ffmpeg command?
Last edited by BosseB; 28th Jan 2022 at 17:16.
-
hi BosseB
What do the ^ characters do? <== The ^ is the line-continuation character of the Windows command prompt. It breaks a long command into smaller, more readable parts.
The Bash shell uses a backslash (\) at the end of the line for this:
Code:
-H 'Accept: */*' \
-H 'Origin: https://livenewschat.eu' \
-H 'Sec-Fetch-Site: same-site' \
-H 'Sec-Fetch-Mode: cors' \
-H 'Sec-Fetch-Dest: empty' \
-H 'Referer: https://livenewschat.eu/' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'dnt: 1' \
-H 'sec-gpc: 1' \
-
you wrote
The others are longer and contain number strings with 6-10 digits, and I have seen that they change somewhat more often than the others. Sometimes the character strings like C90Zrw8DEqphyq8lGfWOYg also change, which is why I need a way to update the URL just before the download starts.
C90Zrw8DEqphyq8lGfWOYg <== Time codes. They set an end of life for the URL.
I would need a way to update just before the download starts. <== That is the Holy Grail of automation. Many are seeking this out but alas no Sir Galahad (to my knowledge). -
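The 10-digit number in these URLs looks like a Unix timestamp. Assuming that is what it is (the thread does not confirm it, and the meaning of the character string before the comma is not established either), it can be decoded to see when a copied URL might stop working. A minimal sketch with GNU date; the function name url_timestamp is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the 10-digit number that follows the comma
# in a URL of the form .../<token>,<number>/... as a UTC date.
# Assumption: the number is a Unix timestamp (seconds since 1970-01-01).
url_timestamp() {
    ts=$(printf '%s\n' "$1" | sed -n 's#^.*,\([0-9]\{10\}\)/.*$#\1#p')
    [ -n "$ts" ] || return 1
    date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S UTC'   # GNU date syntax
}

url_timestamp 'http://1128480543.rsc.cdn77.org/wF0Xk_UoBZzEHzrCGmG7AA==,1643380502/1128480543/tracks-v1a1/mono.m3u8'
# prints: 2022-01-28 14:35:02 UTC
```

That date is the same day as the early posts in this thread, which would fit a short-lived link.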
Do you mean that the curl command above should be used to stream the video into ffmpeg
-o livenewschat.m3u8 <== the curl will download the m3u8 automatically to the file livenewschat.m3u8 saved on your pwd (present work directory)
Then you use ffmpeg together with this m3u8 to retrieve the video.
the curl command above result in an m3u8 URL printed on the command line ready to be used inside my ffmpeg command -
Well I was into the piping of data so ffmpeg could work while the download was ongoing...
Early on I downloaded to ts files and then after download I tried to reformat, but the reformat took a long time so I had basically a process for conversion running a half hour. That is when I realized that if ffmpeg could get the stream directly I could put the processing in the same ffmpeg command and it would be ready when the stream stopped.
So that is my approach and it works well for most streams that ffmpeg can be set to download...
-o livenewschat.m3u8 <== the curl will download the m3u8 automatically to the file livenewschat.m3u8 saved on your pwd (present work directory)
Then you use ffmpeg together with this m3u8 to retrieve the video.
the curl command above result in an m3u8 URL printed on the command line ready to be used inside my ffmpeg command
That would be the solution if it could be done...
The stream I am having most problem with concerning changing m3u8 url's is
Code:http://www.freeintertv.com/view/id-2565
That might complicate extraction from it though... -
So I tried to modify your command to use on Linux in the following way:
Code:
curl -k "https://ligma.cdn.livenewschat.eu/hls/msnbc_live/index.m3u8" \
-H "Connection: keep-alive" \
-H "Pragma: no-cache" \
-H "Cache-Control: no-cache" \
-H "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.9 Safari/537.36" \
-H "Accept: */*" \
-H "Origin: https://livenewschat.eu" \
-H "Sec-Fetch-Site: same-site" \
-H "Sec-Fetch-Mode: cors" \
-H "Sec-Fetch-Dest: empty" \
-H "Referer: https://livenewschat.eu/" \
-H "Accept-Language: en-US,en;q=0.9" \
-H "dnt: 1" \
-H "sec-gpc: 1" \
--compressed -o livenewschat.m3u8
Code:
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-MEDIA-SEQUENCE:360075
#EXT-X-TARGETDURATION:6
#EXT-X-KEY:METHOD=AES-128,URI="1643538964500.key",IV=0x00000000000000000000017EAA8E6014
#EXTINF:6.006,
1643538976500.ts
#EXTINF:6.006,
1643538982500.ts
#EXT-X-KEY:METHOD=AES-128,URI="1643538988500.key",IV=0x00000000000000000000017EAA8EBDD4
#EXTINF:6.006,
1643538988500.ts
#EXTINF:6.006,
1643538994500.ts
#EXTINF:6.006,
1643539001000.ts
#EXTINF:6.006,
1643539007000.ts
#EXT-X-KEY:METHOD=AES-128,URI="1643539013000.key",IV=0x00000000000000000000017EAA8F1D88
#EXTINF:6.006,
1643539013000.ts
#EXTINF:6.006,
1643539019000.ts
#EXTINF:6.006,
1643539025000.ts
#EXTINF:6.006,
1643539031000.ts
But when tested on Windows10 the result was this:
Code:
curl: option --compressed: the installed libcurl version doesn't support this
curl: try 'curl --help' for more information
Mine is as follows:
Code:
curl --version
curl 7.79.1 (Windows) libcurl/7.79.1 Schannel
Release-Date: 2021-09-22
Protocols: dict file ftp ftps http https imap imaps pop3 pop3s smtp smtps telnet tftp
Features: AsynchDNS HSTS IPv6 Kerberos Largefile NTLM SPNEGO SSL SSPI UnixSockets
-
Here is what is hidden behind the second URL above to make it clearer:
Code:
#!/bin/sh
set -euC
URL=$(curl -qSs -d 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11' 'http://www.freeintertv.com/myAjax/get_item_m3u8/' | grep -Eo '(http|https)://[[:alnum:].,/=*]*index\.m3u8')
ffmpeg -i "$URL" -c copy video.mp4
When I run the script on my Linux box it returns exactly nothing at all.... -
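One likely reason the script prints nothing, assuming the server answers as in the curl output shown later in this thread: the bracket expression in the grep pattern allows letters, digits and . , / = * but not _ or -, and the token in that answer (RxrBwWHJG3JXeR_UqGbxUA==) contains an underscore. grep then finds no match and, because of set -e, the script stops right there without a message. A sketch of a more tolerant pattern (the function name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: pull the index.m3u8 URL out of the server answer
# read from stdin. Compared with the pattern in the script above, the
# bracket expression also accepts "_" and "-" (a trailing "-" is literal).
m3u8_from_answer() {
    grep -Eo 'https?://[[:alnum:]._,/=*-]*index\.m3u8'
}

# Sample answer copied from the curl output shown later in this thread.
answer="playlist[0]['file']='http://1128480543.rsc.cdn77.org/RxrBwWHJG3JXeR_UqGbxUA==,1643582113/1128480543/index.m3u8'; get_item.showPlayer();"
printf '%s\n' "$answer" | m3u8_from_answer
```

Whenever the token happens to contain neither _ nor -, the original pattern matches as well, which would explain why the script can work on one day and fail on another.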
What does line "set -euC" do in the script?
https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html -
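In short, set -euC switches on three shell options: -e makes the script exit as soon as a command fails, -u treats the use of an unset variable as an error, and -C (noclobber) stops > from overwriting an existing file. A small demonstration follows; the function names are made up, and each option is tried in a child shell so the calling shell is not affected:

```shell
#!/bin/sh
# Each option is tried in its own "sh -c" child process, so a failure
# there does not terminate the calling shell.

# -e: the child exits at "false"; "not reached" is never printed.
demo_errexit()   { sh -c 'set -e; false; echo "not reached"'; }

# -u: expanding an unset variable is an error, not an empty string.
demo_nounset()   { sh -c 'set -u; echo "$SURELY_UNSET_VARIABLE"' 2>/dev/null; }

# -C: ">" refuses to overwrite a file that already exists ($1).
demo_noclobber() { sh -c 'set -C; echo data > "$1"' sh "$1" 2>/dev/null; }

existing=$(mktemp)
if demo_errexit;              then echo "-e: child ran to the end"; else echo "-e: child stopped early"; fi
if demo_nounset;              then echo "-u: child ran to the end"; else echo "-u: child stopped early"; fi
if demo_noclobber "$existing"; then echo "-C: file overwritten";     else echo "-C: overwrite refused";   fi
rm -f "$existing"
```

In the script above, -e is the option that matters most: when the grep or sed stage finds nothing and returns a failure status, the script ends before ffmpeg is started.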
convert this windows script to linux in order to capture the m3u8 url
curl -qSs -d "chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.f reeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11" "http://www.freeintertv.com/myAjax/get_item_m3u8/" | sed -e "s#^.*http\(.*\)m3u8.*$#http\1m3u8#"
-
So I tried to run the text above direct in the terminal on Linux after I had discovered and removed the extra space in the part that read:
Code:
2Fwww.f reeintertv.com%2
       ^
Code:
sed: -e expression #1, char 33: unterminated `s' command
(23) Failed writing body
Code:
$ curl -qSs -d "chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11" "http://www.freeintertv.com/myAjax/get_item_m3u8/"
playlist[0]['file']='http://1128480543.rsc.cdn77.org/RxrBwWHJG3JXeR_UqGbxUA==,1643582113/1128480543/index.m3u8'; get_item.showPlayer();
bosse@ubuntuserv:~/www/MSNBC/download$
Code:
sed -e "s#^.*http\(.*\)m3u8.*$#http\1m3u8#"
What does the "unterminated `s' command" mean?
And where does the string "bXNuYmNfbGl2ZQ" in the curl command come from?
If it is changing from time to time then it won't work for long... -
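About the "unterminated `s' command" message: the sed expression is inside double quotes, so the shell expands $# (the number of positional parameters, 0 in an interactive terminal) before sed sees it. The closing .*$# becomes .*0, the delimiter between pattern and replacement disappears, and the next # sed finds is the very last character, number 33, which matches the position in the error. The "(23) Failed writing body" line is curl noticing that sed has gone away. In a Windows command prompt $# has no special meaning, which is presumably why the line works there. Single quotes keep the shell out of it; a minimal sketch (the function name is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: reduce the server answer on stdin to the m3u8 URL.
# Single quotes stop the shell from expanding "$#" inside the expression;
# -n plus the p flag prints only lines where the substitution matched.
m3u8_only() {
    sed -n 's#^.*http\(.*\)m3u8.*$#http\1m3u8#p'
}

# What sed receives when the same expression is double-quoted
# (with no positional parameters, "$#" expands to 0):
printf '%s\n' "s#^.*http\(.*\)m3u8.*$#http\1m3u8#"

# Sample answer copied from the curl output above.
answer="playlist[0]['file']='http://1128480543.rsc.cdn77.org/RxrBwWHJG3JXeR_UqGbxUA==,1643582113/1128480543/index.m3u8'; get_item.showPlayer();"
printf '%s\n' "$answer" | m3u8_only
```

Switching the delimiter from # to / also avoids the problem inside double quotes, because $/ is not a shell parameter.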
You could use python, something like this
Code:
#! /usr/bin/python
import os
import ffmpy
import requests
import re

hds = {
    'Connection': 'keep-alive',
    'Accept': 'text/plain, */*; q=0.01',
    'X-Requested-With': 'XMLHttpRequest',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'Origin': 'http://www.freeintertv.com',
    'Accept-Language': 'en-US,en;q=0.9,es;q=0.8,pt;q=0.7,cs;q=0.6,fr;q=0.5,zh-TW;q=0.4,zh;q=0.3'
}
url_canal = 'http://www.freeintertv.com/myAjax/get_item_m3u8/'
data_raw = 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'
rs = requests.post(url_canal, headers=hds, data=data_raw)
cnt = rs.text
pattern = r"='(.*)'"
x = re.search(pattern, cnt)
url_ch = x.group(1)
ff = ffmpy.FFmpeg(inputs={url_ch: '-headers "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"'}, outputs={'MSNBC.mp4': '-acodec copy -vcodec copy -t 35'}, global_options="-y -hide_banner -loglevel warning")
ff.run()
-
Tested the python script:
Code:
$ ./pytest
Traceback (most recent call last):
  File "./pytest", line 3, in <module>
    import ffmpy
ModuleNotFoundError: No module named 'ffmpy'
#! /usr/bin/python
to
#! /usr/bin/python3
Same if I test on a different Linux server...
I have never used python so I don't know how to handle it. -
-
It just means that you don't have ffmpy installed.
Code:pip3 install ffmpy
Code:python3 pytest.py
-
-
So now I get this instead:
Code:
$ python3 pytest.py
[http @ 0x5632785d9200] No trailing CRLF found in HTTP header. Adding it.
Note:
I am doing this on LinuxMint 20.03 -
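The "No trailing CRLF found in HTTP header. Adding it." line is a warning from ffmpeg's HTTP input, not a failure: the string passed with -headers does not end in a carriage return plus line feed, so ffmpeg appends one and carries on. For a bare user agent, the -user_agent option used earlier in this thread avoids the message; for other headers the string can be terminated explicitly. A sketch in plain sh (the function and variable names are made up for illustration):

```shell
#!/bin/sh
CR=$(printf '\r')   # $(...) strips only trailing newlines, so the \r survives
LF='
'

# Hypothetical helper: join its arguments into one string in which every
# header line ends in CR LF, and leave the result in $HTTP_HEADERS.
# (A variable is used because $(...) would strip the final line feed.)
build_headers() {
    HTTP_HEADERS=
    for header in "$@"; do
        HTTP_HEADERS="${HTTP_HEADERS}${header}${CR}${LF}"
    done
}

build_headers 'User-Agent: Mozilla/5.0' 'Referer: https://livenewschat.eu/'
# The string could then be passed on as:
#   ffmpeg -headers "$HTTP_HEADERS" -i "$URL" ...
printf '%s' "$HTTP_HEADERS" | od -c   # shows the \r \n at the end of each header
```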
The script will download 35 seconds of the MSNBC channel, you will find the mp4 file inside the same folder as the script.
You have to modify the ffmpeg command to suit what you want to do.
Code:ff = ffmpy.FFmpeg(inputs={url_ch: '-headers "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36"'}, outputs={'MSNBC.mp4': '-acodec copy -vcodec copy -t 35'}, global_options="-y -hide_banner -loglevel warning")
-
OK, I did not realize that it would download the stream itself...
I found the mp4 now and it is OK (except the wrong size, but that is easily fixed).
But I already have a suite of scripts that handle the downloads but they need the m3u8 stream URL to work.
In some cases I have been able to script the extraction of the current m3u8 URL from a number of sites so that it can be extracted a few minutes before the actual download starts and saved to a file read by the download script.
But for some sites I have not been able to automate this extraction so I have used the F12 debug mode of FireFox to read the m3u8 URL and write it manually to the file. This has worked as long as the m3u8 URL does not change over time, but unfortunately this happens on some sites where there seems to be strings like EJZgovLg2izA2gQ or 1643299593 embedded as part of the URL. These items are often short-lived and therefore a scripted extraction is needed.
I have noted that the python code above contains this:
Code:data_raw = 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'
So this thread is mainly about automatically finding the m3u8 stream url to be used with the ffmpeg command, which looks like this:
Code:CMD="ffmpeg -hide_banner ${MODE} -i \"${M3U8URL}\" -vf scale=w=-4:h=480 -c:v libx264 -preset fast -crf 26 -c:a copy -t ${CAPTURETIME} ${TARGETFILE}"
MODE: "-user_agent \"Mozilla\"" or "-referer \"${VIDEOURL}\"" depending on site
VIDEOURL: The page URL used as referer
M3U8URL: The m3u8 stream url we are discussing here
CAPTURETIME: The download time in seconds
TARGETFILE: The output mp4 file
If I manually find the m3u8 url via FireFox then the ffmpeg works OK.
But at irregular times the m3u8 changes (in some sites only) so it has to be extracted again...
This extraction is what I am looking for... -
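The hand-over described above, where a refresh step writes the current URL to a file shortly before the download script reads it, could look roughly like this. The function name, the file name in the usage comment and the check on the URL shape are assumptions for illustration, not part of the existing scripts:

```shell
#!/bin/sh
# Hypothetical helper: store a freshly extracted URL ($1) in a file ($2)
# for the download script, but only if it looks like an m3u8 URL.
# A failed extraction therefore leaves the previous URL in place.
save_m3u8_url() {
    url=$1
    file=$2
    case $url in
        http://*.m3u8|https://*.m3u8) ;;
        *) echo "not an m3u8 URL: '$url'" >&2; return 1 ;;
    esac
    # Write to a temporary name first so the reader never sees a partial file.
    printf '%s\n' "$url" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Usage in a refresh job, where M3U8URL holds the result of the extraction:
#   save_m3u8_url "$M3U8URL" msnbc.m3u8url || echo "keeping the old URL" >&2
```

The shape check is deliberately strict: a URL with a query string after .m3u8 would be rejected and the pattern would need to be widened for such sites.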
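That parameter doesn't seem to change. URL-decoded, it reads:
Code:
chname=bXNuYmNfbGl2ZQ==&ch=http://www.freeintertv.com/externals/tv-russia/smotret-tv3-online&html5=11
The chname value is base64 and decodes to:
Code:
msnbc_live
For another channel, for example cnn_live, the line in the script becomes:
Code:
data_raw = 'chname=Y25uX2xpdmU%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11'
The Python script can then be started from a shell script:
Code:
#!/bin/bash
cd /path/to/script
./download_msnbc.py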
-
You can also try this
Code:
#!/bin/sh
set -euC
URL=$(curl -qSs -d 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11' 'http://www.freeintertv.com/myAjax/get_item_m3u8/' | sed -e "s/^.*http\(.*\)m3u8.*$/http\1m3u8/g")
ffmpeg -y -hide_banner -loglevel warning -headers "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36" -i "$URL" -acodec copy -vcodec copy -t 35 MSNBC.mp4
-
Interesting!
So how is the strange looking string decoded into msnbc_live?
I know that %2F is encoding of / but how is bXNuYmNfbGl2ZQ decoded/encoded?
EDIT:
I tried base64 like this:
Code:
$ echo 'bXNuYmNfbGl2ZQ' | base64 -d
msnbc_livebase64: invalid input
Code:
$ echo 'bXNuYmNfbGl2ZQo=' | base64 -d
msnbc_live
/EDIT
I tried putting this into my existing URL-extracting script:
Code:M3U8URL=$(curl -qSs -d 'chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11' 'http://www.freeintertv.com/myAjax/get_item_m3u8/' | sed -e "s/^.*http\(.*\)m3u8.*$/http\1m3u8/g")
Thank you so much for providing this and explaining the source of the strange looking strings!
Last edited by BosseB; 31st Jan 2022 at 11:18. Reason: Found solution by myself
-
Just one question about something that mystifies me:
Given the way the URL looks like, how does the web-server or the browser or whatever it is deduce that a string like this:
Code:chname=bXNuYmNfbGl2ZQ%3D%3D&ch=http%3A%2F%2Fwww.freeintertv.com%2Fexternals%2Ftv-russia%2Fsmotret-tv3-online&html5=11
I do not understand how anyone can automatically treat this as an URL by decoding certain parts of it...
The % notation used for control characters and the like I understand but not use of base64.
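Nothing in the request is decoded as base64 automatically; only the %XX escapes are standard form encoding. Judging from the decoded values shown earlier in the thread, this particular site simply chose to send the channel name base64-encoded in the chname field, and its server-side code presumably decodes it again. Under that assumption, the value for another channel can be built like this (the function names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: base64-encode a channel name and percent-encode
# the three base64 characters that are not safe in form data (+ / =).
chname_value() {
    printf '%s' "$1" | base64 | sed -e 's/+/%2B/g' -e 's#/#%2F#g' -e 's/=/%3D/g'
}

# Hypothetical helper: the reverse direction, for checking a captured value.
chname_decode() {
    printf '%s' "$1" | sed -e 's/%2B/+/g' -e 's#%2F#/#g' -e 's/%3D/=/g' | base64 -d
}

chname_value msnbc_live                      # prints: bXNuYmNfbGl2ZQ%3D%3D
chname_value cnn_live                        # prints: Y25uX2xpdmU%3D
chname_decode 'bXNuYmNfbGl2ZQ%3D%3D'; echo   # prints: msnbc_live
```

printf '%s' is used instead of echo because echo appends a newline, which ends up in the encoded value: echo msnbc_live | base64 gives bXNuYmNfbGl2ZQo= rather than bXNuYmNfbGl2ZQ==.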