First commit

Agie Ashwood 2023-10-03 22:31:24 -05:00
parent d01bab5a70
commit 3788b3f331
10 changed files with 534 additions and 74 deletions

.conf.json.example (Normal file, +12)

@@ -0,0 +1,12 @@
{
    "maxLength": 600,
    "maxPlaylistLength": 10,
    "maxGifLength": 10,
    "maxGifResolution": 480,
    "maxLengthPlaylistVideo": 600,
    "proxyListURL": false,
    "url": "http://localhost:8888",
    "bugcatcher": false,
    "bugcatcherdsn": "YOURDSN",
    "allowedorigins": []
}

.gitignore (vendored Normal file, +19)

@@ -0,0 +1,19 @@
activate/
bin/
downloads/
lib/
lib64/
share/
downloads/
eid3.js
eid3fieldspruned.txt
eid3listtojs.py
fixjson.py
mutageneid3keys.txt
mutagenget.py
proxies.txt
pyvenv.cfg
subex.json
vexample_fixed.json
vexample.json
.conf.json

Dockerfile (Normal file, +10)

@@ -0,0 +1,10 @@
FROM docker.io/python:3
RUN apt update
RUN apt install -y ffmpeg gifsicle
RUN mkdir /workspace
ADD requirements.txt /workspace/
#ADD run.py /workspace
WORKDIR /workspace
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade sentry-sdk
CMD ["python3", "run.py"]

README.md (122 changed lines)

@@ -1,92 +1,66 @@
# yt-dlp-web

Requirements: either python3 installed locally, or docker/podman with compose.

First clone this repo. Next, copy `.conf.json.example` to `.conf.json` and modify the parameters to your liking.

Parameters:
- `maxLength`: maximum length of videos allowed to download, in seconds
- `maxPlaylistLength`: maximum number of videos allowed on a playlist to download
- `maxGifLength`: maximum length of gifs, in seconds
- `maxGifResolution`: maximum resolution of gifs, in pixels
- `maxLengthPlaylistVideo`: maximum length of individual videos on playlists, in seconds
- `proxyListURL`: url to download proxies from; leave as `false` if unused
- `url`: base url of the server
- `bugcatcher`: whether to use a bug catching service
- `bugcatcherdsn`: dsn of the bug catching service
- `allowedorigins`: allowed client origins (urls)
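For illustration, here is one way to produce a working `.conf.json` from the shipped example. This is a minimal sketch, assuming it is run from the repository root; the overridden values (the front-end origin in particular) are placeholders, not project defaults.

```python
# Minimal sketch: copy .conf.json.example to .conf.json and override a few values.
# The origin below is a placeholder for wherever your web front end is served from.
import json
import shutil

shutil.copyfile(".conf.json.example", ".conf.json")

with open(".conf.json") as f:
    conf = json.load(f)

conf["url"] = "http://localhost:8888"               # base url used when building download links
conf["allowedorigins"] = ["http://localhost:3000"]  # origins allowed by the Socket.IO CORS check

with open(".conf.json", "w") as f:
    json.dump(conf, f, indent=4)
```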
Python:

run:
`pip3 install -r requirements.txt`
`pip3 install --upgrade sentry-sdk`
`bash start.sh`

or make a downloads folder in the yt-dlp-web directory and run:
`python3 run.py`

Docker/podman compose:

run:
`bash start-docker.sh`
or
`bash start-podman.sh`
depending on whether you have docker compose or podman compose installed.

For more details please read the inline comments.
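The server is controlled entirely over Socket.IO events (see run.py below), so a front end talks to it by emitting an event and listening for the `done` reply. The following is a minimal client sketch, assuming the server is running on the default `http://localhost:8888` from `.conf.json.example`; the video url is only a placeholder, while the event name and payload fields are taken from the `toMP3` handler in run.py.

```python
# Minimal Socket.IO client sketch: request an MP3 conversion and print the download link.
# Assumes the server is reachable at the default url from .conf.json.example.
import socketio

sio = socketio.Client()

@sio.on("done")
def done(res):
    # Every handler in run.py replies on the "done" event with error/link/title fields
    if res["error"]:
        print("failed:", res.get("details"))
    else:
        print("download ready:", res["link"])
    sio.disconnect()

sio.connect("http://localhost:8888")
# "id3" may be None when no tags should be written; "spinnerid" is only used by the web UI
sio.emit("toMP3", {
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # placeholder video url
    "spinnerid": None,
    "id3": None,
})
sio.wait()
```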
Coming soon:
Multi-node functionality

docker-compose.yml (Normal file, +6)

@@ -0,0 +1,6 @@
services:
  yt-dlp-web:
    build: .
    ports: ["8888:8888"]
    volumes:
      - ./:/workspace

requirements.txt (Normal file, +8)

@@ -0,0 +1,8 @@
python-socketio
yt-dlp
tornado
requests
moviepy
pygifsicle
mutagen
GitPython

run.py (Normal file, +423)

@@ -0,0 +1,423 @@
import socketio
from yt_dlp import YoutubeDL
import json
import asyncio
import tornado.web
import requests
import os
import random
import uuid
import zipfile
import datetime
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip
from moviepy.editor import VideoFileClip
from pygifsicle import optimize
from mutagen.easyid3 import EasyID3
import sentry_sdk
# TODO: auto-reload/reload on webhook using gitpython
# README: functionality is documented once, where it first appears, to keep the
# comments as uncluttered as possible
# Global configuration variable
conf = {}
# Load configuration at runtime
with open(".conf.json", "r") as f:
    conf = json.loads(f.read())
# If using a bugcatcher such as Glitchtip/Sentry, set it up
if conf["bugcatcher"]:
    sentry_sdk.init(conf["bugcatcherdsn"])
# Function to download proxies from a plain url. This is useful for me, but if
# other people need a more complex method of downloading proxies I recommend
# implementing it and doing a merge request
def dlProxies():
    r = requests.get(conf["proxyListURL"])
    with open("proxies.txt", "w") as f:
        # Each input line is expected to be host:port:user:password; it is rewritten
        # as user:password@host:port so it can be used directly as a proxy url
        rlist = r.text.split("\n")
        rlistfixed = []
        for p in rlist[:-1]:
            pl = p.replace("\n", "").replace("\r", "").split(":")
            proxy = "{0}:{1}@{2}:{3}".format(pl[2], pl[3], pl[0], pl[1])
            rlistfixed.append(proxy)
        f.write("\n".join(rlistfixed))
    print("Proxies refreshed!")
# If a proxy list url is configured and there's no proxies file, download proxies at startup
if conf["proxyListURL"] != False:
    if not os.path.exists("proxies.txt"):
        dlProxies()
# Function to initialize the response to the client
# Takes the method name and a spinnerid
# spinnerid is the id of the spinner object to remove on the ui; None is fine here
def resInit(method, spinnerid):
    res = {
        "method": method,
        "error": True,
        "spinnerid": spinnerid
    }
    return res
# create a Socket.IO server
sio = socketio.AsyncServer(cors_allowed_origins=conf["allowedorigins"], async_mode="tornado")
# Socketio event, takes the client id and a json payload
# Converts link to mp3 file
@sio.event
async def toMP3(sid, data):
    # Initialize response; if spinnerid doesn't exist in data it will just be set to None
    res = resInit("toMP3", data.get("spinnerid"))
    # Try/except block will send an error message to the client on error
    try:
        # Get video url from data
        url = data["url"]
        # Get information about the video via yt-dlp to make future decisions
        info = getInfo(url)
        # Return an error if the video is longer than the configured maximum video length
        if info["duration"] > conf["maxLength"]:
            raise ValueError("Video is longer than configured maximum length")
        else:
            # Get a file system safe title for the video
            title = makeSafe(info["title"])
            # Download the video as MP3 from the given url and get the final title of the file
            ftitle = download(url, True, title, "mp3")
            # Tell the client there is no error
            res["error"] = False
            # Give the client the download link
            res["link"] = conf["url"] + "/downloads/" + ftitle + ".mp3"
            # Give the client the initial safe title just for display on the ui
            res["title"] = title
            # If there is id3 metadata, apply this metadata to the file
            if data["id3"] != None:
                # We use EasyID3 here as, well, it's easy; if you need to add more fields
                # please read the mutagen documentation for this here:
                # https://mutagen.readthedocs.io/en/latest/user/id3.html
                audio = EasyID3("downloads/" + ftitle + ".mp3")
                for key, value in data["id3"].items():
                    if value != "" and value != None:
                        audio[key] = value
                audio.save()
            # Emit result to client
            await sio.emit("done", res, sid)
    except Exception as e:
        # Get text of error
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Downloads playlist as a zip of MP3s
@sio.event
async def playlist(sid, data):
    res = resInit("playlist", data.get("spinnerid"))
    try:
        purl = data["url"]
        # Get playlist info
        info = getInfo(purl)
        # Create playlist title from the file system safe title and a random uuid
        # The uuid is to prevent two users from accidentally overwriting each other's files (very unlikely due to cleanup but still possible)
        ptitle = makeSafe(info["title"]) + str(uuid.uuid4())
        # If the number of entries is larger than the configured maximum playlist length throw an error
        if len(info["entries"]) > conf["maxPlaylistLength"]:
            raise ValueError("Playlist is longer than configured maximum length")
        else:
            # Check the length of all videos in the playlist; if any are longer than the configured
            # maximum length for playlist videos throw an error
            for v in info["entries"]:
                if v["duration"] > conf["maxLengthPlaylistVideo"]:
                    raise ValueError("Video in playlist is longer than configured maximum length")
            # Iterate through all videos on the playlist, download each one as an MP3 and then write it to the playlist zip file
            for v in info["entries"]:
                # TODO: make generic
                vid = v["id"]
                vurl = "https://www.youtube.com/watch?v=" + vid
                title = makeSafe(v["title"])
                ftitle = download(vurl, True, title, "mp3")
                with zipfile.ZipFile("downloads/" + ptitle + '.zip', 'a') as myzip:
                    myzip.write("downloads/" + ftitle + ".mp3")
            res["error"] = False
            res["link"] = conf["url"] + "/downloads/" + ptitle + ".zip"
            res["title"] = title
            await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Two step event
# 1. Get list of subtitles
# 2. Download chosen subtitle file
@sio.event
async def subtitles(sid, data):
    res = resInit("subtitles", data.get("spinnerid"))
    try:
        step = int(data["step"])
        url = data["url"]
        # Step 1 of subtitles is to get the list of subtitles available and return them
        if step == 1:
            info = getInfo(url, getSubtitles=True)
            title = makeSafe(info["title"])
            res["error"] = False
            res["title"] = title
            # List of subtitle keys for picking subtitles
            res["select"] = list(info["subtitles"].keys())
            # Step is for front end use; the value here doesn't really matter, the variable just has to exist to tell the ui to move to step 2 when the method is called again
            res["step"] = 0
            # Again, details doesn't need a value, it just needs to exist to let the front end know to populate the details column with a select defined by the list provided in select
            res["details"] = ""
            await sio.emit("done", res, sid)
        # Step 2 of subtitles is to download the subtitles to the server and provide that link to the user
        elif step == 2:
            # Get the selected subtitles by language code
            languageCode = data["languageCode"]
            # Check if the user wants to download autosubs
            autoSub = data["autoSub"]
            info = getInfo(url)
            title = makeSafe(info["title"])
            # Download the subtitles
            # Unfortunately at the moment this requires downloading the lowest quality stream as well; in the future some modification to yt-dlp might be necessary to avoid this
            ftitle = download(url, False, title, "subtitles", languageCode=languageCode, autoSub=autoSub)
            res["error"] = False
            res["link"] = conf["url"] + "/downloads/" + ftitle + "." + languageCode + ".vtt"
            res["title"] = title
            await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Event to clip a given stream and return the clip to the user; the user can optionally convert this clip into a gif
@sio.event
async def clip(sid, data):
    res = resInit("clip", data.get("spinnerid"))
    try:
        url = data["url"]
        info = getInfo(url)
        # Check if directURL is in the data from the client
        # directURL defines a video url to download from directly instead of through yt-dlp
        directURL = False
        if "directURL" in data.keys():
            directURL = data["directURL"]
        # Check if the user wants to create a gif
        gif = False
        if "gif" in data.keys():
            gif = True
        # Get the format id the user wants for downloading a given stream from a given video
        format_id = False
        if "format_id" in data.keys():
            format_id = data["format_id"]
        if info["duration"] > conf["maxLength"]:
            raise ValueError("Video is longer than configured maximum length")
        # Get the start and end time for the clip
        timeA = int(data["timeA"])
        timeB = int(data["timeB"])
        # If we're making a gif make sure the clip is not longer than the maximum gif length
        # Please be careful with gif lengths, if you set this too high you may end up with huge gifs hogging the server
        if gif and ((timeB - timeA) > conf["maxGifLength"]):
            raise ValueError("Range is too large for gif")
        title = makeSafe(info["title"])
        # If directURL is set, download directly
        if directURL != False:
            ititle = title + "." + info["ext"]
            downloadDirect(directURL, "downloads/" + ititle)
        # Otherwise download the video through yt-dlp
        # If there's no format id just get the default video
        else:
            if format_id != False:
                ititle = download(url, False, title, "mp4", extension=info["ext"], format_id=format_id)
            else:
                ititle = download(url, False, title, "mp4", extension=info["ext"])
        # Random key so concurrent clips of the same video don't overwrite each other
        ckey = str(uuid.uuid4())
        ctitle = title + "." + ckey + ".clipped"
        if gif:
            # Clip the video and then convert it to a gif
            (VideoFileClip("downloads/" + ititle)).subclip(timeA, timeB).write_gif("downloads/" + ctitle + ".gif")
            # Optimize the gif
            optimize("downloads/" + ctitle + ".gif")
        else:
            # Clip the video and return the mp4 of the clip
            ffmpeg_extract_subclip("downloads/" + ititle, timeA, timeB, targetname="downloads/" + ctitle + ".mp4")
        res["error"] = False
        # Set the extension either to mp4 or gif depending on whether the user wanted a gif
        # The extension is just for creating the url for the clip
        extension = "mp4"
        if gif:
            extension = "gif"
        res["link"] = conf["url"] + "/downloads/" + ctitle + "." + extension
        res["title"] = title
        await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Generic event to get all the information provided by yt-dlp for a given url
@sio.event
async def getInfoEvent(sid, data):
    # Unlike other events we set the method here from the passed method in order to make this generic and flexible
    res = resInit(data["method"], data.get("spinnerid"))
    try:
        url = data["url"]
        info = getInfo(url)
        if data["method"] == "streams":
            res["details"] = ""
            res["select"] = ""
        title = makeSafe(info["title"])
        res["error"] = False
        res["title"] = title
        res["info"] = info
        await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Get limits of the server for display in the UI
@sio.event
async def limits(sid, data):
    res = resInit("limits", data.get("spinnerid"))
    try:
        limits = [
            "maxLength",
            "maxPlaylistLength",
            "maxGifLength",
            "maxGifResolution",
            "maxLengthPlaylistVideo"
        ]
        res["limits"] = [{"limitid": limit, "limitvalue": conf[limit]} for limit in limits]
        res["error"] = False
        await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Generic download method
def download(url, isAudio, title, codec, languageCode=None, autoSub=False, extension=False, format_id=False):
    # Used to avoid filename conflicts
    ukey = str(uuid.uuid4())
    # Set the location/name of the output file
    ydl_opts = {
        'outtmpl': 'downloads/' + title + "." + ukey
    }
    # Add extension to filepath if set
    if extension != False:
        ydl_opts["outtmpl"] += "." + extension
    # If this is audio, set up to get the best audio with the given codec
    if isAudio:
        ydl_opts['format'] = "bestaudio/best"
        ydl_opts['postprocessors'] = [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': codec,
            'preferredquality': '192',
        }]
    # Otherwise...
    else:
        # Check if there's a format id, if so set the download format to that format id
        if format_id != False:
            ydl_opts['format'] = format_id
        # Otherwise if we're downloading subtitles...
        elif codec == "subtitles":
            # Set up to write the requested subtitle track to disk
            ydl_opts["writesubtitles"] = True
            ydl_opts["subtitleslangs"] = [languageCode]
            # If the user wants auto-generated subtitles, request those as well
            if autoSub:
                ydl_opts["writeautomaticsub"] = True
            ydl_opts['format'] = "worst"
        # Otherwise just download the best video
        else:
            ydl_opts['format'] = "bestvideo/best"
    # If there is a proxy list url set up, set yt-dlp to use a random proxy
    if conf["proxyListURL"] != False:
        ydl_opts['proxy'] = getProxy()
    # Finally, actually download the file/s
    with YoutubeDL(ydl_opts) as ydl:
        if codec == "subtitles":
            ydl.extract_info(url, download=True)
        else:
            ydl.download([url])
    # Construct and return the filepath for the downloaded file
    res = title + "." + ukey
    if extension != False:
        res += "." + extension
    return res
# Download file directly, with a random proxy if set up
def downloadDirect(url, filename):
    if conf["proxyListURL"] != False:
        proxies = {'https': 'https://' + getProxy()}
        with requests.get(url, proxies=proxies, stream=True) as r:
            r.raise_for_status()
            with open(filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
    else:
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
# Generic method to get sanitized information about the given url, with a random proxy if set up
# Requests subtitle information if asked for
def getInfo(url, getSubtitles=False):
    ydl_opts = {
        "writesubtitles": getSubtitles
    }
    if conf["proxyListURL"] != False:
        ydl_opts['proxy'] = getProxy()
    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
        info = ydl.sanitize_info(info)
    return info
# Make title file system safe
# https://stackoverflow.com/questions/7406102/create-sane-safe-filename-from-any-unsafe-string
def makeSafe(filename):
    return "".join([c for c in filename if c.isalpha() or c.isdigit() or c == ' ']).rstrip()
# Get a random proxy from the proxy list
def getProxy():
    proxy = ""
    with open("proxies.txt", "r") as f:
        proxy = random.choice(f.read().split("\n"))
    return proxy
# Refresh proxies every hour
async def refreshProxies():
    while True:
        dlProxies()
        await asyncio.sleep(3600)
# Every hour, clean out of downloads all files that are older than two hours
async def clean():
    while True:
        for f in os.listdir("./downloads"):
            fmt = datetime.datetime.fromtimestamp(os.path.getmtime('downloads/' + f))
            if (datetime.datetime.now() - fmt).total_seconds() > 7200:
                os.remove("downloads/" + f)
        print("Cleaned!")
        await asyncio.sleep(3600)
def make_app():
    return tornado.web.Application([
        (r'/downloads/(.*)', tornado.web.StaticFileHandler, {'path': "./downloads"}),
        (r"/socket.io/", socketio.get_tornado_handler(sio))
    ])
# Main method
async def main():
    # If proxies are configured, set up the proxy refresh task
    if conf["proxyListURL"] != False:
        task = asyncio.create_task(refreshProxies())
        # This is needed to get the async task running
        await asyncio.sleep(0)
    # Set up the cleaning task
    task2 = asyncio.create_task(clean())
    await asyncio.sleep(0)
    # Generic tornado setup
    app = make_app()
    app.listen(8888)
    await asyncio.Event().wait()
if __name__ == "__main__":
    asyncio.run(main())

start-docker.sh (Normal file, +3)

@@ -0,0 +1,3 @@
mkdir -p downloads
docker-compose build
docker-compose up

start-podman.sh (Normal file, +3)

@@ -0,0 +1,3 @@
mkdir -p downloads
podman-compose build
podman-compose up

start.sh (Normal file, +2)

@@ -0,0 +1,2 @@
mkdir -p downloads
python3 run.py