First commit

Agie Ashwood 2023-10-03 22:31:24 -05:00
parent d01bab5a70
commit 3788b3f331
10 changed files with 534 additions and 74 deletions

.conf.json.example (Normal file, +12)

@@ -0,0 +1,12 @@
{
    "maxLength": 600,
    "maxPlaylistLength": 10,
    "maxGifLength": 10,
    "maxGifResolution": 480,
    "maxLengthPlaylistVideo": 600,
    "proxyListURL": false,
    "url": "http://localhost:8888",
    "bugcatcher": false,
    "bugcatcherdsn": "YOURDSN",
    "allowedorigins": []
}

.gitignore (vendored Normal file, +19)

@@ -0,0 +1,19 @@
activate/
bin/
downloads/
lib/
lib64/
share/
downloads/
eid3.js
eid3fieldspruned.txt
eid3listtojs.py
fixjson.py
mutageneid3keys.txt
mutagenget.py
proxies.txt
pyvenv.cfg
subex.json
vexample_fixed.json
vexample.json
.conf.json

Dockerfile (Normal file, +10)

@@ -0,0 +1,10 @@
FROM docker.io/python:3
RUN apt update
RUN apt install -y ffmpeg gifsicle
RUN mkdir /workspace
ADD requirements.txt /workspace/
#ADD run.py /workspace
WORKDIR /workspace
RUN pip3 install -r requirements.txt
RUN pip3 install --upgrade sentry-sdk
CMD ["python3", "run.py"]

README.md (122 changed lines)

@@ -1,92 +1,66 @@
# yt-dlp-web

Requirements: either python3 installed locally, or docker/podman with compose.

First clone this repo. Next, copy `.conf.json.example` to `.conf.json` and modify the parameters to your liking.

Parameters:
- `maxLength`: maximum length of videos allowed to download, in seconds
- `maxPlaylistLength`: maximum number of videos allowed on a playlist to download
- `maxGifLength`: maximum length of gifs, in seconds
- `maxGifResolution`: maximum resolution of gifs, in pixels
- `maxLengthPlaylistVideo`: maximum length of individual videos on playlists, in seconds
- `proxyListURL`: url to download proxies from; leave as `false` if unused
- `url`: base url of the server
- `bugcatcher`: whether to use a bug catching service
- `bugcatcherdsn`: dsn of the bug catching service
- `allowedorigins`: allowed client origins (urls)
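For illustration, here is one way to produce a working `.conf.json` from the shipped example. This is a minimal sketch, assuming it is run from the repository root; the overridden values (the front-end origin in particular) are placeholders, not project defaults.

```python
# Minimal sketch: copy .conf.json.example to .conf.json and override a few values.
# The origin below is a placeholder for wherever your web front end is served from.
import json
import shutil

shutil.copyfile(".conf.json.example", ".conf.json")

with open(".conf.json") as f:
    conf = json.load(f)

conf["url"] = "http://localhost:8888"               # base url used when building download links
conf["allowedorigins"] = ["http://localhost:3000"]  # origins allowed by the Socket.IO CORS check

with open(".conf.json", "w") as f:
    json.dump(conf, f, indent=4)
```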
Python:

run:
`pip3 install -r requirements.txt`
`pip3 install --upgrade sentry-sdk`
`bash start.sh`

or make a downloads folder in the yt-dlp-web directory and run:
`python3 run.py`

Docker/podman compose:

run:
`bash start-docker.sh`
or
`bash start-podman.sh`
depending on whether you have docker compose or podman compose installed.

For more details please read the inline comments.
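The server is controlled entirely over Socket.IO events (see run.py below), so a front end talks to it by emitting an event and listening for the `done` reply. The following is a minimal client sketch, assuming the server is running on the default `http://localhost:8888` from `.conf.json.example`; the video url is only a placeholder, while the event name and payload fields are taken from the `toMP3` handler in run.py.

```python
# Minimal Socket.IO client sketch: request an MP3 conversion and print the download link.
# Assumes the server is reachable at the default url from .conf.json.example.
import socketio

sio = socketio.Client()

@sio.on("done")
def done(res):
    # Every handler in run.py replies on the "done" event with error/link/title fields
    if res["error"]:
        print("failed:", res.get("details"))
    else:
        print("download ready:", res["link"])
    sio.disconnect()

sio.connect("http://localhost:8888")
# "id3" may be None when no tags should be written; "spinnerid" is only used by the web UI
sio.emit("toMP3", {
    "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",  # placeholder video url
    "spinnerid": None,
    "id3": None,
})
sio.wait()
```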
Coming soon:
Multi-node functionality

docker-compose.yml (Normal file, +6)

@@ -0,0 +1,6 @@
services:
  yt-dlp-web:
    build: .
    ports: ["8888:8888"]
    volumes:
      - ./:/workspace

requirements.txt (Normal file, +8)

@@ -0,0 +1,8 @@
python-socketio
yt-dlp
tornado
requests
moviepy
pygifsicle
mutagen
GitPython

run.py (Normal file, +423)

@@ -0,0 +1,423 @@
import socketio
from yt_dlp import YoutubeDL
import json
import asyncio
import tornado.web
import requests
import os
import random
import uuid
import zipfile
import datetime
from moviepy.video.io.ffmpeg_tools import ffmpeg_extract_subclip
from moviepy.editor import VideoFileClip
from pygifsicle import optimize
from mutagen.easyid3 import EasyID3
import sentry_sdk
# TODO: auto-reload/reload on webhook using gitpython
# README: functionality is documented once, where it first appears, to keep the
# comments as uncluttered as possible
# Global configuration variable
conf = {}
# Load configuration at runtime
with open(".conf.json", "r") as f:
    conf = json.loads(f.read())
# If using a bugcatcher such as Glitchtip/Sentry, set it up
if conf["bugcatcher"]:
    sentry_sdk.init(conf["bugcatcherdsn"])
# Function to download proxies from a plain url. This is useful for me, but if
# other people need a more complex method of downloading proxies I recommend
# implementing it and doing a merge request
def dlProxies():
    r = requests.get(conf["proxyListURL"])
    with open("proxies.txt", "w") as f:
        # Each input line is expected to be host:port:user:password; it is rewritten
        # as user:password@host:port so it can be used directly as a proxy url
        rlist = r.text.split("\n")
        rlistfixed = []
        for p in rlist[:-1]:
            pl = p.replace("\n", "").replace("\r", "").split(":")
            proxy = "{0}:{1}@{2}:{3}".format(pl[2], pl[3], pl[0], pl[1])
            rlistfixed.append(proxy)
        f.write("\n".join(rlistfixed))
    print("Proxies refreshed!")
# If a proxy list url is configured and there's no proxies file, download proxies at startup
if conf["proxyListURL"] != False:
    if not os.path.exists("proxies.txt"):
        dlProxies()
# Function to initialize the response to the client
# Takes the method name and a spinnerid
# spinnerid is the id of the spinner object to remove on the ui; None is fine here
def resInit(method, spinnerid):
    res = {
        "method": method,
        "error": True,
        "spinnerid": spinnerid
    }
    return res
# create a Socket.IO server
sio = socketio.AsyncServer(cors_allowed_origins=conf["allowedorigins"], async_mode="tornado")
# Socketio event, takes the client id and a json payload
# Converts link to mp3 file
@sio.event
async def toMP3(sid, data):
    # Initialize response; if spinnerid doesn't exist in data it will just be set to None
    res = resInit("toMP3", data.get("spinnerid"))
    # Try/except block will send an error message to the client on error
    try:
        # Get video url from data
        url = data["url"]
        # Get information about the video via yt-dlp to make future decisions
        info = getInfo(url)
        # Return an error if the video is longer than the configured maximum video length
        if info["duration"] > conf["maxLength"]:
            raise ValueError("Video is longer than configured maximum length")
        else:
            # Get a file system safe title for the video
            title = makeSafe(info["title"])
            # Download the video as MP3 from the given url and get the final title of the file
            ftitle = download(url, True, title, "mp3")
            # Tell the client there is no error
            res["error"] = False
            # Give the client the download link
            res["link"] = conf["url"] + "/downloads/" + ftitle + ".mp3"
            # Give the client the initial safe title just for display on the ui
            res["title"] = title
            # If there is id3 metadata, apply this metadata to the file
            if data["id3"] != None:
                # We use EasyID3 here as, well, it's easy; if you need to add more fields
                # please read the mutagen documentation for this here:
                # https://mutagen.readthedocs.io/en/latest/user/id3.html
                audio = EasyID3("downloads/" + ftitle + ".mp3")
                for key, value in data["id3"].items():
                    if value != "" and value != None:
                        audio[key] = value
                audio.save()
            # Emit result to client
            await sio.emit("done", res, sid)
    except Exception as e:
        # Get text of error
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Downloads playlist as a zip of MP3s
@sio.event
async def playlist(sid, data):
    res = resInit("playlist", data.get("spinnerid"))
    try:
        purl = data["url"]
        # Get playlist info
        info = getInfo(purl)
        # Create playlist title from the file system safe title and a random uuid
        # The uuid is to prevent two users from accidentally overwriting each other's files (very unlikely due to cleanup but still possible)
        ptitle = makeSafe(info["title"]) + str(uuid.uuid4())
        # If the number of entries is larger than the configured maximum playlist length throw an error
        if len(info["entries"]) > conf["maxPlaylistLength"]:
            raise ValueError("Playlist is longer than configured maximum length")
        else:
            # Check the length of all videos in the playlist; if any are longer than the configured
            # maximum length for playlist videos throw an error
            for v in info["entries"]:
                if v["duration"] > conf["maxLengthPlaylistVideo"]:
                    raise ValueError("Video in playlist is longer than configured maximum length")
            # Iterate through all videos on the playlist, download each one as an MP3 and then write it to the playlist zip file
            for v in info["entries"]:
                # TODO: make generic
                vid = v["id"]
                vurl = "https://www.youtube.com/watch?v=" + vid
                title = makeSafe(v["title"])
                ftitle = download(vurl, True, title, "mp3")
                with zipfile.ZipFile("downloads/" + ptitle + '.zip', 'a') as myzip:
                    myzip.write("downloads/" + ftitle + ".mp3")
            res["error"] = False
            res["link"] = conf["url"] + "/downloads/" + ptitle + ".zip"
            res["title"] = title
            await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Two step event
# 1. Get list of subtitles
# 2. Download chosen subtitle file
@sio.event
async def subtitles(sid, data):
    res = resInit("subtitles", data.get("spinnerid"))
    try:
        step = int(data["step"])
        url = data["url"]
        # Step 1 of subtitles is to get the list of subtitles available and return them
        if step == 1:
            info = getInfo(url, getSubtitles=True)
            title = makeSafe(info["title"])
            res["error"] = False
            res["title"] = title
            # List of subtitle keys for picking subtitles
            res["select"] = list(info["subtitles"].keys())
            # Step is for front end use; the value here doesn't really matter, the variable just has to exist to tell the ui to move to step 2 when the method is called again
            res["step"] = 0
            # Again, details doesn't need a value, it just needs to exist to let the front end know to populate the details column with a select defined by the list provided in select
            res["details"] = ""
            await sio.emit("done", res, sid)
        # Step 2 of subtitles is to download the subtitles to the server and provide that link to the user
        elif step == 2:
            # Get the selected subtitles by language code
            languageCode = data["languageCode"]
            # Check if the user wants to download autosubs
            autoSub = data["autoSub"]
            info = getInfo(url)
            title = makeSafe(info["title"])
            # Download the subtitles
            # Unfortunately at the moment this requires downloading the lowest quality stream as well; in the future some modification to yt-dlp might be necessary to avoid this
            ftitle = download(url, False, title, "subtitles", languageCode=languageCode, autoSub=autoSub)
            res["error"] = False
            res["link"] = conf["url"] + "/downloads/" + ftitle + "." + languageCode + ".vtt"
            res["title"] = title
            await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Event to clip a given stream and return the clip to the user; the user can optionally convert this clip into a gif
@sio.event
async def clip(sid, data):
    res = resInit("clip", data.get("spinnerid"))
    try:
        url = data["url"]
        info = getInfo(url)
        # Check if directURL is in the data from the client
        # directURL defines a video url to download from directly instead of through yt-dlp
        directURL = False
        if "directURL" in data.keys():
            directURL = data["directURL"]
        # Check if the user wants to create a gif
        gif = False
        if "gif" in data.keys():
            gif = True
        # Get the format id the user wants for downloading a given stream from a given video
        format_id = False
        if "format_id" in data.keys():
            format_id = data["format_id"]
        if info["duration"] > conf["maxLength"]:
            raise ValueError("Video is longer than configured maximum length")
        # Get the start and end time for the clip
        timeA = int(data["timeA"])
        timeB = int(data["timeB"])
        # If we're making a gif make sure the clip is not longer than the maximum gif length
        # Please be careful with gif lengths, if you set this too high you may end up with huge gifs hogging the server
        if gif and ((timeB - timeA) > conf["maxGifLength"]):
            raise ValueError("Range is too large for gif")
        title = makeSafe(info["title"])
        # If directURL is set, download directly
        if directURL != False:
            ititle = title + "." + info["ext"]
            downloadDirect(directURL, "downloads/" + ititle)
        # Otherwise download the video through yt-dlp
        # If there's no format id just get the default video
        else:
            if format_id != False:
                ititle = download(url, False, title, "mp4", extension=info["ext"], format_id=format_id)
            else:
                ititle = download(url, False, title, "mp4", extension=info["ext"])
        # Random key so concurrent clips of the same video don't overwrite each other
        ckey = str(uuid.uuid4())
        ctitle = title + "." + ckey + ".clipped"
        if gif:
            # Clip the video and then convert it to a gif
            (VideoFileClip("downloads/" + ititle)).subclip(timeA, timeB).write_gif("downloads/" + ctitle + ".gif")
            # Optimize the gif
            optimize("downloads/" + ctitle + ".gif")
        else:
            # Clip the video and return the mp4 of the clip
            ffmpeg_extract_subclip("downloads/" + ititle, timeA, timeB, targetname="downloads/" + ctitle + ".mp4")
        res["error"] = False
        # Set the extension either to mp4 or gif depending on whether the user wanted a gif
        # The extension is just for creating the url for the clip
        extension = "mp4"
        if gif:
            extension = "gif"
        res["link"] = conf["url"] + "/downloads/" + ctitle + "." + extension
        res["title"] = title
        await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Generic event to get all the information provided by yt-dlp for a given url
@sio.event
async def getInfoEvent(sid, data):
    # Unlike other events we set the method here from the passed method in order to make this generic and flexible
    res = resInit(data["method"], data.get("spinnerid"))
    try:
        url = data["url"]
        info = getInfo(url)
        if data["method"] == "streams":
            res["details"] = ""
            res["select"] = ""
        title = makeSafe(info["title"])
        res["error"] = False
        res["title"] = title
        res["info"] = info
        await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Get limits of the server for display in the UI
@sio.event
async def limits(sid, data):
    res = resInit("limits", data.get("spinnerid"))
    try:
        limits = [
            "maxLength",
            "maxPlaylistLength",
            "maxGifLength",
            "maxGifResolution",
            "maxLengthPlaylistVideo"
        ]
        res["limits"] = [{"limitid": limit, "limitvalue": conf[limit]} for limit in limits]
        res["error"] = False
        await sio.emit("done", res, sid)
    except Exception as e:
        res["details"] = str(e)
        await sio.emit("done", res, sid)
# Generic download method
def download(url, isAudio, title, codec, languageCode=None, autoSub=False, extension=False, format_id=False):
    # Used to avoid filename conflicts
    ukey = str(uuid.uuid4())
    # Set the location/name of the output file
    ydl_opts = {
        'outtmpl': 'downloads/' + title + "." + ukey
    }
    # Add extension to filepath if set
    if extension != False:
        ydl_opts["outtmpl"] += "." + extension
    # If this is audio, set up to get the best audio with the given codec
    if isAudio:
        ydl_opts['format'] = "bestaudio/best"
        ydl_opts['postprocessors'] = [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': codec,
            'preferredquality': '192',
        }]
    # Otherwise...
    else:
        # Check if there's a format id, if so set the download format to that format id
        if format_id != False:
            ydl_opts['format'] = format_id
        # Otherwise if we're downloading subtitles...
        elif codec == "subtitles":
            # Set up to write the requested subtitle track to disk
            ydl_opts["writesubtitles"] = True
            ydl_opts["subtitleslangs"] = [languageCode]
            # If the user wants auto-generated subtitles, request those as well
            if autoSub:
                ydl_opts["writeautomaticsub"] = True
            ydl_opts['format'] = "worst"
        # Otherwise just download the best video
        else:
            ydl_opts['format'] = "bestvideo/best"
    # If there is a proxy list url set up, set yt-dlp to use a random proxy
    if conf["proxyListURL"] != False:
        ydl_opts['proxy'] = getProxy()
    # Finally, actually download the file/s
    with YoutubeDL(ydl_opts) as ydl:
        if codec == "subtitles":
            ydl.extract_info(url, download=True)
        else:
            ydl.download([url])
    # Construct and return the filepath for the downloaded file
    res = title + "." + ukey
    if extension != False:
        res += "." + extension
    return res
# Download file directly, with a random proxy if set up
def downloadDirect(url, filename):
    if conf["proxyListURL"] != False:
        proxies = {'https': 'https://' + getProxy()}
        with requests.get(url, proxies=proxies, stream=True) as r:
            r.raise_for_status()
            with open(filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
    else:
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
# Generic method to get sanitized information about the given url, with a random proxy if set up
# Requests subtitle information if asked for
def getInfo(url, getSubtitles=False):
    ydl_opts = {
        "writesubtitles": getSubtitles
    }
    if conf["proxyListURL"] != False:
        ydl_opts['proxy'] = getProxy()
    with YoutubeDL(ydl_opts) as ydl:
        info = ydl.extract_info(url, download=False)
        info = ydl.sanitize_info(info)
    return info
# Make title file system safe
# https://stackoverflow.com/questions/7406102/create-sane-safe-filename-from-any-unsafe-string
def makeSafe(filename):
    return "".join([c for c in filename if c.isalpha() or c.isdigit() or c == ' ']).rstrip()
# Get a random proxy from the proxy list
def getProxy():
    proxy = ""
    with open("proxies.txt", "r") as f:
        proxy = random.choice(f.read().split("\n"))
    return proxy
# Refresh proxies every hour
async def refreshProxies():
    while True:
        dlProxies()
        await asyncio.sleep(3600)
# Every hour, clean out of downloads all files that are older than two hours
async def clean():
    while True:
        for f in os.listdir("./downloads"):
            fmt = datetime.datetime.fromtimestamp(os.path.getmtime('downloads/' + f))
            if (datetime.datetime.now() - fmt).total_seconds() > 7200:
                os.remove("downloads/" + f)
        print("Cleaned!")
        await asyncio.sleep(3600)
def make_app():
    return tornado.web.Application([
        (r'/downloads/(.*)', tornado.web.StaticFileHandler, {'path': "./downloads"}),
        (r"/socket.io/", socketio.get_tornado_handler(sio))
    ])
# Main method
async def main():
    # If proxies are configured, set up the proxy refresh task
    if conf["proxyListURL"] != False:
        task = asyncio.create_task(refreshProxies())
        # This is needed to get the async task running
        await asyncio.sleep(0)
    # Set up the cleaning task
    task2 = asyncio.create_task(clean())
    await asyncio.sleep(0)
    # Generic tornado setup
    app = make_app()
    app.listen(8888)
    await asyncio.Event().wait()
if __name__ == "__main__":
    asyncio.run(main())

start-docker.sh (Normal file, +3)

@@ -0,0 +1,3 @@
mkdir -p downloads
docker-compose build
docker-compose up

start-podman.sh (Normal file, +3)

@@ -0,0 +1,3 @@
mkdir -p downloads
podman-compose build
podman-compose up

start.sh (Normal file, +2)

@@ -0,0 +1,2 @@
mkdir -p downloads
python3 run.py