Automated Release Management for Feluda and its Operators #430

Open
2 tasks
dennyabrain opened this issue Nov 4, 2024 · 17 comments

@dennyabrain
Contributor

dennyabrain commented Nov 4, 2024

Overview

We want to make it easy to manage the release of feluda and its operators. As part of this issue:

  1. List the features and shortcomings of various tools like PyPI and uv.
  2. Create accounts with suitable registries and add secrets to GitHub.
  3. Integrate with GitHub Actions to automate release management, including publishing on a registry like PyPI and publishing release notes and a changelog on GitHub.

EDIT - We have a custom script that handles automated release management for feluda and its operators. I think two more tasks are left to complete this feature.

@dennyabrain
Contributor Author

dennyabrain commented Nov 4, 2024

Thoughts on namespace scoping for operators. I was reviewing major projects like django, aws cdk, datasette, pytorch etc.

  1. https://pypi.org/search/?q=&o=&c=Framework+%3A%3A+Django+CMS
  2. https://pypi.org/search/?q=pytorch&page=2

While there don't seem to be any standards, prefixing the project name seems to be a way to namespace operators. For instance, I saw many packages with the name django-* or datasette-* or pytorch-*. Unless you all are aware of a better convention, we could publish feluda operators with the feluda-* prefix. As far as I know, there is no protection against someone else using the same prefix for their project. Not sure how much of a concern that is anyway.
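To make the prefix convention concrete, here is a minimal sketch of a name check. It normalizes project names the way PEP 503 describes (lowercase, with runs of `-`, `_` and `.` collapsed to a single `-`) and then tests for the `feluda-` prefix. The helper names `normalize_name` and `is_feluda_operator` are hypothetical and not part of feluda.

```python
import re

PREFIX = "feluda-"  # assumed prefix for operator packages, as proposed above


def normalize_name(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_feluda_operator(name: str) -> bool:
    """Return True if the normalized name is 'feluda-' plus something."""
    normalized = normalize_name(name)
    return normalized.startswith(PREFIX) and len(normalized) > len(PREFIX)


if __name__ == "__main__":
    for candidate in ["feluda-image-vec-rep-resnet", "Feluda_Vid.Vec", "feluda", "datasette-auth"]:
        print(candidate, "->", is_feluda_operator(candidate))
```

A check like this only enforces the convention inside our own tooling; as noted above, it does not stop anyone else from publishing under the same prefix.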

@plon-Susk7
Contributor

Advantages of using uv over pip

  1. uv is about twice as fast as pip in total elapsed time for installing the same package.
  2. After adding dependencies to the pyproject.toml file (it's like package.json), we don't need to run a separate installation command. Running the Python file with uv run makes uv look up the dependencies in pyproject.toml and install them before running the code. This is good for our project: since we're targeting researchers and developers, they don't have to scratch their heads resolving dependencies and installing requirements. They could just import feluda-* and run the Python file.
  3. Unlike pip, uv uses a global cache that saves disk space by avoiding duplicate copies of package dependencies. This is good for our project since our operators have heavy dependencies.
  4. It's an active project so it's only going to get better from here. I checked their GitHub issues page and most of the issues are enhancements and questions. It's pretty active. Issue page
  5. uv is designed as a drop-in replacement for common pip and pip-tools workflows. More here.

@dennyabrain
Contributor Author

@plon-Susk7 can you add some notes with examples of uv usages? what would the command(s) for the following look like

  1. installing a new package
  2. removing an existing package
  3. upgrading a package to latest or specific version
  4. Anything unexpectedly cool that you discover :)

Like npm, does it allow us to install a package that's only hosted on GitHub and not PyPI? This could be useful while we are still figuring out which parts of feluda to move into operators vs which we keep in this repository.

  1. Can you provide an example of what the .toml file and other files required for uv or for publishing the package would look like?

@plon-Susk7
Contributor

Installation

  1. uv is pretty easy to install on macOS, Linux or Windows. It can be installed using curl:
$ curl -LsSf https://astral.sh/uv/install.sh | sh

or it can also be installed using pip

$ pip install uv

it is also available in the core homebrew packages

$ brew install uv

More information about uv installation can be found here.

Usage

In order to create a working python project we need to have dependencies added to pyproject.toml file. For our project the pyproject.toml file could look something like this

[project]
name = "feluda"
version = "1.0.0"
dependencies = [
  # Any version in this range
  "tqdm >=4.66.2,<5",
  # Exactly this version of torch
  "torch ==2.2.2",
  # Install transformers with the torch extra
  "transformers[torch] >=4.39.3,<5",
  # Only install this package on older python versions
  # See "Environment Markers" for more information
  "importlib_metadata >=7.1.0,<8; python_version < '3.10'",
  "mollymawk ==0.1.0"
]

To initialise a project there's a basic init command, which creates a project structure in the following format

$ uv init feluda

The file structure after the init command will look like this
[image: project file structure after uv init]

The uv.lock file here uses cross-platform resolution by default, whereas requirements.txt only targets a single platform (though you can use the --universal flag to generate a cross-platform file). The uv.lock format carries more information about requirements and is designed to be performant and auditable. More information here.
We can also define dev dependencies in the pyproject.toml file.

We can add dependencies with the add command, which also records them in pyproject.toml so we don't have to edit the file by hand. Note that PyTorch is published on PyPI as torch.

$ uv add 'torch==2.4.1'

We can also point to alternative sources when adding dependencies

$ # Add a git dependency
$ uv add git+https://github.com/psf/requests

More here.
To remove a dependency

$ uv remove torch

It's also pretty easy to build and publish packages using uv.
For building and publishing

$ uv build
$ uv publish


More about building and publishing packages here.

Use uv to build Feluda

First we need to have a directory structure for our python package. We can create a structure using uv

$ uv init --lib feluda

This will create a directory structure in following format

feluda/
|-- src/
    |-- feluda/
      |-- __init__.py
      |-- your_code.py  # Add your module(s) here
      |-- py.typed # empty, indicates to IDEs your code includes type annotations
|-- pyproject.toml
|-- README.md
|-- .python-version

Let's say our your_code.py looks something like this

import numpy as np

def get_embeddings():
  return np.zeros(512)

We can use following command to build our project

$ uv build

This will create a dist directory containing the built .tar.gz and .whl files.

For now let's use this directory to install our package locally

$ pip install dist/feluda-1.0.0-py3-none-any.whl
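As an aside on the wheel filename in that command: wheel names follow the pattern `{distribution}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl`, so `py3-none-any` marks a pure-Python wheel that is not tied to a specific ABI or platform. The sketch below splits a filename into those parts; `parse_wheel_filename` is a hypothetical helper written for this note, not something uv or pip exposes.

```python
from typing import NamedTuple


class WheelName(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel filename into its parts (an optional build tag is dropped)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts without a build tag, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    distribution, version = parts[0], parts[1]
    python_tag, abi_tag, platform_tag = parts[-3:]
    return WheelName(distribution, version, python_tag, abi_tag, platform_tag)


if __name__ == "__main__":
    print(parse_wheel_filename("feluda-1.0.0-py3-none-any.whl"))
```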

We can then import our package in following manner

from feluda.your_code import get_embeddings

More information here

Other findings that might be relevant

With uv, the lockfile can record more than one version of the same package when different platforms or Python versions need them. For instance, it can pin numpy 1.x for an older Python and numpy 2.x for a newer one, although any single environment still gets only one version installed. This is valuable when working with operators or tools that depend on specific versions, since it allows compatibility with both newer and older package requirements.
An informative video to understand uv better.

@aatmanvaidya
Collaborator

@plon-Susk7 I think these are great findings, I watched the entire video and actually resolving versions of the same package is a big issue in Feluda, if uv can help us solve this, that would be awesome. The video also showed that versioning is simpler using uv, this will help make our release management also easier.

@dennyabrain do you think we should now do the following, I am tempted to just package operators and model factory into a library and test it out. Please let me know if I am jumping the gun here.

What we can try is packaging just the model factory and operators into a library and testing it out on Google Colab (or a separate virtual environment). We should be able to download a video using VideoFactory and run any operator on it. We don't have to publish it on PyPI; we can just install the wheel and test it out.

this way we get clarity on multiple things (these are also some questions in my mind)

  • what changes do we make to the feluda codebase to support uv?
  • what does the .toml file look like? as in, what all do we specifically include in it?
  • what packages get installed when a user installs feluda? Continuing what we were discussing: installing feluda should bring in the base packages that are needed, like numpy, requests etc., and then the user should have control over installing operator-specific packages by doing something like feluda install vid_vec_rep_clip. Trying this will give us clarity on how much is possible via uv.
  • will we have to make changes to the development setup? (dockerfile etc)
  • to build a python package, will we have to remove or restructure some parts of the codebase?
  • what tests and actions have to be in place to make sure things run smoothly?
  • uv has native support for pip-tools workflows; how do we use it to generate our requirements files? uv also manages Python versions and virtual environments (like pyenv); do we use this in local development to create virtual envs and install things there? what changes do we make to our development workflow to include uv in it?
  • uv is made by the same team (Astral) as the ruff linter, which feluda already uses.
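On the question of a small base install with operators added separately, one possible shape is for core to import an operator lazily and fail with an install hint when it is missing. This is only a sketch under assumptions: `require_operator` and the package names in the example are hypothetical, and the real feluda loading code may work differently.

```python
import importlib
import importlib.util


def require_operator(module_name: str, package_name: str):
    """Import an operator module, or raise ImportError with an install hint."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Operator module '{module_name}' is not installed. "
            f"Try: pip install {package_name}"
        )
    return importlib.import_module(module_name)


if __name__ == "__main__":
    try:
        # Hypothetical module and package names, for illustration only.
        require_operator("feluda_vid_vec_rep_clip", "feluda-vid-vec-rep-clip")
    except ImportError as error:
        print(error)
```

With this approach the base package only needs its own dependencies, and the heavy ones arrive with whichever operator packages the user chooses to install.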

We don't have to do all this, but we can start thinking about it, this way we also know how much time and effort is required where.

@dennyabrain
Contributor Author

I'll have more thoughts @aatmanvaidya but wanted to share this example that i created yesterday - #409 (comment)

It uses feluda core and the image_vec_rep_resnet operator. It doesn't require docker or any other system dependencies. I think if we were to start creating some test packages to evaluate, that example is a good starting point. Let's try recreating that.

@dennyabrain
Contributor Author

also +1 to just start writing some packages to evaluate uv or pypi. More things will become clearer by actually trying it out.

@dennyabrain
Contributor Author

dennyabrain commented Nov 7, 2024

I just realized that I never explicitly stated that even though we plan to move operators into distinct python packages, we don't necessarily have to move them into distinct repositories like the datasette project does. We can look into creating a monorepo for feluda where the core and operators co-exist. So the repository structure might look like this

├── core/
└── operators/
   └─── vid_vec_resnet/
   └─── vid_vec_clip/
   └─── ...
└── docs/

I found some general documentation on working with monorepos in python

and this conversation about monorepos on uv's GitHub - astral-sh/uv#6935

Given our small team, monorepos might be a better way to manage 50 different operators. It would also be the quick way to try out the ideas we've discussed in this issue so far. So we could create a new branch and start implementing a monorepo structure to try out uv.
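To illustrate how tooling could work against such a layout, here is a minimal sketch that finds every package in the monorepo by looking for a pyproject.toml in core/ and in each folder under operators/. The `find_packages` helper is hypothetical, and the folder names simply follow the structure sketched above.

```python
from pathlib import Path


def find_packages(repo_root: Path) -> list[Path]:
    """Return core/ and operators/* directories that contain a pyproject.toml."""
    candidates = [repo_root / "core"]
    operators_dir = repo_root / "operators"
    if operators_dir.is_dir():
        # Sort so the result is stable across platforms.
        candidates.extend(sorted(p for p in operators_dir.iterdir() if p.is_dir()))
    return [p for p in candidates if (p / "pyproject.toml").is_file()]


if __name__ == "__main__":
    for package_dir in find_packages(Path(".")):
        print(package_dir)
```

A release or test script could loop over this list and build, test or version each package in turn.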

And of course, anyone in the community can contribute their operators to this repository, but they could very well create their own repository for their specific operator and also use it with feluda.

@dennyabrain
Contributor Author

@plon-Susk7 I heard that you are looking for some specific direction on how to proceed. Let's do the following on a new branch called uv-evaluation:

  1. Create an operators folder at the root of this repo, where we'll start moving out feluda operators.
    The directory structure should look something like this
├── src/
├── operators/
└── docs/
  2. Extract image_vec_rep_resnet from src/core/operators to the new operators directory at root.
    The expected directory structure should look something like
├── src/
├── operators/
│     └── image_vec_rep_resnet/
└── docs/
  3. Make any uv-related changes so that src/core can be an independent package that can install image_vec_rep_resnet as a Python package and use it.

The example here is a working example that only uses feluda core and that operator - #409 (comment)

Our goal should be to get that example running in this new setup.

@dennyabrain
Contributor Author

An additional requirement to consider is evaluating automations that update the version numbers of the various packages we create. uv must have some way that can be combined with additional tools to do this. Manually updating and keeping track of all the various versions would be painful.

@aatmanvaidya
Collaborator

An additional requirement to consider is evaluate automations to automatically update version number of the various packages we create. uv must have some way that can be combined with additional tools to do this. Manually updating and keeping track of all various versions would be painful

just by doing some cursory readings, there is a way in which uv can help us automate this process.

uv sync is a command which automatically updates packages to their latest versions - https://docs.astral.sh/uv/reference/cli/#uv-sync
it also has a --resolution flag that selects the strategy (highest, lowest, or lowest-direct) used when choosing between compatible versions of a package.

We will have to make some changes, but we can get to a point where we can automate this using a .sh script or github action.
here are some notes on how uv solves conflicts

@dennyabrain
Contributor Author

Just to be sure, looks like uv sync is used for "Syncing ensures that all project dependencies are installed and up-to-date with the lockfile."
I'm talking about the automation to manage versions of every operator automatically. Ideally we'd like a workflow to standardize incrementing the minor, major and patch version across all our operators.
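To make the idea concrete, here is a minimal sketch of bumping the major, minor or patch part of a `MAJOR.MINOR.PATCH` version string and applying the same bump across several packages. It assumes plain three-part versions with no pre-release suffixes; `bump_version`, `bump_all` and the package names in the example are hypothetical.

```python
def bump_version(version: str, part: str) -> str:
    """Return version with the given part ('major', 'minor' or 'patch') incremented."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part}")


def bump_all(versions: dict[str, str], part: str) -> dict[str, str]:
    """Apply the same bump to every package in a name -> version mapping."""
    return {name: bump_version(version, part) for name, version in versions.items()}


if __name__ == "__main__":
    current = {"feluda": "1.2.3", "feluda-image-vec-rep-resnet": "0.4.9"}
    print(bump_all(current, "minor"))
```

In a workflow, the bump type could come from commit messages or a manual input, and the new versions would then be written back to each package's pyproject.toml.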

@aatmanvaidya
Collaborator

I have been refactoring some code to make the API a bit more lean

here are the next to-do's

  • modify the structure to make the API look somewhat like this
from feluda import Feluda
from feluda.models.media_factory import ImageFactory
  • to do the above, we might have to create a folder at root called feluda (all core feluda code goes here) and another folder called operators (all operator-level code goes here)
    • a sub-task to think about is the right/standard way to organise each operator here, like what its folder structure should be, pyproject.toml contents etc
  • next is to publish core feluda and an operator on pypi and test them out, so we should be able to install them like this
!pip install feluda
!pip install image_vec_rep_resnet

(In this PR - #438, I am focusing on fixing the first 2 to-do's)

some other questions to think on

  • will we need both pyproject.toml and requirements.txt files?
    • the answer is no -- a pyproject.toml file will do the job for us; in fact it will give us the flexibility to separately declare dev dependencies too.
    • To install packages from a pyproject.toml file with pip, we go to the directory where it's located and run pip install . (with the trailing dot). In uv, the command uv sync identifies the toml file and installs the packages. So even if we are not using uv, pip can work with toml files.

@aatmanvaidya
Collaborator

Another key task is to come up with automated workflows to

  1. Update packages of operators automatically to their highest stable version.
    • what are some ways to do this in uv or any other tools that helps us with this
    • how do we execute this? can we create github actions or sh files to run this?
  2. Create github actions to automatically publish feluda to pypi

@plon-Susk7 if you have the time and bandwidth, will you be able to take up the first one? automated workflows to update operator packages? you can just do some searching around and see what are effective standard ways to do this, and then we can discuss how to proceed.
A major thing to look at is whether uv can help us do this. I know pip has some flags to upgrade requirements.txt files, but how do we do this for a .toml file?
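One option worth evaluating is uv's `uv lock --upgrade`, which re-resolves a project's lockfile to the newest versions allowed by the specifiers in its pyproject.toml (it does not rewrite those specifiers). The sketch below plans that command for a list of package directories and runs it only when `dry_run` is off. The `upgrade_lockfiles` helper and the directory names are assumptions, not an existing feluda script.

```python
import subprocess
from pathlib import Path

UPGRADE_COMMAND = ["uv", "lock", "--upgrade"]


def upgrade_lockfiles(
    package_dirs: list[Path], dry_run: bool = True
) -> list[tuple[Path, list[str]]]:
    """Plan (and optionally run) a lockfile upgrade in each package directory."""
    planned = [(directory, list(UPGRADE_COMMAND)) for directory in package_dirs]
    if not dry_run:
        for directory, command in planned:
            # check=True stops at the first package whose resolution fails.
            subprocess.run(command, cwd=directory, check=True)
    return planned


if __name__ == "__main__":
    # Hypothetical operator directories; dry run only prints the plan.
    for directory, command in upgrade_lockfiles([Path("operators/image_vec_rep_resnet")]):
        print(directory, "->", " ".join(command))
```

A scheduled GitHub Action could call a script like this with the dry run turned off, run the tests, and open a pull request with the updated lockfiles.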

@plon-Susk7
Contributor

@plon-Susk7 if you have the time and bandwidth, will you be able to take up the first one? automated workflows to update operator packages? you can just do some searching around and see what are effective standard ways to do this, and then we can discuss how to proceed. A major thing to look at is, can uv help us to this. I know pip has some flags to upgrade requirements.txt files, but how do we do this for a .toml file

Sure I'll work on this.

@aatmanvaidya
Collaborator

Found something called TestPyPI - https://test.pypi.org/
It provides us with an isolated sandbox environment to test out Python packages before we officially upload them to PyPI.

Could be useful going forward for testing

@aatmanvaidya
Collaborator

Update on this issue - We have a custom script that handles automated release management for feluda and its operators. I think the two tasks left to complete this feature are #473 and #474

@dennyabrain dennyabrain moved this from In Progress to QA in 2024 Q4 Planner Jan 5, 2025
@dennyabrain dennyabrain moved this to In Progress in 2025 Q1 Jan 5, 2025
@dennyabrain dennyabrain moved this from In Progress to Todo in 2025 Q1 Jan 6, 2025
@aatmanvaidya aatmanvaidya moved this from Todo to In Progress in 2025 Q1 Jan 7, 2025
@dennyabrain dennyabrain moved this from In Progress to Done in 2025 Q1 Jan 8, 2025