Package-Requirements passes along public URL constraints in packaged_requirements.txt #348

Open
wants to merge 8 commits into
base: v2.7.2

Conversation

@url54 url54 commented Jan 11, 2024

Issue #, if available:
Currently, when using package-requirements, if you're using constraints in your requirements.txt, these are also added to packaged_requirements.txt. If you don't modify the file afterwards, pip will fail to install in a private MWAA environment because it can't reach the public URL in the constraints line.

Description of changes:
Provided a means of removing the used constraints with grep and adding a local constraints.txt file, which is created using wget after package-requirements has completed. This ensures that everything is completed together, in the same directory.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Added functionality to ensure "packaged_requirements.txt" doesn't reference a public URL, which would cause packages to not get installed.
Add AIRFLOW_VERSION environment variable to ensure it is usable in entrypoint.sh.
Fixed spacing.
Added contextual information to the readme, about this update and the constraints.txt file.
@charlielu05

@url54 I had the same issue here as well. Thanks for this PR. Wondering if it would be cleaner/possible to add the constraints.txt into the plugins.zip file as well? That way, I think you would be able to refer to the constraints file in the requirements.txt file the same way as --find-links, e.g. /usr/local/airflow/plugins/constraints.txt

@charlielu05

Modified your code slightly and can confirm this worked for me:

# Download custom python WHL files and package as ZIP if requirements.txt is present
package_requirements() {
    if [[ -e "$AIRFLOW_HOME/$REQUIREMENTS_FILE" ]]; then
        echo "Packaging requirements.txt into plugins"
        pip3 download -r "$AIRFLOW_HOME/$REQUIREMENTS_FILE" -d "$AIRFLOW_HOME/plugins"
        # Fetch the constraints file so it ships inside plugins.zip
        wget "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION%.*}.txt" -O "$AIRFLOW_HOME/plugins/constraints.txt"
        cd "$AIRFLOW_HOME/plugins"
        zip "$AIRFLOW_HOME/requirements/plugins.zip" *
        printf '%s\n%s\n' "--no-index" "$(cat "$AIRFLOW_HOME/$REQUIREMENTS_FILE")" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt"
        # Prepend the local --find-links/--constraint lines and drop any public-URL constraint
        printf '%s\n%s\n%s\n' "--find-links /usr/local/airflow/plugins" "--constraint /usr/local/airflow/plugins/constraints.txt" "$(grep -v '^--constraint .https://*' "$AIRFLOW_HOME/requirements/packaged_requirements.txt")" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt"
    fi
}
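
To see what the two printf steps in the function above produce, here is a minimal sketch that runs them against a throwaway directory, skipping the pip3 download, wget and zip steps. The scratch paths and the `example-package==1.0.0` pin are made up for illustration; only the printf/grep logic mirrors the snippet.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory standing in for $AIRFLOW_HOME/requirements (assumption).
work="$(mktemp -d)"
req="$work/requirements.txt"
out="$work/packaged_requirements.txt"

# Hypothetical input: a public constraints URL plus one made-up pinned package.
cat > "$req" <<'EOF'
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"
example-package==1.0.0
EOF

# Step 1: prefix the original requirements with --no-index.
printf '%s\n%s\n' "--no-index" "$(cat "$req")" > "$out"

# Step 2: prepend the local --find-links/--constraint lines while filtering
# out the URL constraint. The command substitution reads the file before the
# redirection truncates it.
printf '%s\n%s\n%s\n' \
    "--find-links /usr/local/airflow/plugins" \
    "--constraint /usr/local/airflow/plugins/constraints.txt" \
    "$(grep -v '^--constraint .https://*' "$out")" > "$out"

cat "$out"
```

With this input, the packaged file should no longer contain any `https://` reference, so pip would only need files that ship inside plugins.zip.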

@url54
Author

url54 commented Jan 24, 2024

Hey @charlielu05

> I had the same issue here as well. thanks for this PR

No worries. I have been working with local runner and users, and just noticed it one day.

> wondering if it would be cleaner/possible to add the constraints.txt into the plugins.zip file as well?

That's actually a really good idea. That would make it work regardless of whether the user read through all the documentation. Before, they would have to know that constraints.txt existed in that directory and deliberately move it to their S3 DAGs/ directory. So yeah, much cleaner and more user friendly, good catch!

We probably need to modify the README.md from what I suggested as well, since we won't have to tell users that the file exists in a directory and that they have to upload it anymore. We can probably still mention that we are doing it as part of the process, but it won't be nearly as verbose as I originally wrote.

@charlielu05

Hey @url54, thanks for the feedback. I have added the changes in a PR to your fork (not sure if this was best practice, but I wasn't able to commit directly to this PR) here: url54#1

@url54
Author

url54 commented Jan 24, 2024

@charlielu05 Hey Charlie, this is my first pull request so I am pretty inexperienced with this XD, whatever you think is best works for me.

Do you need me to do anything specific from my end?

@charlielu05

> Do you need me to do anything specific from my end?

Hey @url54

I've opened a PR to your branch. From what I understand, if you approve and merge those changes, they should be reflected here :)

Add constraints.txt to plugins.zip, updated readme
@url54
Author

url54 commented Jan 25, 2024

@charlielu05 Done! =)

Contributor

@mayushko26 mayushko26 left a comment

I like this change, and we should port this to the other versions as well to maintain consistency.

@@ -139,6 +139,8 @@ For example usage see [Installing Python dependencies using PyPi.org Requirement
- There is a directory at the root of this repository called plugins.
- In this directory, create a file for your new custom plugin.
- Add any Python dependencies to `requirements/requirements.txt`.
- Adds a local `constraints.txt` file to the `plugins/` directory, this is zipped together into the `plugins.zip` artefact.
Contributor

Typo: artefact -> artifact

Changed spelling to artifact

@@ -139,6 +139,8 @@ For example usage see [Installing Python dependencies using PyPi.org Requirement
- There is a directory at the root of this repository called plugins.
- In this directory, create a file for your new custom plugin.
- Add any Python dependencies to `requirements/requirements.txt`.
- Adds a local `constraints.txt` file to the `plugins/` directory, this is zipped together into the `plugins.zip` artefact.
- Creates a new `packaged_requirements.txt` file with the correct configuration for `--find-links` and `--constraint`. This file should be the one you rename to `requirements.txt` and upload to S3 to be used by MWAA.
Contributor

Renaming isn't strictly necessary, since you can configure the MWAA environment to use the requirements file regardless of its name. This can be left up to the individual user. The second sentence could be removed.

Removed second sentence.

@@ -21,6 +21,7 @@ ARG INDEX_URL=""
ENV AIRFLOW_HOME=${AIRFLOW_USER_HOME}
ENV PATH="$PATH:/usr/local/airflow/.local/bin:/root/.local/bin:/usr/local/airflow/.local/lib/python3.10/site-packages"
ENV PYTHON_VERSION=3.11.6
ENV AIRFLOW_VERSION=2.7.2
Contributor

nitpick: use ENV AIRFLOW_VERSION=$AIRFLOW_VERSION to reduce sources of error and avoid having the same magic number defined in two places.

@charlielu05 charlielu05 Mar 21, 2024

Using curly brackets to keep consistency with AIRFLOW_HOME environment variable in line 21

@@ -41,10 +41,11 @@ package_requirements() {
if [[ -e "$AIRFLOW_HOME/$REQUIREMENTS_FILE" ]]; then
echo "Packaging requirements.txt into plugins"
pip3 download -r "$AIRFLOW_HOME/$REQUIREMENTS_FILE" -d "$AIRFLOW_HOME/plugins"
wget "https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION%.*}.txt" -O $AIRFLOW_HOME/plugins/constraints.txt
Contributor

@mayushko26 mayushko26 Mar 16, 2024

If you compare the local runner images' constraints file against the one published by Apache Airflow, you will notice there are a few differences for some versions. The source of truth should be the constraints.txt file that exists in this repository. Additionally, users can make changes to this constraints file to fit their use cases. You can copy it into the plugins directory.

I am also wondering if it's better to place the constraints file in the plugins directory in the package-requirements() function instead of being a step in the startup script. This will ensure that the constraint file will be populated if you delete the plugins directory files before executing the package command.
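
The idea of copying the repository's constraints file into the plugins directory from inside the packaging function could be sketched roughly as follows. The helper name `stage_constraints` and both paths are hypothetical; where the constraints file actually lives in the image is an assumption to check against the repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Copy a local constraints file into the plugins directory, creating the
# directory first so the copy still works after plugins/ has been emptied.
# Both arguments are illustrative; real paths depend on the image layout.
stage_constraints() {
    local src="$1" plugins_dir="$2"
    if [[ ! -f "$src" ]]; then
        echo "constraints file not found: $src" >&2
        return 1
    fi
    mkdir -p "$plugins_dir"
    cp "$src" "$plugins_dir/constraints.txt"
}

# Demo with throwaway paths standing in for the repository's constraints
# file and $AIRFLOW_HOME/plugins.
demo="$(mktemp -d)"
echo "example-package==1.0.0" > "$demo/constraints.txt"
stage_constraints "$demo/constraints.txt" "$demo/plugins"
ls "$demo/plugins"
```

Calling a helper like this from package_requirements() before the zip step would avoid the network fetch and keep any user edits to the constraints file.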

@charlielu05 charlielu05 Mar 21, 2024

Modified to fetch the constraints from aws-mwaa-local-runner instead of airflow.
Also for consistency, would it be wise to also switch out the constraint file in requirements/requirements.txt since it is also currently pointing to the apache airflow constraint instead of this repo?
--constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.7.2/constraints-3.11.txt"

On your last point, do you mean to populate the constraint in that startup script instead of this package-requirements function?

cd "$AIRFLOW_HOME/plugins"
zip "$AIRFLOW_HOME/requirements/plugins.zip" *
printf '%s\n%s\n' "--no-index" "$(cat $AIRFLOW_HOME/$REQUIREMENTS_FILE)" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt"
printf '%s\n%s\n' "--find-links /usr/local/airflow/plugins" "$(cat $AIRFLOW_HOME/requirements/packaged_requirements.txt)" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt"
printf '%s\n%s\n%s\n%s\n' "--find-links /usr/local/airflow/plugins" "--constraint /usr/local/airflow/plugins/constraints.txt" "$(cat $AIRFLOW_HOME/requirements/packaged_requirements.txt | grep -v '^--constraint .https://*')" > "$AIRFLOW_HOME/requirements/packaged_requirements.txt"
Contributor

I think we could loosen the grep to just ^--constraint. There are use-cases where a customer needs to customize the constraints file, so they will use the one present in the config/constraints.txt, instead of the one published by Apache Airflow.

Applied this change.
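
A small sketch of why loosening the pattern matters, assuming a requirements file that points at a customized local constraints file rather than a public URL. The sample path and the `example-package==1.0.0` pin are made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical requirements file that uses a local constraints path.
sample="$(mktemp)"
cat > "$sample" <<'EOF'
--constraint /usr/local/airflow/config/constraints.txt
example-package==1.0.0
EOF

# URL-specific pattern: the local-path constraint line survives the filter.
url_only="$(grep -v '^--constraint .https://*' "$sample")"

# Loosened pattern: every --constraint line is dropped, whatever it points at.
loosened="$(grep -v '^--constraint' "$sample")"

printf 'URL-only filter keeps:\n%s\n\n' "$url_only"
printf 'Loosened filter keeps:\n%s\n' "$loosened"
```

With the URL-only pattern, the user's original constraint line would remain in packaged_requirements.txt next to the `--constraint` line that the script adds; the loosened pattern leaves only the one added by the script.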

@charlielu05

@url54, I've made most of the changes that @mayushko26 commented on. Could you kindly review and merge the PR to your repo? url54#2

changes to reflect PR comments
@charlielu05

hi @mayushko26, would you be able to kindly review the new merged code?
once you're happy with this we can work on porting it to all the different versions of MWAA.

3 participants