[DISCO-3302] Migrate to uv for package and environment management #809

Open · wants to merge 1 commit into base: main
24 changes: 12 additions & 12 deletions .circleci/config.yml
@@ -14,9 +14,6 @@ executors:
   image-build-executor:
     docker:
       - image: cimg/base:2025.02
-  python-executor:
-    docker:
-      - image: cimg/python:3.12
   ubuntu-executor:
     machine:
       image: ubuntu-2004:2024.08.1
@@ -75,17 +72,19 @@ workflows:

 jobs:
   checks:
-    executor: python-executor
+    executor: ubuntu-executor
     steps:
       - checkout
+      - setup-python-on-machine
       - run:
           name: Code linting
           command: make -k lint
   unit-tests:
-    executor: python-executor
+    executor: ubuntu-executor
     steps:
       - checkout
       - gcp-cli/setup
+      - setup-python-on-machine
       - run:
           name: Create Workspace
           command: mkdir -p workspace
@@ -143,9 +142,10 @@ jobs:
           destination: gs://ecosystem-test-eng-metrics/merino-py/coverage
           extension: json
   test-coverage-check:
-    executor: python-executor
+    executor: ubuntu-executor
     steps:
       - checkout
+      - setup-python-on-machine
       - attach_workspace:
           at: workspace
       - run:
@@ -274,24 +274,24 @@ commands:
             echo "${DOCKER_PASS}" | docker login -u="${DOCKER_USER}" --password-stdin
           fi

-  # set up python3.12 and poetry on the ubuntu machine used to run integration tests
+  # set up python3.12 and uv on the ubuntu machine used to run integration tests
   setup-python-on-machine:
     steps:
       - run:
           # the ubuntu machine image we are using comes with python and pyenv
           name: Set python version to 3.12
           command: pyenv global 3.12
       - run:
-          name: Install poetry
-          command: curl -sSL https://install.python-poetry.org | python3.12 -
+          name: Install uv
+          command: curl -LsSf https://astral.sh/uv/install.sh | sh
       - run:
-          name: Add poetry to PATH
+          name: Add uv to PATH
           command: echo 'export PATH="$HOME/.local/bin:$PATH"' >> $BASH_ENV
       - run:
-          name: Verify Python and poetry installation
+          name: Verify Python and uv installation
           command: |
             python3.12 --version
-            poetry --version
+            uv --version

  upload_to_gcs:
    parameters:
13 changes: 8 additions & 5 deletions .github/actions/weave/action.yaml
@@ -7,13 +7,16 @@ inputs:
 runs:
   using: "composite"
   steps:
-    - name: Install poetry
-      run: pipx install poetry
+    - uses: actions/checkout@v4
+    - name: Install uv
+      run: pipx install uv
+      shell: bash
-    - uses: actions/setup-python@v5
+
+    - name: Set up Python
+      uses: actions/setup-python@v5
       with:
-        python-version: "3.12"
-        cache: "poetry"
+        python-version-file: "pyproject.toml"

Contributor Author:

Unfortunately, we can't use the official uv GitHub action (astral-sh/setup-uv@v5) because it is not approved by SRE. Hence, we install uv manually via pipx.
Contributor:

Can you note this in the file, perhaps with the SRE link, so we can follow up later?
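A sketch of how that note could look inline in the action (the follow-up link is a placeholder, not the actual SRE reference):

```yaml
steps:
  # NOTE: the official astral-sh/setup-uv action is not SRE-approved,
  # so uv is installed manually via pipx instead.
  # Follow-up: <link to SRE approval ticket>
  - name: Install uv
    run: pipx install uv
    shell: bash
```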

- name: Create Workspace
run: mkdir -p ${{ inputs.workspace-path }}
shell: bash
57 changes: 30 additions & 27 deletions Dockerfile
@@ -1,51 +1,54 @@
 ARG PYTHON_VERSION=3.12

-# This stage is used to generate requirements.txt from Poetry
+# Build stage
 FROM python:${PYTHON_VERSION}-slim AS build

+# Install uv
+COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
+
 WORKDIR /tmp

-# Pin Poetry to reduce image size
-RUN pip install --no-cache-dir --quiet poetry-plugin-export
+# Install build dependencies for MaxMindDB
+RUN apt-get update && \
+    apt-get install --yes build-essential libmaxminddb0 libmaxminddb-dev && \
Contributor:

Both libmaxminddb0 and libmaxminddb-dev are needed to build the C extension for geoip2, hence they are required for the app_base stage.
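The presence of the C library can be probed at runtime with a stdlib-only check (a sketch, not part of the PR; whether the library is found depends on the image being inspected):

```python
import ctypes.util

# geoip2 depends on the maxminddb package, whose optional C extension links
# against libmaxminddb; when the shared library is absent, maxminddb falls
# back to its slower pure-Python reader.
lib = ctypes.util.find_library("maxminddb")
print("libmaxminddb found:", lib is not None)
```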

+    apt-get clean && \
+    rm -rf /var/lib/apt/lists/*

+# Copy the project files
+COPY ./pyproject.toml ./uv.lock /tmp/
-COPY ./pyproject.toml ./poetry.lock /tmp/

+# Install dependencies using uv
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen --no-install-project --no-editable

-# Generating a requirements.txt from Poetry with the main and jobs dependencies i.e:
-# [tool.poetry.dependencies] and [tool.poetry.group.jobs.dependencies]
-RUN poetry export --no-interaction --with main,jobs --output requirements.txt --without-hashes
+# Sync the project environment
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen --no-editable

+# Create virtual environment
+RUN uv venv /opt/venv

+# App base stage
 FROM python:${PYTHON_VERSION}-slim AS app_base

 # Allow statements and log messages to immediately appear
 ENV PYTHONUNBUFFERED=True

 ENV VIRTUAL_ENV=/opt/venv
 ENV PATH="/opt/venv/bin:$PATH"

 # Set app home
 ENV APP_HOME=/app

 WORKDIR $APP_HOME

-RUN groupadd --gid 10001 app \
-    && useradd -m -g app --uid 10001 -s /usr/sbin/nologin app
+RUN groupadd --gid 10001 app && \
+    useradd -m -g app --uid 10001 -s /usr/sbin/nologin app

-# Copy local code to the container image.
-COPY . $APP_HOME
+# Copy the virtual environment from the build stage
+COPY --from=build --chown=app:app /opt/venv /opt/venv
Contributor:

I think we still need uv here to activate the virtual environment for the container.

When I tried the container locally via:

    $ docker build -t merino-py .
    $ docker run --rm -d -p 8000:8000 -e MERINO_ENV=production --name merino-py merino-py
    docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: exec: "uvicorn": executable file not found in $PATH: unknown

    $ docker run --rm -it --entrypoint bash merino-py
    # Run this inside the container
    $ python
    >>> import geoip2.database
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ModuleNotFoundError: No module named 'geoip2'
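One likely cause, assuming default uv behavior: uv sync installs into .venv inside the project directory, so the /opt/venv created afterwards with uv venv is empty when it is copied into app_base. A sketch of one possible fix (an assumption, not verified against this PR; it relies on the UV_PROJECT_ENVIRONMENT variable supported by recent uv releases):

```dockerfile
# Point uv sync at /opt/venv so the environment copied into app_base
# actually contains the project's dependencies, instead of creating an
# empty venv in a separate step.
FROM python:3.12-slim AS build
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/
WORKDIR /tmp
COPY ./pyproject.toml ./uv.lock /tmp/
ENV UV_PROJECT_ENVIRONMENT=/opt/venv
RUN --mount=type=cache,target=/root/.cache/uv \
    uv sync --frozen --no-editable
```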


-COPY --from=build /tmp/requirements.txt $APP_HOME/requirements.txt
+# Copy the application code
+COPY . $APP_HOME
Contributor:

While you're here, could you add .ruff_cache to .dockerignore? Just noticed that it sneaked into my local docker image.


-# Install libmaxminddb* to build the MaxMindDB Python client with C extension.
-RUN apt-get update && \
-    apt-get install --yes build-essential libmaxminddb0 libmaxminddb-dev && \
-    pip install uv && \
-    uv venv $VIRTUAL_ENV && \
-    uv pip install --no-cache-dir --quiet --upgrade -r requirements.txt && \
-    apt-get remove --yes build-essential && \
-    apt-get -q --yes autoremove && \
-    apt-get clean && \
-    rm -rf /root/.cache

-# Create a build context that can be used for running merino jobs
+# Job runner stage
 FROM app_base AS job_runner

 ENTRYPOINT ["python", "-m", "merino.jobs.cli"]
46 changes: 23 additions & 23 deletions Makefile
@@ -7,7 +7,7 @@ INTEGRATION_TEST_DIR := $(TEST_DIR)/integration
 LOAD_TEST_DIR := $(TEST_DIR)/load
 APP_AND_TEST_DIRS := $(APP_DIR) $(TEST_DIR)
 INSTALL_STAMP := .install.stamp
-POETRY := $(shell command -v poetry 2> /dev/null)
+UV := $(shell command -v uv 2> /dev/null)

 # In order to be consumed by the ETE Test Metric Pipeline, files need to follow a strict naming
 # convention: {job_number}__{utc_epoch_datetime}__{workflow}__{test_suite}__results{-index}.xml
@@ -23,79 +23,79 @@ INTEGRATION_COVERAGE_JSON := $(TEST_RESULTS_DIR)/$(TEST_FILE_PREFIX)integration_
 .DEFAULT_GOAL := help

 .PHONY: install
-install: $(INSTALL_STAMP)  ## Install dependencies with poetry
-$(INSTALL_STAMP): pyproject.toml poetry.lock
-	@if [ -z $(POETRY) ]; then echo "Poetry could not be found. See https://python-poetry.org/docs/"; exit 2; fi
-	$(POETRY) install
+install: $(INSTALL_STAMP)  ## Install dependencies with uv
+$(INSTALL_STAMP): pyproject.toml uv.lock
+	@if [ -z $(UV) ]; then echo "uv could not be found."; exit 2; fi
+	$(UV) sync
 	touch $(INSTALL_STAMP)

 .PHONY: ruff-lint
 ruff-lint: $(INSTALL_STAMP)  ## Run ruff linting
-	$(POETRY) run ruff check $(APP_AND_TEST_DIRS)
+	$(UV) run ruff check $(APP_AND_TEST_DIRS)

 .PHONY: ruff-fmt
 ruff-fmt: $(INSTALL_STAMP)  ## Run ruff format checker
-	$(POETRY) run ruff format --check $(APP_AND_TEST_DIRS)
+	$(UV) run ruff format --check $(APP_AND_TEST_DIRS)

 .PHONY: ruff-format
 ruff-format: $(INSTALL_STAMP)  ## Run ruff format
-	$(POETRY) run ruff format $(APP_AND_TEST_DIRS)
+	$(UV) run ruff format $(APP_AND_TEST_DIRS)

 .PHONY: bandit
 bandit: $(INSTALL_STAMP)  ## Run bandit
-	$(POETRY) run bandit --quiet -r $(APP_AND_TEST_DIRS) -c "pyproject.toml"
+	$(UV) run bandit --quiet -r $(APP_AND_TEST_DIRS) -c "pyproject.toml"

 .PHONY: mypy
 mypy: $(INSTALL_STAMP)  ## Run mypy
-	$(POETRY) run mypy $(APP_AND_TEST_DIRS) --config-file="pyproject.toml"
+	$(UV) run mypy $(APP_AND_TEST_DIRS) --config-file="pyproject.toml"

 .PHONY: lint
 lint: $(INSTALL_STAMP) ruff-lint ruff-fmt bandit mypy  ## Run various linters

 .PHONY: format
 format: $(INSTALL_STAMP)  ## Sort imports and reformat code
-	$(POETRY) run ruff check --fix $(APP_AND_TEST_DIRS)
-	$(POETRY) run ruff format $(APP_AND_TEST_DIRS)
+	$(UV) run ruff check --fix $(APP_AND_TEST_DIRS)
+	$(UV) run ruff format $(APP_AND_TEST_DIRS)

 .PHONY: dev
 dev: $(INSTALL_STAMP)  ## Run merino locally and reload automatically
-	$(POETRY) run uvicorn $(APP_DIR).main:app --reload
+	$(UV) run uvicorn $(APP_DIR).main:app --reload

 .PHONY: run
 run: $(INSTALL_STAMP)  ## Run merino locally
-	$(POETRY) run uvicorn $(APP_DIR).main:app
+	$(UV) run uvicorn $(APP_DIR).main:app

 .PHONY: test
 test: unit-tests integration-tests test-coverage-check  ## Run unit and integration tests and evaluate combined coverage

 .PHONY: test-coverage-check
 test-coverage-check: $(INSTALL_STAMP)  ## Evaluate combined unit and integration test coverage
-	$(POETRY) run coverage combine --data-file=$(TEST_RESULTS_DIR)/.coverage
-	$(POETRY) run coverage report \
+	$(UV) run coverage combine --data-file=$(TEST_RESULTS_DIR)/.coverage
+	$(UV) run coverage report \
 		--data-file=$(TEST_RESULTS_DIR)/.coverage \
 		--fail-under=$(COV_FAIL_UNDER)

 .PHONY: unit-tests
 unit-tests: $(INSTALL_STAMP)  ## Run unit tests
 	COVERAGE_FILE=$(TEST_RESULTS_DIR)/.coverage.unit \
 	MERINO_ENV=testing \
-	$(POETRY) run pytest $(UNIT_TEST_DIR) \
+	$(UV) run pytest $(UNIT_TEST_DIR) \
 		--junit-xml=$(UNIT_JUNIT_XML)

 .PHONY: unit-test-fixtures
 unit-test-fixtures: $(INSTALL_STAMP)  ## List fixtures in use per unit test
-	MERINO_ENV=testing $(POETRY) run pytest $(UNIT_TEST_DIR) --fixtures-per-test
+	MERINO_ENV=testing $(UV) run pytest $(UNIT_TEST_DIR) --fixtures-per-test

 .PHONY: integration-tests
 integration-tests: $(INSTALL_STAMP)  ## Run integration tests
 	COVERAGE_FILE=$(TEST_RESULTS_DIR)/.coverage.integration \
 	MERINO_ENV=testing \
-	$(POETRY) run pytest $(INTEGRATION_TEST_DIR) \
+	$(UV) run pytest $(INTEGRATION_TEST_DIR) \
 		--junit-xml=$(INTEGRATION_JUNIT_XML)

 .PHONY: integration-test-fixtures
 integration-test-fixtures: $(INSTALL_STAMP)  ## List fixtures in use per integration test
-	MERINO_ENV=testing $(POETRY) run pytest $(INTEGRATION_TEST_DIR) --fixtures-per-test
+	MERINO_ENV=testing $(UV) run pytest $(INTEGRATION_TEST_DIR) --fixtures-per-test

.PHONY: docker-build
docker-build: ## Build the docker image for Merino named "app:build"
@@ -155,7 +155,7 @@ help:

 .PHONY: wikipedia-indexer
 wikipedia-indexer:
-	$(POETRY) run merino-jobs $@ ${job}
+	$(UV) run merino-jobs $@ ${job}

.PHONY: health-check-prod
health-check-prod: ## Check the production suggest endpoint with some test queries
@@ -167,12 +167,12 @@ health-check-staging:  ## Check the staging suggest endpoint with some test queries

 .PHONY: coverage-unit
 coverage-unit:
-	$(POETRY) run coverage json \
+	$(UV) run coverage json \
 		--data-file=$(TEST_RESULTS_DIR)/.coverage.unit \
 		-o $(UNIT_COVERAGE_JSON)

 .PHONY: coverage-integration
 coverage-integration:
-	$(POETRY) run coverage json \
+	$(UV) run coverage json \
 		--data-file=$(TEST_RESULTS_DIR)/.coverage.integration \
 		-o $(INTEGRATION_COVERAGE_JSON)
3 changes: 1 addition & 2 deletions merino/configs/app_configs/config_sentry.py
@@ -62,8 +62,7 @@ def strip_sensitive_data(event: Event, hint: Hint) -> Event | None:
             case {"values": {"q": _}}:
                 vars["values"]["q"] = REDACTED_TEXT
             case {"solved_result": [{"q": _}, *_]}:
-                # https://github.com/python/mypy/issues/12770
-                vars["solved_result"][0]["q"] = REDACTED_TEXT  # type: ignore
+                vars["solved_result"][0]["q"] = REDACTED_TEXT
             case _:
                 pass
24 changes: 4 additions & 20 deletions merino/providers/suggest/weather/backends/accuweather/utils.py
@@ -99,11 +99,7 @@ def process_location_response_with_country_and_region(
             },
             *_,
         ]:
-            # `type: ignore` is necessary because mypy gets confused when
-            # matching structures of type `Any` and reports the following
-            # line as unreachable. See
-            # https://github.com/python/mypy/issues/12770
-            return {  # type: ignore
+            return {
                 "key": key,
                 "localized_name": localized_name,
                 "administrative_area_id": administrative_area_id,
@@ -128,11 +124,7 @@ def process_location_response_with_country(
                 "AdministrativeArea": {"ID": administrative_area_id},
             },
         ]:
-            # `type: ignore` is necessary because mypy gets confused when
-            # matching structures of type `Any` and reports the following
-            # line as unreachable. See
-            # https://github.com/python/mypy/issues/12770
-            return {  # type: ignore
+            return {
                 "key": key,
                 "localized_name": localized_name,
                 "administrative_area_id": administrative_area_id,
@@ -159,11 +151,7 @@ def process_current_condition_response(response: Any) -> dict[str, Any] | None:
                 },
             }
         ]:
-            # `type: ignore` is necessary because mypy gets confused when
-            # matching structures of type `Any` and reports the following
-            # lines as unreachable. See
-            # https://github.com/python/mypy/issues/12770
-            url = add_partner_code(url, PARTNER_PARAM_ID, PARTNER_CODE)  # type: ignore
+            url = add_partner_code(url, PARTNER_PARAM_ID, PARTNER_CODE)
             return {
                 "url": url,
                 "summary": summary,
@@ -197,11 +185,7 @@ def process_forecast_response(response: Any) -> dict[str, Any] | None:
                 }
             ],
         }:
-            # `type: ignore` is necessary because mypy gets confused when
-            # matching structures of type `Any` and reports the following
-            # lines as unreachable. See
-            # https://github.com/python/mypy/issues/12770
-            url = add_partner_code(url, PARTNER_PARAM_ID, PARTNER_CODE)  # type: ignore
+            url = add_partner_code(url, PARTNER_PARAM_ID, PARTNER_CODE)

             return {
                 "url": url,