Rewrite execution of microbatch models to avoid blocking the main thread #11332

Merged

Conversation

QMalcolm
Contributor

@QMalcolm QMalcolm commented Feb 24, 2025

Resolves #11243
Resolves #11306

Problem

There are two problems:

  1. Executing microbatch model batches concurrently would block the main thread from scheduling other nodes as described in [Bug] Microbatch models shouldn't block the main thread in multi-threaded dbt runs. #11243
  2. In certain scenarios, a microbatch model would hang indefinitely as described in [Bug] Microbatch can cause dbt runs to hang #11306

Solution

Checklist

  • I have read the contributing guide and understand what's expected of me.
  • I have run this code in development, and it appears to resolve the stated issue.
  • This PR includes tests, or tests are not required or relevant for this PR.
  • This PR has no interface changes (e.g., macros, CLI, logs, JSON artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX.
  • This PR includes type annotations for new and modified functions.

…stration to a runner

We're working to ensure the orchestration of microbatch batches doesn't block the main thread.
This will require disentangling a lot of logic that currently lives in run.py. As such, it made sense
to "quickly" stub out a guide of what needs to be done.
The `MicrobatchBatchRunner` will be for running individual batches,
whereas the `MicrobatchModelRunner` will handle the orchestration
of the batches to be run for a given model.
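
The split described above can be sketched roughly as follows. This is only an illustration of the orchestrator/worker division, not the code in this PR: `BatchRunnerSketch`, `ModelRunnerSketch`, and the `execute_batch` callable are hypothetical names, and the real runners inherit from dbt's runner classes rather than being plain dataclasses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class BatchRunnerSketch:
    """Hypothetical stand-in for a runner that executes a single batch."""

    batch_idx: int
    execute_batch: Callable[[int], str]

    def run(self) -> str:
        # The batch runner is the only place where batch work actually happens.
        return self.execute_batch(self.batch_idx)


@dataclass
class ModelRunnerSketch:
    """Hypothetical stand-in for a runner that orchestrates a model's batches."""

    batch_count: int
    execute_batch: Callable[[int], str]

    def run(self) -> List[str]:
        # The orchestrator does no batch work itself; it builds one batch
        # runner per batch and collects their results.
        runners = [
            BatchRunnerSketch(idx, self.execute_batch)
            for idx in range(self.batch_count)
        ]
        return [runner.run() for runner in runners]


results = ModelRunnerSketch(3, lambda idx: f"batch {idx} ok").run()
```

Because the orchestrator is itself a runner, the scheduler can hand it to a worker thread like any other node instead of orchestrating batches on the main thread.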
…Runner` directly

Previously `handle_job_queue` treated `MicrobatchModelRunner` as a special case, and delegated
to `handle_microbatch_model` to orchestrate the batches instead of delegating to the
`MicrobatchModelRunner` directly. Now that the `MicrobatchModelRunner` will be handling batch
orchestration, we can appropriately delegate to it and remove the special casing.
The function won't work as is, but I felt it better to straight copy, commit,
and then modify it to work in the runner context iteratively.
@QMalcolm QMalcolm added the Skip Changelog Skips GHA to check for changelog file label Feb 24, 2025
@cla-bot cla-bot bot added the cla:yes label Feb 24, 2025
@dbt-labs dbt-labs deleted a comment from github-actions bot Feb 24, 2025

codecov bot commented Feb 24, 2025

Codecov Report

Attention: Patch coverage is 89.55224% with 21 lines in your changes missing coverage. Please review.

Project coverage is 88.86%. Comparing base (f7c4c3c) to head (ef461da).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #11332      +/-   ##
==========================================
- Coverage   88.97%   88.86%   -0.12%     
==========================================
  Files         189      190       +1     
  Lines       24182    24197      +15     
==========================================
- Hits        21517    21503      -14     
- Misses       2665     2694      +29     
Flag Coverage Δ
integration 86.07% <87.06%> (-0.22%) ⬇️
unit 62.59% <31.34%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
Unit Tests 62.59% <31.34%> (+0.03%) ⬆️
Integration Tests 86.07% <87.06%> (-0.22%) ⬇️

We don't need these functions in `MicrobatchModelRunner` because the
versions inherited from `ModelRunner` will work for
our needs. Of note, we can probably also remove the need for these
functions in `MicrobatchBatchRunner` by renaming `print_batch_start_line`
and `print_batch_result_line` to the method names that the `ModelRunner`
methods call.
…unner`

The `MicrobatchModelRunner.compile` does nothing because `MicrobatchModelRunner`
only orchestrates the batches of the model to run, and doesn't actually run
the SQL of the model. Thus compilation is unnecessary in `MicrobatchModelRunner`.
Of note, implementing `on_skip` for `MicrobatchModelRunner` is unnecessary
because the inherited `on_skip` suffices.
Previously `build_jinja_context_batch` was an instance-specific method
of `MicrobatchBuilder`. An issue with this is that, with the now existing
split of `MicrobatchModelRunner` and `MicrobatchBatchRunner`, we'd either
need to pass the `MicrobatchBuilder` from the `MicrobatchModelRunner` to
the `MicrobatchBatchRunner`, or instantiate a new `MicrobatchBuilder` in
every `MicrobatchBatchRunner`. The issue with the former is that the
passed-in `MicrobatchBuilder` wouldn't have the `compiled_code` on the
`model`. We could instead do the latter, but instantiating a new
`MicrobatchBuilder` for every batch seems unnecessary when the method can
easily become a static method.
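
As a rough illustration of why a static method sidesteps the problem: a static method depends only on its arguments, so a batch runner can call it without holding a builder instance. The class name `BuilderSketch`, the parameters, and the returned keys below are assumptions for illustration only; the real `MicrobatchBuilder` method signature is not shown in this PR conversation.

```python
from typing import Any, Dict


class BuilderSketch:
    """Hypothetical stand-in for `MicrobatchBuilder`; the real class differs."""

    def __init__(self, compiled_code: str) -> None:
        self.compiled_code = compiled_code

    @staticmethod
    def build_jinja_context_batch(
        model_name: str, is_incremental_batch: bool
    ) -> Dict[str, Any]:
        # Hypothetical parameters and keys: the point is only that the method
        # reads nothing from `self`, so no builder instance is needed.
        return {"model": model_name, "is_incremental": is_incremental_batch}


# Called on the class itself, without constructing a builder.
context = BuilderSketch.build_jinja_context_batch("events", True)
```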
@QMalcolm QMalcolm force-pushed the qmalcolm--11243-stop-microbatch-from-blocking-main-thread branch from 3e857dd to 35bc7ce Compare February 26, 2025 19:28
QMalcolm added 9 commits March 2, 2025 20:34
…rially

With the orchestration of batches moved onto a runner, the `MicrobatchModelRunner`,
sending a `KeyboardInterrupt` to the process no longer stopped things. This is because
we previously relied on closing all active adapter connections to stop tasks that were
currently executing. However, the `MicrobatchModelRunner` doesn't have any active data
warehouse connections itself, as adapter connections for batches are opened by the
`MicrobatchBatchRunner`. Because of this, the closing of connections would cancel a
running batch, but then the next batch would be submitted (and open a new connection).

To stop this from happening, we needed a way to stop new batches from being submitted.
To do this, we created a new `DbtThreadPool` which tracks whether or not it's been closed.
If it's closed, then `_submit_batch` skips the batch entirely.

NOTE: This only works if the batches are running serially. It does not work if the batches
are being run concurrently as the orchestrator submits all of the batches immediately. Thus
checking on `_submit_batch` is ineffective. We'll address this in the next commit.
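
A minimal sketch of the serial case described above, assuming a `ThreadPool` subclass that records when `close()` is called. The names `ClosableThreadPoolSketch`, `is_closed`, and `submit_batch` are hypothetical; the actual `DbtThreadPool` and `_submit_batch` implementations may look different.

```python
from multiprocessing.pool import ThreadPool
from typing import Callable, List, Optional


class ClosableThreadPoolSketch(ThreadPool):
    """ThreadPool that remembers whether close() was called (illustrative)."""

    def __init__(self, *args, **kwargs) -> None:
        super().__init__(*args, **kwargs)
        self._close_called = False

    def close(self) -> None:
        # Record the close before delegating to the real implementation.
        self._close_called = True
        super().close()

    def is_closed(self) -> bool:
        return self._close_called


def submit_batch(
    pool: ClosableThreadPoolSketch, batch_idx: int, run_batch: Callable[[int], str]
) -> Optional[str]:
    # Skip the batch entirely once the pool has been closed.
    if pool.is_closed():
        return None
    # Serial execution: wait for each batch before submitting the next one.
    return pool.apply_async(run_batch, (batch_idx,)).get()


pool = ClosableThreadPoolSketch(2)
results: List[Optional[str]] = []
for idx in range(4):
    if idx == 2:
        pool.close()  # simulate the shutdown triggered by an interrupt
    results.append(submit_batch(pool, idx, lambda i: f"batch {i} ran"))
pool.join()
```

In this sketch the first two batches run and the last two are skipped, because the closed check happens before each submission. That check cannot help when every batch is submitted up front, which is the concurrent limitation the note above describes.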
…rently when interrupted

In the previous commit we made it such that microbatch model execution could
be halted when batches were being executed serially. However, that work did
not make it such that the microbatch model execution would shut down when
executing batches concurrently. This change fixes that issue. Additionally,
we deleted a test. Unfortunately it is no longer reliably possible to test
KeyboardInterrupts of microbatch models, as we don't have a way to fire a keyboard
interrupt at the right time consistently in our testing environment. The test
that existed would hang indefinitely, as a keyboard interrupt was being raised
on a thread that was not the main thread (which is impossible in the real world, as
keyboard interrupts are always fired from the main thread).
…execute`

The lines for tracking/printing at the end of `MicrobatchModelRunner.execute`
are not necessary because the `after_execute` inherited from `ModelRunner` does
both of these things. Thus the lines at the end of `MicrobatchModelRunner.execute`
were duplicative.
The `MicrobatchBatchRunner` never uses `describe_node` as it instead
uses `describe_batch`. Thus, `describe_node` serves no purpose.
Removing this special logic is safe, and the test `TestMicrobatchModelSkipped`
confirms this.
@QMalcolm QMalcolm marked this pull request as ready for review March 3, 2025 19:19
@QMalcolm QMalcolm requested a review from a team as a code owner March 3, 2025 19:19
@QMalcolm QMalcolm removed the Skip Changelog Skips GHA to check for changelog file label Mar 3, 2025
from multiprocessing.pool import ThreadPool


class DbtThreadPool(ThreadPool):

We created DbtThreadPool so that we can have visibility on whether .close() has been called on the pool. This class is now used instead of ThreadPool.

@QMalcolm QMalcolm merged commit 94b6ae1 into main Mar 3, 2025
55 of 57 checks passed
@QMalcolm QMalcolm deleted the qmalcolm--11243-stop-microbatch-from-blocking-main-thread branch March 3, 2025 21:21
Contributor

github-actions bot commented Mar 3, 2025

The backport to 1.9.latest failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-1.9.latest 1.9.latest
# Navigate to the new working tree
cd .worktrees/backport-1.9.latest
# Create a new branch
git switch --create backport-11332-to-1.9.latest
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 94b6ae13b3c9bf1ae231d0bdc4b81c9d8cf712c0
# Push it to GitHub
git push --set-upstream origin backport-11332-to-1.9.latest
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-1.9.latest

Then, create a pull request where the base branch is 1.9.latest and the compare/head branch is backport-11332-to-1.9.latest.

QMalcolm added a commit that referenced this pull request Mar 3, 2025
…ead (#11332)

* Push orchestration of batches previously in the `RunTask` into `MicrobatchModelRunner`

* Split `MicrobatchModelRunner` into two separate runners

`MicrobatchModelRunner` is now an orchestrator of `MicrobatchBatchRunner`s, the latter being what handles actual batch execution

* Introduce new `DbtThreadPool` that knows if it's been closed

* Enable `MicrobatchModelRunner` to shut down gracefully when it detects the thread pool has been closed
QMalcolm added a commit that referenced this pull request Mar 7, 2025
…ead (#11332) (#11349)

* Push orchestration of batches previously in the `RunTask` into `MicrobatchModelRunner`

* Split `MicrobatchModelRunner` into two separate runners

`MicrobatchModelRunner` is now an orchestrator of `MicrobatchBatchRunner`s, the latter being what handles actual batch execution

* Introduce new `DbtThreadPool` that knows if it's been closed

* Enable `MicrobatchModelRunner` to shut down gracefully when it detects the thread pool has been closed

Co-authored-by: Michelle Ark <[email protected]>