[Serve] Deflake `test_metrics` #47750

GeneDer · 2024-09-19T18:37:06Z

Why are these changes needed?

Split out test_metrics to run on its own so the metrics will not be polluted by other tests.

Related issue number

Closes #45843

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Gene Su <[email protected]>

GeneDer · 2024-09-23T04:13:11Z

Tested with 5 runs (29452, 29455, 29457, 29459, 29461) and saw no more failures with this change.

python/ray/serve/tests/test_metrics.py

edoakes · 2024-09-23T15:15:16Z

@GeneDer can you explain a little more about what the issue was and how this solves it? And also why is this specific to Windows?

GeneDer · 2024-09-23T16:06:46Z

> @GeneDer can you explain a little more about what the issue was and how this solves it? And also why is this specific to Windows?

This test was already running on Windows before; there is no change there.

I think the main issue is that there are race conditions between different tests running in this build that pollute the metrics, so we often see this test fail (timing out with the condition never met).

Also, after I factored this test out, we started seeing those unexpected kwarg errors, so those got fixed as well.

edoakes · 2024-09-23T22:10:43Z

> > @GeneDer can you explain a little more about what the issue was and how this solves it? And also why is this specific to Windows?
>
> This test was already running on Windows before; there is no change there.
>
> I think the main issue is that there are race conditions between different tests running in this build that pollute the metrics, so we often see this test fail (timing out with the condition never met).
>
> Also, after I factored this test out, we started seeing those unexpected kwarg errors, so those got fixed as well.

We should be able to set up the fixtures properly to clear all of the expected metrics. We rely on metrics in quite a few tests, so it's probably better to nail down the root cause there.

GeneDer · 2024-09-26T15:09:15Z

@edoakes I refactored the call that cleans up metrics between tests into a fixture and call it in each of the tests here. Can you PTAL?
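For readers following along, here is a minimal sketch of what such a cleanup helper could look like. It assumes the metrics are scraped into a Prometheus server started with `--web.enable-admin-api`. The base URL, the function names, and the injectable `post` parameter are illustrative and not taken from this PR; only the two admin endpoints come from Prometheus's TSDB admin API. The PR itself calls `requests.post`; the sketch uses the standard library so that it runs without extra dependencies.

```python
import urllib.parse
import urllib.request
from typing import Callable, Tuple

# Assumption: a local Prometheus started with --web.enable-admin-api.
PROMETHEUS_URL = "http://localhost:9090"


def cleanup_urls(base_url: str = PROMETHEUS_URL) -> Tuple[str, str]:
    """Build the two admin endpoints used to wipe all stored series."""
    # match[] selects the series to delete; this selector matches every series.
    query = urllib.parse.urlencode({"match[]": '{__name__=~".+"}'})
    delete_all_series_url = f"{base_url}/api/v1/admin/tsdb/delete_series?{query}"
    clean_tombstones_url = f"{base_url}/api/v1/admin/tsdb/clean_tombstones"
    return delete_all_series_url, clean_tombstones_url


def _post(url: str) -> int:
    """Send an empty POST and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def clean_up_metrics(
    base_url: str = PROMETHEUS_URL,
    post: Callable[[str], object] = _post,  # hypothetical hook, eases testing
) -> None:
    """Mark every series as deleted, then drop the tombstones from disk."""
    delete_all_series_url, clean_tombstones_url = cleanup_urls(base_url)
    post(delete_all_series_url)
    post(clean_tombstones_url)
```

In a pytest suite, a helper like this would typically be called from the setup half of a fixture that each metrics test requests, so every test starts from an empty time-series database.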

edoakes · 2024-09-27T16:34:19Z

python/ray/serve/tests/test_metrics.py

+    requests.post(delete_all_series_url)
+    requests.post(clean_tombstones_url)
+
+
 @pytest.fixture
 def serve_start_shutdown():
    """Fixture provides a fresh Ray cluster to prevent metrics state sharing."""


Hm, shouldn't this be sufficient on its own? The Prometheus endpoint is the raylet, so if Ray is shut down between runs there should be no state sharing.

What am I missing?

GeneDer · 2024-09-27

My expectation is that something is not cleaned up between those tests, and in fact adding those calls seems to have helped. Thinking through it again, maybe just adding some sleep in between would help the same way, and maybe the issue is that Serve and/or Ray wasn't completely shut down before the next test starts? 🤔 Let me do some more experiments.

edoakes · 2024-09-27

We should not add any sleeps. If we need to wait for anything to clean up, then explicitly wait for the cleanup to happen.

Sleeps are what make things flaky in the first place.
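To illustrate the difference between a fixed sleep and an explicit wait, here is a minimal standalone sketch. It is not Ray's own test helper: the function names, the default timeout, and the port check are assumptions made for the example. The polling loop returns as soon as the cleanup is observable and fails loudly if it never is.

```python
import socket
import time
from typing import Callable


def wait_for_condition(
    condition: Callable[[], bool],
    timeout_s: float = 10.0,
    retry_interval_s: float = 0.1,
) -> None:
    """Poll `condition` until it returns True; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s}s")
        # Short pause between polls; the total wait is bounded by timeout_s.
        time.sleep(retry_interval_s)


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a new socket can bind to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


# Hypothetical use in a fixture teardown, after shutting down Serve and Ray
# (8000 is assumed here to be the HTTP port the proxy was bound to):
#     wait_for_condition(lambda: port_is_free(8000), timeout_s=30)
```

Unlike a bare `time.sleep(5)`, this neither wastes time when the shutdown is fast nor silently proceeds when it is slow.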

GeneDer · 2024-10-15T16:17:30Z

Closing for now; will dig into this deeper when we have more bandwidth. There still seem to be some port binding issues with this change.

right size tests

a0d6bd0

Signed-off-by: Gene Su <[email protected]>

GeneDer added the go label (add ONLY when ready to merge, run all tests) Sep 19, 2024

GeneDer force-pushed the deflak-test-metrics branch from cac6f7e to a0d6bd0 September 19, 2024 23:03

GeneDer added 10 commits September 19, 2024 17:40

trigger another build

20bb4b5

Signed-off-by: Gene Su <[email protected]>

factor out test_metrics on it's own and use large sized test

e688c20

Signed-off-by: Gene Su <[email protected]>

fix

ecbc2f6

Signed-off-by: Gene Su <[email protected]>

fix tag

320d1ba

Signed-off-by: Gene Su <[email protected]>

Merge branch 'master' into deflak-test-metrics

1870bdc

fix kwargs

70fbb9d

Signed-off-by: Gene Su <[email protected]>

try again

9165510

Signed-off-by: Gene Su <[email protected]>

test again

d15d0d0

Signed-off-by: Gene Su <[email protected]>

test again

2ffefce

Signed-off-by: Gene Su <[email protected]>

test again

8e22021

Signed-off-by: Gene Su <[email protected]>

GeneDer changed the title ~~right size tests~~ [Serve] Deflake test_metrics Sep 23, 2024

GeneDer marked this pull request as ready for review September 23, 2024 04:11

GeneDer self-assigned this Sep 23, 2024

GeneDer requested a review from a team September 23, 2024 14:57

zcin approved these changes Sep 23, 2024

edoakes reviewed Sep 23, 2024

GeneDer marked this pull request as draft September 23, 2024 23:20

GeneDer added 6 commits September 24, 2024 14:36

revert change and add logics to clean up metrics between tests

efceb03

Signed-off-by: Gene Su <[email protected]>

lint

eb15873

Signed-off-by: Gene Su <[email protected]>

check health for prometheus before cleanup

9502dd4

Signed-off-by: Gene Su <[email protected]>

refactor clean up metrics as a fixture

71d1336

Signed-off-by: Gene Su <[email protected]>

test again

dab214c

Signed-off-by: Gene Su <[email protected]>

test again

22f9145

Signed-off-by: Gene Su <[email protected]>

GeneDer added 2 commits September 25, 2024 15:32

test again

46ce3a0

Signed-off-by: Gene Su <[email protected]>

test again

3e63207

Signed-off-by: Gene Su <[email protected]>

GeneDer marked this pull request as ready for review September 26, 2024 14:46

edoakes reviewed Sep 27, 2024

GeneDer marked this pull request as draft September 27, 2024 16:56

GeneDer added 17 commits September 27, 2024 17:02

clean up serve and ray before and after the tests

0a5cbf0

Signed-off-by: Gene Su <[email protected]>

try again

04ee77c

Signed-off-by: Gene Su <[email protected]>

try again

869594b

Signed-off-by: Gene Su <[email protected]>

Merge branch 'master' into deflak-test-metrics

0838d2a

Merge branch 'master' into deflak-test-metrics

25cc12a

Merge branch 'master' into deflak-test-metrics

ff1a839

try again

1482c03

Signed-off-by: Gene Su <[email protected]>

try again

e9a88c6

Signed-off-by: Gene Su <[email protected]>

only decrement num_scheduling_tasks_in_backoff if it's greater than 0

747d479

Signed-off-by: Gene Su <[email protected]>

try again

79fce0e

Signed-off-by: Gene Su <[email protected]>

try again

d924b7e

Signed-off-by: Gene Su <[email protected]>

try again

650452c

Signed-off-by: Gene Su <[email protected]>

wait for proxies to be healthy before starting any tests

e0aa69c

Signed-off-by: Gene Su <[email protected]>

try again

64d88a7

Signed-off-by: Gene Su <[email protected]>

Merge branch 'master' into deflak-test-metrics

853662d

try again

709a0a9

Signed-off-by: Gene Su <[email protected]>

try again

00a45bf

Signed-off-by: Gene Su <[email protected]>

GeneDer closed this Oct 15, 2024
