Add warning alert for too high distributor GC CPU utilization #10641

Merged on Feb 14, 2025 (9 commits)

@aknuds1 (Contributor) commented Feb 13, 2025

What this PR does

Add a warning-severity alert that fires when distributor garbage collection CPU utilization is too high. The motivation is to be alerted when GOMEMLIMIT causes distributors to garbage collect too often.

Additionally, enable three CPU runtime metrics required by the alerts:

  • /cpu/classes/gc/total:cpu-seconds
  • /cpu/classes/total:cpu-seconds
  • /cpu/classes/idle:cpu-seconds
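
For readers unfamiliar with these names: they are Go `runtime/metrics` samples, cumulative CPU-second estimates since process start. The sketch below reads all three directly from the runtime and derives a GC share of non-idle CPU time. The `gcCPUFraction` helper and its formula, `gc / (total - idle)`, are assumptions made for illustration. This description does not show the alert expression, and the alert itself would presumably work on rates of the exported metrics rather than on process-lifetime totals.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// The three runtime/metrics names listed in the PR description.
const (
	gcCPU    = "/cpu/classes/gc/total:cpu-seconds"
	totalCPU = "/cpu/classes/total:cpu-seconds"
	idleCPU  = "/cpu/classes/idle:cpu-seconds"
)

// gcCPUFraction returns the share of non-idle CPU time spent on GC.
// Hypothetical helper: the formula is an assumption, not the alert's expression.
func gcCPUFraction(gc, total, idle float64) float64 {
	busy := total - idle
	if busy <= 0 {
		return 0
	}
	return gc / busy
}

// readCPUClasses reads the three samples in one call so they are consistent.
// ok is false if the running Go version does not support one of the metrics.
func readCPUClasses() (gc, total, idle float64, ok bool) {
	samples := []metrics.Sample{{Name: gcCPU}, {Name: totalCPU}, {Name: idleCPU}}
	metrics.Read(samples)
	for _, s := range samples {
		if s.Value.Kind() != metrics.KindFloat64 {
			return 0, 0, 0, false
		}
	}
	return samples[0].Value.Float64(), samples[1].Value.Float64(), samples[2].Value.Float64(), true
}

func main() {
	gc, total, idle, ok := readCPUClasses()
	if !ok {
		fmt.Println("CPU class metrics are not supported by this Go runtime")
		return
	}
	fmt.Printf("GC share of non-idle CPU since process start: %.4f\n", gcCPUFraction(gc, total, idle))
}
```

Subtracting idle time from the total keeps the ratio meaningful on lightly loaded processes, where most of the available CPU time is idle.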

Which issue(s) this PR fixes or relates to

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

@aknuds1 added the enhancement label Feb 13, 2025
@aknuds1 force-pushed the arve/distributor-gc-alert branch 5 times, most recently from bbe90b1 to 5ebb476, February 13, 2025 15:06
github-actions bot commented Feb 13, 2025

💻 Deploy preview deleted.

@aknuds1 force-pushed the arve/distributor-gc-alert branch 4 times, most recently from d88b61b to 4b432f0, February 13, 2025 16:07
@aknuds1 changed the title from "WIP: Enable three CPU runtime metrics" to "Add alerts for too high distributor GC CPU utilization" Feb 13, 2025
@aknuds1 marked this pull request as ready for review February 13, 2025 16:12
@aknuds1 requested review from tacole02 and a team as code owners February 13, 2025 16:12
@aknuds1 requested a review from pracucci February 13, 2025 16:33
@aknuds1 force-pushed the arve/distributor-gc-alert branch from 4b432f0 to 0ae0973, February 13, 2025 16:34
@aknuds1 force-pushed the arve/distributor-gc-alert branch from ee730d1 to c53d7bf, February 14, 2025 07:56
Signed-off-by: Arve Knudsen <[email protected]>
@pracucci (Collaborator) left a comment

Reviewed while pair programming. LGTM!

Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 changed the title from "Add alerts for too high distributor GC CPU utilization" to "Add alert for too high distributor GC CPU utilization" Feb 14, 2025
Signed-off-by: Arve Knudsen <[email protected]>
@aknuds1 changed the title from "Add alert for too high distributor GC CPU utilization" to "Add warning alert for too high distributor GC CPU utilization" Feb 14, 2025
@aknuds1 enabled auto-merge (squash) February 14, 2025 09:36
@aknuds1 merged commit 1ce9dff into main Feb 14, 2025
32 checks passed
@aknuds1 deleted the arve/distributor-gc-alert branch February 14, 2025 09:52
ying-jeanne pushed a commit that referenced this pull request Feb 19, 2025
* Enable three CPU runtime metrics
* operations: Add warning alert for too high distributor GC CPU usage

---------

Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: Taylor C <[email protected]>
ying-jeanne pushed a commit that referenced this pull request Feb 20, 2025
* Enable three CPU runtime metrics
* operations: Add warning alert for too high distributor GC CPU usage

---------

Signed-off-by: Arve Knudsen <[email protected]>
Co-authored-by: Taylor C <[email protected]>
@grafanabot (Contributor) commented

The backport to r314 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new branch
git switch --create backport-10641-to-r314 origin/r314
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x 1ce9dff5f306b906eff412bf9b5f62c6ae9dd102
# Push it to GitHub
git push --set-upstream origin backport-10641-to-r314
git switch main
# Remove the local backport branch
git branch -D backport-10641-to-r314

Then, create a pull request where the base branch is r314 and the compare/head branch is backport-10641-to-r314.
