add build team availability alerts #353

Open
wants to merge 1 commit into base: main

tnevrlka
Member

Jira: STONEBLD-2651

Add availability alerts for the build team. Each alert fires after five minutes of not having the konflux_up metric.

  • Add GitHubAppFailureAlert for build-service
  • Add QuayFailureAlert for image-controller
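
For readers unfamiliar with the rule format, here is a minimal sketch of what alerts of this shape might look like as Prometheus alerting rules. It is not the content of this PR: the group name, the `service` label selector, the summaries, and the `runbook_url` placeholders are all assumptions for illustration. It also assumes that "not having konflux_up" means the metric is not reporting 1; if the intent is that the series is missing entirely, an `absent(...)` expression would be the closer fit.

```yaml
# Illustrative sketch only. Selectors, group name, and annotations are
# assumptions, not the rules added by this PR.
groups:
  - name: build-team-availability          # hypothetical group name
    rules:
      - alert: GitHubAppFailureAlert
        # Assumed selector; the real konflux_up labels may differ.
        expr: konflux_up{service="build-service"} != 1
        for: 5m                             # condition must hold for five minutes before firing
        labels:
          severity: warning
        annotations:
          summary: GitHub App availability check for build-service is failing
          runbook_url: "REPLACE_WITH_SOP_URL"   # placeholder; SOP link not shown in this thread

      - alert: QuayFailureAlert
        # Assumed selector; the real konflux_up labels may differ.
        expr: konflux_up{service="image-controller"} != 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: Quay availability check for image-controller is failing
          runbook_url: "REPLACE_WITH_SOP_URL"   # placeholder; SOP link not shown in this thread
```

The `for: 5m` clause is what gives the five-minute delay: the alert stays pending while the expression is true and only fires once it has been true continuously for that long.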

The alerts include runbook_url values that point to SOPs which do not exist yet. There is an MR open that adds the SOPs, but it is not merged yet (I will mark this PR ready for review after it is merged).

- Add GitHubAppFailureAlert for build-service
- Add QuayFailureAlert for image-controller

Signed-off-by: Tomas Nevrlka <[email protected]>
@tnevrlka tnevrlka marked this pull request as ready for review October 30, 2024 16:00
@mftb
Collaborator

mftb commented Oct 30, 2024

Just for confirmation, both alerts should only be warnings and both should not be SLO alerts as well?

@tnevrlka
Member Author

tnevrlka commented Oct 30, 2024

> Just for confirmation, both alerts should only be warnings and both should not be SLO alerts as well?

That was my intention, yes. Do you think it shouldn't be a warning?

@mftb
Collaborator

mftb commented Oct 30, 2024

That is not a problem per se, but it may be worth checking what those alerts are trying to achieve. Essentially, non-SLO warning alerts won't get follow-up actions from SREs, for example. If that is the intention, then we are fine.

On the other hand, if you intend for those alerts to be actionable, they should be critical SLO alerts with corresponding actionable SOPs.

I hope this broader context brings a little bit of clarity. That said, whether the alerts should be SLO or not, critical or warnings, is up to the build team.
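
As a rough illustration of the two options described above, the difference would typically show up in each rule's `labels` block. The `slo` label name and its string values are assumptions about this repository's conventions and are not confirmed anywhere in this thread.

```yaml
# Option A: informational alert, no SRE follow-up expected.
labels:
  severity: warning
  slo: "false"      # assumed label name and value
---
# Option B: actionable alert; would also need an actionable SOP
# referenced from the rule's runbook_url annotation.
labels:
  severity: critical
  slo: "true"       # assumed label name and value
```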

@tnevrlka
Member Author

Thanks. Since there's not really anything for SREs to do, I think warning is fine.

Collaborator

@mftb mftb left a comment

/lgtm
/approve

2 participants