Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(crons): Record stats for volume history at clock tick #79574

Conversation

evanpurkhiser
Copy link
Member

@evanpurkhiser evanpurkhiser commented Oct 22, 2024

This adds a function _evaluate_tick_decision which looks back at the last MONITOR_VOLUME_RETENTION days worth of history and compares the minute we just ticked past to that data.

We record 3 metrics from this comparison

  • z_value: This is measured as a ratio of standard deviations from the mean value
  • pct_deviation: This is the percentage we've deviated from the mean
  • count: This is the number of historic data points we're considering

The z_value and pct_deviation will be most helpful in making our decision as to whether we've entered an "incident" state or not.

Part of #79328

@evanpurkhiser evanpurkhiser requested a review from a team as a code owner October 22, 2024 22:47
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Oct 22, 2024
src/sentry/monitors/clock_dispatch.py Outdated Show resolved Hide resolved
src/sentry/monitors/clock_dispatch.py Outdated Show resolved Hide resolved
src/sentry/monitors/clock_dispatch.py Show resolved Hide resolved
src/sentry/monitors/clock_dispatch.py Show resolved Hide resolved
src/sentry/monitors/clock_dispatch.py Outdated Show resolved Hide resolved
historic_mean = statistics.mean(historic_volume)
historic_stdev = statistics.stdev(historic_volume)

historic_stdev_pct = (historic_stdev / historic_mean) * 100
Copy link
Member

@ram-senth ram-senth Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, this metric (aka coefficient of variation) is not used in the actual logic. Is that intentional?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah right now all this function is doing is recording metrics.

I want to see what the numbers look like before making any decisions on what our thresholds are.


# Calculate the z-score of our past minutes volume in comparison to the
# historic volume data. The z-score is measured in terms of standard
# deviations from the mean
Copy link
Member

@ram-senth ram-senth Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This interpretation of z-score measuring number of standard deviations from the mean is applicable only for normally distributed data. I would recommend looking at the distribution of per-minute volume. If it not normally distributed then I would recommend using different metric. Seer uses interquartile range for this same reason.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to include it for now. I'll take a look out our existing data but I am pretty sure it's going to be relatively normally distributed.

@evanpurkhiser evanpurkhiser force-pushed the evanpurkhiser/feat-crons-record-stats-for-volume-history-at-clock-tick branch from 19b0130 to 95182d9 Compare October 23, 2024 17:28
@evanpurkhiser evanpurkhiser force-pushed the evanpurkhiser/feat-crons-record-stats-for-volume-history-at-clock-tick branch from 20199a4 to 630ee8b Compare October 23, 2024 18:29
Copy link

codecov bot commented Oct 23, 2024

Codecov Report

Attention: Patch coverage is 88.23529% with 4 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
src/sentry/monitors/clock_dispatch.py 87.87% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #79574       +/-   ##
===========================================
+ Coverage   57.67%   78.47%   +20.79%     
===========================================
  Files        7125     7137       +12     
  Lines      315226   315781      +555     
  Branches    43383    43442       +59     
===========================================
+ Hits       181812   247794    +65982     
+ Misses     128695    61661    -67034     
- Partials     4719     6326     +1607     

@evanpurkhiser evanpurkhiser force-pushed the evanpurkhiser/feat-crons-record-stats-for-volume-history-at-clock-tick branch from 90fa41b to e2280f5 Compare October 23, 2024 19:34
This adds a function `_evaluate_tick_decision` which looks back at the
last MONITOR_VOLUME_RETENTION days worth of history and compares the
minute we just ticked past to that data.

We record 3 metrics from this comparison

- z_value: This is measured as a ratio of standard deviations from the
  mean value

- pct_deviation: This is the percentage we've deviated from the mean

- count: This is the number of historic data points we're considering

The z_value and pct_deviation will be most helpful in making our
decision as to whether we've entered an "incident" state or not.
Comment on lines +114 to +115
if not options.get("crons.tick_volume_anomaly_detection"):
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Any reason to not use a feature flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn’t remember how to use it when we don’t have an organization lol

@evanpurkhiser evanpurkhiser merged commit 3a23ad2 into master Oct 24, 2024
49 of 50 checks passed
@evanpurkhiser evanpurkhiser deleted the evanpurkhiser/feat-crons-record-stats-for-volume-history-at-clock-tick branch October 24, 2024 18:01
Copy link

sentry-io bot commented Oct 24, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ StatisticsError: stdev requires at least two data points monitors.monitor_consumer View Issue

Did you find this useful? React with a 👍 or 👎

@github-actions github-actions bot locked and limited conversation to collaborators Nov 9, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Scope: Backend Automatically applied to PRs that change backend components
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants