feat(crons): Record stats for volume history at clock tick #79574
historic_mean = statistics.mean(historic_volume)
historic_stdev = statistics.stdev(historic_volume)

historic_stdev_pct = (historic_stdev / historic_mean) * 100
Curious, this metric (aka coefficient of variation) is not used in the actual logic. Is that intentional?
Yeah, right now all this function is doing is recording metrics.
I want to see what the numbers look like before making any decisions on what our thresholds are.
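For reference, the metric discussed in this thread can be sketched in isolation. This is a minimal illustration, not the PR's code: the helper name `coefficient_of_variation_pct` and the sample lists are hypothetical, and the guards for short or zero-mean history are an assumption about how one might handle the undefined cases.

```python
import statistics
from typing import Optional, Sequence


def coefficient_of_variation_pct(historic_volume: Sequence[float]) -> Optional[float]:
    """Return the sample standard deviation as a percentage of the mean.

    Returns None when the value is undefined: fewer than two data points
    (statistics.stdev would raise StatisticsError) or a mean of zero.
    """
    if len(historic_volume) < 2:
        return None
    historic_mean = statistics.mean(historic_volume)
    if historic_mean == 0:
        return None
    historic_stdev = statistics.stdev(historic_volume)
    return (historic_stdev / historic_mean) * 100


# Hypothetical per-minute volumes: one steady series, one bursty series.
steady = [100, 102, 98, 101, 99]
bursty = [100, 10, 250, 40, 100]

print(coefficient_of_variation_pct(steady))
print(coefficient_of_variation_pct(bursty))
```

A low percentage suggests the history is tightly clustered around its mean, which is the kind of signal one might inspect before picking thresholds.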
# Calculate the z-score of our past minutes volume in comparison to the
# historic volume data. The z-score is measured in terms of standard
# deviations from the mean
This interpretation of z-score measuring number of standard deviations from the mean is applicable only for normally distributed data. I would recommend looking at the distribution of per-minute volume. If it is not normally distributed, then I would recommend using a different metric. Seer uses interquartile range for this same reason.
Going to include it for now. I'll take a look at our existing data, but I am pretty sure it's going to be relatively normally distributed.
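To make the trade-off in this thread concrete, here is a rough sketch comparing a z-score with an interquartile-range based score on history that contains one large spike. Both function names and the sample data are hypothetical, and `iqr_score` is only one possible robust measure; it is not claimed to be what Seer implements.

```python
import statistics
from typing import Sequence


def z_score(history: Sequence[float], value: float) -> float:
    """Number of sample standard deviations `value` lies from the mean."""
    return (value - statistics.mean(history)) / statistics.stdev(history)


def iqr_score(history: Sequence[float], value: float) -> float:
    """Distance of `value` from the median, in units of interquartile range.

    Raises ZeroDivisionError if the middle half of the history has no spread.
    """
    q1, _, q3 = statistics.quantiles(history, n=4)
    return (value - statistics.median(history)) / (q3 - q1)


# Hypothetical per-minute volumes with a single outlier minute (900).
history = [100, 98, 103, 101, 99, 102, 97, 100, 101, 900]
dropped_minute = 60

print(z_score(history, dropped_minute))
print(iqr_score(history, dropped_minute))
```

With this data the single spike inflates both the mean and the standard deviation, so the drop to 60 lands within one standard deviation of the mean, while the IQR-based score places it many IQRs below the median. Whether that matters in practice depends on how the real per-minute volume is distributed.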
Codecov Report
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@             Coverage Diff             @@
##           master   #79574       +/-   ##
===========================================
+ Coverage   57.67%   78.47%   +20.79%
===========================================
  Files        7125     7137       +12
  Lines      315226   315781      +555
  Branches    43383    43442       +59
===========================================
+ Hits       181812   247794    +65982
+ Misses     128695    61661    -67034
- Partials     4719     6326     +1607
This adds a function `_evaluate_tick_decision` which looks back at the last MONITOR_VOLUME_RETENTION days worth of history and compares the minute we just ticked past to that data.

We record 3 metrics from this comparison:

- z_value: This is measured as a ratio of standard deviations from the mean value
- pct_deviation: This is the percentage we've deviated from the mean
- count: This is the number of historic data points we're considering

The z_value and pct_deviation will be most helpful in making our decision as to whether we've entered an "incident" state or not.
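The three metrics described above could be computed roughly as follows. This is a sketch under stated assumptions, not the body of `_evaluate_tick_decision`: the names `TickStats` and `compute_tick_stats` are hypothetical, `pct_deviation` is assumed to be the absolute difference from the mean expressed as a percentage of the mean, and returning `None` for degenerate history is an illustrative choice.

```python
import statistics
from typing import NamedTuple, Optional, Sequence


class TickStats(NamedTuple):
    z_value: float        # standard deviations from the historic mean
    pct_deviation: float  # absolute deviation as a percentage of the mean
    count: int            # number of historic data points considered


def compute_tick_stats(
    historic_volume: Sequence[int], past_minute_volume: int
) -> Optional[TickStats]:
    """Compare one minute's volume against the historic volume."""
    count = len(historic_volume)
    if count < 2:
        # statistics.stdev needs at least two data points.
        return None

    historic_mean = statistics.mean(historic_volume)
    historic_stdev = statistics.stdev(historic_volume)
    if historic_mean == 0 or historic_stdev == 0:
        # Both ratios below would divide by zero.
        return None

    z_value = (past_minute_volume - historic_mean) / historic_stdev
    pct_deviation = abs(past_minute_volume - historic_mean) / historic_mean * 100
    return TickStats(z_value, pct_deviation, count)


# Hypothetical history for the same minute on previous days.
stats = compute_tick_stats([100, 110, 90, 100, 100], past_minute_volume=50)
print(stats)
```

Keeping the computation separate from the metric-recording calls makes it easy to inspect the values on sample data before settling on thresholds.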
if not options.get("crons.tick_volume_anomaly_detection"):
    return
Nit: Any reason to not use a feature flag?
I couldn’t remember how to use it when we don’t have an organization lol
Part of #79328