-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MQE: Add quantile aggregation #10755
Conversation
// | ||
// values will be sorted in place. | ||
// If values has zero elements, NaN is returned. | ||
// If q==NaN, NaN is returned. | ||
// If q<0, -Inf is returned. | ||
// If q>1, +Inf is returned. | ||
func quantile(q float64, values []float64) float64 { | ||
func Quantile(q float64, values []float64) float64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers: this is exported for now for simplicity. I will come back with a new PR to sync this file with upstream and move it someplace more common.
903db70
to
6b8094e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM.
I'd like to see a benchmark comparison between this and Prometheus' engine.
@@ -218,7 +232,7 @@ func (a *Aggregation) NextSeries(ctx context.Context) (types.InstantVectorSeries | |||
} | |||
|
|||
// Construct the group and return it | |||
seriesData, hasMixedData, err := thisGroup.aggregation.ComputeOutputSeries(a.TimeRange, a.MemoryConsumptionTracker) | |||
seriesData, hasMixedData, err := thisGroup.aggregation.ComputeOutputSeries(a.paramData, a.TimeRange, a.MemoryConsumptionTracker, a.emitAnnotationParam) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this will cause an allocation for a.emitAnnotationParam
on every call - we may need to do the same thing we do for a.emitAnnotationFunc
/ a.emitAnnotation
in NewAggregation
.
Another option would be to separate out the validation of the parameter and do that just once, rather than in every call to ComputeOutputSeries
- this would save us doing the same work over and over again, and remove the need to pass the function here as well. And then we can likely just accept the single allocation and pass a.emitAnnotationParam
directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect this will cause an allocation for
a.emitAnnotationParam
on every call - we may need to do the same thing we do fora.emitAnnotationFunc
/a.emitAnnotation
inNewAggregation
.
Good catch, have fixed that.
Another option would be to separate out the validation of the parameter and do that just once, rather than in every call to
ComputeOutputSeries
- this would save us doing the same work over and over again, and remove the need to pass the function here as well. And then we can likely just accept the single allocation and passa.emitAnnotationParam
directly.
The parameter is processed during the aggregations SeriesMetadata
, so it should only be once. The values (a.paramData
on this quoted line) are then passed into the quantile ComputeOutputSeries
to be looked up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter is processed during the aggregations
SeriesMetadata
, so it should only be once. The values (a.paramData
on this quoted line) are then passed into the quantileComputeOutputSeries
to be looked up.
Sorry, should have been clearer: I was referring to the work done to validate the parameter values and emit an annotation if it is not between 0 and 1 here - this will be repeated for every output group, and is the only reason we need to pass a emitParamAnnotationFunc
to ComputeOutputSeries
.
If we instead move that validation out of ComputeOutputSeries
, then we'll only do it once regardless of the number of groups, and we don't need to pass emitParamAnnotationFunc
to ComputeOutputSeries
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Good point.
I've refactored things to avoid this. Now Quantile is it's own operator that is a thing wrapper around Aggregation. This is to avoid repeating all of the grouping logic. The downside is that Aggregation still needs to keep ParamData, which is a bit odd, but I think a decent trade off.
As a side effect: Previously if no series were output, no annotation would be emitted for a bad quantile value. However since we process them independently we now always output an annotation. This is more consistent with Prometheus anyway.
Benchmark results:
|
@@ -218,7 +232,7 @@ func (a *Aggregation) NextSeries(ctx context.Context) (types.InstantVectorSeries | |||
} | |||
|
|||
// Construct the group and return it | |||
seriesData, hasMixedData, err := thisGroup.aggregation.ComputeOutputSeries(a.TimeRange, a.MemoryConsumptionTracker) | |||
seriesData, hasMixedData, err := thisGroup.aggregation.ComputeOutputSeries(a.paramData, a.TimeRange, a.MemoryConsumptionTracker, a.emitAnnotationParam) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter is processed during the aggregations
SeriesMetadata
, so it should only be once. The values (a.paramData
on this quoted line) are then passed into the quantileComputeOutputSeries
to be looked up.
Sorry, should have been clearer: I was referring to the work done to validate the parameter values and emit an annotation if it is not between 0 and 1 here - this will be repeated for every output group, and is the only reason we need to pass a emitParamAnnotationFunc
to ComputeOutputSeries
.
If we instead move that validation out of ComputeOutputSeries
, then we'll only do it once regardless of the number of groups, and we don't need to pass emitParamAnnotationFunc
to ComputeOutputSeries
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM. Noticed something weird in the test cases though.
} | ||
|
||
func (q *QuantileAggregation) Close() { | ||
if q.Aggregation.ParamData.Samples != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels weird that we're reaching into q.Aggregation
here - I think it'd be OK to move this into Aggregation.Close()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it's a bit odd, but we also reach into here to set up the ParamData. The downside of moving it into Aggregation.Close
is that it'll be unnecessarily checked for every other aggregation type.
defer types.PutInstantVectorSeriesData(data, memoryConsumptionTracker) | ||
|
||
if len(data.Histograms) > 0 { | ||
emitAnnotationFunc(func(_ string, expressionPosition posrange.PositionRange) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] I realise this was what it was called previously, but I think emitAnnotation
would be a better name for this parameter, with EmitAnnotationFunc
remaining as the type.
(could be something for a follow-up PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will leave for followup
What this PR does
Which issue(s) this PR fixes or relates to
#10067
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]
.about-versioning.md
updated with experimental features.