docs: adds instructions for collecting and sending native histograms with otel collector #9328
Merged
16 commits
d060fc9 docs: adds instructions for collecting and sending native histograms … (tacole02)
83549c7 Update docs/sources/mimir/send/native-histograms/_index.md (tacole02)
b92f185 Update docs/sources/mimir/send/native-histograms/_index.md (tacole02)
710ae79 Update docs/sources/mimir/send/native-histograms/_index.md (tacole02)
b3a8d7f Update docs/sources/mimir/send/native-histograms/_index.md (tacole02)
cb17a5e docs: add new topic for exponential histograms (tacole02)
7d7201c docs: otel histogram details (tacole02)
3ba7ad2 docs: otel histogram details (tacole02)
2ad64e5 docs: add migration instructions for exp histograms (tacole02)
e52640d add bucket boundary calculation info for exp histograms (tacole02)
040368d Add otel bucket definition and remove stray line (krajorama)
59c3f35 ran make docs (krajorama)
887f28f docs: add feedback for exponential histograms (tacole02)
2f6fcab docs: add feedback for exponential histograms (tacole02)
254f484 Update docs/sources/mimir/send/otel-exponential-histograms/_index.md (tacole02)
94384db docs: make doc output (tacole02)
117 changes: 117 additions & 0 deletions
docs/sources/mimir/send/otel-exponential-histograms/_index.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,117 @@ | ||
--- | ||
description: Learn how to collect and send exponential histograms with the OpenTelemetry Collector | ||
keywords: | ||
- send metrics | ||
- exponential histogram | ||
- OpenTelemetry | ||
- instrumentation | ||
menuTitle: OpenTelemetry exponential histograms | ||
title: Send OpenTelemetry exponential histograms to Mimir | ||
weight: 200 | ||
--- | ||
|
||
# Send OpenTelemetry exponential histograms to Mimir | ||
|
||
You can collect and send exponential histograms to Mimir with the OpenTelemetry Collector. OpenTelemetry [exponential histograms](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#exponentialhistogram) are compatible with Prometheus native histograms. The key difference is that exponential histograms store the `min` and `max` observation values explicitly, whereas native histograms don't. This means that for exponential histograms, you don't need to estimate these values using the 0.0 and 1.0 quantiles. | ||
|
||
The OpenTelemetry Collector collects exponential histograms and other compatible data formats, including native histograms and Datadog sketches, through its receivers, and sends them through its exporters. | ||
|
||
{{< admonition type="note" >}} | ||
The availability of different receivers and exporters depends on your OpenTelemetry Collector [distribution](https://opentelemetry.io/docs/concepts/distributions/). | ||
{{< /admonition >}} | ||
|
||
You can use the OpenTelemetry protocol (OTLP) over HTTP to send exponential histograms to Grafana Mimir in their existing format, or you can use the Prometheus remote write protocol to send them as Prometheus native histograms. | ||
|
||
The OpenTelemetry SDK supports instrumenting applications in multiple languages. Refer to [Language APIs & SDKs](https://opentelemetry.io/docs/languages/) for a complete list. | ||
|
||
## Instrument an application with the OpenTelemetry SDK using Go | ||
|
||
Use the OpenTelemetry SDK version 1.17.0 or later. | ||
|
||
1. Set up the OpenTelemetry SDK to handle your metrics data. This includes setting up your resources, meter provider, meter, instruments, and views. Refer to [Metrics](https://opentelemetry.io/docs/languages/go/instrumentation/#metrics) in the OpenTelemetry SDK documentation for Go. | ||
1. To aggregate a histogram instrument as an exponential histogram, include the following view: | ||
|
||
``` | ||
// Field of the view's Stream; metric is go.opentelemetry.io/otel/sdk/metric. | ||
Aggregation: metric.AggregationBase2ExponentialHistogram{ | ||
  MaxSize:  160, | ||
  MaxScale: 20, | ||
} | ||
``` | ||
|
||
For more information about views, refer to [Registering Views](https://opentelemetry.io/docs/languages/go/instrumentation/#registering-views) in the OpenTelemetry SDK documentation for Go. For information about view configuration parameters, refer to [Base2 Exponential Bucket Histogram Aggregation](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#base2-exponential-bucket-histogram-aggregation) in the OpenTelemetry Metrics SDK on GitHub. | ||
|
||
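To make the two view parameters more concrete, the following sketch shows how they could interact, assuming the behavior described in the specification: the aggregation starts at `MaxScale` and lowers the scale until the observed range fits in `MaxSize` buckets. The helper names `bucketIndex` and `fitScale`, the sample latency range, and the lower bound of `-10` for the scale are assumptions made for illustration. This isn't the SDK's implementation.

```go
package main

import (
	"fmt"
	"math"
)

// bucketIndex returns the exponential bucket index for a positive value v at
// the given scale, using base = 2^(2^-scale) and upper-inclusive buckets.
func bucketIndex(v float64, scale int) int {
	return int(math.Ceil(math.Log2(v)*math.Ldexp(1, scale))) - 1
}

// fitScale returns the highest scale, no greater than maxScale, at which the
// values between lo and hi span at most maxSize buckets.
// The floor of -10 is an assumption for this sketch.
func fitScale(lo, hi float64, maxSize, maxScale int) int {
	for scale := maxScale; scale > -10; scale-- {
		if bucketIndex(hi, scale)-bucketIndex(lo, scale)+1 <= maxSize {
			return scale
		}
	}
	return -10
}

func main() {
	// Hypothetical latencies between 1 ms and 10 s, expressed in seconds.
	scale := fitScale(0.001, 10, 160, 20)
	base := math.Pow(2, math.Ldexp(1, -scale))
	fmt.Printf("scale=%d base=%.4f\n", scale, base)
}
```

For this hypothetical range, the sketch settles on a scale of 3. A wider range of observations pushes the scale lower, which trades resolution for coverage, while a larger `MaxSize` allows a higher scale.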
## Migrate from explicit bucket histograms | ||
|
||
To ease the migration process, you can keep the custom bucket definition of an explicit bucket histogram and add a view for the exponential histogram. | ||
|
||
1. Start with an existing histogram that uses explicit buckets. | ||
1. Create a view for the exponential histogram. Assign this view a unique name and include the exponential aggregation. This creates a metric with the assigned name and exponential buckets. | ||
|
||
The following example shows how to create a metric called `request_latency_exp` that uses exponential buckets. | ||
|
||
``` | ||
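// sdkmetric is go.opentelemetry.io/otel/sdk/metric. | ||
v := sdkmetric.NewView(sdkmetric.Instrument{ | ||
  // Match the existing histogram instrument. | ||
  Name: "request_latency", | ||
  Kind: sdkmetric.InstrumentKindHistogram, | ||
}, sdkmetric.Stream{ | ||
  // Emit the stream under a new name with exponential buckets. | ||
  Name: "request_latency_exp", | ||
  Aggregation: sdkmetric.AggregationBase2ExponentialHistogram{MaxSize: 160, NoMinMax: true, MaxScale: 20}, | ||
}) | ||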
``` | ||
|
||
For more information about creating a view, refer to [View](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view) in the OpenTelemetry Metrics SDK. | ||
|
||
1. Modify dashboards to use the exponential histogram metrics. Refer to [Visualize native histograms](https://grafana.com/docs/mimir/<MIMIR_VERSION>/visualize/native-histograms/) for more information. | ||
|
||
Use one of the following strategies to update dashboards. | ||
|
||
- (Recommended) Create new dashboards that use the exponential histogram queries. With this approach, you look at different dashboards for data from before and after the migration until the pre-migration data passes its retention time and is removed. You can publish the new dashboards once enough time has passed to serve users with the new data. | ||
- Add a dashboard variable to your dashboard to enable switching between explicit bucket histograms and exponential histograms. There isn't support for selectively enabling and disabling queries in Grafana ([issue 79848](https://github.com/grafana/grafana/issues/79848)). As a workaround, add the dashboard variable `latency_metrics`, for example, and assign it a value of either `-1` or `1`. Then, add the following two queries to the panel: | ||
|
||
``` | ||
<explicit_bucket_query> < ($latency_metrics * +Inf) | ||
``` | ||
|
||
``` | ||
<exponential_query> < ($latency_metrics * -Inf) | ||
``` | ||
|
||
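Where `explicit_bucket_query` is the original query and `exponential_query` is the same query using exponential histogram query syntax. When `latency_metrics` is `1`, the first query compares against `+Inf` and returns its results, while the second compares against `-Inf` and returns nothing. Setting the variable to `-1` reverses this. This technique is used in Mimir's dashboards. For an example, refer to the [Overview dashboard](https://github.com/grafana/mimir/blob/main/operations/mimir-mixin-compiled/dashboards/mimir-overview.json) in the Mimir repository. | ||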
|
||
This solution allows users to switch between the explicit bucket histogram and the exponential histogram without going to a different dashboard. | ||
|
||
- Replace the explicit bucket queries with modified queries. For example, replace: | ||
|
||
``` | ||
<explicit_bucket_query> | ||
``` | ||
|
||
with | ||
|
||
``` | ||
<exponential_query> or <explicit_bucket_query> | ||
``` | ||
|
||
Where `explicit_bucket_query` is the original query and `exponential_query` is the same query using exponential histogram query syntax. | ||
|
||
{{< admonition type="warning" >}} | ||
Using the PromQL operator `or` can lead to unexpected results. For example, if a query uses a range of seven days, such as `sum(rate(http_request_duration_seconds[7d]))`, then this query returns a value as soon as there are two exponential histogram samples present before the end time specified in the query. In this case, the seven-day rate is calculated from a couple of minutes' worth of data rather than seven days' worth. This results in an inaccuracy in the graph around the time you started scraping exponential histograms. | ||
{{< /admonition >}} | ||
|
||
1. Begin adding recording rules and alerts to use exponential histograms. Don't remove any existing recording rules and alerts at this time. | ||
1. Keep scraping both explicit bucket and exponential histograms for at least the longest range used in your recording rules and alerts, plus one day. This is the minimum amount of time, but it's recommended to keep scraping both data types until you can verify the new rules and alerts. | ||
|
||
For example, if you have an alert that calculates the rate of requests, such as `sum(rate(http_request_duration_seconds[7d]))`, this query looks at the data from the last seven days plus the Prometheus [lookback period](https://prometheus.io/docs/prometheus/latest/querying/basics/#staleness). When you start sending exponential histograms, the data isn't there for the entire seven days, and therefore, the results might be unreliable for alerting. | ||
|
||
1. After configuring exponential histogram collection, remove the explicit bucket histogram definition, as well as any views that expose explicit buckets. | ||
1. Clean up recording rules and alerts by deleting the explicit bucket histogram version of the rule or alert. | ||
|
||
## Bucket boundary calculation | ||
|
||
Bucket boundaries for exponential histograms are calculated similarly to those for native histograms. The only difference is that for exponential histograms, bucket offsets are shifted by one, as shown in the following equation. | ||
|
||
<!--- LaTeX equation source: {\left( 2^{2^{-schema}} \right)}^{index} < v \leq {\left( 2^{2^{-schema}}\right)}^{index+1} --> | ||
|
||
 | ||
|
||
For more information, refer to [bucket boundary calculation](https://grafana.com/docs/mimir/next/send/native-histograms/#bucket-boundary-calculation) in the documentation for native histograms. |
nit: maybe make common text with the other doc reusable/common in a followup PR if possible so the two docs don't deviate?
I can look into this idea for a future PR, especially when this topic goes GA and will have wider readership. We would still need to substitute all of the classic/native details for explicit bucket/exponential, so I'm not sure it would be worth the effort.